3 Pitfalls in Social Media Analysis

Posted in Research & Analytics, Social Media

Technologies that assess sentiment and attitude in online text are becoming mainstream.
But beware; with semantics, a little knowledge can be a dangerous thing…

In a world that reveres scale, marketers will always be preoccupied with the pursuit of volume. And, while reach continues to be the headline metric for many communications, it’s clear that more nuanced means of assessment are increasingly necessary. Two relatively recent developments in marketing have brought new complexity to the perennial ‘quality vs. quantity’ debate.

The first is the widespread recognition that establishing a rapport with consumers is paramount to successful marketing. Customers being exposed to messages and being aware of brands is just the starting point. In fact, even eliciting a response that results in a purchase is no longer considered the ultimate objective. Today, we understand that it’s engagement and empathy that typically deliver long-term value. Jo Consumer will value a business most if it both meets her needs and also shares her values. And if she feels the business shares her values, she will be more inclined to demonstrate loyalty and advocacy.

The second development is both a symptom and a cause of the first: the emergence of technologies designed to analyse language to inform marketing. We’ve come a long way since rudimentary like/dislike measurements became commonplace. Striving for that elusive rapport, we are now increasingly eager to track and interpret ideas and opinions expressed via digital media in order to respond to customers swiftly and appropriately. Various semantic analysis technologies that “understand” the meaning of text are now widely used in marketing, but evidently they are double-edged swords. Sure, they can provide a low-cost alternative to conventional qualitative research methods and they can deliver results in near real-time. However, while the choice of cheaper tools is increasing, many are presenting marketers with a new array of challenges. Problems arise from shortcomings in the software, lack of understanding by the users or a disastrous hybrid of the two. I’ve seen three broad categories:

3. Overlooking the underlying issues

Identifying supporters or detractors of your brand or campaign with any degree of accuracy is useful. However, unless you can understand the reasoning affecting their respective points of view, the resultant intelligence is really difficult to act upon.

If the tools you’re using don’t allow you to dig deeper into the underlying issues, it’s easy to assume that a particular ‘pro’ or ‘anti’ group is homogeneous. Analyses of opinions expressed online about fracking, undertaken with Polecat’s MeaningMine tool, illustrate the point.


The top line data from many popular analytics tools will show that a proportion of people who expressed an opinion are opposed to fracking. However, it’s only when you analyze their exact reasons for opposition that really useful insights emerge. The charts above show important differences between the factors influencing Arabic and French speaking people in North Africa. Specifically, the analyses looked at negative views about oil and gas extraction using fracking. Treating both groups similarly in PR communications would certainly be a mistake as their opposition is founded on different reasons. If your social media tool can’t slice and dice with this degree of precision, your well-intentioned outreach program could easily turn a PR problem into a crisis.
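To make the "slice and dice" idea concrete, here is a minimal Python sketch. It assumes each post has already been labelled with a language, a stance and a reason; the sample posts, the reason labels and the function name `opposition_by_reason` are all invented for illustration and are not taken from MeaningMine or from the charts.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-labelled posts: (language, stance, reason).
# The reasons are invented for illustration only.
posts = [
    ("Arabic", "negative", "water supply"),
    ("Arabic", "negative", "water supply"),
    ("Arabic", "negative", "foreign ownership"),
    ("French", "negative", "climate impact"),
    ("French", "negative", "climate impact"),
    ("French", "negative", "water supply"),
    ("French", "positive", "jobs"),
]

def opposition_by_reason(posts):
    """Return, per language group, each reason's share of the negative posts."""
    counts = defaultdict(Counter)
    for language, stance, reason in posts:
        if stance == "negative":
            counts[language][reason] += 1
    shares = {}
    for language, reasons in counts.items():
        total = sum(reasons.values())
        # most_common() puts the leading reason first for each group.
        shares[language] = {r: n / total for r, n in reasons.most_common()}
    return shares

breakdown = opposition_by_reason(posts)
for language, reasons in breakdown.items():
    print(language, reasons)
```

A single "percentage opposed" figure would hide the fact that, in this made-up data, the two groups lead with different reasons.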

2. Misunderstanding the context

One of the fundamental challenges of analysing language is that its meaning is not absolute but rather it is affected by the context in which it is used. Words and also phrases can be expressions of entirely different opinions depending on both the words and phrases used around them and also the social context of the usage. Cultural factors can further affect meaning to the extent that similar words and phrases have completely different meanings used by different people in different social groups. Consider terms like “the shit” and “the bomb” or phrases like “they smashed the place up.” Less sophisticated analytics might misinterpret these as indications of aggression. In fact, they’re expressions of glowing praise for the live performance of a band. What’s more, geographical variations and idiosyncrasies are constantly changing and almost infinite in variety.

The Hatebase initiative illustrates this with alarming clarity.
This language analysis facility was built to assist government agencies, NGOs and research organizations to identify precursors or early indicators of violence and unrest. Notably, sometimes language that’s benign or meaningless in one location is pernicious and inflammatory elsewhere. I was not aware that I should be offended by being called an “ikizungu”…

And if we aren’t aware of the existence of highly specific linguistic markers, we can’t even calibrate listening tools to pick up the terms, let alone interpret them. So, if you’re staring at your social media dashboard and think a single sentiment chart is informative, think again.
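The gap between word-level scoring and context-aware scoring can be shown with a toy Python sketch. Everything here is an assumption made for illustration: the tiny lexicon `WORD_SCORES`, the per-context phrase table `PHRASE_SCORES` and both function names are hypothetical, and real tools use far richer models than simple lookups.

```python
# Hypothetical word-level lexicon: negative words score -1, positive +1.
WORD_SCORES = {"shit": -1, "bomb": -1, "smashed": -1, "great": 1}

# Hypothetical phrase overrides that apply only within a given context.
PHRASE_SCORES = {
    "live music": {"the shit": 1, "the bomb": 1, "smashed the place up": 1},
}

def naive_score(text):
    """Sum word scores with no regard for phrases or context."""
    return sum(WORD_SCORES.get(word, 0) for word in text.lower().split())

def contextual_score(text, context):
    """Score known phrases for this context first, then the leftover words."""
    remaining = text.lower()
    score = 0
    for phrase, value in PHRASE_SCORES.get(context, {}).items():
        score += remaining.count(phrase) * value
        # Remove the phrase so its words are not scored a second time.
        remaining = remaining.replace(phrase, " ")
    return score + naive_score(remaining)

review = "they smashed the place up"
print(naive_score(review), contextual_score(review, "live music"))
```

The same sentence scores as negative with the word list alone and as positive once the live-music context is supplied, which is the whole problem in miniature. Note that the sketch can only handle phrases somebody has already added to the table.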

1. Failing to measure significance

The third type of problem arising for marketers using semantic tools epitomizes how contemporary digital media analysis is a complicated qual/quant mash-up.

In short, we face the challenge of knowing what and who is really important and this requires a heady blend of statistics and risk assessment. If you overestimate the significance of an opinion expressed online and react to it accordingly, you can create a tricky situation that previously did not exist. It’s almost the antithesis of the Streisand Effect.

I saw a good example of this when the leadership team of a large corporation was whipped into a frenzy by a large cluster of disparaging posts. The top brass were just hours away from placing full-page press ads countering the apparent groundswell of damaging claims, when the boffins came back with some more detailed analyses. The supposed throng of vocal opponents was in fact a single individual with an array of social media accounts, all re-posting each other. She had single-handedly created some significant online noise that scared the hell out of the company execs. Moreover, while semantic analyses had indeed identified some outrageous claims, analysis of her network of online associations showed something even more important. Practically nobody was listening to her rants. She had almost zero influence. Panic over.
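A rough Python sketch of that kind of check follows. It is not the analysis the team actually ran; it simply assumes you have a list of who re-posted whom and a set of followers per account (both invented here), groups accounts that re-post each other, and asks how many followers sit outside the group. A tightly linked cluster is only a flag for closer inspection, not proof of a single author.

```python
# Hypothetical repost records: (reposting_account, original_account).
reposts = [("a2", "a1"), ("a3", "a2"), ("a1", "a3"), ("a4", "a1")]

# Hypothetical follower sets per account; "u9" is the only outsider.
followers = {
    "a1": {"a2", "a3"},
    "a2": {"a1", "a4"},
    "a3": {"a1"},
    "a4": {"a2", "u9"},
}

def clusters(edges):
    """Group accounts linked by reposts into connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

def outside_audience(cluster, followers):
    """Followers of the cluster's accounts who are not part of the cluster."""
    audience = set()
    for account in cluster:
        audience |= followers.get(account, set())
    return audience - cluster

groups = clusters(reposts)
for group in groups:
    print(sorted(group), "outside followers:", len(outside_audience(group, followers)))
```

In this made-up data, four apparently separate voices collapse into one cluster with a single outside follower: lots of noise, very little reach.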

Applying techniques previously used only by epidemiologists is now essential to fully understand the spread and impact of opinions online. If you treat both good and bad ideas like pathogens (or indeed antigens) it’s a whole lot easier to figure out how best to respond.


A simple illustration of this can be seen in the Where Does My Tweet Go? tool built by MFG Labs in Paris. In one of their showcase analyses (above), the spread of Ellen DeGeneres’ star-studded selfie at the Oscars (left) can be traced through a network of highly connected people, dispersing into the general population in a matter of hours.

We’ve always known that ‘key influencers’ exist. Now we can measure them in minutes. In this example, it could be argued that the nature of the message is not as significant as its origins or its dissemination.
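One simple way to put a number on "key influencer" is to count how many later shares trace back through each account. The Python sketch below is an illustration under stated assumptions, not a description of how the MFG Labs tool works: the share records are invented, each account is assumed to share only once (so the cascade has no loops), and `downstream_reach` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical cascade: (account, account it reshared from).
# "origin" posted first; every other account reshared exactly once.
shares = [("b", "origin"), ("c", "origin"), ("d", "b"), ("e", "b"), ("f", "d")]

def downstream_reach(shares):
    """Count, for each account, how many later shares trace back to it."""
    children = defaultdict(list)
    for account, source in shares:
        children[source].append(account)

    reach = {}

    def count(node):
        # Each direct resharer counts once, plus everything downstream of them.
        reach[node] = sum(1 + count(child) for child in children.get(node, []))
        return reach[node]

    # Roots are sources that never appear as a resharer themselves.
    roots = set(children) - {account for account, _ in shares}
    for root in roots:
        count(root)
    return reach

reach = downstream_reach(shares)
for account, n in sorted(reach.items(), key=lambda item: -item[1]):
    print(account, n)
```

Here account "b" is responsible for three of the five reshares, so it matters far more to the spread than "c", even though both picked the message up directly from the origin.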

A more difficult assessment to make is about the validity and credibility of opinions and those that express them in cyberspace. Consider, if you will, a discussion about a pediatric drug or treatment that is alleged to be harmful. If a semantic analysis gives comments from a 26-year-old mom who dropped out of high school similar weight to those from a learned pediatrician, the results are going to be unhelpful if not dangerous.

One way to address this is with an additional tier of filtering, whereby linguistic characteristics are used to segment opinions, differentiating between informed, considered posts and those that are just ignorant babble. The pediatrician will use not just particular terminology, which can be mimicked by less knowledgeable people, but also distinctive sentence structures and phrases that identify her as someone who knows her stuff. It’s not that the views of the less educated mom are not worth listening to but it is important to rank her contribution to the debate properly. After all, ranking and segmentation are in chapter 1 of the earliest marketing manuals.
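The effect of ranking contributions can be sketched in a few lines of Python. The sentiment scores and credibility weights below are invented, and how a weight would be derived from linguistic characteristics is outside this sketch; the point is only to show how a weighted reading can differ from a simple average.

```python
# Hypothetical posts: (sentiment from -1 to 1, credibility weight from 0 to 1).
# How the credibility weight is derived is outside the scope of this sketch.
posts = [(-0.9, 0.2), (-0.8, 0.2), (-0.7, 0.2), (0.6, 0.9)]

def mean_sentiment(posts):
    """Plain average: every post counts equally."""
    return sum(sentiment for sentiment, _ in posts) / len(posts)

def weighted_sentiment(posts):
    """Average in which each post counts in proportion to its credibility."""
    total_weight = sum(weight for _, weight in posts)
    if total_weight == 0:
        return 0.0
    return sum(sentiment * weight for sentiment, weight in posts) / total_weight

print(round(mean_sentiment(posts), 2), round(weighted_sentiment(posts), 2))
```

With these made-up numbers the plain average is clearly negative, while the weighted figure is close to neutral. The low-weight posts still count, they just no longer drown out the better-informed one.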

An engine capable of processing large volumes of text to gain insights is unquestionably a valuable asset. These technologies can improve strategy, planning and execution, helping marketers to get closer to the customers they aim to serve. My word of caution is that this field of analysis is still in a nascent phase. Many of the software packages on the market should be treated like wind socks, rather than sat navs; providing broad indicators rather than precise direction. And those that are powerful enough to undertake deep, sophisticated assessments require expert operators to achieve reliable results.

To summarise, there’s only one thing more dangerous than ignoring social media dialogue, and that’s having an incomplete or imprecise understanding of conversations about your brand.