Big questions for big data

Big data is big news.

Search for it and Google will return 75 million results. That’s a different article for every 100 people on earth. There are also 18 million videos on the topic. At 5 minutes each, it would take 170 years to watch them all. Then there’s the one million books on the subject. Stacked on top of each other they’d form a tower 58 times taller than the Empire State Building.

Clearly ‘big news’ is an understatement.

Big data is a zeitgeist. It has monopolised the corporate conversation. It has captured our collective imagination, spread through every sector and intoxicated every industry.

Marketing is no different.

‘Big data’ dominates our discourse. It promises to provide a deeper understanding of our consumers, to remove risk from our decision making and to reduce waste from our investments.

And yet for every promise, there is a problem.

This article argues that ‘big data’ is not the silver bullet we so wish it to be. It argues that we must stop treating it as a singular source of objective truth. And it argues that five big questions should cause us to cast a more critical eye.

Let’s start at the beginning.

The question of collection

Big data has its own language. We use phrases like ‘raw data’, ‘data mining’ and ‘data is the new oil’. Inherent in these phrases is the metaphor of data as a natural resource. The analogy suggests that data resides just beneath the surface, waiting to be delicately uncovered and presented to the world untouched and untarnished.

I believe this metaphor is fundamentally flawed.

When we collect data, we cannot help but exert an influence on it. The tools that we use, and the methods with which we use them, corrupt and contaminate the very data that we are trying to capture.

This may seem far-fetched. But let’s examine a common method of data collection: surveys.

The Behavioural Insights Team worked with Ipsos MORI to conduct two surveys. Each survey asked a group of British adults the same question, but phrased in a different way.

The first group were asked if they opposed reducing the voting age from 18 to 16. 56% said they did. The second group were asked if they supported giving 16- and 17-year-olds the right to vote. 52% said they did.

The same question, phrased in two different ways, produced two different answers. Over half opposed the age reduction. And over half supported it. Changing the wording of the question changed the response it received. If you had run either survey alone, you’d think you had a definitive answer. The Behavioural Insights work suggests you wouldn’t.

The truth is that asking a respondent a question does not elicit an objective truth. To quote the theoretical physicist Werner Heisenberg:

“What we observe is not nature in itself, but nature exposed to our method of questioning.”

But it doesn’t stop there.

The answer to a question can also be affected by the questions that precede it. Philip Graves, the consumer psychologist, provides a memorable example:

“One poll asking about support for oil drilling in Alaska’s wildlife refuge found the public opposed it by a margin of 17 percentage points. Another poll conducted within a month of the first found the public in favour of it by exactly the same margin (…). The difference? The poll that found more people in favour of drilling preceded that question with thirteen about the cost of oil and the country’s dependence on foreign suppliers. The poll that found more people against asked only the question on drilling in that region of Alaska.”

Each question creates a context that influences the answers to questions that follow. Again, a small change in survey design can entirely reverse the results it garners.

A final example comes from the University of Michigan. Researchers asked respondents to read an article about a mass murder and then explain why the murder had occurred. Half of the respondents were asked to write on paper with a letterhead reading “The Institute for Social Research”. The other half had paper branded “The Institute for Personality Research”.

The explanations that the two groups provided differed dramatically.

Those in the ‘social research’ group provided more than three times as many external explanations as internal ones. These reasons included the time, place, situation or wider social context. Those in the ‘personality research’ group, on the other hand, gave 60% more internal reasons than the first group. These explanations encompassed psychological properties, personality traits, temperament, values and attitudes.

In short, those who thought they were taking a survey for social researchers leaned towards social explanations. Those taking a survey for personality researchers gave personality explanations.

Who subjects are responding to affects how subjects respond.

The three examples listed above are just three of a much larger body of effects known as response biases. These biases are not mutually exclusive. The way a question is worded, the preceding questions and the issuing authority can all impact responses in a single survey. If just one of these biases has the power to flip a finding, imagine the effect of a broad set of biases all acting, and interacting, at once.

Whilst there are a number of techniques used to control for these effects, it is almost impossible to make a survey that is truly robust. Philip Graves goes further:

“Even the perfect poll is deeply flawed.”

But let’s imagine that a perfect poll was possible. That we could account for every bias. And control for every influence. We’d still have a problem. Because even if it was possible to remove all bias from the question, it is not possible to remove all bias from the answer.

This brings me on to my next question.

The question of claims

If you have ever used an online dating service, you’ll be familiar with profile pages featuring users’ vital statistics. We joke that people often lie about their age, but data from OkCupid reveals the same is true of other measures. In an article for the APG, Richard Shotton uses the dating site’s statistics to demonstrate that US users exaggerate their height by an average of two inches. Clearly prospective daters have a natural incentive to stretch the truth. But survey respondents are known to do this even when they have no such explicit motivation.

The truth is, survey respondents do not always provide honest answers. They provide answers that they want the researcher to believe.

In his book Head in the Cloud, William Poundstone provides evidence that many ‘opinions’ are invented on the spot to satisfy the pollster:

“Political scientist George Bishop once demonstrated this by asking people whether they favoured repeal of the “Public Affairs Act of 1975.” There was no such act. But thirty percent took the bait and offered an opinion.”

30% of Bishop’s subjects didn’t want to be thought of as naïve or ignorant. They didn’t want to be caught without an opinion. They wanted to be perceived as intelligent and knowledgeable members of society. So they embellished their answers. They fabricated their opinions. They acted as if they knew, even when they didn’t.

And even when respondents aren’t consciously deceiving the researcher, they may be subconsciously deceiving themselves.

In Truth, Lies and Advertising the legendary Account Planner Jon Steel cites a survey which asked American business people which facilities were most influential in their choice of hotel.

Top of the list was a gym. In fact, 70% of those questioned deemed gyms to be a “very important factor when booking a hotel room”. The data was conclusive and the implications for hotel owners were obvious.

But it wasn’t that clear cut.

When analysing the actual usage of hotel gyms, the researchers found that only 17% of the target audience actually went on to use the gym.

To put it simply, roughly three quarters of those who claimed to want a gym didn’t actually use it.

In summing up this finding, Jon Steel remarked:

“In research, many people tend to present the personalities and habits they would like to have, rather than the ones they really have. […] I truly believe they do it to impress themselves, to convince themselves that they are more discerning, and live for the moment at least in the body and mind of the person they always wanted to be.”

What we say we want and what we actually want are not the same thing.

Examples of this can be found everywhere.

The book Consumerology references a study conducted by Timothy Wilson, a psychology professor at the University of Virginia. Wilson and his research partner Richard Nisbett asked consumers to evaluate four pairs of tights before choosing the pair which they believed to be the best quality.

When the psychologists reviewed the results, they found the consumers had given a range of nuanced reasons for their choice of favourite. Some selected a pair due to its superior sheerness, others chose based on knit. Others cited elasticity. There was only one catch. All four pairs of tights were exactly the same.

In his book Where Did It All Go Wrong?, the advertising planner Eaon Pritchard provides an explanation:

“We’ve learned from the recent advances in behavioural economics and consumer psychology that consumers have pretty much, no access to the unconscious mental processes that drive most of their decision making. However, this doesn’t prevent people providing plausible sounding rationalisations for their behaviour.”

To return to where question one began: there is no such thing as raw data. It is an oxymoron. Survey data will always be subject to influence, either from the researcher or from the respondent. To quote Nick Barrowman, Senior Statistician at the Children’s Hospital of Eastern Ontario Research Institute:

“‘Raw data’ is indeed a contradiction in terms. In the ordinary use of the term “raw data,” “raw” signifies that no processing was performed following data collection, but the term obscures the various forms of processing that necessarily occur before data collection.”

The only way to circumvent the biases inherent in questions is not to ask questions at all. To use data on how people actually behave, rather than how they say they behave.

But even this can only take you so far.

I’ll broach this subject in my next question.

The question of completeness

Boston has a pothole problem.

Every year authorities identify, assess and fix 20,000 of them. Every single day, maintenance staff spread across the city to conduct 55 road repairs. To help allocate its resources efficiently, the City released a smartphone app called StreetBump. As its users travelled around the city, the app used accelerometer and GPS data to passively detect potholes and report their location back to the city.
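To make the mechanism concrete, here is a minimal sketch of how an app of this kind might flag a pothole: watch for sharp spikes in the phone’s vertical acceleration and tag them with a GPS position. To be clear, this is not StreetBump’s actual algorithm; the threshold, data format and field names below are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SensorSample:
    timestamp: float  # seconds since the trip started
    z_accel: float    # vertical acceleration in m/s^2, gravity already removed
    lat: float
    lon: float

# A jolt above this size is treated as a candidate pothole. The value is an
# assumption for illustration, not a figure from the real app.
BUMP_THRESHOLD = 4.0

def detect_bumps(samples: List[SensorSample]) -> List[Tuple[float, float]]:
    """Return the (lat, lon) of every sample whose vertical jolt exceeds the threshold."""
    return [(s.lat, s.lon) for s in samples if abs(s.z_accel) > BUMP_THRESHOLD]

if __name__ == "__main__":
    trip = [
        SensorSample(0.0, 0.3, 42.3601, -71.0589),
        SensorSample(1.0, 5.2, 42.3605, -71.0592),  # sharp jolt: candidate pothole
        SensorSample(2.0, 0.4, 42.3610, -71.0595),
    ]
    print(detect_bumps(trip))  # -> [(42.3605, -71.0592)]
```

Note that nothing in this sketch asks the user anything; it simply records what the phone experiences, which is precisely its appeal.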

At first glance, this seems like an elegant solution. There is no biased survey. No irrational human response. The app simply observes reality and records.

Right?

Well, not quite. The app was released in 2012. Back then smartphones were far from ubiquitous. According to Kate Crawford, writing in Harvard Business Review:

“StreetBump has a signal problem. People in lower income groups in the US are less likely to have smartphones, and this is particularly true of older residents, where smartphone penetration can be as low as 16%. For cities like Boston, this means that smartphone data sets are missing inputs from significant parts of the population.”

Whilst the data StreetBump collected was entirely accurate, it was not entirely complete. The data points were unbiased but the data set was not.

StreetBump created a map of the city’s potholes that heavily favoured the young and the wealthy. A map where neighbourhoods with the fewest resources were largely underrepresented.

Unfortunately, demographic voids are common in big data. And they don’t just affect minority groups. Writing for the BBC, Charlotte McDonald exposed a gargantuan gap in her article “Is There a Sexist Data Crisis?”:

“There is a black hole in our knowledge of women and girls around the world. They are often missing from official statistics, and areas of their lives are ignored completely.”

We like to believe that data is neutral. That it doesn’t display preference or prejudice. But the truth is, it inherits the intolerances of its instigators.

Many of the world’s countries remain heavily patriarchal. And this societal asymmetry is reflected in their statistics.

At a 2012 data conference in Washington, then US Secretary of State Hillary Clinton expanded on the subject:

“For too many countries we lack reliable and regular data on even the basic facts about the lives of women and girls. Facts like when women have their first child. How many hours of paid and unpaid work they do, whether they own the land they farm. Since women make up half the population, that’s like having a black hole at the centre of our data-driven universe.”

For StreetBump, the data was incomplete due to technology. For Clinton, the data was incomplete due to culture. But there is one more factor affecting the completeness of data sets: some things are just plain difficult to quantify.

According to Nate Silver, the American statistician and author of The Signal and the Noise:

“In baseball, for instance, defence has long been much harder to measure than batting or pitching. In the mid-1990s, [Billy] Beane’s Oakland A‘s teams placed little emphasis on defence, and the outfield was managed by slow and bulky players, like Matt Stairs, who came out of the womb as designated hitters. As analysis of defence advanced, it became apparent that the A’s defective defence was costing them as many as 8 to 10 wins per season, effectively taking them out of contention no matter how good their batting statistics were.”

Faced with a tight budget, Billy Beane realised that he could not outspend richer clubs. So he chose to outsmart them. He hired an Ivy League graduate named Paul DePodesta (fictionalised as Peter Brand on screen) and used data to recruit players whom other teams had written off. Beane’s approach, and his use of data, revolutionised the game. Michael Lewis wrote a book about his success. And Brad Pitt played him in a Hollywood film.

But even his data was incomplete.

Beane made a critical error. He assumed that because something wasn’t measurable, it didn’t matter. But this is a fallacy. In the words of the American sociologist William Bruce Cameron:

“Not everything that counts can be counted, and not everything that can be counted counts.”

Some things are important but immeasurable. In fact, I’d argue that the vast majority of human behaviour falls into this group. Pride, passion, anger, anticipation, sadness, surprise. These are among the messy motivations of our actions. They are hard for us to recognise, difficult for us to process, and almost impossible for us to measure.

To quote Martin Lindstrom, author of Buyology:

“Businesses have come to rely on big data to understand the emotions of their most important asset — customers. And while big data is helping companies see patterns in huge masses of information, it’s proving limited for understanding the most important aspects of customers’ needs and desires.”

Big data will always be incomplete. It will always be restricted by the method of its collection, by the context of the culture and by the degree to which the meaningful can be measured. The danger of data is that it provides the perception of completeness whilst never actually achieving it.

The incompleteness of data is a problem. But the opposite is also true. Too much information can create just as many concerns as too little.

And it is this topic to which I’ll turn next.

The question of correlation

In 2008 the entrepreneur, editor and author Chris Anderson published an article in Wired that outlined a vision of the world in an era of abundant data. It’s worth quoting at length:

“This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. (…) We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”

Anderson’s vision pushes big data to its extreme.

His fantasy future is measured and monitored on such a scale, and with such precision, that all acts of scientific inquiry are rendered redundant. For when you have enough data, theorising about what might happen is replaced by analysing what has happened. When you have the answers, you no longer need to ask the questions.

I believe this is misguided.

To truly understand a system, you need to understand more than just the variables that have been measured. You need to understand the relationship between them. You need to understand how they interact. How cause leads to effect. How action leads to reaction. And it is within this arena that errors begin to occur.

The more relationships a data set has, the greater the number of coincidental correlations it contains. The bigger the data, the bigger the distraction.

In order to expose this pernicious problem, Tyler Vigen has combined and analysed several big data sets and shared the spurious correlations he found within them. Here are three examples of more than 30,000 that Vigen has documented:

  • The number of films Nicolas Cage features in per year correlates with the number of people who drown by falling into swimming pools.

  • The USA’s spending on space, science and technology correlates with the number of suicides by hanging, suffocation and strangulation.

  • The divorce rate in Maine correlates with per capita consumption of margarine.

Vigen’s findings are clearly coincidental. Eating more margarine does not cause more divorces. Nor does divorce prompt people to binge eat margarine. But in big data, these chance correlations are commonplace. The more we measure, the more meaningless correlations we will find.
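Vigen’s point is easy to reproduce in miniature. The sketch below is my own toy simulation, not Vigen’s method; the number of series, their length and the 0.8 cut-off are arbitrary choices. It generates a few hundred short series of pure noise and counts how many pairs clear a “strong correlation” bar; even though there is no relationship anywhere in the data, some pairs inevitably do.

```python
import itertools
import math
import random

random.seed(42)

N_SERIES, N_POINTS, CUTOFF = 200, 10, 0.8  # arbitrary illustration values

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 200 completely unrelated random "annual" series, 10 points each.
series = [[random.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_SERIES)]

# Count the pairs that look "strongly correlated" purely by chance.
spurious = sum(
    1 for a, b in itertools.combinations(series, 2) if abs(pearson(a, b)) > CUTOFF
)
total_pairs = N_SERIES * (N_SERIES - 1) // 2
print(f"{spurious} of {total_pairs} pairs of pure noise exceed |r| > {CUTOFF}")
```

The mechanism is simple: 200 series give nearly 20,000 possible pairings, so even a rare coincidence has thousands of chances to occur. Double the number of variables and the number of pairings roughly quadruples.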

But there is a second reason why two variables can be correlated despite there being no causal relationship.

A 1999 study published in Nature found a correlation between children who slept with a night light on and children who developed myopia (short-sightedness).

Left unquestioned, this could have resulted in paediatricians advising parents to avoid the use of ambient lights in their children’s bedrooms.

Fortunately, a year later, a different set of researchers published a study that shed new light on the problem. It found that myopic parents were more likely to use ambient lighting at night. In other words, myopic parents were more likely to use night lights and more likely to have myopic children. Night lights don’t cause myopia in children; both are the results of a common cause. But because the original research did not gather data on the children’s parents, a confounding correlation was treated as a meaningful one.
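A tiny simulation makes the mechanism visible. In the sketch below, loosely modelled on the night-light example with probabilities I have invented purely for illustration, parental myopia raises both the chance of using a night light and the chance of the child being myopic, while the night light itself has no effect at all. The two variables still come out correlated.

```python
import random

random.seed(0)

def simulate_child():
    # Invented probabilities, purely for illustration.
    parent_myopic = random.random() < 0.30
    # Myopic parents are assumed to favour night lights (the hidden common cause)...
    night_light = random.random() < (0.70 if parent_myopic else 0.20)
    # ...and myopia is assumed to be partly hereditary. The night light plays no
    # part in this line, so it has zero causal effect on the child.
    child_myopic = random.random() < (0.50 if parent_myopic else 0.10)
    return night_light, child_myopic

children = [simulate_child() for _ in range(100_000)]

def myopia_rate(with_light):
    group = [myopic for light, myopic in children if light == with_light]
    return sum(group) / len(group)

print(f"Myopia rate, night light used:    {myopia_rate(True):.1%}")
print(f"Myopia rate, no night light used: {myopia_rate(False):.1%}")
# The first figure is noticeably higher even though night lights do nothing
# in this model: parental myopia drives both variables.
```

Condition on the confounder (compare children of myopic parents only with each other) and the apparent effect of the night light disappears.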

These spurious and confounding correlations become more common as data sets expand. Noise grows faster than signal. Before long, a small number of meaningful associations is drowned in a sea of confusion. The author of The Black Swan, Nassim Nicholas Taleb, put it beautifully:

“Big data may mean more information, but it also means more false information. […] I am not saying here that there is no information in big data. There is plenty of information. The problem — the central issue — is that the needle comes in an increasingly larger haystack.”

Despite being a central tenet of scientific thought, the maxim that correlation does not imply causation seems to have been forgotten. We’ve fallen into a trap. We believe that because we have more data, we no longer need to think. We’ve begun to overvalue scale and undervalue scrutiny.

And that’s a problem. Because even if we have perfect data, it still has to be interpreted by its user. And this, again, is an opportunity for error to creep in.

This is the subject of my next question.

The question of comprehension

On 19th May 2001, the Premier League season drew to a close. Sir Alex Ferguson’s Manchester United had lost only six games all season. His team lifted the title after finishing 10 points clear of Arsenal at the top of the league. It was Ferguson’s seventh league title in nine years. United dominated English football.

But Ferguson wasn’t content. He thought areas of his team could still be improved. So he looked at the data. And unlike Billy Beane’s Oakland A’s, he had an abundance of information on his team’s defensive prowess.

In his book The Choice Factory, Richard Shotton tells the story of Alex Ferguson’s data driven decision:

“Opta data showed that his star defender, Jaap Stam, was making fewer tackles each season. Ferguson promptly offloaded him in August 2001 to Lazio — keen to earn a high transfer fee before the decline became apparent to rival clubs.

However, Stam’s career blossomed in Italy and Ferguson realised his error — the lower number of tackles was a sign of Stam’s improvement, not decline. He was losing the ball less and intercepting more passes so that he needed to make fewer tackles. Ferguson says selling Stam was the biggest mistake of his managerial career.”

The season following Stam’s sale, United came third. Three points behind Liverpool and ten points behind Arsenal.

For Ferguson, the data had been correct but it didn’t tell the full story. It didn’t provide the answer. It still needed interpretation. From then on, Ferguson refused to be seduced by simplistic data.

Examples of the misreading of otherwise accurate data can be found everywhere.

Take Hurricane Sandy.

Sandy developed in the western Caribbean Sea on 22nd October 2012, moved slowly northward toward the Greater Antilles, made landfall in Jamaica, hit Cuba and tore through the Bahamas. On the 29th, Sandy slammed into the US and wreaked havoc across the entire eastern seaboard. Damage in the United States alone amounted to 65 billion dollars.

A Harvard Business Review article tells the story of how Twitter and Foursquare data was used to estimate the areas that were worst affected:

“The greatest number of tweets about Sandy came from Manhattan. This makes sense given the city’s high level of smartphone ownership and Twitter use, but it creates the illusion that Manhattan was the hub of the disaster. Very few messages originated from more severely affected locations, such as Breezy Point, Coney Island and Rockaway. As extended power blackouts drained batteries and limited cellular access, even fewer tweets came from the worst hit areas. In fact, there was much more going on outside the privileged, urban experience of Sandy that Twitter data failed to convey.”

Again, the data was correct. More tweets did originate from Manhattan. But the interpretation of that data was misguided. More tweets didn’t mean more damage. In fact, it meant the exact opposite.

Those versed in organisational theory will be familiar with the DIKW model. The acronym stands for data, information, knowledge and wisdom. As you move up the model’s hierarchy, each layer becomes harder to attain but more valuable. Since the model’s inception 40 years ago, we have understood that data is the bottom rung: easy to acquire but not yet useful.

I fear this has been forgotten. According to Nick Barrowman:

“In recent years, data has come to be seen less as the inherently useless raw material for humans to process and refine than as a source of power and insight in its own right. Confidence seems to have shifted away from the products of human reasoning toward the potency of pure, pre-ideological, pre-theoretical, raw data. The DIKW hierarchy has, to an extent, been turned upside down.”

In marketing we often consider data to be the answer. The end, rather than the means. We think that if we have the right data, we will make the right decisions. But this simply isn’t the case.

All data must be interpreted. It must be translated into something comprehensible. Something actionable. But from Manchester to Manhattan, good data gets misread. Infallible figures are processed by fallible people. The right numbers lead to the wrong conclusions. Because, to paraphrase John Hegarty, data gives us information, but it doesn’t give us understanding.

The question of comprehension does not question the data itself but our capacity to understand it. It argues that flawless data does not necessarily mean flawless decisions.

Conclusion

So, there you have it. Big data faces big questions. Questions of collection, claims, completeness, correlation and comprehension.

The tools we use affect the data we collect. Consumer responses don’t represent reality. What matters cannot be measured. The noise grows faster than the signal. And perfect data rarely results in perfect decisions.

One thing is clear. Big data is not perfect. It will never be totally correct or totally complete. It will never be faultless. It will never be foolproof.

It’s time our industry acknowledged this. It’s time we stopped blindly following the figures. It’s time we stopped devouring data without casting a critical eye.

We must release ourselves from our spreadsheet subservience. Data is a powerful tool. But it is just that. A tool. As marketers we must use all the tools at our disposal. We must use big data and small data. Quant and qual. Information and intuition.

The scales have tipped too far.

We must stop treating our audiences as empty averages. We must stop passing people off as percentages. We must stop trying to quantify humanity and start trying to embrace it. Let’s get out from our offices and meet people. Talk to people. Understand what moves and motivates them. What drives their decisions. What alters their actions and affects their attitudes. Let’s go beyond the surface and search for the substantial. Let’s not stop at the measurable in pursuit of the meaningful.

Big data gives us breadth, but more than ever we need depth. Big data gives us information, but more than ever we need insight. Big data gives us predictability, but more than ever we need provocation. To quote Bill Bernbach:

“We are so busy measuring public opinion that we forget we can mould it. We are so busy listening to statistics that we forget we can create them.”

So let’s stop trying to remove all risks. Let’s stop trying to reduce all waste. Let’s stop our obsession with quantifying culture and start our obsession with impacting it.

Let’s aim to make big ideas the big news.

