Opinion

The Great Debate

Should we believe more in Big Data or in magic?

One year I spent a lot of time with professional magicians. A few showed me the secrets to their tricks. Whenever they did, the skill and dexterity required for sleight-of-hand struck me as far more impressive than the idea that magic had been performed. It reminded me of my own experience with statistics.

Data analysis is very similar to performing magic. With great skill you can pull things together and create the perception of surprising relationships. Often the magic is getting people to look at one thing, when they should be seeing another. Similarly with statistics, it’s often not the correlation that’s interesting but what you did to find it.

This is important to keep in mind as the world embarks on the big data revolution. Big data refers to the very large data sets collected by governments, corporations, and institutions that are becoming more widely available. Using this data, firms and policymakers can figure out which programs work (such as which health treatments are effective and which people respond to government incentives) and what consumers want. The deluge of information is expected to increase efficiency and lower prices. In a recent report, the McKinsey Global Institute encourages the increased availability of big data. It estimates that greater access to big data has the potential to create $3 trillion a year in value.

It is generally true that more information is better, though big data comes at a cost in terms of privacy and data collection. Yet what concerns me is the proper interpretation of big data. An earlier McKinsey report addresses this issue. It notes a dearth of trained statisticians, estimating that America is short 140,000 to 190,000 workers with the skills to handle data. But lack of talent is not just an impediment; it’s a potential source of danger. People, even those who know better, often take correlations literally and make decisions based on them, without appreciating the magic behind the numbers.

Interpreting data is more of an art than a science. But unlike magicians, most researchers do not intentionally mislead people. Two big concerns when you run statistics are bias (over- or understating a relationship) and mistaking correlation for causation (concluding that X causes Y when the two merely tend to occur at the same time). You might get biased results by using either the wrong data or an inappropriate estimation technique. Minimizing bias requires making subjective judgments. If you ran numbers on a large data set without inspecting it, removing outliers, and choosing the best model, you’d have much more bias than if you used some discretion.
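To make the point about outliers concrete, here is a minimal sketch in Python. The numbers are invented purely for illustration and the `pearson` helper is my own, not taken from any report mentioned above. It shows how a single unusual observation can turn no relationship at all into what looks like a very strong one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: five observations with no linear relationship.
x_clean = [1, 2, 3, 4, 5]
y_clean = [3, 1, 2, 1, 3]
r_clean = pearson(x_clean, y_clean)

# The same data plus one extreme observation (a hypothetical outlier).
x_outlier = x_clean + [20]
y_outlier = y_clean + [20]
r_outlier = pearson(x_outlier, y_outlier)

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}")
```

With these made-up numbers the correlation goes from zero to about 0.97 once the extra point is included. Whether that point is an error to discard or the most important observation in the set is exactly the kind of subjective judgment the numbers alone cannot settle.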

How to resist Big Brother 2.0

The ubiquity of digital gadgets and sensors, the pervasiveness of networks and the benefits of sharing very personal information through social media have led some to argue that privacy as a social norm is changing and becoming an outmoded concept. In this three-part series Don Tapscott questions this view, arguing that we each need a personal privacy strategy. Part one can be read here, and part two here.

As the Net becomes the basis for commerce, work, entertainment, healthcare, learning and much human discourse, each of us is leaving a trail of digital crumbs as we spend a growing portion of our day touching networks. The books, music and stocks you buy online, your pharmacy purchases, groceries scanned at the supermarket or bought online, your child’s research for a school project, the card reader at the parking lot, your car’s conversations with a database via satellite, the online publications you read, the shirt you purchase in a department store with your store card, the prescription drugs you buy – and the hundreds of other network transactions in a typical day – point to the problem.

Computers can inexpensively link and cross-reference such databases to slice, dice and recompile information about individuals in hundreds of different ways. This makes these databases enormously attractive for government and corporations that are keen to know our whereabouts and activities.

Can we retain privacy in the era of Big Data?

The ubiquity of digital gadgets and sensors, the pervasiveness of networks and the benefits of sharing very personal information through social media have led some to argue that privacy as a social norm is changing and becoming an outmoded concept. In this three-part series Don Tapscott questions this view, arguing that we each need a personal privacy strategy. Part one can be read here.

Privacy is nothing if not the freedom to be let alone, to experiment and to make mistakes, to forget and to start anew, to act according to conscience, and to be free from the oppressive scrutiny and opinions of others.

It may seem an odd notion today, but in its infancy the Internet was a favorite refuge for many seeking privacy. A famous New Yorker cartoon published almost 20 years ago featured two dogs sitting in front of a computer, with one saying to the other: “On the Internet, nobody knows you’re a dog.”

from Paul Smalera:

All your Tumblr are belong to Them

Forget Instagram’s billion-dollar payday. Forget IPOs, past and future, from Facebook, Groupon, LinkedIn and the like. And ignore, please, the online ramblings of attention-hungry venture capitalists and narcissistic Silicon Valley journalists with the off-putting habit of making their inside-baseball sound like the World Series. Their stories, to paraphrase Shakespeare, are tales told by idiots, full of sound and fury, but signifying very little about the impact of technology on most of our lives. (Sure, some of their tales are about great fortunes, but those are only for a select few; to summon the Oracle of Omaha rather than the Bard of Avon, only a fool ever equated price with value.) Their one-in-a-million windfalls are just flashes in the pan. Or, actually, they are solitary data points, meaningless when devoid of context.

That context is here. It’s come, in part, because of the cunningly simple social and curatorial tools that media companies like Twitter, Tumblr, Facebook and Pinterest give away to their users. But making sense of our social world is only possible with the tools and technology behind what we call Big Data. The massive information collections spawned by our digital world are too big to address directly, so smart scientists have used fast computers to carve the data into real knowledge. This is how Big Data is already changing the way the world works.

But Big Data is young; though there are hundreds of accessible data sets already, there are still many more chaotic stores of information its tools can tame. Take, for example, social media: Yesterday, social media API company Gnip announced that it is providing customers with all of Tumblr’s data, what in techspeak is called the firehose. What Gnip and competitors like DataSift are providing to customers are Social Big Data firehoses that can be perfectly filtered into gently babbling brooks lined with digital gold nuggets. When the tech media wonder out loud how social companies will ever make a buck, sifting the gold out of their user-generated content is a huge piece of the puzzle.
