All your Tumblr are belong to Them
Forget Instagram’s billion-dollar payday. Forget IPOs, past and future, from Facebook, Groupon, LinkedIn and the like. And ignore, please, the online ramblings of attention-hungry venture capitalists and narcissistic Silicon Valley journalists with the off-putting habit of making their inside-baseball sound like the World Series. Their stories, to paraphrase Shakespeare, are tales told by idiots, full of sound and fury, but signifying very little about the impact of technology on most of our lives. (Sure, some of their tales are about great fortunes, but those are only for a select few; to summon the Oracle of Omaha rather than the Bard of Avon, only a fool ever equated price with value.) Their one-in-a-million windfalls are just flashes in the pan. Or, actually, they are solitary data points, meaningless when devoid of context.
That context is here. It’s come, in part, because of the cunningly simple social and curatorial tools that media companies like Twitter, Tumblr, Facebook and Pinterest give away to their users. But making sense of our social world is only possible with the tools and technology behind what we call Big Data. The massive information collections spawned by our digital world are too big to address directly, so smart scientists have used fast computers to carve the data into real knowledge. This is how Big Data is already changing the way the world works.
But Big Data is young; though there are hundreds of accessible data sets already, there are still many more chaotic stores of information its tools can tame. Take, for example, social media: Yesterday, social media API company Gnip announced that it is providing customers with all of Tumblr’s data, what in techspeak is called the firehose. What Gnip and competitors like DataSift are providing to customers are Social Big Data firehoses that can be perfectly filtered into gently babbling brooks lined with digital gold nuggets. When the tech media wonder out loud how social companies will ever make a buck, here is part of the answer: sifting the gold out of their user-generated content is a huge piece of the puzzle.
At Gnip, Tumblr joins Twitter, WordPress, Disqus and the Chinese microblogging service Sina Weibo as the latest tree in a forest of Social Big Data accessible via API. A well-written API can transform a jumble of numbers into a perfectly organized multiplication table – on the order of millions or even billions of complex data pieces. (See this recent Economist visualization of the data record of a single tweet for more context.)
The data pieces are valuable, but not solely because they help advertisers sell more widgets. In an email, Gnip Chief Operating Officer Chris Moody explained that one of the coolest uses of data his company has enabled may have actually helped firefighters do their job better: “During the 4 Mile Canyon Fire in Boulder in 2010, [Gnip customer] VisionLink was able to provide fire crews and managers a realtime view into what was happening on the ground by layering geo-tagged Tweets and Flickr images onto a Google map of the area.”
It’s not just maps, photos and geo locations that number crunchers crave. Tumblr, after all, is a blog network full of cat photos, animated GIFs and other tomfoolery. Yet last year its already booming traffic grew an additional 300 percent. As the Web comic XKCD noted a day before Gnip’s announcement, the proper noun “Tumblr” is perhaps six months away from surpassing “blogging” in online searches, much the way “Google” became synonymous with the verb “search” a decade earlier. Tumblr users know that the site’s tools are heavily oriented toward sharing and signaling, as opposed to pure content creation. On Tumblr, users can of course write posts and upload photos, but they can also follow other users, “heart” each other’s posts and reblog posts they want to share with their own followers – and each action takes little more than one click. All these actions are trackable, all of them indicate some sort of sentiment or preference and all of them are discrete chunks of data in Tumblr’s massive data store – now joining the 90 billion pieces of social data Gnip is already delivering on a monthly basis.
If all of this sounds like a grand plot to aggregate, sift and derive insight into social behavior from all sorts of data that unsuspecting users unwittingly upload – well, it is. But those are the trade-offs that users agree to when they sign up with social-media websites, right when they check that box agreeing to the terms of service.
And it’s not just marketers and advertisers who pore over the data – the Guardian bought a slice of real-time social media from DataSift to help inform its coverage of the 2011 London riots. Joining journalists who want to understand social media with the help of big data are governments, NGOs, charities and non-profits, among all sorts of other organizations. For the first time ever, all these groups can access the data that can help them answer questions they’ve been asking for decades. In industry, science and government, many data pools have been available for several years now.
In a way, social media, despite being cutting-edge, was behind the curve. It had to grow up a little more, out of niche status, before its data set was big enough to provide meaningful insight into the way we live online. It looks as if that day is just about here. (Interestingly, Facebook, oft-criticized for privacy lapses, remains a mostly dark data pool; the company will slice and dice user data for advertisers and app makers but keeps its firehose closely guarded.)
So, yes, to adapt an old Internet meme, All your Tumblr (and much of your other social history) are belong to Big Data. But that’s not necessarily bad, if all the power of social media belongs to all of us. Whether that’s a worthwhile trade-off all depends on how we use that power, and what we finally do with all that data.
PHOTO: A Massachusetts firefighter hauls a firehose into position. REUTERS/Jim Bourg