“I’m not interested in working on this unless it’s going to be a multi-billion dollar idea. If I thought this would be a hundred million dollar company, what’s the point?” – Anonymous entrepreneur discussing his startup. Overheard in front of Ozo Coffee, Boulder, CO.
I’m in Boulder, Colorado for a few days this week to attend Big Boulder, a conference devoted to the social side of “big data.” Gnip, the company hosting the conference, is one I’ve written about before. They’re doing the plumber’s work of connecting all the firehoses of raw, public user data from social media companies like Twitter and Tumblr up to clients that want to derive insights from the wisdom of these online crowds.
A quick note on the definition of “big data.” Generally speaking, it’s the sort of data set that’s so huge that even running a simple report on it won’t tell you anything interesting. For example, if you could ask the IRS for a list of all the 25- to 30-year-olds in the U.S. who paid taxes last year, you’d get back a list, all right. But what would be useful about it? On the other hand, if you could filter that list by several other factors (did they pay capital gains, did they owe over six figures in taxes, what is their self-reported job title, and so on), you might end up with a list highly correlated with young dot-com millionaires and billionaires, like Mark Zuckerberg. And you might cross-reference that list against all the other data sets you can find on them: where they live, where they shop, where they travel, what they watch, eat and listen to. It’s all out there.
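To make that filter-then-cross-reference idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `TaxRecord` fields, the job titles treated as founder-like, the sample rows, and the `SHOPPING` lookup are all assumptions, not a real IRS schema or real data. Real big data work would run the same logic over billions of rows with far heavier tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxRecord:
    # Hypothetical fields, made up for this example.
    person_id: int
    age: int
    paid_capital_gains: bool
    tax_owed: int  # US dollars
    job_title: str


def narrow(records):
    """Keep 25- to 30-year-olds who paid capital gains, owed six
    figures or more, and report a founder-like job title."""
    founder_titles = {"founder", "ceo"}  # assumed proxy for "dot-com"
    return [
        r for r in records
        if 25 <= r.age <= 30
        and r.paid_capital_gains
        and r.tax_owed >= 100_000
        and r.job_title.lower() in founder_titles
    ]


def cross_reference(records, other):
    """Look up each shortlisted person in a second data set keyed by id."""
    return {
        r.person_id: other[r.person_id]
        for r in records
        if r.person_id in other
    }


# Toy rows, entirely fictional.
RECORDS = [
    TaxRecord(1, 27, True, 250_000, "Founder"),
    TaxRecord(2, 29, False, 8_000, "Barista"),
    TaxRecord(3, 26, True, 120_000, "CEO"),
    TaxRecord(4, 45, True, 900_000, "CEO"),
]
SHOPPING = {1: ["espresso machine"], 4: ["sailboat"]}

shortlist = narrow(RECORDS)
profiles = cross_reference(shortlist, SHOPPING)
```

The raw list of four people says little on its own. The stacked filters cut it down to a shortlist, and the join against a second data set is what turns that shortlist into a profile.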
Social media companies have woken up to the idea that their user bases are throwing off billions of data points that have huge potential value in aggregate. But for big data to be useful, the tooling for asking those sorts of questions and getting answers has to be very, very good.
That step, getting to the point where insights are derived from huge firehoses of content, is where data science comes in, and where Big Boulder attendees get wildly excited about the potential for big data to change the way the world works. (There are plenty of skeptics on the other side of the coin too, who wonder if the phrase “big data” has simply become the latest marketing jargon in the tech industry, even as it has yielded insights in unsexy fields, like milk production, for decades now.)