Some years ago, IBM posted a video on YouTube about their new analytics software. The scientist in the video showed that the software could query Twitter for all tweets that met user-defined criteria (over any given period of time, in any geographic region, etc.), build a database out of the results, and make spectacular-looking graphs showing, for example, how people felt about the new iPhone.

This is the sort of insight that supposedly gives Twitter its worth, and it is where Twitter’s revenue comes from. Can someone assess whether this data is worth anything at all?

Sure, Twitter can provide demographics on all the tweets, too, and the IBM software can break down the analysis by all sorts of criteria. But I’m still not sure that any legitimate extrapolation can be made from Twitter users to consumers at large, even if the Twitter data is controlled for race, age, socio-economic class and so on. People who choose to tweet about a product may differ from people of the same age and income who don’t.

Twitter obviously has some people convinced that its data is relevant, but as far as I can tell nobody has scrutinized that claim yet. I venture that no matter how many more users Twitter signs up, the statistical confidence needed to back a business decision may never be achieved, precisely because behavior on Twitter may not represent normal behavior. A bigger sample reduces random error, but it does nothing about a sample that is skewed to begin with. The statistical fundamentals are flawed.
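To make that point concrete, here is a rough Python sketch, not a model of real Twitter behavior. Every number in it is made up for illustration: I assume 30% of all consumers like a product, and that people who like it are three times as likely to tweet about it as people who don't. The function names are my own as well.

```python
import math
import random

# All three figures below are hypothetical, chosen only to illustrate the argument.
TRUE_SHARE = 0.30          # share of all consumers who like the product
P_TWEET_IF_LIKE = 0.60     # chance that someone who likes it tweets about it
P_TWEET_IF_DISLIKE = 0.20  # chance that someone who doesn't like it tweets about it


def sample_tweets(n, rng):
    """Collect n tweets and return the fraction that are positive."""
    positive = 0
    collected = 0
    while collected < n:
        likes = rng.random() < TRUE_SHARE
        p_tweet = P_TWEET_IF_LIKE if likes else P_TWEET_IF_DISLIKE
        # Only people who actually tweet end up in the data set.
        if rng.random() < p_tweet:
            collected += 1
            positive += likes
    return positive / n


def margin_of_error(share, n):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return 1.96 * math.sqrt(share * (1 - share) / n)


def expected_biased_share():
    """Positive share the tweet sample converges to under the assumptions above."""
    fans = TRUE_SHARE * P_TWEET_IF_LIKE
    others = (1 - TRUE_SHARE) * P_TWEET_IF_DISLIKE
    return fans / (fans + others)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10_000, 100_000):
        share = sample_tweets(n, rng)
        print(f"n={n:>7}  estimate={share:.3f}  "
              f"+/-{margin_of_error(share, n):.3f}  truth={TRUE_SHARE:.2f}")
```

Under these assumed numbers the reported margin of error keeps shrinking as n grows, while the estimate settles near 0.56 instead of the true 0.30. The graph looks more and more certain about the wrong answer.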

It would be so nice if an organization with Reuters’ resources would examine the fundamental underpinnings of the many businesses that could be ripping people off.