Our social-media amnesia
It began with a hashtag: #fitn. On the eve of January’s Republican presidential primary, it seemed that every member of the political press, election observer, and New Hampshirite had adopted #fitn as a sort of quasi-official tag. It was a reference to “First in the Nation,” a long-used political phrase that dates back to the 1920s. As I watched those tweets fly by, it struck me how ubiquitous its shorthand version had become online. Where did the hashtag come from? Who first injected it into the tweet stream? Twitter’s internal search engine, as it turns out, only goes back so far. I fired up Topsy.com, by general consensus the best tweet search tool going today. But I hit the outer limits of Topsy’s archive long before I uncovered my proto-tweet. I asked Twitter HQ. No go. A smallish company, it lacks the resources, they said, to track a hashtag back to its starting point.
My struggle to find the origins of #fitn is not unique. We’re tweeting more than 340 million times a day, conducting a robust public conversation on Twitter. Yet, even on Twitter’s sixth birthday today, we still can’t track it, can’t search it, can’t access our archives. There is no public record. Is that really so much to ask?
Maybe, yes. Consider the technological constraints. Brewster Kahle, who runs the Internet Archive, a non-profit online repository for 150 billion Web pages, told me startups have a hard time being “archive aware.” For them there are more pressing concerns, like integrating servers and avoiding “fail whales.”
Twitter’s internal search tool only reaches back a week or so before you get a note saying that older tweets are not available. Twitter does, to its credit, publish an interface that allows others to pull information from its services. But there’s a built-in cap on how many tweets can be accessed that way. (It changes, but at one recent point it was in the couple-thousand-tweet range.) And so, we’re left with our current status quo: tweets that seem to fall into a black hole. (Twitter declined to speak on record for this piece.)
Who cares, right? These are tweets, after all. Somehow we’ve survived as a culture without recording, say, every phone call we made in the ’80s. But Twitter’s centrality to the political conversation from the U.S. to Egypt has already made it more than mere ephemera. It’s still the early days of the social-media era, and our vantage point is not a particularly good one to decide what’s worth saving and what’s not.
“In the Elizabethan era,” points out Michael Lesk, chair of the Rutgers University Department of Library and Information Science, “plays weren’t saved because only sermons and poetry were considered literature.”
Of course, it’s not really us making the decision about what to save. It’s Twitter and the other big players in the digital communications realm that are making it. For a few years, it was tempting to let the Web’s natural order take care of things. If Twitter was busy helping people produce huge caches of information, Google was helping to make it searchable. Starting in 2009, Google and Twitter had a deal to include a healthy helping of tweets in Google’s real-time search results. But with the rollout of Google+ and, especially, Google’s choice to give special treatment to Google+ posts in its organic search results, Google and Twitter have gone from complementary online players to competitors; their real-time search agreement was allowed to lapse last summer.
So, if private industry isn’t going to save our tweets, maybe the public sector will? About two years ago, Twitter and the Library of Congress signed an agreement whereby the San Francisco company would gift the institution a copy of its tweet archive and the library would manage access to it. But, today, getting that project off the ground is “going to be a while,” according to Library of Congress public relations specialist Jennifer Gavin. Why? “The process of how to serve [the tweet archive] out to researchers while still maintaining the parameters set by our agreement with Twitter is still being worked out,” Gavin says. According to that agreement, access to the archive is primarily for “bona fide” researchers. Bulk download is prohibited, and there’s a six-month delay between the time a tweet goes live on Twitter and when it’s made available through the Library of Congress.
Why might Twitter Inc be interested in throttling access to the Library of Congress and other search engines? A good guess is money. A recent piece of tech industry news peels back the curtain a bit. Twitter, reported the Daily Mail last week, has sold two years’ worth of tweets to the British social-data research firm DataSift. DataSift is said to have a thousand companies lined up for the tweets, eager to get access for business analysis purposes. That deal builds upon an earlier reseller agreement with the Colorado social-media data provider Gnip. The giant heap of tweets we’ve produced since 2006, when Twitter launched, is potentially the stuff from which great fortunes are made. The more people who have those tweets for free, the less valuable those tweets are.
Imagine an archival spectrum where on one end is a Googlesque stash of all tweets from all time and on the other is something akin to the Svalbard Global Seed Vault on Norway’s remote Spitsbergen island. Our tweets might be locked away, safe and sound, but nobody can get to them unless there’s an emergency. It’s an approach that worries the Internet Archive’s Brewster Kahle. “Access is a key to preservation,” he says. “It’s hard to get people motivated for keeping a dark archive robust.”
But even if we assume perfect searchability, there are still reasons to question a strategy where just Twitter, or just Twitter and the Library of Congress, are expected to maintain the tweet archive. “The Internet Archive is great,” says Kahle as he makes a comparison. “But you don’t want to have just one. The Library of Congress has books, but we don’t require it to have the book.” (The Internet Archive had been interested in getting a copy of the tweet archive from Twitter.)
Even before the Internet taught us about all the great advantages of distribution and decentralization, libraries were practicing it. The Library of Congress has some 35 million volumes. But its role (for everyone other than the United States Congress) is lender of last resort. The Library of Congress assumes a high degree of redundancy: it is a major player, but simply one in a web that includes scores of personal, institutional, academic and other libraries.
There are some efforts, like the distributed update platform Identi.ca or The Locker Project personal data service, that are trying to offer users greater control over their digital information. But right now they’re limited to early adopters.
An alternative, then: What about outsourcing the work of keeping our personal contributions to the public record current and available? Local historical societies might step in to capture the tweet stream of a particular place or event as it happens in real time. Universities might provide students with an archival copy of their college-age tweets. Groups of people with a common interest might choose a provider to capture tweets around a theme in which they maintain a deep interest. With a distributed assortment of stewards of our social data (many of whom, of course, would outsource the technology side of things to tech firms), we’d have not one archive but multiple, redundant mini-archives.
Maybe I’d find my earliest use of the #fitn hashtag in the archives of the New Hampshire Republican Party, or the Union Leader, or Saint Anselm College’s New Hampshire Political Library. It might take a bit of legwork to find a tweet then. But it would be less work than counting on users to archive their own tweets and more productive than leaving it to Twitter to do. And so, we might soon see smaller archivers of all stripes step in, the local libraries of the social web.
Of course, an answer in “personal archiving” raises all sorts of new questions. Would it be necessary, or possible, or even desirable for those services to always be opt-in? Does the friction introduced by having a system of separate mini-caches mitigate the privacy concerns of having tweets that reach far back? Is that a smart trade-off, limiting the searchability of the Web to maintain some notion of personal privacy, or should we just rip the bandage off and admit that we became fully public, in every sense of the word, the minute we posted something on Twitter?
Complicated, challenging questions. Also complicated and challenging: having an always-on global conversation among hundreds of millions of people. But it falls short of the imaginativeness that is marking this moment in our technological evolution to simply leave all this archival business to just one company to figure out. “And that,” says Kahle, “is no way to run a culture.”