Opinion

The Great Debate

Our social-media amnesia

By Nancy Scola
March 21, 2012

It began with a hashtag: #fitn. On the eve of January’s Republican presidential primary, it seemed that every member of the political press, every election observer, and every New Hampshirite had adopted #fitn as a quasi-official tag. It was a reference to “First in the Nation,” a political phrase that dates back to the 1920s. As I watched those tweets fly by, it struck me how ubiquitous its shorthand version had become online. Where did the hashtag come from? Who first injected it into the tweet stream? Twitter’s internal search engine, as it turns out, only goes back so far. I fired up Topsy.com, by general consensus the best tweet search tool going today. But I hit the outer limits of Topsy’s archive long before I uncovered my proto-tweet. I asked Twitter HQ. No go. A smallish company, it lacks the resources, they said, to track a hashtag back to its starting point.

My struggle to find the origins of #fitn is not unique. We’re tweeting more than 340 million times a day, conducting a robust public conversation on Twitter. Yet, even on Twitter’s sixth birthday today, we still can’t track it, can’t search it, can’t access our archives. There is no public record. Is that really so much to ask?

Maybe, yes. Consider the technological constraints. Brewster Kahle, who runs the Internet Archive, a non-profit online repository for 150 billion Web pages, told me startups have a hard time being “archive aware.” For them there are more pressing concerns, like integrating servers and avoiding “fail whales.”

Twitter’s internal search tool only reaches back a week or so before you get a note saying that older tweets are not available. Twitter does, to its credit, publish an interface that allows others to pull information from its services. But there’s a built-in cap on how many tweets can be accessed that way. (It changes, but at one recent point it was in the couple-thousand-tweet range.) And so we’re left with the status quo: tweets that seem to fall into a black hole. (Twitter declined to speak on record for this piece.)

Who cares, right? These are tweets, after all. Somehow we’ve survived as a culture without recording, say, every phone call we made in the ’80s. But Twitter’s centrality to the political conversation from the U.S. to Egypt has already made it more than mere ephemera. It’s still the early days of the social-media era, and our vantage point is not a particularly good one to decide what’s worth saving and what’s not.

“In the Elizabethan era,” points out Michael Lesk, chair of the Rutgers University Department of Library and Information Science, “plays weren’t saved because only sermons and poetry were considered literature.”

Of course, it’s not really us making the decision about what to save. It’s Twitter and the other big players in the digital communications realm that are making it. For a few years, it was tempting to let the Web’s natural order take care of things. If Twitter was busy helping people produce huge caches of information, Google was helping to make it searchable. Starting in 2009, Google and Twitter had a deal to include a healthy helping of tweets in Google’s real-time search results. But with the rollout of Google+ and, especially, Google’s choice to give special treatment to Google+ posts in its organic search results, Google and Twitter have gone from complementary online players to competitors; their real-time search agreement was allowed to lapse last summer.

So, if private industry isn’t going to save our tweets, maybe the public sector will? About two years ago, Twitter and the Library of Congress signed an agreement whereby the San Francisco company would gift the institution a copy of its tweet archive and the library would manage access to it. But getting that project off the ground is still “going to be a while,” according to Library of Congress public relations specialist Jennifer Gavin. Why? “The process of how to serve [the tweet archive] out to researchers while still maintaining the parameters set by our agreement with Twitter is still being worked out,” Gavin says. According to that agreement, access to the archive is primarily for “bona fide” researchers. Bulk download is prohibited, and there’s a six-month delay between the time a tweet goes live on Twitter and when it’s made available through the Library of Congress.

Why might Twitter Inc be interested in throttling the access that the Library of Congress and search engines have to its tweets? A good guess is money. A recent piece of tech industry news peels back the curtain a bit. Twitter, reported the Daily Mail last week, has sold two years’ worth of tweets to the British social-data research firm DataSift. DataSift is said to have a thousand companies lined up for the tweets, eager to get access for business analysis purposes. That deal builds upon an earlier reseller agreement with the Colorado social-media data provider Gnip. The giant heap of tweets we’ve produced since 2006, when Twitter launched, is potentially the stuff from which great fortunes are made. The more people who have those tweets for free, the less valuable those tweets are.

Imagine an archival spectrum where on one end is a Googlesque stash of all tweets from all time and on the other is something akin to the Svalbard Global Seed Vault on Norway’s remote Spitsbergen island. Our tweets might be locked away, safe and sound, but nobody can get to them unless there’s an emergency. It’s an approach that worries the Internet Archive’s Brewster Kahle. “Access is a key to preservation,” he says. “It’s hard to get people motivated for keeping a dark archive robust.”

But even if we assume perfect searchability, there are still reasons to question a strategy where just Twitter, or just Twitter and the Library of Congress, are expected to maintain the tweet archive. “The Internet Archive is great,” says Kahle as he makes a comparison. “But you don’t want to have just one. The Library of Congress has books, but we don’t require it to have the book.” (The Internet Archive had been interested in getting a copy of the tweet archive from Twitter.)

Even before the Internet taught us about all the great advantages of distribution and decentralization, libraries were practicing them. The Library of Congress has some 35 million volumes. But its role (for everyone other than the United States Congress) is that of lender of last resort. The Library of Congress assumes a high degree of redundancy: it is a major player, but simply one in a web that includes scores of personal, institutional, academic and other libraries.

There are some efforts, like the distributed update platform Identi.ca or The Locker Project personal data service, that are trying to offer users greater control over their digital information. But right now they’re limited to early adopters.

An alternative, then: What about outsourcing the work of keeping our personal contributions to the public record current and available? Local historical societies might step in to capture the tweet stream of a particular place or event as it happens in real time. Universities might provide students with an archival copy of their college-age tweets. Groups of people with a common interest might choose a provider to capture tweets around a theme in which they maintain a deep interest. With a distributed assortment of stewards of our social data (many of whom, of course, would outsource the technology side of things to tech firms), we’d have not one archive but multiple, redundant mini-archives.

Maybe I’d find the earliest use of the #fitn hashtag in the archives of the New Hampshire Republican Party, or the Union Leader, or Saint Anselm College’s New Hampshire Political Library. It might take a bit of legwork to find a tweet then. But it would be less work than counting on users to archive their own tweets and more productive than leaving it to Twitter to do. And so, we might soon see smaller archivers of all stripes step in, the local libraries of the social web.

Of course, an answer in “personal archiving” raises all sorts of new questions. Would it be necessary, or possible, or even desirable for those services to always be opt-in? Does the friction introduced by a system of separate mini-caches mitigate the privacy concerns raised by tweets that reach far back? Is that a smart trade-off, limiting the searchability of the Web to maintain some notion of personal privacy? Or should we just rip the bandage off and admit that we became fully public, in every sense of the word, the minute we posted something on Twitter?

Complicated, challenging questions. Also complicated and challenging: having an always-on global conversation among hundreds of millions of people. But simply leaving all this archival business to one company to figure out falls short of the imaginativeness that marks this moment in our technological evolution. “And that,” says Kahle, “is no way to run a culture.”

Comments

After reading this I am not sure how comfortably I will sleep tonight.

Posted by Nevinsdor

Nice to know that at least something from the internet disappears after you write it. Why do we need an archive for everything? Tweets approximate speech: brief remarks tossed off during the course of the day. They’re not meant to be immortalized, any more than they would be if they were spoken. We should have some privacy for our offhand remarks, don’t you think?

Posted by NewsLady

Nancy, Reuters, among other news agencies, reported Twitter’s sale of data. Although I don’t know how old their salable data is, the value of user behavior is so high that I venture to guess they have it all.

This June 2010 InformationWeek article by Stephen Saunders, “Superhighway To Hell,” is very accurate about user information.

http://www.informationweek.com/news/internet/search/225700640

Recent Reuters article about Twitter selling data:

http://www.reuters.com/article/2012/03/01/us-twitter-data-idUSTRE8201IU20120301

Posted by GSH10

According to this Reuters article, you can view all your old tweets, delete them, etc. Just click on the word “Tweets.”

http://www.reuters.com/article/2012/03/01/us-twitter-data-idUSTRE8201IU20120301

Posted by wholetone

While archiving internet content is a good idea, there are some real problems with it. Already, I have concerns about the many sites that disallow deletion of user comments or content. You can literally drive your own reputation into the ground, or reveal your political affiliations, and have that become a matter of public record. Of course, as we age, we change, and our opinions may change, but sadly anything written on the Internet remains in ‘Ink’ on far too many sites. So, archiving content is a good and worthy goal, but privacy is also important. Just as the Internet Archive allows individual site owners to opt out, it would be nice if individual users could censor their names from archives by submitting a request to do just that (with appropriate authentication of the person). It is scary, to me, to think of poor teenagers who throw everything on the web. A few years from now, they’ll have so much biting them in the butt that some may have a hole too deep to ever dig out of. In politically unstable countries, some may become targets. It’s simply a scary Internet ;p

Posted by aha321

A computer chip designer describes some of the technology trends that have more people creating and sharing more digital content and data, which is compounding the threat of digital amnesia:

“It’s really about who is accountable for assuring the continuity of the digital content you care about. In the past, users would be held accountable for backing up their own data, but lately it has shifted to providers,” said Steve Grobman, an Intel security architecture engineer.

He points out that many services are allowing people to download their activity and content posted to social media and networking sites, but the onus is on the individual to take a smart approach to selecting cloud-based services: http://www.intelfreepress.com/news/digital-amnesia-a-threat-to-personal-data/6110

Posted by kenekaplan
