The Black Hole: How the Web devours history

February 27, 2009

– Eric Auchard is a Reuters columnist. The opinions expressed are his own –

Academics, family researchers and even baseball history nuts have noticed recently how some important archives of older newspapers from around the world have vanished off the Web.

The problems surfaced after PaperofRecord.com, a collection of more than 20 million pages from newspapers ranging from the Toronto Star to Mexican village periodicals to papers as far afield as Perth, Australia, was merged into Google News Archive.

The problem, researchers discovered, was that Google has had trouble reformatting the newspaper images and obtaining the rights to display some of the older publications. It has removed some of the archives from public view, at least temporarily.

There is an idealized view of the Web that sees it as a storehouse of human knowledge, and in the sense of the breadth of what I can find with a random Google search, this is true.

But for all its openness, the Web has proven to be a leaky vessel for historical preservation, with much of its treasure trove lost in a maze of altered Web pages, broken links and deleted sites.

The head of the British Library recently warned in The Observer newspaper that if this digital memory loss is not fixed, we “are in danger of creating a black hole for future historians and writers.”

The archives of The Sporting News, founded in 1886 and nicknamed the Bible of Baseball, are among those that have fallen victim to the transition of PaperofRecord.com to Google ownership. Some older Mexican newspapers are also offline, academics complain.

Preserving history on the Web is a struggle even for Google, whose stated mission is “to organize the world’s information and make it universally accessible and useful.”

“We’re doing our best to find a solution to include as much of the acquired content as possible,” a Google spokesman says of the newspaper archive transition.

But as more and more of our collective memory is hosted online, the danger grows that we lose the content and context of events that happened just days ago, let alone weeks, months or decades back.

Try retracing the links to old scandals or unflattering images on the Web, say to Enron or Parmalat or other fallen corporate names. Most of them are gone, despite the best efforts of sites like Wikipedia or The Smoking Gun or the combined energies of the blogosphere to ferret out and preserve such history.

Where is the global sense of outrage that followed the looting of Iraq’s National Museum as U.S. troops stood by in the turmoil following the ouster of Saddam Hussein in 2003? While the loss is hard to measure, I think it’s a safe bet that the world loses a museum’s worth of artifacts every day by depending on the Web to host our precious cultural memories.

That’s not to neglect the enormous value of the Web as a temporal medium for sharing information. The latest celebrated example of this is how independent analyst Alex Dalmady used financial data from the Stanford Group’s own website to uncover the implausible financial returns promised by the bank.

His Web detective work is the exception that proves the rule. It was all information hiding in plain sight, and Dalmady simply had the courage to say the emperor had no clothes.

“One does not have to be a detective, or even a financial expert, to spot financial institutions that may prove insolvent, or worse, with the passage of time,” Dalmady crowed in a report he wrote. “As the saying goes, if it looks like a duck…”

Examples like Dalmady’s are, sadly, the exception.

The way the World Wide Web has evolved over the years has made it almost purpose-built for obscuring or deleting uncomfortable facts. That wasn’t the intention of Web inventor Tim Berners-Lee, whose vision was that every address would point to a discrete page of data. Instead, Web designers have found it convenient to create dynamic Web addresses that can make it impossible to find the same information the next time you return to a site.

Even Dalmady’s work from January is already hard to reproduce. The Stanford International Bank Ltd. website informs visitors that the company has been put into receivership and provides no links to its past business.

The recent privacy backlash by Facebook users began when the management of the world’s most popular social networking site attempted to address the issue of who owns the history of conversations that occur between Facebook friends if one of the parties leaves the site.

Changes made last month to Facebook user guidelines implied that the company owned the rights to users’ personal data, including messages and photos, even after they shut down their accounts. The company has since backpedaled and assured its 175 million members that, indeed, users control the data they create on the site.

Susan Feldman, an expert on Web search with research firm IDC in Framingham, Massachusetts, says the problem of the disappearing Web is very real but also partly a mirage. The limitations of current search technology, which depends on users choosing the right keywords to find what they are looking for, are part of the problem.

Help is on the way from improved search tools such as text analytics and concept clustering technology that will help users find more of the information they may think is lost on the Web.

But until the Web’s important information archives are secured in modern libraries and improved search tools are widely available, the sense that we are losing our collective digital heritage will only grow.

Enjoy the Web’s many benefits, while they are still on your screen. Keep copies of anything you want to remember, or risk losing it, perhaps as early as the next time you refresh your browser. We live in a time when the capacity to record and capture our lives has never been greater.

But using the Web to preserve those memories makes it more and more likely that future generations will consider the early years of the Web to be lost decades.

– At the time of publication Eric Auchard did not own any direct investments in securities mentioned in this article. He may be an owner indirectly as an investor in a fund. –

25 comments


We are the gatekeepers. Archive, collect, reproduce and distribute.

Posted by Andrea

There is one project that is attempting to plug a hole in the “leaky vessel for historical preservation”: the Wayback Machine at http://www.archive.org

http://www.archive.org/web/web.php

http://en.wikipedia.org/wiki/Internet_Archive#Wayback_Machine

Posted by Walter Roth

Some humble ideas:
Don't forget your past or you will have to repeat it.

No one says the internet is an infallible information storage facility, but it is an information storage facility.
Not all the info is true? We know; we learned that reading books in libraries and paying attention to all kinds of news and politics.

Who tells the story? The one who won the war? Honestly, I prefer to read the multiple points of view of different people.

We do have the technology to save all that information and a lot more; the thing is, some still very powerful people have no interest in that.

We the people are the keepers of all history; Google, other search engines and the internet are just tools.

Posted by amattei2000

One word:

Alexandria.

The greatest of all libraries stood for more than 700 years, then burned. Much of its collective history and knowledge was…appropriated…from other collections, increasing its uniqueness. It was called ‘laying claim’ once a place was overrun: collecting for this Great Library. Much like Baghdad, it’s tradition.

Point: This one vaporized into smoke, however, and with it went all the great collections, all the one-offs. It seems eerily similar to a digital age where knowledge gets dropped or corrupted so easily.

honico//

Posted by honico

Eric is justified in his concern that mega-corporations such as Enron are able to irretrievably alter history by deleting their brushstrokes.

What I find even more alarming is that the richness, colour and tapestry of the ’80s to 2009 and beyond has been lost to civilisation with the rise and rise of digital photography. The percentage of work that reaches the finished hard copy (on photographic paper) is at an historical low. This means more photos are being taken than at any time in the history of our civilisation (for want of a better word [uncivilisation, perhaps]), but fewer photos are being produced in the final medium.

All these spurious shots of friends, stored on camera memory cards and uploaded onto equally random PC memories, result in a massive black hole in our collective consciousness in this particularly vulgar and turbulent epoch.

Well-crafted footage and documentation of current species, as they decline into extinction, may not be available or accessible, and man will be left with shaky artists’ sketches to reminisce over and ponder.

when the last animal of the wild has alighted the
spirit of Man will
pale to a
blue moon…
will they leave a toa for our solace.

toa = a marker, e.g. a feather, left by departing indigenous Australian tribes to indicate the direction in which they had departed.

Posted by sweeny