– Eric Auchard is a Reuters columnist. The opinions expressed are his own –
Academics, family researchers and even baseball history nuts have noticed recently how some important archives of older newspapers from around the world have vanished off the Web.
The problems have surfaced since PaperofRecord.com, a collection of more than 20 million newspaper pages from papers ranging from the Toronto Star to Mexican village periodicals to newspapers as far away as Perth, Australia, merged into Google News Archive.
The problem, researchers discovered, was that Google has had trouble reformatting the newspaper images and gaining rights to display some of the older publications. It has, at least temporarily, removed some of the archives from public view.
There is an idealized view of the Web that sees it as a storehouse of human knowledge, and in the sense of the breadth of what I can find with a random Google search, this is true.
But for all its openness, the Web has proven to be a leaky vessel for historical preservation, with much of its treasure trove lost in a maze of altered Web pages, broken links and deleted sites.
The head of the British Library recently warned in The Observer newspaper that if this digital memory loss is not fixed, we “are in danger of creating a black hole for future historians and writers.”
The archive of The Sporting News, founded in 1886 and nicknamed the Bible of Baseball, is among those that have fallen victim in the transition of PaperofRecord.com to Google ownership. Some older Mexican newspapers are also offline, academics complain.
Preserving history on the Web is a struggle even for Google, whose stated mission is “to organize the world’s information and make it universally accessible and useful.”
“We’re doing our best to find a solution to include as much of the acquired content as possible,” a Google spokesman says of the newspaper archive transition.
But as more and more of our collective memory is hosted online, the danger grows that we lose the content and context of events that happened just days ago, let alone weeks, months or decades back.
Try retracing the links to old scandals or unflattering images on the Web, say to Enron or Parmalat or other fallen corporate names. Most of them are gone, despite the best efforts of sites like Wikipedia or Smoking Gun or the combined energies of the blogosphere to ferret out and preserve such history.
Where is the global sense of outrage that followed the looting of Iraq’s National Museum as U.S. troops stood by in the turmoil following the ouster of Saddam Hussein in 2003? While hard to measure, I think it’s a safe bet that the world suffers the loss of a museum full of artifacts every day by depending upon the Web to host our precious cultural memories.
That’s not to neglect the enormous value of the Web as a temporal medium for sharing information. The latest celebrated example of this is how independent analyst Alex Dalmady used financial data from the Stanford Group’s own website to uncover the unlikely financial returns promised by the bank.
His Web detective work is the exception that proves the rule. It was all information hiding in plain sight and Dalmady simply had the courage to say the emperor had no clothes.
“One does not have to be a detective, or even a financial expert, to spot financial institutions that may prove insolvent, or worse, with the passage of time,” Dalmady crowed in a report he wrote. “As the saying goes, if it looks like a duck…”
Examples like Dalmady’s are, sadly, the exception.
The World Wide Web, as it has evolved over the years, has become almost purpose-built for obscuring or deleting uncomfortable facts. That wasn’t the intention of Web inventor Tim Berners-Lee, whose vision was that every address would point to a discrete page of data. Instead, Web designers have found it convenient to create dynamic Web addresses that may make it impossible to find information the next time you return to a site.
Even Dalmady’s work in January is already hard to reproduce. The Stanford International Bank Ltd site informs visitors the company has been put into receivership and provides no links to its past business.
The recent privacy backlash by Facebook users began when the management of the world’s most popular social networking site attempted to address the issue of who owns the history of conversations that occur between Facebook friends if one of the parties leaves the site.
Changes made last month to Facebook user guidelines implied that the company owned the rights to users’ personal data, including messages and photos, even after they shut down their accounts. The company has since backpedaled and assured its 175 million members that, indeed, users control the data they create on the site.
Susan Feldman, an expert on Web search with research firm IDC in Framingham, Massachusetts, says the problem of the disappearing Web is very real and also partly a mirage. The limitations of current search technology, which depends on users choosing the right keywords to find what they are looking for, are part of the problem.
Help is on the way from improved search tools such as text analytics and concept clustering technology that will help users find more of the information they may think is lost on the Web.
But until the Web’s important information archives are secured in modern libraries and improved search tools are widely available, the sense that we are losing our collective digital heritage will only grow.
Enjoy the Web’s many benefits while they are still on your screen. Keep copies of anything you want to remember, or risk losing it, perhaps as early as the next time you refresh your browser. We live in a time when the capacity to record and capture our lives has never been greater.
But using the Web to preserve those memories makes it more and more likely that future generations will consider the early years of the Web to be lost decades.
– At the time of publication Eric Auchard did not own any direct investments in securities mentioned in this article. He may be an owner indirectly as an investor in a fund. –


There never has been an informational storage facility like the internet in terms of reliability and breadth of knowledge. Even the grandest libraries in the world cannot compare to the wealth of information (and misinformation) stored online.
That being said, to assume the internet is supposed to be an infallible information storage facility is a mistake. The reason most information is removed from history is because it was thrown out: either purposefully, accidentally, or with a sense that it didn't matter one way or the other.
To say we should collectively keep everything available to us as a society in the hopes that it helps future generations is unrealistic. It's unrealistic in a tangible real-world sense, and it's unrealistic in a digital sense as well. Digital information still requires real-world storage space, and that storage space still belongs to individuals who have to make real decisions about whether to throw something out to protect themselves (e.g., tearing up a picture of an old flame, shredding important legal documentation) or simply to save space (e.g., spam, old documents that really don't matter anymore).
Some items that we don't want removed will often get removed anyway. In the case mentioned above regarding Google, the company made a call as to what was important and what wasn't. If somebody really wants old baseball data, they can go to baseball-reference.com. If that doesn't solve it, phone calls and legwork probably will. In this particular case, the documents more than likely still exist, even though the internet might not be holding them.
No matter how big the internet becomes, it will still be unrealistic to depend upon it as the keeper of all history of all corners of the globe. It's simply not viable.

25 comments so far
eric is justified in his concerns that mega-corporations, such as ‘enron’, are able to irretrievably alter history by deleting their brushstrokes.
what I find even more alarming is that the richness, colour and tapestry of the ’80s to 2009 and beyond has been lost to civilisation with the rise and rise of digital photography. the percentage of work that reaches the finished hard copy (on photographic paper) is at an historical low. this equates to more photos being taken than at any time in the history of our civilisation (for want of a better word [uncivilisation, perhaps]) but fewer photos being produced in the final medium.
all these spurious friends’ shots stored on camera memories and uploaded onto equally random pc memories result in a massive black hole in our collective consciousness in this particularly vulgar and turbulent epoch.
well crafted footage and documentation of current species, as they decline into extinction, may not be available and/or accessible, and man will be left with shaky artists’ sketches to reminisce and ponder.
when the last animal of the wild has alighted the
spirit of Man will
pale to a
blue moon…
will they leave a toa for our solace.
toa= a marker left by departing
indigenous Australian
tribes to indicate the
direction in whence they
had departed, e.g. a feather
One word:
Alexandria.
The greatest of all libraries stood for more than 700 years, then burned. While much of it was collective history and knowledge, it was also that which was…appropriated…from other collections, increasing its uniqueness. It was called ‘laying claim’: once a place was overrun, its works were collected for this Great Library. Much like Baghdad, it’s tradition.
Point: This one vaporized into smoke however, and with it went all the great collections; all the one-offs. It seems eerily similar to a digital age where the knowledge gets dropped, corrupted so easily.
honico//
some humble ideas:
Don't forget your past or you will have to repeat it.
No one says the internet is an infallible information storage facility but it is an information storage facility.
all the info is not true? we know, we learned that reading books in libraries and paying attention to all kinds of news and politics.
Who tells the story? the one who won the war? honestly i prefer to read the multiple points of view of different people.
We do have the technology to save all that information and a lot more; the thing is, some still very powerful people have no interest in that.
we the people are the keepers of all history; google, other search engines and the internet are just tools.
There is one project that is attempting to plug a hole in the “leaky vessel for historical preservation”: the Wayback Machine on http://www.archive.org
http://www.archive.org/web/web.php
http://en.wikipedia.org/wiki/Internet_Archive#Wayback_Machine
We are the gatekeepers. Archive, collect, reproduce and distribute.