Opinion

The Great Debate

The Black Hole: How the Web devours history

February 27, 2009

– Eric Auchard is a Reuters columnist. The opinions expressed are his own –

Academics, family researchers and even baseball history nuts have noticed recently how some important archives of older newspapers from around the world have vanished off the Web.

The problems have surfaced since PaperofRecord.com, a collection of more than 20 million newspaper pages from papers ranging from the Toronto Star to Mexican village periodicals to newspapers as far afield as Perth, Australia, was merged into Google News Archive.

The problem, researchers discovered, was that Google has had trouble reformatting the newspaper images and gaining rights to display some of the older publications. It has, at least temporarily, removed some of the archives from public view.

There is an idealized view of the Web that sees it as a storehouse of human knowledge, and in the sense of the breadth of what I can find with a random Google search, this is true.

But for all its openness, the Web has proven to be a leaky vessel for historical preservation, with much of its treasure trove lost in a maze of altered Web pages, broken links and deleted sites.

The head of the British Library recently warned in The Observer newspaper that if this digital memory loss is not fixed, we “are in danger of creating a black hole for future historians and writers.”

The archive of The Sporting News, founded in 1886 and nicknamed the Bible of Baseball, is among the publications that have fallen victim in the transition of PaperofRecord.com to Google ownership. Some older Mexican newspapers are also offline, academics complain.

Preserving history on the Web is a struggle even for Google, whose stated mission is “to organize the world’s information and make it universally accessible and useful.”

“We’re doing our best to find a solution to include as much of the acquired content as possible,” a Google spokesman says of the newspaper archive transition.

But as more and more of our collective memory is hosted online, the danger grows that we lose the content and context of events that happened just days ago, let alone weeks, months or decades back.

Try retracing the links to old scandals or unflattering images on the Web, say to Enron or Parmalat or other fallen corporate names. Most of them are gone, despite the best efforts of sites like Wikipedia or Smoking Gun or the combined energies of the blogosphere to ferret out and preserve such history.

Where is the global sense of outrage that followed the looting of Iraq’s National Museum as U.S. troops stood by in the turmoil following the ouster of Saddam Hussein in 2003? While hard to measure, I think it’s a safe bet that the world suffers the loss of a museum full of artifacts every day by depending upon the Web to host our precious cultural memories.

That’s not to neglect the enormous value of the Web as a temporal medium for sharing information. The latest celebrated example of this is how independent analyst Alex Dalmady used financial data from the Stanford Group’s own website to uncover the unlikely financial returns promised by the bank.

His Web detective work is the exception that proves the rule. It was all information hiding in plain sight and Dalmady simply had the courage to say the emperor had no clothes.

“One does not have to be a detective, or even a financial expert, to spot financial institutions that may prove insolvent, or worse, with the passage of time,” Dalmady crowed in a report he wrote. “As the saying goes, if it looks like a duck…”

Examples like Dalmady’s are, sadly, the exception.

As it has evolved over the years, the World Wide Web has become almost purpose-built for obscuring or deleting uncomfortable facts. That wasn’t the intention of Web inventor Tim Berners-Lee, whose vision was that every address would point to a discreet page of data. Instead, Web designers have found it convenient to create dynamic Web addresses that may make it impossible to find information the next time you return to a site.

Even Dalmady’s work in January is already hard to reproduce. The Stanford International Bank Ltd site informs visitors the company has been put into receivership and provides no links to its past business.

The recent privacy backlash by Facebook users began when the management of the world’s most popular social networking site attempted to address the issue of who owns the history of conversations that occur between Facebook friends if one of the parties leaves the site.

Changes made last month to Facebook user guidelines implied that the company owned the rights to users’ personal data, including messages and photos, even after they shut down their accounts. The company has since backpedaled and assured its 175 million members that, indeed, users control the data they create on the site.

Susan Feldman, an expert on Web search with research firm IDC in Framingham, Massachusetts, says the problem of the disappearing Web is very real and also partly a mirage. The limitations of current search technology, which depends on users choosing the right keywords to find what they are looking for, are part of the problem.

Help is on the way from improved search tools such as text analytics and concept clustering technology that will help users find more of the information they may think is lost on the Web.

But until the Web’s important information archives are secured in modern libraries and improved search tools are widely available, the sense that we are losing our collective digital heritage will only grow.

Enjoy the Web’s many benefits, while they are still on your screen. Keep copies of anything you want to remember, or risk losing it, perhaps as early as the next time you refresh your browser. We live in a time when the capacity to record and capture our lives has never been greater.

But using the Web to preserve those memories makes it more and more likely that future generations will consider the early years of the Web to be lost decades.

– At the time of publication Eric Auchard did not own any direct investments in securities mentioned in this article. He may be an owner indirectly as an investor in a fund. –

Comments
25 comments so far

While I agree, I would offer another consideration for history: what occurs when electricity isn’t available? Paper decomposes less quickly.

 

Some years ago I wrote about this loss of history from a similar perspective. Magazines, newspapers and other media that provide immediate coverage of the news used to send out a photographer and reporter to record the event, then compose a story. The film was processed, then the editors would align the story and image(s) to complement each other and go to press. Key is that the images were most often stored in a safe place and over time became a deeper reference should an editor, writer, or historian wish to dig further than the published story.

That’s no longer the case in most instances. Digital files are submitted, a few are chosen and the remaining images are cleared from memory... quite literally cleared from our collective ability to go back and review the outtakes, the in-between shots, the images that tell a different story than the spin of the article. In the end it’s like having only one witness to an event. History is thinned, trimmed to a narrow view... something the preservation of original film didn’t allow.

James Cordes

Posted by James Cordes
 

Auchard’s arguments are certainly relevant, but I fear he misses the bigger picture of the issues associated with data preservation in a digital environment. I think the truly mind-boggling aspect is that the problem has two sides which create a seeming paradox. The internet houses and retains massive amounts of information we may not want retained (think social-networking). Yet at the same time, decentralized management and decidedly un-robust systems have created serious challenges for data preservation. The root here is the sheer inordinate complexity of the internet. Its decentralized, democratic basis, together with search and organization efforts that amount to fighting a tidal wave with a paddle, makes the internet an extremely difficult organism to manage effectively. Auchard merely uncovers a tiny corner of this massive problem.

Posted by Brad Smith
 

Why is there no mention in this article of any of the non-profit digital libraries that are seeking to directly address these concerns, such as Archive.org?

Posted by Poga
 

There never has been an informational storage facility like the internet in terms of reliability and breadth of knowledge. Even the grandest libraries in the world cannot compare to the wealth of information (and misinformation) stored online.

That being said, to assume the internet is supposed to be an infallible information storage facility is a mistake. The reason most information is removed from history is because it was thrown out: either purposefully, accidentally, or with a sense that it didn’t matter one way or the other.

To say we should collectively keep everything available to us as a society in the hopes that it helps future generations is unrealistic. It’s unrealistic from a tangible real-world sense, and it’s unrealistic from a digital sense as well. Information-driven data still requires real-world storage space, and that storage space still belongs to individuals who have to make real decisions about whether they must throw something out to save themselves (eg tearing up a picture of an old flame, shredding important legal documentation), or simply to save space (eg spam, old documents that really don’t matter anymore).

Some items that we don’t want removed will often get removed anyway. In the case mentioned above regarding Google they made a call as to what was important and what wasn’t. If somebody really wants old baseball data, go to baseball-reference.com. If that doesn’t solve it, phone calls and legwork probably will. In this particular case, the documents more than likely still exist, even though the internet might not be holding them.

No matter how big the internet becomes, it will still be unrealistic to depend upon it as the keeper of all history of all corners of the globe. It’s simply not viable.

 

Preserving everything that’s published on the Internet sounds expensive. It looks to me like a task that the government should undertake.

Posted by Julian
 

Sorry, I didn’t get it… when I read the title of your piece, I thought it was going to be about an Orwellian/1984 Homer/Odyssey positivist twist on history. You’re very insightful and intelligent, but I’m afraid you need to form a seamless interface between the totalitarian and free spirit in us all. This way you would understand the more in-depth analysis of the information dynamic presented accurately by the NY Times in its article “Open Source Spying”!

 

Mr. Auchard,
You highlight a very serious problem in not only our society, but our world as well: the loss of historical fact. Even worse, the legends and myths that creep into historical fact. Worse than that? The intentional misrepresentation of fact. All three of these phenomena may actually be in your article; hence, you are contributing to the problem as an author/journalist. Where is the responsibility in journalism nowadays to protect the truth? Much like a doctor protects life, I assume most responsible journalists want to seek the truth and report it to their readers. Let the editorial comments or opinions remain where they belong… in the editorial section. Let readers read facts then develop their own opinion.
Your reference to the looting of the Baghdad museum is a classic example of what Aeschylus, the Greek dramatist (525 BC – 456 BC), meant when he said: “In war, truth is the first casualty.” Did looting occur in Baghdad? Absolutely. Did US Soldiers just “stand by” and do nothing to prevent the looting of the Iraqi National Museum? Absolutely not. Many of the artifacts “looted” were removed by the museum staff prior to our arrival, some of whom sold the items for profit. Where is the “global outrage” over that fact? Where is the global outrage over the Taliban destruction of the Buddhist statues carved into the mountains of Afghanistan? As the Task Force Executive Officer (second in command) of the unit that led the assault into Baghdad on 7 April, 2003, I can assure you we did everything possible to safeguard as many known cultural sites as possible. Fact. The Iraqi equivalent of the Tomb of the Unknown Soldier was unmolested. Fact. Along with many others that did not make the “news.”
I suggest, Mr. Auchard, that in the future, you stick to the facts or you will intentionally, or unintentionally, contribute to the very problem you are reporting/writing about. Fact.
LTC Ricky J Nussio

Posted by LTC(P) Ricky J Nussio
 

There will always be writers that keep journals. There will always be journalists that keep notebooks. There will always be professors that keep both.

 

The author uses the popular phrase “exception that proves the rule” above. To quote Ambrose Bierce: “The exception proves the rule” is an expression constantly upon the lips of the ignorant, who parrot it from one another with never a thought of its absurdity. In the Latin, “Exceptio probat regulam” means that the exception tests the rule, puts it to the proof, not confirms it. The malefactor who drew the meaning from this excellent dictum and substituted a contrary one of his own exerted an evil power which appears to be immortal. Just had to throw that in, but I think the author makes a good point. However, before the web there was history and historians and I doubt they all quit when they heard everything was on the web already. Does the world need a record of everything that ever happened? Would the grandeur of Imperial Rome seem as grand if we had a record of every little detail of Roman life? Or would it seem petty and farcical and full of incompetence like our current world does? We don’t want future generations to have all the details. The big picture will be embarrassing enough.

Posted by Todd Dalton
 

“That wasn’t the intention of Web inventor Tim Berners-Lee, whose vision was that every address would point to a discreet page of data.”

I assume you mean a “discrete page of data”. Discretion may be the better part of valor but it isn’t separate or distinct.

Posted by Steve
 

Everyone seems to be coming down on the author as if he was making a big deal out of nothing. I think he was trying to convey that this problem, if left to fester, will become a major issue down the line. We don’t need to know every detail of the Roman Empire, but the author is merely pointing out that we will need to know that there was a Roman Empire. If our historical records continue to be eroded then one day perhaps we are left with what many of the posters might consider “a big problem”. I believe Mr. Auchard was trying to direct attention to the ever slippery slope that may be lurking.

Posted by Dave
 

Is this where we are headed?

There was a prescient science fiction story written some 30 years ago, I believe.

ALL of mankind’s knowledge was stored in a single impregnable, absolutely safe location, accessible to everyone by computer. It was a supreme human achievement.

Data were stored on sub-atomic particles (as is now being researched).

One day, the master index file was misplaced.

Oops!

Posted by ExLibris
 

I’m surprised at all the petty attacks on the author. This is a very important issue and the article is a good starting point for further research, debate and action. There are archival services like archive.org which need to be improved upon. In the future, digital archaeologists will engage in a kind of digital forensics to connect little pieces of data together to create a picture and I think they will have more to work on than archaeologists working with fragments of clay and papyrus.

Half a century before Tim Berners-Lee’s invention of hypertext, Vannevar Bush envisioned the Internet and wrote in the Atlantic Monthly, July 1945:

“Presumably man’s spirit should be elevated if he can better review his shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion.”

 

“While hard to measure, I think it’s a safe bet that the world suffers the loss of a museum full of artifacts every day by depending upon the Web to host our precious cultural memories.”

This article makes some good points, but also exaggerates and confuses some of the issues. For example, in response to the quote above, one can’t very well blame web technology for the looting of the Baghdad museums. No one in the digital community is arguing that museums are henceforth disposable.

That information leaks away is undeniable. But Auchard seems to be looking for reasons to distrust the web. The useful realization that such data loss is taking place does not need to be augmented by dubious side issues.

 

I don’t see what’s new. Receivers have always sent out, and still send out, paper copies of the affairs of the businesses they wind up to interested parties, but I have always shredded these after keeping them became pointless, and I imagine most other shareholders and creditors do too.

The author’s point about Enron, Parmalat and Stanford appears to be that the Web does things a bit better than paper archives, but not by very much. Well, nothing’s perfect, and only journalists ever pretended that the Web was going to be…..

Posted by Ian Kemmish
 

The author has it backwards: there is now more information available than ever before. In fact there is too much information and the loss of some of it is both inevitable and not all that damaging to civilization.

Posted by Shii
 

And no mention of http://archive.org, which exists to solve this problem?

Posted by Sam
 

Most true, Susan Feldman. Often a combination of keywords is needed to find the information. Though this is no guarantee, either, since one might discover the information doesn’t answer all the questions asked. The biggest problem, as a researcher, is this: facts varying from site to site. Even government agencies provide differing information on the same topic, especially when it comes to numbers, statistics, dates. This forces one to scrounge around in order to find enough information that corresponds as closely as possible with the others. Therefore, information is tabulated by means and averages. Never accept the face value of the written word until corroborated by more than three (my criterion) other sites. “History is not what has happened,” Julian Barnes warns in A HISTORY OF THE WORLD IN 10 1/2 CHAPTERS, “it’s what historians tell us.”

Posted by boredwell
 

Stuff gets misplaced.
Stuff gets conveniently altered
Stuff gets misplaced
Inconvenient stuff gets dumped
Irrelevant stuff makes the relevant stuff hard to spot.
Over time, hard to distinguish truth from fiction or lies.

Sounds like the human mind.

 

We are the gatekeepers. Archive, collect, reproduce and distribute.

Posted by Andrea
 

There is one project that is attempting to plug a hole in the “leaky vessel for historical preservation”: the Wayback Machine on http://www.archive.org

http://www.archive.org/web/web.php

http://en.wikipedia.org/wiki/Internet_Archive#Wayback_Machine

Posted by Walter Roth
 

some humble ideas:
Don’t forget your past or you will have to repeat it.

No one says the internet is an infallible information storage facility but it is an information storage facility.
All the info is not true? We know; we learned that reading books in libraries and paying attention to all kinds of news and politics.

Who tells the story? The one who won the war? Honestly I prefer to read the multiple points of view of different people.

We do have the technology to save all that information and a lot more; the thing is, some still very powerful people have no interest in that.

We the people are the keepers of all history; Google, other search engines and the internet are just tools.

Posted by amattei2000
 

One word:

Alexandria.

The greatest of all libraries stood for more than 700 years, then burned. While much of it was collective history and knowledge, some was that which was…appropriated…from other collections, increasing its uniqueness. It was called ‘laying claim’ once a place was overrun: collecting for this Great Library. Much like Baghdad, it’s tradition.

Point: This one vaporized into smoke however, and with it went all the great collections; all the one-offs. It seems eerily similar to a digital age where the knowledge gets dropped, corrupted so easily.

honico//

Posted by honico
 

eric is justified in his concerns that mega-corporations, such as ‘enron’, are able to irretrievably alter history by deleting their brushstrokes.

what I find even more alarming is that the richness, colour and tapestry of the ’80s to 2009 and beyond has been lost to civilisation with the rise and rise of digital photography. the percentage of work that reaches the finished hard copy (on photographic paper) is at an historical low. this equates to more photos being taken than at any time in the history of our civilisation (for want of a better word [uncivilisation, perhaps]), but fewer photos being produced in the final medium.

all these spurious friends’ shots stored on camera memories and uploaded onto equally random pc memories result in a massive black hole in our collective consciousness in this particularly vulgar and turbulent epoch.

well crafted footage and documentation of current species, as they decline into extinction, may not be available and/or accessible, and man will be left with shaky artists’ sketches to reminisce and ponder.

when the last animal of the wild has alighted the
spirit of Man will
pale to a
blue moon…
will they leave a toa for our solace.

toa = a marker left by departing indigenous Australian tribes to indicate the direction in which they had departed, e.g. a feather

Posted by sweeny
 
