The NYT has a clear policy when it comes to primary sources — if you’re writing about a certain document, then you should link to it, if it’s online. Increasingly, the NYT’s journalists are actually doing that.

On the other hand, when it’s the journalists themselves who come up with the original documents, they tend to be very bad at putting them online. And it turns out that when you ask NYT types why they’re not putting those documents online, you often get a very interesting answer: doing so would be a copyright violation. Underneath that answer, however, is a regrettable and old-fashioned attitude towards primary documents: the NYT doesn’t particularly feel any need to post them in the first place.

I had a very interesting conversation yesterday with Richard Samson, the NYT’s top copyright lawyer; you might remember him from his nastygram to the WSJ earlier this month, or his nastygram to Apple with respect to the Pulse RSS reader, which resulted in the app getting temporarily pulled from the iTunes App Store. (He says he’s now “in conversations with the developers”, but that “the fact that they’re charging for it certainly is a concern”.) If you’re a restaurateur, say, who reproduces a NYT review on your website, he’s the person you’re likely to hear from.

Samson explained to me that when it comes to posting primary documents, copyright simply isn’t an issue when you’re dealing with anything coming out of the federal government — there’s no copyright there. And it also isn’t an issue when you’re linking to or embedding documents hosted elsewhere: the NYT can happily embed a Scribd document, for instance, if it was uploaded by someone else, no matter who holds the copyright.

But if the NYT obtains a document itself where someone else owns the copyright — the Akin Gump memo on gender disparities at Wal-Mart, for instance — then it might well be legally constrained from posting it online. “We want our readers to respect intellectual property,” says Samson. “Intellectual property is arguably the biggest asset of this company. We value others’ IP rights, and we want their IP rights to be respected.”

What’s more, posting a copyrighted document would unnecessarily expose the NYT to legal jeopardy: “we don’t want to give them an obvious sword”, says Samson, adding that the NYT can be sued for damages even if it takes the document down as soon as it’s first asked to.

At the same time, this seems to be something of a non-issue in the newsroom. Samson talked to some editors about this before our conversation, and they told him that the NYT’s readers were not complaining about the lack of links to primary sources. “The journalist’s job is to read the documents and write the story and tell the reader what’s important,” says Samson. “If the journalist is doing his job there’s no need to provide the source documents. We’re trying to tell the news and if we feel that we’ve done that, then there’s no reason to do more than that.”

I couldn’t agree less. The real news value in something like the Wal-Mart memo is the memo itself, and publishing the memo alongside the story would increase the trust that the NYT’s readers have in it. Here’s how Julian Paul Assange, the founder of Wikileaks, explained it to the New Yorker:

Assange told me, “I want to set up a new standard: ‘scientific journalism.’ If you publish a paper on DNA, you are required, by all the good biological journals, to submit the data that has informed your research—the idea being that people will replicate it, check it, verify it. So this is something that needs to be done for journalism as well. There is an immediate power imbalance, in that readers are unable to verify what they are being told, and that leads to abuse.”

Against this very strong argument — that enforced information asymmetry clearly harms readers — Samson has some very weak arguments for doing nothing. “It’s probably kind of a pain to do it sometimes, because the documents are so big: there’s time considerations, and labor,” he says. “We would feel the need to review an entire document before putting it up on the website.”

This confused me: once the copyright considerations were dealt with, one way or another, what else would the document be reviewed for? Samson was a little vague on that front, but pointed to issues like privacy and obscenity as things that someone would need to look out for before the NYT published anything. That’s the nature of being a newspaper, he said: the publisher needs to vet everything that’s published. He continued by saying that “the role of a newspaper is to analyze and comment. It would be inappropriate to just put it out and say you guys figure it out.”

Of course, I never suggested that the NYT simply post documents without any gloss or commentary at all, but I did ask about sites like The Smoking Gun, which specialize in posting primary documents. Are they not legitimate news organizations? “The Smoking Gun is a different business than being a newspaper,” said Samson.

I also asked Samson about the Guardian’s attempt to crowdsource the mystery of Tony Blair’s finances; he didn’t know about that specific case, but it was obvious that the NYT was a very long way indeed from doing something like that. If the NYT gets documents and doesn’t fully understand what they mean, it won’t ask its readers for help in working them out. That’s a sad, wasted opportunity, and hubristic to boot, since it’s predicated on the idea that if the NYT’s readers can work something out, then the NYT’s journalists should also be able to do that, on their own.

The one big thing I learned from talking to Samson is that when NYT journalists talk about copyright constraints preventing them from putting documents online, they’re not particularly upset about that. In fact, they might secretly be quite happy that there’s no question of posting the document they spent so much effort obtaining. Journalists are human, after all, and can be quite jealous and competitive: they don’t want to simply give the story, on a plate, to their competitors, and will happily sit on documents rather than publishing them if they’re given half a chance to do so. Samson said he couldn’t think of a single instance where a journalist was begging him to be able to publish something and he said no, for copyright reasons.

After all, it’s easy for the NYT to post copyrighted documents if it’s so inclined — it just needs to send them to any one of dozens of organizations who will happily put them online, and then link or embed the document into the story. Or the journalist can just ask their source to go ahead and post the document online, in some anonymous place where it can be linked to or embedded. But that never seems to happen. And even when there’s no copyright at all, as in the case of the Hank Paulson ethics waiver, the NYT went on the record as saying that the reporters “would probably be uncomfortable simply handing over documents” even to one of their colleagues, let alone to the world in general. After all, said Tim O’Brien, an editor there, “they had spent a lot of time and energy to find, analyze, and report on” that document.

And the NYT’s dual standard when it comes to embedding documents is just plain weird. It’s happy to embed Scribd documents posted by someone else without poring over every page, but it refuses to post any document itself without poring over every page. Yet to the reader, the effect is identical: the distinction is entirely legalistic.

In any case, the lesson here is that the next time you see a NYT story which doesn’t provide the source documents it’s working from, don’t be charitable in your assumptions. I know that I, and most of the people I talk to, have historically simply assumed that the NYT would love to post the source documents, but was unable to because their source asked them not to, or because doing so might reveal who their source was. But in reality, NYT journalists don’t particularly want to post such documents in the first place, even if their source would be perfectly happy for them to do so. And when asked why they don’t, they have an easy excuse in copyright law.

The NYT is a brave paper, when it wants to be. If it finds a document and thinks that publishing it would be in the national interest, I’m sure that it could defend its action on public-interest grounds even if copyright in that document was held by someone else. But it doesn’t seem to have the slightest interest in publishing primary documents: it’s perfectly happy to just write about those documents, and say “trust us, we’ve read them, and we’re telling you everything that’s important about them”.

Of course there are lots of people who don’t trust the NYT, and many of the rest of us would love to take the trust-but-verify route. I look forward to the time when the NYT will make it easy for us to do that. But I’m not holding my breath.

Update: Thanks to jennkepka, in the comments, who remembered that the NYT did try a crowdsourcing experiment on its Economix blog last year, with respect to Tim Geithner’s schedules. I’m quite sure that if pressure to post primary documents comes from anywhere within the NYT, it’ll come from the blogs.

Update 2: A great example of a NYT blog putting lots of source documents up online is Dealbook, whose Scribd account has almost 100,000 subscribers and 648 separate uploads, which between them have been viewed almost 2 million times. NYT editors, take note! People do care about these things!

This sounds like you’re asking the NYT to allow you to do original reporting on documents they acquire.

Posted by Zdneal

I think the NYT has done some of the crowd-sourcing that you’re talking about: they posted all of Tim Geithner’s daily schedule from his Fed days online, after all, and asked readers to note their observations and/or send them in. (http://documents.nytimes.com/geithner-schedule-new-york-fed)

Posted by jennkepka

Zdneal: Yes, that’s exactly right. Obviously, they have a head start, and they get the documents first. But right now it’s pretty clear that the NYT will never publish anything more about, say, that Wal-Mart memo. If bloggers or anybody else can find new and interesting stuff in it, they should be able to. And every time they do, they’ll credit the NYT for finding it. Everybody wins.

Posted by FelixSalmon

You should encourage your Reuters co-workers to post source documents online too – outside of the blogging realm I haven’t seen many of these documents posted (although to their credit, reporters have been pretty good about emailing the docs when I’ve requested them to).

But please don’t encourage people to use that scourge of my existence, Scribd!

Posted by br_add

It’s nice of Felix to keep making helpful suggestions to the NYT while others they’ve presumably been paying good money to come up with a coherent online news identity so obviously haven’t.

There’s probably more to it than this, but it seems NYT hasn’t made up its mind whether its online presence ought to become (because it still is not) a global must-read experience, or atrophy as a stodgy parochial news massage parlor which potential readers may as well live without. At the rate NYT’s been going, by the time they’ve found their stride it may be too late.

Posted by HBC

If the NYT had made Judith Miller post source documents, the Iraq war might not have happened.

Posted by vinlander

