The difference between Google and Aaron Swartz
By Kevin Webb The opinions expressed are his own.
Reading about Aaron Swartz’s recent run-in with the law dredged up all kinds of feelings. I’m a long-time admirer of his work and was saddened to hear of his troubles. At the same time, reading the indictment, I was surprised by the seriousness of the charges and evidence against him.
I was also reminded of my own attempts at similar work, collecting and analyzing journal articles, patents, and various forms of metadata. I’ve lost count of how many hours I’ve spent sitting in basements of academic buildings, breaking federal laws in the pursuit of answers. And I was reminded of my colleagues who still spend their days painstakingly scraping data off the web–sometimes legally, sometimes not–in the name of academic inquiry.
None of us want to break the law. It’s simply that we don’t have a choice.
The mechanisms for sharing academic discourse are broken. They barely even function as systems for connecting interested parties within existing disciplines. Ask just about anyone who spends their time writing or consuming scholarly work and you will hear a litany of complaints about how poorly suited the academic publishing industry is to modern-day collaboration.
I’ve spent most of my professional career just outside of the academy but have seen the failures of these systems firsthand. I formed my opinion on the matter as an undergraduate assistant in a major neuroscience laboratory–building publishing tools to help the lab’s director break copyright law.
His work regularly appeared in and on the cover of major journals. Yet he was in a field that was moving faster than the journals could help facilitate. He took matters into his own hands by publishing the articles on the laboratory’s site, almost always violating the licensing terms of his own work (rights now held by Elsevier or AAAS, not the author). I asked about the legality of what we were doing and was told not to worry. If the journals didn’t like him bending or breaking the law he’d publish elsewhere and it would be their loss.



I am not entirely convinced that this is the correct analysis. First, there is some speculation that this was intended as a test case (tell me who your friends are and I’ll tell you whom you’re trying to piss off). But what I am concerned about is that the issue is not so much the collection of data as the methods deployed in order to achieve it. The copyright issue itself would be a matter for JSTOR to resolve–if JSTOR said that Swartz complied with their policies (even if the agreement was reached after the fact, as it was), the feds would not be able to pursue the case on that basis alone. The hitching post for that horse is that the method of obtaining the information, in itself, was illegal–effectively tantamount to a break-in. Note, specifically, that Swartz is not being charged with copyright violations. He’s not being charged with conspiring to distribute the information. He’s being charged as a hacker, not as a pirate.
What you’re alleging concerning your former employer amounts to standard operating practice in academia. In fact, many academics were significantly annoyed by the Kinko’s decision because it essentially allowed journal publishers to control academic and educational distribution of their work. They could no longer include even their own published papers in readers for their classes without paying royalties (usually highly overpriced) to the publishers. So, instead of using the published papers, many authors simply used earlier internal drafts for distribution. They use these in readers and they share them with colleagues who use them in their readers. They place their own papers on their own websites and, as far as I know, no author, no matter what his academic status, has been challenged by a journal publisher with respect to such placement (although many authors, out of what they consider “an abundance of caution,” simply place links to the pay-walled copies in their profiles).
It would be a very interesting legal challenge if a publisher tried to go after an author of an article in one of its publications. Even if the suit had solid legal standing (which is by no means obvious), the publisher would risk a major boycott of its journals by the academic community. Imagine what happens if even one such academic publication loses all its submissions. No publisher is going to risk that. And this is exactly what you found in the response to your question.
But I am still a bit puzzled about this entire passage: “Yet he was in a field that was moving faster than the journals could help facilitate. He took matters into his own hands by publishing the articles on the laboratory’s site, almost always violating the licensing terms of his own work (rights now held by Elsevier or AAAS, not the author).” If the work had already been published in the journal, it was already available to academia and there would be no need to accelerate access. If it was in submission stages, then the manuscripts would fall into that gray area of copyright where licensing terms are of questionable validity–does the copyright belong to the journal or to the author? Certainly the RESULTS are open–in fact, the author is free to re-package the content of the paper and distribute it on his own–it’s not like the publisher can charge him with plagiarism of his own work. And if the publisher decided to sever ties prior to publication, as the author said, it was THEIR loss–the article or one resembling it in content would go to a different publication. And, as far as I know, no journal or its parent company holds rights to FUTURE publications, so if something was yet to be submitted, certainly the author held the copyright and had every right to publish it as he saw fit. In any case, claiming that self-distribution was “illegal” is a bit of a stretch, at best. And I have not even gotten into another hairy legal area–fair use.