Where's the internet's Library of Congress? Photo by Carol M. Highsmith/LoC
Link rot, in which links in digital documents gradually stop working as the pages they point to disappear or go offline, is a real threat to the information clearinghouse we call the internet. We all rely heavily on linking to sources rather than spelling them out. When those source pages go down or change their URLs, that sourcing is effectively lost, and the next reader is left trying to figure out what the writer was referring to. That's never a good situation.
And while dead links in blog posts are frustrating tears in the fabric of the web, dead links in Supreme Court documents are even more worrisome. According to a pair of recent studies, link rot in legal and scholarly writing is a huge problem, with one finding that nearly half of the internet links in US Supreme Court opinions don't lead to the originally referenced material.
That statistic comes from a draft of a Harvard Law School paper published on the open-access Social Science Research Network. The paper, authored by Jonathan Zittrain, a professor of law and professor of computer science at Harvard, and Kendra Albert, a JD candidate at Harvard Law, looked at citations contained in digital editions of three journals (the Harvard Law Review, the Harvard Journal of Law and Technology, and the Harvard Human Rights Journal) as well as the links in every published Supreme Court opinion.
The link rot problem was pervasive and surprisingly deep. As mentioned earlier, roughly 50 percent of linked URLs in the Supreme Court opinions no longer led to the originally cited material. For each of the Harvard journals, the figure topped 70 percent. Those are disconcerting statistics, but perhaps more worrisome is that the problem has already been well documented and still persists.
Most recently, Yale research published in July found that 29 percent of websites cited by Supreme Court decisions are no longer working. Worse, the authors found the phenomenon to be fairly random. "Our research in Supreme Court cases also found that the rate of disappearance is not affected by the type of online document (pdf, html, etc) or the sources of links (government or non-government) in terms of what links are now dead," they wrote. "We cannot predict what links will rot, even within Supreme Court cases."
The problem essentially boils down to the entropy of the internet: websites go down, hosting services aren't renewed, URLs get changed during site updates without a proper redirect, and so on. The list goes on, and the problem only grows as time passes. How much of the Web 1.0 era is even online anymore, do you think? Not much, and that's after just 15 years or so, the blink of an eye compared to the annals of academic history.
For the Supreme Court, the problem is likely compounded by the government's incredible digital mishmash, where standard operating procedures and best practices for digitizing documents are far from uniform, when they exist at all. Try reading a court case in which half is a searchable PDF and half is a sideways scan of a printed report, and you'll see how haphazard document digitization can be.
Then go back six months later and try to find the same report. You'd be lucky to find it. Zittrain and Albert cite the work of Mary Rumsey, who found way back in 2002 that 61 percent of links in year-old papers worked, while just 30 percent of links in five-year-old papers worked. That 30 percent survival rate is similar to what Zittrain and Albert found in the Harvard journals, which suggests that while the internet has gotten faster and more reliable in the last decade, archiving has not improved.
Largely that's because there aren't central repositories for the web. Sure, the Wayback Machine and Google Cache do admirable jobs of storing vast amounts of content, but from an academic perspective, the internet still doesn't have the full equivalent of a library network, in which documents and books are archived via a standard system.
Zittrain and Albert suggest using the Perma system developed by the Harvard Library Innovation Lab, which essentially copies a webpage into its own database, with a standardized citation, at an author's request. That's an interesting concept, but even if every bit of scholarly data were copied into the Perma system (which is kind of what Aaron Swartz was doing, no?), it would still be preserved only for as long as Perma is around. As we've seen, nothing on the web is permanent.
I suppose everything is impermanent. Libraries can burn down and dogs can eat your research papers, but the internet seems even more fleeting. A dead link doesn't mean the referenced content no longer exists, but it does make it harder to find, especially since we often don't spell out our sources precisely because we can link to them.
That link rot is so pervasive even in Supreme Court opinions only highlights how serious a problem it is. As the Yale authors write, "the phenomenon of link rot in law is troublesome because citations are the cornerstone upon which both judicial opinions and law review articles stand. … The ability to confirm citations and to ensure that they are accurate is essential to ensure that precedents are indeed cited correctly." And with more and more publishers and journals going digital-only, link rot will only sow more confusion for future scholars, at least until we have a standardized method of combating it.