The Internet Archive Fixes 9 Million Broken Links on Wikipedia

A heroic, knowledge-saving task involving two of the most important organizations on the internet.
October 2, 2018, 1:39pm

One of the biggest problems on Wikipedia isn’t people maliciously submitting false content; the ecosystem actually moderates what can and can’t be posted with decent efficiency and deals with false information quickly. Instead, it's the third-party citations that Wikipedia relies on. Roughly 20 million links on Wikipedia and Wikimedia sites are added and changed each week. And when a link stops working, the information it supports in the article can no longer be verified.

But the Internet Archive has been actively working to address this problem. According to a blog post published Monday, the archive has spent the last five years backing up nearly 9 million links referenced in Wikipedia and close to 300 Wikimedia sites across 22 different languages. And since 2015, a software bot called IABot has been identifying working links that can replace the broken ones.

“When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with,” the blog post reads.
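
To make that lookup step concrete, here is a minimal Python sketch of how a broken link might be swapped for an archived copy. It is not IABot's actual code. It assumes the Wayback Machine's public availability endpoint (https://archive.org/wayback/available) and the JSON shape that endpoint is generally documented to return; the function names build_query, closest_snapshot, and find_replacement are hypothetical and chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Wayback Machine's public availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup URL for a (possibly broken) link.

    timestamp is optional, in YYYYMMDD[hhmmss] form, and asks for the
    snapshot closest to that date.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the archived URL from a parsed response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_replacement(url, timestamp=None, fetch=None):
    """Look up an archived copy of url.

    fetch is a hypothetical hook that takes the query URL and returns the
    parsed JSON; by default it performs a real HTTP request.
    """
    if fetch is None:
        def fetch(query):
            with urllib.request.urlopen(query, timeout=10) as resp:
                return json.load(resp)
    return closest_snapshot(fetch(build_query(url, timestamp)))
```

Calling find_replacement("http://example.com/old-page", timestamp="20180101") would return the URL of the closest archived snapshot if one exists, or None if the archive has no copy. A real bot would also need to confirm that the original link is actually dead before replacing it, which this sketch leaves out.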

In the future, the Internet Archive says it plans to expand its efforts beyond Wikimedia sites to other types of websites, e-books, and academic papers. After all, broken links aren’t just a problem on Wikipedia; they’re a problem across the internet at large. In 2013, for example, The Atlantic found that 49 percent of the links cited in Supreme Court decisions are broken.

The Internet Archive’s broken link archiving could prove especially important to Wikipedia’s role as a fact-checking resource on YouTube. Earlier this year, YouTube CEO Susan Wojcicki announced (without warning Wikipedia) that the site would provide links to Wikipedia articles on videos that have content related to conspiracy theories, a prevalent issue on the site.

However, there are limits to the effectiveness of archiving. In April, NBC anchor Joy Reid claimed that the Internet Archive had been hacked and manipulated to display posts on her former blog that she says she didn’t write. The Internet Archive denied that any hack occurred. Still, as we wrote at the time, the incident highlighted the limits of web archiving, because many sites are only backed up to a single archive (usually the Internet Archive’s Wayback Machine). Archived copies can also be imperfect: certain pages appear incomplete on the Wayback Machine due to issues with JavaScript.