We’ve previously discussed how the European Union is considering sweeping new copyright rules that could prove devastating for free speech and the open internet. The EU’s Copyright Directive has a laundry list of problems, from “link taxes” that saddle smaller media outlets with fees for quoting other websites, to the EU-wide implementation of unreliable automatic copyright filters like Google’s Content ID, which, as episodes like the Prince dancing baby fracas make clear, often result in legitimate content being yanked offline.
With a new vote looming for the proposal in the EU Parliament, one German music professor has perfectly illustrated how automated copyright filters repeatedly fail. Ulrich Kaiser this week wrote about a troubling experiment he ran on YouTube. As a music theory teacher, Kaiser routinely catalogs a collection of public domain recordings that he maintains online to teach his students about Beethoven and other classical music composers.
The first video Kaiser posted online simply explained his efforts to provide digitized copies of public domain recordings to students, with some of the music in question playing in the background. But within three minutes of the video being posted, YouTube’s Content ID system had flagged the music for a copyright violation, even though no copyright had actually been violated.
Kaiser then decided to test Google’s system more fully. He opened a new YouTube account named Labeltest and began sharing additional examples of copyright-free music.
“I quickly received Content ID notifications for copyright-free music by Bartok, Schubert, Puccini, and Wagner,” Kaiser said. “Again and again, YouTube told me that I was violating the copyright of these long-dead composers, despite all of my uploads existing in the public domain.”
Google’s Content ID is the result of more than $100 million in investment funds and countless development hours. Yet Kaiser found the system was largely incapable of differentiating between copyrighted music and content in the public domain. And the appeals process that Google has erected to tackle these false claims wasn’t any better.
Kaiser appealed each takedown request, noting that the composers of the works had been dead for more than seventy years, that the recordings were published before 1963 (which under German law means they are now in the public domain), and that the takedown requests failed to provide any solid legal justification for their removal.
But Kaiser only received more takedown requests, and found Google’s support systems unhelpful. Only after some lengthy, cumbersome exchanges was Kaiser able to have many of his videos restored, but not under the free license he had hoped would allow them to be easily shared.
“Even in cases where my defense to the Content ID claims were successful, the videos were not reverted to this free license, making it much more difficult for others to use and share these digitized works in the way I originally had intended,” Kaiser said.
YouTube’s Content ID is the most expensive automated filter system of its kind, yet these kinds of stories are not just common, but comical. Take the time another professor uploaded a ten-hour video of white noise, only to have it flagged five times for copyright infringement.
“Content ID is made available to the most sophisticated rights holders who maintain copyright-focused teams in an effort to prevent misuse, but even they make mistakes when uploading reference files and initiating claims,” a Google spokesperson told Motherboard when asked for comment.
With more than 75 million audio and visual reference files in the Content ID system, assessing the copyright status of each work can be a logistical challenge. And with numerous rights holders eager to abuse the system, the scale of such automation is daunting.
“Algorithmic matching is always going to be imprecise, and companies are legally incentivized to be over-inclusive in their filtering,” Meredith Rose, a lawyer and copyright expert at the consumer rights group Public Knowledge, told me in an email.
She noted that on a mathematical level, orchestral recordings of classical music often don’t have a lot of variation among them, so “if you’re designing an algorithm to catch content, you want it to be somewhat over-inclusive, so that it can catch cheap attempts at evasion.”
Given that YouTube’s systems struggle with simple classical music and even white noise, it’s not particularly surprising that fair use and creative remixes often find themselves in the crosshairs of such overly aggressive automated systems, especially here in the States, where the Digital Millennium Copyright Act (DMCA) has codified such overreaction into draconian law.
Given that a powerhouse like Google has been unable to build a system that can automatically police copyright without censoring perfectly legal content, activists are right to worry about the impact of the looming EU Copyright Directive, since it hopes to take the scattered incompetence we’ve seen thus far on this front and foist it upon the better part of an entire continent.