On Tuesday afternoon, Natalie Weiner, a writer for SB Nation, posted a photo on Twitter of a rejected attempt to create an account on MaxPreps, a website dedicated to high school sports. According to the site, her account was rejected because of “offensive language discovered in the last name field.”
Soon, Weiner’s mentions were filled with hundreds of comments from people who sympathized with her plight. “I get this a lot, surprisingly,” said Kyle Medick. James Butts “knows these problems” and Matt Cummings has “been there.” Arun Dikshit said algorithmic bias has become almost a daily occurrence. “At one of my jobs, IT had to create a rule on email server to stop my emails from being rejected as porn spam,” said Clark Aycock.
Weiner’s Twitter thread is a who’s who of people with birth names that throw algorithmic obscenity filters for a loop, but the problem is hardly new. False positives like these have plagued spam filters almost since the beginning of the internet, and they are so common that computer scientists have given the issue a name: the “Scunthorpe problem.”
Scunthorpe is an industrial city in the UK, about a four-hour drive north of London. It’s home to about 80,000 people, and for a brief period in 1996, none of them could register on AOL, one of the largest internet subscription providers at the time. As detailed in the RISKS Digest, a long-running forum popular among system admins, the issue came to AOL’s attention after a Scunthorpe resident named Doug Blackie attempted to register for the service.
When Blackie contacted AOL, he was told that his registration had failed because of an automatic filtering system that scanned the text entered in registration fields for offensive words. Because the filter matched strings without regard to where one word ends and another begins, it found “cunt” inside “Scunthorpe” and flagged the town’s name as an obscenity.
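AOL never published how its filter worked, so what follows is only a minimal Python sketch of the general technique, using a hypothetical two-word blocklist (the words and function names are illustrative assumptions, not anyone’s real system). The naive check looks for a blocked word anywhere inside the input, which is exactly how a place name or surname can trip it. Requiring whole-word matches with a regular expression avoids that particular false positive, but it still can’t tell a bird from an insult.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = ["ass", "cock"]


def naive_filter(text: str) -> bool:
    """Flag text if any blocked word appears anywhere, even inside a longer word.

    This substring approach is what produces Scunthorpe-style false positives.
    """
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def word_boundary_filter(text: str) -> bool:
    """Flag text only if a blocked word appears as a whole word.

    \\b matches a boundary between a word character and a non-word character,
    so "cock" inside "hancock" no longer matches, but "Cock" on its own still does.
    """
    lowered = text.lower()
    return any(
        re.search(rf"\b{re.escape(word)}\b", lowered) for word in BLOCKLIST
    )


if __name__ == "__main__":
    for name in ["Hancock", "Cassidy", "Cock"]:
        print(name, naive_filter(name), word_boundary_filter(name))
```

With this sketch, `naive_filter("Hancock")` returns `True` while `word_boundary_filter("Hancock")` returns `False`, yet `word_boundary_filter("Cock")` still returns `True`: matching on word boundaries fixes the substring problem, not the context problem.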
According to coverage in RISKS Digest, rather than fixing the problem, AOL “announced that the town will henceforth be known as Sconthorpe” in its systems. As Rob Kling, then a member of the Association for Computing Machinery’s committee on computers and public policy, noted in the RISKS forum, “I can imagine there might even be some people with the last name of Scunthorpe. The willingness of AOL to excise identities in the name of ‘decency’ raises big issues of genuine decency in my view.”
In retrospect, Kling’s critique was remarkably prescient.
As Weiner’s viral tweet demonstrates, the Scunthorpe problem hasn’t gone away in the past two decades, despite remarkable advances in machine learning and algorithmic moderation. Some of its manifestations have been quite comical, such as the time members of the British Parliament were blocked by a government spam filter from viewing the Sexual Offences Bill they had proposed, or when emails from London’s Horniman Museum were flagged because filters read the name as “horny man.” But for people with “offensive” last names, algorithmic censorship is mostly just a pain in the ass.
Michael Veale, a researcher studying responsible machine learning at University College London, told me the Scunthorpe problem is such a tough nut to crack because an effective obscenity filter has to understand a word in context. Despite advances in artificial intelligence, this is something even the most advanced machine-learning algorithms still struggle with today.
“This works both ways around,” Veale told me in an email. “Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children’s settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it.”
Veale cited users who have started using brand names, such as “Googles” or “Skypes,” as code words for groups they want to target with abuse. “These are the last terms that big platforms want to block, and the technologies really aren’t good enough at reading the context,” Veale said.
Other times, the issue is simply bad programming. Take Jennifer Null, for instance, whose surname was frequently rejected by input fields because software treated it as a piece of code, the “null” value that programs use to mean “no data,” rather than as text.
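The details differ from system to system, but one way this kind of bug can arise is a validator that treats the literal text “null” as a placeholder for missing data. The following is a minimal Python sketch under that assumption; both functions are hypothetical and not taken from any real product. The fix is to check for a genuinely absent or blank value instead of comparing the input against a magic string.

```python
from typing import Optional


def parse_surname_buggy(raw: Optional[str]) -> Optional[str]:
    """Hypothetical validator that confuses the word "Null" with missing data."""
    # Bug: any input spelled "null" (in any case) is discarded as if empty,
    # so a person named Null effectively has no surname.
    if raw is None or raw.strip().lower() in {"", "null"}:
        return None
    return raw.strip()


def parse_surname(raw: Optional[str]) -> Optional[str]:
    """Treat only an absent or blank value as missing; any other text is a name."""
    if raw is None or raw.strip() == "":
        return None
    return raw.strip()


if __name__ == "__main__":
    print(repr(parse_surname_buggy("Null")), repr(parse_surname("Null")))
```

Here `parse_surname_buggy("Null")` returns `None`, silently erasing the name, while `parse_surname("Null")` returns `"Null"`.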
This shortcoming in algorithmic moderation has become a major issue of late as platforms like Facebook grapple with the reality of trying to moderate billions of users. Fortunately for those whose names are likely to be flagged by filters, many platforms are increasingly relying on human moderators, who are better at understanding context. Although this has its own issues, especially when these humans are used to train AI and thereby introduce their own biases into the system, in many cases it’s the lesser of two evils.
So until AI gets a better grasp on understanding inputs in context, it looks like the Kyle Medicks and James Butts of the world are just going to have to grin and bear it.