Instagram Is Using AI to Filter Out Toxic Comments

The bots automatically delete anything that's “intended to harass or upset people.”

Instagram has updated its comment filter to now automatically remove comments “intended to harass or upset people.” According to the app’s co-founder and CEO, Kevin Systrom, the new bullying filter will specifically hide negative comments about a person’s appearance or character, or any threats to a user’s mental and physical health or wellbeing.

Systrom also noted the need to protect “young public figures” on Instagram to help “them feel comfortable to express who they are and what they care about.” It’s a timely change, given Instagram is considered the most harmful social network for the mental health of 14- to 24-year-olds.


Instagram’s Terms of Use, in place since 2013, instruct users to “not defame, stalk, bully, abuse, threaten, impersonate or intimidate people.” But now the service is going to enforce these conditions with artificial intelligence, using filters “powered by machine learning.”

According to Instagram, human staff have been “training” the AI systems to recognise patterns and trends. The filters are designed to evolve as new problematic behaviours and practices emerge, and reports about “repeated” trends will be sent to Instagram automatically.

This AI filter is an extension of Instagram’s offensive comment filter, introduced in June last year across posts and live videos. However, the company does not explicitly say whether these filters are also applied in direct messages—arguably the most direct way to contact someone on the app.

The new filter, like the original one, is automatically turned on in the user’s profile settings, but the user can opt out of hiding these comments. Similarly, there is a manual filter, where specific words or phrases can be hidden from interactions. At the same time, Instagram brought in spam filters to remove “obvious” spam comments in nine different languages.

But how reliable are algorithms for recognising and filtering out negative content?

YouTube faced a backlash after it introduced a “restricted content” mode, which hid several prominent LGBTQIA+ vloggers in the search results. Similar to what Instagram has just introduced, YouTube’s restricted content mode can be toggled on and off, and automatically filters “inappropriate content.” However, unlike Insta, YouTube has in the past acknowledged that its mechanisms are “not 100 percent accurate.”

One particular YouTuber, Rowan Ellis, had 40 videos removed when keywords in her video titles were misinterpreted by the filter. The affected videos, which focused on important discussions around sexuality and coming out, were deemed “mature content” and flagged.
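
To see how a keyword-based filter can misfire like this, here is a minimal Python sketch. It is not YouTube’s or Instagram’s actual system, neither of which is public; the keyword list, the function name `is_flagged` and the sample titles are all made up for illustration.

```python
import re

# Hypothetical keyword list, invented for this example.
# Real platforms do not publish the terms their filters use.
FLAGGED_KEYWORDS = ["sexuality", "explicit"]


def is_flagged(title, keywords=FLAGGED_KEYWORDS):
    """Return True if the title contains any keyword as a whole word.

    Matching is case-insensitive and ignores context entirely,
    which is exactly the weakness being illustrated.
    """
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, title, flags=re.IGNORECASE):
            return True
    return False


# Made-up titles: the first is a supportive discussion, the second is unrelated.
titles = [
    "Coming out: an honest discussion about sexuality",
    "My cat learns to swim",
]

for title in titles:
    print(title, "->", "flagged" if is_flagged(title) else "ok")
```

The first title gets flagged even though nothing about it is inappropriate, because the filter only sees the word, not the intent behind it.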

James Grimmelmann, a law professor at Cornell Tech, believes filters are far from understanding all the “social context and nuance” from which offensive comments arise, or behind which they hide. Speaking to the Washington Post, he says that “even humans have a hard time distinguishing between hate speech and a parody of hate speech, and AI is way short of human capabilities.”

Follow Millie on Twitter