It's not very hard to think of subreddits that are home to nasty environments. A recent r/AskReddit post asking "What popular subreddit has a really toxic community?" garnered thousands of responses. But if we were to quantify the nastiness of comments across Reddit, would the subreddits that spring to mind actually be the most toxic?
Not necessarily. Data analysis firm Idibon recently created an algorithm to determine which subreddits were actually the most toxic, and some of the results were surprising. According to Idibon's analysis, the worst culprit was /r/ShitRedditSays, with subreddits such as /r/OpieandAnthony (for fans of a now-cancelled talk radio show) and /r/SubredditDrama close behind.
The red circles represent subreddits that were nominated in the r/AskReddit post, while grey circles are from the top 250 subreddits. Interactive courtesy Ben Bell
"I was surprised that /r/ShitRedditSays was at the top. I didn't expect it to be number one," Ben Bell, a data scientist at Idibon who ran the analysis, told me. "There were subreddits like /r/JusticePorn that we initially thought would come out on top."
The whole analysis took a few weeks, Bell said. Before they could measure which subreddits were more toxic than others, the team had to define what "toxic" actually meant. They settled on calling an individual comment toxic if it was an ad hominem attack (non-constructive criticism aimed at the person rather than the argument), if it was blatantly bigoted, or both. Bell offered "GASP are they trying CENSOR your FREE SPEECH??? I weep for you /s" as an example.
But tracking toxic comments alone can skew the results against subreddits that are naturally more contentious, so the team tracked supportive comments as well, which Bell and his colleagues defined as any comment that included supportive or appreciative language, something like "we're rooting for you!" The balance between toxic and supportive comments, along with whether those comments were voted up or down, all went into the analysis.
To get the raw data, Bell used the Reddit API to scrape 1,000 comments from each of the top 250 subreddits, as well as the most popular subreddits mentioned in the original /r/AskReddit post.
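The article doesn't show Bell's scraping code, but the step it describes can be sketched with Reddit's public JSON listings, which return up to 100 items per request and paginate with an "after" cursor. Everything below, from the function names to the two-second delay, is my own illustration, not Idibon's implementation:

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

USER_AGENT = "reddit-toxicity-sketch/0.1"  # Reddit asks for a descriptive user agent

def extract_comments(listing):
    """Pull just the fields the analysis needs (text, vote score) from one listing page."""
    return [
        {"body": child["data"]["body"], "score": child["data"]["score"]}
        for child in listing["data"]["children"]
    ]

def fetch_comments(subreddit, target=1000):
    """Page through a subreddit's newest comments until `target` are collected."""
    comments, after = [], None
    while len(comments) < target:
        params = {"limit": 100}
        if after:
            params["after"] = after
        url = f"https://www.reddit.com/r/{subreddit}/comments.json?" + urlencode(params)
        req = Request(url, headers={"User-Agent": USER_AGENT})
        with urlopen(req, timeout=10) as resp:
            listing = json.load(resp)
        comments.extend(extract_comments(listing))
        after = listing["data"].get("after")
        if after is None:   # fewer comments available than requested
            break
        time.sleep(2)       # stay polite with Reddit's rate limits
    return comments[:target]
```

Keeping only the comment body and its score is enough for the rest of the pipeline, since the analysis only ever uses the text and the community's vote on it.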
Bell decided to include the vote counts on each post in his analysis, which shifted the results to give a clearer picture of just which subreddits were toxic.
"There are some subreddits where you have a lot of people writing really nasty things, but the fact is that the community as a whole doesn't necessarily support those comments," Bell said. "We used the scoring as a way to gauge what the community is doing with the toxic comments it's getting."
Bell then used an analysis model Idibon had developed for a client, originally designed to detect positive and negative language on Twitter, to filter out all of the neutral comments, leaving a dataset of 100 comments each for 100 subreddits.
The 10,000 remaining comments all included language that the model had flagged as toxic or supportive. Bell took these comments to the crowdsourced task site CrowdFlower, where annotators read each comment and marked it as toxic or supportive. Each comment was labeled three times.
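With three labels per comment, the judgments have to be resolved somehow. The article doesn't say how CrowdFlower's aggregation worked, but the standard approach is a simple majority vote, sketched here:

```python
from collections import Counter

# Illustrative majority-vote aggregation, assumed rather than taken from
# CrowdFlower's actual method: keep whichever label at least two of the
# three annotators agreed on.

def majority_label(labels):
    """Return the majority label, or "no_consensus" if the annotators split evenly."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else "no_consensus"
```

So ["toxic", "toxic", "supportive"] resolves to "toxic", while a three-way split among toxic, supportive, and neutral yields no consensus and would need a tiebreaker.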
Then they took the total number of supportive and toxic comments for each subreddit and compared them to the upvotes and downvotes. Bell shared the algorithm, if you're into this kind of thing.
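Bell's exact formula isn't reproduced here, but a score of the shape the piece describes, one that penalizes a community for upvoting toxic comments and credits it for upvoting supportive ones, might look something like this. The field names, weighting, and normalization are all my assumptions:

```python
# Hypothetical toxicity score, not Idibon's published formula: each labeled
# comment counts once, and counts double if the community upvoted it.
# Result is in [-1, 1]; higher means more toxic.

def toxicity_score(comments):
    """comments: list of {"label": "toxic"|"supportive"|"neutral", "score": int}."""
    labeled = [c for c in comments if c["label"] in ("toxic", "supportive")]
    if not labeled:
        return 0.0
    n = len(labeled)
    toxic = sum(1 for c in labeled if c["label"] == "toxic")
    supportive = n - toxic
    # Weight by community reaction: an upvoted toxic comment counts extra,
    # and likewise for an upvoted supportive comment on the other side.
    upvoted_toxic = sum(1 for c in labeled if c["label"] == "toxic" and c["score"] > 0)
    upvoted_supportive = sum(1 for c in labeled if c["label"] == "supportive" and c["score"] > 0)
    return ((toxic + upvoted_toxic) - (supportive + upvoted_supportive)) / (2 * n)
```

A subreddit whose labeled comments are all upvoted toxicity scores 1.0; one that is all upvoted support scores -1.0, which matches the intuition that what matters is not just what gets posted but what the community rewards.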
"Including how the community handles the toxic comments it gets was a way to rise above the level of any individual comment," Bell said.
He also used the results to isolate the bigoted comments and compare them across subreddits and, unsurprisingly, found subs like /r/TheRedPill and /r/BlackPeopleTwitter to be particularly distasteful. Those subreddits also upvoted their bigoted comments, while other subreddits that attracted some nasty comments, like /r/jokes, didn't tolerate bigoted content.
Interactive courtesy Ben Bell
"A lot of them were really predictable," Bell said. "I was pretty happy that the results made sense because it meant the methodology worked pretty well. The fact that it corroborates a lot of what the Reddit thread already pointed out is a good indicator of that."
But some of the Reddit opinions didn't match the data. The subreddit /r/GetMotivated was flagged as toxic by one user, whose comment had 3,967 points and earned the user Reddit gold. Yet Bell's analysis found that sub's comments to be 50 percent supportive and only 6 percent toxic, which shows how much an individual experience can skew our perception of how insidious a subreddit community is.
Bell plans to do more Reddit analysis in the future to see whether our assumptions and anecdotes about the site line up with the data on what's actually there.
"One interesting thing I didn't include was thinking about which are the most polarizing and the least polarizing subreddits," Bell said. "I'd also like to break the distinction apart from toxicity versus negative and positive comments."
Considering how much we all like to talk about what a cesspool of human interaction Reddit can be, it's about time somebody actually quantified whether it's really as bad as we think.