How Based Is Your Favorite Musician's Twitter Account?

Thanks to the wonders of Data Journalism, we can find out.

by Chad Major
Feb 9 2015, 8:05pm

Image via Creative Commons

Music is one of the primary forms of artistic creation, a rare conduit of true communication between humans in a world where isolation is the norm. Music is also a huge business; watching a brief part of last night's Grammys definitely didn't make me feel closer to anyone. Ideally, the latter enables the former, as the modern music industry distributes the right music to the right listeners, creating relationships that would not otherwise have existed. More often, though, the commercial aspect of music robs it of its authenticity.

One technology that had the potential to tip the scales in the other direction was Twitter. In the early days, it served as an unfiltered platform for artists to reach their fans and express themselves. In 2015, that is no longer the case: Twitter and other online manifestations of the musical scene can only honestly be discussed in the language of marketing—as brands.

All brands have to know how important Twitter is as a tool for outreach and self-promotion, but it's still new enough that there's no framework for how best to use it. For major corporate brands, Twitter is like a knife in the hands of a sixteen-year-old boy: sure, it might come in handy every now and then, but the most likely outcome is that you forget about it and leave it in your backpack during your class trip to Six Flags and then tweet a picture of a toy airplane in a vagina.

But for up-and-coming brands, especially more personality-driven ones like musical acts, Twitter has a lot more upside. It allows you to engage with other popular Twitter accounts and create drama or camaraderie, and to interact directly with fans' accounts, giving them the impression of personal contact with your brand.

All of this engagement leaves a trail for other new brands to emulate or avoid as they see fit. Since this is information that might be of interest to many consumers, journalists report on it. If this were 2013, they would go through the more popular accounts and write words about their impressions of the strategies the brands were employing, and it would be subjective and useless. Thankfully, it's 2015, and things are much better and more objective now, because we have Data Journalism. The idea behind Data Journalism is that people want to learn objective facts about the world, and that if you aggregate a lot of numbers into attractive, minimalist figures, you transcend the subjectivity inherent in the process of traditional reporting. This pretense of true objectivity is obviously false. Still, traditional journalism is literally incapable of answering the questions I aim to answer ("Out of these 50 musicians, whose Twitter account exhibits the most lexical diversity?"), and Data Journalism can at least take a stab at them.


The first step in analyzing Twitter usage among popular musical brands is to pick which ones to analyze. In true Data Journalistic fashion, I know almost nothing about music, so I looked at Noisey’s year-end list and compiled a list of Twitter handles of the brands that seemed to be performing well. I added a few ubiquitous and weird accounts run by musicians, and in the course of my analysis, dropped any brands that didn't have enough tweets or were too boring.

All of my analysis was done using R (a popular language for statistical analysis), and my replication code can be found here, if anyone wants to quibble with my methodology. I scraped the most recent 1,000 tweets from each account, removed retweets, and cleaned the text to get rid of numbers and, sadly, emojis (if you ever want to make a computer almost explode, try running emojis through some analytics software). Because some of the brands had fewer tweets left over after the cleaning, I limited my analysis to the 250 most recent tweets that remained for each brand.
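For anyone who wants to picture the cleaning step, here is a minimal sketch. It is in Python rather than the R used for the actual analysis, and it is not the replication code. It assumes that retweets begin with "RT @", that the tweets arrive newest first, and that stripping every non-ASCII character is an acceptable (if crude) way to get rid of emojis. The lowercasing step and the function names `is_retweet`, `clean_text`, and `prepare` are likewise made up for illustration.

```python
import re

MAX_TWEETS = 250  # per-account cap described in the article


def is_retweet(text: str) -> bool:
    # Assumption: a retweet is marked by a leading "RT @".
    return text.startswith("RT @")


def clean_text(text: str) -> str:
    # Drop numbers, as the article describes.
    text = re.sub(r"\d+", " ", text)
    # Crude emoji removal (assumption): discard every non-ASCII character.
    # This also discards accented letters and curly quotes.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace and lowercase (lowercasing is an assumption).
    return re.sub(r"\s+", " ", text).strip().lower()


def prepare(tweets: list[str], limit: int = MAX_TWEETS) -> list[str]:
    """Remove retweets, clean the rest, keep the `limit` most recent.

    Assumes `tweets` is ordered newest first.
    """
    cleaned = [clean_text(t) for t in tweets if not is_retweet(t)]
    cleaned = [t for t in cleaned if t]  # drop tweets that cleaned to nothing
    return cleaned[:limit]
```

Calling `prepare` on a list of raw tweet strings returns at most 250 cleaned strings, ready for word counting.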

I explored four dimensions:

1. Positivity: The ratio of positive to negative words, using a publicly available sentiment dictionary that classifies thousands of words as having either positive or negative affect.

2. Lexical Diversity: The ratio of unique words to total words.

3. Offensiveness: The ratio of offensive to non-offensive words, using a hand-crafted dictionary of offensive terms.

4. Based: The cosine similarity (the extent to which the count of words used is similar to the count of words used by another user) with Lil B.
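The two dictionary-based measures, Positivity and Offensiveness, work the same way: count the words that appear in a list, then take a ratio. A rough Python sketch follows (the real analysis was done in R). The tiny word lists here are hypothetical stand-ins, not the sentiment dictionary or the hand-crafted offensive list used for the figures, and returning infinity when the denominator is zero is an assumption of this sketch.

```python
import re

# Hypothetical toy dictionaries, for illustration only.
POSITIVE = {"love", "great", "happy"}
NEGATIVE = {"hate", "sad", "awful"}
OFFENSIVE = {"damn", "hell"}


def tokenize(text: str) -> list[str]:
    # Lowercase and split into runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def safe_ratio(numerator: int, denominator: int) -> float:
    # Assumption: an account with nothing in the denominator scores infinity.
    return numerator / denominator if denominator else float("inf")


def positivity(tweets: list[str]) -> float:
    """Ratio of positive words to negative words across all tweets."""
    tokens = [w for t in tweets for w in tokenize(t)]
    positive = sum(w in POSITIVE for w in tokens)
    negative = sum(w in NEGATIVE for w in tokens)
    return safe_ratio(positive, negative)


def offensiveness(tweets: list[str]) -> float:
    """Ratio of offensive words to non-offensive words across all tweets."""
    tokens = [w for t in tweets for w in tokenize(t)]
    offensive = sum(w in OFFENSIVE for w in tokens)
    return safe_ratio(offensive, len(tokens) - offensive)
```

With these toy lists, `positivity(["I love this great day", "I hate rain"])` counts two positive words and one negative word, giving a ratio of 2.0.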


Positivity

Each of these figures rearranges the accounts from lowest to highest on the dimension of interest. In this case, we see that Quavo of Migos is the best at keepin' it posi, followed by pop darling ArianaGrande, and Grimes, whose handle "grimezsz" makes it look like her cat fell asleep on her keyboard halfway through making her Twitter account. Nicki Minaj’s tweets exhibit a high ratio of positive-to-negative words as well.

On the other end of the chart are a bunch of tough-guy rap brands: Bobby Shmurda (who, to be fair, hasn't tweeted since mid-December on account of being locked up on Rikers Island); Rick Ross, a textbook example of a successful rebrand; and Noisey digital cover recipient Freddie Gibbs. Also among the nastiest accounts are Noisey reader favorites Nothing, as well as the Swedish child Yung Lean.

Lexical Diversity

Of the four measures, this is perhaps the least well-suited to describe Twitter usage: the 140-character format restricts the ability of many brands to bring the full force of their diction to bear. Still, we see the famously clever Diplo in the number one spot, followed by lo-fi guitar bro Wavves, and another rebranding all-star whose brand has recently hit its stride, Lana Del Rey.

The bottom three in this category are an odd collection: Lil B, Paul McCartney, and Riff Raff. Lil B's score definitely suffers from his habit of signing all of his tweets with “- Lil B” and frequent use of the word "love." In Sir Paul's defense, his Twitter account refers to him in the third person, creating a transparent separation between the man and the brand. @Jodyhighroller, meanwhile, is the exact opposite—Riff Raff is a being of pure branding, and the account's score suffers because of a heavy use of the word "i."
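To see why a signature drags the score down, here is a small Python sketch of lexical diversity as defined earlier (unique words divided by total words). The tweets are invented, and the tokenizer is an assumption; the actual analysis was done in R.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split into runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def lexical_diversity(tweets: list[str]) -> float:
    """Unique words divided by total words, pooled across all tweets."""
    tokens = [w for t in tweets for w in tokenize(t)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented example tweets, with and without a repeated signature.
plain = ["stay positive today", "cooking breakfast now"]
signed = [t + " - Lil B" for t in plain]

print(lexical_diversity(plain))   # six words, all unique
print(lexical_diversity(signed))  # ten words, only eight unique
```

Every signed tweet adds the same two words to the total while adding nothing new to the set of unique words after the first tweet, so the ratio can only fall as the tweets pile up.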


Offensiveness

There isn't much variation between the 30 or so least offensive accounts. These brands basically never tweeted any offensive words, and while I'm not surprised to see wholesome brands like "taylorswift13" near the bottom on this dimension, I was surprised to see the famously profane Eminem and famously scatological Lil Wayne there too. Even Cher has been known to get a little feisty, though rarely in a profane way.

Action Bronson wins the day as the most profane brand, though that's probably due to his YouTube show "Fuck, That's Delicious." The fact that newly minted MRA OG Maco's most famous song has the word "Bitch" in the title has a similar effect. And while the handle "fucktyler" doesn't count toward Tyler, the Creator's score, the omission probably makes little difference, because his brand is pretty much founded on being as offensive as humanly possible.


Based

Unsurprisingly, Lil B is the most similar to Lil B. The number two slot goes to child philosopher "officialjaden," whose existentialist musings definitely qualify as Based. Also Based are St. Vincent, the recently resurgent meme factory Pharrell, and Mish Way of White Lung.
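The reason Lil B tops his own chart is baked into the measure: the cosine similarity of any word-count vector with itself is 1, the maximum. A minimal Python sketch of the calculation follows, assuming each account is represented by raw word counts. The tokenizer and function names are illustrative, and the actual analysis was done in R.

```python
import math
import re
from collections import Counter


def word_counts(tweets: list[str]) -> Counter:
    """Count how often each word appears across an account's tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"[a-z']+", tweet.lower()))
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    # Only words used by both accounts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an account with no words matches no one
    return dot / (norm_a * norm_b)
```

Two accounts that share no words at all score 0, and the score climbs toward 1 as their word counts grow more proportional to each other. Because word counts are never negative, the score never drops below 0.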

On the other end of the list is an example which illustrates the limitations of Data Journalism: Quavo of Migos, who we have established as the Most Positive brand in our sample, is also the Least Based. Now, I know that being Based is not the opposite of being positive; if anything, Basedness is a form of radical positivity. There's no fancy methodology that would allow us to adjudicate between these opposing rankings. They're both the result of perfectly valid, transparent statistical approaches. Herein lies the emotional poverty of Data Journalism. Humans—especially humans spending time in the dizzying postmodernity that is social media—seem to have an intrinsic love for rankings, for bringing order to a realm in which it is severely lacking. The figures shown here purport to rank artists by how they make you feel, a perfect perversion of what makes music special in the first place. Not all things benefit from being quantified. If you really want to know how Based or Positive Quavo is, go listen to some Migos and find out.