Designer and coder Matt Daniels is finally using data science for something I can relate to intimately. The creator of the Etymology of "Shorty" and Outkast In Graphs And Charts has shared a new data project that gets us closer to answering the age-old question: Which rapper has the most bars?
The Largest Vocabulary in Hip-Hop starts with the factoid that Shakespeare used just under 29,000 words across his entire works, suggesting he knew over 100,000 words and "arguably had the largest vocabulary, ever." Daniels cleverly decided to compare this information with the most famous wordsmiths of modern lore: Jay-Z, Drake, Wu-Tang, and other rap icons. He took each artist's first 35,000 words of lyrics (which typically spans at least s), and discovered who really has the most unique rhymes and word usage. If certain MCs didn't hit the 35,000 mark through official releases (ahem, Kendrick), Daniels included mixtapes.
To determine the size of each artist's vocabulary, the coder gathered lyrics from Rap Genius, then used a method called token analysis, where each distinct word gets counted only once no matter how many times it shows up. He used the first 5,000 words of seven Shakespeare works, and the first 35,000 words of Moby Dick, as benchmarks, too. "It still isn't perfect," writes Daniels. "Hip hop is full of slang that is hard to transcribe (e.g. shorty vs. shawty), compound words, featured vocalists, and repetitive choruses." Still, the results are impressive, and maybe not too surprising.
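For the curious, the counting idea is simple enough to sketch in a few lines of Python. This is only a rough illustration, not Daniels' actual code: the function name `unique_word_count`, the lowercase-and-strip-punctuation tokenizing rule, and the sample lyric are all assumptions made up for the example.

```python
import re

def unique_word_count(text, limit=35_000):
    """Count distinct words among the first `limit` word tokens of `text`.

    Assumed tokenizing rule (not from Daniels' write-up): lowercase everything
    and treat any run of letters, digits, or apostrophes as one word.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # A set keeps one copy of each word, so repeats are counted only once.
    return len(set(tokens[:limit]))

# Made-up sample line, just to show the behavior.
sample = "Shorty got bars, shorty got BARS and shawty got rhymes"
print(unique_word_count(sample))
```

Note that "shorty" and "shawty" count as two different words here, which is exactly the transcription problem Daniels flags in his caveat.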
Coming in at number one is Aesop Rock. Daniels almost didn't include the tongue-twisting guru, thinking he was too obscure. Then Reddit's nefarious hip hop community freaked out, and they were right; Aesop's datapoint "is so far to the right that he should be off the chart."
Interestingly, Wu-Tang come in at number 6, and GZA, Ghostface, Raekwon, and Method Man are all in the top twenty (with GZA at #2). Daniels hypothesizes that "perhaps their countless hours of studio time together (and RZA's mentorship) exposed each rapper's vocabulary to one another," but I think it has something to do with the magical implementation of liquid swords and a dash of some Tiger Style. Studio time had nothing to do with it.
Finally, arguably the best part of Daniels' incredible work comes at the end, where he illustrates that many of the world's current most famous rappers are in the bottom 20% of the data set. So maybe this is proof that quality (from a populist's perspective) has nothing to do with quantity. After all, if we think about some of the more entertaining rappers as of late (Chief Keef, Juicy J, 2 Chainz), their lyrical prowess probably isn't the source of their popularity. Meanwhile, someone like Kool Keith (#3 on the list) has stayed mostly underground, despite his next-level word play.
Now we want to see this info explored even more: who uses the most puns, double entendres, and other interesting word play? Swears? References to their male bits? If Daniels can update with this information, he deserves Editor, Moderator, and genius-level IQ points on Rap Genius.
See more of the amazing data here: The Largest Vocabulary In Hip Hop