
Big Data Shows How American Politics Have(n’t) Changed

Text analysis of every State of the Union address ever pegs the start of the modern era at the end of World War I.

Every year since 1790 (with the exception of FDR in 1933), the president of the United States has delivered a State of the Union address to Congress, an oration that is sometimes somber, sometimes joyous, and sometimes just downright bizarre. Simultaneously retrospective and speculative, the SOTU provides the American people with an annual portrait of themselves and their country. Taken together, these portraits form a stunning panorama of the shifting fault lines of the American political imagination since the country's inception.


This was the general idea behind a paper published this week in the Proceedings of the National Academy of Sciences, which used text analysis of every SOTU to determine the precise date when America became modern. According to the results of the study, America entered modernity at approximately the same time it found itself embroiled in WWI, the nation officially coming of age in 1917.

The researchers, from Columbia University and the University of Paris, developed algorithms to analyze the nearly 1.8 million words that make up the 227 SOTUs delivered since Washington's inaugural address in 1790. The team did not merely look at the raw usage of certain keywords. By analyzing words jointly, it also examined how the meanings of these words change over time, whether due to the fluidity of language, new inventions in the world, or the reorganization of conceptual categories.


What the team found was rather striking. Despite how much we may think that we've changed as a nation since our founding, the SOTU remains remarkably stable over the course of history, at least lexically speaking. While the team did notice some dramatic departures in the addresses' content, these coincided less with changes in the mode of delivery (the switch from oral addresses to written letters presented to Congress, for example, or the first televised address in 1947) than with external events such as the War of 1812 and the two world wars. There was also that time when Grover Cleveland devoted his entire address to making a case for tariff reform in a bid for a reelection he didn't secure, thus making 1887 a pretty drastic outlier in the data set.


Interestingly, the report notes that "the Civil War, often considered in conventional histories to have transformed the country's political consciousness, while apparent in political discourse of the time, seems not to have made a lasting imprint on the unfolding of the dominant categories of social and political thought in the [State of the Union]." The event that marked the largest shift in conceptual categories was the United States' entry into World War I, which the study pegs as the date that the US discovered its modern understanding of politics.

This was the year that words such as "democracy," "unity," "peace," and "terror" began cropping up in discussions of foreign policy, replacing a former emphasis on statecraft and diplomacy. This was also the year that discussions of domestic policy began centering on the size of government and its role in the economy, marking the transition to the modern welfare state through the replacement of words such as "Treasury," "amount," and "expenditures" with "tax relief," "incentives," and "welfare."

The study examined the joint use of words in order to determine their relevance and shifting meaning, relying "on the straightforward idea that words acquire meaning through their relationships with other words." This co-occurrence approach to text analysis contrasts with the more common and, as the authors argue, flawed dictionary-based approaches, in which words are checked against a predefined and structured set of terms. Such approaches fail to take into account the way the meaning and use of certain words change over time.
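
For readers curious what "co-occurrence" means in practice, here is a minimal Python sketch of the general idea. It is not the authors' algorithm, which the paper describes in far more detail. The sentence-level counting, the tiny stop-word list, and the two toy "eras" of text are all assumptions invented for illustration, as are the function names `cooccurrence_counts` and `closest_neighbors`.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list for this toy example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "shall", "each", "every"}

def cooccurrence_counts(documents):
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts = Counter()
    for text in documents:
        for sentence in re.split(r"[.!?]+", text.lower()):
            words = set(re.findall(r"[a-z']+", sentence)) - STOPWORDS
            # Sorting makes each pair appear in one canonical order.
            counts.update(combinations(sorted(words), 2))
    return counts

def closest_neighbors(counts, target, top=3):
    """Return the words that most often share a sentence with `target`."""
    neighbors = Counter()
    for (a, b), n in counts.items():
        if a == target:
            neighbors[b] += n
        elif b == target:
            neighbors[a] += n
    return neighbors.most_common(top)

# Invented sentences standing in for two eras of speeches (not real SOTU text).
early = ["The people ratified the constitution. The constitution protects the people."]
later = ["Each state shall uphold the constitution. The constitution binds every state."]

early_top = closest_neighbors(cooccurrence_counts(early), "constitution", top=1)
later_top = closest_neighbors(cooccurrence_counts(later), "constitution", top=1)
print(early_top, later_top)
```

Comparing a word's nearest neighbors across periods, rather than looking it up in a fixed dictionary, is what lets this kind of analysis register a shift in what the word is associated with.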

This methodology yielded some intriguing results, particularly about our ever-evolving view of our relationship to the Constitution. For instance, in the nation's infancy, "constitution" was most closely linked with "people," but following the Civil War it became linked with "state." This lasted until the world wars, when it became linked with "law," before reverting to "people" in the 1970s.

As our nation marches on into a new millennium that is increasingly defined by technologies that were all but inconceivable in the 20th century, it is interesting to think about how our SOTUs will define the present era for the future denizens of the ol' US of A. Perhaps it is not too much of a stretch to imagine a future analysis carried out by quantum computers in which words like "surveillance," "natural disaster," and "autonomous warfare" begin to dominate the national political consciousness as seen through the State of the Union. And if they don't, what will that tell the future about our contemporary political process?