While data analytics and visualization tools have accumulated a significant record of accomplishments of their own, this technology is now, in turn, being applied to actual significant historical accomplishments. Let’s have a look.
Every year in January, the President of the United States gives the State of the Union speech before both houses of the U.S. Congress. This is to address the condition of the nation, his legislative agenda and other national priorities. The requirement for this presentation appears in Article II of the U.S. Constitution.
This talk with the nation has been given every year (with only one exception), since 1790. The resulting total of 224 speeches presents a remarkable and dynamic historical record of U.S. history and policy. Researchers at Columbia University and the University of Paris have recently applied sophisticated data analytics and visualization tools to this trove of presidential addresses. Their findings were published in the August 10, 2015 edition of the Proceedings of the National Academy of Sciences in a truly fascinating paper entitled Lexical Shifts, Substantive Changes, and Continuity in State of the Union Discourse, 1790–2014, by Alix Rule, Jean-Philippe Cointet, and Peter S. Bearman.
A very informative and concise summary of this paper was also posted on Phys.org on August 10, 2015, in an article entitled Big Data Analysis of State of the Union Remarks Changes View of American History (no author is listed). I will summarize, annotate and pose a few questions of my own. I highly recommend clicking through and reading the full report and the summary article together for a fuller perspective on this achievement. (Similar types of textual and graphical analyses of US law were covered in the May 15, 2015 Subway Fold post entitled Recent Visualization Projects Involving US Law and The Supreme Court.)
The researchers developed custom algorithms for their research and applied them to the full text of all of the addresses from 1790 to 2014, a total of 1.8 million words. By identifying the frequencies of “how often words appear jointly” and “mapping their relation to other clusters of words”, the team was able to highlight “dominant social and political” issues and their relative historical time frames. (See Figure 1 at the bottom of Page 2 of the full report for this lexical mapping.)
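To make the idea of counting “how often words appear jointly” more concrete, here is a minimal sketch in Python. It is not the researchers’ algorithm, which the summary does not describe in detail. The function name cooccurrence_counts, the stopword list and the three sample sentences are all my own assumptions, invented purely for illustration.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_counts(documents, stopwords=frozenset()):
    """Count, for each unordered word pair, how many documents contain both words.

    Hypothetical helper for illustration only; not the method used in the paper.
    """
    counts = Counter()
    for text in documents:
        # Reduce each document to its set of distinct lowercase words.
        words = set(re.findall(r"[a-z]+", text.lower())) - stopwords
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


# Invented toy passages standing in for paragraphs of speeches.
sample = [
    "The treasury reported the amount of expenditures",
    "Expenditures of the treasury increased",
    "Peace and democracy require unity",
]
STOPWORDS = frozenset({"the", "of", "and"})

pairs = cooccurrence_counts(sample, STOPWORDS)
print(pairs[("expenditures", "treasury")])
print(pairs[("democracy", "peace")])
```

Pairs with high counts could then be treated as links in a network of words, and groups of tightly linked words would correspond roughly to the “clusters” mentioned above.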
One of the researchers’ key findings was that although the topics of “industry, finance, and foreign policy” were predominant and persisted throughout all of the addresses, following World War II the recurring keywords focused further upon “nation building, the regulation of business and the financing of public infrastructure”. While it is well known that these emergent terms were all about modern topics, the researchers were thus able to pinpoint the exact time frames when they first appeared. (See Page 5 of the full report for the graphic charting these data trends.)
Foreign Policy Patterns
The year 1917 struck the researchers as a critical turning point because it represented a dramatic shift in the data toward words indicative of more modern times. This was the year that the US sent its troops into battle in Europe in WWI. It was then that new keywords in the State of the Union, including “democracy,” “unity,” “peace” and “terror”, started to appear and recur. Later, by the 1940s, word clusters concerning the Navy appeared, possibly indicating emerging U.S. isolationism. However, they suddenly disappeared again as the U.S. became far more involved in world events.
Domestic Policy Patterns
Over time, the researchers identified changes in the terminology used when addressing domestic matters. These concerned the government’s size, economic regulation, and equal opportunity. Although the focus of the State of the Union speeches remained constant, new keywords appeared whereby “tax relief,” “incentives” and “welfare” have replaced “Treasury,” “amount” and “expenditures”.
An important issue facing this project was that during the more than two centuries being studied, keywords could substantially change in meaning over time. To address this, the researchers applied new network analysis methods developed by Jean-Philippe Cointet, a team member, co-author and physicist at the University of Paris. They were intended to identify changes whereby “some political topics morph into similar topics with common threads” as others fade away. (See Figure 3 at the bottom of Page 4 of the full paper for this enlightening graphic.*)
As a result, they were able to parse the relative meanings of words as they appear with each other and, on a more macro level, in the “context of evolving topics”. For example, it was discovered that the word “Constitution” was:
- closely associated with the word “people” in early U.S. history
- linked to “state” following the Civil War
- linked to “law” during WWI and WWII, and
- linked once again to “people” during the 1970s
Thus, the meaning of “Constitution” must be assessed in its historical context.
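The kind of shift described in the list above can be sketched by asking which word most often accompanies a target word within each period. Again, this is only a rough illustration under my own assumptions and not the network analysis method developed for the paper. The function name top_associate_by_period, the period labels and the sample sentences are all invented.

```python
from collections import Counter, defaultdict
import re


def top_associate_by_period(dated_texts, target, stopwords=frozenset()):
    """For each period, return the word that most often shares a text with `target`.

    Hypothetical helper for illustration only; not the method used in the paper.
    """
    per_period = defaultdict(Counter)
    for period, text in dated_texts:
        words = set(re.findall(r"[a-z]+", text.lower())) - stopwords
        if target in words:
            # Count every other word that appears alongside the target.
            per_period[period].update(words - {target})
    return {
        period: counter.most_common(1)[0][0]
        for period, counter in per_period.items()
        if counter
    }


# Invented toy sentences; the period labels are assumptions, not data from the paper.
dated_sample = [
    ("early", "The Constitution belongs to the people"),
    ("early", "The people ratified the Constitution"),
    ("post_civil_war", "Each state is bound by the Constitution"),
    ("post_civil_war", "The Constitution limits every state"),
]
STOPWORDS = frozenset({"the", "to", "by", "is", "each", "every"})

associates = top_associate_by_period(dated_sample, "constitution", STOPWORDS)
print(associates)
```

With a real corpus, the texts would be passages from the speeches grouped by era, and a change in the top associate from one era to the next would hint at a change in how the target word was being used.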
My own questions are as follows:
- Would this analytical approach yield new and original insights if other long-running historical records, such as the Congressional Record, were likewise subjected to the research team’s algorithms and analytics?
- Could companies and other commercial businesses derive any benefits from having their historical records similarly analyzed? For example, might it yield new insights and recommendations for corporate governance and information governance policies and procedures?
- Could this methodology be used as an electronic discovery tool for litigators as they parse corporate documents produced during a case?
* This also resembles, in both methodology and appearance, the graphic on Page 29 of the law review article entitled A Quantitative Analysis of the Writing Style of the U.S. Supreme Court, by Keith Carlson, Michael A. Livermore, and Daniel Rockmore, dated March 11, 2015, linked to and discussed in the May 15, 2015 Subway Fold post cited above.