I only speak two languages: English and New York. Some visitors to NYC, especially first-timers, feel as though they are hearing some otherworldly dialect of English being spoken here.
I am always amazed and a bit envious when I meet people who are genuinely fluent in more than one language. I have friends and colleagues who can converse, write and even claim to think in multiple languages. Two of them immediately come to mind: one speaks five languages and the other speaks six. How do they do it?
So an article posted on Gigaom.com entitled A Massive Database Now Translates News in 65 Languages in Real Time, by Derrick Harris on Feb. 19, 2015, immediately got my attention. I will sum up, annotate and add some comments to this remarkable story.
The Global Database of Events, Languages and Tone (GDELT) is an ongoing project that has amassed a database of 250 million “socioeconomic and geopolitical events” and supporting metadata from 1979 to the present. GDELT was conceived and built by Kalev Leetaru, and he continues to run it. The database resides in Google’s cloud service and provides free access and coding tools to query and analyze this massive quantum of data.
Just one representative of GDELT’s many projects is an interactive map (available on GDELT’s home page) of conflicts and protests around the world. Support for this project is provided by The US Institute of Peace, an independent and nonpartisan American government institution.
Here is a deep and wide listing from GDELT’s blog that links directly to more than 300 of their other fascinating projects. Paging through and following even a sampling of these links will very likely help to spark your own imagination and creativity as to what can be done with this data and these tools.
On February 19, 2015, GDELT 2.0 was launched. In addition to a whole roster of new analytical tools, its most extraordinary new capability is real-time translation of news reports across 65 languages. The feeds of these reports are from non-Western and non-English sources. In effect, it is reporting from a different set of perspectives. The extensive details and parameters of this system are described in a February 19, 2015 blog post by Mr. Leetaru on GDELT’s website entitled GDELT Translingual: Translating the Planet.
Here is an accompanying blog post from the same day, entitled GDELT 2.0: Our Global World in Realtime, announcing and detailing many of the new tools and features. Among these is a capability called “Realtime Measurement of 2,300 Emotions and Themes” composed of “24 emotional measurement packages that together assess more than 2,300 emotions and themes from every article in realtime”. This falls within the science of content analysis, which attempts to ascertain the deeper meanings and perspectives within a whole range of multimedia types and large data sets.
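For readers curious about what content analysis can look like at its very simplest, here is a toy sketch in Python. To be clear, this is not GDELT's method or code: the two word lists, the category names and the score_article function are all made up for illustration, and the real measurement packages are far larger and more sophisticated. The sketch only shows the basic idea of scoring an article by the share of its words that fall into each category.

```python
import re
from collections import Counter

# Hypothetical toy word lists, invented for this example.
# They are NOT taken from GDELT or any real measurement package.
LEXICONS = {
    "anxiety": {"fear", "worried", "threat", "panic"},
    "optimism": {"hope", "progress", "recover", "peace"},
}


def score_article(text, lexicons=LEXICONS):
    """Return, for each category, the fraction of words in `text`
    that appear in that category's word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        # An empty article scores zero in every category.
        return {name: 0.0 for name in lexicons}
    counts = Counter(words)
    return {
        name: sum(counts[word] for word in vocab) / len(words)
        for name, vocab in lexicons.items()
    }


sample = "Talks bring hope for peace, but the threat of panic remains."
scores = score_article(sample)
for category, share in scores.items():
    print(f"{category}: {share:.2f}")
```

A simple word count like this misses negation, sarcasm and context, which is part of why a production system would need many specialized packages rather than a couple of short lists.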
I highly recommend checking out the Gigaom.com story. But I believe that is only the start if GDELT interests you. I further suggest clicking through and fully exploring their site to get a fuller sense of this project’s far-reaching vision and capabilities. Next, for the truly ambitious, the data sets and toolkits are all available for downloading right on the site. I say let the brainstorming for more new projects begin!
Back on December 2, 2014, in a Subway Fold post entitled Startup is Visualizing and Interpreting Massive Quantities of Daily Online News Content, we took a look at an exciting new startup called Quid that is doing similar-sounding deep mining and analysis of news. Taken together, GDELT and Quid point to a very fertile field for new endeavors, as the sophistication of machine intelligence to parse these vast troves of data, and the capacity to gather and store them, continue to advance. For both profit and non-profit organizations, I expect that potential benefits from deep global news analysis, interpretation, translation, visualization and metrics will continue to draw increasing numbers of interested and ambitious media companies, entrepreneurs, academics and government agencies.