I Can See for Miles: Using Augmented Reality to Analyze Business Data Sets

Image from Pixabay

While one of The Who’s first hit singles, I Can See for Miles, was most certainly not about data visualization, it still might, on a bit of a stretch, find a fitting new context for one of the latest dazzling new technologies in the opening stanza’s declaration that “there’s magic in my eye”. In determining Who’s who and what’s what about all this, let’s have a look at a report on a new tool enabling data scientists to indeed “see for miles and miles” in an exciting new manner.

This innovative approach was recently the subject of a fascinating article entitled Visualizing High Dimensional Data In Augmented Reality, posted on July 3, 2017 on Medium.com by Benjamin Resnick, an augmented reality (AR) designer writing about his team’s work at IBM on a project called Immersive Insights. (Also embedded in it is a very cool video demo of the system.) The team is applying AR’s rapidly advancing technology¹ to display, interpret and leverage insights gained from business data. I highly recommend reading the article in its entirety. I will summarize and annotate it here and then pose a few real-world questions of my own.

Immersive Insights into Where the Data-Points Point

As Resnick foresees such a system in several years, a user will start his or her workday by donning AR glasses and viewing a “sea of gently glowing, colored orbs”, each of which visually displays their business’s big data sets². The user will be able to “reach out and select that data”, which, in turn, will generate additional details on a nearby monitor. Thus, the user can efficiently track their data in an “aesthetically pleasing” and practical display.

The project team’s key objective is to provide a means to visualize and sum up the key “relationships in the data”. In the short term, the team is aiming Immersive Insights at data scientists who are facile coders, enabling them to use AR’s capabilities to visualize time series, geographical and networked data. Over the long term, they plan to expand the range of Immersive Insights’ applicability to the work of business analysts.

For example, Instacart, a same-day food delivery service, maintains an open source data set on food purchases (accessible here). Every consumer represents a data point that can be expressed as a “list of purchased products” drawn from among 50,000 possible items.
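Resnick’s description of a customer as a “list of purchased products” maps naturally onto a sparse vector representation. The sketch below is purely illustrative: the tiny catalog, the two customers and the similarity measure are all made up for the example, and the real data set spans roughly 50,000 products.

```python
# Toy sketch: representing each customer as a "basket" vector, one slot
# per product in the catalog. Product IDs and baskets here are invented.

def basket_to_vector(purchased_ids, num_products):
    """One-hot encode a customer's list of purchased product IDs."""
    vec = [0] * num_products
    for pid in purchased_ids:
        vec[pid] = 1
    return vec

# Two hypothetical customers in a tiny 6-product catalog.
alice = basket_to_vector([0, 2, 5], num_products=6)
bob = basket_to_vector([2, 3], num_products=6)

def jaccard(a, b):
    """A simple overlap measure between two baskets."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

print(alice)                          # [1, 0, 1, 0, 0, 1]
print(round(jaccard(alice, bob), 2))  # 0.25
```

At 50,000 products a dense list per customer would be wasteful, so a real implementation would store only the purchased IDs; the idea is the same.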

How can this sizable pool of data be better understood, and the deeper relationships within it extracted? Traditionally, data scientists create a “matrix of 2D scatter plots” in their efforts to intuit connections among the data’s attributes. However, for sets with many attributes, this methodology does not scale well.
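The scaling problem is easy to quantify: a scatter plot matrix needs one panel per distinct pair of attributes, and the number of pairs grows quadratically with the attribute count.

```python
# Why a "matrix of 2D scatter plots" scales poorly: one panel is needed
# per pair of attributes, i.e. n-choose-2 panels for n attributes.
from math import comb

for n_attributes in (5, 20, 100):
    panels = comb(n_attributes, 2)
    print(n_attributes, "attributes ->", panels, "pairwise plots")
```

Five attributes need only 10 panels, but 100 attributes would need 4,950, far more than anyone can eyeball.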

Consequently, Resnick’s team has been using their own new approach to:

  • Reduce complex data to just three dimensions in order to sum up key relationships
  • Visualize the data by applying their Immersive Insights application, and
  • Iteratively label and color-code the data in conjunction with an “evolving understanding” of its inner workings
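Resnick’s article does not specify which reduction technique the team uses for the first step above; principal component analysis (PCA) is a standard choice. Here is a minimal, dependency-free sketch, with an invented six-customer, five-attribute data set, that finds the top three components by power iteration (which assumes reasonably well-separated eigenvalues).

```python
# Minimal PCA sketch for the first step above: reduce each data point
# to three dimensions. The data set is invented for illustration; real
# code would use a numerical library such as scikit-learn.

def mean_center(data):
    dims = len(data[0])
    means = [sum(row[d] for row in data) / len(data) for d in range(dims)]
    return [[row[d] - means[d] for d in range(dims)] for row in data]

def mat_vec(m, v):
    return [sum(mi[j] * v[j] for j in range(len(v))) for mi in m]

def covariance(rows):
    n, dims = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def top_component(cov, iters=200):
    # Power iteration: repeatedly applying the covariance matrix to a
    # vector converges on its top eigenvector.
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def deflate(cov, v):
    # Subtract the found component so the next pass finds the runner-up.
    lam = sum(mat_vec(cov, v)[i] * v[i] for i in range(len(v)))
    return [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
            for i in range(len(cov))]

def pca_3d(data):
    centered = mean_center(data)
    cov = covariance(centered)
    comps = []
    for _ in range(3):
        c = top_component(cov)
        comps.append(c)
        cov = deflate(cov, c)
    # Project each centered point onto the three components.
    return [[sum(p[j] * c[j] for j in range(len(p))) for c in comps]
            for p in centered]

# Six hypothetical customers described by five numeric attributes.
data = [
    [2.5, 2.4, 0.5, 1.0, 3.1],
    [0.5, 0.7, 2.2, 0.1, 0.4],
    [2.2, 2.9, 0.8, 1.2, 2.8],
    [1.9, 2.2, 1.0, 0.9, 2.5],
    [3.1, 3.0, 0.3, 1.4, 3.3],
    [1.1, 0.9, 1.8, 0.4, 1.0],
]
proj = pca_3d(data)  # each customer is now a 3D point, ready to render in AR
```

The resulting three coordinates per customer are exactly what an application like Immersive Insights would render as orbs in space, with the labeling and color-coding layered on afterward.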

These results have enabled them to “validate hypotheses more quickly” and develop a sense of the relationships within the data sets. As well, their system was built to permit users to employ a number of versatile data analysis programming languages.

The types of data sets being used here are likewise deployed in training machine learning systems³. As a result, the potential exists for these technologies to become complementary and mutually supportive in identifying and understanding relationships within the data, as well as in deriving any “black box predictive models”⁴.

Analyzing the Instacart Data Set: Food for Thought

Passing over the more technical details provided on the creation of the team’s demo in the video (linked above), and turning next to the results of the visualizations, their findings included:

  • A great deal of the variance in Instacart customers’ “purchasing patterns” was between those who bought “premium items” and those who chose less expensive “versions of similar items”. In turn, this difference has “meaningful implications” for the company’s “marketing, promotion and recommendation strategies”.
  • Among all food categories, produce was clearly the leader. Nearly all customers buy it.
  • When the users were categorized by the “most common department” they patronized, they were “not linearly separable”. That is, in terms of purchasing patterns, this categorization missed most of the variance along the system’s three main components (described above).

Resnick concludes that the three cornerstone technologies of Immersive Insights (big data, augmented reality and machine learning) are individually and in complementary combinations “disruptive” and, as such, will affect the “future of business and society”.

Questions

  • Can this system be used on a real-time basis? Can it be configured to handle changing data sets in volatile business markets where there are significant changes within short time periods that may affect time-sensitive decisions?
  • Would web metrics be a worthwhile application, perhaps as an add-on module to a service such as Google Analytics?
  • Is Immersive Insights limited only to business data or can it be adapted to less commercial or non-profit ventures to gain insights into processes that might affect high-level decision-making?
  • Is this system extensible enough so that it will likely end up finding unintended and productive uses that its designers and engineers never could have anticipated? For example, might it be helpful to juries in cases involving technically or financially complex matters such as intellectual property or antitrust?

 


1.  See the Subway Fold category Virtual and Augmented Reality for other posts on emerging AR and VR applications.

2.  See the Subway Fold category of Big Data and Analytics for other posts covering a range of applications in this field.

3.  See the Subway Fold category of Smart Systems for other posts on developments in artificial intelligence, machine learning and expert systems.

4.  For a highly informative and insightful examination of this phenomenon where data scientists on occasion are not exactly sure about how AI and machine learning systems produce their results, I suggest a click-through and reading of The Dark Secret at the Heart of AI,  by Will Knight, which was published in the May/June 2017 issue of MIT Technology Review.

GDELT 2.0 Launches Bringing Real-Time News Translation in 65 Languages


Image by Library and Archives Canada

I only speak two languages: English and New York. Some visitors to NYC, especially first-timers, often feel like they are hearing some otherworldly dialect of English being spoken here.

I am always amazed and a bit envious when I meet people who are genuinely fluent in more than one language. I have friends and colleagues who can converse, write and even claim to think in multiple languages. Two of them immediately come to mind: one can speak five languages and the other six. How do they do it?

Thus, seeing an article posted on Gigaom.com entitled A Massive Database Now Translates News in 65 Languages in Real Time by Derrick Harris on Feb. 19, 2015 immediately got my attention. I will sum up, annotate and add some comments to this remarkable story.

The Global Database of Events, Language, and Tone (GDELT) is an ongoing project that has amassed a database of 250 million “socioeconomic and geopolitical events” and supporting metadata from 1979 to the present. GDELT was conceived and built by Kalev Leetaru, who continues to run it. The database resides in Google’s cloud service and provides free access and coding tools to query and analyze this massive quantum of data.

Just one representative example of GDELT’s many projects is an interactive map (available on GDELT’s home page) of conflicts and protests around the world. Support for this project is provided by the US Institute of Peace, an independent and nonpartisan American government institution.

Here is a deep and wide listing from GDELT’s blog that links directly to more than 300 of their other fascinating projects. Paging through and following even a sampling of these links will very likely help to spark your own imagination and creativity as to what can be done with this data and these tools.

On February 19, 2015 GDELT 2.0 was launched. In addition to a whole roster of new analytical tools, its most extraordinary new capability is real-time translation of news reports across 65 languages. The feeds of these reports are from non-Western and non-English sources. In effect, it is reporting from a different set of perspectives. The extensive details and parameters of this system are described in a February 19, 2015 blog post by Mr. Leetaru on GDELT’s website entitled GDELT Translingual: Translating the Planet.

Here is an accompanying blog post from the same day announcing and detailing many of the new tools and features, entitled GDELT 2.0: Our Global World in Realtime. Among these is a capability called “Realtime Measurement of 2,300 Emotions and Themes”, composed of “24 emotional measurement packages that together assess more than 2,300 emotions and themes from every article in realtime”. This falls within the science of content analysis, which attempts to ascertain the deeper meanings and perspectives within a whole range of multimedia types and large data sets.
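At its simplest, this kind of content analysis is dictionary-based: count how often words from curated positive and negative lexicons appear in a text. The toy sketch below illustrates the idea only; the word lists and scoring rule are invented for the example, and GDELT’s actual measurement packages use far larger lexicons across many emotional dimensions.

```python
# Toy dictionary-based "tone" scorer: positive hits minus negative hits,
# normalized per 100 words. Word lists here are invented for illustration.

POSITIVE = {"peace", "agreement", "growth", "celebrate", "success"}
NEGATIVE = {"conflict", "protest", "crisis", "collapse", "violence"}

def tone(text):
    """Return (positive hits - negative hits) per 100 words."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 100.0 * (pos - neg) / len(words)

headline = "Leaders celebrate peace agreement after years of conflict"
print(round(tone(headline), 1))  # 25.0
```

Doing this in realtime across 65 languages, as GDELT 2.0 does, is of course an engineering feat far beyond the sketch, but the underlying measurement idea is this accessible.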

I highly recommend checking out the Gigaom.com story. But I believe that is only the start if GDELT interests you. I further suggest clicking through and fully exploring their site to get a fuller sense of this project’s far-reaching vision and capabilities. Next, for the truly ambitious, the data sets and toolkits are all available for downloading right on the site. I say let the brainstorming for more new projects begin!

Back on December 2, 2014, in a Subway Fold post entitled Startup is Visualizing and Interpreting Massive Quantities of Daily Online News Content, we took a look at an exciting new startup called Quid that is doing similar-sounding deep mining and analysis of news. Taken together, GDELT and Quid represent a very fertile field for new endeavors as the sophistication of machine intelligence to parse these vast troves of data, and the capacity to gather and store them, continue to advance. For both profit and non-profit organizations, I expect that the potential benefits from deep global news analysis, interpretation, translation, visualization and metrics will continue to draw increasing numbers of interested and ambitious media companies, entrepreneurs, academics and government agencies.

 

 

Startup is Visualizing and Interpreting Massive Quantities of Daily Online News Content


“News”, Image by Mars Hill Church Seattle

Just scratching the surface, some recent Subway Fold posts have focused upon sophisticated efforts, scaling from startups to established companies, that analyze and then try to capitalize upon big data trends in finance¹, sports² , cities³, health care* and law**. Now comes a new report on the work of another interesting startup in this sector called Quid. As reported in a most engaging story posted on VentureBeat.com on November 27, 2014 entitled Quid’s Article-analyzing App Can Tell You Many Things — Like Why You Lost the Senate Race in Iowa by Jordan Novet, they are gathering up, indexing and generating interpretive and insightful visualizations for their clients using data drawn from more than a million online articles each day from more than 50,000 sources.

I highly recommend a full read of this story for all of the fascinating details and accompanying screen captures from these apps. As well, I suggest visiting and exploring Quid’s site for a fuller sense of their products, capabilities and clients. I will briefly recap some of the key points from this story. Furthermore, this article provides a timely opportunity to more closely tie together seven related Subway Fold posts.

The first example provided in the story concerns the firm’s production of a rather striking graphic charting the turn in polling numbers for a Democratic candidate running for an open Senate seat in Iowa following a campaign visit from Hillary Clinton. When Senator Clinton spoke in the state in support of the Democratic candidate, she addressed women’s issues. Based upon Quid’s analysis of the media coverage, the visit seemed to have helped the Republican candidate, a woman, more than the Democratic candidate, a man.

Politics aside, Quid’s main objective is to become a leading software firm supporting corporate strategy for its clients in market sectors including, among others, technology, finance and government.

Quid’s system works by scooping up its source materials online and then distilling out specific “people, places, industries and keywords”. All of the articles are then compared and processed against a specific query. Next, the software creates visualizations where “clusters” and “anomalies” can be further examined. The analytics also assess relative word counts, magnitudes of links shared on social media, and mentions in blog posts and tweets. (X-ref again to this November 22, 2014 post entitled Minting New Big Data Types and Analytics for Investors that covers, among other startups, one called Dataminr that also extensively analyzes trends in Twitter usage and content data.)
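That pipeline (distill keywords from articles, compare them against a query, surface similar groups) can be sketched in miniature. Quid’s actual methods are proprietary and far more sophisticated; the stand-in below ranks three invented articles against a query using Jaccard similarity over keyword sets.

```python
# Toy stand-in for a distill-and-compare news pipeline: reduce each
# article to a keyword set, then score it against a query. Articles,
# stopword list and query are all invented for illustration.

STOPWORDS = {"the", "a", "an", "in", "of", "for", "and", "to", "on"}

def keywords(text):
    """Distill a text into a set of lowercase, non-stopword tokens."""
    return {w.lower().strip(".,") for w in text.split()} - STOPWORDS

def jaccard(a, b):
    """Overlap between two keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

articles = {
    "a1": "Startup raises funding for deep learning chips",
    "a2": "New deep learning model tops benchmarks",
    "a3": "City council votes on transit funding",
}

query = keywords("deep learning startups")

# Score every article against the query, then rank them.
scores = {aid: jaccard(keywords(text), query) for aid, text in articles.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # the off-topic article "a3" ranks last
```

Real systems would cluster the pairwise similarities rather than just rank against one query, which is where Quid’s “clusters” and “anomalies” visualizations come from.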

The sample screens here demonstrate the app’s deep levels of detail in analytics and visualization. As part of the company’s demo for the author, he provided a query about companies involved in “deep learning”. (X-ref to this August 14, 2014 Subway Fold post entitled Spotify Enhances Playlist Recommendations Processing with “Deep Learning” Technology concerning how deep learning is being used to, well, tune up Spotify’s music recommendation system.) Because this is an area the writer is familiar with, he did not find any unexpected company names in the results, but he found this reassuring insofar as it confirmed that he was aware of all of the key participants in this field.

My follow up questions include:

  • Would it be to Quid’s and/or Dataminr’s advantage(s) to cross-license some of their technology and provide supporting expertise for applying and deploying it and, if so, how?
  • Would Quid’s visualizations benefit if they were ported to 3-D virtual environments such as Hyve-3D where users might be able to “walk” through the data? (X-ref to this August 28, 2014 Subway Fold post entitled Hyve-3D: A New 3D Immersive and Collaborative Design System.) That is, does elevating these visualizations from 2-D to 3-D add to the accessibility, accuracy and analytics of the results being sought? Would this be a function or multiple functions of the industry, corporate strategy and granularity of the data sets?
  • What other marketplaces, sciences, professions and products might benefit from Quid’s, Dataminr’s and Hyve-3D’s approaches that have not even been considered yet?

________________________________
1.  See this November 22, 2014 post entitled Minting New Big Data Types and Analytics for Investors.

2.  See this October 31, 2014 post entitled New Data Analytics and Video Tools Affecting Defensive Strategies in the NFL and NBA.

3.  See this October 24, 2014 post entitled “I Quant NY” Blog Analyzes Public Data Sets Released by New York City.

*  See this October 3, 2014 post entitled New Startups, Hacks and Conferences Focused Upon Health Data and Analytics.

**  See this August 8, 2014 post entitled New Visualization Service for US Patent and Trademark Data.

Twitter Invests $10M in Establishing the Laboratory for Social Machines at MIT

The astronomical diversity of Twitter users and topics never ceases to expand and amaze. Everyone and their neighbor from #anthropologists to #zoologists and countless others post approximately 500 million Tweets each day. This produces a virtual ocean of highly valuable data and accompanying analytics that have found applications in, among a multitude of other areas, e-commerce, marketing, entertainment, government, sports, academia, science, medicine and law. For example, two recent Subway Fold posts here have looked at the mappings of Twitter networks and the analysis of Twitter traffic about TV shows to examine this phenomenon.

Taking this to yet another level of involvement and sophistication was an announcement on October 1, 2014 that was posted on Gigaom.com entitled Twitter Gives MIT $10M and Access to the Firehose to Build a Laboratory for Social Machines, by Matthew Ingram. To briefly recap, Twitter is providing funding for a new undertaking at MIT called the Laboratory for Social Machines (LSM). Its mandate is to examine the effects of social media on society, including the creation of new tools (such as pattern recognition and data visualization), and methodologies for doing so. They further intend to create a platform where the findings can be openly discussed and possibly acted upon by the interested parties.

LSM will have access to the entire quantum of Twitter posts going back to the social platform’s launch in 2006. Other planned participants include journalists and “social groups and movements”. Their website provides more fine-grained details about their objectives, approaches and personnel. I highly recommend clicking through to the LSM site to learn more and get a genuine sense that this could really be something big. As well, their own new Twitter feed is @mitlsm.

Additional coverage of this story can be found here on The Wall Street Journal’s Digits blog and here on the Boston Business Journal’s techflash blog.

What a remarkable and admirable leap forward this is for Twitter and MIT. At its outset, this sounds like a venture that is destined to produce practical and actionable benefits to interested groups across the real and virtual worlds, not to mention the positive publicity and good will this announcement has already generated.

My own questions include:

  • Will other interested parties be invited to provide funding or is this an exclusive venture between Twitter and MIT?
  • What types of new startups will the work of LSM inspire and support? Will LSM expand itself to become an incubator of some sort?
  • What policies will guide the LSM’s decision-making on the types of studies, tools, movements and so on to pursue? Is establishing an advisory board in their current plans?
  • Will other universities build comparable labs for social media studies?
  • Will professional organizations, trade associations, and other specific interest groups likewise create their own such labs?