While one of The Who’s first hit singles, I Can See for Miles, was most certainly not about data visualization, it still might – – on a bit of a stretch – – find a fitting a new context in describing one of the latest dazzling new technologies in the opening stanza’s declaration “there’s magic in my eye”. In determining Who’s who and what’s what about all this, let’s have a look at report on a new tool enabling data scientists to indeed “see for miles and miles” in an exciting new manner.
This innovative approach was recently the subject of a fascinating article by an augmented reality (AR) designer named Benjamin Resnick about his team’s work at IBM on a project called Immersive Insights, entitled Visualizing High Dimensional Data In Augmented Reality, posted on July 3, 2017 on Medium.com. (Also embedded is a very cool video of a demo of this system.) They are applying AR’s rapidly advancing technology1 to display, interpret and leverage insights gained from business data. I highly recommend reading this in its entirety. I will summarize and annotate it here and then pose a few real-world questions of my own.
Immersive Insights into Where the Data-Points Point
As Resnick foresees such a system in several years, a user will start his or her workday by donning their AR glasses and viewing a “sea of gently glowing, colored orbs”, each of which visually displays their business’s big data sets2. The user will be able to “reach out select that data” which, in turn, will generate additional details on a nearby monitor. Thus, the user can efficiently track their data in an “aesthetically pleasing” and practical display.
The project team’s key objective is to provide a means to visualize and sum up the key “relationships in the data”. In the short-term, the team is aiming Immersive Insights towards data scientists who are facile coders, enabling them to visualize, using AR’s capabilities upon time series, geographical and networked data. For their long-term goals, they are planning to expand the range of Immersive Insight’s applicability to the work of business analysts.
For example, Instacart, a same-day food delivery service, maintains an open source data set on food purchases (accessible here). Every consumer represents a data-point wherein they can be expressed as a “list of purchased products” from among 50,000 possible items.
How can this sizable pool of data be better understood and the deeper relationships within it be extracted and understood? Traditionally, data scientists create a “matrix of 2D scatter plots” in their efforts to intuit connections in the information’s attributes. However, for those sets with many attributes, this methodology does not scale well.
Consequently, Resnick’s team has been using their own new approach to:
- Lower complex data to just three dimensions in order to sum up key relationships
- Visualize the data by applying their Immersive Insights application, and
- “Iteratively label and color-code the data” in conjunction with an “evolving understanding” of its inner workings
Their results have enable them to “validate hypotheses more quickly” and establish a sense about the relationships within the data sets. As well, their system was built to permit users to employ a number of versatile data analysis programming languages.
The types of data sets being used here are likewise deployed in training machine learning systems3. As a result, the potential exists for these three technologies to become complementary and mutually supportive in identifying and understanding relationships within the data as well as deriving any “black box predictive models”.4
Analyzing the Instacart Data Set: Food for Thought
Passing over the more technical details provided on the creation of team’s demo in the video (linked above), and next turning to the results of the visualizations, their findings included:
- A great deal of the variance in Instacart’s customers’ “purchasing patterns” was between those who bought “premium items” and those who chose less expensive “versions of similar items”. In turn, this difference has “meaningful implications” in the company’s “marketing, promotion and recommendation strategies”.
- Among all food categories, produce was clearly the leader. Nearly all customers buy it.
- When the users were categorized by the “most common department” they patronized, they were “not linearly separable”. This is, in terms of purchasing patterns, this “categorization” missed most of the variance in the system’s three main components (described above).
Resnick concludes that the three cornerstone technologies of Immersive Insights – – big data, augmented reality and machine learning – – are individually and in complementary combinations “disruptive” and, as such, will affect the “future of business and society”.
- Can this system be used on a real-time basis? Can it be configured to handle changing data sets in volatile business markets where there are significant changes within short time periods that may affect time-sensitive decisions?
- Would web metrics be a worthwhile application, perhaps as an add-on module to a service such as Google Analytics?
- Is Immersive Insights limited only to business data or can it be adapted to less commercial or non-profit ventures to gain insights into processes that might affect high-level decision-making?
- Is this system extensible enough so that it will likely end up finding unintended and productive uses that its designers and engineers never could have anticipated? For example, might it be helpful to juries in cases involving technically or financially complex matters such as intellectual property or antitrust?
1. See the Subway Fold category Virtual and Augmented Reality for other posts on emerging AR and VR applications.
2. See the Subway Fold category of Big Data and Analytics for other posts covering a range of applications in this field.
3. See the Subway Fold category of Smart Systems for other posts on developments in artificial intelligence, machine learning and expert systems.
4. For a highly informative and insightful examination of this phenomenon where data scientists on occasion are not exactly sure about how AI and machine learning systems produce their results, I suggest a click-through and reading of The Dark Secret at the Heart of AI, by Will Knight, which was published in the May/June 2017 issue of MIT Technology Review.