Semantic Scholar and BigDIVA: Two New Advanced Search Platforms Launched for Scientists and Historians

"The Chemistry of Inversin", Image by Raymond Bryson

“The Chemistry of Inversion”, Image by Raymond Bryson

As powerful, essential and ubiquitous as Google and its search engine peers are across the world right now, needs often arise in many fields and marketplaces for platforms that can perform much deeper and wider digital excavating. So it is that two new highly specialized search platforms have just come online, each specifically engineered for scientists or historians. While structurally and functionally quite different from each other, both are aimed at very specific professional user bases with advanced research needs.

These new systems provide uniquely enhanced levels of context, understanding and visualization with their results. We recently looked at a very similar development in the legal professions in an August 18, 2015 Subway Fold post entitled New Startup’s Legal Research App is Driven by Watson’s AI Technology.

Let’s have a look at both of these latest innovations and their implications. To introduce them, I will summarize and annotate two articles about their introductions, and then I will pose some additional questions of my own.

Semantic Scholar Searches for New Knowledge in Scientific Papers

First, the Allen Institute for Artificial Intelligence (AI2) has just launched its new system called Semantic Scholar, freely accessible on the web. This event was covered in a fascinating article on NewScientist.com entitled AI Tool Scours All the Science on the Web to Find New Knowledge, published on November 2, 2015 by Mark Harris.

Semantic Scholar is supported by artificial intelligence (AI)¹ technology. It is automated to “read, digest and categorise findings” from approximately two million scientific papers published annually. Its main objective is to assist researchers with generating new ideas and “to identify previously overlooked connections and information”. Because of the overwhelming volume of scientific papers published each year, which no individual scientist could possibly ever read, it offers a novel architecture and a high-speed means of mining all of this content.

Oren Etzioni, the director of AI2, termed Semantic Scholar a “scientist’s apprentice” that assists researchers in evaluating developments in their fields. For example, a medical researcher could query it about drug interactions in a certain patient cohort having diabetes. Users can also pose their inquiries in natural language format.

Semantic Scholar operates by executing the following functions (a simplified sketch of this pipeline follows the list):

  • crawling the web in search of “publicly available scientific papers”
  • scanning them into its database
  • identifying citations and references that, in turn, are assessed to determine those that are the most “influential or controversial”
  • extracting “key phrases” appearing in similar papers, and
  • indexing “the datasets and methods” used
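
To make this workflow a bit more concrete, here is a minimal, hypothetical Python sketch of the same crawl-extract-index pattern. The sample “papers,” the citation pattern and the key-phrase heuristic are all invented for illustration; this is not Semantic Scholar’s actual code, data model or scale.

```python
# A toy version of the pipeline described above: gather paper texts, pull out
# citations and key phrases, and build a small index. Everything here is an
# illustrative assumption, not Semantic Scholar's implementation.
import re
from collections import Counter

# Stand-ins for papers retrieved by a web crawler.
papers = {
    "paper-1": "Metformin and insulin interactions in diabetic cohorts ... [1] Smith 2009 [2] Lee 2012",
    "paper-2": "Insulin dosing models for diabetic cohorts ... [1] Lee 2012",
}

def extract_citations(text):
    """Very rough stand-in for reference extraction: '[n] Author Year' patterns."""
    return re.findall(r"\[\d+\]\s+([A-Z][a-z]+ \d{4})", text)

def extract_key_phrases(text, top_n=3):
    """Toy key-phrase step: the most frequent non-trivial words."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]{5,}", text)]
    return [w for w, _ in Counter(words).most_common(top_n)]

# Build a tiny inverted index and a citation tally (a crude "influence" proxy).
index, citation_counts = {}, Counter()
for paper_id, text in papers.items():
    citation_counts.update(extract_citations(text))
    for phrase in extract_key_phrases(text):
        index.setdefault(phrase, set()).add(paper_id)

print(index)             # key phrase -> papers mentioning it
print(citation_counts)   # most-cited references across the corpus
```

In a real system each of these steps would of course be vastly more sophisticated, but the overall shape (crawl, extract, assess and index) matches the steps described in the article.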

AI2 is not alone in these objectives; similar initiatives are also underway at IBM with its Watson technology² and at DARPA³.

Semantic Scholar will gradually be applied to other fields such as “biology, physics and the remaining hard sciences”.

BigDIVA Searches and Visualizes 1,500 Years of History

The second innovative search platform is called the Big Data Infrastructure Visualization Application (BigDIVA). The details about its development, operation and goals were covered in a most interesting report posted online on NC State News on October 12, 2015 entitled Online Tool Aims to Help Researchers Sift Through 15 Centuries of Data by Matt Shipman.

This is a joint project by digital humanities scholars at NC State University and Texas A&M University. Its objective is to assist researchers in, among other fields, literature, religion, art and world history. This is done by increasing the speed and accuracy of searching through “hundreds of thousands of archives and articles” covering 450 A.D. to the present. BigDIVA was formally rolled out at NC State on October 16, 2015.

BigDIVA presents users with an entirely new visual interface, enabling them to search and review “historical documents, images of art and artifacts, and any scholarship associated” with them. Search results, organized by categories of digital resources, are displayed in infographic format⁴. The linked NC State News article includes a photo of this dynamic looking interface.

This system is still undergoing beta testing and further refinement by its development team. Expansion of its resources on additional historical periods is expected to be an ongoing process. Current plans are to make this system available on a subscription basis to libraries and universities.

My Questions

  • Might the IBM Watson, Semantic Scholar, DARPA and BigDIVA development teams benefit from sharing design and technical resources? Would scientists, doctors, scholars and others benefit from multi-disciplinary teams working together on future upgrades and perhaps even new platforms and interface standards?
  • What other professional, academic, scientific, commercial, entertainment and governmental fields would benefit from these highly specialized search platforms?
  • Would Google, Bing, Yahoo and other commercial search engines benefit from participating with the developers in these projects?
  • Would proprietary enterprise search vendors likewise benefit from similar joint ventures with the types of teams described above?
  • What entrepreneurial opportunities might arise for vendors, developers, designers and consultants who could provide fuller insight and support for developing customized search platforms?

 


1.  These 11 Subway Fold posts cover various AI applications and developments.

2.  These seven Subway Fold posts cover a range of IBM Watson applications and markets.

3.  A new history of DARPA written by Annie Jacobsen was recently published, entitled The Pentagon’s Brain (Little, Brown and Company, 2015).

4.  See this January 30, 2015 Subway Fold post entitled Timely Resources for Studying and Producing Infographics on this topic.

Movie Review of “The Human Face of Big Data”

"Blue and Pink Fractal", Image by dev Moore

“Blue and Pink Fractal”, Image by dev Moore

What does big data look like, anyway?

To try to find out, I was very fortunate to have obtained a pass to see a screening of a most enlightening new documentary called The Human Face of Big Data. The event was held on October 20, 2015 at Civic Hall in the Flatiron District in New York.

The film’s executive producer, Rick Smolan (@ricksmolan), first made some brief introductory remarks about his professional work and the film we were about to see. Among his many accomplishments as a photographer and writer, he was the originator and driving force behind the A Day in the Life series of books, in which teams of photographers were dispatched to capture a different country for each volume, including, among others, the United States, Japan and Spain.

He also added a whole new meaning to having a hand in casting in his field by explaining to the audience that he had recently fallen while trying out his son’s scooter and hence his right hand was in a cast.

As the lights were dimmed and the film began, someone sitting right in front of me did something that was also, quite literally, enlightening but clearly in the wrong place and at the wrong time by opening up a laptop with a large and very bright screen. This was very distracting so I quickly switched seats. In retrospect, doing so also had the unintentional effect of providing me with a metaphor for the film: From my new perspective in the auditorium, I was seeing a movie that was likewise providing me with a whole new perspective on this important subject.

This film proceeded to provide an engrossing and informative examination of what exactly “big data” is, how it is gathered and analyzed, and its relative virtues and drawbacks.¹ It accomplished all of this by addressing these angles with segments of detailed expositions intercut with interviews of leading experts. In his comments afterwards, Mr. Smolan described big data as becoming a form of “nervous system” currently threading out across our entire planet.

Other documentarians could learn much from his team’s efforts as they smartly surveyed the Big Dataverse while economically compressing their production into a very compact and efficient package. Rather than a paint by, well, numbers production with overly long technical excursions, they deftly brought their subject to life with some excellent composition and editing of a wealth of multimedia content.

All of the film’s topics, and the transitions between them, were treated in an appreciably evenhanded manner. Some segments specifically delved into how big data systems vacuum up such vast quantities of information and how this positively and negatively affects consumers and other demographic populations. Other passages raised troubling concerns about the loss of personal privacy in recent revelations concerning the electronic operations conducted by the government and the private sector.

I found the most compelling part of the film to be an interview with Dr. Eric Topol (@EricTopol), a leading proponent of digital medicine, of using smartphones as a medical information platform, and of empowering patients to take control of their own medical data.² He spoke about the significance of the massive quantities and online availability of medical data and what this transformation means for everyone. His optimism and insights about big data having a genuine impact upon the quality of life for people across the globe were representative of this movie’s measured balance between optimism and caution.

This movie’s overall impression analogously reminded me of the promotional sponges that my local grocery used to hand out.  When you returned home and later added a few drops of water to these very small, flat and dried out novelties, they quickly and voluminously expanded. So too, here in just a 52-minute film, Mr. Smolan and his team have assembled a far-reaching and compelling view of the rapidly expanding parsecs of big data. All the audience needed to access, comprehend and soak up all of this rich subject matter was an open mind to new ideas.

Mr. Smolan returned to the stage after the movie ended to graciously and enthusiastically answer questions from the audience. It was clear from the comments and questions that nearly everyone there, whether they were familiar or unfamiliar with big data, had greatly enjoyed this cinematic tour of this subject and its implications. The audience’s well-informed inquiries concerned the following topics:

  • the ethics and security of big data collection
  • the degree to which science fiction has now become science fact
  • the emergence and implications of virtual reality and augmented reality with respect to entertainment, and the role of big data in these productions³
  • the effects and influences of big data in medicine, law and other professions
  • the applications of big data towards extending human lifespans

Mr. Smolan also mentioned that his film will be shown on PBS in 2016. When it becomes scheduled, I very highly recommend setting some time aside to view it in its entirety.

Big data’s many conduits, trends, policies and impacts relentlessly continue to extend their global grasp. The Human Face of Big Data delivers a fully realized and expertly produced means for comprehending and evaluating this crucial and unavoidable phenomenon. This documentary is a lot to absorb yet an apt (and indeed fully app-ed), place to start.

 


One of the premiere online resources for anything and everything about movies is IMDB.com. It has just reached its 25th anniversary which was celebrated in a post in VentureBeat.com on October 30, 2015, entitled 25 Years of IMDb, the World’s Biggest Online Movie Database by Paul Sawers.


1.  These 44 Subway Fold posts cover many of the latest developments in different fields, marketplaces and professions in the category of Big Data and Analytics.

2.  See also this March 3, 2015 Subway Fold post reviewing Dr. Topol’s latest book, entitled Book Review of “The Patient Will See You Now”.

3.  These 11 Subway Fold posts cover many of the latest developments in the arts, sciences, and media industries in the category of Virtual and Augmented Reality. For two of the latest examples, see an article from the October 20, 2015 edition of The New York Times entitled The Times Partners With Google on Virtual Reality Project by Ravi Somaiya, and an article on Fortune.com on September 27, 2015 entitled Oculus Teams Up with 20th Century Fox to Bring Virtual Reality to Movies by Michael Addady. (I’m just speculating here, but perhaps The Human Face of Big Data would be well-suited for VR formatting and audience immersion.)

NASA is Providing Support for Musical and Humanitarian Projects

"NASA - Endeavor 2", Image by NASA

“NASA – Endeavor 2”, Image by NASA

In two recent news stories, NASA has generated a world of good will and positive publicity about itself and its space exploration program. It would be an understatement to say their results have been both well-grounded and out of this world.

First, Canadian astronaut Chris Hadfield created a vast following for himself online when he uploaded a video onto YouTube of himself singing David Bowie’s classic Space Oddity while on a mission aboard the International Space Station (ISS).¹ As reported on the October 7, 2015 CBS Evening News broadcast, Hadfield is releasing an album of 12 songs that he wrote and performed in space; it comes out today, October 9, 2015. He also previously wrote a best-selling book entitled An Astronaut’s Guide to Life on Earth: What Going to Space Taught Me About Ingenuity, Determination, and Being Prepared for Anything (Little, Brown and Company, 2013). I highly recommend checking out his video, book and Twitter account @Cmdr_Hadfield.

What a remarkably accomplished career in addition to his becoming an unofficial good will ambassador for NASA.

The second story, further enhancing the agency’s reputation, concerns a very positive program affecting many lives that was reported in a most interesting article on Wired.com on September 28, 2015 entitled How NASA Data Can Save Lives From Space by Issie Lapowsky. I will summarize and annotate it, and then pose some of my own terrestrial questions.

Agencies’ Partnership

According to NASA Administrator Charles Bolden, astronauts frequently look down at the Earth from space and realize that borders across the world are subjectively imposed by warfare or wealth. These dividing lines between nations seem to become less meaningful to them while they are in flight. Instead, the astronauts tend to look at the Earth and have a greater awareness of everyone’s responsibilities to each other. Moreover, they wonder what they can possibly do when they return to make some sort of meaningful difference on the ground.

Bolden recently shared this experience with an audience at the United States Agency for International Development (USAID) in Washington, DC, to explain the reasoning behind a decade-long partnership between NASA and USAID. (The latter is the US government agency responsible for the administration of US foreign aid.) At first, this would seem to be an unlikely joint operation between two government agencies that do not appear to have much in common.

In fact, this combination provides “a unique perspective on the grave need that exists in so many places around the world”, and a special case where one agency sees it from space and the other one sees it on the ground.

They are joined together in a partnership known as SERVIR, through which NASA supplies “imagery, data, and analysis” to assist developing nations. They help these countries with forecasting and dealing “with natural disasters and the effects of climate change”.

Partnership’s Results

Among others, SERVIR’s tools have produced the following representative results:

  • Predicting floods in Bangladesh, giving citizens a total of eight days’ notice in which to make life-saving preparations. This reduced the death toll to 17 during last year’s monsoon season, whereas previously it had been in the thousands.
  • Predicting forest fires in the Himalayas.
  • For Central America, NASA created a map of ocean chlorophyll concentration that assisted public officials in identifying and improving shellfish testing in order to deal with “micro-algae outbreaks” responsible for causing significant health issues.

SERVIR currently operates in 30 countries. As a part of their network, there are regional hubs working with “local partners to implement the tools”. Last week it opened such a hub in Asia’s Mekong region. Both NASA and USAID are hopeful that the number of such hubs will continue to grow.

Google is also assisting with “life saving information from satellite imagery”. They are doing this by applying artificial intelligence (AI)² capabilities to Google Earth. This project is still in its preliminary stages.

My Questions

  • Should SERVIR reach out to the space agencies and humanitarian organizations of other countries to explore similar types of humanitarian joint ventures?
  • Do the space agencies of other countries have similar partnerships with their own aid agencies?
  • Would SERVIR benefit from partnerships with other US government agencies? Similarly, would it benefit from partnering with other humanitarian non-governmental organizations (NGO)?
  • Would SERVIR be the correct organization to provide assistance with global environmental issues? Take, for example, the story about the bleaching of coral reefs around the world broadcast on the October 8, 2015 CBS Evening News.

 


1.  While Hadfield’s cover and Bowie’s original version of Space Oddity are most often associated in pop culture with space exploration, I would like to suggest another song that also captures this spirit and then truly electrifies it: Space Truckin’ by Deep Purple. This appeared on their Machine Head album, which will be remembered for all eternity because it included the iconic Smoke on the Water. Nonetheless, Space Truckin’ is, in my humble opinion, a far more propulsive tune than Space Oddity. Its infectious opening riff will instantly grab your attention while the rest of the song races away like a Saturn rocket reaching for escape velocity. Furthermore, the musicianship on this recording is extraordinary. Pay close attention to Ritchie Blackmore’s scorching lead guitar and Ian Paice’s thundering drums. Come on, let’s go space truckin’!

2. These eight Subway Fold posts cover AI from a number of different perspectives involving a series of different applications and markets.

Facebook is Now Restricting Access to Certain Data About Its User Base to Third Parties

Image by Gerd Altmann

It is a simple and straightforward business concept in any area of commerce: Do not become overly reliant upon a single customer or supplier. Rather, try to build a diversified portfolio of business relationships to diligently avoid this possibility and, at the same time, assist in developing potential new business.

Starting in May 2015, Facebook instituted certain limits upon commercial and non-commercial third parties’ access to the valuable data about its 1.5 billion users¹. This has caused serious disruption, and even the end of operations, for some of them who had so heavily depended on the social media giant’s data flow. Let’s see what happened.

This story was reported in a very informative and instructive article in the September 22, 2015 edition of The Wall Street Journal entitled Facebook’s Restrictions on User Data Cast a Long Shadow by Deepa Seetharaman and Elizabeth Dwoskin. (Subscription required.) If you have access to WSJ.com, I highly recommend reading it in its entirety. I will summarize and annotate it, and then pose some of my own third-party questions.

This change in Facebook’s policy has resulted in “dozens of startups” closing, changing their approach or being bought out. It has also affected political data consultants and independent researchers.

This is a significant shift in Facebook’s approach to sharing “one of the world’s richest sources of information on human relationships”. Dating back to 2007, CEO Mark Zuckerberg opened access to Facebook’s “social graph” to outsiders. This included data points, among many others, about users’ friends, interests and “likes“.

However, the company recently changed this strategy due to users’ concerns about their data being shared with third parties without any notice. A spokeswoman from the company stated this is now being done in a manner that is “more privacy protective”. This change has thus been implemented to give their user base greater control.

Other social media leaders including LinkedIn and Twitter have likewise limited access, but Facebook’s move in this direction has been more controversial. (These 10 recent Subway Fold posts cover a variety of ways that data from Twitter is being mined, analyzed and applied.)

Examples of the applications that developers have built upon this data include requests to have friends join games, vote, and highlight a mutual friend of two people on a date. The reduction or loss of this data flow from Facebook will affect these and numerous other services previously dependent on it. As well, privacy experts have expressed their concern that this change might result in “more objectionable” data-mining practices.

Others view these new limits as a result of the company’s expansion and “emergence as the world’s largest social network”.

Facebook will provide data to outsiders about certain data types like birthdays. However, information about users’ friends is mostly not available. Some developers have expressed complaints about the process for requesting user data, as well as about “unexpected outcomes” of such requests.

These new restrictions have specifically affected the following Facebook-dependent websites in various ways:

  • The dating site Tinder asked Facebook about the new data policy shortly after it was announced because they were concerned that limiting data about relationships would impact their business. A compromise was eventually reached, but it limited the site’s access to only “photos and names of mutual friends”.
  • College Connect, an app that provided forms of social information and assistance to first-generation students, could no longer continue its operations when it lost access to Facebook’s data. (The site still remains online.)
  • An app called Jobs With Friends that connected job searchers with similar interests met a similar fate.
  • Social psychologist Benjamin Crosier was in the process of creating an app searching for connections “between social media activity and ills like drug addiction”. He is currently trying to save this project by requesting eight data types from Facebook.
  • An app used by President Obama’s 2012 re-election campaign was “also stymied” as a result. It had been used to identify potential supporters, get them to vote, and encourage their friends on Facebook to vote or register to vote.²

Other companies are trying an alternative strategy to build their own social networks. For example, Yesgraph Inc. employs predictive analytics³ methodology to assist clients who run social apps in finding new users by data-mining, with the user base’s permission, through lists of email addresses and phone contacts.
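
To illustrate the general idea, here is a small, hypothetical Python sketch that scores a permissioned contact list to suggest which people a social app should invite next. The contacts, features and weights are invented for illustration only; they are not Yesgraph’s actual model, data or API.

```python
# Toy "invite scoring": rank a user's (permissioned) contacts by how likely an
# invitation is to succeed. The features and weights below are assumptions made
# purely for illustration.
contacts = [
    {"email": "a@example.com", "emails_exchanged": 42, "in_phone_contacts": True},
    {"email": "b@example.com", "emails_exchanged": 3,  "in_phone_contacts": False},
    {"email": "c@example.com", "emails_exchanged": 17, "in_phone_contacts": True},
]

def invite_score(contact):
    """Crude predictive score: stronger interaction and overlap -> better invite."""
    score = min(contact["emails_exchanged"], 50) / 50.0       # interaction strength
    score += 0.5 if contact["in_phone_contacts"] else 0.0     # also appears in phone contacts
    return score

# Suggest the most promising invitations first.
for c in sorted(contacts, key=invite_score, reverse=True):
    print(c["email"], round(invite_score(c), 2))
```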

My questions are as follows:

  • What are the best practices and policies for social networks to use to optimally balance the interests of data-dependent third parties and users’ privacy concerns? Do they vary from network to network or are they more likely applicable to all or most of them?
  • Are most social network users fully or even partially concerned about the privacy and safety of their personal data? If so, what practical steps can they take to protect themselves from unwanted access and usage of it?
  • For any given data-driven business, what is the threshold for over-reliance on a particular data supplier? How and when should their roster of data suppliers be further diversified in order to protect themselves from disruptions to their operations if one or more of them change their access policies?

 


1.   Speaking of interesting data, on Monday, August 24, 2015, for the first time ever in the history of the web, one billion users logged onto the same site, Facebook. For the details, see One Out of Every 7 People on Earth Used Facebook on Monday, by Alexei Oreskovic, posted on BusinessInsider.com on August 27, 2015.

2.  See the comprehensive report entitled A More Perfect Union by Sasha Issenberg in the December 2012 issue of MIT’s Technology Review about how the Obama campaign made highly effective use of its data, social network apps and analytics in its winning 2012 re-election effort.

3.  These seven Subway Fold posts cover predictive analytics applications in a range of different fields.

Data Analysis and Visualizations of All U.S. Presidential State of the Union Addresses

"President Obama's State of the Union Address 2013", Word cloud image by Kurtis Garbutt

“President Obama’s State of the Union Address 2013”, Word cloud image by Kurtis Garbutt

While data analytics and visualization tools have accumulated a significant historical record of accomplishments, now, in turn, this technology is being applied to actual significant historical accomplishments. Let’s have a look.

Every year in January, the President of the United States gives the State of the Union speech before both houses of the U.S. Congress. This is to address the condition of the nation, his legislative agenda and other national priorities. The requirement for this presentation appears in Article II of the U.S. Constitution.

This talk with the nation has been given every year (with only one exception) since 1790. The resulting total of 224 speeches presents a remarkable and dynamic record of U.S. history and policy. Researchers at Columbia University and the University of Paris have recently applied sophisticated data analytics and visualization tools to this trove of presidential addresses. Their findings were published in the August 10, 2015 edition of the Proceedings of the National Academy of Sciences in a truly fascinating paper entitled Lexical Shifts, Substantive Changes, and Continuity in State of the Union Discourse, 1790–2014, by Alix Rule, Jean-Philippe Cointet, and Peter S. Bearman.

A very informative and concise summary of this paper was also posted on Phys.org, also on August 10, 2015, in an article entitled Big Data Analysis of State of the Union Remarks Changes View of American History (no author is listed). I will summarize and annotate it, and pose a few questions of my own. I highly recommend clicking through and reading the full report and the summary article together for a fuller perspective on this achievement. (Similar types of textual and graphical analyses of US law were covered in the May 15, 2015 Subway Fold post entitled Recent Visualization Projects Involving US Law and The Supreme Court.)

The researchers developed custom algorithms for their research. These were applied to the 1.8 million words used across all of the addresses from 1790 to 2014. By identifying the frequencies of “how often words appear jointly” and “mapping their relation to other clusters of words”, the team was able to highlight “dominant social and political” issues and their relative historical time frames. (See Figure 1 at the bottom of Page 2 of the full report for this lexicographical mapping.)
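
As a rough illustration of that joint-frequency idea, the short Python sketch below counts how often word pairs co-occur within the same sentence and then lists a target word’s strongest partners. The two-sentence “corpus” is invented for illustration; the actual study worked over all 1.8 million words with far richer methods.

```python
# Toy co-occurrence counting: which words tend to appear alongside a given term?
# The mini-corpus is fabricated; short words are dropped as a crude stopword filter.
import itertools
import re
from collections import Counter

corpus = [
    "The Constitution secures the rights of the people",
    "The Constitution and the laws guide the state in peace",
]

pair_counts = Counter()
for sentence in corpus:
    # Keep words of four or more letters, lowercased, once per sentence.
    words = sorted(set(re.findall(r"[a-z]{4,}", sentence.lower())))
    pair_counts.update(itertools.combinations(words, 2))

# Strongest co-occurrence partners for "constitution".
partners = Counter()
for (w1, w2), n in pair_counts.items():
    if "constitution" in (w1, w2):
        partners[w2 if w1 == "constitution" else w1] += n
print(partners.most_common(5))
```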

One of the researchers’ key findings was that although the topics of “industry, finance, and foreign policy” were predominant and persist throughout all of the addresses, following World War II the recurring keywords focus further upon “nation building, the regulation of business and the financing of public infrastructure”. While it is well known that these emergent terms all concern modern topics, the researchers were thus able to pinpoint the exact time frames when they first appeared. (See Page 5 of the full report for the graphic charting these data trends.)

Foreign Policy Patterns

The year 1917 struck the researchers as a critical turning point because it represented a dramatic shift in the data toward words indicative of more modern times. This was the year that the US sent its troops into battle in Europe in WWI. It was then that new keywords in the State of the Union addresses, including “democracy,” “unity,” “peace” and “terror”, started to appear and recur. Later, by the 1940s, word clusters concerning the Navy appeared, possibly indicating emerging U.S. isolationism. However, they suddenly disappeared again as the U.S. became far more involved in world events.

Domestic Policy Patterns

Over time, the researchers identified changes in the terminology used when addressing domestic matters. These concerned the government’s size, economic regulation, and equal opportunity. Although the focus of the State of the Union speeches remained constant, new keywords appeared: “tax relief,” “incentives” and “welfare” have replaced “Treasury,” “amount” and “expenditures”.

An important issue facing this project was that during the more than two centuries being studied, keywords could substantially change in meaning over time. To address this, the researchers applied new network analysis methods developed by Jean-Philippe Cointet, a team member, co-author and physicist at the University of Paris. They were intended to identify changes whereby “some political topics morph into similar topics with common threads” as others fade away. (See Figure 3 at the bottom of Page 4 of the full paper for this enlightening graphic.*)

As a result, they were able to parse the relative meanings of words as they appear with each other and, on a more macro level, in the “context of evolving topics”. For example, it was discovered that the word “Constitution” was:

  • closely associated with the word “people” in early U.S. history
  • linked to “state” following the Civil War
  • linked to “law” during WWI and WWII, and
  • returned to “people” during the 1970s

Thus, the meaning of “Constitution” must be assessed in its historical context.
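
Extending the earlier toy sketch, the snippet below buckets an invented mini-corpus by era and reports “constitution”’s strongest co-occurrence partner in each bucket, mirroring the kind of shift the researchers describe. The sentences and era labels are fabricated for illustration and are not the paper’s actual data or method.

```python
# Era-by-era co-occurrence: find the word most often appearing alongside
# "constitution" in each (fabricated) period.
import re
from collections import Counter

era_corpus = {
    "early republic": [
        "the constitution protects the people",
        "the people look to the constitution",
    ],
    "after the civil war": [
        "the constitution binds each state together",
        "every state must honor the constitution",
    ],
    "world wars": [
        "the constitution and the laws guide us in wartime",
        "our laws rest upon the constitution",
    ],
    "1970s": [
        "the constitution belongs to the people",
        "the people renew the constitution",
    ],
}

for era, sentences in era_corpus.items():
    partners = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]{4,}", sentence.lower()))
        if "constitution" in words:
            partners.update(words - {"constitution"})
    # Print the single most frequent companion word for this era.
    print(era, "->", partners.most_common(1)[0][0])
```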

My own questions are as follows:

  • Would this analytical approach yield new and original insights if other long-running historical records, such as the Congressional Record, were likewise subjected to the research team’s algorithms and analytics?
  • Could companies and other commercial businesses derive any benefits from having their historical records similarly analyzed? For example, might it yield new insights and recommendations for corporate governance and information governance policies and procedures?
  • Could this methodology be used as an electronic discovery tool for litigators as they parse corporate documents produced during a case?

 


*  This also resembles, in methodology and appearance, the graphic on Page 29 of the law review article entitled A Quantitative Analysis of the Writing Style of the U.S. Supreme Court, by Keith Carlson, Michael A. Livermore, and Daniel Rockmore, dated March 11, 2015, linked to and discussed in the May 15, 2015 Subway Fold post cited above.

Watson, is That You? Yes, and I’ve Just Demo-ed My Analytics Skills at IBM’s New York Office

My photo of the entrance to IBM’s office at 590 Madison Avenue in New York, taken on July 29, 2015.

I don’t know if my heart can take this much excitement. Yesterday morning, on July 29, 2015, I attended a very compelling presentation and demo of IBM’s Watson technology. (This AI-driven platform has been previously covered in these five Subway Fold posts.) Just the night before, I saw a demo of some ultra-cool new augmented reality systems.

These experiences combined to make me think of the evocative line from Supernaut by Black Sabbath with Ozzy belting out “I’ve seen the future and I’ve left it behind”. (Incidentally, this prehistoric metal classic also has, IMHO, one of the most infectious guitar riffs with near warp speed shredding ever recorded.)

Yesterday’s demo of Watson Analytics, one key component among several on the platform, was held at IBM’s office in the heart of midtown Manhattan at 590 Madison Avenue and 57th Street. The company very graciously put this on for free. All three IBM employees who spoke were outstanding in their mastery of the technology, enthusiasm for its capabilities, and informative Q&A interactions with the audience. Massive kudos to everyone involved at the company in making this happen. Thanks, too, to all of the attendees who asked such excellent questions.

Here is my summary of the event:

Part 1: What is Watson Analytics?

The first two speakers began with a fundamental truth about all organizations today: They have significant quantities of data that are driving all operations. However, a bottleneck often occurs when business users understand this but do not have the technical skills to fully leverage it while, correspondingly, IT workers do not always understand the business context of the data. As a result, business users have avenues they can explore but not the best or most timely means to do so.

This is where Watson can be introduced, because it can make these business users self-sufficient with an accessible, extensible and easier-to-use analytics platform. It is, as one of the speakers said, “self-service analytics in the cloud”. Thus, Watson’s constituents can be seen as follows:

  • “What” is how to discover and define business problems.
  • “Why” is to understand the existence and nature of these problems.
  • “How” is to share this process in order to affect change.

However, Watson is specifically not intended to be a replacement for IT in any way.

Also, one of Watson’s key capabilities is enabling users to pursue their questions by using a natural language dialog. This involves querying Watson with questions posed in ordinary spoken terms.

Part 2: A Real World Demo Using Airline Customer Data

Taken directly from the world of commerce, the IBM speakers presented a demo of Watson Analytics’ capabilities by using a hypothetical situation in the airline industry. This involved a business analyst in the marketing department of an airline who was given a compilation of market data prepared by a third-party vendor. The business analyst was then tasked by his manager with researching and planning how to reduce customer churn.

Next, by enlisting Watson Analytics for this project, the two central issues became how the data could be:

  • Better understood, leveraged and applied to increase customers’ positive opinions while simultaneously decreasing defections to the airline’s competitors.
  • Comprehensively modeled in order to understand the elements of the customer base’s satisfaction, or lack thereof, with the airline’s services.

The speakers then put Watson Analytics through its paces up on large screens for the audience to observe and ask questions. The goal of this was to demonstrate how the business analyst could query Watson Analytics and, in turn, the system would provide alternative paths to explore the data in search of viable solutions.

Included among the variables that were dexterously tested and spun into enlightening interactive visualizations were:

  • Satisfaction levels by other peer airlines and the hypothetical Watson customer airline
  • Why customers are, and are not, satisfied with their travel experience
  • Airline “status” segments such as “platinum” level flyers who pay a premium for additional select services
  • Types of travel including for business and vacation
  • Other customer demographic points

The results of this exercise, as they appeared onscreen, showed how Watson could, with its unique architecture and tool set:

  • Generate “guided suggestions” using natural language dialogs
  • Identify and test all manner of connections among the population of data
  • Use predictive analytics to make business forecasts¹ (a brief illustrative sketch follows this list)
  • Calculate a “data quality score” to assess the quality of the data upon which business decisions are based
  • Map out a wide variety of data dashboards and reports to view and continually test the data in an effort to “tell a story”
  • Integrate an extensible set of analytical and graphics tools to sift through large data sets from relevant Twitter streams²
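
Watson Analytics is a closed, natural-language-driven product, so none of its internals were shown at the demo. As a very rough stand-in for the kind of churn modeling the airline scenario described, here is a short, hypothetical Python sketch using a plain logistic regression from scikit-learn; the tiny dataset and feature names are invented for illustration and have no connection to IBM’s implementation.

```python
# Toy churn model: predict whether an airline customer will defect, based on a
# few made-up features. This only illustrates the general predictive-analytics
# idea, not Watson Analytics itself.
from sklearn.linear_model import LogisticRegression

# Each row: [satisfaction score 1-5, is premium "status" flyer, is business trip]
X = [
    [1, 0, 0], [2, 0, 1], [2, 1, 0], [3, 0, 0],
    [4, 1, 1], [4, 0, 1], [5, 1, 0], [5, 1, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = customer churned, 0 = customer stayed

model = LogisticRegression().fit(X, y)

# Estimated churn probability for a dissatisfied, non-premium vacation traveler.
print(model.predict_proba([[2, 0, 0]])[0][1])
```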

Part 3: The Development Roadmap

The third and final IBM speaker outlined the following paths for Watson Analytics that are currently in beta stage development:

  • User engagement developers are working on an updated visual engine, increased connectivity and capabilities for mobile devices, and social media commentary.
  • Collaboration developers are working on accommodating work groups and administrators, and dashboards that can be filtered and distributed.
  • Data connector developers are working on new data linkages, improving the quality and shape of connections, and increasing the degrees of confidence in predictions. For example, a connection to weather data is underway that would be very helpful to the airline (among other industries) in the above hypothetical.
  • New analytics developers are working on new functionality for business forecasting, time series analyses, optimization, and social media analytics.

Everyone in the audience, judging by the numerous informal conversations that quickly formed in the follow-up networking session, left with much to consider about the potential applications of this technology.


1.  Please see these six Subway Fold posts covering predictive analytics in other markets.

2.  Please see these ten Subway Fold posts for a variety of other applications of Twitter analytics.

 

Twitter and Facebook are Rapidly Rising Across All Major US Demographic Groups as Primary News Platforms

"Media in Central Park New York City", Image by Ernst Moeksis

“Media in Central Park New York City”, Image by Ernst Moeksis

Cutting across five fundamental demographic segments, Twitter and Facebook are rapidly rising as primary sources of news among the US population. This was the central finding of a new report issued on July 14, 2015 by the Pew Research Center for Journalism and Media entitled News Use on Facebook and Twitter Is on the Rise by Michael Barthel, Elisa Shearer, Jeffrey Gottfried and Amy Mitchell. The full text and supporting graphics appear in an 18-page PDF file on the Pew website entitled The Evolving Role of News on Twitter and Facebook. I highly recommend clicking through to read the full report.

A number of concise summaries of it quickly appeared online. I found the one written by Joseph Lichterman on NiemanLab.org (a site about Internet journalism at Harvard University), entitled New Pew Data: More Americans are Getting News on Facebook and Twitter, also published on July 14th, to be an informative briefing on it. I will, well, try to sum up this summary, add some annotations and pose some questions.

First, for some initial perspective: on January 21, 2015, a Subway Fold post entitled The Transformation of News Distribution by Social Media Platforms in 2015 examined how the nature of news media was being dramatically impacted by social media. This new Pew Research Center report focuses on the changing demographics of Facebook and Twitter users for news consumption.

This new study found that 63% of both Twitter and Facebook users are now getting their news from these leading social media platforms, up from 52% of Twitter users and 47% of Facebook users in a similar Pew survey in 2013. Of those following a live news event as it occurs, the split is more pronounced: 59% of Twitter users and 31% of Facebook users are engaged in viewing such coverage.

According to Amy Mitchell, one of the report’s authors and Pew’s Director of Journalism Research, the two social media sites each “adapt to their role” and provide “unique features”. As well, the differing ways in which US users connect with these platforms “have implications” for how they “learn about their world” and partake in their democracy.

In order to enhance their growing commitment to live coverage, both sites have recently rolled out innovative new services. Twitter has a full-featured multimedia app called Project Lightning to facilitate following news in real time. Facebook is likewise expanding its news operations with the recently announced launch of Instant Articles, a rapid news co-publishing app created in cooperation with nine of the world’s leading news organizations.

Further parsing the survey’s demographic data for US adults generated the following findings:

  • Sources of News: 10% of US adults get their news on Twitter while 41% get their news on Facebook, with an overlap of 8% using both. This difference is also due to the fact that Facebook has a much larger user base than Twitter. Furthermore, while the total US user bases of both platforms currently remain steady, the percentage of those users seeking news on each is itself increasing.
  • Comparative Trends in Five Key Demographics: The very enlightening chart at the bottom of Page 2 of the report breaks down Twitter’s and Facebook’s percentages and percentage increases between 2013 and 2015 for gender, race, age, education level, and incomes.
  • Relative Importance of Platforms: These results are further qualified in that those surveyed reported that Americans still see both of these platforms overall as “secondary news sources” and “not a very important way” to stay current.
  • Age Groups: When age levels were added, this changes to nearly 50% of those between 18 and 35 years finding Twitter and Facebook to be “the most important” sources of news. Moving on to those over 35 years, the numbers declined to 34% of Facebook users and 31% of Twitter users responding that these platforms were among the “most important” news sources.
  • Content Types Sought and Engaged: Facebook users were more likely to click on political content than Twitter users to the extent of 32% to 25%, respectively. The revealing charts in the middle of Page 3 demonstrate that Twitter users see and pursue a wider variety of 11 key news topics. As well, the percentage tallies of gender differences by topic and by platform are also presented.

My own questions are as follows:

  • Might Twitter and Facebook benefit from additional cooperative ventures to further expand their comprehensiveness, target demographics, and enhanced data analytics for news categories by exploring additional projects with other organizations? For instance, and among many other possibilities, there is Dataminr, which tracks and parses the entirety of the Twitterverse in real time (as previously covered in these three Subway Fold posts); Quid, which is tracking massive amounts of online news (as previously covered in this Subway Fold post); and GDELT, which is translating online news in real time in 65 languages (as previously covered in this Subway Fold post).
  • What additional demographic categories would be helpful in future studies by Pew and other researchers as this market and its supporting technologies, particularly in an increasingly social and mobile web world, continue to evolve so quickly? For example, how might different online access speeds affect the distribution and audience segmentation of news distributed on social platforms?
  • Are these news consumption demographics limited only to Twitter and Facebook? For example, LinkedIn has gone to great lengths in the past few years to upgrade its content offerings. How might the results have differed if the Pew questionnaire had included LinkedIn and possibly others like Instagram?
  • How can this Pew study be used to improve the effectiveness of marketing and business development for news organizations for their sponsors, content strategists for their clients, and internal and external SEO professionals for their organizations?