Semantic Scholar and BigDIVA: Two New Advanced Search Platforms Launched for Scientists and Historians

“The Chemistry of Inversion”, Image by Raymond Bryson

As powerful, essential and ubiquitous as Google and its search engine peers are across the world right now, needs often arise in many fields and marketplaces for platforms that can perform much deeper and wider digital excavating. Accordingly, two new highly specialized search platforms have just come online, engineered specifically for scientists and historians. The two are structurally and functionally quite different from each other, but each is aimed at a specific professional user base with advanced research needs.

These new systems provide uniquely enhanced levels of context, understanding and visualization with their results. We recently looked at a very similar development in the legal profession in an August 18, 2015 Subway Fold post entitled New Startup’s Legal Research App is Driven by Watson’s AI Technology.

Let’s have a look at both of these latest innovations and their implications. To introduce them, I will summarize and annotate two articles about their launches, and then pose some additional questions of my own.

Semantic Scholar Searches for New Knowledge in Scientific Papers

First, the Allen Institute for Artificial Intelligence (AI2) has just launched its new system, Semantic Scholar, which is freely accessible on the web. The launch was covered on NewScientist.com in a fascinating article by Mark Harris entitled AI Tool Scours All the Science on the Web to Find New Knowledge, published on November 2, 2015.

Semantic Scholar is supported by artificial intelligence (AI)¹ technology. It is automated to “read, digest and categorise findings” from the approximately two million scientific papers published annually. Its main objective is to assist researchers in generating new ideas and “to identify previously overlooked connections and information”. Because the volume of scientific papers published each year is far more than any individual scientist could ever read, it offers a novel architecture and a high-speed means of mining all of this content.

Oren Etzioni, the director of AI2, termed Semantic Scholar a “scientist’s apprentice” that assists researchers in evaluating developments in their fields. For example, a medical researcher could query it about drug interactions in a particular cohort of patients with diabetes. Users can also pose their queries in natural language.

Semantic Scholar operates by executing the following functions:

  • crawling the web in search of “publicly available scientific papers”
  • scanning them into its database
  • identifying citations and references that, in turn, are assessed to determine those that are the most “influential or controversial”
  • extracting “key phrases” appearing in similar papers, and
  • indexing “the datasets and methods” used

AI2 is not alone in its objectives. Other similar initiatives include:

Semantic Scholar will gradually be applied to other fields such as “biology, physics and the remaining hard sciences”.

BigDIVA Searches and Visualizes 1,500 Years of History

The second innovative search platform is called Big Data Infrastructure Visualization Application (BigDIVA). The details of its development, operation and goals were covered in a most interesting report by Matt Shipman entitled Online Tool Aims to Help Researchers Sift Through 15 Centuries of Data, posted on NC State News on October 12, 2015.

This is a joint project of digital humanities scholars at NC State University and Texas A&M University. Its objective is to assist researchers in literature, religion, art and world history, among other fields. It does this by increasing the speed and accuracy of searches through “hundreds of thousands of archives and articles” covering 450 A.D. to the present. BigDIVA was formally rolled out at NC State on October 16, 2015.

BigDIVA presents users with an entirely new visual interface that enables them to search and review “historical documents, images of art and artifacts, and any scholarship associated” with them. Search results, organized by categories of digital resources, are displayed in infographic format.⁴ The linked NC State News article includes a photo of this dynamic-looking interface.

This system is still undergoing beta testing and further refinement by its development team. Expanding its resources to cover additional historical periods is expected to be an ongoing process. Current plans call for making the system available to libraries and universities on a subscription basis.

My Questions

  • Might the IBM Watson, Semantic Scholar, DARPA and BigDIVA development teams benefit from sharing design and technical resources? Would scientists, doctors, scholars and others benefit from multi-disciplinary teams working together on future upgrades and perhaps even new platforms and interface standards?
  • What other professional, academic, scientific, commercial, entertainment and governmental fields would benefit from these highly specialized search platforms?
  • Would Google, Bing, Yahoo and other commercial search engines benefit from participating with the developers in these projects?
  • Would proprietary enterprise search vendors likewise benefit from similar joint ventures with the types of teams described above?
  • What entrepreneurial opportunities might arise for vendors, developers, designers and consultants who could provide fuller insight and support for developing customized search platforms?

October 19, 2017 Update: For the latest progress and applications of the Semantic Scholar system, see a new report on Economist.com entitled A Better Way to Search Through Scientific Papers, dated October 19, 2017.


1.  These 11 Subway Fold posts cover various AI applications and developments.

2.  These seven Subway Fold posts cover a range of IBM Watson applications and markets.

3.  A new history of DARPA by Annie Jacobsen, entitled The Pentagon’s Brain, was recently published (Little, Brown and Company, 2015).

4.  See this January 30, 2015 Subway Fold post entitled Timely Resources for Studying and Producing Infographics on this topic.

Timely Resources for Studying and Producing Infographics

Image by Nicho Design

[This post was originally uploaded on October 21, 2014. It has been updated below with new information on January 30, 2015.]

Infographics seem to be appearing with steadily increasing frequency in many online and print publications. Collectively, they are an expressive informational phenomenon where art and data science intersect to produce often strikingly original and informative results. In two previous Subway Fold posts, concerning new visual perspectives and covering user data about LinkedIn, I highlighted two examples that struck me as particularly effective in transforming complex data sets into clear and convincing visual displays.

Recently, I have come across the following resources about infographics that I believe are worth exploring:

  • A new book entitled Infographic Designers’ Sketchbooks by Steven Heller and Rick Landers is being published today, October 14, 2014, by Princeton Architectural Press. An advance review, including quotes from the authors, was posted on Wired.com on October 7, 2014, in an article by Liz Stinson entitled A Behind-the-Scenes Look at How Infographics Are Made. To quickly recap this article, the book compiles a multitude of resources, sketches, how-tos, best-practice guidelines, and insights from more than 200 infographic designers. Based upon the writer’s description, these pages offer much value and motivation for learning to put the aesthetic and explanatory powers of infographics to good use.
  • DailyInfographic.com, true to its name, is updated daily and provides thousands of exceptional examples of infographics. They are valuable both for the information they present and, moreover, for the inspiration they provide to try designing and preparing your own for your online and print efforts. This page on Wikipedia provides an excellent exploration of the evolution and effectiveness of infographics.
  • Edward Tufte is considered one of the foremost experts in the visual presentation of data and information. I highly recommend checking out his link-rich biography and bibliography page on Wikipedia, as well as more of his work and other offerings on his own site, edwardtufte.com.
  • October 15, 2014 UPDATE: Yesterday, soon after I added this post, I read about the publication of another compilation of the year’s best work in this field in the US, entitled The Best American Infographics 2014 by Gareth Cook (Houghton Mifflin Harcourt). This appeared in an article about the new book on ScientificAmerican.com entitled SA Recognized for Great Infographics by Jen Christiansen. The collection includes two outstanding infographics that recently appeared in Scientific American, about the locations of wild bees and the increasing levels of caffeine in various drinks, both of which are reproduced on that page. (One location where I would not like to, well, bee, is where these two topics intersect to produce over-caffeinated wild bees. Run!)

Please post any comments here to share examples of infographics that have impressed you or impacted your understanding of particular concepts and information.

January 30, 2015 Update:

Concisely getting to the heart of succeeding with this web-ubiquitous form of visual display of information is a very practical new column by Sarah Quinn entitled What Makes a Great Infographic?, posted on January 28, 2015, on SocialMediaToday.com. I highly recommend clicking through and reading it in full for all of its valuable details. I believe it is a timely addition to anyone’s infographic toolkit.

I will briefly sum up, annotate and add some comments to Ms. Quinn’s five elements for taking an infographic to potential greatness. (The mnemonic I have come up with to help commit these points to memory by using their first letters is: Try make your effort a GooD ACT.)

1.  A Targeted Audience:  Research your audience well so that your infographic becomes a must-share for them. As part of this, focus on what problem they may have that you can solve for them, and use the infographic to provide solutions to it. Further, establish a persona to define the ideal audience you intend to reach, and then address them. (Personas are often the cornerstones of marketing and content strategy campaigns.)

2.  A Compelling Theme:  Your infographic depicts “your story” and must strongly relate to your brand’s identity. The representative sample used in this article, entitled “Food Safety at the Grill”, does an effective job of guiding and educating the reader while simultaneously representing the infographic author’s brand.

3.  Actionable Data:  This should be thoroughly researched, with the numbers threaded throughout the graphical display. In effect, the data should support the solution and/or brand you are presenting.

4.  Awesome Graphics:  Quite simply, the infographic must be aesthetically pleasing while presenting the message. Indeed, the quality of the graphics will form an effective narrative. If you are outsourcing this work, Ms. Quinn provides seven helpful guidelines for instructing the graphics contractor.

5.  Powerful Copy:  This is just as important as the display and should include “powerful headlines” in presenting your message. As with the targeted audience in 1. above, so too should the text be compelling enough to motivate readers to share the infographic with others.

Minting New Big Data Types and Analytics for Investors

Along with the exponential growth of big data in terms of its quantity, myriad collection points, nearly limitless storage capabilities, and complex analytics, investors are keenly interested in discovering unique advantages from this phenomenon to apply in the securities markets.* While financial institutions of all types have used sophisticated metrics and predictions to gain tactical advantages in their trading operations for many decades, burgeoning big data methodologies have recently created new opportunities for entrepreneurs to provide the financial services industry with ever more original and arcane forms of predictive analytics.

Investors now have data services available to them offering insights never previously feasible or even imaginable. Yesterday’s (November 21, 2014) edition of The Wall Street Journal carried a fascinating report by Bradley Hope, entitled Startups Tip Investors to Hidden Data Pearls, highlighting three of these operations. (A subscription to WSJ Online is required for full access to this report on WSJ.com, but the piece was available here in a slightly different version on CBS’s Marketwatch.com.) This additional extract page from the article is also available online and contains explanatory graphics of their formats and analyses.

How are these new data points being mined, examined and spun into forecasts? To briefly sum up the work of these startups covered in this article:

  • Orbital Insight analyzes satellite photos of building sites in 30 cities in China, as well as of cornfields and parking lots, in order to assess how their capacities might influence the markets in various ways. They are seeking to intuit “early indicators” of trends and influences. Their clients include hedge funds.
  • Dataminr sifts through half a billion tweets daily in order to spot potentially market-moving trends ahead of the news services.** The link above to the graphics from the WSJ article contains a very effective infographic on this process.*** The company’s proprietary systems categorize and analyze all tweets in real time, discern potentially useful patterns, and then distribute the results to its clients.
  • Premise Inc. uses a global system, now in 18 countries, that provides cell phone credits as payments to individuals who monitor the prices of various goods. From this input, the company tracks early inflation rates and other economic data. They believe that their data can differ from official government sources.

I recommend reading this story in full for all of its compelling details.

My follow up questions include:

  • Who watches these watchmen? Will market forces determine which of them are producing valid and actionable data collection and analytics, or should they somehow be subject to regulatory oversight?
  • Because these data types and analytics are so new, how are these companies and others like them addressing the distinctions between correlation and causation in their reports to their clients? Would it be beneficial for them to form a trade association to address this and other issues that might arise in the future for this nascent industry?
  • Are there entrepreneurial opportunities here for a new type of startup to vet the practices and products of such companies? That is, analysts who produce no new data types themselves, but rather apply existing analytical tools, and perhaps develop new ones, for such assessments?
  • Besides finance, what other fields, markets and professions might benefit from this trend of discovering and assessing new data types?

_____________________________

*    Please see this April 9, 2014 Subway Fold post entitled Roundup of Some Recent Books on Big Data, Analytics and Intelligent Systems.

**   Please see this July 31, 2014 Subway Fold post entitled New Analytical Twitter Traffic Report on US TV Shows During the 2013 – 2014 Season.

***  Please see this January 30, 2015 Subway Fold post entitled Timely Resources for Studying and Producing Infographics.