IBM’s Watson is Now Data Mining TED Talks to Extract New Forms of Knowledge

"sydneytocairns_385", Image by Daniel Dimarco

Who really benefited from the California Gold Rush of 1849? Was it the miners, only some of whom were successful, or the merchants who sold them their equipment? Historians have differed as to the relative degree, but they largely believe it was the merchants.

Today, it seems we have somewhat of a modern analog to this in our very digital world: The gold rush of 2015 is populated by data miners and IBM is providing them with access to its innovative Watson technology in order for these contemporary prospectors to discover new forms of knowledge.

So then, what happens when Watson is deployed to sift through the thousands of incredibly original and inspiring videos of online TED Talks? Can the results be such that TED can really talk and, when processed by Watson, yield genuine knowledge with meaning and context?

Last week, the extraordinary results of this were on display at the four-day World of Watson exposition here in New York. A fascinating report on it entitled How IBM Watson Can Mine Knowledge from TED Talks by Jeffrey Coveyduc, Director, IBM Watson, and Emily McManus, Editor, TED.com was posted on the TED Blog on May 5, 2015. This was the same day that the newfangled Watson + TED system was introduced at the event. The story also includes a captivating video of a prior 2014 TED Talk by Dario Gil of IBM entitled Cognitive Systems and the Future of Expertise that came to play a critical role in launching this undertaking.

Let’s have a look and see what we can learn from the initial results. I will sum up and annotate this report, and then ask a few additional questions.

One of the key objectives of this new system is to enable users to query it in natural language. An example given in the article is “Will new innovations give me a longer life?”. Thus, users can ask questions about ideas expressed among the full database of TED talks and, for the results, view video excerpts where such ideas have been explored. Watson’s results are further accompanied by a “timeline” of related concepts contained in a particular video clip permitting users to “tunnel sideways” if they wish and explore other topics that are “contextually related”.

The rest of the article is a dialog between the project’s leaders, Jeffrey Coveyduc from IBM and TED.com editor Emily McManus, that took place at World of Watson. They discussed how this new idea was transformed into a “prototype” of a fresh new means to extract “insights” from within “unstructured video”.

Ms. McManus began by recounting how she had attended Mr. Gil’s TED Talk about cognitive computing. Her admiration of his presentation led her to wonder whether Watson could be applied to TED Talks’ full content so that users would be able to pose their own questions to it in natural language. She asked Mr. Gil if this might be possible.

Mr. Coveyduc said that Mr. Gil then approached him to discuss the proposed project. They agreed that the appeal was not just the content per se but, rather, TED’s compelling mission of spreading ideas. Because one of Watson’s key objectives is to “extract knowledge” that’s meaningful to the user, it thus appeared to be “a great match”.

Ms. McManus mentioned that TED Talks maintains an application programming interface (API) to assist developers in accessing their nearly 2,000 videos and transcripts. She agreed to provide access to TED’s voluminous content to IBM. The company assembled its multidisciplinary project team in about eight weeks.

They began with no preconceptions as to where their efforts would lead. Mr. Coveyduc said they “needed the freedom to be creative”. They drew from a wide range of Watson’s existing technical services. In early iterations of their work they found that “ideas began to group themselves”. In turn, this led them to “new insights” within TED’s vast content base.

Ms. McManus recently received a call from Mr. Gil asking her to stop by his office in New York. He demoed the new system, which had completely indexed the TED content. Moreover, he showed how it could display, according to her, “a universe of concepts extracted” from the content’s core. Next, using the all-important natural language capabilities to pose questions, they demonstrated how the results arrived in the form of numerous short clips which, taken all together, compiled “a nuanced and complex answer to a big question”, as she described it.

Mr. Coveyduc believes this new system simplifies how users can inspect and inquire about “diverse expertise and viewpoints” expressed in video. He cited other potential areas of exploration such as broadcast journalism and online courses (also known as MOOCs*). Furthermore, the larger concept underlying this project is that Watson can distill the major “ideas and concepts” of each TED Talk and thus give users the knowledge they are seeking.

Going beyond Watson + TED’s accomplishments, he believes that video search remains quite challenging but this project demonstrates it can indeed be done. As a result, he thinks that mining such deep and wide knowledge within massive video libraries may turn into “a shared source of creativity and innovation”.

My questions are as follows:

  • What if Watson was similarly applied to the vast troves of video classes used by professionals to maintain their ongoing license certifications in, among others, law, medicine and accounting? Would new forms of potentially applicable and actionable knowledge emerge that would benefit these professionals as well as the consumers of their services? Rather than restricting Watson to processing the video classes of each profession separately, what might be the results of instead processing them together in various combinations and permutations?
  • What if Watson was configured to process the video repositories of today’s popular MOOC providers such as Coursera or edX? The same question applies to universities around the world that are putting their classes online. Their missions are more or less the same in enabling remote learning across the web in a multitude of subjects. The results could possibly hold new revelations about subjects that no one can presently discern.

Two other recent Subway Fold posts that can provide additional information, resources and questions that I suggest checking out include Artificial Intelligence Apps for Business are Approaching a Tipping Point posted on March 31, 2015, and Three New Perspectives on Whether Artificial Intelligence Threatens or Benefits the World posted on December 27, 2014.


*  See the September 18, 2014 Subway Fold post entitled A Real Class Act: Massive Open Online Courses (MOOCs) are Changing the Learning Process for the full details and some supporting links.

Mapping the Distribution of Mobile Device Operating Systems in New York

“Busy Times Square”, Image by Jim Larrison

Scott Galloway, a Clinical Professor of Marketing at NYU Stern School of Business, consultant and entrepreneur, recently gave a remarkable and captivating 15-minute presentation at this year’s Digital Life Design 15 (DLD15) Conference. This event was held in Munich on January 18 through 20, 2015. He examined the four most dominant global companies in the digital world and predicted those among them whose market values might rise or fall. These included Amazon, Google, Apple and Facebook. Combined, their current market value is more than $1 trillion (yes, that’s trillion with a “t”).

The content and delivery of Professor Galloway’s talk are something that I think you will not soon forget. Whether his insights are in whole or in part correct, his talk will motivate you to think about these four companies which, individually and as a group, exert such monumental economic, technical, commercial, and cultural influence across the entirety of the web. I highly recommend that you click through and fully view this video.

Towards the end of his presentation, Professor Galloway clicked onto a rather astonishing slide of a heat map of New York City encoded with data points indicating mobile devices using Apple’s iOS, Android or BlackBerry operating systems. This particular part of the presentation was covered in a most interesting article entitled Fun Maps: Heat Map of Mobile Operating Systems in NYC by Michelle Young on UntappedCities.com on March 31, 2015. The article adds three very informative additional graphics that individually illuminate the spread of each OS. I will briefly recap this report, provide some links and annotations, and add a few comments of my own.

Professor Galloway interprets the results as indicating a correlation between each OS and the relative wealth of different neighborhoods in NYC: iOS devices are more prevalent in areas of higher incomes while Android appears more concentrated in lower income areas and suburbia.

However, Ms. Young believes this mapping is “misleading” and cites another article on UntappedCities.com entitled Beautiful Maps and the Lies They Tell, posted on February 20, 2014. This carefully refuted a series of data-mapped visualizations that were first published and interpreted as showing that only wealthier people used fitness apps.

Furthermore, there have been a series of Twitter posts in response to this heat map stating that the colors used (red for iOS, green for Android and purple for BlackBerry) might be misleading due to some optical blurring in the colors and to the use of geotagged tweets from 2011 to 2013. (X-ref to the March 20, 2015 Subway Fold post entitled Studies Link Social Media Data with Personality and Health Indicators, for other examples of geotagging.) In effect, there may be a structural bias whereby “If Twitter users tend to be on Apple products”.

The data and heat maps notwithstanding, as a New York City native and life-long resident, my own completely unscientific observations tell me that iOS and Android are more evenly split both in terms of absolute numbers and any correlation to the relative wealth of any given neighborhood. The most obvious thing that jumped out at me was that each day millions of people commute all around the city, mostly into and around Manhattan. However, this does not seem to have been taken into account. Thus, while User X’s mobile device may show him or her in a wealthier area of Manhattan, he or she might well live in, and commute from, a more working class neighborhood a considerable distance away.

Rather than using such static heat maps, I would propose that a time series of readings be taken continuously over a week or so. Next, I suggest applying some customized algorithms and analytics to smooth out, normalize and interpret the data. My instincts tell me that the results would indicate a much more homogeneous mix of mobile OSes across all or most of the neighborhoods here.
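
To make the time-series idea a bit more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that readings arrive as (neighborhood, time slot, OS, device count) tuples. The function name average_os_share and the sample figures are hypothetical and are not taken from the heat map's data. Each time slot is first converted to shares and only then averaged, so a daytime surge of commuters counts no more than a quiet overnight slot.

```python
from collections import defaultdict


def average_os_share(readings):
    """Average each OS's share per neighborhood across time slots.

    readings: iterable of (neighborhood, time_slot, os_name, device_count).
    This input layout is an assumption made for this sketch.
    """
    # counts[neighborhood][time_slot][os_name] -> devices observed
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for hood, slot, os_name, devices in readings:
        counts[hood][slot][os_name] += devices

    result = {}
    for hood, slots in counts.items():
        share_sums = defaultdict(float)
        used_slots = 0
        for os_counts in slots.values():
            slot_total = sum(os_counts.values())
            if slot_total == 0:
                continue  # ignore slots with no devices observed
            used_slots += 1
            # Normalize within the slot so busy hours do not dominate.
            for os_name, devices in os_counts.items():
                share_sums[os_name] += devices / slot_total
        if used_slots:
            result[hood] = {
                os_name: total / used_slots
                for os_name, total in share_sums.items()
            }
        else:
            result[hood] = {}
    return result


# Made-up sample: a commuter-heavy noon reading and a quiet 3 a.m. reading.
sample_readings = [
    ("Midtown", "weekday-noon", "iOS", 900),
    ("Midtown", "weekday-noon", "Android", 100),
    ("Midtown", "weekday-3am", "iOS", 50),
    ("Midtown", "weekday-3am", "Android", 50),
]
shares = average_os_share(sample_readings)
print(shares["Midtown"])
```

In this invented sample, simply pooling all of the devices would be dominated by the noon reading, while the per-slot average gives the overnight mix equal weight. A real analysis would need far more slots and some way to distinguish residents from visitors.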

Spectacular Views of New York, San Francisco and Las Vegas at Night from 7,500 Feet Up

Image by kconnors

[This post was originally uploaded on January 14, 2015. It has been updated below with new information on March 19, 2015.]

Even as a lifelong New Yorker, I believe that each day always brings many new things to see and to learn about this great place. Indeed, no one can ever quite know it all or live everything it has to offer. Such vastness and diversity are two of its many enduring charms.

I just experienced that sense of wonder on an even greater scale upon viewing nine extraordinary images that have been posted today (January 14, 2015) in a story on Mashable.com entitled What a Night in New York City Looks Like from 7,500 Feet by Max Knoblauch. This display and accompanying text are about the photos taken by Vincent Laforet, a Pulitzer prize-winning photographer, from a helicopter at 7,500 feet above Manhattan on the night of November 8, 2014. These were taken as an assignment for Men’s Health magazine. He is quoted here about how he accomplished this and the challenges it posed. Also linked to within the article are the full set of Laforet’s dazzling photos from this project on a site on Storehouse.com entitled Gotham 7.5K as well as a 3.5-minute video of how he does this high altitude urban photography. I, well, highly recommend clicking through and viewing both of these.

Also, I would just like to add a few bits of navigation to the photos as they appear on Mashable for those of you who are not familiar with New York:

Photo 1:  Broadway and Times Square looking east to west, in an ocean of LED signage everywhere. (For further information about the technology of this illumination see the August 11, 2014 Subway Fold post entitled Times Square’s Operating System.)

Photo 2: All of Manhattan looking north to south starting at Battery Park at the bottom center of the image. To the right are Brooklyn and Queens. To the left is New Jersey.

Photo 3: Midtown Manhattan from the Hudson River on the very left to the East River on the very right. Broadway, again, is the very brightly lit street appearing diagonally from the upper middle left to the lower middle right. The brightly lit circular building to the middle left is Madison Square Garden.

Photo 4:  The new World Trade Center and to the right is the Wall Street area.

Photo 5:  The Brooklyn Bridge and the Manhattan Bridge spanning, not surprisingly, Brooklyn and Manhattan.

Photo 6:  Another view of Manhattan very similar to Photo 2, this time more of a southwest to northeast perspective. Notice also the Brooklyn and Manhattan Bridges from Photo 5 above, seen here in the middle right of the picture.

Photo 7:  Moving from top to bottom, this shows the point farther south in Manhattan where Broadway and Sixth Avenue intersect. The Empire State Building is to the middle right.

Photo 8:  Midtown.

Photo 9: A Reverse POV from Photos 2 and 6, this time going river to river from north to south. Central Park is the rectangular area in the lower middle right, the World Trade Center is in the upper middle area, and the bridges are off to the left. Brooklyn is to the left and New Jersey is to the right.

For another astonishing panoramic of New York from way up, please also see this cover of the March 17, 2014 issue of Time that was taken from the very top of the antenna on the World Trade Center and the accompanying story of how it was done.

March 19, 2015 Update:

Today’s (March 19, 2015) edition of The New York Times carried a very informative report with more detail about Vincent Laforet’s aerial photography, this time of San Francisco, entitled Capturing The Night in Digital Photos, Spectacularly by Farhad Manjoo (the regular writer of the NYTimes’ always excellent, imho, State of the Art column). It was accompanied by four of his remarkable photos of the City by the Bay from waaaay up high at night. I highly recommend clicking through for the full-text of this story and its eye-popping graphics. I will briefly summarize some of the extra information in this piece not covered in the Mashable.com story above.

Mr. Laforet has been able to capture New York, San Francisco and Las Vegas in his truly original nighttime photography because of the dramatic advances in the digital cameras and the software he uses such as Adobe Lightroom. To demonstrate the possibilities, he took Mr. Manjoo along for a photographic session from a helicopter over San Francisco. One of the images he took, the third of four in the article, makes this city appear as “an orange-and-blue microchip”.

When Mr. Laforet’s photos of New York were first published in Men’s Health, he was let down by the relative lack of response they received. However, when he uploaded the images to Storehouse.com (linked to above), they proceeded to go viral across the Web. This new link on Storehouse.com contains his photo galleries of New York, San Francisco and Las Vegas. I believe they will leave you in absolute wonder at their beauty.

Mr. Laforet has developed a series of technological and physical techniques in order to steady himself and his imagery under very challenging conditions. He also takes a large number of photos during each of his sky-bound photography adventures in order to capture numerous perspectives while employing a variety of cameras and lenses.

Massive amounts of kudos to Mr. Laforet as an artist doing truly original and imaginative work.

“I Quant NY” Blog Analyzes Public Data Sets Released by New York City

Image by Justin Brown

[This post was originally uploaded on October 24, 2014. It has been updated below with new information on February 3, 2015.]

Using large data sets that local government agencies in New York City have made available by virtue of the NYC Open Data program, a visiting college professor at Pratt Institute, statistician and blogger named Ben Wellington, has been taking a close quantitative look at some common aspects of everyday life here in the city. He was a guest on The Brian Lehrer Show on WNYC radio in New York on October 16, 2014 to discuss four of his recent posts on his I Quant NY blog presenting the results of several of his investigations and analyses. The nearly 13-minute podcast entitled We Quant NY: Stories From Data is absolutely fascinating as Wellington describes his subjects, results and supporting methodologies.

(X-ref to the April 9, 2014 Subway Fold post, in particular to the fourth book mentioned entitled Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia by Anthony M. Townsend about other endeavors like this. As well, an article entitled They’re Tracking When You Turn Off the Lights by Elizabeth Dwoskin was published in The Wall Street Journal on October 20, 2014 [subscription required] about current efforts by researchers in New York and elsewhere to place “municipal sensor networks” around the city to gather and study many other data sets about how the city operates and its residents. Townsend is also quoted in this story.)

The posts and analytics that Mr. Wellington discussed on the radio and online included:

  • Why it is nearly impossible to purchase or refill a MetroCard to pay your transit fares in such an amount that it will have $0.00 left on it. There always seems to be some small amount left no matter what payment option you choose at the vending machines. This irks many of my fellow New Yorkers.
  • Fire hydrants that generate the most tickets for parking violations.
  • The gender difference among the customer base for the Citi Bike sharing program. That is, Citi Bike riders in midtown Manhattan tend to be more male while riders in Brooklyn tend to be more female. Why is this so?
  • Which building in Manhattan is the farthest from the subway. (In his October 23, 2014 blog post, Mr. Wellington has studied and found the residence in Brooklyn which is the farthest from the subway.)
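
The arithmetic behind the first item above, the MetroCard leftover balance, can be sketched in a few lines of Python. This is only an illustration under loudly stated assumptions: the fare, the bonus percentage, the rounding rule and the 5-cent purchase increments used here are hypothetical inputs chosen for the example. They are not figures taken from Mr. Wellington's post or from the transit authority, and the function name zero_balance_purchases is likewise invented.

```python
def zero_balance_purchases(fare_cents, bonus_pct, min_cents, max_cents,
                           step_cents=5):
    """Return purchase amounts (in cents) that leave exactly $0.00.

    All amounts are whole cents to avoid floating point error.
    The bonus is rounded half up to the nearest cent, which is an
    assumed rule for this sketch.
    """
    winners = []
    for paid in range(min_cents, max_cents + 1, step_cents):
        bonus = (paid * bonus_pct + 50) // 100  # assumed rounding rule
        card_value = paid + bonus
        # A zero balance is possible only if the value is a whole
        # number of rides.
        if card_value % fare_cents == 0:
            winners.append(paid)
    return winners


# Hypothetical inputs: a $2.50 fare, a 5% bonus, purchases of $5 to $20.
amounts = zero_balance_purchases(fare_cents=250, bonus_pct=5,
                                 min_cents=500, max_cents=2000)
for paid in amounts:
    print(f"${paid / 100:.2f}")
```

Under these assumed inputs only a couple of odd-looking amounts come out even, while round figures such as $10.00 leave a remainder. That is at least consistent with the frustration described above.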

I believe that Mr. Wellington’s efforts are to be admired and appreciated because he is helping us to learn more about how NYC really operates on a very granular level. This can potentially lead to improvements in municipal services and other areas he has explored on his blog such as affordable housing, restaurant chain cleanliness (based upon the data generated by NYC’s inspection and letter grade rating system), and the water quality and safety of the local swimming areas. I hope that he continues his efforts and inspires others to follow in this citizen’s approach to using publicly available big data for everyone’s benefit.

February 3, 2015 Update:

How interesting could the subject of laundromats in New York possibly be? As it turns out, these washing/drying/folding establishments generate some very interesting data and analytics about the neighborhoods where they operate. Who knew? Let’s, well, press on and see.

A few weeks ago, after Brian Lehrer had guests on his show to discuss President Obama’s State of the Union Address and then New York Governor Andrew Cuomo’s State of the State Address, he had a segment of his show where he asked callers about the state of their own streets. This was a truly hyper-local topic about a city with a great diversity of neighborhoods across its five boroughs. One of the callers to the show, from the Upper West Side of Manhattan, said that as a result of ongoing real estate development on her street, all of her local laundromats had gone out of business.

As it turned out, Ben Wellington of the I Quant New York blog (above), heard this and went to work on an analysis to see what the city-wide data might indicate about this. He then returned as a guest on The Brian Lehrer Show on January 28, 2015, to discuss his findings. The podcast available on wnyc.org is entitled Following Up: Are Laundromats Disappearing? Mr. Wellington’s post on his I Quant NY blog, also posted on January 28th, is entitled Does Gentrification Cause a Reduction in Laundromats? I highly recommend clicking through and checking out both of them as remarkable examples of how a deeper look at some rather mundane urban data can produce such surprising results and insights about New York.

On the podcast, they were also joined by author and photographer Snorri Sturluson who wrote a book entitled Laundromat (PowerHouse Books, 2013), and later on by Brian Wallace who is the president of the Coin Laundry Association, a trade group. Mr. Sturluson’s book is a photo album sampling many of the hundreds of laundromats across the entire city. (All ten of its reviews on Amazon.com are for the full five stars.)

The ensuing discussion began with the fundamental question of whether the increased affluence and real estate development in a neighborhood directly leads to a decline in the number of local laundromats. As it turns out, a more nuanced and complicated relationship emerged from the geocoded data. The results in Mr. Wellington’s mapping (as shown on both the podcast page and his blog post) indicate that population density is more likely to be the main determinant of the concentration of laundromats. Affluence in each neighborhood is also a factor, but it should be evaluated in conjunction with population density. The mapping also shows that certain neighborhoods in Queens, such as Astoria and Jackson Heights, have the highest concentrations of laundromats.
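
One simple way to compare how strongly two candidate factors track laundromat counts is a correlation coefficient. The following is a minimal sketch only: it is not Mr. Wellington's method, the helper name pearson is my own, and the four rows of neighborhood figures are invented placeholders rather than real NYC data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Invented placeholder rows, NOT real data:
# (people per square mile, median income in dollars, laundromat count)
neighborhoods = [
    (20000, 90000, 4),
    (45000, 55000, 11),
    (60000, 70000, 15),
    (30000, 40000, 8),
]
density = [row[0] for row in neighborhoods]
income = [row[1] for row in neighborhoods]
laundromats = [row[2] for row in neighborhoods]

print("density vs laundromats:", round(pearson(density, laundromats), 2))
print("income vs laundromats:", round(pearson(income, laundromats), 2))
```

A correlation alone would not settle which factor is the cause. As noted above, affluence and density need to be evaluated together, which would call for something like a regression with both variables.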

Callers to the show raised other possible considerations such as whether there are higher numbers of recent college grads in an area, the emergence of online services that offer full laundry services including pickup and delivery, and even the social acceptability nowadays of going to a laundromat. Here are my follow-up questions:

  • Is population density in this analysis more particular to New York than other cities or, if similarly mapped elsewhere, would the distribution of its impact and statistical weighting appear to be similar in other comparably large cities?
  • What other types of businesses, government agencies, scientists and universities might be interested in these results and in testing such data in other locations?
  • Are there additional patterns of businesses that cluster around laundromats such as supermarkets or restaurants and, if so, how and to whom might these data sets and analytics be useful?
  • Will the eternal mystery of where socks lost in the laundry go to ever be solved?

Digital.NYC Site Launches as a Comprehensive Resource for New York City Tech and Startups

Being a very proud native of New York City, I was thrilled to see an article on TechCrunch.com posted on October 1, 2014, whose title just about said it all with Digital.nyc Launches To Be The Hub For New York Tech by Jonathan Shieber. This announced the launch of a brand new site called Digital.NYC, a hub destination concerning nearly anything and everything about the thriving tech and startup markets here in The Big Apple. For anyone interested in startups, workspaces, incubators, jobs, investing, training, news and access to a gazillion other relevant resources, this is meant to be an essential must-click resource. The site is the product of a cooperative venture by the City of New York, IBM and the venture capital firm Gust. For additional reporting, see also IBM Starts Online Hub for NYC Tech Firms posted the same day on usatoday.com, by Mike Snider.

I highly recommend a click-through and thorough perusal of this site for the remarkable depth and richness of its offerings, timeliness, and sense of excitement and vitality that threads throughout all of its pages. While the term “platform” is often overused to describe a program or site, I believe that Digital.NYC truly lives up to this term of art.

What also really slew me about this site was its elegant design and ease of navigation that belie its vastness. The site clearly evinces its designers’ and builders’ passion for the subject matter and incredible hard work they put into getting it all just right. What a daunting task they must have faced in trying to meld all of these content categories together in a layout that is so highly functional, navigable and engaging. Bravo! to everyone involved in making this happen.

Indeed, for me it passes the Man from Mars Test: If you just landed on Earth and started out knowing little or nothing about the tech market in NYC, some time spent with this site would handily start you on your way to assessing its massive dimensions, operations and opportunities. Alternatively, very savvy and veteran entrepreneurs, investors, programmers, web designers, students, venture capitalists, urban planners and others will likewise find much to learn and use here.

I Googled around a bit to see whether other cities had similar hub sites. My initial research shows that there is nothing else per se like Digital.NYC currently online. Please post a comment here or send me an email if you do know of any others out there and I will post them. However, in my online travels I did find a site called Entrepreneurial Insights that has compiled, on a page entitled Startup Hubs, a series of recently posted in-depth reports on global startup hubs. These cities include Paris, Toronto, Boston, Mumbai, Rio de Janeiro, Bangkok, Istanbul, Singapore, Beijing, Tel Aviv, Barcelona, Berlin and New York.

New Study About Taxi Ride Sharing and Its Implications for the Emergence of the “Sharing Economy”

Adding one of the more compelling scientific studies to the ongoing and rapidly developing saga of urban car ride-sharing services, the September 2, 2014 edition of The New York Times published a summary and analysis of a study of what would happen, as the title states, If 2 New Yorkers Shared a Cab … , by Kenneth Chang and Joshua A. Kirsch. In the findings’ simplest terms, there would be a 40% reduction in the cab fleet and corresponding improvements in traffic flows, energy consumption and the environment.

The authors of this fascinating study include Steven Strogatz*, a mathematics professor at Cornell, whose team included Carlo Ratti of MIT. This article contains links to their recently published paper, an accompanying graphic of the data points overlaid upon a street map of NYC, and a link to a site they have established enabling anyone to peruse a massive database of taxi ride info.

This article also expertly explores:

  • The scientific methods used to obtain these results, balanced against the reality of the fact that New Yorkers are very reluctant to voluntarily share cab rides
  • How the recent introductions here of Uber and Lyft are impacting the economics and dynamics of the city’s taxi industry
  • Whether and how the possible introduction of self-driving cars might affect the study’s findings
  • The concerns of a scientist who is skeptical of the study’s conclusions

The following day, on September 3rd, Strogatz and Ratti were interviewed about their report on the Brian Lehrer Show** on WNYC in New York. They covered more of the details concerning their methods, conclusions and predictions. But what really enlivened this show were the live calls from the listeners with remarkable stories of their cab rides in NYC as passengers and from an actual driver as they related to the prospect and realities of ride sharing. I highly recommend this 23-minute podcast entitled Should We Start Sharing Taxis? for these reports from the front lines of this story.

For additional original perspectives, commentary and insights into the emergence of the new sharing economy that I found to be quite relevant to this story, I further recommend the following three articles that were published during the same week:

Will this sharing trend gain further traction in other sectors of the service economy? If so, what sectors and job types might be susceptible? If not, is this just a trend that will quickly run its course or perhaps morph into something more enduring?

___________________________

* Professor Strogatz has written a number of highly acclaimed books on science and math. Ten years ago I had the great pleasure of reading one of them entitled Sync: How Order Emerges From Chaos In the Universe, Nature, and Daily Life (Hyperion, 2004). This is a strikingly original work about how synchrony emerges from within a wide diversity of biological and environmental systems. I found his writing to be highly engaging and accessible about what otherwise would appear to be a highly complex topic for a general audience. He has done a masterful job here of explaining the concepts and examples with great clarity. I highly recommend it for any reader looking for something entirely new and different.

** X-ref to the August 1, 2014 post here entitled Discussion re: Faster Web Service, Media Mergers and Net Neutrality about another interesting segment of this show, including a link to its podcast.

Times Square’s Operating System

I am a native New Yorker. I have always loved my hometown and taken great pride in being from here. I have seen this place at its best and at its worst and everywhere in between during my life. No matter what, whenever I see the city’s skyline from further away and when I am in the city itself, my own [I] Heart [NY] beats a little bit faster.

It was with great interest that I read a terrific article posted on July 31, 2014 on Gizmodo.com entitled How Times Square Works by Adam Clarke Estes. He reports in great detail how all of the massive LED signage works. As any visitor to Times Square has seen, you are surrounded by a very sophisticated and extensive array of brilliantly colored and often animated displays for a multitude of products, places and entertainments. The author has done a masterful job of explaining how the underlying technologies operate and integrate, some of the tech and advertisers involved, the principles of their design and placement, and the massive coordination needed to keep everything in sync on a 24/7 basis. He also provides some very colorful history, facts and photos about the area and its modern symphony of LED displays. This piece is quite, well, enlightening for any tourists as well as NYC residents.

In any telling of the history of Times Square, what always emerges is the total transformation of the area since the early 1990’s. For many years prior to that, the area’s reputation was more for its crime, dirty streets and overall seediness. I had a first-hand view of this when, for several summers, I had a job in a music store (remember those?) right in the heart of this place. I had a great deal of fun working in the store but was always somewhat afraid venturing out on the streets whenever I arrived, had lunch or left.

Fortunately, through better planning and policies as well as NYC’s rapid economic growth, this urban blight was excised and replaced with something much better in every possible way. It now lives up to its global reputation as truly being the Crossroads of the World.