Mapping the Distribution of Mobile Device Operating Systems in New York

“Busy Times Square”, Image by Jim Larrison

Scott Galloway, a Clinical Professor of Marketing at NYU Stern School of Business, consultant and entrepreneur, recently gave a remarkable and captivating 15-minute presentation at this year’s Digital Life Design 15 (DLD15) Conference. This event was held in Munich from January 18 through 20, 2015. He examined the four most dominant global companies in the digital world and predicted which among them might see their market values rise or fall. These are Amazon, Google, Apple and Facebook. Combined, their current market value is more than $1 trillion (yes, that’s trillion with a “t”).

The content and delivery of Professor Galloway’s talk are something that I think you will not soon forget. Whether his insights are correct in whole or in part, his talk will motivate you to think about these four companies that, individually and as a group, exert such monumental economic, technical, commercial, and cultural influence across the entirety of the web. I highly recommend that you click through and view this video in full.

Towards the end of his presentation, Professor Galloway clicked onto a rather astonishing slide of a heat map of New York City encoded with data points indicating mobile devices using Apple’s iOS, Android or BlackBerry operating systems. This particular part of the presentation was covered in a most interesting article entitled Fun Maps: Heat Map of Mobile Operating Systems in NYC by Michelle Young on UntappedCities.com on March 31, 2015. The article adds three very informative graphics that individually illuminate the spread of each OS. I will briefly recap this report, provide some links and annotations, and add a few comments of my own.

Professor Galloway interprets the results as indicating a correlation between each OS and the relative wealth of different neighborhoods in NYC: iOS devices are more prevalent in higher-income areas, while Android appears more concentrated in lower-income areas and suburbia.

However, Ms. Young believes this mapping is “misleading” and cites another article on UntappedCities.com entitled Beautiful Maps and the Lies They Tell, posted on February 20, 2014. This carefully refuted a series of data-mapped visualizations that were first published and interpreted as showing that only wealthier people used fitness apps.

Furthermore, there have been a series of Twitter posts in response to this heat map stating that it might be misleading for two reasons: the colors used (red for iOS, green for Android and purple for BlackBerry) show some optical blurring, and the underlying data consists of geotagged tweets from 2011 to 2013. (X-ref to the March 20, 2015 Subway Fold post entitled Studies Link Social Media Data with Personality and Health Indicators, for other examples of geotagging.) In effect, there may be a structural bias in the data “If Twitter users tend to be on Apple products”.

The data and heat maps notwithstanding, as a New York City native and life-long resident, my own completely unscientific observations tell me that iOS and Android are more evenly split, both in terms of absolute numbers and in any correlation to the relative wealth of any given neighborhood. The most obvious thing that jumped out at me was that each day millions of people commute all around the city, mostly into and around Manhattan. However, this does not seem to have been taken into account. Thus, while User X’s mobile device may show him or her in a wealthier area of Manhattan, he or she might well live in, and commute from, a more working-class neighborhood a considerable distance away.

Rather than using such static heat maps, I would propose that a time series of readings be taken continuously over a week or so. Next, I suggest applying some customized algorithms and analytics to smooth out, normalize and interpret the data. My instincts tell me that the results would indicate a much more homogeneous mix of mobile OSes across all or most of the neighborhoods here.
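To make the proposal above a bit more concrete, here is a minimal Python sketch of what "normalize" and "smooth out" could mean in practice. It is only an illustration under my own assumptions: the readings, the neighborhood names, and the helper functions `os_share_by_hour` and `moving_average` are all invented for this example and do not come from the heat map's data or from anyone's actual method.

```python
from collections import Counter

# Hypothetical readings, invented for illustration only.
# Each tuple is (hour_index, neighborhood, operating_system).
READINGS = (
    [(0, "Midtown", "iOS")] * 1 + [(0, "Midtown", "Android")] * 1
    + [(1, "Midtown", "iOS")] * 3 + [(1, "Midtown", "Android")] * 1
    + [(2, "Midtown", "iOS")] * 1 + [(2, "Midtown", "Android")] * 3
    + [(3, "Midtown", "iOS")] * 2 + [(3, "Midtown", "Android")] * 2
    + [(0, "Astoria", "Android")] * 4
)


def os_share_by_hour(readings, neighborhood, os_name):
    """Normalize raw counts into the hourly share of devices running os_name."""
    totals = Counter()
    matches = Counter()
    for hour, hood, device_os in readings:
        if hood != neighborhood:
            continue
        totals[hour] += 1
        if device_os == os_name:
            matches[hour] += 1
    # One share per hour that has at least one reading, in time order.
    return [matches[hour] / totals[hour] for hour in sorted(totals)]


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average of up to `window` points."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


shares = os_share_by_hour(READINGS, "Midtown", "iOS")
smoothed_shares = moving_average(shares, window=2)
print(shares)
print(smoothed_shares)
```

Working with shares rather than raw counts keeps a crowded rush hour from dominating the picture, and the moving average damps hour-to-hour swings of the kind that commuting would cause. A real analysis would need far more data and a more careful choice of time window than this toy example.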

Studies Link Social Media Data with Personality and Health Indicators

[This post was originally uploaded on January 27, 2015. It has been updated below with new information on March 20, 2015 and February 26, 2018.]

Reports on two new studies were issued recently. The first describes a meaningful connection between Facebook Likes and personality types, and the second describes the parsing of language in Tweets to forecast the likelihood of heart disease. This presents us with an opportunity to examine two highly similar human health indicators that were identified by sophisticated analytics applied to massive troves of data generated by two of the world’s leading social media platforms. Where is all of this leading and what issues arise as a result? I will first summarize some parts of these two reports, add some links and annotations, and then pose some questions. I also highly recommend clicking through for a full read of both pieces.

The first report was posted on NewScientist.com on January 12, 2015 with the concise title of What You ‘Like’ on Facebook Gives Away Your Personality by Hal Hodson. According to this article, researchers working at Stanford University and Cambridge University have developed an algorithm that, based completely upon what people “Like” on Facebook, can determine a user’s personality. The data for this was gathered in a survey of 86,000 people who filled out personality questionnaires that were then matched against their activity on Facebook. Indeed, the results showed that this new method was more accurate than the determinations of the test subjects’ family and friends.

These characteristics are called the Big Five personality traits and include (as explored in detail in the preceding Wikipedia link):

  • Openness to experience
  • Conscientiousness
  • Extraversion
  • Agreeableness
  • Neuroticism

The article includes comments from David Funder of the University of California, Riverside, a researcher on personality, who says that while this study is “impressive”, it still does not provide a truly deep understanding of an individual’s personality. Funder’s work looks at 100 dimensions, a far larger number than that used by the researchers in the Facebook study, who focused upon the Big Five.

Nonetheless, two of the researchers on this new study, Youyou Wu of Cambridge and Michal Kosinski of Stanford, believe their work is applicable on a global scale and can be applied in several areas. For instance, they foresee that their new Like algorithm could be used in hiring operations to search large data files of candidates and identify those who might be most suitable for a particular job. Other possibilities include health and education. Kosinski also acknowledges that this approach would further require appropriate policy and technology considerations in order to address issues such as its potential invasiveness.

(In a similar application of Facebook Likes and other data from social media sites, universities in the US are now using such information and analytics to locate and pitch to alumni as potential donors, as reported in a most interesting article in the January 25, 2015 edition of The New York Times entitled Your College May Be Banking on Your Facebook Likes, by Natasha Singer. Among other things, this story reports on the work and methods of two startups in this area called EverTrue and Graduway.)

The second report linking social media data to a health indicator was Scientists Say Tweets Predict Heart Disease and Community Health by Derrick Harris, posted on Gigaom.com on January 22, 2015. In a study entitled Psychological Language on Twitter Predicts County-Level Heart Disease Mortality, researchers at the University of Pennsylvania, as part of their Well-Being Project, concluded that the vocabulary used by individuals in their Tweets can predict “the rate of heart disease deaths in the counties where they live”. Specifically, Tweets concerning more upbeat topics and expressed in more positive terms correlated with lower mortality rates, as reported by the Centers for Disease Control and Prevention (CDC). Conversely, mortality rates were higher in areas “with angry language about negative topics”.

The accompanying side-by-side graphics of the Twitter data and the CDC data, covering the upper right quarter of the US and its constituent 1,300 counties, dramatically illustrate these findings. The pool of data was drawn from 148 million Tweets with geotags.

These results also provide further support for the accuracy and predictive validity of data from Twitter, notwithstanding any “inherent geographical biases”, and exceeding that of more “traditional polls or surveys”. Indeed, language in Tweets turns out to have a comparatively higher predictive value than other economic or health-related data. The researchers further believe that their findings might be more helpful when applied to “community-scale policies or interventions” rather than to assisting specific people.
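For readers who would like to see what a county-level correlation of this kind looks like in its simplest form, here is a small Python sketch. To be clear, this is not the researchers' method, which was far more sophisticated; it only computes a plain Pearson correlation coefficient. The county scores and mortality figures are made up for illustration, and the names `pearson`, `NEGATIVE_LANGUAGE_SCORE` and `HEART_DISEASE_DEATHS` are my own hypothetical choices.

```python
import math

# Hypothetical per-county figures, invented for illustration only.
# Each position in the two lists refers to the same (imaginary) county.
NEGATIVE_LANGUAGE_SCORE = [0.10, 0.25, 0.30, 0.45, 0.60]  # share of angry tweets
HEART_DISEASE_DEATHS = [150, 170, 165, 200, 230]          # deaths per 100,000


def pearson(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson(NEGATIVE_LANGUAGE_SCORE, HEART_DISEASE_DEATHS)
print(f"correlation between angry language and mortality: {r:.2f}")
```

A value near +1 would mean that counties with angrier tweets tend to have higher mortality, a value near -1 the opposite, and a value near 0 no linear relationship. Even a strong value describes counties, not individuals, and says nothing by itself about causation.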

My follow-up questions include:

  • Would mapping a statistically significant number of Twitter networks in counties with higher and/or lower mortality rates, a process described in the February 5, 2015 Subway Fold post entitled Visualization, Interpretation and Inspiration from Mapping Twitter Networks, provide additional insights that would be helpful to medical professionals and local policy planners? For example, are many of the negative Twitter posters in each other’s networks such that they become self-reinforcing? Are there recognizable network effects occurring that can somehow be corrected with regards to the degree of negativity and, in turn, public health? Would this pose any legal, policy or privacy issues?
  • For both of these articles, do these types of findings require more rigorous and wider-scale mathematical and scientific analysis before applying them to such critically important mental and physical health matters? If so, should such testing be done by public or private institutions, universities and/or government agencies?
  • As first expressed in this November 22, 2014 Subway Fold post entitled Minting New Big Data Types and Analytics for Investors, how are the differences in correlation and causation being factored into these studies? Given the skepticism expressed above about Facebook Likes being so indicative about personality, are there other effects and influences that need to be identified and filtered out of these types of conclusions?
  • If the usage and analysis of social media data continues to grow in areas, well, like employment, education and health, what protections, if any, should people be given, by law and/or the social media companies, to protect themselves or opt out in advance of any potentially negative consequences?

March 20, 2015 Update:

Providing some very worthwhile additional insight and analysis of the University of Pennsylvania study covered in the initial post above, Maria Konnikova has written a very engaging article entitled What Your Tweets Say About You that was posted on The New Yorker website on March 17, 2015. I highly recommend clicking through and reading the entire text. I will sum up just some of the key points, add some links and pose several additional questions.

The research study (linked to above) was conducted by a team led by psychologist and Professor Johannes Eichstaedt. Their main conclusion was that the collection and subsequent linguistic analysis of tweets proved to be validly predictive of locations with higher concentrations of fatalities from cardiovascular disease. The inverse was also true: geographic clusters of tweets with more positive content had lower death rates from the same cause. It was not that the population tweeting had heart disease, but rather that there is a discernible correlation between angrier content and a higher incidence of heart disease within an area.

This “correlation is especially strange” due to the fact that Twitter users are generally younger than individuals who perish from heart ailments. The article cites a January 9, 2015 study from the Pew Research Center entitled Demographics of Key Social Networking Platforms (also, imho, well worth a click-through and full reading), which, among other things, tabulates the ages of the users of all of the leading social media platforms. Just 22% of US Twitter users are more than 50 years old. However, the relative risk of heart disease does not begin to rise until decades later.

How, then, to analytically connect younger people in a particular area who are posting negative tweets with their older neighbors who face higher chances of developing heart disease? The researchers theorize that the tweets “may be a window into the aggregated and powerful effects of the community context”. The overall health of people living in areas that are “poorer, more fragmented” is not as good as that of those residing in “richer, integrated ones”. As a result, the angrier tweets of someone in their twenties are likely reflective of an area with higher life stressors that, in turn, later result in more heart-related deaths.

Nonetheless, another renowned expert in this field of linguistic analysis of text, James Pennebaker, recommended caution in drawing any connection based upon this data. He urges further study of the data and posing additional questions about causation. Currently, in his own work, he is examining Twitter data to see how family and religious factors evolve.

There is also value in studying social media content of individuals. For example, Microsoft has previously studied 70,000 tweets of people with depression and then used this data to construct a “predictive index” to identify “other users who were likely depressed based on their social-media posts”.

Eichstaedt’s team is continuing their work by looking at Twitter data for individuals and communities over time periods, rather than a “snapshot” data set. They are also adding Facebook profiles to their work.

Finally, Pennebaker believes that social media may also generate positive effects on mental health based on his previous studies on the benefits of keeping a personal journal. This may be so despite the private nature of a journal and the very public access of social media and its interactivity.

My additional questions are as follows:

  • Will additional discrete language patterns be discovered and validated that will indicate concentrations of other medical conditions within communities? Are we only at the beginning of using textual analysis of tweets as a metric of the states of local health?
  • Given that there is a lag time of years between negative tweets and the appearance of heart disease, should interventions be undertaken within a community at higher risk and, if so, by whom and at what cost?
  • Are other negative online behaviors such as cyberbullying indicative of some form of identifiable illness that can be treated on a community-wide basis, or must this be dealt with on an individual, case-by-case basis?

February 26, 2018 Update: Using social media activity data to diagnose and treat possible health conditions has advanced in a number of new systems and studies as reported in today’s New York Times in an article entitled How Companies Scour Our Digital Lives for Clues to Our Health, by Natasha Singer, dated February 26, 2018.