Facebook is Now Restricting Access to Certain Data About Its User Base to Third Parties

Image by Gerd Altmann

It is a simple and straightforward business concept in any area of commerce: Do not become overly reliant upon a single customer or supplier. Rather, build a diversified portfolio of business relationships to avoid this possibility and, at the same time, to help develop potential new business.

Starting in May 2015, Facebook instituted certain limits on the access that commercial and non-commercial third parties have to the valuable data about its 1.5 billion users¹. This has caused serious disruption, and even the end of operations, for some of those who had depended heavily on the social media giant’s data flow. Let’s see what happened.

This story was reported in a very informative and instructive article in the September 22, 2015 edition of The Wall Street Journal entitled Facebook’s Restrictions on User Data Cast a Long Shadow by Deepa Seetharaman and Elizabeth Dwoskin. (Subscription required.) If you have access to WSJ.com, I highly recommend reading it in its entirety. I will summarize and annotate it, and then pose some of my own third-party questions.

This change in Facebook’s policy has resulted in “dozens of startups” closing, changing their approach or being bought out. It has also affected political data consultants and independent researchers.

This is a significant shift in Facebook’s approach to sharing “one of the world’s richest sources of information on human relationships”. Back in 2007, CEO Mark Zuckerberg opened access to Facebook’s “social graph” to outsiders. This included data points, among many others, about users’ friends, interests and “likes”.

However, the company recently changed this strategy due to users’ concerns about their data being shared with third parties without any notice. A spokeswoman from the company stated that this is now being done in a manner that is “more privacy protective”. The change is intended to give users greater control over their data.

Other social media leaders including LinkedIn and Twitter have likewise limited access, but Facebook’s move in this direction has been more controversial. (These 10 recent Subway Fold posts cover a variety of ways that data from Twitter is being mined, analyzed and applied.)

Examples of the applications that developers have built upon this data include ones that ask friends to join games or to vote, and ones that highlight a mutual friend of two people on a date. The reduction or loss of this data flow from Facebook will affect these and numerous other services previously dependent on it. In addition, privacy experts have expressed their concern that this change might result in “more objectionable” data-mining practices.

Others view these new limits as a result of the company’s expansion and “emergence as the world’s largest social network”.

Facebook will provide outsiders with certain types of data, such as birthdays. However, information about users’ friends is mostly not available. Some developers have complained about the process for requesting user data as well as about “unexpected outcomes”.

These new restrictions have specifically affected the following Facebook-dependent websites in various ways:

  • The dating site Tinder asked Facebook about the new data policy shortly after it was announced because it was concerned that limiting data about relationships would impact its business. A compromise was eventually reached, but it limits the site to accessing only “photos and names of mutual friends”.
  • College Connect, an app that provided forms of social information and assistance to first-generation students, could no longer continue its operations when it lost access to Facebook’s data. (The site still remains online.)
  • An app called Jobs With Friends that connected job searchers with similar interests met a similar fate.
  • Social psychologist Benjamin Crosier was in the process of creating an app searching for connections “between social media activity and ills like drug addiction”. He is currently trying to save this project by requesting eight data types from Facebook.
  • An app used by President Obama’s 2012 re-election campaign was “also stymied” as a result. It was used to identify potential supporters, to try to get them to vote, and to encourage their friends on Facebook to vote or register to vote.²

Other companies are trying an alternative strategy to build their own social networks. For example, Yesgraph Inc. employs predictive analytics³ methodology to assist clients who run social apps in finding new users by data-mining, with the user base’s permission, through lists of email addresses and phone contacts.

My questions are as follows:

  • What are the best practices and policies for social networks to use to optimally balance the interests of data-dependent third parties and users’ privacy concerns? Do they vary from network to network or are they more likely applicable to all or most of them?
  • Are most social network users fully or even partially concerned about the privacy and safety of their personal data? If so, what practical steps can they take to protect themselves from unwanted access to and usage of it?
  • For any given data-driven business, what is the threshold for over-reliance on a particular data supplier? How and when should its roster of data suppliers be further diversified in order to protect itself from disruptions to its operations if one or more of them change their access policies?

 


1.   Speaking of interesting data, on Monday, August 24, 2015, for the first time ever in the history of the web, one billion users logged onto the same site, Facebook. For the details, see One Out of Every 7 People on Earth Used Facebook on Monday, by Alexei Oreskovic, posted on BusinessInsider.com on August 27, 2015.

2.   See the comprehensive report entitled A More Perfect Union by Sasha Issenberg in the December 2012 issue of MIT’s Technology Review about how this campaign made highly effective use of its data, social network apps and data analytics in winning re-election in 2012.

3.   These seven Subway Fold posts cover predictive analytics applications in a range of different fields.

Studies Link Social Media Data with Personality and Health Indicators

[This post was originally uploaded on January 27, 2015. It has been updated below with new information on March 20, 2015.]

Reports of two new studies were issued recently describing meaningful connections between, first, Facebook Likes and personality types and, second, the language used in Tweets and the likelihood of heart disease. This presents us with an opportunity to examine two human health indicators that were identified by sophisticated analytics applied to massive troves of data generated by two of the world’s leading social media platforms. Where is all of this leading and what issues arise as a result? I will first summarize some parts of these two reports, add some links and annotations, and then pose some questions. I also highly recommend clicking through for a full read of both pieces.

The first report was posted on NewScientist.com on January 12, 2015 with the concise title of What You ‘Like’ on Facebook Gives Away Your Personality by Hal Hodson. According to this article, researchers working at Stanford University and Cambridge University have developed an algorithm that, based completely upon what people “Like” on Facebook, can determine a user’s personality. The data for this was gathered in a survey of 86,000 people who filled out personality questionnaires that were then matched against their activity on Facebook. Indeed, the results showed that this new method was more accurate than the assessments made by the test subjects’ family and friends.

These characteristics are called the Big Five personality traits and include (as explored in detail in the preceding Wikipedia link):

  • Openness to experience
  • Conscientiousness
  • Extraversion
  • Agreeableness
  • Neuroticism

The article includes comments from David Funder of the University of California, Riverside, a researcher on personality, who says that while this study is “impressive”, it still does not provide a truly deep understanding of an individual’s personality. Funder’s work looks at 100 dimensions, a far larger number than the Big Five that the researchers in the Facebook study focused upon.

Nonetheless, two of the researchers on this new study, Youyou Wu of Cambridge and Michael Kosinski of Stanford, believe their work is applicable on a global scale and can be applied in several areas. For instance, they foresee that their new Like algorithm could be used in hiring operations to search large data files of candidates and identify those who might be most suitable for a particular job. Other possibilities include health and education. Kosinski also acknowledges that this approach would further require appropriate policy and technology considerations in order to address issues such as its potential invasiveness.

(In a similar application of Facebook Likes and other data from social media sites, universities in the US are now using such information and analytics to locate and pitch to alumni as potential donors, as reported in a most interesting article in the January 25, 2015 edition of The New York Times entitled Your College May Be Banking on Your Facebook Likes, by Natasha Singer. Among other things, this story reports on the work and methods of two startups in this area called EverTrue and Graduway.)

The second report linking social media data to a health indicator was Scientists Say Tweets Predict Heart Disease and Community Health by Derrick Harris, posted on Gigaom.com on January 22, 2015. In a study entitled Psychological Language on Twitter Predicts County-Level Heart Disease Mortality, researchers at the University of Pennsylvania, working as part of their Well-Being Project, concluded that the vocabulary used by individuals in their Tweets can predict “the rate of heart disease deaths in the counties where they live”. Specifically, Tweets concerning more upbeat topics and expressed in more positive terms correlated with lower mortality rates, as reported by the Centers for Disease Control and Prevention (CDC). Conversely, mortality rates were higher in areas “with angry language about negative topics”.

The accompanying side-by-side graphics of the Twitter data and the CDC data, covering the states in the upper right quarter of the US and their constituent 1,300 counties, dramatically illustrate these findings. The pool of data was drawn from 148 million Tweets with geotags.

These results also provide further support for the accuracy and predictive validity of data from Twitter, notwithstanding any “inherent geographical biases”, and exceeding that of more “traditional polls or surveys”. Indeed, language in Tweets turns out to have a comparatively higher predictive value than other economic or health-related data. The researchers further believe that their findings might be more helpful when applied to “community-scale policies or interventions” rather than to assisting specific people.
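
To make the notion of a county-level correlation a bit more concrete, here is a minimal sketch in Python of a Pearson correlation coefficient, one common way of measuring how closely two sets of numbers move together. I am not suggesting that this is the method the researchers used. The function name `pearson` and the per-county figures are hypothetical and were invented purely for illustration; they are not drawn from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five imaginary counties (NOT data from the study):
# the share of words in local Tweets classed as angry, and heart disease
# deaths per 100,000 residents.
angry_word_share = [0.02, 0.05, 0.03, 0.08, 0.06]
heart_deaths_per_100k = [150, 210, 170, 260, 200]

r = pearson(angry_word_share, heart_deaths_per_100k)
print(f"correlation between angry language and mortality: {r:.2f}")
```

A value near +1 would mean that counties with a larger share of angry language also tend to report more heart disease deaths, while a value near 0 would mean there is no linear relationship. Neither outcome, by itself, says anything about causation.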

My follow-up questions include:

  • Would mapping a statistically significant number of Twitter networks in counties with higher and/or lower mortality rates, a process described in the February 5, 2015 Subway Fold post entitled Visualization, Interpretation and Inspiration from Mapping Twitter Networks, provide additional insights that would be helpful to medical professionals and local policy planners? For example, are many of the negative Twitter posters in each other’s networks such that they become self-reinforcing? Are there recognizable network effects occurring that can somehow be corrected with regards to the degree of negativity and, in turn, public health? Would this pose any legal, policy or privacy issues?
  • For both of these articles, do these types of findings require more rigorous and wider-scale mathematical and scientific analysis before they are applied to such critically important mental and physical health matters? If so, should such testing be done by public or private institutions, universities and/or government agencies?
  • As first expressed in this November 22, 2014 Subway Fold post entitled Minting New Big Data Types and Analytics for Investors, how are the differences in correlation and causation being factored into these studies? Given the skepticism expressed above about Facebook Likes being so indicative about personality, are there other effects and influences that need to be identified and filtered out of these types of conclusions?
  • If the usage and analysis of social media data continues to grow in areas, well, like employment, education and health, what protections, if any, should people be given, by law and/or the social media companies, to protect themselves or opt out in advance of any potentially negative consequences?

March 20, 2015 Update:

Providing some very worthwhile additional insight and analysis of the University of Pennsylvania study covered in the initial post above, Maria Konnikova has written a very engaging article entitled What Your Tweets Say About You that was posted on The New Yorker website on March 17, 2015. I highly recommend clicking through and reading the entire text. I will sum up just some of the key points, add some links and pose several additional questions.

The research study (linked to above) was conducted by a team led by psychologist and Professor Johannes Eichstaedt. Their main conclusion was that the collection and subsequent linguistic analysis of tweets proved to be validly predictive of locations with higher concentrations of fatalities from cardiovascular disease. The inverse was also true: geographic clusters of tweets with more positive content had lower death rates from the same cause. It was not that the population tweeting had heart disease, but rather that there is a discernible correlation between angrier content and a higher incidence of heart disease within an area.

This “correlation is especially strange” due to the fact that Twitter users are generally younger than individuals who perish from heart ailments. The article cites a January 9, 2015 study from the Pew Research Center entitled Demographics of Key Social Networking Platforms (also, imho, well worth a click-through and full reading), which, among other things, tabulates the ages of the users of all of the leading social media platforms. Just 22% of US Twitter users are more than 50 years old. However, the relative risk of heart disease does not begin to rise until decades later.

How, then, to analytically connect younger people in a particular area who are posting negative tweets with their older neighbors who face higher chances of developing heart disease? The researchers theorize that the tweets “may be a window into the aggregated and powerful effects of the community context”. That is, people living in areas that are “poorer, more fragmented” are, overall, not as healthy as those residing in “richer, integrated ones”. As a result, the angrier tweets of someone in their twenties are likely reflective of an area with higher life stressors that, in turn, later result in more heart-related deaths.

Nonetheless, another renowned expert in this field of linguistic analysis of text, James Pennebaker, recommended caution in drawing any connection based upon this data. He urges further study of the data and posing additional questions about causation. Currently, in his own work, he is examining Twitter data to see how family and religious factors evolve.

There is also value in studying social media content of individuals. For example, Microsoft has previously studied 70,000 tweets of people with depression and then used this data to construct a “predictive index” to identify “other users who were likely depressed based on their social-media posts”.

Eichstaedt’s team is continuing their work by looking at Twitter data for individuals and communities over time periods, rather than a “snapshot” data set. They are also adding Facebook profiles to their work.

Finally, Pennebaker believes that social media may also generate positive effects on mental health based on his previous studies on the benefits of keeping a personal journal. This may be so despite the private nature of a journal and the very public access of social media and its interactivity.

My additional questions are as follows:

  • Will additional discrete language patterns be discovered and validated that will indicate concentrations of other medical conditions within communities? Are we only at the beginning of using textual analysis of tweets as a metric of the states of local health?
  • Given that there is a lag time of years between negative tweets and the appearance of heart disease, should interventions be undertaken within a community at higher risk and, if so, by whom and at what cost?
  • Are other negative online behaviors such as cyberbullying indicative of some form of identifiable illness that can be treated on a community-wide basis, or must this be dealt with on an individual, case-by-case basis?