Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of 50 states in the US, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to passing their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bars of each state. Violations can potentially lead to a series of professional sanctions and, in severe cases depending upon the facts, disbarment from practice for a term of years or even permanently.

In other professions including, among others, medicine and accounting, similar codes of ethics exist and are expected to be scrupulously followed. They are defined efforts to ensure honesty, quality, transparency and integrity in their industries’ dealings with the public, and to address certain defined breaches. Many professional trade organizations also have formal codes of ethics but often do not have much, if any, sanction authority.

Should some comparable forms of guidelines and boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on Wired.com on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You, by Sarah Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to this, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, the profiles with very personal information of 70,000 users of the online dating site OKCupid. These users were quite angered by this. OKCupid is pursuing a legal claim to remove this data.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion”.¹ Many users were likewise upset when they found out. The journal that published this study released an “expression of concern”.

Users’ reactions over such incidents can have an effect upon subsequent “ethical boundaries”.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers only had a limited role in it.

Both of these incidents illustrate how “untested the ethics” are of this kind of big data research. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This is in sharp contrast to the controls and guidelines upon medical research in clinical trials.

The Applicability of The Common Rule and Institutional Review Boards

In the US, under the Common Rule, which governs ethics for federally funded biomedical and behavioral research where humans are involved, studies are required to undergo an ethical review. However, such review does not apply a “unified system”, but rather, each university maintains its own institutional review board (IRB). These are composed of other (mostly medical) researchers at each university. Only a few of them “are professional ethicists”.

Even fewer of them have experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. In the US, there are hundreds of IRBs, but they are each dealing with “research efforts in the digital age” in their own ways.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations concerning any research on human subjects they conducted. All other federal agencies likewise adopted such regulations. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are not accurate or appropriate for their types of research, where the risks involved “are usually more subtle than life or death”. For example, if you are seeking volunteers to take a survey on test-taking behaviors, the IRB language requirements on physical risks do not fit the needs of the participants in such a study.

Social scientist organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that the researchers themselves should decide if “their minimal risk work needs IRB approval or not”. In turn, this would make more time available to IRBs for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Commingled

Dr. Elizabeth A. Buchanan, who works as an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase”, in which researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture results in issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal that involved merging mentions of a drug by its street name appearing on social media with public crime data. As a result, people involved in crimes could potentially become identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of this undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.
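To make the idea of removing identifiers a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions and not the method used in the project Dr. Buchanan described: the field names, the sample record and the `deidentify` helper are all hypothetical. It drops direct identifiers and keeps a salted hash as a pseudonym so that posts from the same account can still be grouped together.

```python
import hashlib

# Hypothetical field names chosen for illustration only; a real dataset
# would need its own review of which fields count as identifiers.
DIRECT_IDENTIFIERS = {"username", "full_name", "email"}


def deidentify(record, salt):
    """Return a copy of record with the direct identifiers removed.

    A salted SHA-256 hash of the username is kept as a pseudonym so that
    rows from the same account can still be grouped together.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hashlib.sha256((salt + record["username"]).encode("utf-8")).hexdigest()
    cleaned["pseudonym"] = digest[:12]
    return cleaned


# Invented sample record, not real data.
post = {
    "username": "example_user",
    "full_name": "Pat Example",
    "email": "pat@example.com",
    "text": "mentions a street name for a drug",
    "city": "Springfield",
}
released = deidentify(post, salt="keep-this-secret")
```

Note that this is pseudonymization rather than full anonymization. Fields left in place, such as the city or the text itself, may still allow someone to work out who a person is when combined with other data.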

As technology continues to advance, such risk evaluation can become more challenging. For instance, in 2013, MIT researchers found that they were able to match up “publicly available DNA sequences” by using data about the participants that the “original researchers” had uploaded online.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.

Likewise, other types of research organizations such as, among others, open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, oversight must be assumed by someone but the best means is not likely to be an IRB because they do not have the necessary “expertise in de-identification and re-identification techniques”.
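For readers unfamiliar with re-identification, the following Python sketch shows one simple form of it, often called a linkage attack. It is a toy example built on my own assumptions, not a description of the MIT work or of any technique named in the Wired article. The records, the field names and the `link_records` helper are invented. The idea is that a release with no names can still be matched to a public list when both share a few ordinary attributes.

```python
from collections import defaultdict

# Attributes assumed (for illustration) to appear in both datasets.
QUASI_IDENTIFIERS = ("birth_year", "zip_code", "sex")


def link_records(released, public):
    """Match released rows to public rows that share every quasi-identifier.

    Only unambiguous matches (exactly one public candidate) are returned.
    """
    index = defaultdict(list)
    for person in public:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    matches = {}
    for row in released:
        key = tuple(row[field] for field in QUASI_IDENTIFIERS)
        candidates = index.get(key, [])
        if len(candidates) == 1:
            matches[row["record_id"]] = candidates[0]
    return matches


# Invented "de-identified" release: no names, only ordinary attributes.
released = [
    {"record_id": "R1", "birth_year": 1980, "zip_code": "54751", "sex": "F"},
    {"record_id": "R2", "birth_year": 1975, "zip_code": "54751", "sex": "M"},
]
# Invented public list that happens to carry the same attributes plus names.
public = [
    {"name": "Alex Sample", "birth_year": 1980, "zip_code": "54751", "sex": "F"},
    {"name": "Sam Placeholder", "birth_year": 1975, "zip_code": "54751", "sex": "M"},
    {"name": "Lee Placeholder", "birth_year": 1975, "zip_code": "54751", "sex": "M"},
]
reidentified = link_records(released, public)
```

In this toy data, record R1 is matched to a single name, while R2 stays ambiguous because two people share its attributes. Judging how many records in a real release are this exposed is the kind of expertise that both Dr. Buchanan and Ms. Meyer suggest reviewers need.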

Different Perspectives on Big Data Research

Dr. Katie Shilton, a technology researcher at the University of Maryland,⁴ recently interviewed “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent”. The group also reported that the ethical review boards they dealt with never questioned the ethics of the researchers, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines.

However, IRBs, which “actually have power”, are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two, among numerous others, they focus on A/B testing⁵ of news headlines and supermarket checkout lines. As they hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of the curriculum in data science programs at universities? If so, should the instructors only be selected from the technology industry or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?

 


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

Digital Smarts Everywhere: The Emergence of Ambient Intelligence

Image from Pixabay

The Troggs were a legendary rock and roll band who were part of the British Invasion in the late 1960s. They have always been best known for their iconic rocker Wild Thing. This was also the only Top 10 hit that ever had an ocarina solo. How cool is that! The band went on to have two other major hits, With a Girl Like You and Love is All Around.¹

The third of the band’s classic singles can be stretched a bit to be used as a helpful metaphor to describe an emerging form of pervasive “all around”-edness, this time in a more technological context. Upon reading a fascinating recent article on TechCrunch.com entitled The Next Stop on the Road to Revolution is Ambient Intelligence, by Gary Grossman, on May 7, 2016, you will find a compelling (but not too rocking) analysis about how the rapidly expanding universe of digital intelligent systems wired into our daily routines is becoming more ubiquitous, unavoidable and ambient each day.

All around indeed. Just as romance can dramatically affect our actions and perspectives, studies now likewise indicate that the relentless global spread of smarter (and soon thereafter still smarter) technologies is comparably affecting people’s lives at many different levels.²

We have followed just a sampling of developments and trends in the related technologies of artificial intelligence, machine learning, expert systems and swarm intelligence in these 15 Subway Fold posts. I believe this new article, adding “ambient intelligence” to the mix, provides a timely opportunity to bring these related domains closer together in terms of their common goals, implementations and benefits. I highly recommend reading Mr. Grossman’s piece in its entirety.

I will summarize and annotate it, add some additional context, and then pose some of my own Trogg-inspired questions.

Internet of Experiences

Digital this, that and everything is everywhere in today’s world. There is a surging confluence of connected personal and business devices, the Internet, and the Internet of Things (IoT).³ Woven closely together on a global scale, we have essentially built “a digital intelligence network that transcends all that has gone before”. In some cases, this quantum of advanced technologies gains the “ability to sense, predict and respond to our needs”, and is becoming part of everyone’s “natural behaviors”.

A fourth industrial revolution might even manifest itself in the form of machine intelligence whereby we will interact with the “always-on, interconnected world of things”. As a result, the Internet may become characterized more by experiences where users will converse with ambient intelligent systems everywhere. The supporting planks of this new paradigm include:

A prediction of what more fully realized ambient intelligence might look like, using travel as an example, appeared in an article entitled Gearing Up for Ambient Intelligence, by Lisa Morgan, on InformationWeek.com on March 14, 2016. Upon leaving his or her plane, the traveler will receive a welcoming message and a request to proceed to the curb to retrieve their luggage. Upon reaching curbside, a self-driving car⁶ will be waiting with information about the hotel booked for the stay.

Listening

Another article about ambient intelligence entitled Towards a World of Ambient Computing, by Simon Bisson, posted on ZDNet.com on February 14, 2014, is briefly quoted for the line “We will talk, and the world will answer”, to illustrate the point that current technology will be morphing into something in the future that would be nearly unrecognizable today. Grossman’s article proceeds to survey a series of commercial technologies recently brought to market as components of a fuller ambient intelligence that will “understand what we are asking” and provide responsive information.

Starting with Amazon’s Echo, this new device can, among other things:

  • Answer certain types of questions
  • Track shopping lists
  • Place orders on Amazon.com
  • Schedule a ride with Uber
  • Operate a thermostat
  • Provide transit schedules
  • Commence short workouts
  • Review recipes
  • Perform math
  • Request a plumber
  • Provide medical advice

Will it be long before we begin to see similar smart devices everywhere in homes and businesses?

Kevin Kelly, the founding Executive Editor of WIRED and a renowned futurist,⁷ believes that in the near future, digital intelligence will become available in the form of a utility⁸ and, as he puts it, “IQ as a service”. This is already being done by Google, Amazon, IBM and Microsoft, who are providing open access to sections of their AI coding.⁹ He believes that success for the next round of startups will go to those who enhance and transform something already in existence with the addition of AI. The best example of this is once again self-driving cars.

As well, in a chapter on Ambient Computing from a report by Deloitte UK entitled Tech Trends 2015, it was noted that some companies were engineering ambient intelligence into their products as a means to remain competitive.

Recommending

A great deal of AI is founded upon the collection of big data from online searching, the use of apps and the IoT. This universe of information supports neural networks that learn from repeated behaviors, including people’s responses and interests. In turn, it provides a basis for “deep learning-derived personalized information and services” that can derive “increasingly educated guesses with any given content”.

An alternative perspective, that “AI is simply the outsourcing of cognition by machines”, has been expressed by Jason Silva, a technologist, philosopher and video blogger on Shots of Awe. He believes that this process is the “most powerful force in the universe”, that is, intelligence. Nonetheless, he sees this as an evolutionary process which should not be feared. (See also the December 27, 2014 Subway Fold post entitled Three New Perspectives on Whether Artificial Intelligence Threatens or Benefits the World.)

Bots are another contemporary manifestation of ambient intelligence. These are a form of software agent, driven by algorithms, that can independently perform a range of sophisticated tasks. Two examples include:

Speaking

Optimally, bots should also be able to listen and “speak” back in return, much like a two-way phone conversation. This would also add much-needed context and more natural interactions, and “help to refine understanding” in these human/machine exchanges. Such conversations would “become an intelligent and ambient part” of daily life.

An example of this development path is evident in Google Now. This service combines voice search with predictive analytics to present users with information prior to searching. It is an attempt to create an “omniscient assistant” that can reply to any request for information “including those you haven’t thought of yet”.

Recently, the company created a Bluetooth-enabled prototype of a lapel pin based on this technology that operates just by tapping it, much like the communicators on Star Trek. (For more details, see Google Made a Secret Prototype That Works Like the Star Trek Communicator, by Victor Luckerson, on Time.com, posted on November 22, 2015.)

The configurations and specs of AI-powered devices, be it lapel pins, some form of augmented reality¹⁰ headsets or something else altogether, supporting such pervasive and ambient intelligence are not exactly clear yet. Their development and introduction will take time but remain inevitable.

Will ambient intelligence make our lives any better? It remains to be seen, but it is probably a viable means to handle some of our more ordinary daily tasks. It will likely “fade into the fabric of daily life” and be readily accessible everywhere.

Quite possibly then, the world will truly become a better place to live upon the arrival of ambient intelligence-enabled ocarina solos.

My Questions

  • Does the emergence of ambient intelligence, in fact, signal the arrival of a genuine fourth industrial revolution or is this all just a semantic tool to characterize a broader spectrum of smarter technologies?
  • How might this trend affect overall employment in terms of increasing or decreasing jobs on an industry by industry basis and/or the entire workforce? (See also this June 4, 2015 Subway Fold post entitled How Robots and Computer Algorithms Are Challenging Jobs and the Economy.)
  • How might this trend also affect non-commercial spheres such as public interest causes and political movements?
  • As ambient intelligence insinuates itself deeper into our online worlds, will this become a principal driver of new entrepreneurial opportunities for startups? Will ambient intelligence itself provide new tools for startups to launch and thrive?

 


1.   Thanks to Little Steven (@StevieVanZandt) for keeping the band’s music in occasional rotation on The Underground Garage  (#UndergroundGarage.) Also, for an appreciation of this radio show see this August 14, 2014 Subway Fold post entitled The Spirit of Rock and Roll Lives on Little Steven’s Underground Garage.

2.  For a remarkably comprehensive report on the pervasiveness of this phenomenon, see the Pew Research Center report entitled U.S. Smartphone Use in 2015, by Aaron Smith, posted on April 1, 2015.

3.  These 10 Subway Fold posts touch upon the IoT.

4.  The Subway Fold category Big Data and Analytics contains 50 posts that cover this topic in whole or in part.

5.  The Subway Fold category Telecommunications contains 12 posts that cover this topic in whole or in part.

6.  These 5 Subway Fold posts contain references to self-driving cars.

7.   Mr. Kelly is also the author of a forthcoming book entitled The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future, to be published on June 7, 2016 by Viking.

8.  This September 1, 2014 Subway Fold post entitled Possible Futures for Artificial Intelligence in Law Practice, in part summarized an article by Steven Levy in the September 2014 issue of WIRED entitled Siri’s Inventors Are Building a Radical New AI That Does Anything You Ask. This covered a startup called Viv Labs whose objective was to transform AI into a form of utility. Fast forward to the Disrupt NY 2016 conference going on in New York last week. On May 9, 2016, the founder of Viv, Dag Kittlaus, gave his presentation about the Viv platform. This was reported in an article posted on TechCrunch.com entitled Siri-creator Shows Off First Public Demo of Viv, ‘the Intelligent Interface for Everything’, by Romain Dillet, on May 9, 2016. The video of this 28-minute presentation is embedded in this story.

9.  For the full details on this story see a recent article entitled The Race Is On to Control Artificial Intelligence, and Tech’s Future by John Markoff and Steve Lohr, published in the March 25, 2016 edition of The New York Times.

10.  These 10 Subway Fold posts cover some recent trends and developments in augmented reality.