Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of 50 states in the US, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to passing their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bars of each state. Violations can potentially lead to a series of professional sanctions and, in severe cases depending upon the facts, disbarment from practice for a term of years or even permanently.

In other professions including, among others, medicine and accounting, similar codes of ethics exist and are expected to be scrupulously followed. They are defined efforts to ensure honesty, quality, transparency and integrity in their industries’ dealings with the public, and to address certain defined breaches. Many professional trade organizations also have formal codes of ethics but often do not have much, if any, sanction authority.

Should some comparable forms of guidelines and boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on Wired.com on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You by Sarah Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to this, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, the profiles of 70,000 users of the online dating site OKCupid, including very personal information. These users were quite angered by this. OKCupid is pursuing a legal claim to remove this data.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion”.¹ Many users were likewise upset when they found out. The journal that published this study released an “expression of concern”.

Users’ reactions over such incidents can have an effect upon subsequent “ethical boundaries”.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers only had a limited role in it.

Both of these incidents illustrate how “untested the ethics” of this kind of big data research are. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This is in sharp contrast to the controls and guidelines placed upon medical research in clinical trials.

The Applicability of the Common Rule and Institutional Review Boards

In the US, under the Common Rule, which governs ethics for federally funded biomedical and behavioral research where humans are involved, studies are required to undergo an ethical review. However, such review does not apply a “unified system”, but rather, each university maintains its own institutional review board (IRB). These are composed of other (mostly medical) researchers at each university. Only a few of them “are professional ethicists”.

Even fewer of them have experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. In the US, there are hundreds of IRBs but they are each dealing with “research efforts in the digital age” in their own ways.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations concerning any research on human subjects they conducted. All other federal agencies likewise adopted such regulations. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are not accurate or appropriate for their types of research, where the risks involved “are usually more subtle than life or death”. For example, if you are seeking volunteers to take a survey on test-taking behaviors, the IRB language requirements on physical risks do not fit the needs of the participants in such a study.

Social scientist organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that the researchers themselves should decide if “their minimal risk work needs IRB approval or not”. In turn, this would make more time available to IRBs for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Comingled

Dr. Elizabeth A. Buchanan, who works as an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase” where researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture results in issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal that involved merging social media mentions of a drug by its street name with public crime data. As a result, people involved in crimes could potentially become identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of this undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.

As technology continues to advance, such risk evaluation can become more challenging. For instance, in 2013, MIT researchers found that they were able to match “publicly available DNA sequences” to individual participants by using data about those participants that the “original researchers” had uploaded online.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.

Likewise, other types of research organizations such as, among others, open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, oversight must be assumed by someone but the best means is not likely to be an IRB because they do not have the necessary “expertise in de-identification and re-identification techniques”.
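To make the re-identification concern a bit more concrete, here is a minimal Python sketch of how a “de-identified” dataset might be linked back to named individuals. Everything in it is hypothetical and invented purely for illustration, including the records, the field names and the helper functions; it is not the method used in any of the studies mentioned above. It assumes only that names were removed from the research data while a few ordinary attributes (ZIP code, birth year and sex) were left in.

```python
from collections import Counter

# Hypothetical "de-identified" research records: names are removed, but
# ordinary attributes (quasi-identifiers) are still present.
research_records = [
    {"zip": "53703", "birth_year": 1984, "sex": "F", "finding": "finding A"},
    {"zip": "53703", "birth_year": 1984, "sex": "M", "finding": "finding B"},
    {"zip": "54751", "birth_year": 1990, "sex": "F", "finding": "finding C"},
    {"zip": "54751", "birth_year": 1990, "sex": "F", "finding": "finding D"},
]

# Hypothetical public dataset that does carry names.
public_records = [
    {"name": "Person One", "zip": "53703", "birth_year": 1984, "sex": "F"},
    {"name": "Person Two", "zip": "53703", "birth_year": 1984, "sex": "M"},
    {"name": "Person Three", "zip": "54751", "birth_year": 1990, "sex": "F"},
    {"name": "Person Four", "zip": "54751", "birth_year": 1990, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record, fields):
    """Build the combination of attribute values used for matching."""
    return tuple(record[f] for f in fields)


def reidentify(research, public, fields=QUASI_IDENTIFIERS):
    """Return (name, finding) pairs for records whose attribute
    combination is unique in both datasets."""
    research_counts = Counter(key(r, fields) for r in research)
    public_counts = Counter(key(p, fields) for p in public)
    names = {
        key(p, fields): p["name"]
        for p in public
        if public_counts[key(p, fields)] == 1
    }
    return [
        (names[key(r, fields)], r["finding"])
        for r in research
        if research_counts[key(r, fields)] == 1 and key(r, fields) in names
    ]


def smallest_group(research, fields=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing the same attributes.
    A value of 1 means at least one record stands alone."""
    return min(Counter(key(r, fields) for r in research).values())


if __name__ == "__main__":
    print(reidentify(research_records, public_records))
    print(smallest_group(research_records))
    # Releasing fewer attributes (here, dropping "sex") makes the groups larger.
    coarse = ("zip", "birth_year")
    print(reidentify(research_records, public_records, coarse))
    print(smallest_group(research_records, coarse))
```

In this toy data, two of the four “anonymous” records can be tied to a name using all three attributes, while releasing only ZIP code and birth year leaves every record sharing its attributes with at least one other. Real datasets are far messier, which is part of why the expertise Dr. Buchanan and Ms. Meyer describe matters.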

Different Perspectives on Big Data Research

Dr. Katie Shilton, a technology researcher at the University of Maryland,⁴ recently conducted interviews of “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent”. The group also reported that the ethical review boards they dealt with never questioned the ethics of the researchers, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines.

However, IRBs, which “actually have power”, are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two, among numerous others, they focus on A/B testing⁵ of news headlines and supermarket checkout lines. As they hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of the curriculum in data science programs at universities? If so, should the instructors only be selected from the technology industry or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?

 


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

Updates on Recent Posts Re: Music’s Big Data, Deep Learning, VR Movies, Regular Movies’ Effects on Our Brains, Storytelling and, of Course, Zombies

This week has seen the publication of an exciting series of news stories and commentaries that provide a very timely opportunity to update six recent Subway Fold posts. The common thread running through the original posts and these new pieces is the highly inventive mixing, mutating and monetizing of pop culture and science. Please put on your virtual 3-D glasses and let’s see what’s out there.

The December 10, 2014 Subway Fold post entitled Is Big Data Calling and Calculating the Tune in Today’s Global Music Market? explored the apps, companies and trends that have become the key drivers in the current global music business. A very informative report in the December 15, 2014 edition of The Wall Street Journal by Hannah Karp, entitled Music Business Plays to Big Data’s Beat, adds the big data strategies and implementations of three more major music companies and their rosters of artists. (The full text requires a subscription to WSJonline.com, but the story also appeared in full on Nasdaq.com.) As described in detail in this report, Universal Music, Warner Music, and Sony Music have all created sophisticated systems to parse numerous data sources and apply customized analytics for planning and executing marketing campaigns.

Next, for an alternative and somewhat retro approach, a veteran music retailer named Sal Nunziato wrote a piece on the Op Ed page of The New York Times on the very same day entitled Elegy for the ‘Suits’. He blamed the Internet more than the music labels for the current state of music where “anyone with a computer, a kazoo and an untuned guitar” can release their music online regardless of its quality. Thus, the ‘suits’ he nostalgically misses were the music company execs who exerted more control over the quantity and quality of music available to the public.

Likewise covering the tuning up of another major force in today’s online music streaming industry was an August 14, 2014 Subway Fold post entitled Spotify Enhances Playlist Recommendations Processing with “Deep Learning” Technology. This summarized a report about how deep learning technology was being successfully applied to improve the accuracy and responsiveness of Spotify’s recommendation engine. Presenting an even stronger case that you-ain’t-seen-nothing-yet in this field was an engaging analysis of some still largely unseen developments in deep learning posted on December 15, 2014, on Gigaom.com entitled What We Read About Deep Learning is Just the Tip of the Iceberg by Derrick Harris. These include experimental systems being tested by the likes of Google, Facebook and Microsoft. As well, there was a series of intriguing presentations and demos at the recent Neural Information Processing Systems conference held in Montreal. As detailed in that article with a wealth of supporting links, many of these advanced systems and methods are expected to gain more press and publicity in 2015.

Returning to the here and now at the end of 2014, the current release of the movie adaptation of the novel Wild by Cheryl Strayed (Knopf, 2012) has been further formatted into a 3-minute supplemental virtual reality movie, as reported in the December 15, 2014 edition of The New York Times by Michael Cieply in an article entitled Virtual Reality ‘Wild’ Trek. This fits right in with the developments covered in the December 10, 2014 Subway Fold post entitled A Full Slate of Virtual Reality Movies and Experiences Scheduled at the 2015 Sundance Film Festival, as this short film is also scheduled to be presented at the 2015 Sundance festival. Using Oculus and Samsung VR technology, this is an immersive meeting with the lead character, played by actress Reese Witherspoon, while she is hiking in the wilderness. She is quoted as being very pleased with the final results of this VR production.

The next set of analyses and enhancements to our cinematic experience, continuing right along with the September 3, 2014 Subway Fold post entitled Applying MRI Technology to Determine the Effects of Movies and Music on Our Brains, concerns a newly published book that explains the science of how movies affect our brains entitled Flicker: Your Brain on Movies (Oxford University Press, 2014), by Dr. Jeffrey Zacks. The author was interviewed during a fascinating segment of the December 18, 2014 broadcast of The Brian Lehrer Show on WNYC radio. Among other things, he spoke about why audiences cry during movies (even when the films are not very good), sometimes root for the villain, and move to duck out of the way when an object on the screen seems to be coming right at them, such as the giant boulder rolling after Indiana Jones at the start of Raiders of the Lost Ark. Much of this is intentionally done by the filmmakers to manipulate audiences into heightened emotional responses to key events as they unfold on the big screen.

Of course, all movie making involves the art and science of storytelling skills as discussed in the November 4, 2014 Subway Fold post entitled Say, Did You Hear the Story About the Science and Benefits of Being an Effective Storyteller?. In a very practical and insightful article in the December 12, 2014 edition of The New York Times by Alina Tugend entitled Storytelling Your Way to a Better Job or a Stronger Start-Up, there are some helpful applications for today’s marketplace. As concisely stated in this piece, “You need to have a good story.” It describes in detail how there are now consultants, charging meaningful fees, with new approaches and techniques who assist people in improving their skills in order to become more persuasive storytellers. Among others interviewed for this story was Dr. Paul J. Zak, who wrote the recent article on The Harvard Business Review Blog which was the basis for the November 4th Subway Fold post. The Times article concludes with five helpful pointers to spin a compelling yarn for your listeners.

Finally, the best story told on TV during the 2014 season (in a fictional world where brains take on an entirely different significance) was The Walking Dead on AMC, at least in terms of the extraordinary number of tweets about the ongoing adventures of Sheriff Rick and the Grimes Gang. This was covered on Nielsen.com on December 15, 2014 in a post entitled Tops of 2014: Social TV. TWD averaged twice as many tweets as its next competitor in the ongoing series category. This follows up directly on the July 31, 2014 Subway Fold post entitled New Analytical Twitter Traffic Report on US TV Shows During the 2013 – 2014 Season. Having read scores of TWD tweets on the mid-season finale myself, I can say that everyone will miss you, Beth.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a major fan of TWD, I would like to take the opportunity to add my own brief review about the tragic events in Episode 5.8:

I think that in the end, Beth was a form of avatar for the entire show. She traveled many miles from lying on her bed in Season 2, completely unable to function, to becoming a realist in Season 5 about her own and the group’s survival. Rather than resigning herself to being held as a captive ward in the hospital, she was determined to escape no matter what and was so proud of helping Noah to escape.

She awakened and arose to be a survivor and a committed member of the Grimes Gang, just as everyone else has done during the past five years. That is, Beth’s journey reflects the entire group’s journey. She, and the Grimes Gang, up to this point have survived all of the threats they faced and endured all of the horrors they have seen. They will all survive but this death will have more serious repercussions than perhaps any other death up until this point. Maggie, Daryl, Rick, Carol and Carl, the core of the GG, will not soon recover from this.

What I still do not understand is why, given that she was finally free in the hospital’s hallway, she jeopardized her life by going after the lead officer with a pair of scissors. It seemed to be somewhat at odds with Beth’s character as someone who had survived until now on her own determination and close bond with the group. She had nothing to gain by such a reckless act in the middle of a very volatile situation. Was it a sacrifice to save Noah? Did she realize that the cop was holding a gun at that point? Was she just overtaken by the motivation that desperate times sometimes call for desperate measures?

Consider, too, that she was Hershel’s daughter and her character reflected what she had learned from him: 1. Both learned to see things differently and adapted when the circumstances changed. 2. Both faced sacrifices and danger with great dignity. (Recall Hershel’s acknowledging grin towards Rick right before the Governor murdered the elder of the survivors, and then Beth’s defiant grin when she saw that Noah had escaped.) 3. Both were resilient, with Hershel adapting to the loss of his leg and Beth recovering from her father’s murder. 4. Both sought to comfort others, as Hershel stayed with the flu patients and Beth finally drew Daryl out about his terrible family life. Recall also the three very effective times during her history on the show when Beth’s singing gave great comfort to the others. Indeed, she was a saintly figure but as this story arc wore on, her demise seemed to be foretold.

TWD remains, for me, an absolutely brilliant show in terms of its characters, narrative and presentation.