Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of 50 states in the US, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to passing their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bars of each state. Violations can potentially lead to a series of professional sanctions and, in severe cases depending upon the facts, disbarment from practice for a term of years or even permanently.

In other professions including, among others, medicine and accounting, similar codes of ethics exist and are expected to be scrupulously followed. They are defined efforts to ensure honesty, quality, transparency and integrity in their industries’ dealings with the public, and to address certain defined breaches. Many professional trade organizations also have formal codes of ethics but often do not have much, if any, sanction authority.

Should some comparable forms of guidelines and boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You by Sharon Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to this, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, the profiles with very personal information of 70,000 users of the online dating site OKCupid. These users were quite angered by this. OKCupid is pursuing a legal claim to remove this data.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion”.¹ Many users were likewise upset when they found out. The journal that published this study released an “expression of concern”.

Users’ reactions over such incidents can have an effect upon subsequent “ethical boundaries”.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers only had a limited role in it.

Both of these incidents illustrate how “untested the ethics” are of this kind of big data research. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This is in sharp contrast to the controls and guidelines placed upon medical research in clinical trials.

The Applicability of The Common Rule and Institutional Review Boards

In the US, under the Common Rule, which governs ethics for federally funded biomedical and behavioral research where humans are involved, studies are required to undergo an ethical review. However, such review does not apply a “unified system”, but rather, each university maintains its own institutional review board (IRB). These are composed of other (mostly medical) researchers at each university. Only a few of them “are professional ethicists”.

Even fewer of them have experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. In the US, there are hundreds of IRBs but they are each dealing with “research efforts in the digital age” in their own ways.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations concerning any research on human subjects they conducted. All other federal agencies likewise adopted such regulations. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are not accurate or appropriate for their types of research involving areas where the risks involved “are usually more subtle than life or death”. For example, if you are seeking volunteers to take a survey on test-taking behaviors, the IRB language requirements on physical risks do not fit the needs of the participants in such a study.

Social scientist organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that the researchers themselves should decide if “their minimal risk work needs IRB approval or not”. In turn, this would make more time available to IRBs for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Comingled

Dr. Elizabeth A. Buchanan, who works as an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase” where researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture results in issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal involving merging mentions of a drug by its street name appearing on social media with public crime data. As a result, people involved in crimes could potentially become identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of this undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.

As technology continues to advance, such risk evaluation can become more challenging. For instance, in 2013, MIT researchers found out that they were able to match up “publicly available DNA sequences” by using data about the participants that the “original researchers” had uploaded online.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.

Likewise, other types of research organizations such as, among others, open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, oversight must be assumed by someone but the best means is not likely to be an IRB because they do not have the necessary “expertise in de-identification and re-identification techniques”.
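
To make the re-identification concern more concrete, here is a minimal sketch in Python of what is often called a linkage attack: records with the names removed are joined to a separate public list on a few shared attributes. Everything in it is my own illustrative assumption and comes from none of the studies discussed above, including the field names (zip_code, birth_year, sex), the helper link_records and the made-up records.

```python
from collections import defaultdict

# Hypothetical "quasi-identifiers": fields that are not names, but that
# can single a person out when combined. These are assumptions chosen
# for illustration only.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")


def link_records(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Pair each de-identified row with a public row when exactly one
    public row shares the same values for all of the given keys."""
    # Index the public rows by their combination of key values.
    index = defaultdict(list)
    for row in public:
        index[tuple(row[k] for k in keys)].append(row)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # Only a unique candidate counts as a re-identification here.
        if len(candidates) == 1:
            matches.append((row, candidates[0]))
    return matches


# Entirely made-up records: a "de-identified" study file and a public list.
study_rows = [
    {"zip_code": "10001", "birth_year": 1980, "sex": "F", "finding": "A"},
    {"zip_code": "10002", "birth_year": 1975, "sex": "M", "finding": "B"},
]
public_rows = [
    {"name": "Person One", "zip_code": "10001", "birth_year": 1980, "sex": "F"},
    {"name": "Person Two", "zip_code": "10002", "birth_year": 1975, "sex": "M"},
    {"name": "Person Three", "zip_code": "10002", "birth_year": 1975, "sex": "M"},
]

linked = link_records(study_rows, public_rows)
for study_row, public_row in linked:
    print(public_row["name"], "->", study_row["finding"])
```

In this toy data, the first study record is linked to a name because only one public record shares its attributes, while the second is not because two people share them. That may be one way to see why simply deleting names is not the same as de-identification, and why reviewers with this kind of expertise could be helpful.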

Different Perspectives on Big Data Research

A technology researcher at the University of Maryland⁴ named Dr. Katie Shilton recently conducted interviews of “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent”. The group also reported that the ethical review boards they dealt with never questioned the ethics of the researchers, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines.

However, IRBs, which “actually have power”, are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two, among numerous others, they focus on A/B testing⁵ of news headings and supermarket checkout lines. As they hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.
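
For readers who have not seen A/B testing up close, the following is a minimal sketch in Python of one common way such a comparison can be scored, a two-proportion z-test. It is offered only as an illustration under my own assumptions: the function name two_proportion_z_test and the click and view counts are hypothetical, and nothing here is taken from the article or from any company’s actual practice.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare the click rates of two headline variants.

    Returns the z statistic and a two-sided p-value, using the pooled
    click rate to estimate the standard error.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: headline A versus headline B, each shown 2,400 times.
z, p_value = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The point of the sketch is how little is involved: two versions are shown to different groups of people and a short calculation decides which one “won”. No review board sits anywhere in that loop, which is part of what makes the ethical questions above worth asking.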

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of the curriculum in data science programs at universities? If so, should the instructors only be selected from the technology industry or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland

"A Touch of Science", Image by Mars P.

Go to classes. Sit through a series of 50-minute lectures. Drink coffee. Pay attention and take notes. Drink more coffee. Go to the library to study, do research and complete assignments. Rinse and repeat for the rest of the semester. Then take your final exams and hope that you passed everything. More or less, things have traditionally been this way in college since Hector was a pup.

Might students instead be interested in participating at the new and experimental learning laboratory called the Augmentarium at the University of Maryland where immersing themselves in their studies takes on an entirely new meaning? This is a place where virtual reality (VR) is being tested and integrated into the learning process. (These 14 Subway Fold posts cover a range of VR and augmented reality [AR] developments and applications.)

Where do I sign up for this?¹

The story was covered in a fascinating report that appeared on December 8, 2015 on the website of the Chronicle of Higher Education entitled Virtual-Reality Lab Explores New Kinds of Immersive Learning, by Ellen Wexler. I highly recommend reading this in its entirety as well as clicking on the Augmentarium link to learn about some of these remarkable projects. I also suggest checking out the hashtag #Augmentarium on Twitter for the very latest news and developments. I will summarize and annotate this story, and pose some of my own questions right after I take off my own imaginary VR headset.

Developing VR Apps in the Augmentarium

In 2014, Brendan Iribe, the co-founder of the VR headset company Oculus², as well as a University of Maryland alumnus, donated $31 million to the University for its development of VR technology³. During the same year, with additional funding obtained from the National Science Foundation, the Augmentarium was built. Currently, researchers at the facility are working on applications of VR to “health care, public safety, and education”.

Professor Ramani Duraiswami, a PhD and co-founder of a startup called VisiSonics (developers of 3D audio and VR gaming systems), is involved with the Augmentarium. His work is in the area of audio, which he believes has a great effect upon how people perceive the world around them. He further thinks that an audio or video lecture presented via distance learning can be greatly enhanced by using VR to, in his words, make “the experience feel more immersive”. He feels this would make you feel as though you are in the very presence of the instructor⁴.

During a recent showcase there, Professor Duraiswami demo-ed 3D sound⁵ and a short VR science fiction production called Fixing Incus. (This link is meant to be played on a smartphone that is then embedded within a VR viewer/headset.) This implementation showed the audience what it was like to be immersed in a virtual environment where, when they moved their heads and line of sight, what they were viewing correspondingly and seamlessly changed.

Enhancing Virtual Immersions for Medicine and Education

Amitabh Varshney, the Director of the University’s Institute for Advanced Computer Studies, is now researching “how the brain processes information in immersive environments” and how this differs from how it is done on a computer screen.⁶ He believes that VR applications in the classroom will enable students to immerse themselves in their subjects, such as being able to “walk through buildings they design” and “explore” them beyond “just the equations” involved in creating these structures.

At the lab’s recent showcase, he provided the visitors with (non-VR) 3D glasses and presented “an immersive video of a surgical procedure”. He drew the audience’s attention to the doctors at the operating table who were “crowding around” it. He believes that the use of 3D headsets would provide medical students a better means to “move around” and get an improved sense of what this experience is actually like in the operating room. (The September 22, 2015 Subway Fold post entitled VR in the OR: New Virtual Reality System for Planning, Practicing and Assisting in Surgery is also on point and provides extended coverage on this topic.)

While today’s early iterations of VR headsets (either available now or early in 2016) are “cumbersome”, researchers hope that they will evolve (in a manner similar to mobile phones which, in turn and as mentioned above, are presently a key element in VR viewers) and be applied in “hospitals, grocery stores and classrooms”. Director Varshney can see them possibly developing along an even faster timeline.

My Questions

  • Is the establishment and operation of the Augmentarium a model that other universities should consider as a means to train students in this field, attract donations, and incubate potential VR and AR startups?
  • What entrepreneurial opportunities might exist for consulting, engineering and tech firms to set up comparable development labs at other schools and in private industry?
  • What other types of academic courses would benefit from VR and AR support? Could students now use these technologies to create or support their academic projects? What sort of grading standards might be applied to them?
  • Do the rapidly expanding markets for VR and AR require that some group in academia and/or the government establish technical and perhaps even ethical standards for such labs and their projects?
  • How are relevant potential intellectual property and technology transfer issues going to be negotiated, arbitrated and litigated if needed?


1.  Btw, has anyone ever figured out how the very elusive and mysterious “To Be Announced (TBA)”, the professor who appears in nearly all course catalogs, ends up teaching so many subjects at so many schools at the same time? He or she must have an incredibly busy schedule.

2.  These nine Subway Fold posts cover, among other VR and AR related stories, the technology of Oculus.

3.  This donation was reported on September 11, 2014 in The Washington Post in an article entitled Brendan Iribe, Co-founder of Oculus VR, Makes Record $31 Million Donation to U-Md by Nick Anderson.

4.  See also the February 18, 2015 Subway Fold post entitled A Real Class Act: Massive Open Online Courses (MOOCs) are Changing the Learning Process.

5.  See also Designing Sound for Virtual Reality by Todd Baker, posted on December 21, 2015, for a thorough overview of this aspect of VR, and the August 5, 2015 Subway Fold post entitled Latest Census on Virtual Senses: A Seminar on Augmented Reality in New York covering, among other AR technologies, the development work and 3D sound wireless headphones of Hooke Audio.

6.  On a somewhat related topic, see the December 18, 2015 Subway Fold post entitled Mind Over Subject Matter: Researchers Develop A Better Understanding of How Human Brains Manage So Much Information.