Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of 50 states in the US, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to passing their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bars of each state. Violations can potentially lead to a series of professional sanctions and, in severe cases depending upon the facts, disbarment from practice for a term of years or even permanently.

In other professions including, among others, medicine and accounting, similar codes of ethics exist and are expected to be scrupulously followed. They are defined efforts to ensure honesty, quality, transparency and integrity in their industries’ dealings with the public, and to address certain defined breaches. Many professional trade organizations also have formal codes of ethics but often do not have much, if any, sanction authority.

Should some comparable forms of guidelines and boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on Wired.com on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You by Sharon Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to this, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, the profiles, including very personal information, of 70,000 users of the online dating site OKCupid. These users were quite angered by this. OKCupid is pursuing a legal claim to remove this data.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion”.¹ Many users were likewise upset when they found out. The journal that published this study released an “expression of concern”.

Users’ reactions over such incidents can have an effect upon subsequent “ethical boundaries”.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers only had a limited role in it.

Both of these incidents illustrate how “untested the ethics” of this kind of big data research are. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This is in sharp contrast to the controls and guidelines applied to medical research in clinical trials.

The Applicability of The Common Rule and Institutional Review Boards

In the US, under the Common Rule, which governs ethics for federally funded biomedical and behavioral research where humans are involved, studies are required to undergo an ethical review. However, such review does not apply a “unified system”; rather, each university maintains its own institutional review board (IRB). These are composed of other (mostly medical) researchers at each university. Only a few of them “are professional ethicists”.

Even fewer of them have experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. In the US, there are hundreds of IRBs, but they are each dealing with “research efforts in the digital age” in their own ways.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations concerning any research on human subjects they conducted. All other federal agencies likewise adopted such regulations. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are not accurate or appropriate for their types of research, where the risks involved “are usually more subtle than life or death”. For example, if you are seeking volunteers to take a survey on test-taking behaviors, the IRB language requirements on physical risks do not fit the needs of the participants in such a study.

Social scientist organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that the researchers themselves should decide if “their minimal risk work needs IRB approval or not”. In turn, this would make more time available to IRBs for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Comingled

Dr. Elizabeth A. Buchanan, who works as an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase” where researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture results in issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal that involved merging social media mentions of a drug by its street name with public crime data. As a result, people involved in crimes could potentially become identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of this undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.
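To make the idea of removing identifiers a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the field names (name, username, email, street_address, user_id) and the salt value are hypothetical and do not come from the project Dr. Buchanan described. It drops the directly identifying fields from a record and replaces the user ID with a salted hash.

```python
import hashlib

# Hypothetical field names; a real dataset would have its own schema.
DIRECT_IDENTIFIERS = {"name", "username", "email", "street_address"}


def deidentify(record, salt):
    """Return a copy of `record` without direct identifiers.

    The original record is left untouched. If a "user_id" field is present,
    it is replaced by a truncated salted SHA-256 hash so that rows from the
    same person can still be grouped without exposing the raw ID.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "user_id" in cleaned:
        raw = (salt + str(cleaned["user_id"])).encode("utf-8")
        cleaned["user_id"] = hashlib.sha256(raw).hexdigest()[:12]
    return cleaned


post = {
    "user_id": 42,
    "username": "example_user",
    "email": "someone@example.com",
    "text": "a post mentioning a street name for a drug",
    "city": "Springfield",
}
public_version = deidentify(post, salt="keep-this-secret")
```

Stripping obvious fields like these is only a first step. Fields that remain, such as the city or the text itself, can still allow re-identification when combined with other public data, which is exactly the risk discussed in this section.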

As technology continues to advance, such risk evaluation can become more challenging. For instance, in 2013, MIT researchers found out that they were able to match up “publicly available DNA sequences” by using data about the participants that the “original researchers” had uploaded online.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.

Likewise, other types of research organizations such as, among others, open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, oversight must be assumed by someone but the best means is not likely to be an IRB because they do not have the necessary “expertise in de-identification and re-identification techniques”.

Different Perspectives on Big Data Research

A technology researcher at the University of Maryland⁴ named Dr. Katie Shilton recently conducted interviews of “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent”. The group also reported that the ethical review boards they dealt with never questioned the ethics of the researchers, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines.

However, IRBs, which “actually have power”, are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two, among numerous others, they focus on A/B testing⁵ of news headings and supermarket checkout lines. As they hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.
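For readers unfamiliar with how an A/B test of two news headings is typically evaluated, here is a minimal Python sketch. It assumes a standard two-proportion z-test, which is one common approach and not necessarily what any company mentioned here uses, and all of the click and view counts are made up for illustration.

```python
import math


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Compare the click rates of headline A and headline B.

    Returns the z statistic and a two-sided p-value based on the normal
    approximation with a pooled click rate.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: headline A got 100 clicks from 1,000 views,
# headline B got 130 clicks from 1,000 views.
z, p_value = two_proportion_z(100, 1000, 130, 1000)
```

A small p-value suggests the difference in click rates is unlikely to be due to chance alone. Note that nothing in this calculation asks whether the people being shown the headlines knew they were part of an experiment, which is the ethical gap this post is about.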

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of the curriculum in data science programs at universities? If so, should the instructors only be selected from the technology industry or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?

 


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

LinkNYC Rollout Brings Speedy Free WiFi and New Opportunities for Marketers to New York

Link.NYC WiFi Kiosk 5, Image by Alan Rothman

Back in the halcyon days of yore before the advent of smartphones and WiFi, there were payphones and phone booths all over the streets of New York. Most have disappeared, but a few scattered survivors have still managed to hang on. An article entitled And Then There Were Four: Phone Booths Saved on Upper West Side Sidewalks, by Corey Kilgannon, posted on NYTimes.com on February 10, 2016, recounts the stories of some of the last lonely public phones.

Taking their place comes a highly innovative new program called LinkNYC (also @LinkNYC and #LinkNYC). This initiative has just begun to roll out across all five boroughs with a network of what will become thousands of WiFi kiosks providing free and very fast web access and phone calling, plus a host of other online NYC support services. The kiosks occupy the same physical spaces as the previous payphones.

The first batch of them has started to appear along Third Avenue in Manhattan. I took the photos accompanying this post of one kiosk at the corner of 14th Street and Third Avenue. While standing there, I was able to connect to the web on my phone and try out some of the LinkNYC functions. My reaction: This is very cool beans!

LinkNYC also presents some potentially great new opportunities for marketers. The launch of the program and the companies getting into it on the ground floor were covered in a terrific new article on AdWeek.com on February 15, 2016 entitled What It Means for Consumers and Brands That New York Is Becoming a ‘Smart City’, by Janet Stilson. I recommend reading it in its entirety. I will summarize and annotate it to add some additional context, and pose some of my own ad-free questions.

LinkNYC Set to Proliferate Across NYC

Link.NYC WiFi Kiosk 2, Image by Alan Rothman

When completed, LinkNYC will give New York a highly advanced mobile network spanning the entire city. Moreover, it will help to transform it into a very well-wired “smart city”.¹ That is, an urban area comprehensively collecting, analyzing and optimizing vast quantities of data generated by a wide array of sensors and other technologies. It is a network and a host of network effects where a city learns about itself and leverages this knowledge for multiple benefits for its citizenry.²

Beyond mobile devices and advertising, smart cities can potentially facilitate many other services. The consulting firm Frost & Sullivan predicts that there will be 26 smart cities across the globe by 2025. Currently, everyone is looking to NYC to see how the implementation of LinkNYC works out.

According to Mike Gamaroff, the head of innovation in the New York office of Kinetic Active, a global media and marketing firm, LinkNYC is primarily a “utility” for New Yorkers as well as “an advertising network”. Its throughput rates are at gigabit speeds, thereby making it the fastest web access available when compared to large commercial ISPs’ average rates of merely 20 to 30 megabits.
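To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes an ideal link running at exactly 1,000 megabits per second versus 25 megabits per second (the midpoint of the quoted range), and a hypothetical 1,000-megabyte file. Real-world throughput would be lower because of network overhead and sharing among users.

```python
def download_seconds(file_megabytes, link_megabits_per_second):
    """Ideal transfer time: convert megabytes to megabits (x8), then divide by link speed."""
    return file_megabytes * 8 / link_megabits_per_second


# Hypothetical 1,000 MB file over an ideal gigabit link versus a 25 Mbps link.
gigabit_time = download_seconds(1000, 1000)  # 8.0 seconds
typical_isp_time = download_seconds(1000, 25)  # 320.0 seconds
speedup = typical_isp_time / gigabit_time  # 40.0 times faster
```

Under these idealized assumptions, a download that takes more than five minutes on a typical home connection would finish in about eight seconds at a kiosk.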

Nick Cardillicchio, a strategic account manager at Civiq Smartscapes, the designer and manufacturer of the LinkNYC kiosks, said that LinkNYC is the only place where consumers can access the Net at such speeds. For the AdWeek.com article, he took the writer, Janet Stilson, on a tour of the kiosks, including the one at Third Avenue and 14th Street, where one of the first ones is in place. (Coincidentally, this is the same kiosk I photographed for this post.)

There are a total of 16 currently operational for the initial testing. The WiFi web access is available within 150 feet of the kiosk and can range up to 400 feet. Perhaps those New Yorkers actually living within this range will soon no longer need their commercial ISPs.

Link.NYC WiFi Kiosk 4, Image by Alan Rothman

The initial advertisers appearing in rotation on the large digital screen include Poland Spring (see the photo at the right), MillerCoors, Pager and Citibank. Eventually “smaller tablet screens” will be added to enable users to make free domestic voice or video calls. As well, they will present maps, local activities and emergency information in and about NYC. Users will also be able to charge up their mobile devices.

However, it is still too soon to assess and quantify the actual impact on such providers. According to David Krupp, CEO, North America, for Kinetic, neither Poland Spring nor MillerCoors has yet produced enough data to analyze their respective LinkNYC ad campaigns. (Kinetic is involved in supporting marketing activities.)

Commercializing the Kiosks

The organization managing LinkNYC, the CityBridge consortium (consisting of Qualcomm, Intersection, and Civiq Smartscapes), is not yet indicating when the new network will progress into a more “commercial stage”. However, once the network is fully implemented within the next few years, the number of kiosks might end up being somewhere between 7,500 and 10,000. That would make it the largest such network in the world.

CityBridge is also in charge of all the network’s advertising sales. These revenues will be split with the city. Under the 12-year contract now in place, this arrangement is predicted to produce $500M for NYC, with positive cash flow anticipated within 5 years. Brad Gleeson, the chief commercial officer at Civiq, said this project depends upon the degree to which LinkNYC is “embraced by Madison Avenue” and the time needed for the network to reach “critical mass”.

Because of the breadth and complexity of this project, achieving this inflection point will be quite challenging according to David Etherington, the chief strategy officer at Intersection. He expressed his firm’s “dreams and aspirations” for LinkNYC, including providing advertisers with “greater strategic and creative flexibility”, offering such capabilities as:

  • Dayparting – dividing a day’s advertising into several segments dependent on a range of factors about the intended audience, and
  • Hypertargeting – delivering advertising to very highly defined segments of an audience
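As a simple illustration of dayparting, here is a minimal Python sketch that maps a time of day to an advertising segment. The segment names and hour boundaries are hypothetical and chosen only for this example. They are not taken from LinkNYC, Intersection or the AdWeek article.

```python
from datetime import time

# Hypothetical dayparts: (start, end, label). Start is inclusive, end is exclusive.
DAYPARTS = [
    (time(6, 0), time(10, 0), "morning commute"),
    (time(10, 0), time(16, 0), "midday"),
    (time(16, 0), time(20, 0), "evening commute"),
]


def daypart_for(moment):
    """Return the label of the daypart containing `moment`, or "overnight" if none matches."""
    for start, end, label in DAYPARTS:
        if start <= moment < end:
            return label
    return "overnight"


segment = daypart_for(time(8, 30))  # "morning commute"
```

A kiosk's ad rotation could then look up which creative to show for the current segment. Hypertargeting would go further by also conditioning on data about the audience near a given kiosk, which is where the data collection questions discussed later in this post come in.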

Barry Frey, the president and CEO of the Digital Place-based Advertising Association, was also along for the tour of the new kiosks on Third Avenue. He was “impressed” by the capability it will offer advertisers to “co-locate their signs and fund services to the public” for such services as free WiFi and long-distance calling.

As to the brand marketers:

  • MillerCoors is using information at each kiosk location from Shazam, for the company’s “Sounds of the Street” ad campaign which presents “lists of the most-Shazammed tunes in the area”. (For more about Shazam, see the December 10, 2014 Subway Fold post entitled Is Big Data Calling and Calculating the Tune in Today’s Global Music Market?)
  • Poland Spring is now running a 5-week campaign featuring a digital ad (as seen in the third photo above). It relies upon “the brand’s popularity in New York”.

Capturing and Interpreting the Network’s Data

Link.NYC WiFi Kiosk 1, Image by Alan Rothman

Thus far, LinkNYC has been “a little vague” about its methods for capturing the network’s data, but has said that it will maintain the privacy of all consumers’ information. One source has indicated that LinkNYC will collect, among other points, “age, gender and behavioral data”. As well, the kiosks can track mobile devices within their WiFi radius, which varies from 150 to 400 feet, to ascertain the length of time a user stops by. Third-party data is also being added to “round out the information”.³

Some industry experts’ expectations of the value and applications of this data include:

  • Helma Larkin, the CEO of Posterscope, a New York-based firm specializing in “out-of-home communications (OOH)”, believes that LinkNYC is an entirely “new out-of-home medium”. This is because the data it will generate “will enhance the media itself”. The LinkNYC initiative presents an opportunity to build this network “from the ground up”. It will also create an opportunity to develop data about its own audience.
  • David Krupp of Kinetic thinks that the data generated will be quite meaningful insofar as producing a “more hypertargeted connection to consumers”.

Other US and International Smart City Initiatives

Currently in the US, there is nothing else yet approaching the scale of LinkNYC. Nonetheless, Kansas City is now developing a “smaller advertiser-supported network of kiosks” with wireless support from Sprint. Other cities are also working on smart city projects. Civiq is now in discussions with about 20 of them.

Internationally, Rio de Janeiro is working on a smart city program in conjunction with the 2016 Olympics. This project is being supported by Renato Lucio de Castro, a consultant on smart city projects. (Here is a brief video of him describing this undertaking.)

A key challenge facing all smart city projects is finding officials in local governments who share the enthusiasm for efforts like LinkNYC. Michael Lake, the CEO of Leading Cities, a firm that helps cities with smart city projects, believes that programs such as LinkNYC will “continue to catch on” because of the additional security benefits they provide and the revenues they can generate.

My Questions

  • Should domestic and international smart cities cooperate to share their resources, know-how and experience for each other’s mutual benefit? Might this in some small way help to promote urban growth and development on a more cooperative global scale?
  • Should LinkNYC also consider offering civic support services such as voter registration or transportation scheduling apps as well as charitable functions where pedestrians can donate to local causes?
  • Should LinkNYC add some augmented reality capabilities to enhance the data capabilities and displays of the kiosks? (See these 10 Subway Fold posts covering a range of news and trends on this technology.)

February 19, 2017 Update:  For the latest status report on LinkNYC nearly a year after this post was first uploaded, please see After Controversy, LinkNYC Finds Its Niche, by Gerald Schifman, on CrainsNewYork.com, dated February 15, 2017.


1.   While Googling “smart cities” might nearly cause the Earth to shift off its axis with its resulting 70 million hits, I suggest reading a very informative and timely feature from the December 11, 2015 edition of The Wall Street Journal entitled As World Crowds In, Cities Become Digital Laboratories, by Robert Lee Hotz.

2.   Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia (W. W. Norton & Company, 2013), by Anthony M. Townsend, is a deep and wide book-length exploration of how big data and analytics are being deployed in large urban areas by local governments and independent citizens. I very highly recommend reading this fascinating exploration of the nearly limitless possibilities for smart cities.

3.   See, for example, How Publishers Utilize Big Data for Audience Segmentation, by Arvid Tchivzhel, posted on Datasciencecentral.com on November 17, 2015.


These items just in from the Pop Culture Department: It would seem nearly impossible to film an entire movie thriller about a series of events centered around a public phone, but a movie called – – not so surprisingly – – Phone Booth managed to do this quite effectively in 2002. It starred Colin Farrell, Kiefer Sutherland and Forest Whitaker. Imho, it is still worth seeing.

Furthermore, speaking of Kiefer Sutherland, Fox announced on January 15, 2016 that it will be making 24: Legacy, a complete reboot of the 24 franchise, this time without him playing Jack Bauer. Rather, they have cast Corey Hawkins in the lead role. Hawkins can now be seen doing an excellent job playing Heath on season 6 of The Walking Dead. Watch out Grimes Gang, here comes Negan!!


NASA is Providing Support for Musical and Humanitarian Projects

"NASA - Endeavor 2", Image by NASA

In two recent news stories, NASA has generated a world of good will and positive publicity about itself and its space exploration program. It would be an understatement to say their results have been both well-grounded and out of this world.

First, NASA astronaut Chris Hadfield created a vast following for himself online when he uploaded a video onto YouTube of him singing David Bowie’s classic Space Oddity while on a mission on the International Space Station (ISS).¹ As reported on the October 7, 2015 CBS Evening News broadcast, Hadfield will be releasing an album of 12 songs he wrote and performed in space today, October 9, 2015. He also previously wrote a best-selling book entitled An Astronaut’s Guide to Life on Earth: What Going to Space Taught Me About Ingenuity, Determination, and Being Prepared for Anything (Little, Brown and Company, 2013). I highly recommend checking out his video, book and Twitter account @Cmdr_Hadfield.

What a remarkably accomplished career in addition to his becoming an unofficial good will ambassador for NASA.

The second story, further enhancing the agency’s reputation, concerns a very positive program affecting many lives that was reported in a most interesting article on Wired.com on September 28, 2015 entitled How NASA Data Can Save Lives From Space by Issie Lapowsky. I will summarize and annotate it, and then pose some of my own terrestrial questions.

Agencies’ Partnership

According to NASA Administrator Charles Bolden, astronauts frequently look down at the Earth from space and realize that borders across the world are subjectively imposed by warfare or wealth. These dividing lines between nations seem to become less meaningful to them while they are in flight. Instead, the astronauts tend to look at the Earth and have a greater awareness of everyone’s responsibilities to each other. Moreover, they wonder what they can possibly do when they return to make some sort of meaningful difference on the ground.

Bolden recently shared this experience with an audience at the United States Agency for International Development (USAID) in Washington, DC, to explain the reasoning behind a decade-long partnership between NASA and USAID. (The latter is the US government agency responsible for the administration of US foreign aid.) At first, this would seem to be an unlikely joint operation between two government agencies that do not seem to have that much in common.

In fact, this combination provides “a unique perspective on the grave need that exists in so many places around the world”, and a special case where one agency sees it from space and the other one sees it on the ground.

They are joined together in a partnership known as SERVIR, where NASA supplies “imagery, data, and analysis” to assist developing nations. They help these countries with forecasting and dealing “with natural disasters and the effects of climate change”.

Partnership’s Results

Among others, SERVIR’s tools have produced the following representative results:

  • Predicting floods in Bangladesh, which gives citizens a total of eight days’ notice in order to make preparations that will save lives. This reduced the number of deaths to 17 during last year’s monsoon season, whereas previously it had been in the thousands.
  • Predicting forest fires in the Himalayas.
  • For Central America, NASA created a map of ocean chlorophyll concentration that assisted public officials in identifying and improving shellfish testing in order to deal with “micro-algae outbreaks” responsible for causing significant health issues.

SERVIR currently operates in 30 countries. As a part of their network, there are regional hubs working with “local partners to implement the tools”. Last week it opened such a hub in Asia’s Mekong region. Both NASA and USAID are hopeful that the number of such hubs will continue to grow.

Google is also assisting with “life saving information from satellite imagery”. They are doing this by applying artificial intelligence (AI)² capabilities to Google Earth. This project is still in its preliminary stages.

My Questions

  • Should SERVIR reach out to the space agencies and humanitarian organizations of other countries to explore similar types of humanitarian joint ventures?
  • Do the space agencies of other countries have similar partnerships with their own aid agencies?
  • Would SERVIR benefit from partnerships with other US government agencies? Similarly, would it benefit from partnering with other humanitarian non-governmental organizations (NGOs)?
  • Would SERVIR be the correct organization to provide assistance in global environmental issues? Take for example the report on the October 8, 2015 CBS Evening News network broadcast of the story about the bleaching of coral reefs around the world.

 


1.  While Hadfield’s cover and Bowie’s original version of Space Oddity are most often associated in pop culture with space exploration, I would like to suggest another song that also captures this spirit and then truly electrifies it: Space Truckin’ by Deep Purple. This appeared on their Machine Head album which will be remembered for all eternity because it included the iconic Smoke on the Water. Nonetheless, Space Truckin’ is, in my humble opinion, a far more propulsive tune than Space Oddity. Its infectious opening riff will instantly grab your attention while the rest of the song races away like a Saturn Rocket reaching for escape velocity. Furthermore, the musicianship on this recording is extraordinary. Pay close attention to Ritchie Blackmore’s scorching lead guitar and Ian Paice’s thundering drums. Come on, let’s go space truckin’!

2. These eight Subway Fold posts cover AI from a number of different perspectives involving a series of different applications and markets.