Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of the 50 US states, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to passing their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bar of each state. Violations can lead to a range of professional sanctions and, in severe cases depending upon the facts, suspension from practice for a term of years or even permanent disbarment.

In other professions, including medicine and accounting among others, similar codes of ethics exist and are expected to be scrupulously followed. These are deliberate efforts to ensure honesty, quality, transparency and integrity in each industry’s dealings with the public, and to address certain defined breaches. Many professional trade organizations also have formal codes of ethics but often have little, if any, sanction authority.

Should comparable guidelines and review boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on Wired.com on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You, by Sharon Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to it, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, profiles containing highly personal information of 70,000 users of the online dating site OKCupid. These users were quite angered by this. OKCupid is pursuing a legal claim to have this data removed.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion”.¹ Many users were likewise upset when they found out. The journal that published this study later released an “expression of concern”.

Users’ reactions to such incidents can shape where subsequent “ethical boundaries” are drawn.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers had only a limited role in it.

Both of these incidents illustrate how “untested the ethics” of this kind of big data research remain. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This stands in stark contrast to the controls and guidelines imposed upon medical research in clinical trials.

The Applicability of The Common Rule and Institutional Review Boards

In the US, under The Common Rule, which governs ethics for federally funded biomedical and behavioral research involving humans, studies are required to undergo an ethical review. However, such review does not apply a “unified system”; rather, each university maintains its own institutional review board (IRB). These boards are composed of other (mostly medical) researchers at each university. Only a few of their members “are professional ethicists”.

Fewer still have experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. There are hundreds of IRBs in the US, but each is dealing with “research efforts in the digital age” in its own way.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations covering any research on human subjects it conducted. All other federal agencies adopted such regulations as well. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are ill-suited to their types of research, where the risks involved “are usually more subtle than life or death”. For example, if you are seeking volunteers to take a survey on test-taking behaviors, the IRB’s required language about physical risks does not fit the realities of such a study.

Social science organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, the AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that researchers themselves decide whether “their minimal risk work needs IRB approval or not”. In turn, this would free up more of IRBs’ time for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Commingled

Dr. Elizabeth A. Buchanan, an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase”, in which researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture raises issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal that involved merging mentions of a drug by its street name appearing on social media with public crime data. As a result, people involved in crimes could potentially be identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of such an undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.
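
To make that mitigation step concrete, here is a minimal sketch, using entirely hypothetical column names and records, of what stripping identifiers from such a merged dataset before public release might look like. Note that in practice, hashing and coarsening alone are often not enough to prevent re-identification.

```python
# A minimal sketch, with hypothetical column names, of stripping direct
# identifiers from a merged social-media/crime dataset before public release.
import hashlib

import pandas as pd

# Hypothetical merged records: social media mentions joined with crime data.
records = pd.DataFrame({
    "username":   ["user123", "user456"],
    "post_text":  ["saw some 'skittles' downtown", "more 'skittles' near 5th"],
    "latitude":   [44.8755, 44.8761],
    "longitude":  [-91.9193, -91.9201],
    "crime_type": ["possession", "distribution"],
})

def pseudonymize(value: str, salt: str = "per-study-secret") -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

released = records.copy()
# Replace usernames with pseudonyms so rows can still be linked to each other.
released["username"] = released["username"].map(pseudonymize)
# Coarsen precise coordinates to ~1 km cells to blunt location re-identification.
released["latitude"] = released["latitude"].round(2)
released["longitude"] = released["longitude"].round(2)
# Drop free text entirely; quoted posts are trivially searchable.
released = released.drop(columns=["post_text"])

print(released)
```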

As technology continues to advance, such risk evaluation can become more challenging. For instance, in 2013, MIT researchers found that they were able to match up “publicly available DNA sequences” with data about the participants that the “original researchers” had uploaded online, thereby identifying some of those participants.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.
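
The MIT team’s actual technique involved matching genetic markers against genealogy databases; purely as an illustration of the general linkage problem, here is a toy example, with fabricated rows, of how quasi-identifiers left in a “de-identified” dataset can be joined against a public listing.

```python
# A toy illustration, with fabricated example rows, of how quasi-identifiers
# can link a "de-identified" study dataset back to a public directory.
import pandas as pd

# "De-identified" research data: no names, but quasi-identifiers remain.
study = pd.DataFrame({
    "zip":        ["10001", "60614"],
    "birth_year": [1948, 1975],
    "sex":        ["M", "F"],
    "genome_id":  ["SEQ-0042", "SEQ-0107"],
})

# Publicly available directory (e.g., a genealogy or voter-style listing).
public = pd.DataFrame({
    "name":       ["John Doe", "Jane Roe"],
    "zip":        ["10001", "60614"],
    "birth_year": [1948, 1975],
    "sex":        ["M", "F"],
})

# A simple join on the shared quasi-identifiers re-attaches names to genomes.
reidentified = study.merge(public, on=["zip", "birth_year", "sex"])
print(reidentified[["name", "genome_id"]])
```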

Likewise, other types of research organizations, such as open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, someone must assume oversight, but the best body is not likely to be an IRB, because IRBs lack the necessary “expertise in de-identification and re-identification techniques”.

Different Perspectives on Big Data Research

Dr. Katie Shilton, a technology researcher at the University of Maryland⁴, recently conducted interviews of “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent”. The group also reported that the ethical review boards they dealt with had never questioned their ethics, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines. However, the IRBs that “actually have power” are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two examples among numerous others, they run A/B tests⁵ of news headlines and of supermarket checkout lines. As they hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.
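
For readers curious what such an experiment involves, here is a minimal sketch, with purely illustrative numbers, of the arithmetic behind a simple A/B headline test: a two-proportion z-test on click-through counts.

```python
# A minimal sketch of the arithmetic behind an A/B headline test:
# a two-proportion z-test on click-through counts (illustrative numbers).
from math import sqrt

from scipy.stats import norm

clicks_a, views_a = 310, 10_000   # headline A
clicks_b, views_b = 370, 10_000   # headline B

p_a, p_b = clicks_a / views_a, clicks_b / views_b
p_pool = (clicks_a + clicks_b) / (views_a + views_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test

print(f"CTR A={p_a:.3%}, CTR B={p_b:.3%}, z={z:.2f}, p={p_value:.4f}")
```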

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of the curriculum in data science programs at universities? If so, should the instructors be selected only from the technology industry, or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?

 


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

“Technographics” – A New Approach for B2B Marketers to Profile Their Customers’ Tech Systems

"Gold Rings - Sphere 1" Image by Linda K

“Gold Rings – Sphere 1” Image by Linda K

Today’s marketing and business development professionals use a wide array of big data collection and analytical tools to create and refine sophisticated profiles of market segments and their customer bases. These are deployed in order to systematically and scientifically target and sell their goods and services in steadily changing marketplaces.

These processes can include, among a multitude of other vast data sets and methodologies, demographics, web user metrics and econometrics. Businesses are always looking for a data-driven edge in highly competitive sectors and such profiling, when done correctly, can be very helpful in detecting and interpreting market trends, and consistently keeping ahead of their rivals. (The Subway Fold category of Big Data and Analytics now contains 50 posts about a variety of trends and applications in this field.)

To this I will briefly add my own long-term yet totally unscientific study of office-mess-ographics. Here I have been looking for any correlation between the relative states of organization, or entropy, in people’s offices and the quality and output of their work. The results still remain inconclusive after years of study.

One of the most brilliant and accomplished people I have ever known had an office that resembled a cave deep in the earth, with piles of paper resembling stalagmites all over it. Even more remarkably, he could reach into any one of those piles and pull out exactly the documents he wanted. His work space was so chaotic that there was a long-standing joke that Jimmy Hoffa’s and Judge Crater’s long-lost remains would be found whenever he retired and his office was cleaned out.

Speaking of office-focused analytics, an article posted on VentureBeat.com on March 5, 2016, entitled CMOs: ‘Technographics’ is the New Demographics, by Sean Zinsmeister, brought news of a most interesting new trend. I highly recommend reading this in its entirety. I will summarize and add some context to it, and then pose a few question-ographics of my own.

New Analytical Tool for B2B Marketers

Marketers are now using a new methodology called technographics to analyze their customers’ “tech stack”, a term of art for the composition of their supporting systems and platforms. The objective of this approach is to understand deeply what a tech stack says about a company and, moreover, how this can be used in business-to-business (B2B) marketing campaigns. Thus applied, technographics can identify “pain points” in products and alleviate them for current and prospective customers.

Using established consumer marketing methods, there is much to be learned and leveraged about how technology is being used by very granular segments of user bases. For example:

By virtue of this type of technographic data, retailers can target their ads in anticipation of “which customers are most likely to shop in store, online, or via mobile”.

Next, by transposing this well-established consumer marketing approach onto B2B commerce, the objective is to carefully examine the tech stacks of current and future customers in order to gain a marketing advantage. That is, to “inform” a business’s strategy and identify potential new roles and needs to be met. These corporate tech stacks can include systems for:

  • Office productivity
  • Project management
  • Customer relationship management (CRM)
  • Marketing

Gathering and Interpreting Technographic Signals and Nuances

Technographics can provide unique and valuable insights into assessing, for example, whether a customer values scalability or ease-of-use more, and then act upon this.

As well, some of these technographic signals can be indicative of other factors not, per se, directly related to technology. This was the case at Eloqua, a marketing automation firm. They noticed that their marketing systems had predictive value in determining the company’s best prospects. Furthermore, they determined that companies running their software tended “to have a certain level of technological sophistication”, and were often large enough to have the capacity to purchase higher-end systems.

As business systems continually grow in their numbers and complexity, interpreting technographic nuances has also become more of a challenge. Hence, the application of artificial intelligence (AI) can be helpful in detecting additional useful patterns and trends. In a July 2011 TED Talk directly on point here, entitled How Algorithms Shape Our World, Kevin Slavin discussed how algorithms and machine learning are needed today to help make sense of the massive and constantly growing amounts of data. (The Subway Fold category of Smart Systems contains 15 posts covering recent developments and applications involving AI and machine learning.)
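
As a rough illustration of what such machine-assisted pattern detection might look like, here is a short sketch, with made-up companies and tools, that clusters firms by which systems appear in their tech stacks.

```python
# A sketch, with made-up companies and tools, of a simple machine learning
# step surfacing patterns in technographic data: clustering firms by stack.
import pandas as pd
from sklearn.cluster import KMeans

stacks = pd.DataFrame(
    [
        # aws, salesforce, marketo, shopify, google_analytics
        [1, 1, 1, 0, 1],   # enterprise-leaning stack
        [1, 1, 0, 0, 1],
        [0, 0, 0, 1, 1],   # e-commerce-leaning stack
        [0, 0, 0, 1, 0],
    ],
    columns=["aws", "salesforce", "marketo", "shopify", "google_analytics"],
    index=["AcmeCo", "BetaInc", "GammaShop", "DeltaStore"],
)

model = KMeans(n_clusters=2, n_init=10, random_state=0)
stacks["segment"] = model.fit_predict(stacks)
print(stacks["segment"])  # each firm assigned to a tech-stack segment
```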

Technographic Resources and Use Cases

Currently, technographic signals are readily available from various data providers. These providers parse data using such factors as “web hosting, analytics, e-commerce, advertising, or content management platforms”. Another firm, Ghostery, offers a Chrome browser extension that illuminates the technologies upon which any company’s website is built.
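
As a rough sketch of how such parsing might work in principle (this is not any vendor’s actual method, and the signatures are only illustrative), the following example pattern-matches a homepage’s HTML against a handful of known technology fingerprints.

```python
# A minimal sketch of fingerprinting the technologies behind a website
# by pattern-matching its homepage HTML against known signatures.
import re
import urllib.request

# A few illustrative signatures; real providers maintain thousands.
SIGNATURES = {
    "Google Analytics": r"google-analytics\.com|gtag\(",
    "WordPress":        r"wp-content|wp-includes",
    "Shopify":          r"cdn\.shopify\.com",
    "jQuery":           r"jquery[.\-]",
}

def fingerprint(url: str) -> list[str]:
    """Return names of known technologies whose signatures appear in the page."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return [name for name, pattern in SIGNATURES.items()
            if re.search(pattern, html, re.IGNORECASE)]

print(fingerprint("https://example.com"))  # likely [] for this bare page
```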

The next key considerations are to “define technographic profiles and determine next-best actions” for specific potential customers. For instance, an analytics company called Looker creates “highly targeted campaigns” aimed at businesses that use Amazon Web Services (AWS). The more marketers undertake similar pursuits, the more they raise the value of their marketing programs.

Technographics can likewise be applied for competitive leverage in the following use cases:

  • Sales reps prospecting for new leads can be supported with more focused messages for potential new customers. These are shaped by understanding their particular motivations and business challenges.
  • Locating opportunities in new markets can be achieved by assessing the tech stacks of prospective customers. Such analytics can further be used for expanding business development and product development. An example is the online training platform by Mindflash. They detected a potential “demand for a Salesforce training program”. Once it became available, they employed technographic signals to pinpoint customers to whom they could present it.
  • Enterprise-wide decision-making benefits can be achieved by adding “value in areas like cultural alignment”. Familiarity with such data for current employees and job seekers can aid businesses in understanding the “technology disposition” of their workers. Thereafter, its alignment with that of “customers or partners” can be pursued. Furthermore, identifying areas where additional training might be needed can help to alleviate productivity issues resulting from “technology disconnects between employees”.

Many businesses are not yet using technographic signals to their full advantage. By increasing such initiatives, businesses can acquire a much deeper understanding of the value inherent in these signals. The resulting insights can have a significant effect on the experiences of their customers and, in turn, elevate levels of loyalty, retention and revenue, as well as the magnitude of deals done.

My Questions

  • Would professional service industries such as law, medicine and accounting, and the vendors selling within these industries, benefit from integrating technographics into their own business development and marketing efforts?
  • Could there be, now or in the future, an emerging role for dedicated technographics specialists, trainers and consultants? Alternatively, should these new analytics just be treated as another new tool to be learned and implemented by marketers in their existing roles?
  • If a company identifies some of their own employees who might benefit from additional training, how can they be incentivized to participate in it? Could gamification techniques also be applied in creating these training programs?
  • What, if any, privacy concerns might surface in using technographics on potential customer leads and/or a company’s own internal staff?

New Chips are Using Deep Learning to Enhance Mobile, Camera and Auto Image Processing Capabilities

"Smartphone Photography", Image by AvenueTheory

“Smartphone Photography”, Image by AvenueTheory

We interface with our devices’ screens for inputs and outputs nearly all day, every day. What many of these gadgets will soon be able to display and, moreover, understand about digital imagery is about to take a significant leap forward. This will be due to the pending arrival of new chips, embedded into their circuitry, that are enabled by artificial intelligence (AI) algorithms. Let’s have a look.

This story was reported in a most interesting article on TechnologyReview.com entitled Silicon Chips That See Are Going to Make Your Smartphone Brilliant, by Tom Simonite, on May 14, 2015. I will summarize, annotate and pose some questions about it.

The key technology behind these new chips is an AI methodology called deep learning. In these 10 recent Subway Fold posts, deep learning has been covered in a range of applications across various online and real-world marketplaces including, among others, entertainment, news, social media, law, medicine, finance and education. The emergence of these smarter new chips will likely bring significant further enhancements to all of them, and many others, in their ability to comprehend the content of images.

Two major computer chip companies, Synopsys and Qualcomm, and the Chinese search firm Baidu, are developing systems based upon deep learning for mobile devices, autos and other screen-based hardware. These were discussed by the companies’ representatives at the Embedded Vision Summit held on Tuesday, May 12, 2015, in Santa Clara, California. Among the speakers were:

  • Pierre Paulin, the director of Research and Development at Synopsys, who presented a demo of a new chip core that “recognized speed limit signs” on the road for vehicles and enabled facial recognition for security apps. This chip uses less power than current chips on the market and, moreover, could add some “visual intelligence” to phone and car apps, and security cameras. (Here is the link to the abstracts of the presentations, listed by speaker, including Mr. Paulin’s, entitled Low-power Embedded Vision: A Face Tracker Case Study, from the Summit’s website.)
  • Ren Wu, Distinguished Scientist, Baidu Institute of Deep Learning, said that deep learning-based chips are important for computers used for research, and called for making such intelligence as ubiquitous as possible. (Here is the link to the abstracts of the presentations, listed by speaker including Mr. Wu’s, entitled Enabling Ubiquitous Visual Intelligence Through Deep Learning from the Summit’s website.)

Both Wu and Qualcomm’s representative, Jeff Gehlhaar, said that adding more intelligence to a mobile device’s ability to recognize photos could be used to address the privacy implications of some apps by lessening the quantity of personal data they upload to the web.
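
As a rough sketch of that privacy pattern, the following example runs image recognition locally so that only a class label, never the photo itself, would be transmitted. It uses an off-the-shelf modern model as a stand-in for the dedicated vision chips discussed above; the file name and the commented-out upload step are hypothetical.

```python
# A rough sketch of on-device recognition: classify a photo locally and
# transmit only the resulting label, never the raw image.
import torch
from PIL import Image
from torchvision import models, transforms

# Off-the-shelf MobileNetV2 as a stand-in for a dedicated vision chip.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify_locally(path: str) -> int:
    """Classify a photo on-device; only the class index would be uploaded."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(image)
    return int(logits.argmax(dim=1))

# upload(classify_locally("photo.jpg"))  # hypothetical: the image never leaves the device
```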

My Questions

  • Whether and how should social networks employ these chips? For example, what if such visually intelligent capabilities were added to the recently rolled out live video apps Periscope and Meerkat on Twitter?
  • Will these chips be adapted to the forthcoming commercial augmented and virtual reality systems (as discussed in these five recent Subway Fold posts)? If so, what new capabilities might they add to these environments?
  • What additional privacy and security concerns will need to be addressed by manufacturers, consumers and regulators as these chips are introduced into their respective marketplaces?