Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of 50 states in the US, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bars of each state. Violations can potentially lead to a series of professional sanctions and, in severe cases depending upon the facts, disbarment from practice for a term of years or even permanently.

In other professions, including medicine and accounting among others, similar codes of ethics exist and are expected to be scrupulously followed. They are deliberate efforts to ensure honesty, quality, transparency and integrity in their industries’ dealings with the public, and to address certain defined breaches. Many professional trade organizations also have formal codes of ethics but often have little, if any, sanction authority.

Should some comparable forms of guidelines and boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on Wired.com on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You, by Sarah Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to this, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, the profiles, containing very personal information, of 70,000 users of the online dating site OKCupid. These users were quite angered by this. OKCupid is pursuing a legal claim to have this data removed.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion”.¹ Many users were likewise upset when they found out. The journal that published this study later released an “expression of concern”.

Users’ reactions over such incidents can have an effect upon subsequent “ethical boundaries”.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers only had a limited role in it.

Both of these incidents illustrate how “untested the ethics” of big data research remain. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This stands in stark contrast to the controls and guidelines governing medical research in clinical trials.

The Applicability of the Common Rule and Institutional Review Boards

In the US, under the Common Rule, which governs ethics for federally funded biomedical and behavioral research involving humans, studies are required to undergo an ethical review. However, such reviews are not conducted under a “unified system”; rather, each university maintains its own institutional review board (IRB). These boards are composed of other (mostly medical) researchers at each university. Only a few of them “are professional ethicists”.

Fewer still have experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. In the US, there are hundreds of IRBs, but each is dealing with “research efforts in the digital age” in its own way.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations concerning any research it conducted on human subjects. All other federal agencies likewise adopted such regulations. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are not appropriate for their types of research, where the risks involved “are usually more subtle than life or death”. For example, for a study seeking volunteers to take a survey on test-taking behaviors, the IRB language requirements on physical risks do not fit the actual circumstances of the participants.

Social scientist organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that the researchers themselves should decide if “their minimal risk work needs IRB approval or not”. In turn, this would make more time available to IRBs for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Commingled

Dr. Elizabeth A. Buchanan, an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase”, where researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture raises issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal involving merging mentions of a drug by its street name appearing on social media with public crime data. As a result, people involved in crimes could potentially be identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of such an undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.
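To make this re-identification concern concrete, here is a minimal sketch, assuming an entirely invented pair of datasets (every field name and record below is hypothetical), of how joining two individually innocuous data sources on shared quasi-identifiers can point back to a person, and how stripping or coarsening identifiers before release, along the lines Dr. Buchanan describes, reduces that risk:

```python
# Hypothetical sketch of the re-identification risk an IRB weighs when two
# "harmless" datasets are merged. All field names and records are invented.
import pandas as pd

# Social media posts mentioning a drug by its street name (no real names).
posts = pd.DataFrame([
    {"handle": "@user123", "neighborhood": "Elm Park", "date": "2016-03-01"},
])

# Public crime-incident data released by a city open-data portal.
incidents = pd.DataFrame([
    {"neighborhood": "Elm Park", "date": "2016-03-01", "charge": "possession"},
])

# A join on quasi-identifiers (place + date) links a pseudonymous handle
# to a specific incident, and that handle's other posts may reveal a name.
linked = posts.merge(incidents, on=["neighborhood", "date"])
print(linked)

# Mitigation: strip the handle and coarsen the date before any public release.
released = linked.drop(columns=["handle"]).assign(date=lambda d: d["date"].str[:7])
print(released)  # neighborhood + month only, no handle
```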

As technology continues to advance, such risk evaluations can become more challenging. For instance, in 2013, MIT researchers found that they were able to match up “publicly available DNA sequences” by using data about the participants that the “original researchers” had uploaded online.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.

Likewise, other types of research organizations, such as open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, someone must assume oversight, but an IRB is not likely to be the best means because it lacks the necessary “expertise in de-identification and re-identification techniques”.

Different Perspectives on Big Data Research

Dr. Katie Shilton, a technology researcher at the University of Maryland⁴, recently conducted interviews of “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent”. The group also reported that the ethical review boards they dealt with never questioned the ethics of their work, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines. However, the IRBs that “actually have power” are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two examples among numerous others, these focus on A/B testing⁵ of news headlines and supermarket checkout lines. As companies hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of curriculum in data science programs at universities? If so, should the instructors only be selected from the technology industry or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?

 


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

Mary Meeker’s 2016 Internet Trends Presentation

"Blue Marble - 2002", Image by NASA Goddard Space Flight Center

“Blue Marble – 2002”, Image by NASA Goddard Space Flight Center

On June 1, 2016, at the 2016 Code Conference in California, Mary Meeker, a world-renowned Internet expert and partner in the venture capital firm Kleiner Perkins, gave her fifteenth annual in-depth and highly analytical presentation on current Internet trends. It is an absolutely remarkable accomplishment that is highly respected throughout the global technology industry and economy. The video of her speech is available here on Recode.com.

Her 2016 Internet Trends presentation file is divided into a series of eight main sections covering, among many other things: Internet user and financial growth rates, online advertising, generational market segments and technological preferences, new products and vendors, mobile screens for nearly everything, e-commerce, big data, privacy issues, video growth on social media platforms, messaging systems, smartphone growth, voice interfaces, consumer spending, online security, connectivity, Facebook’s vs. Google’s growth rates, and massive consumer markets in China and India. That is just the tip of the tip of the iceberg in this 213-slide file.

Ms. Meeker’s assessments and predictions form an extraordinarily comprehensive and insightful piece of work. There is much here for anyone and everyone to learn and consider about the current and trending states of nearly anything and everything online. Moreover, there are likely many potential opportunities for new and established businesses, as well as other institutions, within this file.

I very highly recommend that you set aside some time to thoroughly read through Ms. Meeker’s full presentation. You will be richly rewarded with knowledge and insight that can potentially yield a world of informative and practical dividends.

“Technographics” – A New Approach for B2B Marketers to Profile Their Customers’ Tech Systems

"Gold Rings - Sphere 1" Image by Linda K

“Gold Rings – Sphere 1” Image by Linda K

Today’s marketing and business development professionals use a wide array of big data collection and analytical tools to create and refine sophisticated profiles of market segments and their customer bases. These are deployed in order to systematically and scientifically target and sell their goods and services in steadily changing marketplaces.

These processes can include, among a multitude of other vast data sets and methodologies, demographics, web user metrics and econometrics. Businesses are always looking for a data-driven edge in highly competitive sectors and such profiling, when done correctly, can be very helpful in detecting and interpreting market trends, and consistently keeping ahead of their rivals. (The Subway Fold category of Big Data and Analytics now contains 50 posts about a variety of trends and applications in this field.)

To this I will briefly add my own long-term yet totally unscientific study of office-mess-ographics. Here I have been looking for any correlation between the relative states of organization (or entropy) in people’s offices and the quality and output of their work. The results still remain inconclusive after years of study.

One of the most brilliant and accomplished people I have ever known had an office that resembled a cave deep in the earth, with piles of paper resembling stalagmites all over it. Even more remarkably, he could reach into any one of those piles and pull out exactly the documents he wanted. His work space was so chaotic that there was a long-standing joke that Jimmy Hoffa’s and Judge Crater’s long-lost remains would be found when he retired and his office was cleaned out.

Speaking of office-focused analytics, an article posted on VentureBeat.com on March 5, 2016, entitled CMOs: ‘Technographics’ is the New Demographics, by Sean Zinsmeister, brought news of a most interesting new trend. I highly recommend reading this in its entirety. I will summarize and add some context to it, and then pose a few question-ographics of my own.

New Analytical Tool for B2B Marketers

Marketers are now using a new methodology called technographics to analyze their customers’ “tech stack”, a term of art for the composition of their supporting systems and platforms. The objective of this approach is to deeply understand what this says about them as a company and, moreover, how it can be used in business-to-business (B2B) marketing campaigns. Thus applied, technographics can identify “pain points” in products and alleviate them for current and prospective customers.

Using established consumer marketing methods, there is much to be learned and leveraged about how technology is being used by very granular segments of user bases. For example, by virtue of this type of technographic data, retailers can target their ads in anticipation of “which customers are most likely to shop in store, online, or via mobile”.

By transposing this well-established marketing approach onto B2B commerce, the objective is to carefully examine the tech stacks of current and future customers in order to gain a marketing advantage. That is, to “inform” a business’s strategy and identify potential new roles and needs to be met. These corporate tech stacks can include systems for:

  • Office productivity
  • Project management
  • Customer relationship management (CRM)
  • Marketing

Gathering and Interpreting Technographic Signals and Nuances

Technographics can provide unique and valuable insights into assessing, for example, whether a customer values scalability or ease-of-use more, and then acting upon this.

As well, some of these technographic signals can be indicative of other factors not, per se, directly related to technology. This was the case at Eloqua, a marketing automation concern. They noticed their marketing systems had predictive value in determining the company’s best prospects. Furthermore, they determined that companies running their software were inclined “to have a certain level of technological sophistication”, and were often large enough to have the capacity to purchase higher-end systems.

As business systems continually grow in their numbers and complexity, interpreting technographic nuances has also become more of a challenge. Hence, the application of artificial intelligence (AI) can be helpful in detecting additional useful patterns and trends. In a July 2011 TED Talk directly on point here, entitled How Algorithms Shape Our World, Kevin Slavin discussed how algorithms and machine learning are needed today to help make sense of the massive and constantly growing amounts of data. (The Subway Fold category of Smart Systems contains 15 posts covering recent developments and applications involving AI and machine learning.)

Technographic Resources and Use Cases

Currently, technographic signals are readily available from various data providers. These firms parse data using such factors as “web hosting, analytics, e-commerce, advertising, or content management platforms”. Another firm, Ghostery, has a Chrome browser extension illuminating the technologies upon which any company’s website is built.
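As a rough illustration of what such parsing involves, here is a minimal sketch of signature-based technology detection; the marker strings and the detection logic below are simplified assumptions for illustration, not any vendor’s actual method:

```python
# A minimal sketch of technographic signal detection: inferring parts of a
# company's tech stack from its public website. The signature strings are
# illustrative simplifications of what commercial providers match on.
import urllib.request

SIGNATURES = {
    "Google Analytics": "google-analytics.com",
    "WordPress": "/wp-content/",
    "Shopify": "cdn.shopify.com",
}

def detect_stack(url: str) -> list[str]:
    with urllib.request.urlopen(url, timeout=10) as resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        html = resp.read(200_000).decode("utf-8", errors="replace")
    found = [name for name, marker in SIGNATURES.items() if marker in html]
    # Server headers can reveal hosting and platform details as well.
    if "server" in headers:
        found.append(f"Server: {headers['server']}")
    return found

print(detect_stack("https://example.com"))
```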

The next key considerations are to “define technographic profiles and determine next-best actions” for specific potential customers. For instance, an analytics company called Looker creates “highly targeted campaigns” aimed at businesses that use Amazon Web Services (AWS). The greater the number of marketers who undertake similar pursuits, the more they raise the value of their marketing programs.

Technographics can likewise be applied for competitive leverage in the following use cases:

  • Sales reps prospecting for new leads can be supported with more focused messages for potential new customers. These are shaped by understanding their particular motivations and business challenges.
  • Locating opportunities in new markets can be achieved by assessing the tech stacks of prospective customers. Such analytics can further be used for expanding business development and product development. An example is the online training platform by Mindflash. They detected a potential “demand for a Salesforce training program”. Once it became available, they employed technographic signals to pinpoint customers to whom they could present it.
  • Enterprise-wide decision-making benefits can be achieved by adding “value in areas like cultural alignment”. Familiarity with such data for current employees and job seekers can aid businesses in understanding the “technology disposition” of their workers. Thereafter, its alignment with that of “customers or partners” can be pursued. Furthermore, identifying areas where additional training might be needed can help to alleviate productivity issues resulting from “technology disconnects between employees”.

Many businesses are not yet using technographic signals to their full advantage. By increasing such initiatives, businesses can acquire a much deeper understanding of their customers’ inherent values. The resulting insights can have a significant effect on the experiences of those customers and, in turn, elevate their levels of loyalty, retention and revenue, as well as the magnitude of deals done.

My Questions

  • Would professional service industries such as law, medicine and accounting, and the vendors selling within these industries, benefit from integrating technographics into their own business development and marketing efforts?
  • Could there be, now or in the future, an emerging role for dedicated technographics specialists, trainers and consultants? Alternatively, should these new analytics just be treated as another new tool to be learned and implemented by marketers in their existing roles?
  • If a company identifies some of their own employees who might benefit from additional training, how can they be incentivized to participate in it? Could gamification techniques also be applied in creating these training programs?
  • What, if any, privacy concerns might surface in using technographics on potential customer leads and/or a company’s own internal staff?

LinkNYC Rollout Brings Speedy Free WiFi and New Opportunities for Marketers to New York

Link.NYC WiFi Kiosk 5, Image by Alan Rothman

Back in the halcyon days of yore before the advent of smartphones and WiFi, there were payphones and phone booths all over the streets of New York. Most have disappeared, but a few scattered survivors have still managed to hang on. An article entitled And Then There Were Four: Phone Booths Saved on Upper West Side Sidewalks, by Corey Kilgannon, posted on NYTimes.com on February 10, 2016, recounts the stories of some of the last lonely public phones.

Taking their place comes a highly innovative new program called LinkNYC (also @LinkNYC and #LinkNYC). This initiative has just begun to roll out across all five boroughs with a network of what will become thousands of WiFi kiosks providing free and way fast web access and phone calling, plus a host of other online NYC support services. The kiosks occupy the same physical spaces as the previous payphones.

The first batch of them has started to appear along Third Avenue in Manhattan. I took the photos accompanying this post of one kiosk at the corner of 14th Street and Third Avenue. While standing there, I was able to connect to the web on my phone and try out some of the LinkNYC functions. My reaction: This is very cool beans!

LinkNYC also presents some potentially great new opportunities for marketers. The launch of the program and the companies getting in on the ground floor were covered in a terrific new article on AdWeek.com on February 15, 2016, entitled What It Means for Consumers and Brands That New York Is Becoming a ‘Smart City’, by Janet Stilson. I recommend reading it in its entirety. I will summarize and annotate it to add some additional context, and pose some of my own ad-free questions.

LinkNYC Set to Proliferate Across NYC

Link.NYC WiFi Kiosk 2, Image by Alan Rothman

When completed, LinkNYC will give New York a highly advanced mobile network spanning the entire city. Moreover, it will help to transform it into a very well-wired “smart city”.¹ That is, an urban area comprehensively collecting, analyzing and optimizing vast quantities of data generated by a wide array of sensors and other technologies. It is a network, and a host of network effects, through which a city learns about itself and leverages this knowledge for the multiple benefits of its citizenry.²

Beyond mobile devices and advertising, smart cities can potentially facilitate many other services. The consulting firm Frost & Sullivan predicts that there will be 26 smart cities across the globe by 2025. Currently, everyone is looking to NYC to see how the implementation of LinkNYC works out.

According to Mike Gamaroff, the head of innovation in the New York office of Kinetic Active, a global media and marketing firm, LinkNYC is primarily a “utility” for New Yorkers as well as “an advertising network”. Its throughput rates are at gigabit speeds, making it the fastest web access available when compared to large commercial ISPs’ average rates of merely 20 to 30 megabits per second.

Nick Cardillicchio, a strategic account manager at Civiq Smartscapes, the designer and manufacturer of the LinkNYC kiosks, said that LinkNYC is the only place where consumers can access the Net at such speeds. For the AdWeek.com article, he took the writer, Janet Stilson, on a tour of the kiosks, including the one at Third Avenue and 14th Street, where one of the first ones is in place. (Coincidentally, this is the same kiosk I photographed for this post.)

There are a total of 16 kiosks currently operational for the initial testing. The WiFi web access is accessible within 150 feet of a kiosk and can range up to 400 feet. Perhaps those New Yorkers actually living within this range will soon no longer need their commercial ISPs.

Link.NYC WiFi Kiosk 4, Image by Alan Rothman

The initial advertisers appearing in rotation on the large digital screen include Poland Spring (see the photo at the right), MillerCoors, Pager and Citibank. Eventually “smaller tablet screens” will be added to enable users to make free domestic voice or video calls. As well, they will present maps, local activities and emergency information in and about NYC. Users will also be able to charge up their mobile devices.

However, it is still too soon to assess and quantify the actual impact for these advertisers. According to David Krupp, CEO, North America, for Kinetic, neither Poland Spring nor MillerCoors has yet produced an adequate amount of data to analyze their respective LinkNYC ad campaigns. (Kinetic is involved in supporting their marketing activities.)

Commercializing the Kiosks

The organization managing LinkNYC, the CityBridge consortium (consisting of Qualcomm, Intersection, and Civiq Smartscapes), is not yet indicating when the new network will progress into a more “commercial stage”. However, once the network is fully implemented within the next few years, the number of kiosks might end up being somewhere between 7,500 and 10,000. That would make it the largest such network in the world.

CityBridge is also in charge of all the network’s advertising sales. These revenues will be split with the city. Under the 12-year contract now in place, this arrangement is predicted to produce $500M for NYC, with positive cash flow anticipated within 5 years. Brad Gleeson, the chief commercial officer at Civiq, said this project depends upon the degree to which LinkNYC is “embraced by Madison Avenue” and the time needed for the network to reach “critical mass”.

Because of the breadth and complexity of this project, achieving this inflection point will be quite challenging according to David Etherington, the chief strategy officer at Intersection. He expressed his firm’s “dreams and aspirations” for LinkNYC, including providing advertisers with “greater strategic and creative flexibility”, offering such capabilities as:

  • Dayparting – dividing a day’s advertising into several segments dependent on a range of factors about the intended audience, and
  • Hypertargeting – delivering advertising to very highly defined segments of an audience

Barry Frey, the president and CEO of the Digital Place-based Advertising Association, was also along for the tour of the new kiosks on Third Avenue. He was “impressed” by the capability it will offer advertisers to “co-locate their signs and fund services to the public” for such services as free WiFi and long-distance calling.

As to the brand marketers:

  • MillerCoors is using information at each kiosk location from Shazam, for the company’s “Sounds of the Street” ad campaign which presents “lists of the most-Shazammed tunes in the area”. (For more about Shazam, see the December 10, 2014 Subway Fold post entitled Is Big Data Calling and Calculating the Tune in Today’s Global Music Market?)
  • Poland Spring is now running a 5-week campaign featuring a digital ad (as seen in the third photo above). It relies upon “the brand’s popularity in New York”.

Capturing and Interpreting the Network’s Data

Link.NYC WiFi Kiosk 1, Image by Alan Rothman

Thus far, LinkNYC has been “a little vague” about its methods for capturing the network’s data, but has said that it will maintain the privacy of all consumers’ information. One source has indicated that LinkNYC will collect, among other points, “age, gender and behavioral data”. As well, the kiosks can track mobile devices within their variable 150-to-400-foot WiFi radius to ascertain the length of time a user stops by. Third-party data is also being added to “round out the information”.³

Some industry experts’ expectations of the value and applications of this data include:

  • Helma Larkin, the CEO of Posterscope, a New York-based firm specializing in “out-of-home communications (OOH)”, believes that LinkNYC is an entirely new out-of-home medium. This is because the data it will generate “will enhance the media itself”. The LinkNYC initiative presents an opportunity to build this network “from the ground up”. It will also create an opportunity to develop data about its own audience.
  • David Krupp of Kinetic thinks that the data to be generated will be quite meaningful insofar as producing a “more hypertargeted connection to consumers”.

Other US and International Smart City Initiatives

Currently in the US, there is nothing else yet approaching the scale of LinkNYC. Nonetheless, Kansas City is now developing a “smaller advertiser-supported network of kiosks” with wireless support from Sprint. Other cities are also working on smart city projects. Civiq is now in discussions with about 20 of them.

Internationally, Rio de Janeiro is working on a smart city program in conjunction with the 2016 Olympics. This project is being supported by Renato Lucio de Castro, a consultant on smart city projects. (Here is a brief video of him describing this undertaking.)

A key challenge facing all smart city projects is finding officials in local governments who likewise have the enthusiasm for efforts like LinkNYC. Michael Lake, the CEO of Leading Cities, a firm that helps cities with smart city projects, believes that programs such as LinkNYC will “continue to catch on” because of the additional security benefits they provide and the revenues they can generate.

My Questions

  • Should domestic and international smart cities cooperate to share their resources, know-how and experience for each other’s mutual benefit? Might this in some small way help to promote urban growth and development on a more cooperative global scale?
  • Should LinkNYC also consider offering civic support services such as voter registration or transportation scheduling apps as well as charitable functions where pedestrians can donate to local causes?
  • Should LinkNYC add some augmented reality capabilities to enhance the data capabilities and displays of the kiosks? (See these 10 Subway Fold posts covering a range of news and trends on this technology.)

February 19, 2017 Update:  For the latest status report on LinkNYC nearly a year after this post was first uploaded, please see After Controversy, LinkNYC Finds Its Niche, by Gerald Schifman, on CrainsNewYork.com, dated February 15, 2017.


1.   While Googling “smart cities” might nearly cause the Earth to shift off its axis with its resulting 70 million hits, I suggest reading a very informative and timely feature from the December 11, 2015 edition of The Wall Street Journal entitled As World Crowds In, Cities Become Digital Laboratories, by Robert Lee Hotz.

2.   Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia (W. W. Norton & Company, 2013), by Anthony M. Townsend, is a deep and wide book-length exploration of how big data and analytics are being deployed in large urban areas by local governments and independent citizens. I very highly recommend reading this fascinating exploration of the nearly limitless possibilities for smart cities.

3.   See, for example, How Publishers Utilize Big Data for Audience Segmentation, by Arvid Tchivzhel, posted on Datasciencecentral.com on November 17, 2015.


These items just in from the Pop Culture Department: It would seem nearly impossible to film an entire movie thriller about a series of events centered around a public phone, but a movie called (not so surprisingly) Phone Booth managed to do this quite effectively in 2002. It starred Colin Farrell, Kiefer Sutherland and Forest Whitaker. Imho, it is still worth seeing.

Furthermore, speaking of Kiefer Sutherland, Fox announced on January 15, 2016 that it will be making 24: Legacy, a complete reboot of the 24 franchise, this time without him playing Jack Bauer. Rather, they have cast Corey Hawkins in the lead role. Hawkins can now be seen doing an excellent job playing Heath on season 6 of The Walking Dead. Watch out Grimes Gang, here comes Negan!!


New IBM Watson and Medtronic App Anticipates Low Blood Glucose Levels for People with Diabetes

"Glucose: Ball-and-Stick Model", Image by Siyavula Education

“Glucose: Ball-and-Stick Model”, Image by Siyavula Education

Can a new app, jointly developed by IBM with its Watson AI technology in partnership with the medical device maker Medtronic, provide a new form of support for people with diabetes by safely avoiding low blood glucose (BG) levels (called hypoglycemia) in advance? If so, and assuming regulatory approval, this technology could potentially be a very significant boon to the care of this disease.

Basics of Managing Blood Glucose Levels

The daily management of diabetes involves a diverse mix of factors including, but not limited to, regulating insulin dosages, checking BG readings, measuring carbohydrate intake at meals, gauging activity and exercise levels, and controlling stress levels. There is no perfect algorithm for doing this, as everyone with this medical condition is different, and each body reacts in its own way when trying to balance all of this while striving to maintain healthy short- and long-term control of BG levels.

Diabetes care today operates in a very data-driven environment. BG levels, expressed numerically, can be checked on a hand-held meter with test strips using a single drop of blood, or with a continuous glucose monitoring system (CGM). The latter consists of a thumb-drive-size sensor attached with temporary adhesive to the skin and a needle, attached to this unit, inserted just below the skin. This system provides patients with frequent real-time readings of their BG levels, and whether they are trending up or down, so they can adjust their medication accordingly. That is, for A grams of carbs and B amounts of physical activity and other contributing factors, C amount of insulin can be calculated and dispensed.
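To give a sense of the arithmetic behind that A-to-C relationship, here is a minimal sketch of the textbook bolus calculation; the ratios below are invented for illustration only, as real values are individualized and set under medical supervision:

```python
# A minimal sketch of the textbook insulin-bolus arithmetic behind
# "A grams of carbs ... C amount of insulin". The ratios here are invented
# for illustration; real values are individualized and medically supervised.

def bolus_units(carbs_g: float, current_bg: float, target_bg: float = 110,
                carb_ratio: float = 12.0, correction_factor: float = 40.0) -> float:
    """carb_ratio: grams of carbs covered per unit of insulin.
    correction_factor: mg/dL of BG lowered per unit of insulin."""
    meal_dose = carbs_g / carb_ratio
    correction = max(current_bg - target_bg, 0) / correction_factor
    return round(meal_dose + correction, 1)

# 60 g of carbs at a reading of 190 mg/dL:
print(bolus_units(60, 190))  # 5.0 + 2.0 = 7.0 units in this toy example
```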

Insulin itself can be administered either manually by injection or by an insulin pump (also with a subcutaneously inserted needle). The latter consists of two devices: the pump itself, a small enclosed device (about the size of a pager) with an infusion needle placed under the patient’s skin, and a Bluetooth-enabled handheld device (that looks just like a smartphone) used to adjust the pump’s dosage and timing of insulin released. Some pump manufacturers are also bringing to market their latest generation of CGMs that integrate their data and command functions with their users’ smartphones.

(The links in the previous two paragraphs are to Wikipedia pages with detailed descriptions and photos of CGMs and insulin pumps. See also this June 27, 2015 Subway Fold post entitled Medical Researchers are Developing a “Smart Insulin Patch” for another glucose sensing and insulin dispensing system under development.)

The trickiest part of all of these systems is maintaining levels of BG throughout each day that are within an acceptable range of values. High levels can result in a host of difficult symptoms. Hypoglycemic low levels can quickly become serious, manifesting as dizziness, confusion and other symptoms, and can ultimately lead to unconsciousness in extreme cases if not treated immediately.

New App for Predicting and Preventing Low Blood Glucose Levels

Taking this challenge to an entirely new level, at last week’s annual Consumer Electronics Show (CES) held in Las Vegas, IBM and Medtronic jointly announced their new app to predict hypoglycemic events in advance. The app is built upon Watson’s significant strengths in artificial intelligence (AI) and machine learning to sift through and intuit patterns in large volumes of data, in this case generated from Medtronic’s user base for their CGMs and insulin pumps. This story was covered in a most interesting article posted in The Washington Post on January 6, 2016 entitled IBM Extends Health Care Bet With Under Armour, Medtronic by Jing Cao and Michelle Fay Cortez. I will summarize and annotate this report and then pose some of my own questions.

The announcement and demo of this new app on January 6, 2016 at CES showed the process by which a patient’s data can be collected from their Medtronic devices and then combined with additional information from their wearable activity trackers and food intake. Next, all of this information is processed through Watson in order to “provide feedback” for the patient to “manage their diabetes”.

Present and Future Plans for The App and This Approach

Making the announcement were Virginia Rometty, Chairman, President and CEO of IBM, and Omar Ishrak, Chairman and CEO of Medtronic. The introduction of this technology is expected in the summer of 2016. It still needs to be submitted to the US government’s regulatory review process.

Ms. Rometty said that the capability to predict low BG events, in some cases up to three hours before they occur, is a “breakthrough”. She described Watson as “cognitive computing”, using algorithms to generate “prescriptive and predictive analysis”. The company is currently making a major strategic move into finding and facilitating applications and partners for Watson in the health care industry. (These eight Subway Fold posts cover various other systems and developments using Watson.)

Hooman Hakami, Executive VP and President of the Diabetes Group at Medtronic, described how his company is working to “anticipate” how the behavior of each person with diabetes affects their blood glucose levels. With this information, patients can then “make choices to improve their health”. Here is the page from the company’s website about their partnership with IBM to work together on treating diabetes.

In the future, both companies are aiming to “give patients real-time information” on how their individual data is influencing their BG levels and “provide coaching” to assist them in making adjustments to keep their readings in a “healthy range”. In one scenario, patients might receive a text message that “they have an 85% chance of developing low blood sugar within an hour”. This will also include a recommendation to watch their readings and eat something to raise their BG back up to a safer level.
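Neither company has published details of the underlying model, so purely as an illustration, here is a toy sketch of the simplest trend-based approach to such a prediction: fit a line to recent CGM readings and flag a projected crossing of a hypoglycemia threshold (commonly around 70 mg/dL). A production system would, of course, weigh many more signals, such as insulin on board, meals and activity:

```python
# A toy illustration only: IBM and Medtronic have not published their model.
# This fits a linear trend to recent CGM readings and estimates when the
# projected BG would cross a hypoglycemia threshold.
import numpy as np

def minutes_until_low(readings_mg_dl: list[float], interval_min: float = 5.0,
                      threshold: float = 70.0) -> float | None:
    t = np.arange(len(readings_mg_dl)) * interval_min
    slope, intercept = np.polyfit(t, readings_mg_dl, 1)
    if slope >= 0:
        return None  # BG flat or rising; no projected low
    eta = (threshold - intercept) / slope - t[-1]
    return eta if eta > 0 else 0.0

# Six readings over 25 minutes, trending down roughly 2 mg/dL per minute:
print(minutes_until_low([140, 130, 122, 111, 101, 92]))  # ~11 minutes to 70
```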

My Questions

  • Will this make patients more or less diligent in their daily care? Is there potential for patients to possibly assume less responsibility for their care if they sense that the management of their diabetes is running on a form of remote control? Alternatively, might this result in too much information for patients to manage?
  • What would be the possible results if this app is ever engineered to work in conjunction with the artificial pancreas project being led by Ed Damiano and his group of developers in Boston?
  • If this app receives regulatory approval and gains wide acceptance among people with diabetes, what does this medical ecosystem look like in the future for patients, doctors, medical insurance providers, regulatory agencies, and medical system entrepreneurs? How might it positively or negatively affect the market for insulin pumps and CGMs?
  • Should IBM and Medtronic consider making their app available on an open-source basis to enable other individuals and groups of developers to improve it, as well as develop additional new apps?
  • Whether and how will insurance policies, for both patients and manufacturers, deal with any potential liability that may arise if the app causes some unforeseen adverse effects? Will medical insurance even cover, encourage or discourage the use of such an app?
  • Will the data generated by the app ever be used in any unforeseen ways that could affect patients’ privacy? Would patients using the new app have to relinquish all rights and interests to their own BG data?
  • What other medical conditions might benefit from a similar type of real-time data, feedback and recommendation system?

Mind Over Subject Matter: Researchers Develop A Better Understanding of How Human Brains Manage So Much Information

"Synapse", Image by Allan Ajifo

“Synapse”, Image by Allan Ajifo

There is an old joke that goes something like this: What do you get for the man who has everything, and then where would he put it all?¹ This often comes to mind whenever I experience the sensation of information overload caused by too much content presented from too many sources. Especially since the advent of the Web, almost everyone I know has also had the same overwhelming experience whenever the amount of information they are inundated with every day seems increasingly difficult to parse, comprehend and retain.

The multitudes of screens, platforms, websites, newsfeeds, social media posts, emails, tweets, blogs, Post-Its, newsletters, videos, print publications of all types, just to name a few, are relentlessly updated and uploaded globally and 24/7. Nonetheless, for each of us on an individualized basis, a good deal of the substance conveyed by this quantum of bits and ocean of ink somehow still manages to stick somewhere in our brains.

So, how does the human brain accomplish this?

Less Than 1% of the Data

A fascinating report on Phys.org on December 15, 2015, entitled Researchers Demonstrate How the Brain Can Handle So Much Data, by Tara La Bouff, describes the latest research into how this happens. I will summarize and annotate this, and pose a few organic material-based questions of my own.

To begin, people learn to identify objects and variations of them rather quickly. For example, a letter of the alphabet, no matter the font, or an individual, regardless of their clothing and grooming, is always recognizable. We can also identify objects even if our view of them is quite limited. This neurological processing proceeds reliably and accurately moment-by-moment throughout our lives.

A team of researchers at the Georgia Institute of Technology (Georgia Tech)² recently discovered that we can make such visual categorizations with less than 1% of the original data. Furthermore, they created and validated an algorithm “to explain human learning”. Their results can also be applied to machine learning³, data analysis and computer vision⁴. The team’s full findings were published in the September 28, 2015 issue of Neural Computation in an article entitled Visual Categorization with Random Projection by Rosa I. Arriaga, David Rutter, Maya Cakmak and Santosh S. Vempala. (Dr. Cakmak is from the University of Washington, while the other three are from Georgia Tech.)

Dr. Vempala believes that the reason why humans can quickly make sense of a very complex and robust world is because, as he observes, “It’s a computational problem”. His colleagues and team members examined human performance in “random projection tests”, which measure the degree to which we learn to identify an object. In their work, they showed their test subjects original, abstract images and then asked whether they could identify them again when shown only a much smaller segment of each image. This led to the first of their two principal discoveries: the test subjects required only 0.15% of the data to repeat their identifications.

Algorithmic Agility

In the next phase of their work, the researchers prepared and applied an algorithm to enable computers (running a simple neural network, software capable of imitating very basic human learning characteristics), to undertake the same tasks. These digital counterparts “performed as well as humans”. In turn, the results of this research provided new insight into human learning.

The team’s objective was to devise a “mathematical definition” of typical and non-typical inputs. Next, they wanted to “predict which data” would be the most challenging for the test subjects and computers to learn. As it turned out, the two groups performed with nearly equal results. Moreover, these results proved that which data “will be the hardest to learn over time” can be predicted.

In testing their theory, the team prepared 3 different groups of abstract images of merely 150 pixels each. (See the Phys.org link above containing these images.) Next, they drew up “small sketches” of them. The full image was shown to the test subjects for 10 seconds. Then they were shown 16 of the random sketches. Dr. Vempala was “surprised by how close the performance was” of the humans and the neural network.

While the researchers cannot yet say with certainty that “random projection”, such as was demonstrated in their work, happens within our brains, the results suggest that it is a “plausible explanation” for this phenomenon.
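To make the technique itself more concrete, here is a small sketch of random projection in the spirit of the study, using synthetic data rather than the paper’s actual stimuli or protocol: points in a 150-dimensional “image” space are projected onto just a few random directions, and a simple learner can still tell the two categories apart:

```python
# A sketch of the random projection idea at the heart of the study. The data
# here is synthetic; the paper's stimuli and protocol differ.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 150, 6, 200  # original pixels, projected dims, samples per class

# Two synthetic categories: noisy variations around two prototype "images".
protos = rng.normal(size=(2, d))
X = np.vstack([p + 0.5 * rng.normal(size=(n, d)) for p in protos])
y = np.repeat([0, 1], n)

# Random projection: a k x d matrix of random gaussians, chosen without
# any learning; each 150-pixel image is reduced to just k numbers.
R = rng.normal(size=(k, d)) / np.sqrt(k)
Xp = X @ R.T

# Nearest-prototype classification in the tiny projected space.
centers = np.array([Xp[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((Xp[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
print(f"accuracy with {k}/{d} dimensions: {(pred == y).mean():.2%}")
```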

My Questions

  • Might this research have any implications and/or applications in virtual reality and augmented reality systems that rely on both human vision and processing large quantities of data to generate their virtual imagery? (These 13 Subway Fold posts cover a wide range of trends and applications in VR and AR.)
  • Might this research also have any implications and/or applications in medical imaging and interpretation since this science also relies on visual recognition and continual learning?
  • What other markets, professions, universities and consultancies might be able to turn these findings into new entrepreneurial and scientific opportunities?

 


1.  I was unable to definitively source this online but I recall that I may have heard it from the comedian Steven Wright. Please let me know if you are aware of its origin. 

2.  For the work of Georgia Tech’s startup incubator, see the Subway Fold post entitled Flashpoint Presents Its “Demo Day” in New York on April 16, 2015.

3.   These six Subway Fold posts cover a range of trends and developments in machine learning.

4.   Computer vision was recently taken up in an October 14, 2015 Subway Fold post entitled Visionary Developments: Bionic Eyes and Mechanized Rides Derived from Dragonflies.

New Job De-/script/-ions for Attorneys with Coding and Tech Business Skills

"CODE_n SPACES Pattern", Image by CODE_n

“CODE_n SPACES Pattern”, Image by CODE_n

The conventional wisdom among lawyers and legal educators has long been that having a second related degree or skill from another field can be helpful in finding an appropriate career path. That is, a law degree plus, among others, an MBA, engineering or nursing degree can be quite helpful in finding an area of specialization that leverages both fields. There are synergies and advantages to be shared by both the lawyers and their clients in these circumstances.

Recently, this something extra has expanded to include very timely applied tech and tech business skills. Two recently reported developments highlight this important emerging trend. One involves a new generation of attorneys with a depth of coding skills, and the other is an advanced law degree to prepare attorneys for positions in the tech and entrepreneurial marketplaces. Let’s have a look at them individually and then at what they might mean together for legal professionals in a rapidly changing world. I will summarize and annotate both of them, and compile a few plain text questions of my own.

(These 26 other Subway Fold posts in the category of Law Practice and Legal Education have tracked many related developments.)

Legal Codes and Lawyers Who Code

1.  Associates

The first article features four young lawyers who have found productive ways to apply their coding skills at their law offices. This story appeared in the November 13, 2015 edition of The Recorder (subscription required) entitled Lawyers Who Code Hack New Career Path by Patience Haggin. I highly recommend reading it in its entirety.

During an interview at Apple for a secondment (a form of temporary arrangement where a lawyer from a firm joins the in-house legal department of a client)¹, a first-year lawyer named Canek Acosta was asked whether he knew how to use Excel. He “laughed – and got the job” at Apple. In addition to his law degree, he had majored in computer science and math as an undergraduate.

Earlier, as a law student at Michigan State University College of Law, he participated in LegalRnD – The Center for Legal Services Innovation, a program that teaches students to identify and solve “legal industry process bottlenecks”. The LegalRnD website lists and describes all eight courses in its curriculum. It has also sent teams to legal hackathons. (See the March 24, 2015 Subway Fold post entitled “Hackcess to Justice” Legal Hackathons in 2014 and 2015 for details on these events.)

Using his combination of skills, Acosta wrote scripts that automated certain tasks, including budget spreadsheets, for Apple’s legal department. As a result, some new efficiencies were achieved. Acosta believes that his experience at Apple was helpful in subsequently getting hired as an associate at the law firm of O’Melveny & Myers.
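We are not told what Acosta’s scripts actually looked like, so the following is only a hypothetical sketch of the kind of budget-spreadsheet chore that lends itself to this sort of automation; the file layout and column names are invented for illustration:

```python
# Hypothetical sketch of a legal-department spreadsheet chore that lends
# itself to scripting: rolling up per-matter spending by month from a CSV
# export. The file layout and column names here are invented.
from collections import defaultdict
import csv

def monthly_totals(path: str) -> dict[tuple[str, str], float]:
    """Sum amounts per (matter, month) from a CSV with date, matter, amount."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            month = row["date"][:7]  # "2015-11-13" -> "2015-11"
            totals[(row["matter"], month)] += float(row["amount"])
    return dict(totals)

for (matter, month), amount in sorted(monthly_totals("legal_budget.csv").items()):
    print(f"{matter}  {month}  ${amount:,.2f}")
```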

While his experience is currently uncommon, law firms are expected to increasingly recruit law students to become associates who have such contemporary skills in addition to their legal education. Furthermore, some of these students are sidestepping traditional roles in law practice and finding opportunities in law practice management and other non-legal staff roles that require a conflation of “legal analysis and hacking skills”.

Acosta further believes that a “hybrid lawyer-programmer” can locate the issues in law office operational workflows and then resolve them. Now at O’Melveny, in addition to his regular responsibilities as a litigation associate, he is also being asked to use his programming ability to “automate tasks for the firm or a client matter”.

At the San Francisco office of Winston & Strawn, first-year associate Joseph Mornin has also made good use of his programming skills. While attending UC-Berkeley School of Law, he wrote a program to assist legal scholars in generating “permanent links when citing online sources”. He also authored a browser extension called Bestlaw that “adds features to Westlaw“, a major provider of online legal research services.

2.  Consultants and Project Managers

In Chicago, the law firm Seyfarth Shaw has a legal industry consulting subsidiary called SeyfarthLean. One of their associate legal solutions architects is Amani Smathers. She believes that lawyers will have to be “T-shaped”, whereby they will need to combine their “legal expertise” with other skills including “programming, or marketing, or project management”.² Although she is also a graduate of Michigan State University College of Law, instead of practicing law she is on a team that provides consulting for clients on, among other things, data analytics. She believes that “legal hacking jobs” may provide alternatives for other attorneys not fully interested in more traditional forms of law practice.

Yet another Michigan State law graduate, Patrick Ellis, is working as a legal project manager at the Michigan law firm Honigman Miller Schwartz and Cohn. In this capacity, he uses his background in statistics to “develop estimates and pricing arrangements”. (Mr. Ellis was previously mentioned in a Subway Fold post on March 15, 2015, entitled Does Being on Law Review or Effective Blogging and Networking Provide Law Students with Better Employment Prospects?.)

A New and Unique LLM to be Offered Jointly by Cornell Law School and Cornell Tech

The second article concerned the announcement of a new 1-year, full-time Master of Laws program (which confers an “LLM” degree), to be offered jointly by Cornell Law School and Cornell Tech (a technology-focused graduate and research campus of Cornell in New York City). This LLM is intended to provide practicing attorneys and other graduates with specialized skills needed to support and to lead tech companies. In effect, the program combines elements of law, technology and entrepreneurship. This news was carried in a post on October 29, 2015 on The Cornell Daily Sun entitled Cornell Tech, Law School Launch New Degree Program by Annie Bui.

According to Cornell’s October 27, 2015 press release, students in this new program will be engaged in “developing products and other solutions to challenges posed by companies”. They will encounter real-world circumstances facing businesses and startups in today’s digital marketplace. This will further include studying the accompanying societal and policy implications.

The program is expected to launch in 2016. It will initially operate from a temporary site and then move to the Cornell Tech campus on Roosevelt Island in NYC in 2017.

My Questions

  • What other types of changes, degrees and initiatives are needed for law schools to better prepare their graduates for practicing in the digital economy? For example, should basic coding principles be introduced in some classes such as first-year contracts to enable students to better handle matters involving Bitcoin and the blockchain when they graduate? (See these four Subway Fold posts on this rapidly expanding technology.)
  • Should Cornell Law School, as well as other law schools interested in instituting similar courses and degrees, consider offering them online? If not for full degree status, should these courses alternatively be accredited toward Continuing Legal Education requirements?
  • Will or should the Cornell Law/Cornell Tech LLM syllabus offer the types of tech and tech business skills taught by the Michigan State’s LegalRnD program? What do each of these law schools’ programs discussed here possibly have to offer to each other? What unique advantage(s) might an attorney with an LLM also have if he or she can do some coding?
  • Are there any law offices out there that are starting to add an attorney’s tech skills and coding capabilities to their evaluation of potential job candidates? Are legal recruiters adding these criteria to job descriptions for searches they are conducting?
  • Are there law offices out there that are beginning to take an attorney’s tech skills and/or coding contributions into account during annual performance reviews? If not, should they now consider adding them, and how should they be evaluated?

 


1.  Here is an informative opinion about the ethical issues involved in secondment arrangements, issued by the Association of the Bar of the City of New York Committee on Professional and Judicial Ethics.

2.  I had an opportunity to hear Ms. Smathers give a very informative presentation about “T-shaped skills” at the Reinvent Law presentation held in New York in February 2014.