Concrete Data Sets: New Online Map of Building Construction Metrics Across New York

Image from Pixabay.com

There is an age-old expression among New Yorkers that their city will really be a great place one day if someone ever finishes building it. I have heard this many times during my life as a native and lifelong resident of this remarkable place.

Public and private construction goes on each day on a vast scale throughout the five boroughs of NYC. Over the past several decades, under successive political administrations, many areas have been re-zoned to facilitate and accelerate this never-ending build-up and build-out. This relentless activity produces many economic benefits for the municipal economy. However, it also results in detrimental effects, including housing prices and rents that continue to soar, disruptive levels of noise and waste materials affecting people living nearby, increased stresses upon local infrastructure, and, just as regrettably, the steady erosion of the unique characters and spirits of many neighborhoods.¹

In a significant technological achievement intended to focus and consolidate the massive quantities of location, scope and cost data about the plethora of structures sprouting up everywhere, on August 22, 2018 the New York City Department of Buildings launched an interactive NYC Active Major Construction Map (“The Map”). Full coverage of its inauguration was provided in a very informative article in The New York Times entitled A Real-Time Map Tracks the Building Frenzy That’s Transforming New York, by Corey Kilgannon, on August 22, 2018. (Here, too, is the Building Department’s press release.) I highly recommend both a click-through and full read of it and further online exploration of The Map itself.

I will also summarize and annotate this report, and then pose some of my own code-compliant questions.

Home on the [Data] Range

Construction on Lexington Avenue, Image by Jeffrey Zeldman

As the ubiquitous pounding of steel and pouring of concrete proceeds unabated, there is truly little or no getting around it. The Map is one component of a $60 million digital initiative, established in 2015, that is intended to produce an “impressive level of detail” on much of this cityscape-altering activity.

The recent inception of The Map provides everyone in the metro area with an online platform to track some of the key details of the largest of these projects, plotted across a series of key metrics. An accompanying grid of tables below it lists and ranks the largest projects based upon these dimensions.

The Map’s user interface presents this “overview of the frenzy of construction” dispersed across the city’s communities using the following configurations (a rough sketch of querying the underlying permit data follows the list):

  • Each project’s location is represented by a blue dot that can be clicked to reveal the property’s contractor, history and any violations.
  • Cumulative real-time totals of square footage under construction, permits and dwelling units involved. This data can be further filtered by borough.
  • Scrollable and clickable Top 10 lists by project square footage, size, cost and dwelling units.
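The Map itself is an official product of the Department of Buildings, but much of the underlying permit data is also published on the NYC Open Data portal. Below is a purely illustrative sketch of pulling borough-level permit tallies from a Socrata-style endpoint; the dataset ID and field names are assumptions that would need to be verified against the portal’s actual DOB Permit Issuance schema.

```python
# Illustrative sketch only: the dataset ID and the "borough" field name
# are assumptions; verify them against data.cityofnewyork.us before use.
import requests
from collections import Counter

BASE = "https://data.cityofnewyork.us/resource/ipu4-2q9a.json"  # assumed ID

def permits_by_borough(limit=5000):
    # SODA endpoints support $limit/$offset paging parameters.
    resp = requests.get(BASE, params={"$limit": limit})
    resp.raise_for_status()
    return Counter(rec.get("borough", "UNKNOWN") for rec in resp.json())

if __name__ == "__main__":
    for borough, count in permits_by_borough().most_common():
        print(f"{borough}: {count}")
```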

As well, it provides residents a virtual means to identify who is making all of that real-world blaring construction noise in their neighborhood.²

If I Had a Hammer

Executives, organizations and community advocates representing a diversity of interests have expressed their initial support for The Map.

Second Avenue Subway Update, Image by MTA (2)

The NYC Buildings Commissioner, Rick D. Chandler, believes this new resource is a means of bringing transparency to his department’s tremendous quantity of construction data. Prior to the rollout of The Map, accessing and processing this information required much greater technical and professional skills. Furthermore, the data will be put to use to “improve and streamline the department’s operations”.

Andrew Berman, the Executive Director of the non-profit advocacy group Greenwich Village Society for Historic Preservation, finds The Map to be both useful and “long overdue”. It provides his group with a more convenient means to discover additional information about the proliferation of project sites in the Village. He also noted that this data was far more challenging to extract from the previously existing municipal databases. Nonetheless, the new map remains insufficient for him, and “other measures were needed” for the city government to increase oversight and enforcement of construction regulations concerning safety and the types of projects that are permitted on specific properties.

Local real estate industry trade groups, such as the Real Estate Board of New York, are also sanguine about this form of digital innovation, particularly for its accessibility. The group’s current president, John H. Banks, finds that it is “more responsive to the needs of the private sector”, raises transparency, and increases the public’s “awareness of economic activity, jobs and tax revenues” flowing from the city’s construction projects.

Plans are in place to expand The Map based upon user feedback. As well, it will receive daily updates thus providing “a real-time advantage over analyst and industry reports”.

Image from Pixabay.com

My Questions

  • Does a roadmap currently exist for the projected development path of The Map’s content and functionality? If so, how can all interested parties provide ongoing commentary and support for it?
  • Are there other NYC databases and data sources that could possibly be integrated into the map? For example, tax, environmental and regulatory information might be helpful.
  • Can other cities benefit from the design and functionality of The Map to create or upgrade their own versions of similar website initiatives?
  • What new entrepreneurial, academic and governmental opportunities might now present themselves because of The Map?
  • How might artificial intelligence and/or machine learning capabilities be, well, mapped into The Map’s functionalities? Are there any plans to add chatbot scripting capabilities to The Map?

 


Two related Subway Fold posts cover other aspects of construction.


1.  For a deeply insightful analysis and passionate critique of the pervasive and permanent changes to many of New York’s neighborhoods due to a confluence of political, economic and social forces and interests, I highly recommend reading Vanishing New York: How a Great City Lost Its Soul, by Jeremiah Moss (Dey Street Books, 2017). While I did not agree with some aspects of his book, the author has expertly captured and scrutinized how, where and why this great city has been changed forever in many ways. (See also the author’s blog Jeremiah’s Vanishing New York for his continuing commentary and perspectives.)

2.  Once I lived in a building that had been mercifully quiet for a long time until the adjacent building was purchased, gutted and totally renovated. For four months during this process, the daily noise level by comparison made a typical AC/DC concert sound like a pin drop.

Text Analysis Systems Mine Workplace Emails to Measure Staff Sentiments

Image from Pixabay.com

Have you ever been employed in a genuinely cooperative and productive environment where you looked forward each day to making your contribution to the enterprise and assisting your colleagues? Conversely, have you ever worked in a highly stressful and unsupportive atmosphere where you dreaded going back nearly every day? Or perhaps you have found in your career that your jobs and employers fell somewhere in the mid-range of this spectrum of office cultures.

For all of these good, bad or indifferent workplaces, a key question is whether any of the actions of management to engage the staff and listen to their concerns ever resulted in improved working conditions and higher levels of job satisfaction.

The answer is most often “yes”. Just having a say in, and some sense of control over, our jobs and workflows can indeed have a demonstrable impact on morale, camaraderie and the bottom line. This is the essence of the Hawthorne Effect, also termed the “observer effect”, first identified during studies in the 1920s and 1930s when the management of a factory made improvements to the lighting and work schedules. In turn, worker satisfaction and productivity temporarily increased. This was not so much because there was more light, but rather because the workers sensed that management was paying attention to, and then acting upon, their concerns. The workers perceived they were no longer just cogs in a machine.

Perhaps, too, the Hawthorne Effect is in some ways the workplace equivalent of Heisenberg’s uncertainty principle in physics. To vastly oversimplify this slippery concept, the mere act of observing a subatomic particle can change its position.¹

Giving the processes of observation, analysis and change at the enterprise level a modern (but non-quantum) spin is a fascinating new article in the September 2018 issue of The Atlantic entitled What Your Boss Could Learn by Reading the Whole Company’s Emails, by Frank Partnoy. I highly recommend a click-through and full read if you have an opportunity. I will summarize and annotate it, and then, considering my own thorough lack of understanding of the basics of y=f(x), pose some of my own physics-free questions.

“Engagement” of Enron’s Emails

By Enron [Public domain], via Wikimedia Commons

Andrew Fastow was the Chief Financial Officer of Enron when the company infamously collapsed into bankruptcy in December 2001. Criminal charges were brought against some of the corporate officers, including Fastow, who went to prison for six years as a result.

After he had served his sentence, he became a public speaker about his experience. At one of his presentations in Amsterdam in 2016, two men from the audience approached him. They were from KeenCorp, whose business is data analytics. Specifically, their clients hire them to analyze the “word patterns and their context” in their employees’ emails. This is done in an effort to quantify and measure the degree of the staff’s “engagement”. The resulting numerical rating is higher when they feel more “positive and engaged”, and lower when they are unhappier and less “engaged”.
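KeenCorp’s actual index is proprietary, so its mechanics are not public. Purely to make the idea concrete, here is a minimal sketch of the simplest possible version of such scoring, counting positive and negative signal words in each message; the word lists and emails are invented placeholders:

```python
# Minimal illustrative sketch of word-list engagement scoring.
# KeenCorp's real system weighs word patterns in context and is far
# more sophisticated; the word lists below are invented placeholders.
import re

POSITIVE = {"thanks", "great", "confident", "agree", "excited"}
NEGATIVE = {"concerned", "worried", "problem", "risk", "unhappy"}

def engagement_score(text: str) -> float:
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Score in [-1, 1]; 0 when no signal words appear at all.
    return 0.0 if total == 0 else (pos - neg) / total

emails = [
    "Thanks all -- great quarter, I'm excited about the plan.",
    "I'm worried about this transaction; the risk seems understated.",
]
for message in emails:
    print(round(engagement_score(message), 2), "|", message)
```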

The KeenCorp representatives explained to Fastow that they had applied their software to the email archives of 150 Enron executives in an effort to determine how “key moments in the company’s tumultuous collapse” would be assessed and rated by their software. (See also the February 26, 2016 Subway Fold post entitled The Predictive Benefits of Analyzing Employees’ Communications Networks, covering, among other things, a similar analysis of Enron’s emails.)

KeenCorp’s software found the lowest engagement score when Enron filed for bankruptcy. However, the index also took a steep dive two years earlier. This was puzzling since the news about the Enron scandal was not yet public. So, they asked Fastow if he could recall “anything unusual happening at Enron on June 28, 1999”.

Sentimental Journal

Milky Way in Mauritius, Image by Jarkko J

Today the text analytics business, like the work done by KeenCorp, is thriving. It has long been established as the processing behind email spam filters. Now it is finding other applications, including monitoring corporate reputations on social media and other sites.²

The finance industry is another growth sector, as investment banks and hedge funds scan a wide variety of information sources to locate “slight changes in language” that may point towards pending increases or decreases in share prices. Financial research providers are using artificial intelligence to mine “insights” from their own selections of news and analytical sources.

But is this technology effective?

In a paper entitled Lazy Prices, by Lauren Cohen (Harvard Business School and NBER), Christopher Malloy (Harvard Business School and NBER), and Quoc Nguyen (University of Illinois at Chicago), in a draft dated February 22, 2018, the researchers found that a company’s share price, NetApp’s in this case, measurably went down after the firm “subtly changed” the “descriptions of certain risks” in its 2010 annual report. Algorithms can detect such changes more quickly and effectively than humans. The company subsequently clarified in its 2011 annual report its “failure to comply” with reporting requirements in 2010. A highly skilled stock analyst “might have missed that phrase”, but once again it was captured by the researchers’ algorithms.
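The core intuition behind this result is mechanical enough to sketch: compare the text of consecutive filings and flag the ones whose language shifted the most. The paper’s actual methodology is more involved, and the filing excerpts below are invented for illustration, but a bare-bones version might look like this:

```python
# Bare-bones sketch of year-over-year filing comparison: a low cosine
# similarity between consecutive risk disclosures flags a change worth
# a closer look. Texts below are invented, not actual NetApp language.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def filing_similarity(prior_text: str, current_text: str) -> float:
    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [prior_text, current_text]
    )
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

risks_2009 = "We face risks related to competition and supply chains."
risks_2010 = ("We face risks related to competition, supply chains, and "
              "our possible failure to comply with export regulations.")

print(f"similarity: {filing_similarity(risks_2009, risks_2010):.2f}")
# A score well below the company's historical norm would flag the filing.
```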

In the hands of a “skeptical investor”, this information might well have raised questions about the differences between the 2010 and 2011 annual reports and, in turn, saved that investor a great deal of money. This detection was an early signal of a looming decline in NetApp’s stock. Half a year after the 2011 report’s publication, it was reported that the Syrian government had bought the company’s equipment and “used that equipment to spy on its citizens”, causing further declines.

Now text analytics is being deployed at a new target: the composition of employees’ communications. Although it has been found that workers have no expectation of privacy in their workplaces, some companies remain reluctant to mine their employees’ messages because of privacy concerns. Still, companies are finding it ever more challenging to resist the “urge to mine employee information”, especially as text analysis systems continue to improve.

Among the evolving enterprise applications is use by human resources departments to assess overall employee morale. For example, Vibe is an app that scans through communications on Slack, a widely used enterprise platform. Vibe’s algorithm measures the positive and negative emotions of a work team and reports on them in real time.

Finding Context

“Microscope”, image by Ryan Adams

Returning to KeenCorp, can their product actually detect wrongdoing by applying text analysis? While they did not initially see it, the company’s system had identified a significant “inflection point” in Enron’s history on the June 28, 1999 date in question. Fastow said that was the day the board had discussed a plan called “LJM”, involving a group of questionable transactions that would mask the company’s badly under-performing assets while improving its financials. Eventually, LJM contributed to Enron’s demise. At that time, however, Fastow said that everyone at the company, including employees and board members, was reluctant to challenge this dubious plan.

KeenCorp currently has 15 employees and six key clients. Fastow is also one of their consultants and advisors. He also invested in the company when he saw their algorithm highlight Enron’s employees’ concerns about the LJM plan. He hopes to raise potential clients’ awareness of this product to help them avoid similar situations.

The company includes heat maps as part of its tool set to generate real-time visualizations of employee engagement. These can assist companies in “identifying problems in the workplace”. In effect, it generates a warning (maybe a warming, too) that may help to identify significant concerns. As well, it can assist companies with compliance with government rules and regulations. Yet the system “is only as good as the people using it”, and someone must step forward and take action when the heat map highlights an emerging problem.
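To picture what such a display involves, here is a minimal plotting sketch, with synthetic scores standing in for whatever a real system would compute from the underlying messages; the team names and numbers are placeholders:

```python
# Minimal sketch of an engagement heat map: teams by week, with
# synthetic scores as placeholders for a real system's output.
import numpy as np
import matplotlib.pyplot as plt

teams = ["Finance", "Legal", "Engineering", "Sales"]
weeks = [f"W{i}" for i in range(1, 9)]
rng = np.random.default_rng(0)
scores = rng.uniform(-1, 1, size=(len(teams), len(weeks)))  # placeholder data

fig, ax = plt.subplots()
image = ax.imshow(scores, cmap="RdYlGn", vmin=-1, vmax=1, aspect="auto")
ax.set_xticks(range(len(weeks)))
ax.set_xticklabels(weeks)
ax.set_yticks(range(len(teams)))
ax.set_yticklabels(teams)
fig.colorbar(image, label="engagement score")
plt.savefig("engagement_heatmap.png")
```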

Analyzing employees’ communications also presents the need for applying a cost/benefit analysis of privacy considerations. In certain industries such as finance, employees are well aware that their communications are being monitored and analyzed, while in other businesses this can be seen “as intrusive if not downright Big Brotherly”. Moreover, managers “have the most to fear” from text analysis systems. For instance, it can be used to assess sentiment when someone new is hired or given a promotion. Thus, companies will need to find a balance between the uses of this data and the inherent privacy concerns about its collection.

In addressing privacy concerns about data collection, KeenCorp does not “collect, store or report” info about individual employees. All individually identifying personal info is scrubbed away.

Text analysis is still in its early stages. There is no certainty yet that it will avoid false positive readings or that it will capture all emerging potential threats. Nonetheless, it is expected to continue to expand and find new fields of application. Experts predict that among these new areas will be corporate legal, compliance and regulatory operations. Other possibilities include protecting against possible liabilities for “allegations of visa, fraud and harassment”.

The key takeaway from the current state of this technology is to ascertain the truth about employees’ sentiments not by snooping, but rather, “by examining how they are saying it”.

My Questions

“Message In a Bottle”, Image from Pixabay.com

  • Should text analysis data be factored into annual reviews of officers and/or board members? If so, how can this be done and what relative weight should it be given?

  • Should employees at any or all levels and departments be given access to text analysis data? How might this potentially impact their work satisfaction and productivity?
  • Is there a direct, causal or insignificant relationship between employee sentiment data and up and/or down movements in market value? If so, how can companies elevate text analysis systems to higher uses?
  • How can text analysis be used for executive training and development? Might it also add a new dimension to case studies in business schools?
  • What does this data look like in either or both of short-term and long-term time series visualizations? Are there any additional insights to be gained by processing the heat maps into animations to show how their shape and momentum are changing over time?

 


1.  See also the May 20, 2015 Subway Fold post entitled A Legal Thriller Meets Quantum Physics: A Book Review of “Superposition” for the application of this science in a hard rocking sci-fi novel.

2.  These 10 Subway Fold posts cover other measurements of social media analytics, some including other applications of text analytics.

Book Review of “Frenemies: The Epic Disruption of the Ad Business (and Everything Else)”

“Advertising in Times Square”, image by Dirk Knight

Every so often, an ad campaign comes along that is strikingly brilliant for its originality, execution, persuasiveness, longevity, humor and pathos. During the mid-1980’s, one of these bright shining examples was the television ads for Bartles & Jaymes Wine Coolers. They featured two fictional characters: Frank Bartles, who owned a winery and did all of the talking, and Ed Jaymes, a farmer who never spoke a word but whose deadpan looks were priceless. They traveled across the US to different locations in pursuit of sales, trying to somehow adapt their approaches to reflect the local surroundings. Bartles was very sincere but often a bit naive in his pitches along the way, best exemplified in this ad and another one when they visited New York.

These commercials succeeded beyond all expectations in simultaneously establishing brand awareness, boosting sales and being laugh-out-loud hilarious because Bartles and Jaymes were such charming, aw-shucks amateurs. In actuality, these ads were deftly conceived and staged by some smart and savvy creatives from the Hal Riney & Partners agency. For further lasting effect, they always had Bartles express his appreciation to the viewers at the end of each spot with his memorable trademark tagline of “Thanks for your support”. These 30-second video gems are as entertaining today as they were thirty years ago.

But those halcyon days of advertising are long gone. The industry’s primary media back then were limited to print, television and radio. Creativity was its cornerstone, and the words “data analytics” must have sounded like something actuaries did in a darkened room while contemplating the infinite. (Who knows, maybe it still does to some degree.)

Fast forwarding to 2018, advertising is an utterly different and hyper-competitive sector whose work product is largely splayed across countless mobile and stationary screens on Planet Earth. Expertly chronicling and precisely assaying the transformative changes happening to this sector is an informative and engaging new book entitled Frenemies: The Epic Disruption of the Ad Business (and Everything Else) [Penguin Press, 2018], by the renowned business author Ken Auletta. Just as a leading ad agency in its day cleverly and convincingly took TV viewers on an endearing cultural tour of the US as we followed the many ad-ventures of Bartles & Jaymes, so too, this book takes its readers on a far-ranging and immersive tour of the current participants, trends, challenges and technologies affecting the ad industry.

A Frenemy of My Frenemy is My Frenemy

Image from Pixabay

This highly specialized world is under assault from a confluence of competitive, online, economic, social and mathematical forces. Many people who work in it are deeply and rightfully concerned about its future and the tenure of their places in it. Auletta comprehensively reports on and assesses these profound changes from deep within the operations of several key constituencies (the “frenemies”, conflating “friend” and “enemy”). At first this might seem a bit too much “inside baseball” (although the ad pitch remains alive and well), but he quickly and efficiently establishes who’s who and what’s what in today’s morphing ad markets, making this book valuable and accessible to readers both within and outside of this field. It can also be viewed as a multi-dimensional case study of an industry right now being, in the truest sense of the word, disrupted.¹ There is likewise much to be learned and considered here by other businesses being buffeted by similar winds.

Frenemies, as thoroughly explored throughout this book, are business competitors and partners at the same time. They are former and current allies in commerce who concurrently cooperate and compete. Today they are actively infiltrating each other’s markets. The full matrix of frenemies and their threats and relationships to each other includes the interests and perspectives of ad agencies and their clients, social media networks, fierce competition from streamers and original content producers like Netflix², traditional media in transition to digital platforms, consulting companies and, yes, consumers.

Auletta travels several parallel tracks in his reporting. First, he examines the past, present and onrushing future with respect to revenue streams, profits, client bases served, artificial intelligence (AI) driven automation, and the frenemies’ very fluid alliances. Second, he skillfully deploys the investigative journalistic strategy of “following the money” as it ebbs and flows in many directions among the key players. Third, he illuminates the industry’s evolution from Don Draper’s traditional “Mad Men” to 2018’s “math men”, who are the data wranglers, analysts and strategists driven by ever more thin-sliced troves of consumer data the agencies and their corporate clients are using to achieve greater accuracy and efficiency in selling their goods and services.

A deep and wide roster of C-level executives from these various groups was interviewed for the book. Chief among them are two ad industry legends who serve as the x and y axes upon which Auletta has plotted a portion of his reporting. One is Martin Sorrell, who was the founder and CEO of WPP, the world’s largest advertising holding company.³ The other is Michael Kassan, the founder and CEO of MediaLink, a multifaceted firm that connects, negotiates and advises on behalf of a multitude of various parties, often competitors in critical matters affecting the ad business. Both of these individuals have significantly shaped modern advertising over many decades and are currently propagating some of the changes spotlighted in the book in trying to keep it vital, relevant and profitable.

Online Privacy v. Online Primacy

“Tug of War”, image by Pixabay

The established tradition of creativity being the primary driver of advertising creation and campaigns has given way to algorithm-driven data analytics. All of the frenemies, and a myriad of other sites in many other parsecs of the websphere, vacuum up vast amounts of data on users and their online usage patterns, and even go so far as to try to infer their behavioral attributes. This is often combined with additional personal information from third-party sources and data brokers. Armed with all of this data and ever more sophisticated means for sifting and intuiting it, including AI⁴, the frenemies are devising their campaigns to far more precisely target potential consumers and their cohorts with finely grained customized ads.

The high point of this book is Auletta’s nuanced coverage of the ongoing controversy involving the tension between frenemies using data analytics to increase click-through rates and, hopefully, sales versus respecting the data privacy of people as they traverse the Web. In response to this voracious data collection, millions of users have resisted this intrusiveness by adding free browser extensions such as AdBlock Plus to circumvent online tracking and ad distribution.⁵ This struggle has produced a tug of war between the commercial interests of the frenemies and consumers’ natural distaste for advertising, as well as their resentment at having their data co-opted, appropriated and misused without their knowledge or consent. Recently, public and governmental concerns were dramatically displayed in the harsh light of the scandals involving Facebook and Cambridge Analytica.

Furthermore, Google and Facebook dominate online advertising traffic and revenues and, most importantly, hold the vast quantum of user information that ad agencies believe would be particularly helpful to them in profiling and reaching consumers. Nonetheless, the two companies maintain that this data is highly proprietary to them alone, and much of it has not been shared. Frenemies much?

Additional troubling trends for the ad industry are likewise given a thorough 3-D treatment. Auletta returns several times to the axiom that audiences do not want to be interrupted with ads (particularly on their mobile devices). Look no further than the premium cable channels and the major streaming services, which offer all of their content uninterrupted in its entirety. The growing ranks of content creators they engage know this and prefer it because they can concentrate on their presentations without commercial breaks slicing and dicing their narrative continuity. The still profitable revenue streams flowing from this are based upon the strengths of the subscription model.

Indeed, in certain cases advertising is being simultaneously disrupted and innovated. Some of the main pillars of the media like The New York Times are now expanding their in-house advertising staff and service offerings. They can offer a diversified array of ads and analyses directly to their advertisers. Likewise, engineering-driven operations like Google and Facebook can deploy their talent benches to better target consumers for their advertisers by extracting and applying insights from their massive databases. Why should their clients continue to go to the agencies when their ads can be composed and tracked for them directly?

Adapt or Go Home

“Out with the Old, In with the New”, image by Mark

The author presents a balanced although not entirely sanguine view of the ad industry’s efforts to maintain its composure and clients in the midst of this storm. The frenemy camps must be willing to make needed and often difficult adjustments to accommodate emerging technological and strategic survival methods. He examines the results of two contemporary approaches to avoiding adblocking apps and more fully engaging very specific audiences. One is called “native advertising“, which involves advertisers producing commercial content and paying for its placement online or in print to promote their own products. Generally, these pieces are formatted to appear as though they are part of a site’s or publication’s regular editorial content, but carry a notice that they are, in fact, “Advertising”.

However, Auletta believes that the second adaptive mechanism, the online subscription model, will not be sustainable much beyond its current successes. Consumers are already spending money on their favorite paywalled sites. But it seems logical that users might not be similarly willing to pay for Facebook and others that have always been free. As well, cable’s cord-cutters continue to grow steadily in their numbers and their migrations towards streaming services such as Amazon Prime.⁶

Among the media giants, CBS seems to be getting its adaptive strategies right by continuing to grow multiple revenue streams. It now has the legal rights and financial resources to produce and sell original programming. It has also recently launched original web programming, such as Star Trek: Discovery, on a commercial-free subscription basis on CBS All Access. This can readily be seen as a challenge to Netflix despite the fact that CBS also provides content to Netflix. Will other networks emulate this lucrative and eyeball-attracting model?

As Auletta also concludes, for now at least, consumers, as frenemies, appear to be the beneficiaries of all this tumult. They have many device-agnostic platforms, pricing options and a surfeit of content from which to choose. They can also meaningfully reduce, although not entirely eliminate, the ads following them all over the web and those pesky stealth tracking systems. Whether they can collectively maintain their advantage is subject to sudden change in this environment.

Because of the timing of the book’s completion and publication, the author and publisher should consider including in any subsequent edition the follow-up impacts of Sorrell’s departure from WPP and his new venture (S4 Capital), the effects of the May 2018 implementation of EU’s General Data Protection Regulation (GDPR), and the progress of any industry or government regulation following the raft of recent massive data breaches and misuses.

Notwithstanding that, however, “Frenemies” fully delivers on all of its book jacket’s promises and premises. It is a clear and convincing case of truth in, well, advertising.

So, how would Frank Bartles and Ed Jaymes 2.0 perceive their promotional travels throughout today’s world? Would their folksy personas play well enough on YouTube to support a dedicated channel for them? Would their stops along the way be Instagram-able events? What would be their reactions when asked to Google something or download a podcast?

Alternatively, could they possibly have been proto-social media influencers who just showed up decades too soon? Nah, not really. Even in today’s digital-everything world, Frank and Ed 1.0 still abide. Frank may have also unknowingly planted a potential meme among today’s frenemies with his persistent proclamations of “Thanks for your support”: The 2018 upgrade might well be “Thanks for your support and all of your data”.

 


For a very enlightening interview with Ken Auletta, check out the June 26, 2018 podcast entitled Game Change: How the Ad Business Got Disrupted, from The Midday Show on WNYC (the local NPR affiliate in New York).


September 4, 2018 Update: Today’s edition of The New York Times contains a highly enlightening article directly on point with many of the key themes of Frenemies entitled Amazon Sets Its Sights on the $88 Billion Online Ad Market, by Julie Creswell. The report details Amazon’s significant move into online advertising supported by its massive economic, data analytics, scaling and strategic resources. It comprehensively analyzes the current status and future prospects of the company’s move into direct competition with Google and Facebook in this immense parsec of e-commerce. I highly recommend a click-through and full read of this if you have an opportunity.


1.   The classic work on the causes and effects of market disruptions, the disruptors and those left behind is The Innovator’s Dilemma, by Clayton Christensen (HarperBusiness, 2011). The first edition of the book was published in 1997.

2.    Netflix Topples HBO in Emmy Nominations, but ‘Game of Thrones’ Still Rules, July 13, 2018, New York Times, by The Associated Press. However, see also Netflix Drops Dud on Wall St. As Subscriber Growth Flops, July 16, 2018, New York Times, by Reuters.

3.   Sorrell is reported in the book as saying he would not step down from running WPP anytime soon. However, following the book’s publication, he was asked to step down in April 2018 following allegations of inappropriate conduct. See Martin Sorrell Resigns as Chief of WPP Advertising Agency, New York Times, by Matt Stevens and Liz Alderman, April 14, 2018. Nonetheless, Sorrell has quickly returned to the industry as reported in Martin Sorrell Beats WPP in Bidding War for Dutch Marketing Firm, New York Times, by Sapna Maheshwari, July 10, 2018.

4.  For a very timely example, see The Ad Agency Giant Omnicom Has Created a New AI Tool That is Poised to Completely Change How Ads Get Made, BusinessInsider.com, by Lauren Johnson, July 12, 2018.

5.   Two other anti-tracking browser extensions in wide usage are Ghostery and Privacy Badger.

6.   See also Cord-Cutting Keeps Churning: U.S. Pay-TV Cancelers to Hit 33 Million in 2018 (Study), Variety.com, by Todd Spangler, July 24, 2018.

Mary Meeker’s 2018 Massive Internet Trends Presentation

“Blue Marble – 2002”, Image by NASA Goddard Space Flight Center

Yesterday, on May 30, 2018, at the 2018 Code Conference being held this week in Rancho Palos Verdes, California, Mary Meeker, a world-renowned Internet expert and partner in the venture capital firm Kleiner Perkins, delivered her seventeenth annual in-depth and highly analytical presentation on current Internet trends. It is an absolutely remarkable accomplishment that is highly respected throughout the global technology industry and economy. The video of her speech is available here on Recode.com.

Her 2018 Internet Trends presentation file is divided into a series of twelve main sections covering, among many other things: Internet user, usage and device growth rates; online payment systems; content creation; voice interfaces’ significant potential; user experiences; Amazon’s and Alibaba’s far-reaching effects; data collection, regulation and privacy concerns; tech company trends and investment analyses; e-commerce sectors, consumer experiences and emerging trends; social media’s breadth, revenue streams and influences; the growth and returns of online advertising; changes in consumer spending patterns and online pricing; key transportation, healthcare and demographic patterns; disruptions in how, where and whether we work; increasingly sophisticated data gathering, analytics and optimization; AI trends, capabilities and market drivers; lifelong learning for the workforce; many robust online markets in China for, among others, online retail, mobile media and entertainment services; and a macro analysis of the US economy and online marketplaces.

That is just the tip of the tip of the iceberg in this 294-slide deck.

Ms. Meeker’s assessments and predictions here form an extraordinarily comprehensive and insightful piece of work. There is much here for anyone and everyone to learn and consider about the current and trending states of nearly anything and everything online. Moreover, there are likely many potential opportunities for new and established businesses, as well as other institutions, within this file.

I very highly recommend that you set aside some time to thoroughly read through and fully immerse your thoughts in Ms. Meeker’s entire presentation. You will be richly rewarded with knowledge and insight that can potentially yield a world of informative, strategic and practical dividends.


September 15, 2018 Update: Mary Meeker has left Kleiner Perkins to start her own investment firm. The details are reported in an article in The New York Times entitled Mary Meeker, ‘Queen of the Internet,’ Is Leaving Kleiner Perkins to Start a New Fund, by Erin Griffith, posted on September 14, 2018. I wish her great success in her new venture. I also hope that she will still have enough time to continue publishing her brilliant annual reports on Internet trends.

I Can See for Miles: Using Augmented Reality to Analyze Business Data Sets

Image from Pixabay

While one of The Who’s first hit singles, I Can See for Miles, was most certainly not about data visualization, it still might – – on a bit of a stretch – – find a fitting new context in describing one of the latest dazzling new technologies, given the opening stanza’s declaration that “there’s magic in my eye”. In determining Who’s who and what’s what about all this, let’s have a look at a report on a new tool enabling data scientists to indeed “see for miles and miles” in an exciting new manner.

This innovative approach was recently the subject of a fascinating article by an augmented reality (AR) designer named Benjamin Resnick about his team’s work at IBM on a project called Immersive Insights, entitled Visualizing High Dimensional Data In Augmented Reality, posted on July 3, 2017 on Medium.com. (Also embedded is a very cool video of a demo of this system.) They are applying AR’s rapidly advancing technology¹ to display, interpret and leverage insights gained from business data. I highly recommend reading this in its entirety. I will summarize and annotate it here and then pose a few real-world questions of my own.

Immersive Insights into Where the Data-Points Point

As Resnick foresees such a system in several years, a user will start his or her workday by donning AR glasses and viewing a “sea of gently glowing, colored orbs”, each of which visually displays one of their business’s big data sets². The user will be able to “reach out and select that data” which, in turn, will generate additional details on a nearby monitor. Thus, the user can efficiently track their data in an “aesthetically pleasing” and practical display.

The project team’s key objective is to provide a means to visualize and sum up the key “relationships in the data”. In the short term, the team is aiming Immersive Insights towards data scientists who are facile coders, enabling them to use AR’s capabilities to visualize time series, geographical and networked data. For the long term, they are planning to expand the range of Immersive Insight’s applicability to the work of business analysts.

For example, Instacart, a same-day food delivery service, maintains an open source data set on food purchases (accessible here). Each consumer represents a data-point and can be expressed as a “list of purchased products” from among 50,000 possible items.

How can this sizable pool of data be better understood and the deeper relationships within it extracted? Traditionally, data scientists create a “matrix of 2D scatter plots” in their efforts to intuit connections among the information’s attributes. However, for those sets with many attributes, this methodology does not scale well.

Consequently, Resnick’s team has been using their own new approach (sketched in code below) to:

  • Reduce complex data to just three dimensions in order to sum up key relationships
  • Visualize the data by applying their Immersive Insights application, and
  • Iteratively “label and color-code the data” in conjunction with an “evolving understanding” of its inner workings
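The article does not spell out the team’s exact reduction technique, so the following is only a rough sketch of the first step under stated assumptions: a sparse customer-by-product purchase matrix, of the kind the Instacart data yields, projected down to three components, with TruncatedSVD as one common stand-in:

```python
# Rough sketch of the reduction step: project a sparse customer x
# product purchase matrix down to three dimensions. TruncatedSVD is an
# assumed stand-in; the IBM team's actual technique is not specified.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Toy purchases as (customer_id, product_id) pairs, Instacart-style.
purchases = [(0, 10), (0, 42), (1, 10), (1, 7), (2, 42), (2, 99), (3, 7)]
n_customers, n_products = 4, 100
rows, cols = zip(*purchases)
matrix = csr_matrix(
    (np.ones(len(purchases)), (rows, cols)),
    shape=(n_customers, n_products),
)

# One 3-D point per customer, ready for plotting in an AR scene.
coords = TruncatedSVD(n_components=3, random_state=0).fit_transform(matrix)
print(coords.shape)  # (4, 3)
```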

Their results have enabled them to “validate hypotheses more quickly” and establish a sense of the relationships within the data sets. As well, their system was built to permit users to employ a number of versatile data analysis programming languages.

The types of data sets being used here are likewise deployed in training machine learning systems³. As a result, the potential exists for these three technologies to become complementary and mutually supportive in identifying and understanding relationships within the data, as well as in deriving any “black box” predictive models⁴.

Analyzing the Instacart Data Set: Food for Thought

Passing over the more technical details provided on the creation of the team’s demo in the video (linked above), and turning next to the results of the visualizations, their findings included:

  • A great deal of the variance in Instacart’s customers’ “purchasing patterns” was between those who bought “premium items” and those who chose less expensive “versions of similar items”. In turn, this difference has “meaningful implications” in the company’s “marketing, promotion and recommendation strategies”.
  • Among all food categories, produce was clearly the leader. Nearly all customers buy it.
  • When the users were categorized by the “most common department” they patronized, they were “not linearly separable”. That is, in terms of purchasing patterns, this “categorization” missed most of the variance in the system’s three main components (described above).

Resnick concludes that the three cornerstone technologies of Immersive Insights – – big data, augmented reality and machine learning – – are individually and in complementary combinations “disruptive” and, as such, will affect the “future of business and society”.

Questions

  • Can this system be used on a real-time basis? Can it be configured to handle changing data sets in volatile business markets where there are significant changes within short time periods that may affect time-sensitive decisions?
  • Would web metrics be a worthwhile application, perhaps as an add-on module to a service such as Google Analytics?
  • Is Immersive Insights limited only to business data or can it be adapted to less commercial or non-profit ventures to gain insights into processes that might affect high-level decision-making?
  • Is this system extensible enough so that it will likely end up finding unintended and productive uses that its designers and engineers never could have anticipated? For example, might it be helpful to juries in cases involving technically or financially complex matters such as intellectual property or antitrust?

 


1.  See the Subway Fold category Virtual and Augmented Reality for other posts on emerging AR and VR applications.

2.  See the Subway Fold category of Big Data and Analytics for other posts covering a range of applications in this field.

3.  See the Subway Fold category of Smart Systems for other posts on developments in artificial intelligence, machine learning and expert systems.

4.  For a highly informative and insightful examination of this phenomenon where data scientists on occasion are not exactly sure about how AI and machine learning systems produce their results, I suggest a click-through and reading of The Dark Secret at the Heart of AI,  by Will Knight, which was published in the May/June 2017 issue of MIT Technology Review.

Ethical Issues and Considerations Arising in Big Data Research

Image from Pixabay

In 48 of the 50 states in the US, new attorneys are required to pass a 60-question multiple-choice exam on legal ethics in addition to passing their state’s bar exam. This is known as the Multistate Professional Responsibility Examination (MPRE). I well recall taking this test myself.

The subject matter of this test is the professional ethical roles and responsibilities a lawyer must abide by as an advocate and counselor to clients, courts and the legal profession. It is founded upon a series of ethical considerations and disciplinary rules that are strictly enforced by the bars of each state. Violations can potentially lead to a series of professional sanctions and, in severe cases depending upon the facts, disbarment from practice for a term of years or even permanently.

In other professions including, among others, medicine and accounting, similar codes of ethics exist and are expected to be scrupulously followed. They are defined efforts to ensure honesty, quality, transparency and integrity in their industries’ dealings with the public, and to address breaches. Many professional trade organizations also have formal codes of ethics but often do not have much, if any, sanction authority.

Should some comparable forms of guidelines and boards likewise be put into place to oversee the work of big data researchers? This was the subject of a very compelling article posted on Wired.com on May 20, 2016, entitled Scientists Are Just as Confused About the Ethics of Big-Data Research as You, by Sharon Zhang. I highly recommend reading it in its entirety. I will summarize, annotate and add some further context to this, as well as pose a few questions of my own.

Two Recent Data Research Incidents

Last month, an independent researcher released, without permission, the profiles, containing very personal information, of 70,000 users of the online dating site OKCupid. These users were quite angered by this. OKCupid is pursuing a legal claim to remove this data.

Earlier, in 2014, researchers at Facebook manipulated items in users’ News Feeds for a study on “mood contagion“.¹ Many users were likewise upset when they found out. The journal that published this study released an “expression of concern”.

Users’ reactions over such incidents can have an effect upon subsequent “ethical boundaries”.

Nonetheless, the researchers involved in both of these cases had “never anticipated” the significant negative responses to their work. The OKCupid study was not scrutinized by any “ethical review process”, while a review board at Cornell had concluded that the Facebook study did not require a full review because the Cornell researchers only had a limited role in it.

Both of these incidents illustrate how “untested the ethics” of big data research are. Only now are the review boards that oversee the work of these researchers starting to pay attention to emerging ethical concerns. This stands in sharp contrast to the controls and guidelines upon medical research in clinical trials.

The Applicability of the Common Rule and Institutional Review Boards

In the US, under the Common Rule, which governs ethics for federally funded biomedical and behavioral research where humans are involved, studies are required to undergo an ethical review. However, such review is not administered under a “unified system”; rather, each university maintains its own institutional review board (IRB). These are composed of other (mostly medical) researchers at each university. Only a few of them “are professional ethicists“.

They have even less experience in computer technology. This deficit may be affecting the protection of subjects who participate in data science research projects. In the US, there are hundreds of IRBs, but they are each dealing with “research efforts in the digital age” in their own ways.

Both the Common Rule and the IRB system came into being following the revelation in the 1970s that the U.S. Public Health Service had, between 1932 and 1972, engaged in a terrible and shameful secret program that came to be known as the Tuskegee Syphilis Experiment. This involved leaving African Americans living in rural Alabama with untreated syphilis in order to study the disease. As a result of this outrage, the US Department of Health and Human Services created new regulations concerning any research on human subjects they conducted. All other federal agencies likewise adopted such regulations. Currently, “any institution that gets federal funding has to set up an IRB to oversee research involving humans”.

However, many social scientists today believe these regulations are not appropriate for their types of research, involving areas where the risks “are usually more subtle than life or death”. For example, if you are seeking volunteers to take a survey on test-taking behaviors, the IRB language requirements on physical risks do not fit the needs of the participants in such a study.

Social scientist organizations have expressed their concern about this situation. As a result, the American Association of University Professors (AAUP) has recommended:

  • Adding more social scientists to IRBs, or
  • Creating new and separate review boards to assess social science research

In 2013, AAUP issued a report entitled Regulation of Research on Human Subjects: Academic Freedom and the Institutional Review Board, recommending that the researchers themselves should decide if “their minimal risk work needs IRB approval or not”. In turn, this would make more time available to IRBs for “biomedical research with life-or-death stakes”.

This does not, however, imply that all social science research, including big data studies, is entirely risk-free.

Ethical Issues and Risk Analyses When Data Sources Are Comingled

Dr. Elizabeth A. Buchanan, who works as an ethicist at the University of Wisconsin-Stout, believes that the Internet is now entering its “third phase”, where researchers can, for example, purchase several years’ worth of Twitter data and then integrate it “with other publicly available data”.² This mixture results in issues involving “ethics and privacy”.

Recently, while serving on an IRB, she took part in evaluating a project proposal involving merging mentions of a drug by its street name appearing on social media with public crime data. As a result, people involved in crimes could potentially become identified. The IRB still gave its approval. According to Dr. Buchanan, the social value of such an undertaking must be weighed against its risk. As well, the risk should be minimized by removing any possible “identifiers” in any public release of this information.
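As a purely illustrative aside, the simplest layer of that scrubbing can be sketched in a few lines. Real de-identification is far harder than pattern-matching, as the re-identification example below makes clear, and the patterns here are only assumptions about what a release might need to catch:

```python
# Crude sketch of scrubbing surface-level identifiers before a public
# release. Real de-identification requires much more than regexes;
# these patterns are illustrative assumptions only.
import re

PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "<PHONE>"),
    (re.compile(r"@\w+"), "<HANDLE>"),  # social media handles
]

def scrub(text: str) -> str:
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

record = "Contact @streetuser99 or jdoe@example.com, cell 555-867-5309."
print(scrub(record))
# -> "Contact <HANDLE> or <EMAIL>, cell <PHONE>."
```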

As technology continues to advance, such risk evaluation can become more challenging. For instance, in 2013, MIT researchers found out that they were able to match up “publicly available DNA sequences” by using data about the participants that the “original researchers” had uploaded online.³ Consequently, in such cases, Dr. Buchanan believes it is crucial for IRBs “to have either a data scientist, computer scientist or IT security individual” involved.

Likewise, other types of research organizations such as, among others, open science repositories, could perhaps “pick up the slack” and handle more of these ethical questions. According to Michelle Meyer, a bioethicist at Mount Sinai, oversight must be assumed by someone but the best means is not likely to be an IRB because they do not have the necessary “expertise in de-identification and re-identification techniques”.

Different Perspectives on Big Data Research

A technology researcher at the University of Maryland⁴ named Dr. Katie Shilton recently conducted interviews of “20 online data researchers”. She discovered “significant disagreement” among them on matters such as the “ethics of ignoring Terms of Service and obtaining informed consent“. The group also reported that the ethical review boards they dealt with never questioned the ethics of the researchers, while peer reviewers and their professional colleagues had done so.

Professional groups such as the Association of Internet Researchers (AOIR) and the Center for Applied Internet Data Analysis (CAIDA) have created and posted their own guidelines.

However, the IRBs, which “actually have power”, are only now “catching up”.

Beyond universities, tech companies such as Microsoft have begun to establish in-house “ethical review processes”. As well, in December 2015, the Future of Privacy Forum held a gathering called Beyond IRBs to evaluate “processes for ethical review outside of federally funded research”.

In conclusion, companies continually “experiment on us” with data studies. To name just two examples among numerous others, they focus on A/B testing⁵ of news headlines and supermarket checkout lines. As they hire increasing numbers of data scientists from universities’ Ph.D. programs, these schools are sensing an opportunity to close the gap in terms of using “data to contribute to public knowledge”.
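For readers unfamiliar with the mechanics, the arithmetic at the bottom of a headline A/B test is small enough to sketch; the click counts below are invented for illustration:

```python
# Minimal sketch of a two-proportion z-test for a headline A/B test:
# did variant B's click-through rate beat variant A's by more than
# chance would allow? All counts below are invented placeholders.
from math import sqrt
from scipy.stats import norm

clicks_a, views_a = 310, 10_000   # headline A
clicks_b, views_b = 370, 10_000   # headline B

p_a, p_b = clicks_a / views_a, clicks_b / views_b
p_pool = (clicks_a + clicks_b) / (views_a + views_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test

print(f"CTR A={p_a:.3%}  CTR B={p_b:.3%}  z={z:.2f}  p={p_value:.3f}")
```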

My Questions

  • Would the companies, universities and professional organizations who issue and administer ethical guidelines for big data studies be taken more seriously if they had the power to assess and issue public notices for violations? How could this be made binding and what sort of appeals processes might be necessary?
  • At what point should the legal system become involved? When do these matters begin to involve civil and/or criminal investigations and allegations? How would big data research experts be certified for hearings and trials?
  • Should teaching ethics become a mandatory part of curriculum in data science programs at universities? If so, should the instructors only be selected from the technology industry or would it be helpful to invite them from other industries?
  • How should researchers and their employers ideally handle unintended security and privacy breaches as a result of their work? Should they make timely disclosures and treat all inquiries with a high level of transparency?
  • Should researchers experiment with open source methods online to conduct certain IRB functions for more immediate feedback?

 


1.  For a detailed report on this story, see Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry, by Vindu Goel, in the June 29, 2014 edition of The New York Times.

2.  These ten Subway Fold posts cover a variety of applications in analyzing Twitter usage data.

3.  For coverage on this story see an article published in The New York Times on January 17, 2013, entitled Web Hunt for DNA Sequences Leaves Privacy Compromised, by Gina Kolata.

4.  For another highly interesting but unrelated research initiative at the University of Maryland, see the December 27, 2015 Subway Fold post entitled Virtual Reality Universe-ity: The Immersive “Augmentarium” Lab at the U. of Maryland.

5.  For a detailed report on this methodology, see the September 30, 2015 Subway Fold post entitled Google’s A/B Testing Method is Being Applied to Improve Government Operations.

“Technographics” – A New Approach for B2B Marketers to Profile Their Customers’ Tech Systems

"Gold Rings - Sphere 1" Image by Linda K

“Gold Rings – Sphere 1” Image by Linda K

Today’s marketing and business development professionals use a wide array of big data collection and analytical tools to create and refine sophisticated profiles of market segments and their customer bases. These are deployed in order to systematically and scientifically target and sell their goods and services in steadily changing marketplaces.

These processes can include, among a multitude of other vast data sets and methodologies, demographics, web user metrics and econometrics. Businesses are always looking for a data-driven edge in highly competitive sectors and such profiling, when done correctly, can be very helpful in detecting and interpreting market trends, and consistently keeping ahead of their rivals. (The Subway Fold category of Big Data and Analytics now contains 50 posts about a variety of trends and applications in this field.)

To this I will briefly add my own long-term yet totally unscientific study of office-mess-ographics. Here I have been looking for any correlation between the relative states of organization – – or entropy – – in people’s offices and their work’s quality and output. The results still remain inconclusive after years of study.

One of the most brilliant and accomplished people I have ever known had an office that resembled a cave deep in the earth with piles of paper resembling stalagmites all over it. Even more remarkably, he could reach into any one of those piles and pull out exactly the documents he wanted. His work space was so chaotic that there was a long-standing joke that Jimmy Hoffa’s and Judge Crater’s long-lost remains would be found whenever he retired and his office was cleaned out.

Speaking of office-focused analytics, an article posted on VentureBeat.com on March 5, 2016, entitled CMOs: ‘Technographics’ is the New Demographics, by Sean Zinsmeister, brought news of a most interesting new trend. I highly recommend reading this in its entirety. I will summarize and add some context to it, and then pose a few question-ographics of my own.

New Analytical Tool for B2B Marketers

Marketers are now using a new methodology called technographics to analyze their customers’ “tech stack“, a term of art for the composition of their supporting systems and platforms. The objective of this approach is to deeply understand what this says about them as a company and, moreover, how it can be used in business-to-business (B2B) marketing campaigns. Thus applied, technographics can identify “pain points” in products and alleviate them for current and prospective customers.

Using established consumer marketing methods, there is much to be learned and leveraged about how technology is being used by very granular segments of user bases. For example:

By virtue of this type of technographic data, retailers can target their ads in anticipation of “which customers are most likely to shop in store, online, or via mobile”.

Next, by transposing this well-established marketing approach onto B2B commerce, the objective is to carefully examine the tech stacks of current and future customers in order to gain a marketing advantage. That is, to “inform” a business’s strategy and identify potential new roles and needs to be met. These corporate tech stacks can include systems for:

  • Office productivity
  • Project management
  • Customer relationship management (CRM)
  • Marketing

Gathering and Interpreting Technographic Signals and Nuances

Technographics can provide unique and valuable insights into assessing, for example, whether a customer values scalability or ease-of-use more, and then act upon this.

As well, some of these technographic signals can be indicative of other factors not, per se, directly related to technology. This was the case at Eloqua, a marketing automation concern. They noticed their marketing systems have predictive value in determining the company’s best prospects. Furthermore, they determined that companies running their software were inclined “to have a certain level of technological sophistication”, and were often large enough to have the capacity to purchase higher-end systems.

As business systems continually grow in their numbers and complexity, interpreting technographic nuances has also become more of a challenge. Hence, the application of artificial intelligence (AI) can be helpful in detecting additional useful patterns and trends. In a July 2011 TED Talk directly on point here, entitled How Algorithms Shape Our World, Kevin Slavin discussed how algorithms and machine learning are needed today to help make sense out of the massive and constantly growing amounts of data. (The Subway Fold category of Smart Systems contains 15 posts covering recent developments and applications involving AI and machine learning.)

Technographic Resources and Use Cases

Currently, technographic signals are readily available from various data providers.

These providers parse data using such factors as “web hosting, analytics, e-commerce, advertising, or content management platforms”. Another firm, Ghostery, has a Chrome browser extension illuminating the technologies upon which any company’s website is built.
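At its simplest, this kind of fingerprinting amounts to checking a page’s source for telltale markers. Here is a toy sketch of the idea; the signature strings are illustrative assumptions, whereas the commercial providers maintain large curated fingerprint libraries:

```python
# Toy sketch of tech-stack fingerprinting: fetch a page and scan its
# HTML for telltale markers. The signature strings are illustrative
# assumptions; real providers use thousands of curated fingerprints.
import requests

SIGNATURES = {
    "Google Analytics": "googletagmanager.com",
    "jQuery": "jquery",
    "Shopify": "cdn.shopify.com",
    "WordPress": "wp-content",
}

def detect_stack(url: str) -> list:
    html = requests.get(url, timeout=10).text.lower()
    return [name for name, marker in SIGNATURES.items() if marker in html]

print(detect_stack("https://example.com"))
```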

The next key considerations are to “define technographic profiles and determine next-best actions” for specific potential customers. For instance, an analytics company called Looker creates “highly targeted campaigns” aimed at businesses who use Amazon Web Services (AWS). The greater the number of marketers who undertake similar pursuits, the more they raise the value of their marketing programs.

Technographics can likewise be applied for competitive leverage in the following use cases:

  • Sales reps prospecting for new leads can be supported with more focused messages for potential new customers. These are shaped by understanding their particular motivations and business challenges.
  • Locating opportunities in new markets can be achieved by assessing the tech stacks of prospective customers. Such analytics can further be used for expanding business development and product development. An example is the online training platform by Mindflash. They detected a potential “demand for a Salesforce training program”. Once it became available, they employed technographic signals to pinpoint customers to whom they could present it.
  • Enterprise-wide decision-making benefits can be achieved by adding “value in areas like cultural alignment”. Familiarity with such data for current employees and job seekers can aid businesses in understanding the “technology disposition” of their workers. Thereafter, its alignment with that of “customers or partners” can be pursued. Furthermore, identifying areas where additional training might be needed can help to alleviate productivity issues resulting from “technology disconnects between employees”.

Many businesses are not yet using technographic signals to their full advantage. By increasing such initiatives, businesses can acquire a much deeper understanding of these signals’ inherent value. The resulting insights can have a significant effect on the experiences of their customers and, in turn, elevate their resulting levels of loyalty, retention and revenue, as well as the magnitude of deals done.

My Questions

  • Would professional service industries such as law, medicine and accounting, and the vendors selling within these industries, benefit from integrating technographics into their own business development and marketing efforts?
  • Could there be, now or in the future, an emerging role for dedicated technographics specialists, trainers and consultants? Alternatively, should these new analytics just be treated as another new tool to be learned and implemented by marketers in their existing roles?
  • If a company identifies some of their own employees who might benefit from additional training, how can they be incentivized to participate in it? Could gamification techniques also be applied in creating these training programs?
  • What, if any, privacy concerns might surface in using technographics on potential customer leads and/or a company’s own internal staff?