June 30, 2017 Update: This post was originally uploaded on September 30, 2015. It has been updated with new information below.
During my annual visit with my ophthalmologist, he always checks the accuracy of the prescription for my glasses by trying out different pairs of lenses and then asking me to read the letter chart on the wall. For each eye, he switches the lenses back and forth, repeatedly asking “Which is better, 1 or 2?”. This is called a refraction test. My answers either confirm that my current lenses are correct or indicate that I need an updated prescription for new lenses.
I never realized until recently that this method of testing is very similar to one tech companies use to measure and adjust the usability of their products and services. (I view this as my own bit of, well, in-sight.) This process is called “A/B testing”, where test subjects are shown two nearly identical versions of something, one of them containing a slight variation, and are then asked which of the two they prefer.
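The mechanics of an A/B test can be sketched in a few lines of code. This is a minimal, hypothetical simulation (the rates, group size, and function name are my own assumptions, not anything from the article): users are randomly split between two variants, and the conversion rates of the two groups are compared.

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def run_ab_test(rate_a, rate_b, n_per_group):
    """Simulate showing variant A or B to n_per_group users each
    and return the observed conversion rate for each variant."""
    conversions_a = sum(random.random() < rate_a for _ in range(n_per_group))
    conversions_b = sum(random.random() < rate_b for _ in range(n_per_group))
    return conversions_a / n_per_group, conversions_b / n_per_group

# Hypothetical underlying rates: variant B's small tweak lifts conversion slightly.
observed_a, observed_b = run_ab_test(0.65, 0.68, 10_000)
print(f"A: {observed_a:.1%}  B: {observed_b:.1%}")
```

In a real test, one would also check whether the observed difference is statistically significant before declaring a winner; the point here is only the structure: two variants, random assignment, measured outcomes.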
What if this method were transposed, in a seemingly non-intuitive leap, to the public sector? A new initiative by the US federal government founded upon this idea was reported in a fascinating and instructive article in the September 26, 2015 edition of The New York Times entitled A Better Government, One Tweak at a Time, by Justin Wolfers*. I highly recommend reading it in its entirety. I will summarize and annotate it, and then ask some of my own non-A/B questions. (There is another very informative article on this topic, covering the US and elsewhere, in today’s September 30, 2015 edition of The New York Times entitled Behaviorists Show the U.S. How to Improve Government Operations, by Binyamin Appelbaum.)
Google makes extensive use of this method in its testing and development projects. Its A/B testing has confirmed an effect that social scientists have known for years: that “small changes in how choices are presented can lead to big changes in behavior”. Moreover, effective design is not so much about an aesthetically pleasing appearance as it is about testing competing ideas and gathering data to evaluate which of them works best.
The effectiveness and success of A/B testing were introduced to the public sector last year, when the federal government organized a group of officials (enlisted from a wide variety of backgrounds and professions) called the Social and Behavioral Sciences Team (SBST). It is also referred to as the “Nudge Unit”. Their mandate was to “design a better government”. They set out to A/B test different government functions to see what works and what does not.
After a year in operation, they recently released their first annual report, detailing the many “small tweaks” they have implemented. Each of these changes was subjected to A/B testing. Their results have been “impressive” and imply that their efforts will save millions, if not billions, of dollars. Moreover, because these changes are so relatively inexpensive, “even moderate impacts” could produce remarkably “high cost-benefit ratios”.
Among the SBST’s accomplishments are the following:
- Improving Printing Efficiency: Some, but not all, printers at the US Department of Agriculture presented users with a pop-up message to encourage two-sided printing. As a result, two-sided printing rose by 6%. While this sounds small, its magnitude quickly scales up because US government printers produce 18 billion pages each year. The SBST report suggests that implementing this for the entire federal government could potentially save more than half a billion pages a year.
- Reminding High School Graduates to Finish Their College Enrollment: Researchers sent text messages to high school students during the summer after their graduation, urging them to follow up on the next steps needed to enroll in college. Of those who received the texts, 68% completed their enrollment, compared with 65% of those who did not. The positive effect was more pronounced for low-income students who got these texts. While this 3-percentage-point improvement might not sound large, at a cost of just $7 per student it proved tremendously cost-effective compared with the thousands of dollars it otherwise costs to offer “grant and scholarship aid”.
- Increasing Vendors’ Honesty on Tax Forms: Prompts asking vendors to be truthful were randomly placed on some versions of a federal-vendor tax collection form. Those who used the form containing the prompt reported more taxable sales than those using the untweaked form. In turn, this resulted in vendors voluntarily paying “an additional $1.6 million in taxes”. Again, scaling up this experiment could potentially raise billions of dollars in additional tax revenue.
- Raising Applications by Those Eligible for Student Loan Relief: The government knows, through its own methods, who is struggling to repay their federally funded student loans. In another experiment, emails about applying for loan relief were sent to a selected group of these borrowers; “many more” of them applied for relief than did those who received no such message.
- Lifting Savings Rates for People in the Military: When members of the military were transferred to Joint Base Myer-Henderson Hall in Virginia, they received a prompt to enroll in the military’s savings plan. The result was a significant rise in participation. This contrasts with no increase among others who were transferred to Fort Bragg in North Carolina and not prompted.
- Other Successful Experimental “Nudges”:
- Well-written letters resulting in more health care sign-ups
- Emails urging employees to join workplace savings plans
- Shortened URLs encouraging more people to pay bills online
- Telling veterans that they had earned, rather than were entitled to, a program’s benefits, which increased their participation in it.
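The printing numbers reported above can be checked with a quick back-of-envelope calculation. This is my own simplified sketch, not the SBST report's methodology: each page shifted to two-sided printing uses half a sheet of paper instead of a full sheet.

```python
# Back-of-envelope check of the printing savings (simplified assumptions).
total_pages = 18_000_000_000   # pages printed by the US government per year
duplex_shift = 0.06            # share of pages shifted to two-sided printing

# Each page moved to duplex saves half a sheet.
sheets_saved = total_pages * duplex_shift / 2
print(f"{sheets_saved:,.0f} sheets saved per year")
```

The result comes out to roughly half a billion sheets a year, consistent with the figure the SBST report suggests for a government-wide rollout.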
Justin Wolfers, the author of this article, concludes that it is the testing itself that makes for these successes. He very succinctly summarizes this by stating:
“Experiment relentlessly, keep what works, and discard what doesn’t.”
He further asserts that if this is done as Google has done it, the US government might likewise become “clear, user-friendly and unflinchingly effective”.
My own questions about A/B testing by the government include:
- Would it also produce cost-effective results for state and local governments? Are there any applications that could be done on a multi-national or even global level?
- Could it be applied to improve electronic and perhaps even online public voting systems?
- Could it bring incremental improvements in government administered health programs?
- What would be the result if the government asked the public to submit suggestions online for new A/B testing applications? Could A/B testing itself be done by governments online?
- Does it lend itself to being open sourced for test projects in the design, collection and interpretation of data?
An earlier and well-regarded book about using a variety of forms of nudges to improve public policies and functions is Nudge: Improving Decisions About Health, Wealth, and Happiness, by Richard H. Thaler and Cass R. Sunstein (Penguin Press, 2009).
June 30, 2017 Update: For a timely and valuable primer and update on A/B testing I highly recommend a click-through and full reading of A Refresher on A/B Testing, by Amy Gallo (@amygallo), posted 6/28/17 on the Harvard Business Review blog. The author expertly covers the definition, process, interpretation, applications and errors of this methodology.
* The author’s bio in this article states he is “a senior fellow at the Peterson Institute for International Economics and professor of economics and public policy at the University of Michigan”. (The links were added by me and not included in the original text.)