Practical Data Science and the Tricky Business of A/B Testing

by James Kobielus

Increasingly, the best websites aren’t so much designed as calculated. And that calculation, more often than not, rides on a never-ending campaign of A/B testing. And the A/B testing, in turn, is under the constant supervision of data scientists running one real-world experiment after another.

The calculations for modern website optimization usually involve maximizing or trading off various operational metrics with some relevance to one or more business objectives. Typically, A/B testing focuses on comparing the real-world results from deploying two or more design options – such as whether to make your e-commerce site’s “purchase” button blue, green, or chartreuse – in otherwise equivalent production environments. All other factors held constant, if one option correlates with, for example, a greater reduction in shopping-cart abandonment rates than the others, those results may figure into your decision to incorporate that option as a standard setting in your website design from now on. Or until such time as other colors or design alternatives deliver better results in future A/B tests.
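To make the comparison concrete, here is a minimal sketch of one common way to check whether a difference between two variants is larger than chance alone would explain: a two-proportion z-test on completed purchases. The function name and the visitor counts are illustrative assumptions, not figures or methods taken from any real site, and a production analysis would also need to account for sample-size planning and repeated testing.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates.

    events_a / n_a and events_b / n_b are the observed rates for
    variants A and B (for example, completed purchases per visitor).
    Hypothetical helper for illustration only.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 1,000 visitors saw each button color.
z, p = two_proportion_z_test(events_a=700, n_a=1000, events_b=650, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference is unlikely to be noise, but it says nothing about whether the difference is big enough to matter to the business.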

Increasingly, businesses are using A/B testing to drive continuous, incremental improvements to every last detail of their online designs, content offerings, customer engagement strategies, and so forth. It’s a painstaking, behind-the-scenes activity that depends on a closed feedback loop of operational performance metrics driving tweaks to the design elements, predictive models, business rules, and other artifacts that constitute your business technology infrastructure.

In this new order, data scientists experiment continuously by deploying new predictive models, business rules, and orchestration logic into front-office and back-office applications. They might experiment with different logic to drive customer handling across different engagement channels. They might play with different models for differentiating offers by customer demographics, transaction history, times of day, and other variables. They might examine the impact of running different process models at different times of the day, week, or month in back-office processes, such as order fulfillment, materials management, manufacturing, and logistics, in order to determine which can maximize product quality while reducing time-to-market and life-cycle costs.

The beauty of real-world experiments is that you can continuously and surreptitiously test diverse scenarios inline to your running business. Your data scientists can compare results across differentially controlled scenarios in a systematic, scientific manner. They can use the results of these in-production experiments – such as improvements in response, acceptance, satisfaction, and defect rates – to determine which models work best in various circumstances.

In assessing the efficacy of models in the real world, your data scientists will want to isolate key comparison variables through A/B testing. They should iterate through successive tests by rapidly deploying challenger models in place of the in-production champion models as soon as the latter become less predictive. The key development approaches that facilitate these experiments include champion/challenger modeling, real-time model scoring, and automatic best-model selection. Data scientists should also use adaptive machine-learning techniques to generate a steady stream of alternate “challenger” models and rules that automatically kick into production when they score higher than the in-production “champion” models/rules in predictive power.
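The champion/challenger idea can be sketched in a few lines. The example below assumes that each model is just a function returning a 0/1 prediction, that predictive power is measured as plain accuracy on recent labeled outcomes, and that a challenger must beat the champion by a minimum margin before it is promoted. All names (`accuracy`, `select_champion`, `min_gain`) and the sample data are hypothetical; real deployments typically use richer metrics and safeguards.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[float], int]          # maps a feature value to a 0/1 prediction
Example = Tuple[float, int]             # (feature value, observed outcome)


def accuracy(model: Model, examples: Sequence[Example]) -> float:
    """Fraction of recent examples the model predicts correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


def select_champion(
    champion_name: str,
    models: Dict[str, Model],
    recent_examples: Sequence[Example],
    min_gain: float = 0.02,
) -> Tuple[str, Dict[str, float]]:
    """Return the model that should be in production, plus all scores.

    A challenger replaces the champion only if it scores at least
    `min_gain` higher, which avoids swapping models on noise.
    """
    scores = {name: accuracy(m, recent_examples) for name, m in models.items()}
    best_name = max(scores, key=scores.get)
    if best_name != champion_name and scores[best_name] - scores[champion_name] >= min_gain:
        return best_name, scores
    return champion_name, scores


# Made-up models and outcomes for illustration.
models = {
    "champion": lambda x: int(x > 0.8),
    "challenger": lambda x: int(x > 0.5),
}
recent = [(0.1, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
winner, scores = select_champion("champion", models, recent)
print(winner, scores)
```

The margin is a design choice: set it too low and models churn on random fluctuation, too high and a genuinely better challenger never gets promoted.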

One of the benefits of A/B testing is that it is often conducted non-disruptively in the background of your production environment, with users being unaware that you are evaluating disparate options across different sites, applications, channels, customer segments, geographies, times of day, days of the week, and so forth. Another benefit is that your competition may also be unaware of these incremental improvements. To the extent that you are steadily making subtle revisions to your overall application infrastructure over time, you can introduce significant innovations in stealth, long before your competition realizes what you’re up to.

The process improvements that A/B testing delivers are often a closely held trade secret. But the process of A/B testing is not usually a secret; in fact, it is standard operating procedure in a growing range of Internet-facing industries, especially e-commerce, social media, and online publishing. And it’s at the heart of customer experience optimization practices in any industry that seeks to improve the usability of its online presence.

That’s why it was great to learn that Facebook has open-sourced its code for managing A/B testing. Per this recent article, Facebook has released a portion of its A/B testing code, called PlanOut, that helps data scientists build and manage experiments while ensuring accurate testing results.

A/B testing is quite tricky to get right, and it is hard to enforce consistency in how different data scientists conduct it on different projects. In announcing the open-sourcing of the PlanOut code, Facebook data scientists described the impetus as follows: “At Facebook, we run over a thousand experiments each day. While many of these experiments are designed to optimize specific outcomes, others aim to inform long-term design decisions. And because we run so many experiments, we need reliable ways of routinizing experimentation. … Many online experiments are implemented by engineers who are not trained statisticians. While experiments are often simple to analyze when done correctly, it can be surprisingly easy to make mistakes in their design, implementation, logging, and analysis.”
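One implementation mistake of the kind the quote alludes to is assigning users to variants inconsistently, so that the same person sees different options on different visits. A common remedy is deterministic, hash-based assignment. The sketch below is a generic illustration of that technique using only Python’s standard library; it is not PlanOut’s API, and the function name, experiment name, and user IDs are assumptions made up for the example.

```python
import hashlib
from typing import Sequence


def assign_variant(experiment: str, user_id: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing the experiment name together with the user ID means the
    same user always gets the same variant within an experiment, while
    assignments in different experiments are effectively independent.
    Hypothetical helper, not part of any real library.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    key = f"{experiment}.{user_id}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    # Interpret the hash as a large integer and bucket it by variant count.
    index = int(digest, 16) % len(variants)
    return variants[index]


# Made-up experiment: which color should the purchase button be?
colors = ["blue", "green"]
first = assign_variant("purchase_button_color", "user-42", colors)
second = assign_variant("purchase_button_color", "user-42", colors)
print(first, first == second)
```

Because the assignment is a pure function of the experiment name and user ID, it needs no stored state, and the logging and analysis code can recompute it later to verify who saw what.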

It’s no surprise that Facebook took such an initiative. Zuckerberg and crew have been prime movers in open-source software projects such as Apache Hadoop and even open-source hardware initiatives such as the Open Compute Project. Now, with the open-sourcing of PlanOut for A/B test management, it’s clear that Facebook also recognizes the importance of open, flexible, pervasive, and standard operational data-science practices in the online economy.
