“The secret sauce for a successful online business is experimentation.” - Hal Varian, Chief Economist, Google
A/B testing is the most rigorous approach organizations can take to validate the impact of the changes they make.
When you need a higher degree of certainty about the changes you’re making, A/B testing is the approach to reach for.
That’s because A/B testing allows you to quantify how each individual feature you ship affects your success metrics, making it a powerful tool for determining whether or not you’re having impact.
But A/B testing has downsides.
A/B testing cannot tell you why something is happening. Being over-reliant on A/B testing to make decisions means that you might not move the needle in your business as much as you would like.
The other risk is that you make mistakes in your A/B testing procedure and misinterpret results.
This article will introduce you to A/B testing, share its advantages and disadvantages, and walk you through how to make it successful.
We also share a guide to setting up A/B testing within your organization, or implementing it as a program in your product management team.
“Experimentation is so powerful because it removes the need for any 'rational' guess-work.” - Jonny Longden
What is A/B testing?
A/B testing is a technique to understand, using statistical analysis, whether one product or feature performs better for users than another.
Users are split into two randomly assigned segments and shown either the original variant or a new treatment.
Using these two segments, you can measure how your selected primary metric performs in each pool.
Once you have gathered enough data for the results to be statistically significant, you can assess whether the new treatment performed better than the original.
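As an illustration, here is a minimal sketch in Python of how users might be split into segments. The function name assign_variant and the experiment name are hypothetical, and real experimentation platforms handle this for you, but the underlying idea of a stable, pseudo-random split looks roughly like this:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user into 'control' or 'treatment'.

    Hashing the user ID together with the experiment name gives a stable,
    pseudo-random split, so each user always sees the same variant.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # uniform value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"

# Example: route a user when they log in (names are illustrative)
if assign_variant("user_12345", "remove_info_screen") == "treatment":
    show_new_experience = True   # new treatment
else:
    show_new_experience = False  # original variant
```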
Typically you are testing a feature or item based on a user hypothesis.
Example:
Hypothesis: By removing the information-gathering screen shown after first-time log-in, we will increase activated users, since they will onboard into the experience faster and start using it right away.
Control Group A: Sees the existing experience when they first sign up.
Test Group B: The post-log-in information-gathering screen is removed when they first sign up.
Primary metric: Weekly activated users
Secondary metric (monitor): Monthly retention rate of A vs B
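To show how the primary metric might be assessed once the test has run, here is a minimal sketch of a two-proportion z-test in Python. The activation counts below are made up purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative numbers only: 10,000 users per group,
# 1,800 activated in control (A) vs 1,950 in treatment (B)
p_value = two_proportion_z_test(1800, 10_000, 1950, 10_000)
print(f"p-value: {p_value:.4f}")  # below 0.05 here, so the uplift reads as significant
```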
A/B testing is particularly useful when you have a complex hypothesis to test.
For example, removing an information gathering screen might accelerate the user’s introduction to the product, letting them experience its value sooner.
On the other hand, the screen might have gathered valuable information that allowed you to tailor the product better for the user, and send them more appropriate onboarding and retention messaging.
There might also be an effect where users with genuine intent to use the product were happy to complete the information form.
In this case you’re not sure which change drives activation, and the test can help you find out.
Advantages of A/B testing in product management
“When is the best time to A/B test? The short answer is always. The more realistic answer is whenever you can.” - Jon Simpson, Forbes.
An effective A/B testing program is all about:
- Data over opinions: creating a culture of learning removes subjectivity from decisions
- Preventing downsides: avoiding shipping products or features that drive negative results
- Creating focus: spotting pockets of value early
- Shipping better: MVP code shipped behind experiment wrappers, rather than long release cycles, so errors are spotted quickly
- Maintaining uplift: the impact of an experiment doesn’t last forever; you need to keep running new tests to stay ahead of your competition
Data over opinions
Product teams are often surrounded by stakeholders who believe they have the best ideas.
The reality is that no one really knows which ideas will have the most impact until you present them to customers at scale.
By implementing A/B testing in an organization, you can quickly remove the reliance on subjective opinions and continuously learn what really has the biggest impact on customers.
This can create an incredible culture of empowered teams that can model, hypothesise and showcase the impact of their work.
Preventing downsides
A negative A/B test is as useful as a positive one.
Booking.com have one of the most advanced experimentation cultures in the tech industry.
They run thousands of tests a quarter, and have collected a massive library of historical data. The typical success rate for their testing program is just 10-20%.
Although that makes it seem like A/B testing isn’t effective, it actually means the opposite.
Instead of just shipping those features, the company discovered that many of them either had negative consequences or limited impact on the desired business metrics, and so didn’t ship them.
As product managers, we believe every change we implement will have a positive impact, but the success rates of A/B testing show how little we know until we test.
“The vast majority of experiments will not impact your tracked metrics one way or another.” - Oliver Palmer, Ex-Optimisation Manager at The Telegraph
Get the Hustle Badger Case Study on Booking.com’s experimentation culture
Creating focus
A/B testing takes a lot of the guesswork away from doing product. Instead of continuing to focus on areas that you think might drive change, you build conclusive evidence on areas that impact user behavior. You can retain that library of evidence and data as a company.
This means you get better at knowing what is likely to drive impact and where the pockets of value might be. With this insight, you can double down your efforts on areas that will drive change.
Shipping better
By using your experimentation platform as the way you deploy code, you can roll changes out faster and turn them off faster if you see that something has gone wrong on the website.
“There are essentially three main reasons why we use the experimentation platform as part of our development process.
- It allows us to deploy new code faster and more safely.
- It allows us to turn off individual features quickly when needed.
- It helps us validate that our product changes have the expected impact on the user experience.”
- Lukas Vermeer, Booking.com experimentation
Experimentation acts as an extra safety net. If all shipped code is wrapped in an experiment, then it is easy to turn the feature on and off again if something is broken.
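Here is a minimal sketch of what such a wrapper might look like in Python. The flag store, flag names, and screen names are all hypothetical; in practice this configuration lives in a feature-flag or experimentation service, but the kill-switch idea is the same:

```python
import hashlib

# Hypothetical in-memory flag store; a real setup would read this from a
# feature-flag or experimentation service so it can be flipped without a deploy.
FLAGS = {"remove_info_screen": {"enabled": True, "treatment_share": 0.5}}

def in_treatment(user_id: str, experiment: str, share: float) -> bool:
    """Stable pseudo-random bucketing, as in the assignment sketch above."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return (int(digest, 16) % 10_000) / 10_000 < share

def render_onboarding(user_id: str) -> str:
    flag = FLAGS["remove_info_screen"]
    if flag["enabled"] and in_treatment(user_id, "remove_info_screen", flag["treatment_share"]):
        return "home_screen"           # new behaviour: skip the info screen
    return "info_gathering_screen"     # original behaviour

# Kill switch: if the new flow misbehaves, disable the flag and every user
# immediately falls back to the original experience.
FLAGS["remove_info_screen"]["enabled"] = False
```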
Maintaining uplift
The benefits of an experiment don’t last forever. Competitors can copy your changes and customers have a recency bias, meaning the impact of the experiment isn’t as strong over time.
This means that even when you ship something which initially drives uplift, that uplift effect might decay over time.
For example, let’s say your experiment to remove the information gathering screen was successful in activating more users initially.
But at six months you might see the activation rate return to close to its historic level, as the advantage gained by letting users interact with the product immediately fades once that becomes the norm in the market.
By creating a continuous experimentation program, you will constantly refine your experience to stay ahead of the curve.