Data May 31, 2013 4 min read

Experiments and A/B thinking: what digital products can teach everyone else

Not every decision needs a year-long project. Some of them should be tested with fast, cheap experiments - even in non-digital environments.

Technology companies building consumer products have long settled into a norm: before rolling out a change globally, they test it on a small portion of users. Button on the right or on the left. New ranking algorithm or old. Email subject line A or B. Compare the result, make the decision.

This is called A/B testing, or more broadly, a culture of experimentation. And I have been wondering for a while: why is this logic applied so rarely outside digital products?

Why it seems inapplicable

The standard response is "our situation is different." In manufacturing you cannot run two processes in parallel. In service businesses customers will notice the difference. In B2B there are too few transactions to get a statistically significant result.

Some of these objections are valid. But some of them are simply the habit of thinking in terms of large projects.

When I talk to executives outside the technology sector, I often notice they make decisions in one of two ways: either on the basis of intuition and past experience, or by launching a full project with justification, approval, and implementation that takes months. The middle option - a quick, cheap hypothesis test - is almost never there.

What an experiment looks like outside the digital world

An experiment in an industrial or service context does not have to look like a classic A/B test with control and test groups. It can be:

a pilot on one production line before scaling;
a trial change to a sales script for one team;
a test of a new report format in one department before rolling it out company-wide;
a trial run of a new intake process at one warehouse.

The common principle: the change is applied to part of the system, the result is measured, and a conclusion is drawn before scaling. This is not science in the strict sense - but it is far better than "let's launch and see" at the scale of the whole organisation.

What it needs to work

There are three conditions without which an experiment makes no sense.

First - the hypothesis is formulated before you start. What exactly are we testing? What result will count as confirmation, and what will count as refutation? If the hypothesis is formulated after seeing the results, that is not an experiment - it is a post-hoc explanation of what happened.

Second - a metric that can be measured. Not "it got better," but "the yield of acceptable products rose from X to Y" or "the time to process one order fell from A to B." Without a measurable result there is no experiment - there is an impression.

Third - willingness to accept a negative result. This is the hardest part. A culture where "a negative result is a failure" kills experiments faster than any technical constraint. If the team knows that a "does not work" finding will be read as a personal mistake, they will not test hypotheses that might not be confirmed.
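The first two conditions can be made concrete by writing the hypothesis down as data before the pilot begins. The sketch below is only an illustration: the `Hypothesis` class, its field names, and the numbers are assumptions invented for this example, not a standard tool. The point is that the metric, the baseline, and the confirmation threshold are fixed in advance, so the result cannot be reinterpreted afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A hypothesis recorded before the pilot starts (illustrative names)."""

    metric: str             # what is measured
    baseline: float         # value measured before the change
    min_improvement: float  # smallest gain that counts as confirmation

    def evaluate(self, observed: float) -> str:
        """Compare the observed value with the threshold fixed in advance."""
        if observed - self.baseline >= self.min_improvement:
            return "confirmed"
        return "not confirmed"


# Hypothetical numbers, for illustration only.
h = Hypothesis(
    metric="yield of acceptable products",
    baseline=0.90,
    min_improvement=0.02,
)
result_good = h.evaluate(0.93)   # gain of 3 points, above the threshold
result_flat = h.evaluate(0.905)  # gain of 0.5 points, below the threshold
```

Making the record immutable (`frozen=True`) mirrors the rule in the text: once the pilot is running, the success criterion should not move. A "not confirmed" outcome is a normal return value here, not an error.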

Where this does not work

An honest conversation requires acknowledging limits.

Where a change cannot be reversed, experimenting is dangerous. Where the sample is too small and any random fluctuation will look like a result, the conclusions are unreliable. Where the cost of preparing a pilot is comparable to the cost of full implementation, there is no point.
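The small-sample problem is easy to see with numbers. The sketch below uses a pooled two-proportion z-test, which is one common way (my choice for illustration, not the only option) to ask whether a difference between a pilot and a baseline could be random fluctuation. The function name and all figures are hypothetical. The normal approximation it relies on is itself rough for very small samples, where an exact test would be more appropriate.

```python
import math


def two_proportion_p_value(success_a: int, n_a: int,
                           success_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # identical all-pass or all-fail groups: no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (90% vs 95%), very different sample sizes.
p_small = two_proportion_p_value(18, 20, 19, 20)
p_large = two_proportion_p_value(1800, 2000, 1900, 2000)
```

With 20 items per group, the five-point difference is well within what chance alone produces. With 2,000 per group, the same difference is very unlikely to be noise. A pilot that cannot reach a usable sample size belongs in the "does not work" category described above.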

But there are fewer such situations than people usually think. Most operational hypotheses can be tested quickly, cheaply, and reversibly - if the effort is made deliberately.

A few questions to get started

Before the next major change in the company, it is worth asking:

Can this be tested on part of the system first?
What exactly do we want to learn - and how will we know when we have learned it?
Do we have baseline indicators to compare the result against?
Who makes the decision based on the experiment, and by what criteria?
If the result is negative - what do we do?

Sometimes the answer to the first question is no, it is impossible. But more often than it seems, the answer is yes - if the approach is restructured slightly.
