June 12th, 2009

I finished reading the book Super Crunchers by Ian Ayres. I thought it was good. I liked his explanation of randomized A/B and multivariate testing, done both online and off.
He explains that randomly dividing prospects into two groups and seeing which approach produces the higher response rate is one of the most powerful super crunching techniques ever devised.
When you rely on historical data, it is much harder to tease out causation. Sample size is key. If we get a large enough sample, we can be pretty sure that the group coming up heads will be statistically identical to the group coming up tails. If we then intervene to treat the heads differently, we can measure the pure effect of the intervention… after randomization makes the two groups identical on every other dimension, we can be confident that any change in the two groups’ outcomes was caused by their different treatment.
Of course, randomization doesn’t mean that those who were treated differently are exactly the same as those who were not. If we looked at the heights of the people in one group, we would see a bell curve of heights. The point is that we would see the same bell curve of heights for those in the other group. Since the distributions of the two groups become increasingly identical as the sample size increases, we can attribute any difference in the average group response to the difference in treatment.
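A minimal sketch of that idea in Python (the population size, mean height, and spread are invented for illustration): flip a coin to split people into two groups, and with a large enough sample the two groups’ height distributions come out nearly identical before any treatment is applied.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 10,000 people with normally distributed
# heights (mean 170 cm, sd 7 cm) -- illustrative numbers only.
heights = [random.gauss(170, 7) for _ in range(10_000)]

# Randomize by "coin flip": shuffle, then split in half.
random.shuffle(heights)
heads, tails = heights[:5_000], heights[5_000:]

mean_heads = statistics.mean(heads)
mean_tails = statistics.mean(tails)
sd_heads = statistics.stdev(heads)
sd_tails = statistics.stdev(tails)

# Both halves show essentially the same bell curve: means and
# standard deviations agree to within a fraction of a centimeter.
print(f"heads: mean={mean_heads:.2f} sd={sd_heads:.2f}")
print(f"tails: mean={mean_tails:.2f} sd={sd_tails:.2f}")
```

If you shrink the sample to a few dozen people, the two means drift apart noticeably, which is the sample-size point in action.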
In lab experiments, researchers create data by carefully controlling for everything, producing matched pairs that are identical except for the thing being tested. Outside of the lab, it’s sometimes simply impossible to create pairs that are the same on all peripheral dimensions. Randomization is how businesses can create that kind of data anyway: instead of hand-building matched pairs, they let chance produce matched distributions.
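Here is a hedged sketch of what such a randomized business test might look like end to end. The conversion rates (5% vs. 7%) and sample size are made up for the simulation; the z-test at the end is one standard way to check whether the observed difference is bigger than chance alone would produce.

```python
import math
import random

random.seed(42)

# Invented "true" conversion rates for the simulation.
# In a real test you would not know these -- estimating
# the difference is the whole point of the experiment.
TRUE_RATE = {"A": 0.05, "B": 0.07}

conversions = {"A": 0, "B": 0}
counts = {"A": 0, "B": 0}
for _ in range(10_000):
    group = random.choice("AB")            # the coin flip
    counts[group] += 1
    if random.random() < TRUE_RATE[group]:  # did this prospect convert?
        conversions[group] += 1

rate_a = conversions["A"] / counts["A"]
rate_b = conversions["B"] / counts["B"]

# Two-proportion z-test: |z| > 1.96 is the conventional
# 95% threshold for calling the difference real.
pooled = (conversions["A"] + conversions["B"]) / (counts["A"] + counts["B"])
se = math.sqrt(pooled * (1 - pooled) * (1 / counts["A"] + 1 / counts["B"]))
z = (rate_b - rate_a) / se

print(f"A: {rate_a:.3f}  B: {rate_b:.3f}  z = {z:.2f}")
```

Because assignment was random, any peripheral dimension (age, geography, time of day) is spread evenly across the two groups, so a large z can be attributed to the treatment itself.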
The power behind randomized testing is undeniable. So should we just have computers make all our decisions for us? That question is where he goes for the majority of the book.
Randomized trials require firms to hypothesize in advance, before the test starts. Historical data lets the researcher sit back and decide what to test after the fact. Randomizers need to take more initiative than people who run after-the-fact regressions.
The most important thing left to humans is to use our minds and our intuition to guess at which variables should and should not be included in the statistical analysis. The regressions can test whether there is a causal effect and estimate the size of the causal impact, but somebody (some body, some human) needs to specify the test itself.
So the question becomes: what do we test? And after we test, the question becomes: what are the results telling us?