Success Metric vs. Fail Condition – To the Pain!

Last week I wrote about the difference between an assumption and a hypothesis and Steven Diebold wrote a great response. Steve schooled me on a few topics and pointed out my lack of clarity in some areas. One thing he brought up deserves more debate: The Success Metric.

I reject the idea of a success metric.

Bonus: I’m writing a more complete version of how to design great experiments as an open source “Real Book”, you can get on the download list here:

Learn more about experiment design

Declaring Victory!

A success metric is the idea that, for any hypothesis, there is a metric that will indicate the hypothesis is “validated,” to use the lean startup and experiment design jargon. Or simply put, that it’s a good idea.

Setting the success metric seems easy. For example, our hypothesis might be:

The value proposition “Faster download speeds for your BitTorrent client” (Version A) will generate more sign ups than the value proposition “Conceal your IP address when downloading Game of Thrones with BitTorrent.” (Version B)

(Not the greatest hypothesis, but let’s roll with it for now.)

The metric for measuring this hypothesis will be the % of unique visitors that sign up on version A vs. version B. If the conversion rate for B is 5% and the conversion rate for A is 25%, the hypothesis is considered validated. Victory!
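For the curious, here is a minimal Python sketch of that comparison. The visitor counts are hypothetical (the example above only gives rates, not sample sizes), and the pooled two-proportion z-test is just one standard way to check the difference, not something prescribed by lean startup.

```python
import math

def two_proportion_z(signups_a, visitors_a, signups_b, visitors_b):
    """One-sided pooled two-proportion z-test: does A convert better than B?"""
    rate_a = signups_a / visitors_a
    rate_b = signups_b / visitors_b
    # Pooled rate under the assumption that A and B convert equally well.
    pooled = (signups_a + signups_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 200 unique visitors per version,
# 50 sign ups on A (25%) and 10 sign ups on B (5%).
z, p_value = two_proportion_z(50, 200, 10, 200)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.2g}")
```

With numbers this lopsided the p-value is tiny. Shrink the sample to a couple dozen visitors per version and the same 25% vs. 5% split becomes far less convincing.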

Now let’s look at some situations where it’s a bit harder to declare a clear victory.

Flying Penguins

If our hypothesis is, “some penguins can fly” we can very easily set a success metric that would prove this hypothesis. If we see at least one flying penguin (outside of the cinema), then clearly some penguins can fly.

So we go look at 10 penguins in the zoo and…they can’t fly.

But maybe these are the wrong kind of penguins. We can go to a different city, go to another zoo, and look at another 20 penguins.

They still can’t fly.

Maybe it’s only penguins in zoos that can’t fly. So we get a boat, go to Antarctica, and look at 1000 penguins in the wild.

They still can’t fly.

But maybe they just don’t like to fly while people are watching! Clearly they wouldn’t have wings if they couldn’t fly, so we probably just haven’t found the flying ones….yet.

The Slippery Slope of Failure

This is quite a common problem with startups:

Maybe these customers didn’t want to buy our product, but I’m sure if we keep looking we’ll find the ones that will.

If we define our success metric at 20%, when the conversion comes in at 19%….it’s close enough.

When the conversion is 15%….there’s room for improvement.

When it’s 10%…clearly we need to spend more time optimizing.

When it’s 5%…well….some people are still interested!

When it’s 1%…maybe we’re not explaining it well enough.

When it’s 0%…did we forget to install analytics?

It’s almost impossible to accept failure. There’s always a potential rationalization. After all….we just haven’t succeeded…yet.

Just like the penguins.


Success Metrics make for bad science.

The Scientific Method!


This general problem is well known to science…we can never prove a hypothesis. We can only fail to disprove it.

We can try over and over to disprove our hypothesis, until we have tried so often that we give up and accept the truth of the matter.

That’s why we have the Theory of General Relativity instead of the Fact of General Relativity. Although the Theory of General Relativity allows us to launch rockets into space and lets our phones geolocate us, it’s still just a theory. Eventually, we may find some situation where the theory breaks down and won’t explain all the facts (e.g. quantum physics). Then we’ll have to come up with a new theory.

So, instead of trying to prove a hypothesis with a success metric, we should try to disprove the hypothesis with a Fail Condition.

Setting the Fail Condition


How many penguins do we need to observe before we are convinced that penguins can’t fly?

10? 50? 1000? The more penguins we look at, the higher our level of confidence in our conclusion that penguins can or cannot fly.
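To put rough numbers on that, here is a small Python sketch. It assumes the penguins we look at are a random sample, and it uses the exact one-sided binomial bound for the case where we see zero fliers. The function name and the 95% confidence level are illustrative choices on my part.

```python
def max_plausible_share(n_observed, confidence=0.95):
    """Upper bound on the true share of fliers when 0 of n_observed can fly.

    If the true share is p, the chance of seeing no fliers in n independent
    observations is (1 - p) ** n. Setting that chance equal to
    1 - confidence and solving for p gives the bound.
    """
    if n_observed <= 0:
        raise ValueError("need at least one observation")
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n_observed)

for n in (10, 50, 1000):
    bound = max_plausible_share(n)
    print(f"{n:>5} penguins, none flying: true share is plausibly below {bound:.1%}")
```

The bound never reaches zero, which is the penguin problem in a nutshell, but it shrinks quickly as the sample grows.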

Science has clear conventions for what counts as an acceptable level of confidence (five sigma in particle physics, for example), but we don’t have that luxury in entrepreneurship or in lean startup.

Fortunately, we don’t need it. We don’t need to prove to everyone that penguins can’t fly. We just need to prove it to ourselves. Because ultimately, our goal is to build a business.

If our business were to sell penguins cool flight goggles, we would need to know that high wind speed while flying is a serious problem for most penguins. If most penguins can’t fly, this is probably not a good business.

So what % of penguins need to be able to fly for this business to be worth investing our time in? 50%? 30%? 1%?

Focusing on just Adélie penguins, if we need a market of 2 million penguins to make this a profitable business and there are about 3.75 million Adélie penguins, then we need a little over 50% (2 / 3.75 ≈ 53%) of any penguins we survey to be able to fly to make this business work. So how many do we need to look at?

If we look at 10 and NONE can fly, then even with the wide uncertainty of such a small sample (zero fliers out of 10 is still consistent with a true rate of up to roughly 30%)…this is a bad business.
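Here is a rough Python sketch of how a Fail Condition like this could be written down before going penguin watching. The market figures come from the example above. The Wilson score interval, the 95% level (z = 1.96), and the function names are my own assumptions, one reasonable choice among several.

```python
import math

MARKET_NEEDED = 2_000_000      # penguins needed for a profitable business
ADELIE_POPULATION = 3_750_000  # approximate number of Adélie penguins

def wilson_upper_bound(successes, n, z=1.96):
    """Upper end of the Wilson score interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center + half_width

def fail_condition_met(fliers, observed, required_share):
    """True when even the optimistic end of our estimate is below what we need."""
    return wilson_upper_bound(fliers, observed) < required_share

required_share = MARKET_NEEDED / ADELIE_POPULATION
print(f"required share of flying penguins: {required_share:.1%}")
print(f"optimistic estimate after 0 of 10: {wilson_upper_bound(0, 10):.1%}")
print(f"fail condition met: {fail_condition_met(0, 10, required_share)}")
```

With zero fliers in 10 the fail condition is met. Had 6 of 10 been able to fly, it would not be, and the experiment would simply have failed to kill the idea.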

Semantics

This is more than just semantics.

Of course, a very smart and practiced individual might be able to set a success metric and be very rigorous when applying it.

19%? Nope….we set a Success Metric of 20%, let’s scrap this business.

Those are words no entrepreneur will ever utter.

As entrepreneurs, we are biased towards our vision, towards optimizing, towards self-delusion. The purpose of lean startup is to guard against this sort of cognitive bias.

So stop trying to validate your ideas, invalidate them instead!

Key Takeaways



Discussion (4 comments)

  1. Pingback: Lean Startup Interns Needed! by @TriKro

  2. Sean Murphy says:
    02.07.2015.

    Roger Cauvin has blogged about “You can never validate a hypothesis, only fail to invalidate it” a couple of times, here are two:

    http://blog.cauvin.org/2014/02/stop-validating-and-start-falsifying.html

    http://blog.cauvin.org/2013/10/lean-startup-concepts.html in the comments he elaborates:
    Questions for product managers who believe they already sufficiently leverage these concepts:

    1. How many purposeful experiments have you run in the past month?
    2. What falsifiable predictions did you document before running those experiments?
    3. Did you work with your team to instrument your product or website to collect the needed metrics?
    4. What did you learn from your experiments?
    5. How did you modify your business model or tactics as a result of what you learned?

    1. Tristan says:
      07.07.2015.

      I really like the first one. I think Karl Popper did us all a favor in his writings.

      Those are really good diagnostic questions that we should all ask of ourselves.

      Kenny and I look at our “Done” column in trello where we tag all cards with “Experiment” or “Task”. If the end of the week shows no experiments done, we get unhappy with ourselves.

  3. Edouard says:
    10.07.2015.

    Thank you Karl, Tristan and Steven

    I just got a little wiser today.

  4. Andy Cars says:
    15.07.2017.

    Thank you Tristan for a great post.
