As corporate innovation becomes trendier, businesses are keen to put clear KPIs in place to measure the effectiveness of innovation teams and hold them accountable for making progress.
CEOs yearn for a nice, clean number that keeps rising and tells them that the innovation program is succeeding. VPs of innovation are also desperate to prove the effectiveness of their programs so that they can request more funding.
While most innovation programs have successfully argued against using traditional metrics like ROI on an early-stage product, there is a rush to replace those KPIs with more actionable metrics such as experiment velocity.
However, we have to use metrics in the right context for them to be useful. Our friend Dan Toma, author of The Corporate Startup, recently proposed that experiment velocity, one of our favorite measurements of innovation, was a bad metric and prone to gaming.
Dan is correct. Experiment velocity is a vanity metric and, therefore, gamable. But how do we measure progress without it?
Experiment velocity is the number of generative or evaluative methods being run over a certain time period with the intention of generating knowledge. Put simply, it’s how much stuff the team is doing while trying to learn something about their business model.
It is not a measure of how much the team actually accomplishes or whether the business model is viable. I often use the metric with early-stage innovation teams to understand if they are ready to focus on other metrics.
(We’ll come back to how to use it in detail in another post.)
Gaming the System
“…purposefully or not. Product teams knowing that they are having their ‘experiment velocity’ measured might claim that every tiny thing they do is an experiment. Or, giving them the benefit of the doubt, they don’t know how to design the right experiments so although their velocity is high, their impact is low.” – Dan Toma, Experiment Velocity vs. Learning Velocity
Teams can easily run dozens of small experiments with little or no outcome in terms of knowledge generated. If team Alpha reports six experiments run in the last week and team Beta reports only one, that says nothing about which team is doing better. All experiments are not created equal.
For example, we can run 20 comprehension tests in a week iterating on a value proposition. But one good concierge test might generate one critical piece of information about which features to build.
Dan suggests that focussing on learning velocity can be more productive. Truthfully, that metric is just as gamable.
The number of learnings itself is not important. It’s whether those learnings validate or invalidate a critical element of the business model.
Some teams come back from one experiment with a list of 20 learnings, many of which are utterly useless. Sometimes learning that the customer enjoys the color blue used in our logo is useful, sometimes it’s irrelevant. (Usually it’s irrelevant.)
How can we compare team Charlie that learns 20 minor ways to tweak value propositions that don’t work vs. team Delta that simply learns their value proposition doesn’t work at all? Often the most important thing is learning what doesn’t work.
It’s far too easy to come back from a customer discovery interview and label all of the different notes as individual learnings and report back a raft of 15 brand new nuggets of knowledge.
When it comes down to it, if we’re incentivized based on any metric, we will game it. If our holiday bonus depends on it, “Customers like free coffee” is valuable learning.
Combining Gamed Metrics
If experiment velocity and learning velocity are both gamable, then the experiment learning ratio is too. If even one of those metrics is gamable, then of course the combined metric will be gamable.
We can’t complain about one metric being fallible and then combine two vanity metrics to somehow create an actionable metric. We need to fix the problem.
Alternatively, we could measure progress by applying the story point system of agile to experiments to give them proportional weight. A comprehension test might be just 1 point, while a value proposition experiment (or learning) can be valued at 3 points. In this system, the points would be assigned by team members using planning poker, T-shirt sizing, or some other collaborative estimation system.
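As a rough illustration of the weighting idea, here is a minimal sketch. The point values for the comprehension test and the value proposition experiment come from the example above; the dictionary keys and the `weighted_velocity` function name are hypothetical, and in practice the team would assign the points through its own estimation process.

```python
# Hypothetical point table. The 1 and 3 come from the example in the text;
# a real team would agree on these values via planning poker or similar.
POINTS = {
    "comprehension_test": 1,
    "value_proposition_test": 3,
}


def weighted_velocity(experiments_run, points=POINTS):
    """Sum the agreed point value of each experiment run in a period."""
    return sum(points[kind] for kind in experiments_run)


# Twenty quick comprehension tests versus one value proposition test.
many_small = weighted_velocity(["comprehension_test"] * 20)
one_big = weighted_velocity(["value_proposition_test"])
print(many_small, one_big)
```

With these example weights, twenty comprehension tests still add up to more points than a single value proposition test, so weighting alone does not stop a team from chasing volume.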
[Image: Agile planning cards for estimation]
We don’t have personal experience with this, but we see some flaws in this method as well.
The only reason to spend time doing this is if we’ve become obsessed with measuring teams against one another. It’s a way of saying that our learning is great and that other team is learning useless things.
A value proposition test on an incremental innovation is very different from the same test run on a radical innovation. How should we compare the two?
Perhaps it’s possible to generate an elaborate system of evaluating risk and comparing learnings, but what is the benefit?
The purpose of any of these proposed innovation metrics is not to set someone’s bonus for the year and certainly not to compare one team to another. The purpose is to provide a warning sign when teams are going off the rails and get them help.
Necessary But Insufficient
To make progress, we need to focus on learning, not on running experiments. But in order to learn, we have to run experiments and research.
Running experiments does not guarantee knowledge, but it is impossible to generate knowledge without them. Crystal balls and product manager “remote viewing” clairvoyants do not count.
Dan points out that we can run a lot of experiments and not learn anything. But we can’t learn anything without running at least a few.
So if we’re trying to understand if the team is functioning well, we don’t need to know how many experiments they are running…so long as that number is at least one.
That’s why we try to run at least one experiment per week.
Some growth teams doing rapid A/B testing can run a dozen experiments in a week; some can only run a couple. But all teams must run at least one.
Measure, But Don’t Count
Running three experiments or thirty doesn’t prove that everything is going well, but zero experiments indicates something is wrong.
Same goes with learning velocity. A team reporting 20 learnings doesn’t indicate everything is going right, but reporting zero learnings indicates something is wrong.
These metrics can be used as a threshold to tell us when things are going well or going poorly. A simple traffic light system can be helpful:
- Green – At least one experiment and learning each week.
- Yellow – At least one experiment, but no learnings. The team is doing something, but it’s not generating knowledge. Something needs to be fixed.
- Red – No experiments. The team isn’t even getting itself out the door.
(Note: This is not the complete system we use at Kromatic when coaching teams, but it’s a good first step.)
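The traffic light thresholds above can be sketched in a few lines. This is a minimal sketch, not the system we use in coaching: the `Light` enum, the `triage` function, and the sample team reports are all hypothetical names and numbers for illustration. It also assumes that a week with zero experiments is red no matter how many learnings are reported, which the list above does not spell out.

```python
from enum import Enum


class Light(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def triage(experiments: int, learnings: int) -> Light:
    """Classify one team's week with thresholds, ignoring totals."""
    if experiments < 1:
        # Assumption: no experiments means red, whatever learnings are claimed.
        return Light.RED
    if learnings < 1:
        return Light.YELLOW
    return Light.GREEN


# Hypothetical weekly reports: (experiments run, learnings reported).
weekly_reports = {
    "Alpha": (6, 4),
    "Beta": (1, 1),
    "Charlie": (3, 0),
    "Delta": (0, 0),
}

for team, (experiments, learnings) in weekly_reports.items():
    print(team, triage(experiments, learnings).value)
```

Alpha and Beta both come out green even though Alpha ran six times as many experiments. The function only checks whether each number has crossed the threshold, so there is nothing to gain from inflating the count.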
Moreover, if we’re not forcing teams to compete on the total number of experiments run, we don’t have to worry as much about this metric being gamed. We just need to make sure that this metric is being used to help teams and not as a means to set bonuses or punish teams.
Traffic Light Triage
Measuring progress can be tricky.
Red Light: When dealing with a small number of teams, we don’t need a fancy accounting system. We just need to focus first on the teams that aren’t getting out of the building.
Those teams need a swift kick in the ass. And we can only help the teams that actually want a kick in the ass.
We can help those teams by showing them the right experiment to run and helping them run the experiment right.
Then and only then can we start making real progress on the business model. This is something that Dan, Esther, and Tendayi call Validation Velocity in The Corporate Startup. This is not only generating knowledge, but generating knowledge about the right things.
“…The real question is when it comes to innovation how do you measure progress. This is where Validation Velocity comes in. This is based on businesses having an innovation framework that has key components (e.g. customer need, solution, business model etc). Ultimately, every innovation project has to answer the same broad questions: Is there a real customer need? Do we have the right solution? Have we found the right price point? Do we have the right channels? Does our growth engine work? If you organize these questions in a hierarchy of importance then all that matters is whether teams are running experiments that provide definitive answers to these questions.” – Tendayi Viki
The metrics we’re discussing here shouldn’t be mistaken for progress on a business model. They are metrics for coaches.
They are useful as a mental framework to focus our efforts on helping teams or in a high-level dashboard to see overall progress. But ultimately they don’t matter to the individual teams, and most innovation programs don’t have enough teams to worry about creating a dashboard. Just go talk to the teams!
So if you’re an innovation team, ignore this stuff. Just go learn about your business. If you’re a coach or innovation portfolio manager, use these concepts to make sure the team is making progress, and remember the rudder fallacy.
- Experiment velocity is a vanity metric and is gamable.
- So are most other metrics.
- Use a threshold on such metrics and measure, but don’t count.
- Don’t mistake coaching metrics for business progress.