Marvelous Techniques

Posted:
Saturday, January 17th, 2009
Category:
Numbers I have loved

Marvelous Techniques: Streaks, slumps, and improving your odds

By Mark Chussil

You can read and download a revised version of this essay.

Praise be to him, or her, who groweth revenue, and expandeth profits, and raiseth up the stock price after a slump, for she or he shall be exalted as the One who delivereth the business model of the future, and he or she shall appear on the evening news, and shall writeth a book, and shall receiveth a share.

Woe be to him, or her, who declineth after a streak, and presideth over a fall, and misseth the targets, for she or he shall be reviled, and ridiculed, and cast out, and subjected to a harsh but blessedly temporary media stoning, and caused to serveth as a symbol to frighten future generations from the ways of the false profits.

And the praise and woe may or mayeth not help your business.

Imagine that you’ve got 10 ladies and gentlemen hard at work, each of them saying yes or no on some decision. Perhaps they’re assessing applications for credit; perhaps they’re connecting job seekers with job givers; perhaps they’re deciding which shipping service to use for shipments; perhaps they’re recommending products for customers. Imagine further that each of them makes 4 such decisions each work day. That adds up to 10,000 decisions per work year. Imagine that they average 60% accuracy: that is, they say yes 60% of the time that they should say yes, and they say no 60% of the time that they should say no. You know that you’ll get 6,000 correct decisions each year, and 4,000 mistakes.

You have two questions. First, a marvelous technique has been offered to you that can raise their accuracy to 80%, and because it’s costly you wonder if it’s worth it. Second, when you test the marvelous technique you notice that Aaron and Abigail are on a streak while Barry and Bonnie are in a slump, and you want to know how (and whether) to respond.

Is a marvelous technique worth it?
We know pretty well how to assess the financial impact of a product or service that purports to reduce costs. You take the cost savings per use, multiply by the number of uses, and compare the result to the cost of the product or service. If the savings exceed the cost of the product or service, it makes economic sense. There’s a little more to it, such as the time value of money and opportunity costs, but the basic idea is savings versus cost.

What about products or services that purport to improve the quality of our decisions?

People ask my colleagues and me about the value of business war games, strategy simulations, crisis simulations, competitive intelligence, market research, win/loss analysis, advertising, and more. Many want to know how confident they can be that one of those marvelous techniques will improve their bottom line, market share, quality of decision-making, crisis preparedness, or whatever they care about.

Confidence is within reach; certainty is not. Even if a marvelous technique helped every time it had been applied in the past, even if ecstatic executives gushed with testimonials to its powers, even if your situation seems a perfect DNA match for its capabilities, that would not prove you will benefit from it. As we learned from Nassim Nicholas Taleb, the abundance of white swans does not prove the non-existence of black swans.

The important (and answerable) question is whether the likely benefit of a marvelous technique will outweigh its costs. “Likely benefit” is how much better (if at all) you will probably perform with a marvelous technique than you will probably perform without it. The costs include relevant outlays for training, software, hardware, retooling, and so on. Conceptually it’s not so different from looking at the financial impact of a cost-reducing product or service.

We may consider four conceptual outcomes to a decision, each of which has financial consequences:

1. You said yes and you should have said yes. In other words, yes is the right answer, and you got it right.
2. You said no and you should have said no. No is the right answer, and you got it right.
3. You said yes and you should have said no. You did something that you should not have done.
4. You said no and you should have said yes. You did not do something that you should have done.

Got-it-right decisions 1 and 2 will generally have better consequences than got-it-wrong decisions 3 and 4. That doesn’t necessarily mean that it’s worthwhile to spend money reducing got-it-wrong decisions. If outcome 3 is infrequent or if it isn’t a whole lot worse than outcome 2, and if it’s expensive to tell the difference between type-2 and type-3 situations, then it may be cheaper overall just to say “yes” all the time. For instance, a full-service store that takes product returns may prefer to make a few mistakes (taking back a product that they didn’t sell or that was shoplifted) rather than annoy legitimate customers with rigorous checking.

Let’s put some numbers on the problem. There’s no magic in these numbers; we’ll use them merely to illustrate the process and analysis.

The payoff of correctly deciding yes is $15 (you do something and it pays off) and of correctly deciding no is $0 (you correctly do nothing).
The cost of incorrectly deciding yes is $7 (you do something and it hurts) and of incorrectly deciding no is $2 (you didn’t do something and it hurts).
The right answer is “yes” 50% of the time.
The odds of making the right decision with your current process are 60%.
The odds of making the right decision with the marvelous technique are 80%.
The one-time cost of the marvelous technique is $500 (for buying a software license or paying for a research project or some such thing), and…
…it costs $0.50 to use on each decision (due to fees or longer time to run the new system).

With those made-up numbers, what works?

Always yes. If you always decided yes, you would average 50% x $15 + 50% x $-7 over the long term, which is $4.00. It wouldn’t be that exactly for the same reason that flipping a coin n times is extremely unlikely to be precisely 50% heads and 50% tails, but it will be close.

Always no. It matters on which decisions you’re right. If you’re only right on “yes” decisions (which you’d do by choosing yes every time), you’d get that $4.00 on average. If, however, you’re only right on “no” decisions (by choosing no every time), you’d lose money. (That’s from 50% x $0 + 50% x $-2.) So, with the numbers in this illustration, an always-yes decision strategy beats an always-no decision strategy.
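For readers who like to check arithmetic by machine, here is a minimal Python sketch of the two simple strategies. The constant and function names are mine, invented for illustration; the dollar figures are the made-up numbers listed above.

```python
# Illustrative numbers from the essay, in dollars per decision.
# All names here are hypothetical; only the values come from the text.
PAYOFF_RIGHT_YES = 15.0   # said yes, and yes was right
PAYOFF_RIGHT_NO = 0.0     # said no, and no was right
COST_WRONG_YES = -7.0     # said yes, but no was right
COST_WRONG_NO = -2.0      # said no, but yes was right
P_YES_IS_RIGHT = 0.5      # share of decisions where yes is the right answer


def always_yes_value() -> float:
    """Long-run average per decision when every decision is yes."""
    return (P_YES_IS_RIGHT * PAYOFF_RIGHT_YES
            + (1 - P_YES_IS_RIGHT) * COST_WRONG_YES)


def always_no_value() -> float:
    """Long-run average per decision when every decision is no."""
    return (P_YES_IS_RIGHT * COST_WRONG_NO
            + (1 - P_YES_IS_RIGHT) * PAYOFF_RIGHT_NO)


print(f"always yes: ${always_yes_value():.2f}")
print(f"always no:  ${always_no_value():.2f}")
```

Changing the payoffs or the share of yes-is-right decisions can flip which simple strategy wins, which is why the inputs deserve as much thought as the formula.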

The current process. Your team is right 60% of the time with your current process. Over the long term, that means that when yes is right, they’d decide yes 60% of the time and no 40% of the time, and when no is right, they’d decide no 60% of the time and yes 40% of the time. Since yes is right 50% of the time and no is right 50% of the time, your long-term expectations would be:
50% x (60% x $15 + 40% x $-2) + 50% x (60% x $0 + 40% x $-7)
which is $2.70. Now that you think about it, you’re not so sure that the team is adding value. Saying “yes” every time averages $4.00, and your team, loyal and dedicated as they are, averages $2.70. Hmmm. (This is why you’re looking for a marvelous technique.)

Perfection. If you always decided correctly, you would average 50% x $15 + 50% x $0 over the long term, which is $7.50. Always-yes is the best simple strategy available (given the payoffs and costs above), yet its $4.00 performs way below the optimal strategy’s $7.50. That means it may be valuable indeed to find out how to make better decisions. Will the marvelous technique do the trick? Will it at least beat always-yes?

The marvelous technique. Using the marvelous technique to move your team to 80% accuracy gives us this calculation:
50% x (80% x $15 + 20% x $-2) + 50% x (80% x $0 + 20% x $-7)
which leads to $5.10. You don’t get to keep it all, though, because the marvelous technique costs $0.50 each time you use it, so you get $4.60 per decision. Then there’s that $500 up-front cost. The improvement over 60% is $4.60 – $2.70, or $1.90, so you can expect to break even on the up-front cost on the 264th improved decision. The improvement over always-yes is $4.60 – $4.00, or $0.60, which pays off on the 834th improved decision.
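The same arithmetic can be written as a small Python sketch, which makes it easy to try other accuracies and costs. The function names (expected_value, break_even_decision) are hypothetical helpers of mine, not part of any tool mentioned in this essay; the numbers are the illustrative ones above.

```python
import math

# Illustrative numbers from the essay, in dollars.
PAYOFF_RIGHT_YES, PAYOFF_RIGHT_NO = 15.0, 0.0
COST_WRONG_YES, COST_WRONG_NO = -7.0, -2.0
P_YES_IS_RIGHT = 0.5
ONE_TIME_COST = 500.0
PER_USE_COST = 0.50


def expected_value(accuracy: float, per_use_cost: float = 0.0) -> float:
    """Long-run average payoff per decision at a given accuracy."""
    when_yes_is_right = accuracy * PAYOFF_RIGHT_YES + (1 - accuracy) * COST_WRONG_NO
    when_no_is_right = accuracy * PAYOFF_RIGHT_NO + (1 - accuracy) * COST_WRONG_YES
    return (P_YES_IS_RIGHT * when_yes_is_right
            + (1 - P_YES_IS_RIGHT) * when_no_is_right
            - per_use_cost)


def break_even_decision(improvement_per_decision: float) -> int:
    """First decision at which the cumulative improvement covers the one-time cost."""
    return math.ceil(ONE_TIME_COST / improvement_per_decision)


current = expected_value(0.60)
marvelous = expected_value(0.80, PER_USE_COST)
perfect = expected_value(1.00)
always_yes = (P_YES_IS_RIGHT * PAYOFF_RIGHT_YES
              + (1 - P_YES_IS_RIGHT) * COST_WRONG_YES)

print(f"current process:     ${current:.2f}")
print(f"marvelous technique: ${marvelous:.2f}")
print(f"perfection:          ${perfect:.2f}")
print("break-even vs. current process:", break_even_decision(marvelous - current))
print("break-even vs. always-yes:     ", break_even_decision(marvelous - always_yes))
```

The break-even count rounds up because the one-time cost is not fully recovered until the decision after the fractional point.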

Uh oh. Until you find something better, the marvelous technique looks like a good deal. You give it to four randomly selected team members, Aaron, Abigail, Barry, and Bonnie. You match them with randomly selected teammates who, as a control group, will continue to use the current process. Aaron and Abigail immediately strike gold. Barry and Bonnie immediately strike dirt. What’s going on? Is the marvelous technique not so marvelous? Does it not work for everyone? Are Barry and Bonnie sabotaging the change? Have Aaron and Abigail found a way to make it even better?

Streaks and slumps
To answer those questions, and to illustrate the answers, we’d find it helpful to have a decision simulator that can help us explore the costs and likely benefits of the current process and marvelous technique. Fortunately, it’s possible to build such a simulator. So I built one, basically a longitudinal Monte Carlo program. It uses the numbers and percentages above, combined with a random-number generator to simulate correctness in the appropriate proportions. We’ll use it here, doing something like the equivalent of clinical trials for a business decision.
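The simulator itself isn't reproduced here, but a minimal sketch of the idea might look like the Python below. It is a stand-in under stated assumptions, not the program used for the charts: the function name simulate, its parameters, and the seeds are all hypothetical, and it leaves the $500 one-time cost out of the per-decision averages, as the charts appear to.

```python
import random

# Illustrative payoffs from the essay, keyed by (decision, right answer).
PAYOFF = {
    ("yes", "yes"): 15.0,
    ("no", "no"): 0.0,
    ("yes", "no"): -7.0,
    ("no", "yes"): -2.0,
}


def simulate(n_decisions, accuracy, per_use_cost=0.0, p_yes_is_right=0.5, seed=None):
    """Return the cumulative average payoff after each of n_decisions."""
    rng = random.Random(seed)
    total = 0.0
    averages = []
    for i in range(1, n_decisions + 1):
        # Randomly draw the opportunity: is yes or no the right answer?
        right_answer = "yes" if rng.random() < p_yes_is_right else "no"
        # Randomly draw whether the decision-maker gets it right.
        if rng.random() < accuracy:
            decision = right_answer
        else:
            decision = "no" if right_answer == "yes" else "yes"
        total += PAYOFF[(decision, right_answer)] - per_use_cost
        averages.append(total / i)
    return averages


# Arbitrary seeds; a different seed gives a different streak-and-slump story.
marvelous = simulate(10_000, accuracy=0.80, per_use_cost=0.50, seed=1)
current = simulate(10_000, accuracy=0.60, seed=2)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6,} decisions: "
          f"marvelous {marvelous[n - 1]:6.2f}, current {current[n - 1]:6.2f}")
```

Rerunning with different seeds is the quickest way to see how much the early averages can wander while the long-run averages stay put.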

Here are the initial results from Aaron and Abigail (the A team) and their control group. The chart covers their first 10 decisions.

[Chart: First 10 decisions, the A team]

Incidentally, there are slightly over a million possibilities for each of the lines (four possible outcomes for each of 10 decisions, or 4 to the 10th power, which is 1,048,576). Some of those possibilities are much more likely than others.

The blue line is for the marvelous technique (the A team) and the red line is for the current process (their control group). The vertical axis is cumulative average performance. Here’s how we calculate it, using the blue (marvelous technique) line for illustration.

In decision 1, the A team decided yes and the right answer was yes. They got $15, minus the $0.50 cost of using the marvelous technique, for $14.50.
In decision 2, the A team decided yes and the right answer was yes again. They got another $14.50. Their cumulative total is $29.00, for a cumulative average of $14.50.
In decision 3, the A’s decided no and the right answer was no. They got $0 minus $0.50. Their cumulative total becomes $28.50, for a cumulative average of $9.50. Notice that the line went down even though they maintained their perfect decision-making record.
The only mistake made by the A team in the first 10 decisions is in number 8, where they incorrectly decided no. Their 90% hit rate is better than we’d expect, given the promised accuracy of 80% for the marvelous technique.
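The cumulative-average bookkeeping for those first three decisions can be sketched in a few lines of Python. The variable names are mine; the payoffs are the ones just described.

```python
from itertools import accumulate

PER_USE_COST = 0.50
# Gross payoffs for the A team's decisions 1 to 3, from the walkthrough above:
# a correct yes, another correct yes, then a correct no.
gross_payoffs = [15.0, 15.0, 0.0]

net_payoffs = [p - PER_USE_COST for p in gross_payoffs]
running_totals = list(accumulate(net_payoffs))
cumulative_averages = [total / n for n, total in enumerate(running_totals, start=1)]

for n, (total, avg) in enumerate(zip(running_totals, cumulative_averages), start=1):
    print(f"decision {n}: total ${total:.2f}, average ${avg:.2f}")
```

The average falls at decision 3 only because a correct no pays $0 (less the $0.50 fee), not because anything went wrong.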

Their overall average, almost $10 for the first 10 decisions, is far above the predicted $4.60. That’s due not only to their better-than-expected 90% hit rate. It’s due also to 7 of their 9 correct decisions being ones where the right answer was yes, which pays $15 for a correct decision, as opposed to those where the right answer would be no, which pays $0 for a correct decision. Of course, the team doesn’t deserve credit for the attractiveness of the opportunities that were randomly presented to them, but it helps them look like a great team.

So, the A team did especially well due to a little luck on their decisions and quite a bit of luck on the nature of the decisions offered them.

Their control group, the red line, got 60% right, exactly as expected. They too benefited from the advantageous decisions: their mistakes mostly were of the $-2 nature (failing to take advantage of an opportunity) rather than the $-7 kind (investing in a bad deal).

Now let’s look at the first 10 decisions made by Barry and Bonnie and their control group.

[Chart: First 10 decisions, the B team]

A very different picture. The B team was right only 6 times out of 10, and they had fewer high-payoff opportunities. Their control group was right only 3 times. Although they’re below expectations, the 60% and 30% scores are not unreasonable, given the percentages and the small sample of 10 decisions. Combined with the poorer opportunities, though, the results sure look bad compared to the A team.

By the way, the differences between the two control groups suggest that the difference between the A and B teams is not entirely due to the teams’ competence or motivation. Yes, the A team got 90% right and the B team got 60%, but that wouldn’t explain the control-group results.

Incentive feedback
You’re a businessperson and you care about the bottom line. You see Aaron and Abigail’s 90% track record, versus Barry and Bonnie’s 60%; you see the A’s average of $9.80 per decision and the B’s average of $-0.30. You compare them to the targets of 80% right decisions and $4.60 per decision. To whom do you deliver a congratulatory pat on the back and to whom a motivational kick a little lower down?

The evaluation continues. Let’s look now at the first 100 decisions for the A team and its control group. We’re keeping the first 10; that is, the first 10 out of the 100 are identical to the chart above. Those 10 are scrunched to the left.

[Chart: First 100 decisions, the A team]

Are Aaron and Abigail slacking off, and is their complacency infecting their control group? Their average results dropped by a third from the first 10 decisions to the end of 100. Maybe, you wonder, you were too generous with the back-pat. Well, at least they’re all still above the per-decision targets ($4.60 and $2.70).

What’s going on with Barry and Bonnie?

[Chart: First 100 decisions, the B team]

They seem to be paying attention. What a turnaround! That lower-down kick obviously worked. They’re still below Aaron and Abigail, but they sure are improving.

The good news is that the marvelous technique seems to work for both teams, as they clearly and consistently outperform their control groups. It should, of course, since it’s right more often and the $0.50 incremental cost of using it isn’t very much relative to the potential benefit. However, the $500 one-time cost means that the A team’s marvelous technique is still $290 behind its control group’s current process. The B team is only $210 behind its control group, not so much because the B team is so good but because its control group is so not so good.

Time goes on
Let’s go to 1,000 decisions. The first 100 decisions are scrunched to the left, and they are the same as we saw before. Here we see the A team’s marvelous technique (blue) and its control group’s current process (red).

[Chart: First 1,000 decisions, the A team]

Both lines decline gradually from the highs after 100 decisions to numbers approaching our long-term expectations. The lines smooth out, of course, because we’re looking at cumulative averages. The results of any 10 decisions may be just as erratic as they were for the first 10 decisions.

If we were to show the chart for the B team and its control group, you would see that their lines are tilting up slightly.

As the early random noise dampens out with the improved perspective of 1,000 decisions, the marvelous technique pays off consistently. In fact, taking the one-time costs into account, it’s now a net positive of $1,315 for the A team (versus its control group) and $1,464 for the B team (versus its control group).
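As a sanity check on those simulated totals, here is a small Python sketch of the long-run expectation: the $1.90 per-decision improvement times the number of decisions, minus the $500 one-time cost. The function name is hypothetical; the inputs come from the arithmetic earlier in the essay.

```python
ONE_TIME_COST = 500.0
MARVELOUS_PER_DECISION = 4.60   # expected value with the technique, after the $0.50 fee
CURRENT_PER_DECISION = 2.70     # expected value of the current process


def expected_net_benefit(n_decisions: int) -> float:
    """Expected advantage over the current process after n decisions, net of the one-time cost."""
    improvement = MARVELOUS_PER_DECISION - CURRENT_PER_DECISION
    return n_decisions * improvement - ONE_TIME_COST


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6,} decisions: expected net benefit ${expected_net_benefit(n):,.0f}")
```

The expectation works out to $310 behind after 100 decisions and $1,400 ahead after 1,000. The simulated teams landed near those figures ($290 and $210 behind, then $1,315 and $1,464 ahead) without matching them, which is what randomness looks like.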

The end of the test
Finally, let’s see what happens after 10,000 decisions. The first 1,000 are scrunched to the left.

Here, again, is the A team and its control group. Except that this time, if you were to ignore the first 1,000 decisions, you’d find that its chart is almost identical to that of the B team and its control group.

[Chart: First 10,000 decisions, the A team]

There’s still some noise from random gains and losses; that’s why both the blue line (marvelous technique) and the red line (current process) are so thick. Each thick line may contain long series of all-right and all-wrong decisions, which is to be expected.

And due to having a large sample that sorts out the random fluctuations, the A team’s long-term average performance — $4.55 per decision — is awfully close to B’s $4.57. Both essentially match the projected $4.60 and both beat the always-yes strategy, even after taking into account the one-time cost of the marvelous technique. The two control groups are within a dime of the $2.70 expected value. And notice that there’s still room for improvement: as we calculated, perfect decision-making would improve this fictitious business’ results by more than 60%.

Decisions and distortions
We would conclude that the marvelous technique is a good idea. Notice, though, 1) we would have adopted the marvelous technique prematurely and unrealistically if we’d relied only on the first 10 decisions by the A team, and 2) we would have rejected the marvelous technique if we’d relied only on the first 10 decisions by the B team.

Moreover, the differences between the A and B teams were due 100% to random chance. There is no Aaron, Abigail, Barry, or Bonnie inside my computer; there is only a random-number generator, a series of simple equations as described above, and marvelously clever software that pulls it all together.

Rewarding the A’s looked ineffective, and punishing the B’s looked effective, because of how we interpreted the numbers, not because we actually motivated some electrons and pixels to behave well. (Notice, by the way, that if we’d taken a hands-off management style with the B’s, we’d have said that hands-off works. Ditto for rewarding their efforts, sending them for extra training, and so on. And a hands-off management style for the A’s would have looked just as ineffective as rewarding them.) The flaw in our interpretation is that we, as humans are prone to do, mistook correlation for causation.

I emphasize that I am not saying that performance is random. The most critical areas in the process we’ve just seen are those in which management and insight are involved, such as recognizing problems and opportunities, designing solutions, understanding costs, payoffs, and probabilities, ensuring that people are competent with the tools they use, and more. And just as randomness and analysis do not diminish the importance of management and insight, management and insight do not diminish the inevitability of randomness and the value of analysis.

Just how random can things get? Still using the numbers in our simulation, and based on 10,000 decisions, we see:

Ten right decisions in a row seems pretty good, perhaps even streak-like. Well, with the marvelous technique’s 80% right decisions, the odds of getting 10 decisions right in a row are 10.7%. (Calculate it as 0.80 to the 10th power.) With 10,000 decisions, there are 9,991 opportunities (allowing overlaps) for 10-right runs. So, you would expect 1,073 such runs. The A team got 737; the B team had 882.
With the current process’ 60% right decisions, the odds of 10 decisions right are 0.6%, so you’d expect 0.6% x 9,991, or 60 runs of 10 right in a row. The A team’s control group had 37 and the B team’s had 40.
The odds of 10 wrong in a row are much lower because both strategies get most decisions right. For the current process we’d expect 0.01% of the 9,991 possibilities, or 1. We got 0 in both simulations. For the marvelous technique we’d expect a much smaller number, and we got 0.
The always-yes strategy is right half of the time, since half of the time the right answer is yes. To get 10 right in a row, the odds are about 0.1%. (The same for 10 wrong in a row.) With 9,991 opportunities, we’d expect 10 streaks of ten and 10 slumps of ten. The simulations ranged from 8 to 15.
The longest run of right decisions came from the B team’s marvelous technique. It hit 38 in a row. The odds of that happening are 0.02%… which means we’d expect it to happen twice in this simulation. It’s a random streak, not a magic touch.
The longest run of wrong decisions came from the always-yes strategy in the B team’s simulation. Its slump was 16 long. The odds were long — 0.0015% — but with 9,985 opportunities for a 16-wrong slump with 50/50 odds, it happened. The current-process control groups had a maximum slump of 9, and the worst marvelous-technique team got only 5 in a row wrong in its longest slump.
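The expected counts above follow from two small formulas: the chance of a run is the per-decision probability raised to the run length, and the number of overlapping places a run can start is the number of decisions minus the run length, plus one. Here is a minimal Python sketch; the function names are mine, and count_runs shows one way to tally overlapping runs in a sequence of right/wrong results, which I assume (but cannot confirm) is how the simulated counts were tallied.

```python
def windows(n_decisions: int, run_length: int) -> int:
    """Number of overlapping positions where a run of run_length could start."""
    return n_decisions - run_length + 1


def run_probability(p: float, run_length: int) -> float:
    """Chance that run_length independent decisions in a row all come out the same way."""
    return p ** run_length


def expected_runs(p: float, run_length: int, n_decisions: int = 10_000) -> float:
    """Expected number of overlapping runs; expectations add up even though windows overlap."""
    return run_probability(p, run_length) * windows(n_decisions, run_length)


def count_runs(outcomes, run_length: int) -> int:
    """Count overlapping windows of run_length consecutive True values."""
    count = 0
    streak = 0
    for ok in outcomes:
        streak = streak + 1 if ok else 0
        if streak >= run_length:
            count += 1
    return count


cases = [
    ("marvelous technique, 10 right", 0.80, 10),
    ("current process, 10 right", 0.60, 10),
    ("current process, 10 wrong", 0.40, 10),
    ("always-yes, 10 right (or wrong)", 0.50, 10),
    ("marvelous technique, 38 right", 0.80, 38),
    ("always-yes, 16 wrong", 0.50, 16),
]
for label, p, length in cases:
    print(f"{label}: odds {run_probability(p, length):.4%}, "
          f"{windows(10_000, length):,} windows, "
          f"expected {expected_runs(p, length):.2f}")
```

Note that a single streak of 12 right decisions counts as three overlapping 10-right runs, so long streaks inflate the tally.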

Conclusion
The point here is not that you should buy marvelous techniques, and the point is not that you have to wait for at least 1,000 decisions for a net benefit. After all, with different get-it-right percentages, payoffs, and costs we might see that a marvelous technique is a great deal, an awful deal, or much ado about not very much, and it might happen sooner and it might happen later. The point is not even that a single input number always separates the marvelous from the merely splendid or the potentially dismal, because all the factors must be considered together. (Not to mention that we humans are notoriously bad at running models in our heads. See When I Was Wrong.)

As our arithmetic demonstrated at the start of this essay, we don’t need a simulator to figure out the long-term average consequences of current processes and marvelous techniques. I ran many simulations in the course of writing this essay, and they all look similar near the end of the simulation (the thick lines) because they are averages of bigger and bigger samples. However, they are strikingly, startlingly, provocatively different at the start of the simulation… that is, at the time when we in real life would be judging the value of our decisions, and of our decision-makers.

The value of the simulation is in literally seeing (as in the charts above) what could happen, so we don’t mistakenly think that something is working when it isn’t, or vice versa. The value is in practical learning about the effects of randomness. As we saw, self-inflicted misinterpretations can lead us to reward or punish inappropriately and thereby pursue ineffective strategies.

The point of this essay isn’t that marvelous techniques are inherently profitable or that it takes a certain time for them to pay off. The point is the analytic process by which you judge them. The point is knowing what to expect if you shift from one decision-making process to another. The point is preventing a few oddball wins or losses from distorting your assessment. The point is thinking through your values for the input numbers. The point is the difference between the small-sample and large-sample runs. (Not unlike making sure that customer surveys, clinical trials, and political polls have large-enough samples.) The point is that a simple strategy (“always yes”) may perform better despite its mistakes than a more-expensive strategy that gets it right a little more often. The point is thinking rigorously and numerately. The point is how a discipline can help you learn when you are actually improving your odds.

Scott Swigart of Cascade Insights (www.cascadeinsights.com) asked questions and made comments on the simulator that helped and inspired me. Thank you, Scott.

For further reading, see The Drunkard’s Walk by Leonard Mlodinow. [Update: Prof. Mlodinow has a terrific article in The Wall Street Journal, July 3, 2009: The Triumph of the Random.]

If you like the technique of 10 decisions, 100 decisions, 1,000, and 10,000, Google “powers of ten.”

Update. Low-scoring sports games can run into the same problem of randomly looking good or bad as we described above. See Richard Bookstaber’s “The Scoring Problem” on page W2 of The Wall Street Journal, July 10-11, 2010.
