Self-Fulfillment: How our actions make our analysis look right, by Mark Chussil

Data mining is the unleashing of massive computer power on massive consumer databases to look for patterns that managers can use to build sales and spend more efficiently. The work takes place with silicon, copper, and iron oxide, but it depends first on carbon: on how we humans think. If we believe, for instance, that demographics drive purchases, then we program our computers to look for correlations between demographic characteristics and purchase behavior.

Given enough data and computing, we will find correlations. Some of the correlations will be real in the sense that they are not random. Others will be spurious in the sense that they are random; they slip in because we will get false positives if we test enough potential correlations. But the important thing is that the computer looked where we (the human programmers) told it to look; in this case, for demographic connections.
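To see how easily false positives slip in, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the 1,000 consumers, the 200 made-up traits, the 30% purchase rate, and the simple two-proportion z-test. Purchases are coin flips that no trait influences, so any "significant" correlation it reports is a false positive.

```python
import math
import random


def two_proportion_z(buy_a, n_a, buy_b, n_b):
    """z statistic for the difference between two purchase rates."""
    if n_a == 0 or n_b == 0:
        return 0.0
    p_pool = (buy_a + buy_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (buy_a / n_a - buy_b / n_b) / se


def count_false_positives(n_consumers=1000, n_traits=200, seed=0):
    """Count traits that look 'significant' although none affects purchases."""
    rng = random.Random(seed)
    # Purchases are pure chance: no trait has any influence on them.
    bought = [rng.random() < 0.3 for _ in range(n_consumers)]
    hits = 0
    for _ in range(n_traits):
        # Each hypothetical trait is an independent coin flip per consumer.
        has_trait = [rng.random() < 0.5 for _ in range(n_consumers)]
        n_a = sum(has_trait)
        n_b = n_consumers - n_a
        buy_a = sum(1 for t, b in zip(has_trait, bought) if t and b)
        buy_b = sum(1 for t, b in zip(has_trait, bought) if not t and b)
        # |z| > 1.96 corresponds to the usual two-sided 5% threshold.
        if abs(two_proportion_z(buy_a, n_a, buy_b, n_b)) > 1.96:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_false_positives(), "of 200 random traits look significant")
```

With a 5% threshold we should expect roughly 5% of the 200 traits, about ten, to look meaningful even though none is. The exact count depends on the random seed.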

And then self-fulfillment happens. Say the data mining reveals that people with characteristic X tend to buy more of a product than do people with characteristic Y. So we market more to those with characteristic X and less to those with characteristic Y, and guess what? Sales go up among Xs and down among Ys. It looks like our analysis worked, but what happened is that our actions made it work: a finding (possibly random) triggered positive action, the action stimulated happy results, and the happy results confirmed the finding. Should we credit the finding or the clear, focused action? The issue isn’t data mining per se. We could make the same points about self-fulfilling managerial truisms such as “invest in your fast-growing products.”
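The arithmetic of self-fulfillment can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real market: the function name, the 10% base purchase rate, the linear lift per marketing dollar, and the spending figures are all hypothetical. The key assumption is that X and Y respond to marketing identically, so the original finding about X is, in this model, worthless.

```python
def expected_sales(group_size, spend_per_person,
                   base_rate=0.10, lift_per_dollar=0.02):
    """Expected buyers in a group; every group shares the same response."""
    rate = min(1.0, base_rate + lift_per_dollar * spend_per_person)
    return group_size * rate


# Before the "finding": both groups get the same marketing spend.
before = {group: expected_sales(1000, 2.0) for group in ("X", "Y")}

# After the "finding": budget shifts toward X and away from Y.
after = {
    "X": expected_sales(1000, 3.5),
    "Y": expected_sales(1000, 0.5),
}

if __name__ == "__main__":
    for group in ("X", "Y"):
        print(group, "before:", round(before[group]),
              "after:", round(after[group]))
```

Sales rise among Xs and fall among Ys even though the two groups are identical by construction. The spending produced the gap; the results alone cannot tell us whether the finding was right.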

That an analysis may be self-fulfilling or have its genesis in beliefs doesn’t make the analysis wrong. On the other hand, the happy results don’t mean the analysis was right.

Update. Recommended: Jason Zweig’s article “Data Mining Isn’t a Good Bet For Stock-Market Predictions,” in the Wall Street Journal, August 8, 2009.
