Popper is my homeboy: a manifesto

Will Wilkinson has an amusing series of posts demonstrating increasing frustration with the macroeconomic arguments about the stimulus proposal. In one, he sums up the source of his frustration:

When I see DeLong more or less indiscriminately trashing everyone at Chicago, or Krugman trashing Barro, etc., what doesn’t arise in my mind is a sense that some of these guys really know what they’re talking about while some of them are idiots. What arises in my mind is the strong suspicion that economic theory, as it is practiced and taught at the world’s leading institutions, is so far from consensus on certain fundamental questions that it is basically useless for adjudicating many profoundly important debates about economic policy. One implication of this is that it is wrong to extend to economists who advise policymakers, or become policymakers themselves, the respect we rightly extend to the practitioners of mature sciences. There is a reason extremely smart economists are out there playing reputation games instead of trying to settle the matter by doing better science. The reason is that, on the questions that are provoking intramural trash talk, there is no science.

This is just about perfectly stated.

I would state Will’s implicit working definition of science for the purpose of this discussion as “an intellectual discipline that produces useful, non-obvious and reliable prediction rules”. Or at least, that’s mine, and it’s consistent with Will’s statement. Note that this doesn’t let economists, political scientists or others off the hook by saying they want to “avoid physics envy” or whatever. To say that they are practicing non-science by this definition is to say that their theorizing produces prediction rules that are at least one of: useless, obvious or unreliable.

In fact, you can see debates in mature sciences that sound a lot like the one that Will describes; they just tend to be around frontier issues. Consider the physics of wings for airplanes. There is a reasonably stable body of findings that can be (and has been) translated into engineering practice that works. Airplanes stay up. Giant tubes of metal with comparatively tiny lift surfaces go up in the sky, travel thousands of miles at about the speed of sound and land safely every day (sometimes, with sufficient pilot expertise, on the Hudson River). That’s about as useful, non-obvious and reliable as anything I see around me. This is the ten tons that sits on one side of the scale whenever somebody wants to get into an argument about whether we “really know” this physics. This is what we lack in most parts of economics, and certainly in the kind of economics that is being shouted about in the stimulus debate.

What is the key methodological feature that distinguishes science from the kinds of economic debates that frustrate Will? Experiments. Properly controlled experiments end debates (in addition, of course, to starting new ones).

I spent about the first ten years of my career executing increasingly sophisticated quantitative analyses that used data to try to evaluate and predict the success of business initiatives in order to develop corporate strategies. Eventually, I saw that these analyses led to the same kind of scholastic debates as we see among macroeconomists. The root issue was that it was impossible to find a methodology that could reliably distinguish correlation from causality. Only after exhausting all the plausible alternatives did I conclude that experiments that randomly assign units of analysis (customers, stores, sales territories, etc.) to test and control groups are the only reliable method for determining causality.
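To make the contrast with correlational analysis concrete, here is a minimal sketch in Python. The store names, sales figures and the +5 lift are all invented for illustration; this is just the bare logic of randomized assignment, not anything resembling APT’s actual tooling.

```python
import random
import statistics

# Hypothetical example: randomly assign stores to test and control groups,
# then estimate the causal effect of an initiative as the difference in means.
random.seed(42)

stores = [f"store_{i}" for i in range(200)]
random.shuffle(stores)
test_group = set(stores[:100])      # stores that receive the initiative
control_group = set(stores[100:])   # stores that carry on as usual

# Simulated outcome: weekly sales with a true +5 lift for treated stores.
sales = {
    s: random.gauss(100, 10) + (5 if s in test_group else 0)
    for s in stores
}

test_mean = statistics.mean(sales[s] for s in test_group)
control_mean = statistics.mean(sales[s] for s in control_group)

# Because assignment was random, this difference estimates causation,
# not mere correlation.
print(f"Estimated lift: {test_mean - control_mean:.2f}")
```

Because the two groups differ only by the flip of a coin, any systematic difference in outcomes can be attributed to the initiative itself; that is the entire trick, and it is what no amount of regression on observational data can guarantee.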

Once I figured this out, I became so fixated on it that I started what has now become a pretty good-sized software company, named Applied Predictive Technologies (APT). APT’s tools automate the design and interpretation of experiments for a good chunk of the Global 2000. We have generally tried to stay below the radar, but that’s impossible now that the Harvard Business Review has run an article in the current issue (“How to Design Smart Business Experiments”) that is mostly about what we have done at APT to make experimental learning a reliable business function. So I can come out of the closet on it a little bit.

Once you have this insight about using experiments to determine causality, applying it might seem straightforward, but most relevant business experiments are not trivial to design and interpret. It’s beyond the scope of the HBR article, which is pitched to senior general managers, but there were some fundamental analytical issues that we had to address to make this approach work in practice. It took years of work by scores of the most talented mathematics, software engineering and business analysis professionals in the country, plus field iteration with dozens of the world’s largest corporations running thousands of real experiments, to (partially) solve them.

This approach, once correctly installed, changes how many types of business decisions are made. Some issues are not practically testable: a specific program may be non-replicable, the decision may have to be made faster than a test could be conducted, and so on. But if a program is practically testable and an experiment is cost-justified (i.e., if the expected value of the incremental information exceeds the cost of the test), experimentation dominates all other methods of evaluation and prediction.
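For readers who want that cost-justification condition spelled out, here is a toy back-of-the-envelope version in Python. Every number is hypothetical, and the calculation is deliberately simplified to the expected value of perfect information: the expected rollout loss that a (perfectly informative) test would let you avoid.

```python
# Toy cost-justification check: run the experiment only if the expected
# value of the incremental information exceeds the cost of the test.
# All figures below are invented for illustration.

p_program_fails = 0.4            # prior probability the program loses money
loss_if_bad_rollout = 2_000_000  # cost of rolling out a program that fails
test_cost = 300_000              # cost of designing and running the test

# Simplification: treat the test as perfectly informative, so its value
# is the expected loss it lets us avoid by cancelling bad rollouts.
expected_value_of_information = p_program_fails * loss_if_bad_rollout

if expected_value_of_information > test_cost:
    print("Test is cost-justified: run the experiment.")
else:
    print("Test is not cost-justified: decide by other means.")
```

A real analysis would discount for imperfect tests and weigh sample sizes against precision, but the decision rule has the same shape: information has a price, and it is worth buying only when it is cheaper than the mistakes it prevents.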

This Baconian revolution is coming to economics and social science.

In fact, it’s already happening. Weirdo experimental economists are starting to win the Nobel Prize. The recent Economist magazine round-up of the 10 most promising young economists in the world is rife with experimentalists. Established economists working in the current paradigm, as always, either dismiss it or imagine that it is a niche sub-field that won’t affect them. Time will tell, but I think they’re entirely wrong.

Much of the work that we now think of as economics, political science and other social sciences will likely be displaced by some hybrid of biology, experimental economics, psychology and other fields that can evaluate hypotheses for the quantified prediction of human behavior through structured falsification tests (or, sometimes, true “natural experiments” in which non-intentional random assignment has occurred). As I discussed in a recent post on possible interpretations of a specific clinical trial in Ghana, the big constraint on the practical utility of this science will likely be the problem of generalization from experimental results to forward predictions. Even in its current embryonic form, experimental economics already suffers from excessive rhetorical generalization from what some specific group of college sophomores did with $30 to fairly grand statements about human nature. But, as with business experimentation, where applicable, this new approach will dominate what we now think of as classical economics.

For a long time, at least, this approach will likely not address much of the territory now covered in economics, including, for example, many of the issues related to the stimulus debates. These kinds of topics will of course remain interesting, and work will still be done on them in an academic setting; it will simply be even more obviously non-science, and be done down the hall in the history, philosophy and literature departments. Where it belongs.

Nullius in Verba.

(cross-posted at Atlantic Business)