Science, History and Economics

Imagine that the president is considering his options vis-à-vis the Iranian nuclear program. First, a science advisor comes into the room and predicts that if the Iranians take the following quantity of fissile material and compress it into a sphere of the following size under the following conditions, then it will cause an explosion large enough to destroy a major city. Next, an historian comes into the room and predicts that if external attempts are made to thwart Iranian nuclear ambitions, then a popular uprising will sooner or later ensue in Iran that will change governments until Iran has achieved nuclear capability.

The president would be incredibly irresponsible to begin debating nuclear physics with his science advisor. Conversely, the president would be incredibly irresponsible not to begin a debate with the historian. This would likely include having several historians present different perspectives, querying them on their logic and evidence, combining this with introspection about human motivations, considering prior life experience, consulting with non-historians who might have useful perspectives on this, and so on.

Next, an economist walks into the room. She predicts that if the CIA were to successfully execute a proposed Iranian currency counterfeiting scheme designed to create an additional ten points of inflation in Iran for the next five years, then the change in Iranian employment over the next decade would be X. Is this more like the historian’s prediction or the physicist’s prediction?

Superficially, she might sound a lot more like the physicist. If pressed for an explanation of how she reached this conclusion, she would use lots of empirical data, equations and technical language. The problem is that the abstraction from reality implied by the data and equations is vastly more severe for the economist than for the physicist. Some parts of the prediction would have some firm foundation, e.g., a build-up of alternative production capacity at all known manufacturing plants based on measurement of physical capacity. But lots of things would arguably remain outside the grasp of formal models. How would consumer psychology in Iran respond to this change, and how would this then translate to overall demand changes? How would the economy respond to this problem over time by shifting resources to new sectors, and what innovations would this create? How would political reactions by other countries lead to war and other decisions, which would, in turn, feed back to economic changes? And so on, ad infinitum.

Any sensible economist would, of course, put all kinds of qualifications around her ten-year employment prediction to reflect such issues. Often (and this kind of language goes back at least as far as J.S. Mill) such complexities will be described as something like “disturbances” around the “basic thrust” or “central trend” or whatever. But once these qualifications are accepted as material, how do we evaluate the reliability of the prediction? That is, how do we know that the “disturbances” aren’t, in fact, more fundamental than the “basic thrust” of the economic theory?

The physicist’s answer to challenges to the reliability of his prediction is simple: Please view the following film taken from a long series of huge explosions that result when independent evaluators combine the materials I described in the manner I described. Note that this prediction is not absolutely certain. It is possible, as per Hume, that the laws of physics will change one second from now, or that there is some unique, undiscovered physical anomaly in Iran such that these physical laws do not apply there. But for all practical purposes, the president can take this predictive rule as a known fact.

How would the economist respond if challenged with respect to the reliability of her prediction? As far as I can see, she can respond with recourse to three lines of evidence: (i) a priori beliefs about human nature, and conclusions that are believed to be logically derivable from them, (ii) analysis of historical data, which is to say, data-driven theory-building, and (iii) a review of the track record of prior predictions made using the predictive rule in question. The analogous lines of evidence that the physicist could have used would be (i) common sense observations of the physical world, and conclusions that are believed to be logically derivable from them, (ii) analysis of observational data, historical experiments and the logic of the physical theories that were developed from these sources, and used to create the predictive rule in question, and (iii) the results of controlled experiments that tested the predictive rule in question. The reason the physicist need only concentrate on (iii) is that controlled experiments are accepted as the so-called “scientific gold standard” method for testing theories. Distrust of untested theories, no matter how persuasive they sound, has been central to the scientific method at least since the time of Francis Bacon. Note that the first president faced with this kind of a briefing actually had an enormously expensive experiment conducted to test the theory, the Trinity test in New Mexico, before using nuclear weapons.

The problem with the economist’s reference to her version of (iii) is that, in practice, so many things change in a macroeconomic event that it is not realistic to isolate the causal impact of any one factor. To call some of these macro events “natural experiments” is almost always to dress up rhetoric in analytical language. In analyses of true macro events as natural experiments, you will almost inevitably find either unsupported assumptions or, in the sophisticated cases, econometric modeling embedded within the analysis of the “experiment,” because of non-random assignment of units of analysis to alternative treatments and other issues. What results is really just more observational data. Further, even the definition of the “event” within the continuous flow of history embeds all kinds of assumptions.

This brings us back to where we started. How does the economist know that her predictions, which sound like the physicist’s predictions, are reliable in a way that the historian’s are not? She doesn’t. Therefore the president would be wise to treat the economist’s prediction like the historian’s prediction, in that it should be subjected to useful cross-examination by laymen, weighing of technical and non-technical opinions, introspection concerning human motivation, and all the rest. Beyond this, he should always keep in mind the unreliability of such predictions, and treat the fog of uncertainty about the potential effects of our actions as fundamental when considering what to do. I’m not arguing that the economist’s output is valueless – I would no more advise a president to make a major economic decision without professional economic advice than I would advise him to make a decision about war and peace without consulting relevant historians – but I am arguing that we should be extremely humble about our ability to make reliable, useful and non-obvious predictions about the results of our economic interventions.

I think that this story gets to the essence of an exchange that I have been having with economist Karl Smith. In his most recent post in this series, responding to my challenge to him – “You say that you have the ability to predict the effect of stimulus. Prove it.” – Smith says this:

I don’t think I am saying this. At least, not how I think Jim means it. I am saying I have reason to believe that the effects of stimulus will be X and I can make an argument for it.

I accept that Smith has (non-trivial) reasons that support his beliefs about what will happen in response to stimulus, and that he can make an informed argument for them. More than this, I agree that his theory is at least plausible. My question continues to be the same: Where is the proof that his plausible theory is correct?

Smith goes on to argue that no predictive rule, even in physical science, is ever proven in the absolute philosophical sense.

I sometimes tell my students that scientists don’t prove, mathematicians and philosophers prove. Scientists accumulate evidence that seems to suggest.

This I think is true in all fields of science and is doubly true when that science is applied to actually engineering results in the real world. Not only have well relied upon theories in physics been upended upon careful examination but there is no one I know of who can design an airplane using a physics textbook. Nor, would many people trust an airplane to fly without testing it first.

And, despite all of the testing that is done, airplanes can and do malfunction and crash. There simply isn’t a “proving it” when it comes to making predictions about the real world. What we hope to do is give an answer that’s better than random and better than folk wisdom. [Bold added]

Smith won’t get a lot of debate on this from me. As he indicates, no matter how great the engineers sound when describing the plans for a spiffy new engineering feature on a plane, we still want test flights. And no number of tests can ever prove in a philosophical sense that this predictive rule will continue to operate in future contexts. And further, the standard for accepting this proposed feature is normally not “scores perfectly on every test every time,” but is instead more like “is superior to the existing alternatives.”

So, Smith here seems to me to be accepting the principle that the standard of evidence by which we should judge a predictive rule is how it stands up to rigorous, real-world tests. It is the straightforward application of this principle that leads me to ask for the tests that show that some proposed predictive rule (“the effects of stimulus will be X”) is, in fact, “better than random and better than folk wisdom.”

Smith’s response is that:

Now perhaps Jim is not confident that we can achieve our goal of beating randomness and folk wisdom. There are two basic lines of reasoning I can offer.

One is evidence and logic.

He goes on to argue that, in effect, even without formal testing of the theory, external to the theory-building process, we should take the arguments internal to the theory seriously, as they are built on a lot more than “hey, sounds good to me.”

To use Smith’s analogy, this is like saying that these are very smart aeronautical engineers who have applied well-accepted engineering principles to create this new feature. The thing is, as he indicates, we still would like to see actual test flights for a sufficiently important change. The whole point of our exchange is that economic theories don’t get a free pass from falsification tests. Quite the opposite, in fact: The astounding complexity of the subject matter under consideration should lead us to be even more skeptical of counter-intuitive claims made in social science than of those made in physical science.

He goes on to describe a second argument, which is closer to what I have meant by falsification testing. He begins with this:

The second line I offer is that of experience. That when economists had the helm we really were able to produce results. In the 1980s Central Banks were largely turned over to their economists who produced low inflation and low unemployment by manipulating the overnight lending rate.

This is the crux of his reply, and it strikes me as not very compelling. First, it seems to beg the question. Lots and lots of important things happened on planet Earth in the 1980s. How do we know that it was the central banks who “produced” low inflation and low unemployment? Where is Smith’s evidence that he has isolated causality in this way? It is a textbook example of a highly confounded problem. Second, even if we were to give central banks complete credit for the economy of 1980 – 2008, Smith would then further have to show (i) that it was the application of some specific predictive rule for the effect of stimulus that accounted for these results starting in the 1980s, and (ii) that we can reliably generalize this hypothetical rule to the situation for 2008 – 2010, before this would count as empirical verification of some rule to be used to predict the effects of the stimulus program under consideration.

Smith extends this example, and concludes with this:

If you are arguing that I don’t know for sure that these tools will work then you are right. I don’t know. What I am suggesting is that the same logic and evidence that worked for controlling the overnight rate is telling me certain things now.

I don’t ask that you simply trust this. We can go through the models. We can go through the logic. We can look at all the evidence. However, at the end of the day we have to make a choice. Even the choice to do nothing is a choice, with consequences for which we will be responsible.

I think making our choice based on logic, evidence and the experience of the Great Moderation is the way to go.

Of course we have to make choices all the time, and we can’t opt out of the game; but the issue under consideration is the reliability of the predictions of macroeconomic theories for the future effects of various alternative potential decisions. As I’ve said before, Smith sounds like a smart and practical guy, but does this sound like the economics profession has created a good answer to my request for proof? Does this sound like something that should lead a rational observer to reject other lines of non-technical reasoning as irrelevant in the way we would in the case of actual scientific knowledge?