A Reply to Noah Millman
Noah Millman makes a number of complicated and interesting criticisms of an ongoing theme that runs through many of my posts. There is a great deal in his post, including what I take to be several independent arguments, so I’ll just try to address what I see as some of the most important topics.
Noah (if I may) ends by saying that he “has no truck with radical skepticism.” He begins with an anecdote about going to see his doctor with a pain, and the doctor going through a diagnostic process leading to the prescription of cortisone.
Noah says this:
Jim Manzi is fond of making unfavorable comparisons between economics and physics, and those sorts of comparisons are pretty good for making economics look bad. But I think a much better comparison is of economics to medicine. How does economics stack up in that comparison?
Well, is medicine a science? Doctors certainly have a lot of scientific knowledge. But there’s also a great deal they don’t know. And much of their actual practice involves operating in the area where knowledge is limited.
It turns out that I did a post right here at TAS on this topic in October. I was very critical of an article in The Atlantic that had expressed radical skepticism about the findings of medical research. The title of my post was “Has Medical Science Discovered Anything Useful?” I began with a summary answer in the form of a one-word paragraph: “Yes.” I argued that we need to be more fine-grained about claims of medical knowledge than the Atlantic author had been. I noted that even using the data presented in the article, one could see that two kinds of medical research findings appeared to hold up very well: (1) the results of well-designed randomized trials, and (2) findings concerning “traditional” medical procedures, rather than long-term, behaviorally-oriented interventions.
As I pointed out at the time, it is striking that the opposite of these characteristics – validation through non-experimental methods of data analysis rather than through controlled experiments, and interventions that attempt to change human behavior over extended periods – precisely describes the parts of social science that Noah notes I have criticized.
Noah goes on to extend his story to say that the cortisone shot did not work to relieve the pain, and asks how we should use this fact to make further decisions, given that we don’t have a comprehensive understanding of the causal maze of the human body. This part of his story gives away the game.
How do we know that the cortisone didn’t relieve the pain? Because he (presumably) had a relatively consistent level of pain that he could reliably forecast would have continued absent some intervention. That is, the counterfactual of “what would have happened absent treatment?” is straightforward to answer in practical terms.
This is exactly why, as Noah notes, conscious trial-and-error progress has been possible in surgery for so long. There is, for example, an Egyptian papyrus dated to about 1500 BC that documents a surgical procedure approximating modern jaw surgery. The effects of many successful surgical procedures are so immediate and dramatic that abstract debates about causality are not necessary.
Progress for therapeutics was more problematic, however, because the change in outcomes was usually not so immediate and dramatic, and was often manifest as a reduction in the probability of a disease or an increase in the probability of recovery. In information-processing jargon, we would say that the “signal-to-noise” ratio is usually much lower for therapeutics than for surgery. This is not always true, of course: Pasteur’s anthrax vaccine, for example, was exactly 100% effective for test animals for a disease that was exactly 100% deadly within a short period, and not every surgical procedure produces immediate, dramatic change.
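The signal-to-noise point can be made concrete with a toy simulation (all of the numbers below are invented for illustration, not taken from any study): a surgery-like intervention that moves recovery from nearly impossible to nearly certain is visible in a single case, while a therapeutic that nudges a recovery probability from 40% to 55% is not.

```python
import random

random.seed(0)

def recovered(p):
    """One patient's outcome: True with probability p."""
    return random.random() < p

# Surgery-like intervention: recovery jumps from ~0% to ~100%,
# so a single before/after observation is decisive.
surgery_base, surgery_treated = 0.02, 0.98

# Therapeutic-like intervention: recovery rises only from 40% to 55%,
# so individual outcomes are dominated by noise.
drug_base, drug_treated = 0.40, 0.55

def single_case_informative(p_base, p_treated, trials=10_000):
    """Fraction of single treated/untreated pairs in which the treated
    patient recovers and the untreated one does not -- the only pattern
    that looks like clear evidence from a lone case."""
    hits = sum(
        recovered(p_treated) and not recovered(p_base)
        for _ in range(trials)
    )
    return hits / trials

surgery_rate = single_case_informative(surgery_base, surgery_treated)
drug_rate = single_case_informative(drug_base, drug_treated)
print(f"surgery-like:     {surgery_rate:.2f}")
print(f"therapeutic-like: {drug_rate:.2f}")
```

With the surgery-like numbers, a lone before-and-after observation is decisive almost every time; with the therapeutic-like numbers, a single case gives clear evidence only about a third of the time, which is why formal controls become necessary.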
So to return to Noah’s example, he has selected an analogy which assumes away those characteristics that make analysis of stimulus so hard: We can measure the counterfactual in the cortisone example, so we can learn after the injection whether or not it worked. As I’ve argued many times, if we could reliably measure the impact of stimulus spending, we would have much greater capacity to make intellectual progress, by beginning a process of trying alternatives, seeing what works, building up a range of case histories, and generally, developing a true expertise around when and how stimulus works.
To make Noah’s cortisone analogy more apt, we would have to imagine that he had some fleeting pain that came and went unpredictably, with widely varying magnitude. Historically, it is statistically correlated with lots of other external changes. It seems to be worse, on average, on Wednesdays and Sundays (though these are also days when he tends to run, and/or have visits from his mother-in-law); when it rains, but not when it snows; between 7 and 10 days after TV shows in Bangladesh about the British royal family; and so on. There are many thousands of such correlations. It also has some complicated statistical relationships with chemical properties of Noah’s body, as well as with measurements of his mental state. There is extensive debate about whether each of these is a causal link, and if so, about the structure of the causal relationships.
Noah gets a cortisone shot on Monday. The next day, the pain is slightly greater. One group of researchers builds a set of regression models showing that but for the cortisone injection, Noah’s pain would have been 10% worse. An academic discipline builds up around them at many leading universities. They are called the “Cortisonians”. An alternative group of scholars, dubbed the “anti-Cortisonians” builds up an alternative set of regression models showing that the cortisone had no effect.
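A sketch of how such dueling models can arise from the very same observational data (the confounding-by-severity mechanism and all numbers here are invented for illustration): if sicker patients are more likely to receive the shot, a naive comparison suggests the shot causes harm, while a severity-adjusted comparison shows no effect at all.

```python
import random
from statistics import mean

random.seed(1)

patients = []
for _ in range(5000):
    severe = random.random() < 0.5  # hidden severity (the confounder)
    # Sicker patients are far more likely to be given the shot.
    treated = random.random() < (0.8 if severe else 0.2)
    # True data-generating process: the shot does nothing;
    # pain is driven by severity plus noise.
    pain = (7.0 if severe else 3.0) + random.gauss(0, 1)
    patients.append((severe, treated, pain))

def mean_pain(group):
    return mean(p for _, _, p in group)

treated_grp = [x for x in patients if x[1]]
untreated_grp = [x for x in patients if not x[1]]

# Naive model: compare mean pain of treated vs. untreated patients.
# It finds the shot "associated with" much more pain, because the
# sickest patients were the ones most likely to get it.
naive_effect = mean_pain(treated_grp) - mean_pain(untreated_grp)

# Adjusted model: condition on severity, and the apparent effect
# vanishes, matching the true data-generating process.
def stratum_effect(severe_flag):
    t = [x for x in patients if x[0] == severe_flag and x[1]]
    u = [x for x in patients if x[0] == severe_flag and not x[1]]
    return mean_pain(t) - mean_pain(u)

adjusted_effect = mean([stratum_effect(True), stratum_effect(False)])
print(f"naive estimate:    {naive_effect:+.2f}")
print(f"adjusted estimate: {adjusted_effect:+.2f}")
```

Both camps are fitting defensible models to the same data; which one is right turns entirely on an untestable assumption about the hidden causal structure, which is exactly the situation the regression debate cannot escape.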
What would I do? I’d run a randomized trial with 1,500 patients in the test group and 1,500 in the control group, and I’d believe that answer. That’s what modern medicine does in this situation. (Of course, in reality, it usually does this before all the modeling takes place.)
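A minimal sketch of what that trial buys you (the baseline pain level, effect size, and noise level are invented for illustration): because assignment is random, the two arms are comparable in everything except treatment, so a plain difference in means estimates the causal effect, and with 1,500 patients per arm even a modest half-point effect stands far above sampling noise.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)

N = 1500            # patients per arm, as in the post
TRUE_EFFECT = -0.5  # assumed: cortisone lowers pain by half a point

# Randomization: each arm is an independent draw from the same
# population, differing only in whether treatment is applied.
control = [5.0 + random.gauss(0, 2) for _ in range(N)]
treated = [5.0 + TRUE_EFFECT + random.gauss(0, 2) for _ in range(N)]

# Difference in means and its standard error (two-sample z-test).
diff = mean(treated) - mean(control)
se = sqrt(stdev(treated) ** 2 / N + stdev(control) ** 2 / N)
z = diff / se

print(f"estimated effect: {diff:+.2f} (z = {z:.1f})")
```

No regression modeling or causal-structure assumptions are needed: the design itself answers the counterfactual question that observational data cannot.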
But doctors (just like engineers) are not making rote application of a set of known research-based treatments in a set of known situations, even if they have access to the results of a well-designed experiment. There is, as Noah notes, a valid role for expertise. I go into exactly this situation in some detail in the upcoming book (and go through the history of the related battles for control of decision-making). It is a complicated topic to describe fully. I’ll just try to focus in this post on a few items that I think are most relevant to Noah’s questions.
Suppose you are the doctor making a decision about whether or not to give Noah the cortisone injection, and there has been no research of any kind. I think you, rationally and morally, should have wide scope to try it, and also should be highly cautious about what you do in non-extreme situations, because your ignorance level is so high. If you are then given access to the “dueling regressions” analysis, I think you should read the appropriate literature, likely filtered through intermediary interpreters. I further think that the formal results of the regression models should have limited impact on your decision-making; if anything, the detailed case histories of individual patients are likely to be more helpful as input to your decision process.
Now suppose you are given access to the clinical trial results. This should carry great weight. But it’s still not as simple as saying: IF trial is successful, THEN always prescribe cortisone, and IF trial is failure, THEN never prescribe cortisone.
Even if one were to accept that, all else equal, treatment X is better than treatment Y, the problem is that all else is never equal – patients have varying co-morbidities, are at different stages of life, have different lifestyles, needs and home situations, and so on ad infinitum. In practice, clinical judgment is required to determine the best course of action for a specific patient. If this were not the case, then simple observation would usually suffice to determine efficacy, as it has for thousands of years of trial-and-error learning about some kinds of surgery, or as it did for Pasteur in testing his anthrax vaccine.

For example, this concern would be highly relevant for a treatment that demonstrated better outcomes than the best available alternative for 52% of sufferers from a complex, chronic, lifestyle-related disease with extensive and varying co-morbidity, but worse outcomes than the alternative in the other 48% of cases. Such wide variation in treatment effectiveness versus the best alternative is an indicator of significant hidden conditionals, especially when there are numerous realistic treatment alternatives. This objection can be applied to quite serious medical conditions as long as the believed effect of the complexities created by contextual issues is of comparable magnitude to the improvement created by the tested treatment, so that while the “best” treatment performs best on average, there is a large proportion of instances within the test and control populations for which alternative treatments appear to do as well or better than the test treatment.
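The 52/48 situation can be simulated directly (the hidden patient type, its prevalence, and the effect sizes are all invented for illustration): treatment X wins on average over the whole trial population, yet a large subgroup, identifiable only by a hidden attribute, does strictly worse on X than on the alternative.

```python
import random
from statistics import mean

random.seed(3)

# Assumed toy setup: a hidden patient attribute flips which treatment
# works better.  X beats the alternative for type-A patients and loses
# for type-B patients; type A is only slightly more common (52%).
patients = ["A" if random.random() < 0.52 else "B" for _ in range(10_000)]

def outcome_gain(ptype):
    """Outcome under X minus outcome under the alternative for one
    patient (positive means X did better)."""
    base = 1.0 if ptype == "A" else -1.0
    return base + random.gauss(0, 0.2)

gains = [(p, outcome_gain(p)) for p in patients]

x_wins = mean(g > 0 for _, g in gains)           # share of cases X wins
avg_gain = mean(g for _, g in gains)             # population-average effect
gain_b = mean(g for p, g in gains if p == "B")   # effect for type-B patients

print(f"X better in {x_wins:.0%} of cases")
print(f"average gain from X: {avg_gain:+.2f}")
print(f"average gain for type-B patients: {gain_b:+.2f}")
```

A trial large enough to detect the small positive average effect would correctly crown X the “best” treatment, while nearly half the patient population would be better served by the alternative – and only clinical judgment about the hidden conditionals can tell you which half.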
A single RCT as a test of some proposed therapeutic, then, is most appropriate for treatments that fall in an intermediate zone of signal-to-noise: on one hand, treatments that are not effective in more or less every case, or else the conclusion would be obvious without the need for sophisticated controls; and on the other, treatments that do not show improvement in so slim a majority of cases that even a trial showing both statistical and practical significance cannot provide a practical guide to action, because too many other factors would have to be considered to make a rational decision. This is a special case of the problem of generalization from a known finding, which I have addressed in other articles and will address in the book. In this post I’ll just note that this is really where the scope for expertise in the face of scientific findings applies most centrally, and that the way to replace some of this scope with further scientific findings is through a series of RCTs that test impacts under an ever-wider variety of conditions.
As a practical matter, however, a doctor who simply disregards a well-structured RCT showing some treatment is either beneficial or harmful, and especially a body of such RCTs that have shown this over and over again under a wide variety of conditions, is not doing his job very well. He might decide that considerations other than the reduction of pain are more important; he might, very rarely, decide that some case is so exceptional that these trials don’t apply; and so on. But we would rightly castigate him as ignorant or ill-intentioned if he instead cited his intuition, or his reasoning from the first principles of biology, as making such RCTs irrelevant.
Now, try to apply (my extension of) Noah’s medical analogy to stimulus. The doctor is an elected official, and the researchers are credentialed macroeconomists. What kicked all this off was my contention that macroeconomists vastly overstate the reliability of their knowledge when they claim that elected officials are ignorant or ill-intentioned in ignoring validated findings about the impact of stimulus when making decisions about taxes, spending and policies for management of central banks. I claim that these economists are armed with the dueling “Cortisonian” and “anti-Cortisonian” regressions. I claim their knowledge is useful, but is not a reliable predictive tool that should trump political judgments even on specific, narrow decisions about stimulus in the way that the results from a series of well-structured RCTs should for a physician.
Jim: if that’s your point, then we are 90% in agreement. If their input is “useful” then it shouldn’t be ignored – nor can a policymaker excuse ignoring it by saying, in effect, that mainstream economics is astrology. But their input is only “useful” – it’s not a formula that gives you definitive right or wrong answers or terribly accurate specific predictions about the future.
If so, the main disagreement we have left is this: my impression is that the economists are (mostly) appropriately humble about the degree of certainty about what they know, whereas the elected officials are the ones who desperately want certainty, so they can trumpet their accomplishments (“we prevented over a million layoffs!”) or damn their opponents’ failures (“the stimulus has not created a single job!”).
This is entirely parallel to my impression of what goes on (still) on Wall Street with respect to the role of quants. The quants – who build the models that everybody trades on – are often blamed for big disasters because the disasters come from risks that the models “didn’t foresee.” That was true in 1998 with Long Term Capital Management, and it was true in 2008 with sub-prime-mortgage-backed securities. But the quants I have known are generally pretty open about the limits of their models. It’s the traders and their corporate bosses who want to believe that the models do more than they do, and invest them with an authority that they never actually claimed to have – who have proven, time and time again, willing to bet the firm on the certainty that they know their risk, when there is no way their risk can be completely known, and nobody ever actually told them it was.
— Noah Millman · Dec 7, 07:50 PM · #
I’d like some empirical basis for the claim that macroeconomists overclaim reliability. I’d also like to know whether econometrics is overrepresented in the overclaiming group, or if the worst offenders are the beautiful model/we don’t need no stinking facts types.
RCTs are beautiful things, but making them work in the social world undoubtedly also requires methodological compromise, and I can easily imagine overclaiming as a result of an RCT.
— Pithlord · Dec 7, 11:34 PM · #