More Clinical Trials Blogging: Experiments, Causality and Decisions

This seems to be the week for it.

Megan McArdle has a very interesting post up in which she describes a clinical trial (i.e., random assignment trial) in Ghana that tested the impact of offering free health care on the incidence of a specific childhood disease. It found no statistically significant effect.

McArdle’s take on this result is:

I don’t find it surprising when studies of American/European health care consumption show little relationship between consumption and health outcomes. After a certain point, after all, iatrogenic morbidity and mortality has to outweigh the benefit of marginal treatment. But I confess I am shocked that studies show the same thing in the developing world:

[To save you the trouble, if you’re like me, “iatrogenic” means “induced by a physician’s words or therapy (used especially of a complication resulting from treatment)”]

McArdle quotes this from the “Methods and Findings” summary of the paper:

…Introducing free primary health care altered the health care seeking behaviour of households; those randomised to the intervention arm used formal health care more and nonformal care less than the control group. Introducing free primary health care did not lead to any measurable difference in any health outcome. The primary outcome of moderate anaemia was detected in 37 (3.1%) children in the control and 36 children (3.2%) in the intervention arm… [Bold added]

If you go to Table 3 of the paper and look for the significant effects on measured treatment utilization, you can see that the test group increased reported visits to Formal Care by 0.3 visits per person per year (2.8 vs. 2.5), while they decreased reported visits to Informal Care by about 0.5 visits per person per year (4.59 vs. 5.10). So really, what we see reported is that the intervention had the apparent effect of shifting visits from one class of care to another (and slightly reducing the total number of visits, from about 7.6 to about 7.4 per person per year). If anything, what this shows in this case is not that “more care doesn’t produce better health”, but something closer to “1 visit to Formal Care as practiced for this group of people in Ghana cannot be shown to produce better health than ~2 visits to Informal Care”.
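To make the arithmetic explicit, here is the net shift computed from the Table 3 point estimates (a trivial Python sketch; the numbers are the paper’s, the code is mine):

```python
# Reported visits per person per year (point estimates from Table 3).
formal = {"control": 2.5, "intervention": 2.8}
informal = {"control": 5.10, "intervention": 4.59}

formal_shift = formal["intervention"] - formal["control"]        # +0.30
informal_shift = informal["intervention"] - informal["control"]  # -0.51

total_control = formal["control"] + informal["control"]                  # 7.60
total_intervention = formal["intervention"] + informal["intervention"]   # 7.39

print(f"formal care shift:   {formal_shift:+.2f} visits/person/year")
print(f"informal care shift: {informal_shift:+.2f} visits/person/year")
print(f"net change in total: {total_intervention - total_control:+.2f} visits/person/year")
```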

Further, as the study authors themselves fully acknowledge in the paper, one should be very cautious about even this conclusion. These effect sizes (in terms of both health outcomes and treatment utilization) are tiny. In the case of the health outcome this isn’t surprising. If you go to the Sample Size portion of the Methods section of the paper, the authors assumed a priori a 10% prevalence at study conclusion (as opposed to the ~3% actual prevalence). Because statistical power depends on how many cases you expect to observe, a ~3% prevalence gives you far less ability to detect a given relative reduction than a 10% prevalence does. This doesn’t necessarily mean that a different a priori prevalence estimate would have led to a different sample size, but it probably does indicate that they were testing for much coarser effects than they would test for if they were designing the study now.
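To see why the a priori prevalence assumption matters, here is a rough sketch of the standard two-proportion sample-size formula. To be clear, the significance level, power, and hypothesized halving of prevalence below are my illustrative assumptions, not the paper’s actual design parameters:

```python
# Minimal sketch of a standard two-proportion sample-size calculation.
# The alpha, power, and effect sizes are illustrative assumptions.
from scipy.stats import norm

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate subjects per arm needed to distinguish the two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2

# Detecting a halving of prevalence at the assumed 10% baseline...
print(round(n_per_arm(0.10, 0.05)))    # ~432 per arm
# ...versus at the ~3% prevalence actually observed.
print(round(n_per_arm(0.03, 0.015)))   # ~1,531 per arm
```

The same relative effect requires a sample several times larger at the lower prevalence, which is the sense in which a study sized against a 10% assumption is testing for coarser effects.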

The more interesting measurement issue, though, is for treatment utilization. The measured change in treatment utilization was small (on average, a test group subject saw a doctor once more every three years than a control group subject). In order to develop this estimate, the researchers relied on patients filling in diaries, which are notoriously inaccurate. The significance calculation between the test and control groups for utilization (basically, addressing the question of how big a difference 2.5 vs. 2.8 visits per year is as compared to how much variation we see individual by individual within these groups) would not capture measurement error that was systematically different between the test and control populations. This is a big problem with diaries, especially when people feel that they “should” be doing something (e.g., it is widely believed that when Nielsen used diaries to create the famous Nielsen ratings for TV shows, people had a tendency to over-report time spent watching Masterpiece Theater and under-report time spent watching The Gong Show). One could easily imagine that people who were given free health care for their children, but who nonetheless didn’t bother to go to the doctor more, might lie about this. We, by definition, can’t know what this error was for this study, but 1 extra visit per person every three years is not a whole lot.
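To illustrate how differential reporting could manufacture exactly this kind of result, here is a toy simulation (the distributions and the over-reporting rate are invented for illustration, not estimated from the study): true utilization is identical in both arms, but some treated families pad their diaries, and the padding alone produces a “statistically significant” difference of roughly the observed size.

```python
# Toy simulation of systematic diary over-reporting in the treatment arm.
# All distributional assumptions here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000  # subjects per arm (illustrative)

# True annual formal-care visits: identical in both arms, so the true effect is zero.
true_control = rng.poisson(2.5, size=n)
true_treatment = rng.poisson(2.5, size=n)

# Hypothetical bias: ~12% of treated families, feeling they "should" be using
# the free care, over-report by a couple of visits in their diaries.
pads_diary = rng.random(n) < 0.12
reported_treatment = true_treatment + 2 * pads_diary.astype(int)

t, p = stats.ttest_ind(reported_treatment, true_control)
print(f"reported means: {reported_treatment.mean():.2f} vs {true_control.mean():.2f}")
print(f"p-value: {p:.4f}")  # "significant" despite a true effect of zero
```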

But let’s assume for the moment that the measurements in the study are completely accurate. What I found so interesting about the post is that it illustrates an irreducible problem in analytical decision-making. Highly generalized “econometric” methods (regression analysis, time-series analysis and so on) attempt to identify general cause-and-effect rules (e.g., “more health care doesn’t improve health past a certain point”), but suffer from the problem of misattribution of causality. Random assignment trials solve the problem of causality (or at least create, as the saying goes, “the scientific gold standard” for measuring causality), but if we are to use them to inform decision-making, we must generalize beyond the experiment itself.
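The misattribution problem is easy to see in a stylized example (the numbers and functional form below are invented purely for illustration): if some unobserved factor, say underlying illness, drives both care-seeking and outcomes, a naive regression finds a large “effect” of care even when the true effect is zero, while randomly assigned care shows none.

```python
# Stylized illustration of confounding: invented numbers, zero true effect.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_effect = 0.0  # by construction, care does nothing to the outcome

# Observational world: sicker people (unobserved severity u) seek more care
# and have worse outcomes, so care and health are correlated anyway.
u = rng.normal(size=n)
care_obs = 2.0 * u + rng.normal(size=n)
health_obs = true_effect * care_obs - 1.5 * u + rng.normal(size=n)

# Experimental world: care is randomly assigned, hence independent of u.
care_rct = rng.normal(size=n)
health_rct = true_effect * care_rct - 1.5 * u + rng.normal(size=n)

slope = lambda x, y: np.polyfit(x, y, 1)[0]  # simple OLS slope
print(f"naive observational estimate: {slope(care_obs, health_obs):+.3f}")  # ~ -0.6
print(f"randomized estimate:          {slope(care_rct, health_rct):+.3f}")  # ~  0.0
```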

This trade-off, in which you can have greater knowledge of causality at the expense of generalization, or greater generalization at the expense of certainty about causality, seems both irresolvable and central to the Hayekian knowledge problem that dogs all of economics and social science. Amusingly, it is also strangely analogous to the Uncertainty Principle in physics:

Cop, speaking to driver of car he has just pulled over: “Sir, do you have any idea how fast you were going?”

Driver: “No, but I can tell you exactly where I was.”

McArdle takes a specific point-finding (again, let’s assume it is exactly correct, for the moment) and attempts to fit it into a broader pattern of diminishing marginal returns to health care. This is not a crazy way to see it, and as I’ve said, one must try to do this pattern-fitting in order to make the experimental result help with any forward decisions. But as I’ve also tried to show, there are other patterns the finding might fit better: “Traditional medical practices are under-rated”, or “Health care delivery in very poor countries is very poor”, or even, if we make the background assumption that when you have very little formal health care it is extremely likely that more will help, something like “People tend to lie when filling out activity diaries”. Which pattern (or which unknown pattern) it actually fits into determines how we should use this result, but the experiment can’t, by definition, answer this question for us.

For somebody who (like me) advocates using randomized controlled trials to estimate causality reliably, this is the central problem in translating experimental results into actionable decision rules. It’s probably also a big deal for experimental economics.