Round-Up of Some Reactions to Experiments and Social Science

I spent some time speaking with Steve Pearlstein, who has written an excellent column for the Washington Post on experimentation that relates business experimentation to social scientific experimentation.

Will Wilkinson has a very thoughtful post on my article, in which he describes the difference between what he terms “liberty of discovery” and “liberty of respect”. I think these are somewhat like what I have called “liberty as means” and “liberty as goal”. He has what I think are some very smart things to say in it.

Andrew Sullivan says this in a post:

Jim ably dispatches some salient challenges. But there is a concept in this crucial conservative distinction between theoretical and practical wisdom that has been missing so far: individual judgment. A social change can never be proven in advance to be the right answer to a pressing problem. We can try to understand previous examples; we can examine large randomized trials; but in the end, we have to make a judgment about the timeliness and effectiveness of certain changes. It is the ability to sense when such a moment is ripe that we used to call statesmanship. It is that quality that no wonkery can ever replace.

It is why we elect people and not algorithms.

Just so, in my view.

In 1957, psychology and social science researcher Donald Campbell famously developed the idea of distinguishing between internal and external validity of an experiment:

“Validity can be evaluated in terms of two major criteria. First, and as a basic minimum, is what can be called internal validity: Did in fact the experimental stimulus make some significant difference in this specific instance? The second criterion is that of external validity, representativeness or generalizability: To what populations, settings or variables can this effect be generalized?”

In the book, I term the problem of external validity one of “predictive generalization”, i.e., answering the question “What will happen if I execute policy X in the future?”, and distinguish this from what I call the problem of “strategic generalization”, i.e., answering the question “Should I execute policy X?” I try to show why what Andrew says is inherently true: as far as I can see, even in a situation in which normative concerns are not in play and we have agreement on goals, there is no series of experiments or other analyses that can ever answer the second question in real-life situations.

But I think that the primary implication of this realization is that we should be very hesitant to take strategic leaps, and should do so only when other options appear to be foreclosed.

Mark Kleiman in the latest round of an exchange with me says:

But, if I read Manzi’s response correctly, my original comment allowed a merely verbal disagreement to exaggerate the extent of the underlying substantive disagreement. If indeed Manzi can offer some systematic analysis of how to look at existing institutions, figure out which ones might profitably be changed, try out a range of plausible changes, gather careful evidence about the results of those changes, and modify further in light of those results, then Manzi proposes what I would call a “scientific” approach to making public policy.

If all Manzi means when he disses “social science” is that you shouldn’t just read some random paper in an economics or social-psych journal and propose some insanely risky venture such as privatizing Social Security or voucherizing public education or wiping out labor unions based on that paper, then I’m happy to stand shoulder-to-shoulder with him against irresponsible radicalism and for cautious and evidence-sensitive approaches to bringing about social improvement.

I think that he is reading my response correctly. While I don’t think that “all I meant” was that “you shouldn’t read some random paper in an economics or social-psych journal” and propose X, I certainly believe that. Most important, I acknowledge enthusiastically his “sauce for the goose is sauce for the gander” point that the recognition of our ignorance should apply to things that I theorize are good ideas, as much as it does to anything else. The law of unintended consequences does not only apply to Democratic proposals.

In fact, I have argued for supporting charter schools instead of school vouchers for exactly this reason. Even if one has the theory (as I do) that we ought to have a much more deregulated market for education, I more strongly hold the view that it is extremely difficult to predict the impacts of such drastic change, and that we should go one step at a time (even if on an experimental basis we are also testing more radical reforms at very small scale). I go into this in detail for the cases of school choice and social security privatization in the book.

Finally, Steve Sailer says this:

First, while experiments are great, correlation studies of naturally occurring data can be extremely useful. Second, a huge number of experiments have been done in the social sciences.

Third, the social sciences have come up with a vast amount of knowledge that is useful, reliable, and nonobvious, at least to our elites.

I claim that the purpose of science is to create useful, reliable, non-obvious predictive rules, and that experiments are a necessary but not sufficient component of this enterprise, as they are the most severe available test of reliability. So, a given correlation study might be “extremely useful”, but it does not eliminate the need for experimental tests of its assertions. I think that his first point implies that correlation studies and experiments are alternatives or substitutes, while I believe they are complementary.

I have attempted to collate every relevant, sufficiently large developed-world randomized field trial (RFT) reported in journals in the history of global social science. I don’t think I have been able to find every one, but I believe there have been, at most, a few thousand. In comparison, there have been something like 350,000 randomized controlled trials (RCTs) for therapeutics, and one company (Capital One) reportedly does more randomized field experiments per month than have been done in all of social science. More to the point, the number of statistically significant positive results of social interventions that have been demonstrated in replicated RFTs appears to me to be tiny.

To Steve’s third point, my argument was not that we have not produced “knowledge” in some general sense, but that:

[F]ew programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.