Jonah Goldberg and John Derbyshire are having an interesting exchange at The Corner on whether researchers have found a gene that, under certain environmental conditions, predisposes individuals to liberal politics.
I wrote a long piece (gated) for National Review in 2008 that described why we should be very skeptical of assertions of causality that are derived from gene association studies. The basic reason is that, while these kinds of studies have remarkable rhetorical force because their purported subject is biology, if you look under the skin at the bones of the analysis, the core method is traditional social science. The article under consideration is an almost perfect illustration of this.
Start with the point that the press release (literally titled “Researchers liberate a ‘liberal gene’”) is basically worthless, as it so frequently and carelessly elides the difference between claims of “association” (i.e., correlation) and claims of causality.
The basic methodology employed in the real paper starts by arguing that prior research has led to a theory that a specific gene ought to be implicated in a specific behavior. In this case, the hypothesized behavior is that people with a specific gene variant that is believed to predispose individuals to seek out new experiences should be more liberal if they are also embedded in a social network with a broad variety of viewpoints. The point of the study is to “test” this hypothesis by, roughly speaking, looking at a group of people who have the gene variant to see if there is an “association” (there’s that word again) between the number of friends in adolescence and likelihood of being liberal, and then to compare this degree of association to that found among a group of people without the gene variant. They discover that for the group with the gene variant, there is a meaningful association, but that there is not for those without it.
The big problem, of course, is that other things might also vary between the groups, and these other differences might be the real cause of the observed behavior difference. Here’s how I put this in the piece from the magazine:
Media outlets will often speak loosely of things such as a “happiness gene,” a “gay gene,” or a “smart gene.” The state-of-the-art method for finding such a link is something called a “genome-wide association study” (GWAS). In a GWAS, scientists use blood or saliva samples to sequence the DNA for a group of several thousand people who exhibit a trait or behavior of interest (the “case group”), and for a second group of several thousand who do not exhibit the trait or behavior (the “control group”). …
A second limitation of a GWAS is that it detects association rather than causation. Suppose we found that a case group of persons suffering from a disease had a greater incidence of some gene than did a control group, but that we failed to notice that the case group was disproportionately of Chinese ancestry. Culturally transmitted behaviors in the case group might be responsible for the disease, even if these behaviors had nothing to do with the gene in question. That is, the gene could be nothing more than a marker for Chinese ancestry, and hence for participation in behaviors that cause the disease. Geneticists call this problem “stratification,” and deal with it by carefully matching individuals in the case and control groups to ensure that the groups really are comparable. The problem is that these stratification effects can be fiendishly subtle. No matter how carefully we match cases with controls, there can always be some unobserved environmental factor correlated with, but not caused by, a genetic difference between groups, and this environmental factor might be what is actually causing the disease.
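To make the stratification problem concrete, here is a minimal sketch with made-up numbers (the group names, population shares, carrier rates and disease risks are all assumptions for illustration, not figures from any study). Within each ancestry group the gene has no effect at all on disease risk, yet the pooled comparison still shows carriers over-represented among cases.

```python
# Hypothetical population: every number here is invented for illustration.
# Within each group, disease risk is identical for carriers and non-carriers,
# so the gene has no causal effect by construction.
strata = {
    "group_a": {"share": 0.5, "carrier_rate": 0.6, "disease_risk": 0.3},
    "group_b": {"share": 0.5, "carrier_rate": 0.2, "disease_risk": 0.1},
}


def carrier_rate_by_status(groups):
    """Return (P(carrier | case), P(carrier | control)) over the given groups."""
    cases = controls = carrier_cases = carrier_controls = 0.0
    for g in groups.values():
        p_case = g["share"] * g["disease_risk"]
        p_control = g["share"] * (1 - g["disease_risk"])
        cases += p_case
        controls += p_control
        # Carrier status is independent of disease inside a group.
        carrier_cases += p_case * g["carrier_rate"]
        carrier_controls += p_control * g["carrier_rate"]
    return carrier_cases / cases, carrier_controls / controls


# Pooled comparison, ignoring ancestry: the gene looks "associated".
pooled_cases, pooled_controls = carrier_rate_by_status(strata)
print(f"pooled: cases {pooled_cases:.3f} vs controls {pooled_controls:.3f}")

# Comparison within each group: the apparent association disappears.
within = {name: carrier_rate_by_status({name: g}) for name, g in strata.items()}
for name, (in_cases, in_controls) in within.items():
    print(f"{name}: cases {in_cases:.3f} vs controls {in_controls:.3f}")
```

With these assumed inputs, half the cases carry the gene against 37.5 percent of the controls, even though the carrier rate among cases and controls is identical inside each group. The sketch only works because the confounder (ancestry) is recorded; the worry in the passage above is about confounders nobody thought to measure.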
The researchers are well aware of the centrality of this problem. The crucial methodological passage in the full research paper starts with this:
Genetic association studies test whether an allele or genotype occurs more frequently within a group exhibiting a particular trait than those without the trait (e.g., is the frequency of a particular allele or genotype higher among liberals than conservatives?). Because a significant association has several possible explanations, there are two main research designs employed in association studies to isolate the effect of an allele on a trait, case-control designs and family-based designs (Carey 2002). Due to potential population stratification in our sample, we chose to employ a family-based design, which eliminates the problem of population stratification by using family members, such as parents or siblings, as controls.
That is, the researchers intelligently use family members as controls to try to optimize case-control matching. But this does not come close to eliminating the problem, as the researchers then describe:
We include individuals from the same family in the analysis, and thus the observations are not independent. Therefore, we use a generalized estimating equations approach with an independent working correlation structure for the clustered errors, to estimate the model. Only siblings that have different genotypes, in this case a different number of 7R alleles, are informative for the within-family component of variance since wij equals zero otherwise. However, families that share the same genotype are also included in our analysis for improved estimation of the between-family component. We have also included controls in the model for both age and gender, as there are numerous instances of age effects in gene-environment interactions and there are sex specific genetic influences on political preferences (Hatemi, Medland, and Eaves 2009c). [Bold added]
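The remark that only siblings with different genotypes are informative can be illustrated with a small sketch. It assumes, as is common in between/within family designs but is not spelled out in the excerpt, that the between-family component is the family's mean count of 7R alleles and that wij is each sibling's deviation from that mean. The family labels and allele counts are hypothetical.

```python
from statistics import mean

# Hypothetical data: family id -> number of 7R alleles carried by each sibling.
families = {
    "fam1": [0, 1],     # siblings differ
    "fam2": [1, 1],     # siblings share a genotype
    "fam3": [0, 2, 1],  # siblings differ
}


def decompose(allele_counts):
    """Split sibling genotypes into a family mean and per-sibling deviations.

    Assumed definitions: between = family mean, within = count - family mean.
    """
    between = mean(allele_counts)
    within = [count - between for count in allele_counts]
    return between, within


components = {fam: decompose(counts) for fam, counts in families.items()}

# A family contributes to the within-family estimate only if some deviation
# is nonzero, i.e. only if its siblings do not all share the same genotype.
informative = [
    fam for fam, (_, within) in components.items() if any(w != 0 for w in within)
]

for fam, (between, within) in components.items():
    print(fam, "between:", between, "within:", within)
print("informative for the within-family component:", informative)
```

Under these assumptions fam2 has a within-family term of zero for both siblings, so it can only help estimate the between-family component. Both components then enter the model as ordinary regressors alongside the age and gender controls.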
In other words, the researchers have built the functional equivalent of a regression model, through which they believe that they have comprehensively controlled for other effects in just the way that any political science, economics or other social science researcher would have in a paper that tried to evaluate the effect of any non-genetic purported cause of such a propensity (which makes a lot of sense, as the article was actually published in The Journal of Politics).
But as I described in my piece, this means that in spite of all the white lab coat talk about alleles and so on, we should treat this with the same skepticism that we would bring to any social science regression model:
So how is a GWAS showing an association between Gene X and aggressiveness different from a social-science study showing a correlation between watching lots of violent TV and aggressiveness? Mathematically, it’s not. In both cases we start by measuring aggressiveness for each person. We then compile for each person a list of data providing information on potential causes of aggressiveness: in one case genomic information, and in the other, sociological observations on childhood experiences, school quality, and so on. In the first case we observe that aggressive people have a higher incidence of Gene X; in the second that they watch a lot of violent TV. The reliability of GWAS studies is thus subject to the same limitations that we think of in connection with sociology or economics (as opposed to, say, chemistry). The only way around this — the only way to attain the precision of chemistry — would be actually to show the chain of biochemical processes by which a set of named genes creates the observable brain functions collectively defined as “aggressiveness.” Of course, if we could do that, we would have no need for a GWAS study.
The claims of causality that arise from such studies should accordingly be treated with the appropriately intense skepticism that we apply to sociological or econometric studies. In the middle of the 20th century, Friedrich Hayek and the libertarians he inspired faced those who asserted that an economy could be successfully planned. The libertarian position was not that such planning could be proved impossible in theory, but that we lacked sufficient information and processing power to accomplish it. The world of economic interaction is so complex that it overwhelms our ability to render it predictable; hence the need for markets to set prices. This is the same analytical problem we face when trying to predict a mental state that depends upon a large number of genes. It is unclear whether we will ever understand how this complicated machinery and its interactions with the environment come together to create characteristics of mind. It is certain, however, that we do not have such an understanding now, and that we won’t know such a project is achievable until we achieve it.
(Cross-posted to The Corner)