Goldstein, Genes and the Fatal Conceit

Andrew Sullivan points to some of the blogosphere discussion arising from Dana Goldstein’s emperor’s-new-clothes article about the failure of genome-wide association studies to find genes that account for human mental states (as well as other non-mental conditions).

Goldstein is putting forward a theory for why it is plausible that evolution would have created such a situation. But whether or not Goldstein is correct about the causal pathway that created this situation, it is an empirical reality that a GWAS has a very hard time finding genes that create given mental states. This is fundamentally a problem of combinatorial mathematics.

Here’s how I put it in a National Review article in June:

Media outlets will often speak loosely of things such as a “happiness gene,” a “gay gene,” or a “smart gene.” The state-of-the-art method for finding such a link is something called a “genome-wide association study” (GWAS). In a GWAS, scientists use blood or saliva samples to sequence the DNA for a group of several thousand people who exhibit a trait or behavior of interest (the “case group”), and for a second group of several thousand who do not exhibit the trait or behavior (the “control group”). Scientists then look for genetic differences between the two groups. In cases where a single malfunctioning gene creates, for example, a catastrophic disease that overwhelms other genetic and environmental factors, a GWAS can quickly pinpoint the culprit. Sometimes, however, the behavior or trait is caused by several interacting genes — so that, for example, Gene 1 has some effect only if Gene 2 has a special structure. This is called “epistatic interaction,” and can involve a large number of genes. Epistatic interactions make genetic effects harder to identify. Scientists deal with this problem and others by creating larger and larger case and control groups. The scaling up of such studies is among the most exciting frontiers in genetics. It is essentially an engineering problem, and money poured into solving it will likely improve human health through genetic screening and, ultimately, therapies.

Seeing this momentum, it is natural to assume that eventually we will have explained all human behavior, not just diseases caused by one or a small number of interacting genes. But the GWAS technique hits structural limits when applied to conditions that involve epistatic interactions among lots of genes. Mental activity is now widely believed by scientists to depend on many genes (though mental illnesses such as schizophrenia or bipolar disorder may turn out to be partial exceptions). A person has about 20,000 genes, of which more than 5,000 are believed to play some role in regulating brain function. Consider a simplified case in which some personality characteristic — aggressiveness, for example — is regulated by 100 genes, each of which can have two possible states (“on” or “off”). The combinatorial math is daunting: There are more than a trillion trillion possible combinations of these gene states. Thus we could sequence the DNA of all 6.7 billion human beings and still not know which genes are responsible for aggressiveness.
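The arithmetic behind that simplified case can be checked directly. The short Python sketch below assumes exactly the toy model described above (100 genes, each strictly "on" or "off") and a world population of 6.7 billion; the function and variable names are made up for illustration.

```python
# Toy model from the text: 100 genes, each either "on" or "off".
# All names here are illustrative assumptions, not part of any genetics library.
N_GENES = 100
WORLD_POPULATION = 6_700_000_000
TRILLION = 10 ** 12


def count_combinations(n_genes: int, states_per_gene: int = 2) -> int:
    """Number of distinct on/off patterns across n_genes independent genes."""
    return states_per_gene ** n_genes


combinations = count_combinations(N_GENES)

# Fraction of all possible patterns we could observe even if every living
# person carried a different pattern (an assumption that favors coverage).
coverage = WORLD_POPULATION / combinations

print(f"possible gene-state combinations: {combinations:.3e}")
print(f"exceeds a trillion trillion: {combinations > TRILLION * TRILLION}")
print(f"best-case share of combinations observed: {coverage:.3e}")
```

Even under the generous assumption that no two people share a pattern, the whole human population would cover only a vanishing share of the possible combinations.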

A second limitation of a GWAS is that it detects association rather than causation. Suppose we found that a case group of persons suffering from a disease had a greater incidence of some gene than did a control group, but that we failed to notice that the case group was disproportionately of Chinese ancestry. Culturally transmitted behaviors in the case group might be responsible for the disease, even if these behaviors had nothing to do with the gene in question. That is, the gene could be nothing more than a marker for Chinese ancestry, and hence for participation in behaviors that cause the disease. Geneticists call this problem “stratification,” and deal with it by carefully matching individuals in the case and control groups to ensure that the groups really are comparable. The problem is that these stratification effects can be fiendishly subtle. No matter how carefully we match cases with controls, there can always be some unobserved environmental factor correlated with, but not caused by, a genetic difference between groups, and this environmental factor might be what is actually causing the disease.
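A small simulation can make the stratification problem concrete. The Python sketch below is a toy model under stated assumptions, not a real GWAS pipeline: two hypothetical ancestry groups ("A" and "B"), a marker gene that is simply more common in group A, and a disease whose risk depends only on group membership (standing in for a culturally transmitted behavior). All probabilities are invented for illustration.

```python
import random


def simulate(n_people=40_000, seed=0):
    """Return (ancestry, has_marker, has_disease) tuples for a toy population."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        ancestry = rng.choice(["A", "B"])
        # Assumed: the marker is more common in group A, for unrelated reasons.
        marker_freq = 0.6 if ancestry == "A" else 0.2
        has_marker = rng.random() < marker_freq
        # Assumed: disease risk depends only on ancestry-linked behavior,
        # never on the marker itself.
        disease_risk = 0.30 if ancestry == "A" else 0.10
        has_disease = rng.random() < disease_risk
        people.append((ancestry, has_marker, has_disease))
    return people


def marker_rate(people, disease_status, ancestry=None):
    """Share of marker carriers among cases (True) or controls (False)."""
    group = [
        p for p in people
        if p[2] == disease_status and (ancestry is None or p[0] == ancestry)
    ]
    return sum(p[1] for p in group) / len(group)


people = simulate()

# Pooled comparison: cases versus controls, ignoring ancestry.
pooled_gap = marker_rate(people, True) - marker_rate(people, False)

# Stratified comparison: cases versus controls within each ancestry group.
gap_a = marker_rate(people, True, "A") - marker_rate(people, False, "A")
gap_b = marker_rate(people, True, "B") - marker_rate(people, False, "B")

print(f"pooled case-control gap in marker rate: {pooled_gap:.3f}")
print(f"gap within group A: {gap_a:.3f}")
print(f"gap within group B: {gap_b:.3f}")
```

In this toy setup the pooled comparison shows the marker as noticeably more common among cases, while the gap largely disappears within each group. Matching works here only because the confounder (ancestry) is observed. The harder problem described above is the confounder nobody thought to record.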

Further, to think in terms of genes is to abstract away from a biochemical reality that is far more complex. On one hand, a gene is not an atomic entity, but a sophisticated machine with many components. Much as in the progress of particle physics over the past century, we keep discovering components-within-components of the genetic mechanism that are relevant to physical and mental outcomes, and it’s entirely plausible that we will eventually get all the way down to subatomic quantum effects as drivers of behavior. On the other hand, as we move away from the genome itself, we see that other dimly understood biochemical processes have a large impact on how the information contained in the gene gets expressed as an observable human characteristic. And all of this is before we consider interactions of the human organism as a whole with those factors that we typically term “environmental,” ranging from nutrition and exposure to pathogens to parenting styles and childhood experiences.

So how is a GWAS showing an association between Gene X and aggressiveness different from a social-science study showing a correlation between watching lots of violent TV and aggressiveness? Mathematically, it’s not. In both cases we start by measuring aggressiveness for each person. We then compile for each person a list of data providing information on potential causes of aggressiveness: in one case genomic information, and in the other, sociological observations on childhood experiences, school quality, and so on. In the first case we observe that aggressive people have a higher incidence of Gene X; in the second that they watch a lot of violent TV. The reliability of GWAS studies is thus subject to the same limitations that we think of in connection with sociology or economics (as opposed to, say, chemistry). The only way around this — the only way to attain the precision of chemistry — would be actually to show the chain of biochemical processes by which a set of named genes creates the observable brain functions collectively defined as “aggressiveness.” Of course, if we could do that, we would have no need for a GWAS study.

The claims of causality that arise from such studies should accordingly be treated with the appropriately intense skepticism that we apply to sociological or econometric studies. In the middle of the 20th century, Friedrich Hayek and the libertarians he inspired faced those who asserted that an economy could be successfully planned. The libertarian position was not that such planning could be proved impossible in theory, but that we lacked sufficient information and processing power to accomplish it. The world of economic interaction is so complex that it overwhelms our ability to render it predictable; hence the need for markets to set prices. This is the same analytical problem we face when trying to predict a mental state that depends upon a large number of genes. It is unclear whether we will ever understand how this complicated machinery and its interactions with the environment come together to create characteristics of mind. It is certain, however, that we do not have such an understanding now, and that we won’t know such a project is achievable until we achieve it.