Goldstein, Genes and the Fatal Conceit
Andrew Sullivan points to some of the blogosphere discussion arising from Dana Goldstein’s emperor’s-new-clothes article about the failure of genome-wide association studies to find genes that account for human mental states (as well as other non-mental conditions).
Goldstein is putting forward a theory for why it is plausible that evolution would have created such a situation. But whether or not Goldstein is correct about the causal pathway that created this situation, it is an empirical reality that a GWAS has a very hard time finding genes that create given mental states. This is fundamentally a problem of combinatorial mathematics.
Here’s how I put it in a National Review article in June:
Media outlets will often speak loosely of things such as a “happiness gene,” a “gay gene,” or a “smart gene.” The state-of-the-art method for finding such a link is something called a “genome-wide association study” (GWAS). In a GWAS, scientists use blood or saliva samples to sequence the DNA for a group of several thousand people who exhibit a trait or behavior of interest (the “case group”), and for a second group of several thousand who do not exhibit the trait or behavior (the “control group”). Scientists then look for genetic differences between the two groups. In cases where a single malfunctioning gene creates, for example, a catastrophic disease that overwhelms other genetic and environmental factors, a GWAS can quickly pinpoint the culprit. Sometimes, however, the behavior or trait is caused by several interacting genes — so that, for example, Gene 1 has some effect only if Gene 2 has a special structure. This is called “epistatic interaction,” and can involve a large number of genes. Epistatic interactions make genetic effects harder to identify. Scientists deal with this problem and others by creating larger and larger case and control groups. The scaling up of such studies is among the most exciting frontiers in genetics. It is essentially an engineering problem, and money poured into solving it will likely improve human health through genetic screening and, ultimately, therapies.
Seeing this momentum, it is natural to assume that eventually we will have explained all human behavior, not just diseases caused by one or a small number of interacting genes. But the GWAS technique hits structural limits when applied to conditions that involve epistatic interactions among lots of genes. Mental activity is now widely believed by scientists to depend on many genes (though mental illnesses such as schizophrenia or bipolar disorder may turn out to be partial exceptions). A person has about 20,000 genes, of which more than 5,000 are believed to play some role in regulating brain function. Consider a simplified case in which some personality characteristic — aggressiveness, for example — is regulated by 100 genes, each of which can have two possible states (“on” or “off”). The combinatorial math is daunting: There are more than a trillion trillion possible combinations of these gene states. Thus we could sequence the DNA of all 6.7 billion human beings and still not know which genes are responsible for aggressiveness.
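To make the combinatorial arithmetic concrete, here is a quick back-of-the-envelope check in Python, using the same simplified 100-gene, two-state setup as above (the figures are illustrative, not biological):

```python
# 100 genes, each either "on" or "off": 2**100 possible combinations.
combinations = 2 ** 100
trillion_trillion = 10 ** 24
world_population = 6_700_000_000  # roughly the world population in 2009

print(f"{combinations:.2e} combinations")  # on the order of 1e+30
print(combinations > trillion_trillion)    # True: "more than a trillion trillion"
print(f"{combinations // world_population:.1e} combinations per living person")
```

Even sequencing every human being alive would sample only a vanishing fraction of the space, which is the point of the paragraph above.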
A second limitation of a GWAS is that it detects association rather than causation. Suppose we found that a case group of persons suffering from a disease had a greater incidence of some gene than did a control group, but that we failed to notice that the case group was disproportionately of Chinese ancestry. Culturally transmitted behaviors in the case group might be responsible for the disease, even if these behaviors had nothing to do with the gene in question. That is, the gene could be nothing more than a marker for Chinese ancestry, and hence for participation in behaviors that cause the disease. Geneticists call this problem “stratification,” and deal with it by carefully matching individuals in the case and control groups to ensure that the groups really are comparable. The problem is that these stratification effects can be fiendishly subtle. No matter how carefully we match cases with controls, there can always be some unobserved environmental factor correlated with, but not caused by, a genetic difference between groups, and this environmental factor might be what is actually causing the disease.
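The stratification trap is easy to reproduce in a toy Monte Carlo simulation. In the sketch below, every rate is invented for illustration: the gene is a pure marker of ancestry with no causal role, a culturally transmitted behavior confined to one ancestry group drives the disease, and yet a naive case/control frequency comparison still flags the gene:

```python
import random

random.seed(0)  # reproducible toy run

people = []
for _ in range(10_000):
    ancestry_a = random.random() < 0.5
    # The gene merely tracks ancestry (80% vs 20% carrier rates);
    # it has no effect on the disease whatsoever.
    gene = random.random() < (0.8 if ancestry_a else 0.2)
    # A culturally transmitted behavior occurs only in group A...
    behavior = ancestry_a and random.random() < 0.6
    # ...and the behavior, not the gene, raises disease risk.
    disease = random.random() < (0.30 if behavior else 0.05)
    people.append((gene, disease))

cases = [gene for gene, disease in people if disease]
controls = [gene for gene, disease in people if not disease]
freq_cases = sum(cases) / len(cases)
freq_controls = sum(controls) / len(controls)

# The causally inert gene is markedly more common among cases:
print(f"carrier frequency in cases:    {freq_cases:.2f}")
print(f"carrier frequency in controls: {freq_controls:.2f}")
```

A study that failed to match on ancestry would read that gap as an association between the gene and the disease, which is exactly the confound described above.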
Further, to think in terms of genes is to abstract away from a biochemical reality that is far more complex. On one hand, a gene is not an atomic entity, but a sophisticated machine with many components. Much as in the progress of particle physics over the past century, we keep discovering components-within-components of the genetic mechanism that are relevant to physical and mental outcomes, and it’s entirely plausible that we will eventually get all the way down to subatomic quantum effects as drivers of behavior. On the other hand, as we move away from the genome itself, we see that other dimly understood biochemical processes have a large impact on how the information contained in the gene gets expressed as an observable human characteristic. And all of this is before we consider interactions of the human organism as a whole with those factors that we typically term “environmental,” ranging from nutrition and exposure to pathogens to parenting styles and childhood experiences.
So how is a GWAS showing an association between Gene X and aggressiveness different from a social-science study showing a correlation between watching lots of violent TV and aggressiveness? Mathematically, it’s not. In both cases we start by measuring aggressiveness for each person. We then compile for each person a list of data providing information on potential causes of aggressiveness: in one case genomic information, and in the other, sociological observations on childhood experiences, school quality, and so on. In the first case we observe that aggressive people have a higher incidence of Gene X; in the second that they watch a lot of violent TV. The reliability of GWAS studies is thus subject to the same limitations that we think of in connection with sociology or economics (as opposed to, say, chemistry). The only way around this — the only way to attain the precision of chemistry — would be actually to show the chain of biochemical processes by which a set of named genes creates the observable brain functions collectively defined as “aggressiveness.” Of course, if we could do that, we would have no need for a GWAS study.
The claims of causality that arise from such studies should accordingly be treated with the appropriately intense skepticism that we apply to sociological or econometric studies. In the middle of the 20th century, Friedrich Hayek and the libertarians he inspired faced those who asserted that an economy could be successfully planned. The libertarian position was not that such planning could be proved impossible in theory, but that we lacked sufficient information and processing power to accomplish it. The world of economic interaction is so complex that it overwhelms our ability to render it predictable; hence the need for markets to set prices. This is the same analytical problem we face when trying to predict a mental state that depends upon a large number of genes. It is unclear whether we will ever understand how this complicated machinery and its interactions with the environment come together to create characteristics of mind. It is certain, however, that we do not have such an understanding now, and that we won’t know such a project is achievable until we achieve it.
smart article.
— raft · Sep 18, 03:44 PM · #
Jim, yes. The math means that a GWAS, by itself, isn’t likely to give us the ability to explain or predict human behavior.
However, since science studies all levels of human cognition, I’m wondering what that gets you.
You’ve no doubt seen this before, but I’d like to call your attention to a paper by Bonabeau and Dessalles defining emergence as a phenomenon of detection:
Emergence can then be defined with respect to the same tools used to define the complexity of a system. It occurs when an object or phenomenon cannot be detected or understood with a given set of tools but can be detected or understood by allowing some additional tools. For some reason (dynamic evolution of the system or changes in the set of observational tools) a new apprehension of the system becomes possible that offers a shorter overall description, and hence a smaller relative complexity. Emergence is thus associated with a decrease of the relative complexity.
Consider the truism that light goes in a straight line. At a certain level of detection — ours — this is true. Light travels in a straight line, and we can use this “fact” to plan and communicate and build and do all kinds of stuff. However, the “truth” is the exact opposite. Light doesn’t go straight; instead, it actually travels every path imaginable, all at the same time, with each path contributing a complex amplitude (from which probabilities are computed). Rather than being an inherent property of light, the “straight line” phenomenon emerges on the macro level as the outcome of all those amplitudes combining — as the renormalization of an infinitely complex “truth” that is beyond our ability to compute.
This kind of emergence is common throughout science, whether you’re moving from Einstein’s general relativity to Newton’s equations, or from physics to chemistry. It also happens when you move from epistatic interactions to neuroanatomy to cognitive science to behavioral psychology. Each step up gets computationally simpler, even though it sacrifices the previous level’s precision.
It’s at the latter, higher levels where the fight between determination and underdetermination will rage, and be resolved. The smart money? — it’s on both long-leash and short-leash controls, at least two complementary levels of cognitive processing which provide us with a broad area of behavioral underdetermination, an area most accessible to the self-aware, reflective, deliberative mind.
— JA · Sep 18, 07:32 PM · #
Jim…unless the system is chaotic, we will be able to model it eventually.
Sure, it’s over-parameterized.
But that doesn’t mean infinite parameters.
— matoko_chan · Sep 18, 07:45 PM · #
JA:
Thanks for the, as always, extremely thoughtful comment. BTW, I found your prose a lot clearer than the paper.
There is an almost philosophical discussion about whether genes + environment must create mental states because cause-and-effect is all there is, or whether this is not true. I wasn’t trying to engage that question. I was asking a practical one: Can we take a vector of measurable genetic data for an individual and reliably identify how it causes various non-pathological mental states in that person? I argued that the answer is “no,” and that our current techniques do not hold out much immediate hope of changing that. I remain open-minded both about the philosophical question and the question of whether we will ever be able to do this.
The idea that we will be able to get around the mathematical problems that I have described through emergent higher-level properties is ingenious. I guess the issue would likely be that when people assert that we can explain behavior through genes (to grossly over-simplify, but I know you get my meaning), this would be, in analogy to your examples, like explaining chemistry in terms of the underlying physics, or Newton as a special-case approximation to Einstein (if I understood you). But in those cases, we require the ability to understand the physics and Einstein in normal scientific terms in order to do this. That is, they are not saying “whatever with all this talk of genes, I can observe phenotype characteristics or behaviors and predict future mental states” (which would be like doing chemistry without knowing the physics); they are instead saying “I can explain mental states in terms of the hidden genetic processes” (which seems to me would be like knowing the physics and then showing how chemistry is really just the higher-level-of-abstraction macro-manifestation of these underlying physical laws).
— Jim Manzi · Sep 18, 09:15 PM · #
Interesting post, and the point about the limitations of GWAS is well-taken.
However, maybe the combinatorial argument, which you say is simplified, is too simplified. Consider a decision tree, where at every node we ask a yes/no question about (the 100 telltale switches within) a person’s genome. The final “decision” is whether to call the sequenced individual “aggressive” or not.
It seems likely that this tree is quite heavily prunable. Perhaps there are cases where if 5 genes occur in a certain combination, it doesn’t much matter what is true of the remaining 95 genes. Once we keep only the most useful paths on the tree, who’s to say whether the tree will be huge or manageable?
This is all the more true if we stop hoping to completely disambiguate the phenotype, which I agree is a pipe dream, and instead content ourselves with significantly disambiguating the phenotype. (In other words, the conditional entropy H(AggressivenessPhenotype|GeneVector) won’t be zero, but it might be quite a bit less than H(AggressivenessPhenotype))
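To put toy numbers on that entropy claim (every probability below is invented), conditioning on a gene vector can shrink the uncertainty about the phenotype without eliminating it, using the identity H(X|Y) = H(X,Y) − H(Y):

```python
from math import log2

# Invented joint distribution over (phenotype, gene pattern).
joint = {
    ("aggressive", "g1"): 0.30, ("not", "g1"): 0.10,
    ("aggressive", "g2"): 0.10, ("not", "g2"): 0.50,
}

def entropy(dist):
    """Shannon entropy, in bits, of a probability table."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Sum out everything except the chosen component of the key."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

h_phenotype = entropy(marginal(joint, 0))                      # H(Phenotype)
h_conditional = entropy(joint) - entropy(marginal(joint, 1))   # H(Phenotype|GeneVector)

print(f"H(Phenotype)            = {h_phenotype:.3f} bits")
print(f"H(Phenotype|GeneVector) = {h_conditional:.3f} bits")
```

In this made-up example the gene vector buys about a quarter of a bit: a real reduction in uncertainty, but nowhere near full disambiguation.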
Now, we agree about the conclusion that we can’t prejudge right now how useful or effective or feasible this stuff really is. But I think my priors might be a little different from yours.
— mk · Sep 18, 10:01 PM · #
Jim, thanks for the compliment, back at you as always.
I think we’re in agreement in practice.
You ask: Can we take a vector of measurable genetic data for an individual and reliably identify how it causes various non-pathological mental states in that person?
You’re right; no way. In practice, our testing results will be like, “Yeah, there’s a 30% chance your child will be slightly more inclined toward aggressive behavior in 5% of his likely environments — until he’s 13, that is, then we have no frickin’ clue.” — not exactly something we should plan around.
As you said above, the reason we won’t be able to do this is because genes don’t monologue; they’re in a non-commutative, recursive conversation with themselves and their environment.
In principle, if we could know everything? — yeah, who knows.
— JA · Sep 18, 11:30 PM · #
Dana Goldstein? The American Prospect writer?
— Stuart Buck · Sep 18, 11:34 PM · #
Jim,
As I pointed out last spring, you bring a mathematician’s and engineer’s perspective to a scientific problem. The mathematician isn’t satisfied until the theorem is proven and the engineer doesn’t sign off on the airplane until he’s confident it’s highly unlikely to crash. In contrast, the scientist is interested in making incrementally better predictions. You seem to think that there is something wrong with incrementalism — if the computer program won’t run, what good is it? — but in science, there isn’t.
We already can make better than random predictions about IQ based on extremely crude measures of heredity. The odds that somebody named, say, Goldstein will score higher on an IQ test than the average American are pretty good.
We’re going to slowly get better at that, especially if there is ever any funding for genome-wide association studies that are combined with the traditionally successful ways of looking at IQ: studies of identical vs. fraternal twins, adoption studies, and cross-ethnic studies. Sure, there will be a lot of numbers to crunch, but, in the long run, so what?
— Steve Sailer · Sep 19, 12:50 AM · #
Also Jim… eventually we will be able to reverse engineer cognitive processes with nanotech.
If one subscribes to Tegmark Theory (as i do) we are eventually guaranteed a mathematical solution.
Even if it winds up being expressible only in q-math.
;)
— matoko_chan · Sep 19, 07:05 PM · #