Still on Strike Against Campaign Blogging: Why It’s Plausible that Epistatic Interactions Are Complex and Central
I’ve previously blogged here about an article I wrote for National Review that argued that we lack the ability to reduce most mental states to genetic causes reliably using Genome-Wide Association Studies (GWAS). One important reason (I argued) that this is so hard is that genes interact (this is termed epistatic interaction), so the computational problem becomes daunting. If you read through about the first 20 or so comments, you can see a very collegial exchange between Razib and me on the question of how severe epistatic interactions really are.
At one extreme, if genes do not interact at all, and each individual gene has a simple linear relationship to the outcome, then barring other complexities, a GWAS should work fine. One could then represent the relationship between a vector of genes and the outcome essentially in the form of a linear equation: A1·X1 + A2·X2 + … + AN·XN, where you have N relevant genes, Ai is a constant and Xi is the gene state for the i-th gene. At the other extreme, if each possible combination of genes has an impact on the physical outcome that bears no discernible relationship to any other combination of genes, then a GWAS is unlikely to determine causality for any outcome that depends on many genes. There is no functional form that can represent this relationship; you would just have a huge lookup table that provides the value of the outcome for all possible combinations of relevant gene states. Intermediate conditions between these two extremes would be defined by some structure that is more complex than purely linear relationships with no interaction terms.
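The two extremes can be sketched in a few lines of toy-scale Python (my illustration, with made-up numbers): the purely additive case collapses to a weighted sum described by N constants, while the fully epistatic case admits no representation smaller than a table with one entry per genotype.

```python
import itertools
import random

N = 4  # toy number of binary "gene" loci; real genomes are vastly larger

# Extreme 1: purely additive. The outcome is a weighted sum, so N numbers
# (the constants A_i) fully describe the genotype-phenotype map.
weights = [0.5, -1.2, 2.0, 0.3]  # hypothetical effect sizes

def additive_outcome(genes):
    return sum(a * x for a, x in zip(weights, genes))

# Extreme 2: fully epistatic. Every combination of gene states gets an
# arbitrary outcome, so the only faithful representation is a lookup
# table over all 2^N possible genotypes.
rng = random.Random(0)
lookup = {g: rng.uniform(0, 10) for g in itertools.product([0, 1], repeat=N)}

def epistatic_outcome(genes):
    return lookup[tuple(genes)]
```

Note that the table already has 2^N = 16 entries at N = 4; at realistic N it dwarfs any feasible sample size, which is the sense in which a GWAS cannot pin down causality at this extreme.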
Here’s why I think it is plausible (though this is hardly a proof) that the nature of genetic evolution indicates that the actual functioning of the human genome is closer to the complex end of this spectrum than to the simple end. (The following will only make sense if you read through my long post that describes how the genetic operators of selection, crossover and mutation work, using the example of somebody needing to find the combination of switch settings that maximizes output at a chemical plant.)
Now assume that there is a second, identical chemical plant next door to this one. A different person, who happens to have a chemical engineering degree, walks into this one and is tasked with finding the best possible combination of switch settings. He reads the label for each switch, scribbles some calculations in his notepad, confidently flips each switch to a specific setting, and announces that he has the plant operating at maximum output.
A third identical plant sits next to this one. A third person walks in and simply starts randomly flipping switches as fast as he possibly can, recording the output for each tested combination.
Who will get the plant operating at the highest throughput fastest? Well, if the chemical engineer is right, he will win, but if he’s wrong, he’s likely to come in last if the other two guys get to experiment for a while. The first guy is likely to win if there is some underlying structure to be found between various combinations of switches. But if there is no structure to be found, then all of the calculations that the first guy does between flipping switches will slow him down, with no offsetting benefit, and the third guy has the best odds of getting the best solution fastest.
In the AI business, a search algorithm that makes stronger assumptions that let it home in on the right answer faster if the assumptions are right, but tends to hose you if the assumptions are wrong, is termed “greedy”. The second guy is using the limit case of greediness: he simply asserts the answer with no experimental testing. The third guy is using the limit case of non-greediness: he simply tries random combinations. The first guy is using an intermediate case, termed, not shockingly, a “genetic algorithm” (GA).
There are other intermediate cases that are more or less greedy than a GA. If, for example, a fourth guy assumed not only that there is some underlying structure, but that this structure involved no interaction terms (i.e., no epistatic interactions), then he could use a search algorithm that would find the best answer much, much faster than the GA. There is a huge sub-field of AI devoted to developing and testing many classes of search algorithms with various levels of greediness, and attempting to demonstrate the types of problems for which each method is more or less appropriate. In this context, GAs are considered to be a very non-greedy algorithm, appropriate for cases in which the underlying structure of the data is highly complicated and/or opaque to the investigator.
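A toy illustration of this trade-off (my construction, not from the AI literature): give a switch-plant a single epistatic pair, two switches that only pay off when set the same way, and a fully greedy one-switch-at-a-time optimizer gets permanently stuck, while a slightly less greedy search that also tries pairs of flips escapes.

```python
N = 12  # switches

# Hypothetical plant: output is additive in the switches, except that
# switches 0 and 1 interact: they add 2 extra units of output only when
# they are set the same way.
def output(s):
    return sum(s) + (2 if s[0] == s[1] else 0)

def single_flips(s):
    # Neighbors reachable by flipping one switch (assumes no interactions matter).
    for i in range(N):
        t = list(s); t[i] ^= 1
        yield t

def single_and_pair_flips(s):
    # Less greedy: also consider flipping two switches at once.
    yield from single_flips(s)
    for i in range(N):
        for j in range(i + 1, N):
            t = list(s); t[i] ^= 1; t[j] ^= 1
            yield t

def hill_climb(neighbors):
    """Greedy search: repeatedly take any neighboring setting that raises output."""
    s = [0] * N
    improved = True
    while improved:
        improved = False
        for t in neighbors(s):
            if output(t) > output(s):
                s, improved = t, True
                break
    return s

greedy = hill_climb(single_flips)                 # stuck: flipping 0 or 1 alone loses the bonus
less_greedy = hill_climb(single_and_pair_flips)   # escapes by flipping 0 and 1 together
```

With a purely additive plant the single-flip climber would find the optimum; one pairwise interaction is already enough to trap it short of the best setting, which is the sense in which greedier assumptions “hose you” when they are wrong.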
Now imagine that we have a line of 1,000,000 identical chemical plants. A different person is escorted into each plant and given the task of finding the best switch setting as fast as possible. Some would use very greedy algorithms, some less so. Some would use GAs, and others would use other approaches. What would determine which algorithm would win would be the degree to which there is some underlying structure in the data. Even if there were a simple structure that just happened to be opaque to the vast majority of observers, a small number of the many tested algorithms would find the structure.
In this way, operating at a higher level of abstraction, this million-plant experiment would be a random search of the space of possible algorithms. We could in fact imagine that a super-agent sitting on top of these million experiments could start to combine algorithmic results in various ways, some of which might be analogous to a GA-like evolving set of rules for combining estimates from different algorithms. This group of a million plants might just be one of a million groups of a million plants each, which could then be combined using analogous methods, and so on. We could call this “turtles all the way up”.
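A minimal sketch of this tournament, under entirely hypothetical assumptions (a small additive plant and just two competing strategies): each algorithm searches the same plant, and the super-agent simply keeps whichever one found the highest output.

```python
import random

rng = random.Random(0)
N = 10

# Hypothetical plant: secretly additive, but none of the searchers is told that.
weights = [rng.uniform(-1.0, 1.0) for _ in range(N)]

def output(s):
    return sum(w * x for w, x in zip(weights, s))

def random_search(budget):
    """Non-greedy: try random settings, keep the best one seen."""
    return max(([rng.randint(0, 1) for _ in range(N)] for _ in range(budget)),
               key=output)

def additive_greedy(budget):
    """Maximally greedy: assumes no interactions, tunes each switch alone.
    (Ignores the budget: it needs only a handful of evaluations.)"""
    s = [0] * N
    for i in range(N):
        t = list(s); t[i] = 1
        if output(t) > output(s):
            s = t
    return s

def tournament(strategies, budget):
    """The super-agent: race the algorithms on identical plants, keep the winner."""
    return max((output(strat(budget)), strat.__name__) for strat in strategies)

best, winner = tournament([random_search, additive_greedy], budget=200)
```

On this plant the greedy strategy always reaches the true optimum, so its overhead-free approach wins the race; if nature’s problem really looked like this, meta-evolution would have no reason to preserve the GA’s machinery.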
We could call this process of competing algorithms struggling to find the best solution as fast as possible “meta-evolution”. That is, each potential search method must compete for survival. The fact that the algorithm that has won this (idealized) competition in the real world has the form of a GA seems to indicate that there is some structure to the relationship between gene vectors and physical outcomes, but that it is much more complex than simple linear combinations without interaction terms; otherwise nature never would have evolved the evolutionary algorithm with all of its computational overhead. If epistatic interactions were not central, meta-evolution should have killed off evolution as we know it a long, long time ago.
UPDATE: See a very informed discussion of this by Razib at GNXP.
I like this because it gives a firmer, logical backbone to the scientific consensus that we are on a long genetic leash rather than a short one.
Nice stuff.
Also, if you haven’t read it already, I think you’d appreciate Kurt Lewin’s topological psychology and force field analysis.
— JA · Oct 30, 05:22 PM · #
“then a GWAS is unlikely to determine causality for any outcome that depends on many genes.”
bet you a lot of traits controlled by LOTS of genes are buffeted by the genetic-covariance matrix with all the other phenotypes. IOW, all the QTLs affecting height, which barely show up on GWA cuz they’re such small effect, are large effect on a lot of other traits. i only bring this up because the evolution, whether there’s epistasis or not, on a trait of this type might just be a byproduct of forces on other traits.
— razib · Oct 30, 09:38 PM · #
razib is right, and the covariant interactions are even more complex than that.
Because phenotype is determined by four kinds of inheritance, genetic, epigenetic, symbolic and behavioral.
OTOH, if the process is not actually chaotic, we will eventually be able to reverse engineer it in nanoscale.
Have faith Jim.
;)
— matoko_chan · Oct 30, 09:57 PM · #
Jim,
The political implications you claim from this argument don’t follow. It’s just a red herring.
— Steve Sailer · Oct 30, 10:55 PM · #
* Empirically, the narrow heritability for IQ is 75+% of broad heritability according to Devlin and Jensen. That’s indicative of tractability (with large sample sizes and full genomes instead of SNP-chips).
* Automated experiments in microwells and the like may make assembling datasets from billions of experiments feasible (at least at the cellular level).
* The functional internal organization of the genome can help enormously. Knowing which genes produce which proteins, which pathways those proteins are involved in, etc., greatly narrows search spaces.
— Utilitarian · Oct 31, 03:42 AM · #
JA:
Thanks.
Steve:
I think the issue in this post can be resolved independently of any political implications anybody might or might not attempt to draw from it. I’m inherently interested in the scientific question.
Razib / Matoko:
I’ve linked to Razib’s great post on this one, and submitted comments there.
— Jim Manzi · Oct 31, 03:00 PM · #
<i> The political implications you claim from this argument don’t follow. It’s just a red herring. </i>
Steve finds the political implications unbearable, is the issue. He’s following a non physical scientist who formulated his theory 40 years ago — Jensen — and whose theories are pretty completely out of touch with modern genetics and brain science.
<i> Empirically, the narrow heritability for IQ is 75+% of broad heritability according to Devlin and Jensen. </i>
heritability and genetic determinism are two totally different things, especially for a characteristic determined by as many factors as IQ. Also, FYI, heritability estimates for IQ range from 40 to 80 percent.
— mq · Nov 1, 04:36 PM · #
Also, I’m having a hard time seeing why Razib’s linked post is either very good or very informative to an educated amateur (which is what I and I presume most people here are). He basically just quotes a paragraph or two from an elderly scientist waving his hands and saying “hey, the modeling assumption of additive genetic effect that I always used to make my research problems tractable is fine, don’t worry about any problems with it”. You’ll notice the scientist in question cites his research from 1970 to do so. If there’s one thing I know about scientists, it’s that they LOVE the simplifying model assumptions that make their conclusions possible. This is especially true among scientists who came up in the pre-computer era, when massive simplifications were totally necessary to make progress because you couldn’t run big simulations — and this scientist was doing his research almost 40 years ago.
Epistatic interaction is a central, important issue. And it touches on the trickiest sorts of scientific question, the level of simplification permissible for a fruitful model. It’s the kind of question you need to read a literature review in a prominent journal to understand, or else get it unpacked for you in some detail by someone who really knows the cutting edge of the field and is willing to be impartial. A paragraph or two of handwaving doesn’t cut it.
— mq · Nov 1, 04:50 PM · #
mq,
If you examine my comment you’ll see that I said that they estimated narrow heritability as 75% of their broad heritability estimates. Devlin gives a narrow heritability of 0.37 and a broad of 0.48 in a meta-analysis of studies including children (who show lower heritability), while Jensen finds a narrow heritability of ~0.7 and broad of ~0.8.
— Utilitarian · Nov 1, 06:09 PM · #
Jim,
Here’s a good summary of different reasons why GWAS have not picked up much for many traits:
http://www.genetic-future.com/2008/03/why-do-genome-wide-scans-fail.html
Remember, IQ is valuable for survival ( http://www.udel.edu/educ/gottfredson/reprints/2007evolutionofintelligence.pdf ), so we expect mutation selection balance to be important. That means lots of deleterious rare alleles of small effect.
— Utilitarian · Nov 2, 05:29 AM · #
mq: “heritability and genetic determinism are two totally different things”
Right – but Jim’s scenario of massive epistasis is not in itself any less genetically deterministic than the proposed alternatives, so despite what JA says this has little to do with being on “a long genetic leash rather than a short one”. Genetic determinism and predictability are two different things, too.
— windy · Nov 3, 10:45 PM · #