Evolution depends on variation, and variation depends on mutations. The evolution of new features, in particular, wouldn’t be possible without new mutations. Thus, mutation is of great interest to evolutionary biologists. More specifically, how mutations affect an organism’s fitness has been discussed and debated ever since the concept of mutations entered evolutionary theory. Relatively speaking, how many mutations are harmful, beneficial, or neither? What kinds of mutations are likely to be each in which parts of the genome? It’s hard to get a clear picture of such questions, partly because there are so many possible mutations in any given gene, let alone genome, and partly because fitness isn’t always easy to measure (see Eyre-Walker and Keightley (2007) for a review).
Hietpas et al. (2011) did something really cool that hasn’t been done before: they took a small piece of an important gene, and examined the fitness consequences of every possible mutation in that sequence. This approach is limited in its own way, of course. Due to the sheer number of possibilities, it’s only feasible for short sequences, which might make it hard to generalise any results. But the unique window it opens on the relationship between a gene’s sequence and its owner’s success is invaluable.
What did they do?
Let’s examine the method in a bit more detail, mainly to understand what “every possible mutation” means in this context, because it’s a little more complicated than it sounds.
The bit of DNA they chose codes for a 9-amino acid region of heat shock protein 90 (Hsp90) in brewer’s yeast. So it really is small, only 27 base pairs altogether (recall that in the genetic code, 3 base pairs [1 codon] translate to 1 amino acid). Hsp90 is a very important protein found all over the tree of life. It’s a so-called chaperone, a protein that helps other proteins fold correctly, and in eukaryotes it’s absolutely required for survival.
The team generated mutant versions of the Hsp90 gene, each of which differed from the “wild type” version in one codon out of these nine. So each “mutation” examined could actually be anywhere between one and three nucleotide changes within a single codon. They generated all possible mutants like that, amounting to over 500 different sequences.
[NOTE: If you check back at the genetic code, you’ll note that most amino acids are encoded by more than one codon, so not all of the resulting proteins differed from one another. Mutations that don’t change the amino acid are called synonymous. This will become important later.]
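To make “every possible mutation” concrete, here is a minimal sketch that enumerates all single-codon mutants of a 27-base-pair sequence and flags the synonymous ones. The wild type sequence below is a made-up placeholder, not the real Hsp90 region, and the function and variable names are mine, not the paper’s. The codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical 9-codon stand-in; NOT the actual Hsp90 test region.
WILD_TYPE = "GCTTCTGAAAAGCTGTTCGATTGGCCA"


def single_codon_mutants(wild_type):
    """Yield (codon_index, new_codon, n_base_changes, is_synonymous)
    for every sequence that differs from wild_type in exactly one codon."""
    for pos in range(0, len(wild_type), 3):
        old = wild_type[pos:pos + 3]
        for new in CODONS:
            if new == old:
                continue  # that would just be the wild type again
            changes = sum(a != b for a, b in zip(old, new))
            synonymous = CODON_TABLE[new] == CODON_TABLE[old]
            yield pos // 3, new, changes, synonymous


mutants = list(single_codon_mutants(WILD_TYPE))
n_synonymous = sum(m[3] for m in mutants)
n_nonsense = sum(CODON_TABLE[m[1]] == "*" for m in mutants)
print(len(mutants), "mutants,", n_synonymous, "synonymous,", n_nonsense, "nonsense")
```

Nine codons with 63 alternatives each give 567 mutants whatever the starting sequence is, which fits the “over 500” figure. How many of them are synonymous does depend on the sequence, so that number only means something for the real one.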
Then came the measurement of fitness. The researchers took a strain of yeast whose own Hsp90 gene was engineered not to work at high temperatures, and transformed the cells with small pieces of DNA called plasmids, each carrying either a wild type (temperature-insensitive) Hsp90 gene or one of the 500+ mutants. They then grew all cells together in a common culture. After a while, they raised the growth temperature to let the engineered genes determine the cells’ survival.
They took samples every few hours – wild type yeast populations doubled every 4 hours – and did something that would not have been possible even a few years ago: sequenced the region of interest from this mixed culture, and compared the abundance of different sequence variants. By counting how many times each mutant was sequenced at each time point, they got a very good estimate of the mutants’ relative abundances. The way each mutant prospered or declined relative to others over time gave a measurement of its fitness.
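One common way to turn counts like these into a fitness number is to fit a straight line to the log of the mutant-to-wild type ratio against time in generations; the slope is then a per-generation selection coefficient. The sketch below does that. I’m assuming this log-linear approach for illustration rather than reproducing the paper’s exact analysis, and the function name, the pseudocount and the read counts are all invented. Only the 4-hour doubling time comes from the study.

```python
import math


def selection_coefficient(times_h, mutant_counts, wt_counts,
                          doubling_time_h=4.0, pseudocount=0.5):
    """Least-squares slope of ln(mutant/wild type) against wild type generations.

    Negative values mean the mutant is losing ground to wild type,
    zero means it keeps pace, positive means it is outgrowing it.
    The pseudocount (an assumption) avoids log(0) when a mutant drops out.
    """
    generations = [t / doubling_time_h for t in times_h]
    log_ratios = [math.log((m + pseudocount) / (w + pseudocount))
                  for m, w in zip(mutant_counts, wt_counts)]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(generations, log_ratios))
    return sxy / sxx


# Invented read counts for one mutant and for wild type at four time points.
times_h = [0, 4, 8, 12]
mutant_reads = [950, 700, 520, 390]
wt_reads = [1000, 1010, 990, 1005]
print(selection_coefficient(times_h, mutant_reads, wt_reads))
```

Working with ratios rather than raw counts means the total number of reads per sample drops out, so samples sequenced to different depths are still comparable.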
What did they find?
There are so many interesting things in this study that I’m not sure where to begin. Let’s start with the result that concerns the first question posed in my introductory paragraph. How are the mutations distributed along the deleterious – beneficial axis?
Perhaps not surprisingly, most non-synonymous mutations were harmful to fitness. I say not surprisingly because this protein has been honed by selection for many, many millions of years. It is probably close to the best it can be, although the researchers tried to pick a region that contained variable as well as highly conserved amino acids.
[ASIDE: They didn’t really succeed in that – among the 400+ species they say they used for comparison, 4 of 9 positions don’t vary at all, 2 are identical in almost all species, another 2 can have two amino acids with roughly equal chance, and only one can hold three different amino acids. I’ve seen more variation in supposedly highly conserved sequences over smaller phylogenetic distances. Perhaps Hsp90 is just that conserved everywhere.]
There were a few mildly beneficial mutations, but no highly beneficial ones. Deleterious mutations could be divided into two large groups, with very few in between: mostly they were either very harmful or close to neutral. This constitutes support for the nearly neutral theory of molecular evolution, but as I said, the sequence they examined is hardly representative of all sequences under all circumstances. It would be interesting to see how (if) the distribution changes in sequences under directional selection, or sequences that don’t experience much selection at all. I’m kind of hoping that that’s their next project 😛
The second interesting observation – interesting to me, anyway – is that nonsense mutations, those that introduce an early stop codon in the sequence, were not as unfit as complete deletions of the gene. A stop codon means the end of the protein – an early stop codon eliminates everything that comes after it. Cells making a truncated protein were lousy at survival, but not quite as lousy as cells with no Hsp90 at all. This is a bit strange, given that earlier the paper states that a region of Hsp90 that comes after their 9 amino acids is necessary for its function. A nonsense mutation in the test region removes that supposedly necessary part, so why did those cells do any better than mutants lacking the gene entirely?
Looking at synonymous mutations, the team determined that these don’t affect fitness much. This has practical importance, because synonymous mutations have long been used as a “baseline” to detect signs of selection in other mutations. If they weren’t neutral, the central assumption of that approach would fall down.
Another question the study asked was whether certain positions in the protein require amino acids of a certain type. The twenty amino acids found in proteins can be loosely grouped according to their physical and chemical properties. For example, some of them are positively charged, while others carry no charge at all; some are (relatively speaking) huge and some are tiny. These properties determine how a protein folds and what its different regions can do, so one would expect that in important positions, only amino acids similar in size and chemistry could work.
To find all the amino acids that worked equally well in a given position, Hietpas et al. looked at a subset of amino acid changes: those whose fitness was very close to the wild type. Surprisingly, they found that several positions tolerated radically different amino acids without losing much fitness. Quoting from the paper,
“[t]his type of physical plasticity illustrates the degenerate relationship between physics and biology: Biology is governed by physical interactions, but biological requirements can have multiple physical solutions.”
This is kind of stating the obvious in this context, but it does echo a more general observation about life. In evolution, there is often more than one way to skin a cat.
[ASIDE: Analogous enzymes provide a striking demonstration of that. These are pairs – or even groups – of enzymes that catalyse the same reaction, without bearing any physical resemblance to one another. Their sequences are different, their 3D structures are different, and their catalytic mechanisms are different, yet they do essentially the same thing. But there are also more familiar, if less extreme, examples. For instance, within vertebrates alone, we see three different solutions for powered flight and even more variations on gliding.]
The researchers built a “fit amino acid profile” of their test sequence using these “wild type-like” mutations, then compared it to the actual pattern of amino acid substitutions observed in “real” Hsp90 proteins. It turns out the two are quite different: eight out of the nine positions are conspicuously less variable in real life than the fitness profile would predict. The paper lists a few possible explanations. Lab environments are not natural environments, and amino acids that work fine in their very controlled environment may not be so great under harsher or less stable real-world conditions. Wild type-like fitness does not mean the substitution is completely neutral – many of them are slightly deleterious, which may come out more strongly under natural circumstances, especially over the long term. And one of the substitutions would require more than one mutation at the DNA level – with strongly deleterious intermediate steps.
That last point leads me to the part of the study I personally found most interesting. Thus far, we’ve taken the genetic code as a given, and hardly paid any attention to it at all. But, in fact, the genetic code itself is a product of evolution. Most likely, it didn’t spring into existence fully formed when organisms invented protein synthesis. There is a mind-blowingly large number of possible genetic codes – why is it that organisms use this particular one, with only minor variations? We won’t go into all of the hypotheses about that, mostly because I’m not very familiar with them. It’s enough to note that in principle, the genetic code could be accidental (it just happened to be the one some distant ancestor of all living things stumbled on), it could be a chemical inevitability of some sort, or it could have risen to prominence by natural selection.
[ASIDE: The options are not mutually exclusive. For example, it is possible that the only important thing about the genetic code is how easy it is to mutate from particular amino acids to certain others – in other words, that it’s the structure of the code that’s under selection, while its finer details, such as which four codons stand for glycine, may be largely coincidental or determined by chemical necessity.]
For this tiny region of the Hsp90 gene/protein, it looks very much like selection had a hand in it. Hietpas et al. took their theoretical fit amino acid profile and a sample of 1000 randomly generated genetic codes, and asked how many substitutions it would take to switch between equally fit amino acids under each genetic code. Intriguingly, very few genetic codes made it as easy as the real one. In other words, the genetic code seems geared to minimise the number of deleterious mutations.
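Here is a rough sketch of what such a comparison could look like. I don’t know how the paper generated its random codes, so I’m assuming one widely used scheme: keep the block structure of the standard code and the stop codons, and shuffle which amino acid each block encodes. The “fit profile” below is entirely invented for illustration, as are the function names.

```python
import random
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
REAL_CODE = dict(zip(CODONS, STANDARD))


def shuffled_code(rng):
    """Random code (assumed scheme): permute the 20 amino acids among the
    standard code's synonymous codon blocks; stop codons stay where they are."""
    amino_acids = sorted(set(STANDARD) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    swap = dict(zip(amino_acids, permuted))
    swap["*"] = "*"
    return {codon: swap[aa] for codon, aa in REAL_CODE.items()}


def min_changes(code, aa1, aa2):
    """Fewest nucleotide changes that turn some codon for aa1 into one for aa2."""
    codons1 = [c for c, a in code.items() if a == aa1]
    codons2 = [c for c, a in code.items() if a == aa2]
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in codons1 for c2 in codons2)


def profile_cost(code, profile):
    """Total changes needed to move between every pair of amino acids
    that the profile treats as interchangeable at the same position."""
    return sum(min_changes(code, a, b)
               for allowed in profile
               for a, b in combinations(sorted(allowed), 2))


def fraction_at_least_as_easy(profile, n_codes=1000, seed=0):
    """Share of random codes whose cost is no higher than the real code's."""
    rng = random.Random(seed)
    real_cost = profile_cost(REAL_CODE, profile)
    hits = sum(profile_cost(shuffled_code(rng), profile) <= real_cost
               for _ in range(n_codes))
    return real_cost, hits / n_codes


# Invented profile: sets of amino acids assumed equally fit at five positions.
HYPOTHETICAL_PROFILE = [set("AST"), set("LIVM"), set("DE"), set("KR"), set("FYW")]
print(fraction_at_least_as_easy(HYPOTHETICAL_PROFILE))
```

A small fraction would mean few random codes make the interchangeable amino acids as easy to reach as the real code does. With a made-up profile the output proves nothing about Hsp90, of course; it only shows the shape of the calculation.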
What’s really fascinating about that result is that it came from an analysis of such a tiny sequence. Earlier, I mentioned that it might be hard to generalise anything from a short sequence. But it’s hard to believe that this particular finding doesn’t have general applicability. The genetic code sets the rules for all proteins – if it weren’t optimised in general, what’s the chance that such strong optimisation would be detected in such a tiny sample? This also suggests that roughly the same amino acids are interchangeable across the board, regardless of which protein we’re talking about. (Which is not necessarily surprising if you’ve ever spent time comparing protein sequences between species, but still, it’s valuable as a new way of looking at a familiar phenomenon.)
All in all, this is the kind of paper that makes me all giddy with excitement. It digs deep into fundamental questions in evolutionary theory, and it finds some intriguing answers. It’s also a great reminder of how amazingly far technology has come – merely sequencing 27 base pairs would have been a formidable task at the dawn of molecular biology, and now we can mix 500 different versions together, sequence all of them in a single experiment, and reliably count how many of each variant there are. And that’s nowhere near the limits of current sequencing technology. This is the future, folks, and it’s better than sci-fi.
Eyre-Walker A & Keightley PD (2007) The distribution of fitness effects of new mutations. Nature Reviews Genetics 8:610-618
Hietpas RT et al. (2011) Experimental illumination of a fitness landscape. PNAS 108:7896-7901