New genes, new tricks

I’ve previously written about the birth of new genes. Since new genes are cool, and I just found two recent papers on them, you’re getting more of them.

Part 1: how to survive duplication

Technically, the first paper isn’t about new new genes: Assis and Bachtrog (2013) examined recently duplicated genes in fruit flies. But screw technicalities, what they’re saying makes my eyes pop.

When a gene is accidentally copied, a variety of possible fates can await it. Most of the time, the extra copy just dies. Some mechanisms of gene duplication just take the gene without the regulatory elements it needs to function properly. Even if the new copy works, it’s still redundant, so there’s nothing stopping mutations from destroying it over time. However, sometimes redundancy is removed before the new gene breaks irrevocably, and both copies are kept. This can, in theory, happen in a number of ways. Because I’m feeling lazy, let me just quote them from the paper (square brackets are mine, because I hate repeatedly typing out long ugly words :)):

Four processes can result in the evolutionary preservation of duplicate genes: conservation, neofunctionalization, subfunctionalization, and specialization. Under conservation, ancestral functions are maintained in both copies, likely because increased gene dosage is beneficial (1). Under neofunctionalization [NF], one copy retains its ancestral functions, and the other acquires a novel function (1). Under subfunctionalization [SF], mutations damage different functions of each copy, such that both copies are required to preserve all ancestral gene functions (9, 10). Finally, under specialization, subfunctionalization and neofunctionalization act in concert, producing two copies that are functionally distinct from each other and from the ancestral gene (11).

We might add a variation on NF, too: Proulx and Phillips (2006) theorised that differences in function that arise in different alleles (variants) of a single gene can turn duplication into an advantage, turning the conventional duplication-first, new function-next scenario on its head.

Either way, genomes contain lots of duplicated genes, there’s no question about that. What isn’t nearly as well understood is the relative importance of various mechanisms in producing all these duplicates. It’s much easier to theorise about mechanisms than to test the theories. Since evolution doesn’t stop once a new gene has earned its place in the genome, it can be hard to disentangle the mechanism(s) responsible for its preservation from the stuff that happened to it later. Also, to really assess the relative role of different mechanisms, you’ve got to look at whole genomes.

(Assis and Bachtrog say that this hasn’t been done before, and then go right on to cite He and Zhang [2005], which is a genome-wide study of SF and NF. I guess it doesn’t look at all the mechanisms…)

Assis and Bachtrog used the amazing resource that is the 12 Drosophila genomes project, focusing on D. melanogaster and D. pseudoobscura to find slightly under 300 pairs of genes that duplicated after the divergence of those two species. Since Drosophila genomes are very well-studied, they were able to identify the “parent” and “child” in each pair based on where they sit on their chromosomes. They then also extracted thousands of unduplicated genes from the melanogaster and pseudoobscura genomes, to use as a measure of background divergence between the two species.

To measure changes in gene function, they compared the expression of parent and child genes to each other and to the “ancestral” copy (i.e. the unduplicated gene in the other species) in different parts of the body (if a gene is suddenly turned on somewhere it wasn’t before, it’s probably doing something new!).
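
For the computationally inclined, here is a minimal Python sketch of how such a comparison could sort a gene pair into the four categories quoted above. It is an illustration under my own assumptions, not the authors' actual pipeline: the relative expression profiles, the Euclidean distance, the example `cutoff` (which in a real analysis would come from the background divergence of unduplicated genes) and all function names are hypothetical.

```python
import math

def relative_profile(levels):
    """Scale raw expression levels across tissues so they sum to 1."""
    total = sum(levels)
    if total == 0:
        return [0.0 for _ in levels]
    return [x / total for x in levels]

def distance(a, b):
    """Euclidean distance between two relative expression profiles."""
    ra, rb = relative_profile(a), relative_profile(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))

def classify_pair(parent, child, ancestor, cutoff):
    """Assign a duplicate pair to one of the four preservation categories."""
    # Summed expression of both copies stands in for "all ancestral functions".
    combined = [p + c for p, c in zip(parent, child)]
    parent_diverged = distance(parent, ancestor) > cutoff
    child_diverged = distance(child, ancestor) > cutoff
    combined_diverged = distance(combined, ancestor) > cutoff
    if not parent_diverged and not child_diverged:
        return "conservation"
    if parent_diverged and child_diverged:
        # Both copies changed: SF if together they still look ancestral.
        return "specialization" if combined_diverged else "subfunctionalization"
    if child_diverged:
        return "neofunctionalization (child)"
    return "neofunctionalization (parent)"

# Hypothetical expression levels in three tissues: head, gut, testis.
ancestor = [10, 10, 0]
print(classify_pair([9, 11, 0], [1, 1, 20], ancestor, cutoff=0.2))
# neofunctionalization (child)
print(classify_pair([10, 0, 0], [0, 10, 0], ancestor, cutoff=0.2))
# subfunctionalization
```

The first toy pair mimics the pattern described in this post: the parent stays close to the ancestral profile while the child switches on somewhere new.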

Long story short, it turned out that in the majority of cases (167/281) the child copy diverged much more from the “ancestor” than expected, while the parent copy stayed pretty close. These child copies also showed faster sequence evolution than their parents. This means that NF – and specifically that of the new copy – is the most common fate of newly duplicated genes in these animals. There’s also a fair number of gene pairs where both copies gained new functions or both stuck with the old ones, but only three where both copies lost functions. Pure SF, which very influential studies like Force et al. (1999) championed as the dominant mode of duplicate gene survival, appears to be an incredibly rare occurrence in fruit flies!

A few paragraphs ago I mentioned the caveat that duplicated genes don’t stop evolving just because they’ve managed to survive. Well, the advantage of having all these Drosophila genomes is that you can further break down “young” duplicates into narrower age groups, using the species that fall between melanogaster and pseudoobscura on the tree. However, looking at this breakdown doesn’t change the general pattern – NF of the child copy is the most common and SF is rare or nonexistent in even the youngest age groups, along both the melanogaster and the pseudoobscura lineages.

So what exactly is going on here?

Part of the difference in expression patterns between parent/ancestral and child copies is because these new genes are turned on in the testicles, which might give us a big clue. Testicles, you see, are a bit anarchical. Things that are normally kept silent in the genome, like various kinds of parasitic DNA, wake up and run wild during the making of sperm. If you remember my throwaway reference to duplication mechanisms that cut the gene off from its old regulatory elements – well, the balls are a place where even such lost and lonely genes get a second chance.

The genomic anarchy of testes is also one of the reasons these duplications happen in the first place; the aforementioned mechanism involves those bits of parasitic DNA that copy and paste themselves via an RNA intermediate. The enzymes they use to reverse transcribe this RNA into DNA and insert it back into the genome aren’t particularly discerning, and they’ll happily do their thing on a piece of RNA that isn’t the parasite. Indeed, slightly more NFed child genes than you’d expect originated via RNA, although it’s worth noting that more than half of them still didn’t. So while the testes look like a good place for new gene copies to find a use, they aren’t totally responsible for their origins.

Why is there so little SF among these genes?

This is the Obvious Question; my jaw nearly landed on my desk when I saw the numbers. The authors have two hypotheses, both of which may be true at the same time.

First, SF assumes that the two copies have the same functions to begin with. This is not necessarily true when just a small segment of DNA is duplicated – even when it’s not just a bare gene you’re copying, the new copy might lose part of its old regulatory elements and/or land next to new ones, not to mention Proulx and Phillips’s idea of new functions appearing before duplication. So maybe SF is more common after wholesale duplications of entire genomes, and Drosophila species didn’t have any of those recently.

Secondly, SF happens by genetic drift, which is a random process that works much better in small populations. Fruit flies aren’t known for their small populations, and therefore the dominant evolutionary force acting on their genomes will be selection.
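
The population-size argument can be put into numbers with a standard population genetics rule of thumb (my addition, not something taken from the paper): in a diploid population, drift swamps selection on a mutation when |2 × Ne × s| is well below 1, where Ne is the effective population size and s the selection coefficient. A minimal Python sketch, with entirely hypothetical values for both:

```python
def drift_dominates(selection_coefficient, effective_population_size):
    """Rule of thumb for diploids: drift swamps selection when |2 * Ne * s| < 1."""
    return abs(2 * effective_population_size * selection_coefficient) < 1

s = 1e-5  # hypothetical, very weak selective effect
print(drift_dominates(s, 10_000))     # True: small population, drift wins
print(drift_dominates(s, 1_000_000))  # False: large population, selection wins
```

The same weakly selected mutation behaves as if it were neutral in the small population but is visible to selection in the large one, which is the intuition behind expecting less SF in fruit flies.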

This makes sense to me, but the degree to which NF dominates the picture is still pretty amazing. I wonder what you’d get if you applied the same methods to different species. Would species with smaller populations, or those that recently duplicated their whole genomes, show more evidence for SF as you’d expect if the above reasoning is correct? Or would the data slaughter all those seemingly reasonable explanations? What would you see in parthenogenetic species that have no males (and testicles)?

Part two, with really new genes, hopefully coming soon…

***

References:

Assis R & Bachtrog D (2013) Neofunctionalization of young genes in Drosophila. PNAS 110:17409-17414

He X & Zhang J (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169:1157-1164

Force A et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545

Proulx SR & Phillips PC (2006) Allelic divergence precedes and promotes gene duplication. Evolution 60:881-892

Lamprey Hox clusters and genome duplications, oh my!

What the hell is up with lamprey Hox clusters?

Lampreys are among the few living jawless vertebrates, creatures that parted evolutionary ways with our ancestors somewhere on the order of 500 million years ago. If you want to know where things like jaws, paired fins or our badass adaptive immune systems came from, a vertebrate that doesn’t possess some of these things and may have diverged from the rest of the vertebrates soon after others originated is just what you need for comparison.

The vertebrate fossil record is pretty rich thanks to us having hard tissues, so a lot can be inferred about these things from the wealth of extinct fishes we have at our disposal. However, there are times when comparisons of living creatures are just as useful as, if not more useful than, examinations of fossils. (Fossils, for example, tend not to have immune systems. ;))

One of the things you absolutely need a living animal to study is, of course, genome evolution. Vertebrates – well, at least jawed vertebrates – are now generally accepted to have the remnants of four genomes. Our long-gone ancestors underwent two rounds of whole genome duplication. Afterwards, most of the extra genes were lost, but evidence for the duplications can still be found in the structure of our genomes, where entire recognisable gene neighbourhoods of our close invertebrate relatives often still exist in up to four copies (Putnam et al., 2008).

Among these neighbourhoods are the four clusters of Hox genes most groups of jawed vertebrates possess. A “normal” animal like a snail or a centipede only has one of these. Since Hox genes are involved in the making of body plans, you have to wonder how suddenly having four sets of them and other developmental “master genes” might have influenced the evolution of vertebrate bodies.

Of course, to guess that, you need to know precisely when these duplications happened. That’s where lampreys come in: their lineage branched off from our definitely quadruple-genomed one after the next closest, definitely single-genomed group. But was it before, between, or after, the two rounds of duplication?

A few years ago, a phylogenetic analysis of 55 gene families by Kuraku et al. (2009) suggested that the lamprey-jawed vertebrate split happened after the two rounds of whole genome duplication (2R for short). Just this year, the genome of the sea lamprey Petromyzon marinus was finally published (Smith et al., 2013), and its authors agreed that yes, lampreys probably split off from us post-2R. (I don’t entirely get all the things they did to arrive at this conclusion. Groups of linked genes show up again, among other approaches.)

However, that isn’t the whole story, the latest lamprey genomics paper argues (Mehta et al., 2013). The P. marinus genome assembly couldn’t stitch all the Hox clusters properly together. There were two that sat on nice big scaffolds with the whole row of Hox genes and a few of their neighbours, and then there were a bunch of “loose” Hox genes that they couldn’t link to anything (diagram comparing humans and P. marinus below from Smith et al., 2013; the really pale blue boxes under the numbers represent Hox genes):

[Figure: Hox clusters of humans and P. marinus, from Smith et al. (2013)]

Given that Hox9 genes exist in four copies in this species, it seems like there may be four clusters. However, in hagfish, the other kind of living jawless vertebrate, a study found Hox genes that seemed to have as many as seven copies (Stadler et al., 2004). Another round of duplication? It wouldn’t be unheard of. Most teleosts, which include most of the things we call “fish” in everyday parlance, have seven Hox clusters courtesy of an extra genome duplication and loss of one cluster*. Salmon and kin have thirteen, after yet another duplication. Maybe hagfish also had another one – but did lampreys? How many more clusters do those lonely Hox genes belong to?

Mehta et al. hunted down the Hox clusters of Japanese lampreys (Lethenteron japonicum), hoping to pin down exactly how many there were. They used large chunks of DNA derived partly from the testicles, where sperm cells and their precursors keep the full genome throughout the animal’s life (lampreys throw away large chunks of the genome in most non-reproductive cells [Smith et al., 2009]). They probed these for Hox genes and sequenced the ones that tested positive. Plus they also got about two-thirds of the full genome together in fairly big pieces. Together, these data allowed them to get a better idea of the mess that is lamprey Hox cluster genomics.

They assembled four whole clusters, including their neighbouring genes, and a partial fifth cluster. A bunch of other genes sat on smaller sequence fragments containing only a couple of Hoxes, or a Hox and a non-Hox, but they were tentatively assigned to a total of eight clusters, eight being the number of different Hox4 genes in the data (no known vertebrate Hox cluster contains more than one Hox4 gene). The L. japonicum equivalents of the 31 publicly available Hox sequences from P. marinus spread out over six of these, which indicates that both species have at least six clusters. Seems like lampreys had another round of genome duplication after 2R? (Summary of L. japonicum Hox clusters from Mehta et al. below.)
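
A side note on the counting logic, sketched in Python. If a cluster can hold at most one gene from each paralogue group (as noted above for Hox4), then the largest number of distinct genes found in any one group is a lower bound on the number of clusters. Only the Hox4 count of eight comes from the text; the other counts and the function name are hypothetical placeholders for illustration.

```python
def min_clusters(copies_per_group):
    """Lower bound on cluster number, assuming at most one gene per group per cluster."""
    return max(copies_per_group.values(), default=0)

# Only the Hox4 count (eight) is from the text; the rest are made-up placeholders.
lamprey_counts = {"Hox4": 8, "Hox9": 6, "Hox13": 5}
print(min_clusters(lamprey_counts))  # 8
```

Note that this is only a lower bound: clusters that have lost their copy of the most numerous gene would go uncounted.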

But wait, that’s not the end of it.

First of all, although there are undoubtedly four complete Hox clusters in L. japonicum, the relationships of these clusters to our four are terribly confused. Whether you look at the phylogenetic trees of individual genes, or the arrangement of non-Hox genes on either side of the cluster, only a big pile of what the fuck emerges. Phylogenies are problematic because the unusual composition of lamprey genes and proteins (Smith et al., 2013) could easily throw them off. All the complete lamprey clusters have a patchwork of neighbours that look like a mashup of more than one of our Hox clusters. Might it mean that lampreys’ proliferation of Hox clusters occurred independently of ours? Did we split before 2R after all?

Hox genes are not the only interesting things in a Hox cluster. In the long gaps between them, there are all sorts of little DNA switches that regulate their behaviour. Some of these are conserved across the jawed vertebrates. Mehta et al. aligned complete Hox clusters of humans, elephant sharks and lampreys to look for such sequences – called conserved non-coding elements or CNEs – in the lamprey.

They only found a few, but that’s enough for a bit more head-scratching. Most CNEs in, say, the human HoxA cluster are only found in one elephant shark cluster, and vice versa. Humans have a HoxA cluster, elephant sharks have a HoxA cluster, they’re clearly the same thing, pretty straightforward. Not so for lampreys. Homologues of individual CNEs in the complete lamprey clusters are spread out over all four human/elephant shark clusters. More evidence for independent duplications?

Mehta et al. are cautious – they point out that the silly mix of Hox cluster neighbours in lampreys could just be due to independent post-2R losses, which is plausible if the split between lamprey and jawed vertebrate lineages happened not too long after 2R. There’s also the fact that the weird lamprey sequences are phylogenetic minefields – however, that’s a double-edged sword, since the same caveat applies to analyses that support a post-2R divergence. Then, perhaps the same argument that goes for Hox cluster neighbours could also apply to CNEs. And, of course, this is just Hox clusters. Smith et al.’s (2013) findings about overall genome structure don’t go away just because lamprey Hox clusters are weird.

So, in summary, thanks, lampreys. Fat lot of help you are! 😛

***

*Actually, two losses of two separate clusters in two different teleost lineages. Because Hox evolution wasn’t already complicated enough.

***

References

Kuraku S et al. (2009) Timing of genome duplications relative to the origin of the vertebrates: did cyclostomes diverge before or after? Molecular Biology and Evolution 26:47-59

Mehta TK et al. (2013) Evidence for at least six Hox clusters in the Japanese lamprey (Lethenteron japonicum). PNAS 110:16044-16049

Putnam NH et al. (2008) The amphioxus genome and the evolution of the chordate karyotype. Nature 453:1064-1071

Smith JJ et al. (2009) Programmed loss of millions of base pairs from a vertebrate genome. PNAS 106:11212-11217

Smith JJ et al. (2013) Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nature Genetics 45:415-421

Stadler PF et al. (2004) Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Molecular Phylogenetics and Evolution 32:686-694