Thumbs down, what?

Bird fingers confuse me, but the explanations confuse me more, it seems.

I didn’t mean to post today, but I’ve just read a new review/hypothesis paper about the identities of the stunted little things that pass for fingers in the wings of modern birds. The review part is fine, but I’m not sure I get the difference between the hypothesis Čapek et al. (2013) are proposing and the hypothesis they are trying to replace/improve.

To recap: the basic problem with bird fingers is that fossil, genetic and developmental evidence seem to say different things about them.

1. Fossils: birds pretty clearly come from dinosaurs, and the early dinosaurs we have fossils of have five fingers on their hands with the last two being reduced. Somewhat closer to birds, you get four fingers with #4 vestigial. And the most bird-like theropods have only three fingers, which look most like digits 1, 2 and 3 of your ordinary archosaur. (Although Limusaurus messes with this scheme a bit.)

2. Embryology: in developing limb buds, digits start out as little condensations of tissue, which develop into bits of cartilage and then finger bones. Wing buds develop a short-lived condensation in front of the first digit that actually forms, and another one behind the last “surviving” digit. Taking this at face value, then, the fingers are equivalent to digits 2, 3 and 4.

3. Genetics: In five-fingered limbs, each digit has a characteristic identity in terms of the genes expressed during its formation. The first finger of birds is most like an ordinary thumb, both when you focus on individual genes like members of the HoxD cluster and when you take the entire transcriptome. However, the other two digits have ambiguous transcriptomic identities. That is, bird wings have digit 1 and two weirdos.

Add to this the fact that in other cases of digit loss, number one is normally the first to go and number four stubbornly sticks around to the end, and you can see the headache birds have caused.

So those are the basic facts. The “old” hypothesis that causes the first part of my confusion is called the frame shift hypothesis, which suggests that the ancestors of birds did indeed lose digit 1, as in the digit that came from condensation 1 – but the next three digits adopted the identities of 1-2-3 rather than 2-3-4. (This idea, IMO, can easily leave room for mixed identities – just make it a partial frame shift.)

Čapek et al.’s new one, which they call the thumbs down hypothesis, is supposedly different from this. This is how the paper states the difference:

The FSH postulates an evolutionary event in which a dissociation occurs between the developmental formation of repeated elements (digits) and their subsequent individualization.


According to the TDH no change of identity of a homeotic nature occurs, but only the phenotypic realization of the developmental process is altered due to redirected growth induced by altered tissue topology. Digit identity stays the same. Also the TDH assumes that the patterning of the limb bud, by which the digit primordia are laid down, and their developmental realization, are different developmental modules in the first place.

(Before this, they spent quite a lot of words explaining how the loss of the original thumb could trigger developmental changes that make digit 2 more thumb-like.)

I… struggle to see the difference. If you’ve (1) moved a structure to a different position, (2) subjected it to the influence of different genes, and (3) turned its morphology into that of another structure, how exactly is that not a change in identity?

Maybe you could say that “an evolutionary event” dissociating digit formation and identity is different from formation and identity being kind of independent from the start, but I checked Wagner and Gauthier’s (1999) original frame shift paper, and I think what they propose is closer to the second idea than the first:

Building on Tabin’s (43) insight, we suggest causal independence between the morphogenetic processes that create successive condensations in the limb bud and the ensuing developmental individualization of those repeated elements as they become the functional fingers in the mature hand, thus permitting an opportunity for some degree of independent evolutionary change.

Am I missing something? I feel a little bit stupid now.



Čapek D et al. (2013) Thumbs down: a molecular-morphogenetic approach to avian digit homology. Journal of Experimental Zoology B, published online 29/10/2013, doi: 10.1002/jez.b.22545

Wagner GP and Gauthier JA (1999) 1,2,3 = 2,3,4: A solution to the problem of the homology of the digits in the avian hand. PNAS 96:5111-5116

A bunch of cool things

From the weeks during which I failed to check my RSS reader…

1. The coolest ribozyme ever. (In more than one sense.)

I’ve made no secret of my fandom of the RNA world hypothesis, according to which early life forms used RNA both as genetic material and as enzymes, before DNA took over the former role and proteins (mostly) took over the latter. RNA is truly an amazing molecule, capable of doing all kinds of stuff that we traditionally imagined as the job of proteins. However, coaxing it into carrying out the most important function of a primordial RNA genome – copying itself – has proven pretty difficult.

To my knowledge, the previous record holder in the field of RNA copying ribozymes (Wochner et al., 2011) ran out of steam after making RNA strands only half of its own length. (Which is still really impressive compared to its predecessors!) In a recent study, the same team turned to an alternative RNA world hypothesis for inspiration. According to the “icy RNA world” scenario, pockets of cold liquid in ice could have helped stabilise the otherwise pretty easily degraded RNA as well as concentrate and isolate it in a weird inorganic precursor to cells.

Using experimental evolution in an icy setting, they found a variant related to the aforementioned ribozyme that was much quicker and generally much better at copying RNA than its ancestors. Engineering a few previously known performance-enhancing mutations into this molecule finally gave a ribozyme that could copy an RNA molecule longer than itself! It still wouldn’t be able to self-replicate, since this particular guy can only copy sequences with certain properties it doesn’t have itself, but we’ve got the necessary endurance now. Only two words can properly describe how amazing that is. Holy. Shit. :-O


Attwater J et al. (2013) In-ice evolution of RNA polymerase ribozyme activity. Nature Chemistry, published online 20/10/2013, doi: 10.1038/nchem.1781

Wochner A et al. (2011) Ribozyme-catalyzed transcription of an active ribozyme. Science 332:209-212


2. Cambrian explosion: evolution on steroids.

This one’s for those people who say there is nothing special about evolution during the Cambrian – and also for those who say it was too special. (Creationists, I’m looking at you.) It is also very much for me, because Cambrian! (How did I not spot this paper before? Theoretically, it came out before I stopped checking RSS…)

Lee et al. (2013) used phylogenetic trees of living arthropods to estimate how fast they evolved at different points in their history. They looked at both morphology and genomes, because the two can behave very differently. It’s basically a molecular clock study, and I’m still not sure I trust molecular clocks, but let’s just see what it says and leave lengthy ruminations about its validity to my dark and lonely hours 🙂

They used living arthropods because, obviously, you can’t look at genome evolution in fossils, but the timing of branching events in the tree was calibrated with fossils. With several different methods, they inferred evolutionary trees telling them how much change probably happened during different periods in arthropod history. They tweaked things like the estimated time of origin of arthropods, or details of the phylogeny, but always got similar results.
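For anyone who, like me, wants to see what a “rate” means here at its most basic: it’s the amount of change along a branch divided by the time that branch spans, with the time coming from the fossil calibrations. Lee et al.’s actual methods are far more sophisticated than this, so treat it as a toy with made-up branch lengths and dates, not their analysis. It does show why they bothered tweaking the origin date: the same amount of change squeezed into a shorter interval means a faster rate.

```python
# Toy evolutionary-rate calculation: change along a branch divided by the
# time it spans. All branch lengths and dates are hypothetical, chosen only
# to illustrate the arithmetic.

def rate(change: float, start_ma: float, end_ma: float) -> float:
    """Change per million years for a branch running from start_ma to end_ma
    (both in millions of years before present, so start_ma > end_ma)."""
    duration = start_ma - end_ma
    if duration <= 0:
        raise ValueError("start_ma must be older (larger) than end_ma")
    return change / duration


# Same amount of change (0.12 substitutions per site) on two hypothetical branches.
cambrian = rate(0.12, 540.0, 500.0)   # 40 Myr branch
later = rate(0.12, 500.0, 300.0)      # 200 Myr branch
print(f"Cambrian-ish branch: {cambrian:.4f} per Myr")
print(f"later branch:        {later:.4f} per Myr")
print(f"ratio: {cambrian / later:.1f}x")

# Pushing the assumed origin back by 20 Myr stretches the early branch and
# lowers its rate, which is why testing different origin dates matters.
print(f"with a 560 Ma origin: {rate(0.12, 560.0, 500.0) / later:.1f}x")
```

With these invented numbers the early branch comes out five times faster, dropping to about 3.3 times once the origin is pushed back.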

On average, arthropod genomes, development and anatomy evolved several times faster during the Cambrian than at any later point in time. Including the aftermath of the biggest mass extinctions. Mind you, not faster than modern animals can evolve under strong selection – they just kept up those rates for longer, and everyone did it.

(I’m jumping up and down a little, and at the same time I feel like there must be something wrong with this study, the damned thing is too good to be true. And I’d still prefer to see evolutionary rates measured on actual fossils, but there’s no way on earth the fossil record of any animal group is going to be good enough for that sort of thing. Conflicted much?)


Lee MSY et al. (2013) Rates of phenotypic and genomic evolution during the Cambrian explosion. Current Biology 23:1889-1895


3. Chitons to sausages

Aplacophorans are probably not what you think of when someone mentions molluscs. They are worm-like and shell-less, although they do have tiny mineralised scales or spines. While they look like one might imagine an ancestral mollusc before the invention of shells, transitional fossils and molecular phylogenies have linked them to chitons, which have a more conventional “sluggy” body plan with a wide foot suitable for crawling and an armoured back with eight shell plates.

Scherholz et al. (2013) compared the musculature of a living aplacophoran to that of a chiton and found it to support the idea that aplacophorans are simplified from a chiton-like ancestor rather than simple from the start. As adults, aplacophorans and chitons are very different – chitons have a much more complex set of muscles that includes muscles associated with their shell plates. However, the missing muscles appear to be present in baby aplacophorans, who only lose them when they metamorphose. (As a caveat, this study only focused on one group of aplacophorans, and it’s not entirely certain whether the two main groups of these creatures should even be together.)


Scherholz M et al. (2013) Aplacophoran molluscs evolved from ancestors with polyplacophoran-like features. Current Biology in press, available online 17/10/2013, doi: 10.1016/j.cub.2013.08.056


4. Does adaptation constrain mammalian spines?

Mammals are pretty rigid when it comes to the differentiation of the vertebral column. We nearly all have seven neck vertebrae, for example. This kind of conservatism is surprising when you look at other vertebrates – which include not only fairly moderate groups like birds with their variable necks, but also extremists like snakes with their lack of legs and practically body-long ribcages. Mammalian necks are evolutionarily constrained, and have been that way for a long time.

Emily Buchholz proposes an interesting explanation with links to previous hypotheses. Mammals not only differ from other vertebrates in the less variable numbers of vertebrae in various body regions; these regions are also more differentiated. For example, mammals are the only vertebrates that lack ribs in the lower back. In Buchholz’s view, this kind of increased differentiation contributes to adaptation but costs flexibility.

Her favourite example is the muscular diaphragm unique to mammals. This helps mammals breathe while they move, and also makes breathing more powerful, which is nice for active, warm-blooded creatures that use a lot of oxygen. However, it also puts constraints on further changes. Importantly, Buchholz argues that these constraints don’t all have to work in the same way.

For example, the constraint on the neck may arise because muscle cells in the diaphragm come from the same place as muscle cells associated with specific neck vertebrae. Moving the forelimbs relative to the spine, i.e. changing the number of neck vertebrae, would mess up their migration to the right place, and we’d end up with equally messed up diaphragms.

A second possible constraint has less to do with developmental mishaps and more to do with plain old functionality. If you moved the pelvis forward, you might not screw with the development of other bits, but you’d squeeze the space behind the diaphragm, which you kind of need for your guts, especially when you’re breathing in using your lovely diaphragm.


Buchholz E (2013) Crossing the frontier: a hypothesis for the origins of meristic constraint in mammalian axial patterning. Zoology in press, available online 28/10/2013, doi: 10.1016/j.zool.2013.09.001


And… I think that approximately covers today’s squee moments 🙂

New genes, new tricks, part 2

In my previous post, I marvelled over the strange and unexpected way duplicated genes behave in fruit flies. The second study I wanted to discuss is also about new fruit fly genes gaining new functions, but unlike the other one, it’s about new genes that didn’t come from pre-existing genes.

Reinhardt et al. (2013) wasn’t the best written paper I’ve read, and I had some difficulty figuring out exactly what was going on in places, but there is some interesting stuff in there nonetheless.

The authors investigated six recently evolved protein-coding genes in Drosophila. They wanted to know how these genes came about and how they managed to stick. For example, did they first originate as non-coding RNA genes? Did they gain a function through their RNA copies alone before they began to encode a protein? Or did they first awaken from the no man’s land between old genes with protein-coding potential already present?

This harkens back to one of the papers about new genes that I’d previously discussed. Xie et al. (2012) found that the genes for several hominoid-specific proteins began life (and function?) as RNA genes expressed in particular tissues in ancestral primates. What about the six fly genes the new study investigated?

Reinhardt et al.’s illustration of the two routes to protein-coding geneness is below. Starting with an inactive stretch of DNA (black line), you need two things: (1) an “on” switch or promoter (green box), which causes RNA (blue) to be transcribed from the region, and (2) a sequence that can be translated into a decent-length protein (an open reading frame or ORF, pink box). These two can theoretically appear in either order.
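To make the ORF half of this concrete, here’s a minimal sketch (mine, not the paper’s) of scanning a DNA sequence for open reading frames. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA), looks at one strand only, and uses an arbitrary minimum length, the hypothetical parameter `min_codons`, as a stand-in for “decent length”. It says nothing about the other half, the promoter: whether a stretch of DNA is actually transcribed isn’t something this toy can read off the sequence.

```python
# Forward-strand ORF scan under the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) positions (0-based, end-exclusive, stop codon
    included) of ORFs that start with ATG, end at the first in-frame stop,
    and are at least min_codons codons long."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ATG
    return sorted(orfs)


# Toy sequence; the ORF is ATG AAA TTT GGG TAA (5 codons, stop included).
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=2))
```

An ATG with no downstream in-frame stop isn’t reported at all, and a real scan would also check the reverse strand.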

Before we get into the meat of the paper, let’s borrow the Drosophila family tree from the 12 genomes project page:

D. melanogaster, third from the top, is the species that has been used for every variety of biological investigation for over a hundred years, and also the focus of this study. However, the other species were also used for comparison, to see exactly where and how the genes originated.

Five of the six genes had a relatively long history, with similar sequences being found in D. yakuba and erecta or even further out in D. ananassae. Three of them were not only there in those species, but could also potentially make a nice protein. In two genes, the sequence or part of it was recognisable all the way to ananassae, but it only had long sensible ORFs in melanogaster itself.

In terms of activity… well, first of all I think they screwed up Figure 2. Supposedly, the names of the species in which transcription of these genes was detected are bolded, but actually, all the names are bolded in all the trees, which doesn’t agree with what they say (or with the green dots signifying the origin of transcription in the same figure). Anyway, assuming the bolding was a mistake and the green dots are in the right place, it sounds like four of the six genes were already active in the common ancestor of melanogaster and yakuba or earlier, while another two were only turned on in the melanogaster/sechellia/simulans lineage.

The order of events varies from gene to gene: four genes had good solid ORFs right from the start, while two were transcribed before they were suitable protein templates. The authors note that we can’t actually be sure whether or not the first four developed an ORF before they became active. To be certain of that, we would need more distantly related species with a matching ORF that isn’t transcribed, but in all four cases the species lacking expression of the gene also totally lack any trace of the sequence. So, while the remaining two genes provide positive evidence for the transcription-first scenario, the jury is still out on the ORF-first option.
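The logic of that paragraph can be written down as a small decision rule. This is a hypothetical sketch, not the authors’ procedure: it assumes each feature (transcription, an intact ORF) arose once and was never lost, so its origin sits at the most distant species that still has it. Species are listed from D. melanogaster outward, and the presence/absence flags in the examples are invented.

```python
from typing import Optional, Sequence

# Species ordered by increasing distance from D. melanogaster.
SPECIES = ["melanogaster", "simulans", "yakuba", "ananassae"]


def origin_depth(present: Sequence[bool]) -> Optional[int]:
    """Index of the most distant species with the feature, or None if absent.
    Assumes a single gain and no later loss."""
    hits = [i for i, p in enumerate(present) if p]
    return max(hits) if hits else None


def order_of_events(transcribed: Sequence[bool], intact_orf: Sequence[bool]) -> str:
    t = origin_depth(transcribed)
    o = origin_depth(intact_orf)
    if t is None or o is None:
        return "incomplete: one feature never observed"
    if t > o:
        return "transcription first"   # transcribed species that lack the ORF
    if o > t:
        return "ORF first"             # ORF present in an untranscribed species
    return "unresolved: both first seen in the same lineage"


# Hypothetical gene A: transcribed out to yakuba, intact ORF only in melanogaster.
print(order_of_events([True, True, True, False], [True, False, False, False]))
# Hypothetical gene B: both features appear together and nothing further out.
print(order_of_events([True, True, True, False], [True, True, True, False]))
```

Gene B is the situation described for four of the genes: with no species that has the ORF but no transcription, the rule can’t tell the two routes apart.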

In D. melanogaster, the presence of the protein product was confirmed for the four genes with the oldest ORFs. The two youngest may still be translated: the protein data came only from embryos, and in fact all six genes contain short signals that are normally associated with the transport of proteins to specific parts of the cell. You might reason that a gene that never makes a protein doesn’t need such signals, but nevertheless, the authors couldn’t positively confirm the existence of these proteins without data from other life stages.

Where these genes are active brings us back to a common theme we encountered in the previous post. In adult D. melanogaster, all six are most strongly expressed in the testicles, and the products of one of them are exclusive to those organs. Likewise, male larvae show more expression of all six genes than females do. The other species show basically the same pattern.

What do these genes do? Actually, do they do anything? Being expressed, even being translated to protein, doesn’t necessarily equate to having a function. Luckily, “function” is not terribly difficult to test for in fruit flies. There are lots of clever tricks that allow you to manipulate their genes and look at the consequences. In this case, Reinhardt et al. bred flies where these genes were turned off. If I understood them correctly, they managed to do this for five genes, four of which resulted in very dead flies. Weirdly, for all four, the affected flies died at the same life stage, just before hatching from the pupa.

With a different strategy that produced only partial knock-down of the genes, they got themselves some grown-up survivors, which allowed them to test the effect of the genes on male fertility (a sensible question given where these genes are most active). Out of three knock-downs with surviving adults of both sexes, only one showed a serious effect, and that was the one that produced generally crappy, short-lived weakling males anyway, so while these genes are active in the testicles and they might disproportionately affect males, they don’t seem to have much to do with fertility per se.

In general, the results sound like new genes that come from random bits of DNA can very quickly become essential to the organism, and it also sounds very much like an overabundance of transcripts in the testicles doesn’t mean that that’s where their function lies – it’s probably more that all kinds of things are expressed in testicles, and these genes are still expressed there because that’s how they started their lives.

Something big missing from the study is actually testing when these genes became functional – we’re told when they became expressed and when they started making a protein, but without manipulating them in relevant non-melanogaster species, it’s impossible to tell whether either of those means function. *disappointed pout*

And what’s up with those four genes that were necessary for the flies’ survival? The knock-downs all did their killing at the same stage. I don’t know what to think about that, and the authors don’t really offer an explanation beyond describing control experiments to make sure the deaths weren’t an unfortunate side-effect of the manipulation itself. Is there something about the development of adults that attracts new genes? Is the process of metamorphosis especially sensitive to even minor mess-ups? (More sensitive than early embryonic development?) Intuitively, I’d find the first possibility more likely, but gods know intuition is a poor guide to reality…



Reinhardt JA et al. (2013) De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences. PLoS Genetics 9:e1003860

Xie C et al. (2012) Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genetics 8:e1002942

New genes, new tricks

I’ve previously written about the birth of new genes. Since new genes are cool, and I just found two recent papers on them, you’re getting more of them.

Part 1: how to survive duplication

Technically, the first paper isn’t about new new genes: Assis and Bachtrog (2013) examined recently duplicated genes in fruit flies. But screw technicalities, what they’re saying makes my eyes pop.

When a gene is accidentally copied, a variety of possible fates can await it. Most of the time, the extra copy just dies. Some mechanisms of gene duplication just take the gene without the regulatory elements it needs to function properly. Even if the new copy works, it’s still redundant, so there’s nothing stopping mutations from destroying it over time. However, sometimes redundancy is removed before the new gene breaks irrevocably, and both copies are kept. This can, in theory, happen in a number of ways. Because I’m feeling lazy, let me just quote them from the paper (square brackets are mine, because I hate repeatedly typing out long ugly words :)):

Four processes can result in the evolutionary preservation of duplicate genes: conservation, neofunctionalization, subfunctionalization, and specialization. Under conservation, ancestral functions are maintained in both copies, likely because increased gene dosage is beneficial (1). Under neofunctionalization [NF], one copy retains its ancestral functions, and the other acquires a novel function (1). Under subfunctionalization [SF], mutations damage different functions of each copy, such that both copies are required to preserve all ancestral gene functions (9, 10). Finally, under specialization, subfunctionalization and neofunctionalization act in concert, producing two copies that are functionally distinct from each other and from the ancestral gene (11).

We might add a variation on NF, too: Proulx and Phillips (2006) theorised that differences in function that arise in different alleles (variants) of a single gene can turn duplication into an advantage, turning the conventional duplication-first, new function-next scenario on its head.

Either way, genomes contain lots of duplicated genes, there’s no question about that. What isn’t nearly as well understood is the relative importance of various mechanisms in producing all these duplicates. It’s much easier to theorise about mechanisms than to test the theories. Since evolution doesn’t stop once a new gene has earned its place in the genome, it can be hard to disentangle the mechanism(s) responsible for its preservation from the stuff that happened to it later. Also, to really assess the relative role of different mechanisms, you’ve got to look at whole genomes.

(Assis and Bachtrog say that this hasn’t been done before, and then go right on to cite He and Zhang [2005], which is a genome-wide study of SF and NF. I guess it doesn’t look at all the mechanisms…)

Assis and Bachtrog used the amazing resource that is the 12 Drosophila genomes project, focusing on D. melanogaster and D. pseudoobscura to find slightly under 300 pairs of genes that duplicated after the divergence of those two species. Since Drosophila genomes are very well-studied, they were able to identify the “parent” and “child” in each pair based on where they sit on their chromosomes. They then also extracted thousands of unduplicated genes from the melanogaster and pseudoobscura genomes, to use as a measure of background divergence between the two species.

To measure changes in gene function, they compared the expression of parent and child genes to each other and to the “ancestral” copy (i.e. the unduplicated gene in the other species) in different parts of the body (if a gene is suddenly turned on somewhere it wasn’t before, it’s probably doing something new!).
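Here’s a sketch of the kind of comparison this involves. The specifics are my assumptions, not necessarily the paper’s: each gene’s expression across tissues is turned into proportions, each copy is compared to the ancestral ortholog by Euclidean distance, and `cutoff` is a made-up threshold (in a real analysis it would come from the background divergence of the unduplicated genes). The parent and child are also summed and compared as a pair, which is what separates SF (together they still look like the ancestor) from specialization (they don’t), following the definitions quoted above.

```python
import math
from typing import Sequence


def profile(expr: Sequence[float]) -> list[float]:
    """Turn raw per-tissue expression levels into proportions that sum to 1."""
    total = sum(expr)
    if total <= 0:
        raise ValueError("expression profile needs a positive total")
    return [x / total for x in expr]


def classify(parent: Sequence[float], child: Sequence[float],
             ancestor: Sequence[float], cutoff: float) -> str:
    """Classify a duplicate pair by how far each copy's expression profile
    sits from the ancestral (unduplicated) ortholog."""
    a = profile(ancestor)
    d_parent = math.dist(profile(parent), a)
    d_child = math.dist(profile(child), a)
    if d_parent <= cutoff and d_child <= cutoff:
        return "conservation"
    if d_parent <= cutoff:
        return "neofunctionalization (child)"
    if d_child <= cutoff:
        return "neofunctionalization (parent)"
    # Both copies moved: do they still add up to the ancestral pattern?
    combined = profile([p + c for p, c in zip(parent, child)])
    if math.dist(combined, a) <= cutoff:
        return "subfunctionalization"
    return "specialization"


# Hypothetical expression levels in four tissues: head, gut, ovary, testis.
ancestor = [10, 10, 10, 10]
print(classify([11, 9, 10, 10], [1, 1, 1, 37], ancestor, cutoff=0.2))
print(classify([20, 20, 0, 0], [0, 0, 20, 20], ancestor, cutoff=0.2))
```

The first (invented) pair is the common case described next: a parent that stays put and a testis-biased child.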

Long story short, it turned out that in the majority of cases (167/281), the child copy’s expression diverged from the “ancestor” much more than expected, while the parent copy stayed pretty close. These child copies also showed faster sequence evolution than their parents. This means that NF – and specifically that of the new copy – is the most common fate of newly duplicated genes in these animals. There’s also a fair number of gene pairs where both copies gained new functions or both stuck with the old ones, but only three where both copies lost functions. Pure SF, which very influential studies like Force et al. (1999) championed as the dominant mode of duplicate gene survival, appears to be an incredibly rare occurrence in fruit flies!

A few paragraphs ago I mentioned the caveat that duplicated genes don’t stop evolving just because they’ve managed to survive. Well, the advantage of having all these Drosophila genomes is that you can further break down “young” duplicates into narrower age groups, using the species that fall between melanogaster and pseudoobscura on the tree. However, looking at this breakdown doesn’t change the general pattern – NF of the child copy is the most common and SF is rare or nonexistent in even the youngest age groups, along both the melanogaster and the pseudoobscura lineages.

So what exactly is going on here?

Part of the difference in expression patterns between parent/ancestral and child copies is because these new genes are turned on in the testicles, which might give us a big clue. Testicles, you see, are a bit anarchical. Things that are normally kept silent in the genome, like various kinds of parasitic DNA, wake up and run wild during the making of sperm. If you remember my throwaway reference to duplication mechanisms that cut the gene off from its old regulatory elements – well, the balls are a place where even such lost and lonely genes get a second chance.

The genomic anarchy of testes is also one of the reasons these duplications happen in the first place; the aforementioned mechanism involves those bits of parasitic DNA that copy and paste themselves via an RNA intermediate. The enzymes they use to reverse transcribe this RNA into DNA and insert it back into the genome aren’t particularly discerning, and they’ll happily do their thing on a piece of RNA that isn’t the parasite. Indeed, slightly more NFed child genes than you’d expect originated via RNA, although it’s worth noting that more than half of them still didn’t. So while the testes look like a good place for new gene copies to find a use, they aren’t totally responsible for their origins.

Why is there so little SF among these genes?

This is the Obvious Question; my jaw nearly landed on my desk when I saw the numbers. The authors have two hypotheses, both of which may be true at the same time.

First, SF assumes that the two copies have the same functions to begin with. This is not necessarily true when just a small segment of DNA is duplicated – even when it’s not just a bare gene you’re copying, the new copy might lose part of its old regulatory elements and/or land next to new ones, not to mention Proulx and Phillips’s idea of new functions appearing before duplication. So maybe SF is more common after wholesale duplications of entire genomes, and Drosophila species didn’t have any of those recently.

Secondly, SF happens by genetic drift, a random process whose effects are much stronger in small populations. Fruit flies aren’t known for their small populations, so the dominant evolutionary force acting on their genomes should be selection.
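A standard way to put numbers on “drift matters more in small populations” is Kimura’s approximation for the probability that a single new mutation eventually becomes fixed. This sketch assumes a diploid Wright-Fisher population whose effective size equals `N`, and a hypothetical, mildly harmful selection coefficient `s`. It doesn’t model SF itself, only the drift-versus-selection contrast the argument rests on: a neutral mutation fixes with probability 1/(2N), and a slightly deleterious one is nearly as likely to fix only while 4N|s| stays small.

```python
import math


def fixation_probability(N: int, s: float) -> float:
    """Kimura's diffusion approximation for the fixation probability of a
    single new mutation (initial frequency p = 1/(2N)) in a diploid population
    of effective size N with selection coefficient s:

        u = (1 - exp(-4*N*s*p)) / (1 - exp(-4*N*s))

    Rewritten with expm1 so it stays numerically stable for large N*|s|.
    """
    p = 1.0 / (2 * N)
    if s == 0:
        return p                     # neutral case: u = p
    a = -2.0 * s                     # equals -4*N*s*p
    b = -4.0 * N * s
    if s > 0:
        return math.expm1(a) / math.expm1(b)
    # s < 0: exp(b) can overflow, so divide top and bottom by exp(b)
    return math.expm1(a) * math.exp(-b) / -math.expm1(-b)


s = -0.001  # hypothetical, mildly deleterious mutation
for N in (100, 10_000, 1_000_000):
    neutral = fixation_probability(N, 0.0)
    harmful = fixation_probability(N, s)
    print(f"N={N:>9,}: relative to neutral = {harmful / neutral:.3g}")
```

With s = −0.001, the harmful mutation is still roughly 80% as likely to fix as a neutral one when N = 100; at N = 10,000 the ratio is effectively zero.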

This makes sense to me, but the degree to which NF dominates the picture is still pretty amazing. I wonder what you’d get if you applied the same methods to different species. Would species with smaller populations, or those that recently duplicated their whole genomes, show more evidence for SF as you’d expect if the above reasoning is correct? Or would the data slaughter all those seemingly reasonable explanations? What would you see in parthenogenetic species that have no males (and testicles)?

Part two, with really new genes, hopefully coming soon…



Assis R & Bachtrog D (2013) Neofunctionalization of young genes in Drosophila. PNAS 110:17409-17414

He X & Zhang J (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169:1157-1164

Force A et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545

Proulx SR & Phillips PC (2006) Allelic divergence precedes and promotes gene duplication. Evolution 60:881-892

Lamprey Hox clusters and genome duplications, oh my!

What the hell is up with lamprey Hox clusters?

Lampreys are among the few living jawless vertebrates, creatures that parted evolutionary ways with our ancestors somewhere on the order of 500 million years ago. If you want to know where things like jaws, paired fins or our badass adaptive immune systems came from, a vertebrate that doesn’t possess some of these things and may have diverged from the rest of the vertebrates soon after vertebrates themselves originated is just what you need for comparison.

The vertebrate fossil record is pretty rich thanks to us having hard tissues, so a lot can be inferred about these things from the wealth of extinct fishes we have at our disposal. However, there are times when comparisons of living creatures are just as useful, if not more, than examinations of fossils. (Fossils, for example, tend not to have immune systems. ;))

One of the things you absolutely need a living animal to study is, of course, genome evolution. Vertebrates – well, at least jawed vertebrates – are now generally accepted to have the remnants of four genomes. Our long-gone ancestors underwent two rounds of whole genome duplication. Afterwards, most of the extra genes were lost, but evidence for the duplications can still be found in the structure of our genomes, where entire recognisable gene neighbourhoods of our close invertebrate relatives often still exist in up to four copies (Putnam et al., 2008).

Among these neighbourhoods are the four clusters of Hox genes most groups of jawed vertebrates possess. A “normal” animal like a snail or a centipede only has one of these. Since Hox genes are involved in the making of body plans, you have to wonder how suddenly having four sets of them and other developmental “master genes” might have influenced the evolution of vertebrate bodies.

Of course, to guess that, you need to know precisely when these duplications happened. That’s where lampreys come in: their lineage branched off from our definitely quadruple-genomed one after the next closest, definitely single-genomed group. But was it before, between, or after the two rounds of duplication?

A few years ago, a phylogenetic analysis of 55 gene families by Kuraku et al. (2009) suggested that the lamprey-jawed vertebrate split happened after both rounds of genome duplication (“2R”). Just this year, the genome of the sea lamprey Petromyzon marinus was finally published (Smith et al., 2013), and its authors agreed that yes, lampreys probably split off from us post-2R. (I don’t entirely get all the things they did to arrive at this conclusion. Groups of linked genes show up again, among other approaches.)

However, that isn’t the whole story, the latest lamprey genomics paper argues (Mehta et al., 2013). The P. marinus genome assembly couldn’t stitch all the Hox clusters properly together. There were two that sat on nice big scaffolds with the whole row of Hox genes and a few of their neighbours, and then there were a bunch of “loose” Hox genes that they couldn’t link to anything (diagram comparing humans and P. marinus below from Smith et al., 2013; the really pale blue boxes under the numbers represent Hox genes):


Given that Hox9 genes exist in four copies in this species, it seems like there may be four clusters. However, in hagfish, the other kind of living jawless vertebrate, a study found Hox genes that seemed to have as many as seven copies (Stadler et al., 2004). Another round of duplication? It wouldn’t be unheard of. Most teleosts, which include most of the things we call “fish” in everyday parlance, have seven Hox clusters courtesy of an extra genome duplication and loss of one cluster*. Salmon and kin have thirteen, after yet another duplication. Maybe hagfish also had another one – but did lampreys? How many more clusters do those lonely Hox genes belong to?

Mehta et al. hunted down the Hox clusters of Japanese lampreys (Lethenteron japonicum), hoping to pin down exactly how many there were. They used large chunks of DNA derived partly from the testes, where sperm cells and their precursors keep the full genome throughout the animal’s life (lampreys throw away large chunks of the genome in most non-reproductive cells [Smith et al., 2009]). They probed these chunks for Hox genes and sequenced the ones that tested positive. They also assembled about two-thirds of the full genome into fairly big pieces. Together, these data gave them a better idea of the mess that is lamprey Hox cluster genomics.

They assembled four whole clusters, including their neighbouring genes, and a partial fifth cluster. A bunch of other genes sat on smaller sequence fragments containing only a couple of Hoxes, or a Hox and a non-Hox, but they were tentatively assigned to a total of eight clusters, eight being the number of different Hox4 genes in the data (no known vertebrate Hox cluster contains more than one Hox4 gene). The L. japonicum equivalents of the 31 publicly available Hox sequences from P. marinus are spread out over six of these clusters, which indicates that both species have at least six. Seems like lampreys had another round of genome duplication after 2R? (Summary of L. japonicum Hox clusters from Mehta et al. below.)
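The counting here is a pigeonhole argument: if a cluster can hold at most one gene from each paralogue group, the biggest paralogue group sets a floor on the number of clusters. A minimal Python sketch, assuming that one-gene-per-group rule and one-to-one cluster orthology between species; the helper names and the orthology table are hypothetical, and only the Hox4 count of eight comes from the paper as described above.

```python
def min_cluster_count(paralogue_counts: dict[str, int]) -> int:
    """Floor on cluster number: each cluster holds at most one gene per
    paralogue group, so N distinct genes in one group need N clusters."""
    return max(paralogue_counts.values(), default=0)


def clusters_hit(orthologue_cluster: dict[str, str], genes: list[str]) -> int:
    """Number of distinct reference clusters a set of genes maps onto.

    Assumes cluster orthology is one-to-one, so genes landing in k different
    clusters of one species must come from at least k clusters in the other.
    Genes with no assigned orthologue are skipped.
    """
    return len({orthologue_cluster[g] for g in genes if g in orthologue_cluster})


print(min_cluster_count({"Hox4": 8}))  # 8, the L. japonicum Hox4 count

# Toy orthology table (made up): gene in species X -> cluster in species Y.
toy_table = {"g1": "C1", "g2": "C2", "g3": "C1", "g4": "C3"}
print(clusters_hit(toy_table, ["g1", "g2", "g3", "g4", "g5"]))  # 3
```

Both numbers are lower bounds only: they say how many clusters there must be at minimum, not how many there actually are.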

But wait, that’s not the end of it.

First of all, although there are undoubtedly four complete Hox clusters in L. japonicum, the relationships of these clusters to our four are terribly confused. Whether you look at the phylogenetic trees of individual genes or at the arrangement of non-Hox genes on either side of the clusters, only a big pile of what the fuck emerges. Phylogenies are problematic because the unusual composition of lamprey genes and proteins (Smith et al., 2013) could easily throw them off. All the complete lamprey clusters have a patchwork of neighbours that looks like a mashup of more than one of our Hox clusters. Might it mean that lampreys’ proliferation of Hox clusters occurred independently of ours? Did we split before 2R after all?

Hox genes are not the only interesting things in a Hox cluster. In the long gaps between them, there are all sorts of little DNA switches that regulate their behaviour. Some of these are conserved across the jawed vertebrates. Mehta et al. aligned complete Hox clusters of humans, elephant sharks and lampreys to look for such sequences – called conserved non-coding elements or CNEs – in the lamprey.

They only found a few, but that’s enough for a bit more head-scratching. Most CNEs in, say, the human HoxA cluster are only found in one elephant shark cluster, and vice versa. Humans have a HoxA cluster, elephant sharks have a HoxA cluster, they’re clearly the same thing, pretty straightforward. Not so for lampreys. Homologues of individual CNEs in the complete lamprey clusters are spread out over all four human/elephant shark clusters. More evidence for independent duplications?
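One way to put a number on that contrast, as a sketch: for a given cluster, take the CNEs whose homologues were found elsewhere and ask what fraction land in the single most-hit cluster of the other species. This “concentration” score is my own illustration, not a statistic Mehta et al. report, and the CNE-to-cluster tables below are made up.

```python
from collections import Counter


def concentration(cne_targets: dict[str, str]) -> float:
    """Fraction of CNEs whose homologues sit in the single most-hit cluster.

    `cne_targets` maps each CNE in one cluster to the cluster holding its
    homologue in the comparison species. 1.0 means strictly one-to-one;
    0.25 means evenly smeared across four clusters.
    """
    counts = Counter(cne_targets.values())
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# Made-up examples of the two patterns described above.
human_hoxa_vs_shark = {"cne1": "HoxA", "cne2": "HoxA", "cne3": "HoxA", "cne4": "HoxA"}
lamprey_vs_jawed = {"cne1": "HoxA", "cne2": "HoxB", "cne3": "HoxC", "cne4": "HoxD"}

print(concentration(human_hoxa_vs_shark))  # 1.0
print(concentration(lamprey_vs_jawed))     # 0.25
```

With only a few CNEs per lamprey cluster, any score like this rests on very small numbers.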

Mehta et al. are cautious – they point out that the silly mix of Hox cluster neighbours in lampreys could just be due to independent post-2R losses, which is plausible if the split between the lamprey and jawed vertebrate lineages happened not too long after 2R. There’s also the fact that the weird lamprey sequences are phylogenetic minefields – however, that’s a double-edged sword, since the same caveat applies to analyses that support a post-2R divergence. Perhaps the same argument that applies to Hox cluster neighbours could also apply to CNEs. And, of course, this is just Hox clusters. Smith et al.’s (2013) findings about overall genome structure don’t go away just because lamprey Hox clusters are weird.

So, in summary, thanks, lampreys. Fat lot of help you are! 😛


*Actually, two losses of two separate clusters in two different teleost lineages. Because Hox evolution wasn’t already complicated enough.



Kuraku S et al. (2009) Timing of genome duplications relative to the origin of the vertebrates: did cyclostomes diverge before or after? Molecular Biology and Evolution 26:47-59

Mehta TK et al. (2013) Evidence for at least six Hox clusters in the Japanese lamprey (Lethenteron japonicum). PNAS 110:16044-16049

Putnam NH et al. (2008) The amphioxus genome and the evolution of the chordate karyotype. Nature 453:1064-1071

Smith JJ et al. (2009) Programmed loss of millions of base pairs from a vertebrate genome. PNAS 106:11212-11217

Smith JJ et al. (2013) Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nature Genetics 45:415-421

Stadler PF et al. (2004) Evidence for independent Hox gene duplications in the hagfish lineage: a PCR-based gene inventory of Eptatretus stoutii. Molecular Phylogenetics and Evolution 32:686-694