The origin of Hox genes: a telltale neighbourhood

Gods, it’s been so hard to keep my mouth shut about this. A friend of mine just published a paper about Hox genes, and I’ve known about it for a while and it’s been keeping me crazy excited because it’s fascinating and, well: Hox genes! Now that it’s finally out, I can blather about it to my heart’s content, and so I will. Be prepared for a long ride 😉

First of all, a quick rundown of Hox genes for those who aren’t evo-devo geeks. These genes encode transcription factors – proteins that switch genes on/off. They are members of the large and distinguished class of homeobox genes, many of which play important roles in orchestrating embryonic development. Hox genes in particular are famous for laying out the plan for the head to tail axes of bilaterian animals, and for often sitting in neat clusters in the genome and being expressed along the body axis in the same order they are in the cluster. (Below: one of my favourite scientific figures ever, a fruit fly embryo stained in different colours for each of its Hox genes*. From Lemons and McGinnis [2006] via Pharyngula) In short, Hox genes are fucking awesome and extremely important to boot.

Tracing origins

One of the unresolved questions about Hox genes is exactly where they come from, and the new study draws some interesting conclusions regarding their origins. Before we delve into Mendivil Ramos et al. (2012) itself, perhaps it’s best to pull out my old sketch of animal phylogeny, because the relationships of the great old animal lineages are kind of important for the discussion. So this is the family tree of animals at first approximation (photos were all sourced from Wikimedia Commons; more info about them in my Nectocaris post):

Mendivil Ramos et al. follow one of the more popular resolutions of the question marks, in which cnidarians are closest to bilaterians and placozoans are the sister group to cnidarians+bilaterians. They don’t talk too much about ctenophores, but I’ll return to that later 🙂

Bilaterians all have Hox genes, and in most of them they do what they were originally discovered doing in fruit flies: patterning the anterior-posterior axis as they say in Jargonese. Some bilaterians have duplicated individual genes or even whole Hox clusters (we have four clusters, and salmon have as many as 13), but it’s pretty uncontroversial that a neat Hox cluster with representatives of most existing types of Hox genes was present already on the left side of the bilaterian box. So was the little sister of the Hox cluster, unimaginatively called the ParaHox cluster, which only contains three kinds of genes but operates in a similar way to its more famous sister (Brooke et al., 1998).

Where did Hox and ParaHox genes come from? Given the phylogeny of the genes, it’s likely that there was originally a small (maybe 2-3 genes) ProtoHox cluster that duplicated to give rise to both Hoxes and ParaHoxes. We know that cnidarians like sea anemones have both Hox and ParaHox genes, which behave somewhat like their bilaterian counterparts (Ryan et al., 2007). Therefore, the ProtoHox cluster must have existed before the common ancestor of these two great lineages.

Enter the Blob

What about placozoans? That’s where things get a bit complicated. Trichoplax, the mysterious little blob that is the only living representative of this oddball phylum, has only one Hox-like gene noncommittally named Trox-2. A relic of the ProtoHox era? Not really – in phylogenetic analyses of the protein sequence, it tends to group with the ParaHox gene Gsx, whereas you would expect a leftover ProtoHox gene to remain outside the Hox+ParaHox clique.

Is Trox-2 a ProtoHox gene anyway? That would mean something weird happened in the evolution of Hox and ParaHox genes after the cluster duplication: Gsx (and its sisters Hox1-2) would have stagnated somewhere near its ancestral condition while all the other genes sped ahead. It’s a long shot, but evolution has been known to do strange things to gene sequences. Also, homeobox genes are often difficult to classify by sequence alone. Scientists typically use the DNA-binding region that the homeobox encodes for this purpose, but a homeodomain is only 60 amino acids and simply doesn’t contain enough information to place some problematic sequences. And unless we’re examining very closely related genes, the rest of the protein sequence is too different to be compared.

Guilt by association

However, there is another way of solving the mystery. Hox and ParaHox genes are not alone in the genome. They sit on huge chromosomes, and while they tend to banish non-*Hox genes from among them, the flanks of each cluster are populated by a variety of unrelated genes. The key thing is that Hox clusters and ParaHox clusters have different neighbours. Thus, looking at a problem gene’s neighbours can tell us what it is!

(Above: the neighbours of Trox-2. Yellow genes are ParaHox neighbours in humans, green genes are Hox neighbours, grey genes have no human counterparts, and orange genes are parts of both Hox and ParaHox neighbourhoods. From Mendivil Ramos et al. [2012])

This is exactly what happened. My lovely friend Olivia looked at the chunk of genomic sequence that contains Trox-2 and found about two dozen genes on it that had clear homologues in humans. She then tallied where each of the human homologues was, and behold: many of them crowded around ParaHox clusters (we also have several of those, courtesy of whole genome duplications), while only one was a Hox neighbour in humans. If Trox-2 were a ProtoHox, we’d expect a mixture of Hox and ParaHox neighbours, but that’s not what we find at all. Statistically speaking, it’s a no-brainer. Trox-2 is exactly where a ParaHox gene should be.
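For the code-minded, here is a rough sketch of what “tally the neighbours and ask whether the split is lopsided” could look like. To be clear, this is not the test used in the paper: the labels and counts below are made-up placeholders (the post only says “many” versus “only one”), and the 50/50 null for a ProtoHox-like mixture is my own simplifying assumption.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p=0.5):
    """Probability of k or more successes in n trials, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Placeholder labels: where the human homologue of each Trox-2 neighbour sits.
# These numbers are invented for illustration, not taken from the paper.
labels = ["ParaHox"] * 10 + ["Hox"] * 1 + ["both"] * 2 + ["neither"] * 3

counts = Counter(labels)

# Only genes that are exclusively Hox or exclusively ParaHox neighbours
# can tell the two kinds of neighbourhood apart.
informative = counts["ParaHox"] + counts["Hox"]

# Assumed null: a ProtoHox neighbourhood would be an even mix of the two.
p_value = binomial_tail(counts["ParaHox"], informative, p=0.5)

print(counts)
print(f"P(at least {counts['ParaHox']} of {informative} are ParaHox-side) = {p_value:.4f}")
```

The smaller that probability, the harder it is to square the neighbourhood with an even mixture. A real analysis would also have to account for how many Hox and ParaHox neighbours exist in the human genome to begin with, which this toy version ignores.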

Ghosts in the genome

Now, we have a problem. If Trox-2 is a ParaHox gene, it must have come after the Hox/ParaHox duplication. So where the hell is the Hox cluster? Well, seeing as Trichoplax only has one ParaHox gene instead of the more typical three or so, gene loss certainly sounds like a possibility. Is there an “empty” Hox cluster lurking somewhere in the blob’s genome? Here, cnidarians turn out to be pretty helpful. After sequencing the genome of the sea anemone Nematostella vectensis, Putnam et al. (2007) attempted to reconstruct parts of the original chromosomes of the cnidarian-bilaterian ancestor. They called the results Putative Ancestral Linkage Groups (PALs), in other words, groups of genes that have stayed together since cnidarians and bilaterians diverged 600 or so million years ago.

One of these PALs contains over 200 conserved Hox neighbours, nearly all of which are present in Trichoplax. Strikingly, about half of them are close enough to one another that they are in the same chunk of sequence even though the Trichoplax genome hasn’t been stitched together to the level of whole chromosomes. That’s much more than you’d expect by chance. Trichoplax has a Hox locus without Hox genes, what Mendivil Ramos et al. call a ghost Hox locus.

Hox genes all the way down?

If you’ve followed so far, you might have noticed that we’ve been pushing that elusive ProtoHox further and further back in animal evolution. It preceded bilaterians, it preceded cnidarians and bilaterians, and now it turns out it also preceded our split from placozoans. Will we find it if we look in the remaining animal lineages? Since a ctenophore genome hasn’t yet been released to the public, that question transforms into: will we find it in sponges?

The sponge Amphimedon queenslandica does have a publicly available genome, and much has been made of its apparent lack of many developmentally important transcription factor families (e.g. Larroux et al., 2008). It doesn’t have anything that looks like a Hox, ParaHox or ProtoHox gene, but what about the neighbourhoods?

Like that of Trichoplax, the Amphimedon genome sequence is in relatively small pieces, so a little clever statisticking was needed to decide whether it contains Hox, ParaHox or ProtoHox neighbourhoods. The starting points were the PAL of Hox neighbours mentioned above, and a PAL of ParaHox neighbours the team constructed using the human and Trichoplax genomes. These genes were distributed among many genomic scaffolds, but of course lacking chromosome-level information the group didn’t know whether any of these scaffolds are actually linked to each other in the sponge genome.

The solution was a simulation: take the number of genes in the PAL, take the number and size (in number of genes) of the thousands of Amphimedon scaffolds, and scatter the PAL members randomly among the scaffolds with the larger scaffolds proportionately more likely to receive a PAL gene. When all the PAL members are handed out, count the number of scaffolds with PAL members on them. Repeat this a thousand times, and you get an idea what the distribution of Hox and ParaHox neighbours would be if they weren’t clustered together. This approach showed that the real distribution is anything but random. Hox and ParaHox neighbours are clearly clustered in the sponge genome, and what’s more, they are clustered separately.
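If you like to see this sort of thing spelled out, the procedure described above can be sketched in a few lines of Python. This is my own minimal reading of it, not the authors’ code: the function names, the toy scaffold sizes and the “observed” value are all hypothetical, and for simplicity a scaffold is allowed to receive more PAL genes than it actually has room for.

```python
import random


def simulate_scaffold_counts(n_pal_genes, scaffold_sizes, n_reps=1000, seed=0):
    """Scatter PAL genes over scaffolds at random, many times over.

    scaffold_sizes holds the number of genes on each scaffold; bigger
    scaffolds are proportionately more likely to receive a PAL gene.
    Returns, for each replicate, how many distinct scaffolds got at least one.
    """
    rng = random.Random(seed)
    scaffold_ids = range(len(scaffold_sizes))
    counts = []
    for _ in range(n_reps):
        # Weighted draw with replacement (a simplification: scaffold
        # capacity is not enforced).
        hits = rng.choices(scaffold_ids, weights=scaffold_sizes, k=n_pal_genes)
        counts.append(len(set(hits)))
    return counts


def empirical_p(observed, simulated):
    """Fraction of replicates with as few or fewer occupied scaffolds as observed."""
    return sum(c <= observed for c in simulated) / len(simulated)


# Toy genome, invented for illustration: a few big scaffolds, many small ones.
toy_scaffold_sizes = [50] * 20 + [5] * 200
simulated = simulate_scaffold_counts(n_pal_genes=40, scaffold_sizes=toy_scaffold_sizes)

# Hypothetical observation: the real PAL genes sit on only 12 scaffolds.
print(empirical_p(observed=12, simulated=simulated))
```

The logic is that clustered genes occupy fewer scaffolds than randomly scattered ones, so if the real count falls below nearly all of the simulated counts, the neighbours are huddling together more than chance would allow.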

Still no ProtoHox locus, in other words. At some point in the murky depths of their ancestry, sponges lost bona fide Hox and ParaHox genes!


That raises a couple of issues. First, where is the ProtoHox? Hox-like genes have never been found outside animals. These are smart people we’re talking about, so they checked the genome of the closest non-animal relative we have today, a choanoflagellate. Neither Hox/ParaHox nor ProtoHox neighbourhoods were there – the PAL genes didn’t cluster together any more than they would by chance. The whole *Hox phenomenon seems unique to animals (or else the choanoflagellate genome is totally scrambled). It appears that somewhere in our ancestry, ProtoHox gene(s) appeared and parted ways before sponges split from the rest of the animals. Since we have no surviving descendants of these ancestors outside of sponges and the rest of the animals, we’ll probably never find unduplicated descendants of the ProtoHox cluster.

Second, what happened in ctenophores? Everything we know about their genomes suggests that they completely lack Hox-like genes. Although there have been studies that placed them even further out than sponges (Dunn et al., 2008), it’s more likely that they are much closer to bilaterians than that (Philippe et al., 2011). I think I’m not the only one itching to examine a ctenophore genome for Hox neighbours…

And finally, if some distant ancestor of all animals had full-blown Hox and ParaHox clusters, what the heck was it doing with them? Was it something unexpectedly complex that would need genes for axial patterning? Are sponges and placozoans grossly simplified descendants of a much more complex ancestor, or did Hox-like genes only become involved in dividing up body axes later in evolution?

The more we learn the less we know. One thing is (once again) clear: assuming that a simple animal is a good proxy for an ancestral animal is a dangerous, dangerous assumption to make.


*Technically, fruit flies have twelve Hox genes, but only seven are shown in the image. Hox2/proboscipedia is a normal Hox gene involved in the development of mouthparts among others, but four more genes have completely lost their “canonical” Hox gene-like activities. That includes all three of Drosophila’s weird triplicated Hox3 genes.



Brooke NM et al. (1998) The ParaHox gene cluster is an evolutionary sister of the Hox gene cluster. Nature 392:920-922

Dunn CW et al. (2008) Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745-749

Larroux C et al. (2008) Genesis and expansion of metazoan transcription factor gene classes. Molecular Biology and Evolution 25:980-996

Lemons D and McGinnis W (2006) Genomic evolution of Hox gene clusters. Science 313:1918-1922

Mendivil Ramos O et al. (2012) Ghost loci imply Hox and ParaHox existence in the last common ancestor of animals. Current Biology in press, available online 26/09/2012, doi: 10.1016/j.cub.2012.08.023

Philippe H et al. (2011) Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biology 9:e1000602

Putnam NH et al. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317:86-94

Ryan JF et al. (2007) Pre-bilaterian origins of the Hox cluster and the Hox code: evidence from the sea anemone, Nematostella vectensis. PLoS ONE 2:e153


6 thoughts on “The origin of Hox genes: a telltale neighbourhood”

  1. a100ciacierta March 5, 2014 / 16:00

    Hi there. The Mnemiopsis genome is out, and ctenophores are apparently basal to sponges. Any thoughts on Hox/ParaHox genes in ctenophores?

    • Naraoia March 6, 2014 / 12:11

      Ctenophores don’t have them, as far as we know, and one of the publications saying so is based on the (then-unpublished) Mnemiopsis genome (Ryan et al., 2010).

      Now that Mnemiopsis is public, it would be possible to examine Hox/ParaHox neighbours. I’m not sure I expect much useful evidence to be there, however, as the genome seems quite badly scrambled according to the supplementary information from the genome paper. So we’d probably find no intact groups of Hox/ParaHox neighbours, but it wouldn’t really mean anything as there are no intact groups of anything.

      (FWIW, I’m not ready to buy the “ctenophores are basal to all other animals” hypothesis just yet.)
