The Mammal’s very own Hox genes (excite! Woo!)

It’s kind of hard to begin this post. First of all, let’s get the important news out of the way: I’ve just published a paper. In a moment, I’ll get around to discussing it at even more than my usual length, but I feel that I can’t do my excited puppy act without at least trying to capture how bloody much this paper means to me. The following may get a little personal; if you want to jump straight to the Cool Stuff, feel free to scroll a couple of paragraphs down.

<personal bit>

As you may have guessed from the long silence here, it’s not been a good handful of years, Real Life and mental health-wise. After my PhD, the prospect of the research career I’d dreamed of since I first began to grasp the meaning of the word “scientist” no longer seemed so dreamlike. It may surprise you to hear this from someone who finished a PhD with four published papers and spent the years of said PhD blathering regularly on the internet, but I find writing things for other people to read very, very stressful. In the case of a job application or a thesis chapter, that becomes “I’m not eating or sleeping properly” stressful. (Don’t ask me how I survived 20+ years of formal education.)

Long story short, for the last 3 years I’ve been getting by with a minimum wage job for which I’m both vastly overqualified and singularly ill-suited. I started the research project that culminated in the paper you can now read (for free, yay!) in BMC Evolutionary Biology (Szabó and Ferrier, 2018) while unemployed and broke, and I did most of it in my free time around work. This paper is a hard-won victory over myself and my circumstances. It’s a tiny glint of self-worth in the depth of the tunnel. In some ways, it was harder than my thesis: no funding body to satisfy, no lab mates to gripe at, no deadlines to spur me on. The only constant was my ex-supervisor turned co-author, who took my hobby project under his wing for the slim reward of having his name on a paper and nudged me into finishing it with unending patience. Here’s to Dave Ferrier, champion of non-model organisms, homeobox guy extraordinaire and all-round excellent human being. Dave, I hope you know you’re an absolute star.

</personal bit>

With that out of the way, it’s time for the Cool Stuff. There are Hox genes! More Hox genes than anyone ever imagined! (That is kind of the point, in fact!)

Apologies for the word count. I thought it would be a good idea to explain a few things, but also, I think I enjoy waffling about my baby far too much 😊

Hox therapy

The story of my Hox paper begins with an unemployed biologist with an overabundance of free time and a desperate need to do something scientific. Since I have a slightly odd idea of “fun”, back in 2015 I decided to catalogue Hox gene (or rather, protein) diversity in the animal kingdom, with particular focus on obscure and poorly studied groups. (I didn’t get very far, as we’ll see.)

Since it’s hard to discuss the paper without dropping some arcane zoological nomenclature, here’s my trusty old animal phylogeny to (re)acquaint us with the general outlines of the animal kingdom (I might need to update this in light of the Great Ctenophore Controversy some day, but we’re not dealing with anything outside the Bilateria today):


For the purposes of my paper, we’re zooming into the deuterostome branch, which looks something like this on the inside (borrowing my own rather lacklustre last-minute figure from Szabó and Ferrier [2018]):


Everything on this tree apart from chordates (that’s us) belongs to a group called Ambulacraria, which contains two phyla, hemichordates (top two branches) and echinoderms (the next five). Echinoderms are the more familiar of the two – starfish and sea urchins and suchlike – and also the focus of my project. (I could find no Hox gene data from pterobranchs, which puts a slight caveat on everything I say about hemichordates.)

Back to Hox genes.

Hox genes were kind of my gateway drug into evolutionary developmental biology. A few decades earlier, they had served the same purpose for developmental biology as a whole, since they were among the first genes to be discovered that (1) directed embryonic development and (2) were comparable between very disparate animal groups. The short version, which will suffice for our purposes here, is that Hox genes are important in what we eggheads call anteroposterior patterning, or determining what body parts go where along the head (anterior) to tail (posterior) axis of a (bilaterian) animal.

In (I think, I haven’t counted) the majority of animals that have them, Hox genes are clustered to a greater or lesser extent. Rather than being scattered haphazardly across the genome, they sit close to one another along the same stretch of DNA. (Duboule [2007] is an excellent – albeit now slightly out of date – review of the various known configurations.)

Since my study is about echinoderms, the schematic Hox cluster shown below is the neatest known example from an echinoderm, the crown-of-thorns starfish Acanthaster planci (source: Baughman et al., 2014):


In this image, Hox genes are colour-coded according to a commonly used classification scheme. This classification is mostly based on the homeodomain, or the “business” end of the protein that a Hox gene encodes. A homeodomain makes up a relatively small portion (maybe 1/5th on average) of a typical Hox protein, but it’s the part that interacts with the DNA switches through which Hoxes control their target genes, and it’s often the only part that is similar enough to be compared between different Hox types.

The important genes for us today are the “posterior” Hox genes shown in pink and red above, especially the last two. The four posterior Hox genes seen here represent the “standard” set for ambulacrarians, although it’s uncertain whether Hox11/13b-c were already separate genes or just a single precursor gene in the ambulacrarian ancestor.

Eureka… or WTF?

“The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, ‘hmm… that’s funny…’” – Almost certainly not Isaac Asimov

In creating my grand catalogue, I’d quickly breezed through vertebrates (which are all essentially the same for my purposes) and other chordates (for which the data I could find were rather limited). I thought echinoderms would be an easy job, too: there were good in-depth studies of a few species, and they hadn’t revealed anything terribly unusual other than a rearrangement of the Hox cluster in sea urchins (Cameron et al., 2006).

In fact, through comparison with their sister group, the hemichordates (Freeman et al., 2012), it seemed likely that the ancestral echinoderm had a nice, ordered Hox cluster with few if any oddities (Baughman et al., 2014). So I clicked my way to the wonderful Echinobase, which has searchable draft genomes from four of the five living classes of echinoderms (crinoids, a.k.a. sea lilies and feather stars, are missing, although a genome in a very early, fragmentary stage exists here). I expected to double-check the published data, collect the same genes from the groups for which Hox papers hadn’t been published, and be off to protostomes in a day or two. Two years later, I still haven’t made it to protostomes, but I’ve gone rather deeper than expected in echinoderms…

(Below: my cast. The main characters are Strongylocentrotus purpuratus [photo: Kirt L. Onthank] and Lytechinus variegatus [photo: Hans Hillewaert] representing sea urchins, Patiria miniata [photo: Jerry Kirkhart] and Acanthaster planci [photo: JSLUCAS75] for sea stars, Parastichopus parvimensis [from here] and Apostichopus japonicus [photo: OpenCage] for sea cucumbers, Metacrinus rotundus [photo: OpenCage] and Anneissia japonica [photo: OpenCage] for crinoids, Ophiothrix spiculata [photo: Jerry Kirkhart] for brittle stars, with supporting acts from Peronella japonica [sea urchins, photo: Endo et al., 2018], Ophiopsila aranea [brittle stars, photo: Bernard Picton], Balanoglossus simodensis [photo: Misaki Marine Biological Station, U of Tokyo], Saccoglossus kowalevskii [photo: Lowe lab] and Ptychodera flava [photo: Moorea BioCode via CalPhotos] for hemichordates, and Branchiostoma floridae [photo via JGI genome portal], Latimeria menadoensis [photo: Claudio Martino] and Callorhinchus milii [photo: fir0002/Flagstaffotos] for chordates. I sourced the photos through Wikipedia/Wikimedia Commons where I could; other sources are linked where applicable.)


You see, I didn’t want to stop at just homeodomains. Homeodomains are cool and important and all, but one thing I’d learned from my earlier forays into the world of Hox genes was that valuable information hid in small patches of conserved sequence elsewhere in their proteins. Besides, I am a pathological perfectionist. I felt a terrible need to collect complete Hox sequences wherever possible.

I already mentioned that sequence similarity between Hoxes outside the homeodomain can be weak to non-existent. I ran into this problem with Echinobase’s brittle star, Ophiothrix spiculata. Using the known sea urchin Hoxes to search its genome, I’d found believable matches for many of them, but the 11/13s defeated me. I had two homeodomains that I thought represented 11/13b and c, but I couldn’t for the life of me recover the rest of the proteins.

The problem with genome databases (or their great advantage depending on your perspective) is that they contain all of the DNA that could be sequenced from the owner of the genome. The problem with Hox genes – most of our genes, in fact – is that they aren’t continuous stretches of DNA. Your typical gene exists in multiple segments (exons) separated by a whole lot of DNA (introns) that leaves no trace in the protein product of the gene. (Hox genes normally have two or three exons, the first of which is devoid of homeodomain parts.)

When a gene is expressed, the cell first makes an RNA copy of all that, which is edited to throw out the introns and splice the exons together. That intron-less RNA copy is then carried off to be translated into a protein. Transcriptomes are derived from the RNA copies of active genes. Introns lie forgotten on the cutting room floor: in the sequenced transcripts, one exon continues straight into the next. Therefore, if I could find a brittle star transcriptome, and the 11/13b-c homeodomains in it, perhaps there would be enough of the rest in there to reconstruct those elusive first exons.
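For anyone who prefers to see the cutting room floor in action, here is a minimal Python sketch of why a transcript is easier to search than a genome. The sequences are invented toy strings (not real Hox DNA), and the `splice` helper is purely illustrative.

```python
# Toy illustration of splicing. The sequences below are made up for this
# example; they are not taken from any real gene.
from typing import List, Tuple


def splice(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates), dropping introns."""
    return "".join(genomic[start:end] for start, end in exons)


# Exons in upper case, intron in lower case, so the pieces are easy to spot.
exon1 = "ATGGCC"
intron = "gtaagtttttcag"
exon2 = "AAACGC"
genomic = exon1 + intron + exon2

# Exon coordinates are computed from the pieces rather than typed in by hand.
exons = [
    (0, len(exon1)),
    (len(exon1) + len(intron), len(genomic)),
]

transcript = splice(genomic, exons)

# A query that spans the exon-exon junction is one continuous stretch in the
# transcript, but is interrupted by the intron in the genome.
query = "CCAAAC"
found_in_transcript = query in transcript
found_in_genome = query in genomic.upper()

print(transcript)
print(found_in_transcript, found_in_genome)
```

In the transcript, the end of the first exon runs straight into the start of the second, which is exactly what I was hoping to exploit for those first exons.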

Luckily, Delroisse et al. (2016) had published exactly what I needed. In one of their transcriptomes, I found a homeodomain that looked like my Ophiothrix Hox11/13c, as part of a near-complete sequence. Excited, I did the reciprocal search against the Ophiothrix genome…

… and hit neither 11/13b nor 11/13c.

So here I am, staring at a beautiful match between this transcript and a part of the Ophiothrix genome that I hadn’t examined before. The match contains sequence from the first exon, which, given my previous experience with these buggers, is a sure sign that they’re the same gene. And it’s neither of the ones I’d expected.
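The logic of that reciprocal search can be sketched in a few lines of Python. Everything here is a placeholder: the locus and transcript names and the hit lists are invented for illustration, and a real search would of course come from a sequence similarity tool rather than a hand-written dictionary.

```python
# Toy sketch of a reciprocal search. All names and hit lists are invented
# placeholders, not real search results.
from typing import Dict, List, Optional


def best_hit(hits: Dict[str, List[str]], query: Optional[str]) -> Optional[str]:
    """Return the top-ranked hit for a query, or None if there were no hits."""
    ranked = hits.get(query, []) if query is not None else []
    return ranked[0] if ranked else None


# Forward search: genomic homeodomain -> transcriptome (best match first).
genome_to_transcripts = {"locus_c": ["transcript_1"]}
# Reciprocal search: transcript -> genome (best match first).
transcripts_to_genome = {"transcript_1": ["locus_x", "locus_c"]}

# The loci I already had a name for (hypothetical labels).
known_loci = {"locus_b": "Hox11/13b", "locus_c": "Hox11/13c"}

start = "locus_c"
transcript = best_hit(genome_to_transcripts, start)
back = best_hit(transcripts_to_genome, transcript)

if back == start:
    verdict = "reciprocal best hit: probably the same gene"
elif back in known_loci:
    verdict = "transcript matches " + known_loci[back] + " instead"
else:
    verdict = "transcript's best match is a locus not yet catalogued"

print(verdict)
```

The third branch is the one I landed in: the transcript’s best genomic match was a stretch of DNA I hadn’t looked at yet.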

A bit later in a different database, I hit upon an automatically predicted sea urchin protein that definitely isn’t 11/13b or c either. This is the model sea urchin, S. purpuratus, the one I thought we knew inside out when it came to Hoxes. I check the genome on Echinobase, and lo and behold, there’s the third 11/13b-c type gene, and it’s nowhere near the Hox cluster.

If memory serves, it’s roughly at this point that the words, “What. The. Actual. Fuck. Is. Going. On.” occur in my research notes. (Complete with punctuation.)

I checked the other species on Echinobase. Three 11/13b-c genes again, every time. Over on Genbank, I found a complete protein sequence from a sand dollar that Tsuchimoto and Yamaguchi (2014) had previously classified as 11/13c by exclusion. The Japanese duo had a clear b, but this other sequence was behaving oddly in their phylogenetic analyses. Now I had the obvious explanation: it wasn’t 11/13c at all.*

I wrote to Dave and found out that this was also news to him. By all appearances, I had stumbled on something truly new, in a gene family that’s both iconic in our field, and dear to my obsessive little heart.

We decided to try to turn it into a paper.

In search of the alphabet’s end

Once we’d made that decision, and following Dave’s advice, I had a few tasks ahead of me. I had to check how far back in evolution our new gene (which we called Hox11/13d) went. I had to test whether it had truly escaped the Hox cluster in all of our study species. I had to refresh my memory on deuterostome posterior Hox genes in general, both for paper-writing purposes and in case there was a forgotten reference to our “new” gene lurking somewhere in the literature.

There wasn’t, but.

In a figure legend in Thomas-Chollier et al. (2010), there is a brief mention of an unnamed “Hox11/13c-like” sequence in sea urchins. When I saw that, I damn near soiled myself, but the authors couldn’t definitively identify this sequence as a Hox gene, so they left it at that throwaway comment and a few bits of supplementary data. Luckily, they had a gene ID that I could look up on Echinobase.

Gods help me, it turned out to be another new Hox. When the shock of Hox11/13d had barely worn off, I was confronted with a possible Hox11/13e. And this one wasn’t in the Hox cluster either.

Aside from not being part of the Hox cluster, Hox11/13d is a pretty good echinoderm Hox gene. The homeodomain it encodes is reminiscent of Hox11/13b and c, and, although they are hard for automated searches to find, there are similarities outside the homeodomain that place it firmly in the same group as b-c.

Unlike d, Thomas-Chollier’s “11/13c-like” sequence isn’t that 11/13c-like at all, as you might have guessed from the fact that they weren’t even sure it’s a Hox. The region immediately following the homeodomain (sometimes known as the C-peptide) is very similar to the same part of Hox11/13d. These kinds of motifs can sometimes be used to tell different Hox genes apart. Two C-peptides being strongly similar is a clue that we’re dealing with related genes. However, the homeodomain of Hox11/13e, as we indeed dubbed Thomas-Chollier’s sequence, is really, really weird. It isn’t just unlike 11/13c, it’s unlike anything else I’d seen before. It groups with posterior Hoxes when we test it against a variety of homeodomains, but you wouldn’t know that simply from looking at it.

It is, however, an oddball with a history. As strange as that homeodomain is, once I knew what I was looking for, I found examples in all my other echinoderms. This combination of strong conservation of one Hox gene with considerable differences from other Hox genes just screams “study me more!”, especially when you realise that Hox11/13e appears to be limited to echinoderms (unless something like it is hiding in protostomes…). I looked quite carefully in the hemichordates available to me (Simakov et al., 2015), but the only thing I found that wasn’t one of the “canonical” four posteriors is something called “Abdominal B-like”, which is weird in its own way and not obviously connected to either of our two new genes.

Tangled histories and unhelpful clues

I alluded to the question of Hox11/13b-c origins earlier on. Posterior Hox genes in deuterostomes are notoriously difficult to classify (Ferrier et al., 2000; Thomas-Chollier et al., 2010). When you try to use traditional tree-building methods on them, you get a big unresolved mess, as if the twigs on the tree emerged from an impenetrable mist that hides the arrangement of the older branches from view. Ambulacrarians are definitely the better-behaved half of the Deuterostomia in this regard, since we can say with some confidence that Hox9/10, 11/13a and at least a single precursor to 11/13b-c were present in their last common ancestor.

Nonetheless, two new genes, at least one of which is clearly close to 11/13b-c, complicate matters (Abdominal B-like, as they say in scientist-speak, is beyond the scope of this work). Were they lost in hemichordates? Did echinoderms undergo extra gene duplications, and if so, was it from one or two ancestral genes? Where on earth does Hox11/13e fit? I did a lot of exploratory tree-building for this paper, none of which was particularly helpful in answering those questions.

My other hope was to look at the parts of the protein sequence that led me to my new Hoxes in the first place: all the stuff other than the homeodomain. Using a program called MEME, I found a fair few conserved motifs, but they only seemed to add to the confusion. Hox11/13e, for which I only had first exons (and tentative ones at that) from sea urchins and sea stars, yielded nothing of use apart from its striking C-peptide. In the others, the distribution of motifs created a patchwork of similarities that didn’t neatly align with any one possible history. Echinoderm Hox11/13c mostly did its own thing, while b and d each shared a different subset of motifs with one or both of the hemichordate b-c proteins.

I’m almost inclined to think that there was a single, “prototype” Hox11/13b+ sequence in the ambulacrarian ancestor, which contained all of the motifs I found. In that scenario, separate b and c (and d and maybe e) genes would have evolved independently in hemichordates and echinoderms, and each descendant gene would have lost some of the original motifs more or less at random. Duplicated genes can split the functions of their single ancestor between them (Force et al., 1999), so why not motifs? Short sequence motifs like the ones I was looking for can have important functions, after all. It’s a possibility, but we may never know for sure.

Hox genes gone rogue

I mentioned before that Hox11/13d was outside the Hox cluster. Well, so is Hox11/13e. As far as I can tell, Hox11/13d and e always reside on separate chunks of the genome from any other Hox gene, including each other. They are always accompanied by neighbouring genes that aren’t Hoxes. Although detachment of a posterior gene from an otherwise apparently intact Hox cluster also happened in ragworms (Hui et al., 2012), it’s still a surprise in echinoderms. Since the relationship between the organisation of Hox genes and their regulation in space and time is… kinda complicated, we can’t really tell what, if anything, all this wandering implies without actually looking at some gene expression.

What are they for?

Then there’s the question of what on earth these genes do. Thanks to Tsuchimoto and Yamaguchi (2014), we know that Hox11/13d is active in later embryonic stages of some sea urchins. It even looks like it might be working with Hox11/13b in a Hox-like fashion, the two of them having adjacent expression domains. We have some transcriptomic evidence that this gene is also active in other sea urchins, brittle stars and starfish, but no idea what it’s doing in any of the above.

We know even less about Hox11/13e. The only evidence for expression I’m aware of is from starfish testicles, and testicles will express any old piece of DNA with an “on” switch. If it’s somehow involved in development, it must be either at very low levels that are difficult to capture in a transcriptome, or at developmental stages that weren’t included in the data I encountered.

If it does have a role in adult echinoderm development, that would be crazy exciting, as both adult echinoderm anatomy and Hox11/13e are so weird and unique. Although they develop from bilaterally symmetrical larvae, adult echinoderms have dispensed with the symmetry that gave Bilateria its name. Instead, like a sea anemone (or a regular anemone…), they are radially symmetrical. Hox genes are involved in both larval and adult development in echinoderms, but from what little I’ve been able to glean from the existing literature, it’s different subsets in larvae and adults rather than the entire Hox cluster together. Is Hox11/13e in the “adult” subset, missed until now due to its unusual sequence? I really hope someone with a lab and a ready supply of baby echinoderms investigates in the near future…

A lesson about expectations

I could go on for a lot longer about this project, but it’s probably time to form some sort of conclusion. For me, perhaps the most important take-home message of this adventure is not what I found, but how and where and why I found it.

I didn’t set out to discover anything. All I wanted to do was collect and organise information already out there. (If a genie popped out of my desk lamp, I might just wish for a full-time job where I get to build my Hox directory… given the volume of genome data already out there and coming out every time I look, continuing this as a hobby project in my free time seems hopelessly Sisyphean now.)

The discovery of Hox11/13d and all that followed was an accidental side effect of my penchant for perfectionism. If I’d contented myself with the homeodomains most students of Hox evolution focus on, I would never have seen a Hox that wasn’t in the books, a Hox I hadn’t expected to exist.

Expectations are important. I’d told myself that I wanted to make sure I had everything, but when my searches spat out a hundred different results, I started to slack off soon after I ticked off the Hoxes I knew. I gave the rest of the hit list a half-hearted effort at best. Hox11/13d has a homeodomain that’s split across two exons, and Hox11/13e is weird. In a search that scores both the closeness and the length of a match, that pushes them to the bottom of the results, where a casual observer, or an observer who thinks they know what they’re looking for, will most likely miss them. I thought I knew that sea urchins had a single, intact(ish) Hox cluster with 11 genes. I’d read a pretty good paper on it. Only the paper wasn’t quite right, after all.
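To make the ranking problem concrete, here is a toy Python sketch. The scoring rule (a fixed reward per identical position, a fixed penalty per mismatch) is a deliberately crude stand-in for what real search tools do, and every number in the hit list is invented; the only thing borrowed from reality is that a homeodomain is roughly 60 amino acids long.

```python
# Toy ranking of search hits. The scoring scheme and all numbers are invented
# for illustration; this is not how any particular search tool scores matches.
from typing import Dict, Tuple


def toy_score(length: int, identities: int, match: int = 2, mismatch: int = -1) -> int:
    """Score an ungapped match: reward identical positions, penalise the rest."""
    return match * identities + mismatch * (length - identities)


# hit name -> (aligned length, identical positions)
hits: Dict[str, Tuple[int, int]] = {
    "familiar Hox, intact homeodomain": (60, 54),
    "split homeodomain, first part": (35, 30),
    "split homeodomain, second part": (25, 21),
    "divergent homeodomain": (60, 30),
}

# Best-scoring hit first, as in a typical results page.
ranked = sorted(hits, key=lambda name: toy_score(*hits[name]), reverse=True)

for name in ranked:
    length, identities = hits[name]
    print(toy_score(length, identities), name)
```

Each half of the split homeodomain is a perfectly good match, but neither is long enough to compete with an intact one, and the divergent homeodomain is long but not close. Both end up where a tired human stops reading.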

To me, this study stands as a reminder to keep looking. In an era when new genomes are popping up left and right and Big Data with automated analyses is the scientific zeitgeist, it’s still worth rolling your sleeves up, picking up the old magnifying glass and taking a closer look – even in organisms you think you know. You might just chance upon some real treasure.



*A “Hox11/13c” behaving oddly should be immediately suspicious based on what I saw in my own trees, where echinoderm Hox11/13c consistently formed a strongly supported group. But that’s hindsight for you…



Baughman KW et al. (2014) Genomic organization of Hox and ParaHox clusters in the echinoderm, Acanthaster planci. Genesis 52:952-958

Cameron RA et al. (2006) Unusual gene order and organization of the sea urchin hox cluster. JEZ B 306:45-58

Delroisse J et al. (2016) De novo adult transcriptomes of two European brittle stars: spotlight on opsin-based photoreception. PLoS ONE 11:e0152988

Duboule D (2007) The rise and fall of Hox gene clusters. Development 134:2549-2560

Endo M et al. (2018) Hidden genetic history of the Japanese sand dollar Peronella (Echinoidea: Laganidae) revealed by nuclear intron sequences. Gene 659:37-43

Ferrier DEK et al. (2000) The amphioxus Hox cluster: deuterostome posterior flexibility and Hox14. Evol Dev 2:284-293

Force A et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545

Freeman R et al. (2012) Identical genomic organization of two hemichordate Hox clusters. Curr Biol 22:2053-2058

Hui JH et al. (2012) Extensive chordate and annelid macrosyntheny reveals ancestral homeobox gene organization. Mol Biol Evol 29:157-165

Simakov O et al. (2015) Hemichordate genomes and deuterostome origins. Nature 527:459-465

Szabó R and Ferrier DEK (2018) Two more Posterior Hox genes and Hox cluster dispersal in echinoderms. BMC Evol Biol 18:203

Thomas-Chollier M et al. (2010) A non-tree-based comprehensive study of metazoan Hox and ParaHox genes prompts new insights into their origin and evolution. BMC Evol Biol 10:73

Tsuchimoto J and Yamaguchi M (2014) Hox expression in the direct-type developing sea urchin Peronella japonica. Dev Dyn 243:1020-1029

Return to Origin part 2

In which Darwin’s Introduction sends me off on tangents about academic writing, gender and the nature of explanations.

The Origin of Species reread returns! Eventually! So much for increasing my productivity, but hey, at least I didn’t give up after the first one! (For the record, this post has been 99% written for the past month. It only took me that long to convince myself that hitting the “publish” button won’t turn me into the laughing stock of the universe.)

This won’t be as long as Part One, since the Introduction isn’t as long as the Historical Sketch either. In comparison with modern scientific works, the Intro is basically the abstract of Origin, mixed with a few acknowledgements. It covers pp. 65-69 of my copy.

It’s amusing and endearing how much of the first couple of pages is spent swearing up and down that Darwin didn’t pull his theory out of his backside. Also, the “sorry I couldn’t give you all the facts, I had to be brief” apology always cracks me up – if 400 pages full of facts is your idea of brevity, man, you should be writing epic fantasy, not science 😛 (Also: perfectionist much?)

I have written my own handful of scientific articles in my time as a PhD student, which definitely gives one a different perspective on some of the writing conventions in such works. (It should go without saying, but this is my individual perspective; I certainly don’t claim to represent all writers of scientific articles.) When authors talk about caution and caveats and more data being needed, I think most of the time they are both sincere and not. Scientists – the ones I’ve met, at least – generally seem like decent people who honestly worry about getting stuff right and not letting wishful thinking get in the way of good science.

However, when you’re preparing a manuscript for a peer-reviewed academic journal, there is always an element of satisfying reviewers, and if you sound more confident than the reviewers think your data warrant, they will comment on that. Adding caveats is not just a sign that you understand the limitations of your work, it is also insurance against being hassled by editors and reviewers. (And then there’s always throwing a bone to your worst enemies just in case they try to sabotage your paper, because scientists can be just as petty and occasionally awful as humanity at large, and often, anonymity doesn’t actually make it that much harder to figure out whose paper you’re reviewing.)

With all that said, it never occurred to me that Darwin wasn’t perfectly sincere in his numerous apologies for not providing even more evidence. He just doesn’t seem like that kind of guy. Please don’t disillusion me. I’m a giant ol’ sap at heart, okay?

P65 has another shoutout to Wallace, and p66 a huge acknowledgement to Hooker (an eminent scientist in his own right). This Darwin-Hooker bromance is making me all mushy inside! (See above: giant, sappy)

Pp66-7 contain, aside from another little dig at the Vestiges of Creation, some first-class philosophy fodder. Here, Darwin emphasises the importance of providing mechanisms when positing a new phenomenon. Lots of people, he says, might look at the similarities among species and conclude that different species have descended from common ancestors. “Nevertheless,” he continues, “such a conclusion, even if well founded, would be unsatisfactory, until it could be shown how the innumerable species inhabiting this world have been modified, so as to acquire that perfection of structure and coadaptation which most justly excites our admiration.”

Do we agree with this assessment? How much is suggesting a “what” worth without an accompanying “how”? And how necessary is a mechanism for the acceptance of a new scientific idea? The simple, distilled high-school science class version of the story of continental drift, for example, tells you that Alfred Wegener was laughed out of the room because he couldn’t say what force might make continents waltz across the surface of the planet. Then someone came up with mantle convection, and Wegener’s idea finally triumphed. The actual story, as is usually the case, seems a bit more complicated than that, but it does sound like the general acceptance of the idea needed that mechanistic underpinning that its proponents couldn’t quite provide at first.

While looking for scientific ideas that might have been widely accepted without that underpinning, I found myself getting really philosophical and wondering what counts as a mechanism. Perhaps this is easier to answer in biology, where most explanations can at least be conceptualised. One doesn’t have much difficulty imagining some individuals being better at procreation than others, and babies resembling their parents (the very dumbed-down essence of natural selection). What about physics, where shit gets really weird and soon leaves the realm of human experience when you start digging deep enough? Did physicists accept concepts like gravity, dark matter and dark energy because the maths worked out, because the observations were so bloody obvious that something had to be going on, or because “attractive force”, “weakly interacting massive particle” or “vacuum energy” make sense to human brains? (Of course, I wouldn’t expect a physicist to accept anything based solely on the third, but where the maths could go multiple ways, as – so far as I understand – on the boundaries of modern cosmology, is it easier to lean towards the equations that correspond to concepts that make the most sense?)

… I guess what I’m saying is that this stuff is fascinating to ponder, and if anyone points me to a readable discussion of the subject by someone who actually knows what they’re talking about, I might well put it on my ever-expanding reading list…

P67 then reminded me how times have changed since Darwin’s day. Here, he discusses “man” and his “great power” in “accumulating slight variations”. Every time he talks about something humans did, it’s always a “he” (well, at least up to the end of the next chapter 😛 ). We’ve certainly come a long way when it comes to recognising the rest of humanity’s role in history…

This is where I decided that I needed to keep an eye out for any mention of female scientists (or just women in general) – women of science have existed for as long as science itself, but I’m curious whether Darwin drew on the work of any. It’s always satisfying to see women’s achievements recognised by their male contemporaries, especially in times when it wasn’t fashionable to do so. It would be extra satisfying to see it from a man I like and admire in his own right.

There is not much to say about the rest of the Introduction, except to note that it’s a decent summary of Darwin’s evolutionary theory. He lists the basic elements of the theory (variation + competition = natural selection + extinction), the main categories of evidence he used to come to his conclusions (artificial selection, embryology, ecology, biogeography, fossils) and the main questions that the theory must answer (novelties of morphology and behaviour, the sterility of hybrids, and the gaps in the fossil record). All of these will make extended appearances in the course of the book.

The last paragraph of the Intro is such a typical conclusion to a scientific abstract that I had to smile when reading it. There is still much to be learned, but the author is convinced that he is right about X, Y and Z. Not saying this is a bad way to conclude an introduction – all I’m saying is that for me, it’s a well-worn trope of academic writing that echoes with the voices of a thousand other works.

Next time, we’ll get into the meat of Origin proper. It turns out that the meat in Origin is often pigeon. (Seriously. Darwin was obsessed with pigeons.)

Return to Origin part 1

Introducing my new pet project

The Mammal has had an idea to boost her productivity! (OK, I’ve actually had this idea pretty much since I started CM. Speed has never been among my selling points 😛 ) Since most of the posts I start writing about current science seem to die in their cradles these days, I have decided to go back to Ye Olde Science and try my hand at an Origin of Species re-read. Because Origin is one of the foundational works of my discipline, and it proved to be a lot more interesting than a callow undergraduate student of evolutionary biology had expected. Also, there is a lot of material in it, and on previous reads I’ve had no shortage of thoughts about it. There is a faint chance it can sustain a few months of blog posts 🙂

I’ve never been too bothered about reading “the classics”. The classics of literature that school forced me to read often turned out to be terribly written, boring tomes whose “profound” meanings held little interest for a young person. Much of my higher education gave me the feeling that biology is such a fast-moving field that anything older than a decade or so is probably of little value except as a historical curiosity. I was, it turns out, very wrong about that for more than one reason. First, you don’t really start appreciating the value of older literature until decades-old, obscure zoological papers are the only place you can find any information at all about the question you are researching. Biology may move fast when it comes to the genetics of well-known model organisms, but it sure as hell takes its time in investigating the development and regeneration of serpulid opercula.

Second, history can be interesting in itself. Science is not a series of independent discoveries; it is a complex, organic growth cultivated by interconnected minds embedded deep in their respective societies. Which ideas get picked up and which ones are forgotten doesn’t just depend on the quality of the evidence but also on the zeitgeist. (The first example I thought of was, disturbingly, the triumph of Lysenkoism over real genetics under Stalin. I guess straight-up imprisoning or executing anyone who doesn’t like your pet advisor’s pet theory kind of counts as an effect of the zeitgeist?)

I first decided to read Darwin’s Origin a number of years ago out of curiosity. I was, after all, studying for an evolutionary biology degree, and Origin was kind of the book that kicked it all off. (I think it might have been on offer at the university bookstore when I went to buy some textbooks, too.) To be honest, I fully expected a boring, painfully outdated, horribly convoluted book. I expected reading it to be a chore. I certainly didn’t expect to find beauty – never mind revelations – in it.

Suffice to say I was pleasantly surprised. While Darwin’s writing style can be a drag for a modern reader with an attention span trained on Facebook posts and cat memes (he’s waaaaaay more fond of run-on sentences than I am, and I have to actively restrain myself from letting them grow out of control), it is quite beautiful at times. What’s more interesting from the scientist’s perspective – while a lot of Origin is indeed outdated, there are some surprisingly “modern” ideas that I never would have expected to find in a 19th century book. The third major surprise of my first read was the sheer amount of data involved. I suppose I’d always known that Darwin didn’t pull his theory out of thin air, but I hadn’t realised just how much careful observation went into his best-known work. No wonder it took him decades to finish [1].

And so, from the vantage of a few more years of learning, I decided to give Origin of Species another read and document my thoughts along the way. I have no idea if this is going to work out, but hopefully publishing the first part will give me the incentive to carry on.

Origin saw six editions altogether. The version I have, the 1985 Penguin Classics edition, contains the text of the first edition, but also includes the Historical Sketch that Darwin added later. The Sketch itself went through a number of revisions; I’m not entirely sure which version my copy has. (Origin can be read free of charge in several places online, including TalkOrigins and Darwin Online. TalkOrigins’s version is also a first edition text with the Sketch added.) It is with this Sketch that I’m going to kick off the re-read.

I won’t even attempt to summarise chapters, and I might have to split most of them into multiple posts for sheer length. Comments might be a bit disjointed, since they reflect thoughts that came into my head as I was reading – sometimes connected, but often quite scattered and tangential. Any page numbers refer to my Penguin edition, but I’ll try to remember to give some pointers (paragraph descriptions, section headings, quotes) for anyone reading a different version.

Darwin on the shoulders of giants – the Historical Sketch (pp53-63)

I find the Historical Sketch (hereafter: HS) tremendously interesting. I’m not entirely sure why it appeared in later editions of Origin; I had assumed that it was a response to claims that he was ripping people off, but googling the subject yielded surprisingly little information. Johnson (2007) calls its origin “somewhat obscure,” and Darwin’s own statements on the matter contradictory. Darwin’s correspondence doesn’t even clarify when the HS was written, let alone why. Similar historical introductions, Johnson notes, are not uncommon in scientific writings of the era, and it is quite possible that Darwin was already drafting one for his “big species book” (of which Origin was the abridged version) years before the publication of Origin, but not much evidence remains to fill in the details.

General observations

Regardless of where it comes from, the HS is an intriguing little run-down of the history of evolutionary ideas as Darwin saw it. If there is one take-home message from this brief preface to Origin, it is that no scientific advance is a lightbulb suddenly blinking on in the dark. Ideas have roots, and complicated ideas come together from many different roots, some of them, in this case, going right back to antiquity.

Any good biology curriculum includes some of the researchers and thinkers featured in the HS – who hasn’t heard of Lamarck and his silly-silly inheritance of acquired characteristics, for example?[2] However, schools tend to gloss over the sheer quantity of evolutionary thinking going on in late 18th and early- to mid-19th century biology. Well, in his ten-and-a-half-page summary, Darwin discusses 34 authors, all of whom entertained the idea that species might change over time, and many of whom considered possible mechanisms for such change. Most of these guys I’d either never heard of at the time I first read Origin, or I’d never known they had a connection to evolution. All in all, the HS definitely gives the impression of a biological community ripe for an evolutionary revolution.

Finally: oh GODS, some of this is so funny. The HS exhibits some prime examples of the kind of borderline impolite academic snark that you can also find in today’s scientific debates. Having done research and written a few papers myself, I find academic snark doubly entertaining; just how many ways can you call someone an idiot while maintaining that essential veneer of professionalism?

A page-by-page trip through Tangentia

A.k.a. any old silliness that popped into my head along the way.

Victorian titling conventions: clearly long before the invention of clickbait! The full title of Origin is the unwieldy The Origin of Species by Means of Natural Selection, or The Preservation of Favoured Races in the Struggle for Life. [3] The HS is technically called An Historical Sketch of the Progress of Opinion on the Origin of Species Previously to the Publication of the First Edition of This Work. Quite a mouthful, but at least it tells you exactly what to expect. None of this “You Won’t Believe What This Man Found in His Soup!” nonsense!

Right off the bat, on p53: giant footnote that takes up half the page AND some of the next page. The HS has several of those, and I’m not entirely sure why they aren’t simply part of the main text. Luckily, the rest of the book is blissfully devoid of them.

By the way, this first footnote is pretty interesting. To me, anyway, since I spent a lot of time interacting with creationists, and by the quotes Darwin gives here, Aristotle (!) got something right that most creationists (or most people?) struggle with to this day. That being the idea that the traits of organisms do not arise in order to fulfil some goal – they just are, and if organisms seem well-adapted to their circumstances, that is because any that weren’t were exterminated by said circumstances.

P55, the discussion of Étienne Geoffroy Saint-Hilaire (the guy who sort-of invented dorsoventral inversion) and his ideas about the descent of species from original “types”: I love how Darwin just assumes that his readership knows French. This isn’t the last untranslated French quote in this book by a long shot. (This particular one, Google Translate tells me, basically boils down to “we need more research”.)

Also on the same page, in the next paragraph about WC Wells’s views – I honestly hadn’t known that “negroes and mulattoes [enjoying] immunity from certain tropical diseases” was already established in the early 19th century. Darwin doesn’t detail which diseases – I wonder if malaria is among them, seeing as sickle cell trait and malaria resistance is one of the textbook examples of heterozygote advantage in modern courses on evolution. I’m quite impressed (though maybe I shouldn’t be) that not only were scientists aware of differences in disease susceptibility, but they also attributed these to something akin to natural selection. (Although I’m pretty certain that an understanding of the genetics was far out of reach for the naturalists of the time.)

On the next couple of pages, there are at least three allusions to archetypes that related species are thought to have diverged from. There seems to be a theme running through all of these “type” concepts, although Darwin doesn’t always give direct quotes, so I don’t know how accurate his descriptions of his colleagues’ views are. In connection with Geoffroy Saint-Hilaire, he mentions related species being “degenerations of the same type”; W. Herbert supposedly suggested “highly plastic” original forms to be the ancestor of each plant genus, and Rafinesque (this is a direct quote) wrote that “varieties are gradually becoming species by assuming constant and peculiar characters”, that is, “except the original types or ancestors of a genus”.

So the general thrust of this seemingly fashionable idea is that the ancestors of living species were more variable and less specialised than their descendants. Does anyone hear definite SJ Gouldian undertones here? Isn’t this basically the late great Gould’s view of the Cambrian Explosion in a nutshell? I guess I should have expected this idea to go very far back, what with Platonic ideals and all that, but it still took me by surprise to find it in this context.

… also, it took me until this point, nearly halfway through the HS, to realise that Darwin was going in chronological order. I blame sleep deprivation.

P57 is where I had a sudden “I really should know this” moment. I’m reading this bit and thinking, who the fuck wrote the Vestiges of Creation? I distinctly remembered hearing about it in class years ago, but I couldn’t for the life of me attach a name to it. For the record, Vestiges, a pop-sci book about the evolution of everything (written by a Scotsman named Robert Chambers) was originally published anonymously, so I’m not going to feel too bad about not remembering the author.

Here comes our first example of wonderful academic snark. Vestiges, by Darwin’s account, sounds like a big heap of vitalistic mumbo-jumbo, complete with ladder-thinking and generally likely to make me tear my hair out should I ever be brave enough to read it. I get the distinct impression that I share this opinion with one Mr Darwin – heck, I’m just going to quote his description of Vestiges in its full glory:

“But I cannot see how the two supposed ‘impulses’ account in a scientific sense for the numerous and beautiful co-adaptations which we see throughout nature; I cannot see that we thus gain any insight how, for instance, a woodpecker has become adapted to its peculiar habits of life. The work, from its powerful and brilliant style, though displaying in the earlier editions little accurate knowledge and a great want of scientific caution, immediately had a very wide circulation. In my opinion it has done excellent service in this country in calling attention to the subject, in removing prejudice, and in thus preparing the ground for the reception of analogous views.”

So: it’s an overenthusiastic pile of pseudoscience that fails to actually explain anything, but I guess it’s… well-written? Oh, ol’ Chuck, you’re such a diplomat.

(Totally random aside: despite leaning towards biology and occasionally astronomy from an early age – so, definitely NOT chemistry – my first real encounter with vitalism, the belief that living things run on some kind of special life force, was in a book about the history of chemistry. Specifically, I learned how the first synthesis of an organic compound – urea – from completely inorganic sources dealt a great blow to the whole life force thing. The book in question was written in 1960s socialist Hungary and was, if memory serves, quite ideologically charged in places. That book would make for another interesting re-read, though probably not for anyone besides myself…)

Immediately after delivering third-degree burns to Vestiges, Darwin unleashes his diplomatic snark on Richard Owen, who was a bit of a… character. He has a reputation for trying to pass other people’s discoveries off as his own, and apparently Darwin’s natural selection was one of his targets. In the HS (p59 of my copy), Darwin summarises his take on Owen thusly:

“It is consolatory to me that others find Professor Owen’s controversial writings as difficult to understand and to reconcile with each other, as I do. As far as the mere enunciation of the principle of natural selection is concerned, it is quite immaterial whether or not Professor Owen preceded me, for both of us, as shown in this historical sketch, were long ago preceded by Dr Wells and Mr Matthews.”

Or: Owen, WTF are you on about?

P60 – It appears that Herbert Spencer was the granddaddy of evolutionary psychology, in that he was the first to propose that mental capacities could evolve gradually in the same way physical characteristics can. Cool.

(Also in this general area: more untranslated French quotes. From Geoffroy Saint-Hilaire, not Spencer.)

P61: Add Naudin to the “plastic archetypes” fan club. And have YET MORE French.

There is an interesting, if only tangentially related, footnote here. All through my reread of the HS I was waiting for Darwin to explain why his take on evolution was special and important, and contrary to my hazy recollection, he never does. (Not in the HS, anyway; my memory of Origin proper is not good enough to recall whether he does it later.) The closest he gets is in the p61 footnote, where he notes that 27 of the 34 authors he discusses “have written on special branches of natural history or geology”, which I interpret to mean that a general discussion of evolutionary theory had been lacking up to that point. So Origin’s perk is the breadth of its coverage? I’m not going to disagree with that…

(Our regular scheduling is interrupted for more French quotes. *le sigh*)

On p62, we finally get to the “other guy”, Alfred Russel Wallace, whom everyone always forgets about. Darwin doesn’t go into great detail; I suppose he figured the fact that they kind of published their theories of natural selection together sufficed. However, he does praise Wallace’s 1858 essay that made its way into the Journal of the Linnean Society for its “admirable force and clearness”. Darwin does seem like a guy who gives credit where credit is due.

Something I found interesting a few paragraphs from the end of the HS: how the same “Great Man” can mean totally different things to different people. I know Karl von Baer as one of the founding fathers of evo-devo, with his famous laws of embryology. However, when Darwin mentions von Baer’s belief in common descent, he only says that it was based mainly on biogeography. Are we even talking about the same von Baer??

Finally, the HS concludes with a little hat-tip to Darwin’s long-time friend, correspondent and fellow nerd, Joseph Hooker. Hooker will make many more appearances in Origin if memory serves – he was the source of many of the observations on which Darwin built his mighty edifice. Those two: geeky bromance of the (19th) century.

Concluding thoughts

Reading this Historical Sketch again made me wonder why it is Darwin that we remember today as “the” father of evolution. Origin may not be a totally academic work, but it sure as hell isn’t light reading. Yet it was immensely popular – the first edition sold out as soon as it was published, and the book saw six editions during Darwin’s lifetime. Was it the completeness of his treatment? His excellent social network? Was it simply a case of right place, right time? I should probably let historians of science ruminate on that. Instead, I shall move on to the Introduction. But not today. Definitely enough meandering for today!


(Small) footnotes:

[1] Which, by the way, he still considered unfinished at the time of publication, but I’m jumping ahead of myself here. [Back to post]

[2] As we’ll (hopefully) see later in the re-read, this idea didn’t seem quite so silly at the time – Darwin himself didn’t fully discount it. He did scoff at other components of Lamarck’s theory of evolution, however. [Back to post]

[3] No, he does not use “race” in that sense. [Back to post]



Johnson CN (2007) The preface to Darwin’s Origin of Species: the curious history of the “Historical Sketch”. Journal of the History of Biology 40:529-556.

To dump a chunk of trunk

The Mammal has deemed that Hox genes and good old-fashioned feel-good evo-devo are a good way to blink back to life*. Also, tardigrades. Tardigrades are awesome. Here is one viewed from above, from the Goldstein lab via Encyclopedia of Life:


Tardigrades or water bears are also a bit unusual. Their closest living relatives are velvet worms (Onychophora) and arthropods. Exactly who’s closest to whom in that trio of phyla collectively known as the Panarthropoda is not clear, and I don’t have the energy to wade into the debate – besides, it’s not really important for the purposes of this post. What Smith et al. (2016) concluded about these adorably indestructible little creatures holds irrespective of their precise phylogenetic position.

Anyway. I said tardigrades were unusual, and I don’t mean their uncanny ability to survive the apocalypse and pick up random genes in the process (Boothby et al., 2015). (ETA: so apparently there may not be nearly as much foreign gene hoarding as the genome paper suggests – see Sujai Kumar’s comment below! Doesn’t change the fact that tardigrades are tough little buggers, though 🙂 ) The oddity we’re interested in today lies in the fact that all known species are built to the exact same compact body plan. Onychophorans and many arthropods are elongated animals with lots of segments, lots of legs, and often lots of variation in the number and type of such body parts. Tardigrades? A wee head, four chubby pairs of legs, and that’s it.

How does a tardigrade body relate to that of a velvet worm, or a centipede, or a spider? Based solely on anatomy, that’s a hell of a question to answer; even the homology of body parts between different kinds of arthropods can be difficult to determine. I have so far remained stubbornly uneducated on the minutiae of (pan)arthropod segment homologies, although I do see papers purporting to match brain parts, appendages and suchlike between different kinds of creepy-crawlies on a fairly regular basis. Shame on me for not being able to care about the details, I guess – but the frequency with which the subject comes up suggests that the debate is far from over.

Now, when I was first drawn to the evo-devo field, one of the biggest attractions was the notion that the expression of genes as a body part forms can tell us what that body part really is even when anatomical clues are less than clear. That, of course, is too good to be simply true, but sometimes the lure of genes and neat homology stories is just too hard to resist. Smith et al.’s investigation of tardigrade Hox genes is definitely that kind of story.

Hox genes are generally a good place to look if you’re trying to decipher body regions, since their more or less neat, orderly expression patterns are remarkably conserved between very distantly related animals (they are probably as old as the Bilateria, to be precise). A polychaete worm, a vertebrate and an arthropod show the same general pattern – there is no active Hox gene at the very front of the embryo, then Hoxes 1, 2, 3 and so on appear in roughly that order, all the way to the rear end. There are variations in the pattern – e.g. the expression of a gene can have sharp boundaries or fade in and out gradually; different genes can overlap to different extents, the order isn’t always perfect, etc. – but staggered Hox gene expression domains, with the same genes starting up in the same general area along the main body axis, can be found all across the Bilateria.

Tardigrades are no exception, in a sense – but they are also quite exceptional. First, their complement of Hox genes is a bit of a mess. At long last, we have a tardigrade genome to hand, in which Smith et al. (2016) found good honest Hox genes. What they didn’t find was a Hox cluster, an orderly series of Hox genes sitting like beads on a DNA string. Instead, the Hox genes in Hypsibius dujardini, the sequenced species, are all over the genome, associating with all kinds of dubious fellows who aren’t Hoxes.

What Smith et al. also didn’t find was half of the Hox genes they expected. A typical arthropod has ten or so Hox genes, a pretty standard ballpark for an animal that isn’t a vertebrate. H. dujardini has only seven, three of which are copies of Abdominal-B, a gene that normally exists in a single copy in arthropods. So basically, there are only five kinds of Hox gene – number two and most of the “middle” ones are missing. What’s more, two more tardigrades that aren’t closely related to H. dujardini also appear to have the same five Hox gene types (though only one Abd-B each), so this massive loss is probably a common feature of Tardigrada. (No word on whether the scattering of the Hox cluster is also shared by the other two species.)
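For the bean counters among us, the gene arithmetic above can be sketched in a few lines of Python. Treat it as a toy tally rather than data from the paper: lab, Dfd, ftz and Abd-B are named elsewhere in this post, Hox3 as the fifth tardigrade gene type is my assumption (it fits with "number two" being missing), and the ten-gene arthropod list is the usual textbook set.

```python
from collections import Counter

# Textbook arthropod Hox complement (ten genes), used as a reference list.
ARTHROPOD_HOX = ["lab", "pb", "Hox3", "Dfd", "Scr",
                 "ftz", "Antp", "Ubx", "abd-A", "Abd-B"]

# Illustrative tally for H. dujardini: seven genes, three of them Abd-B copies.
# Hox3 as the fifth type is an assumption, not something stated in this post.
TARDIGRADE_HOX = ["lab", "Hox3", "Dfd", "ftz", "Abd-B", "Abd-B", "Abd-B"]


def summarise(genes, reference):
    """Count gene copies, distinct gene types and reference types that are absent."""
    counts = Counter(genes)
    missing = [gene for gene in reference if gene not in counts]
    return {
        "total": len(genes),
        "types": len(counts),
        "copies": dict(counts),
        "missing": missing,
    }


summary = summarise(TARDIGRADE_HOX, ARTHROPOD_HOX)
print(summary["total"], "genes,", summary["types"], "types")
print("missing:", ", ".join(summary["missing"]))
```

Seven genes collapse to five types once the Abd-B copies are counted once, and five of the ten reference types are absent, which is where the "half of the Hox genes" figure comes from.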

We know that the genes are scattered and decimated, but are their expression patterns similarly disrupted? You don’t actually need an intact Hox cluster for orderly Hox expression, and indeed, tardigrade Hox genes are activated in a perfectly neat and perfectly usual pattern that resembles what you see in their panarthropod cousins. Except for the bit where half the pattern is missing!

Here’s part of Figure 4 from the paper, a schematic comparison of tardigrade Hox expression to that of other panarthropods – a generic arachnid, a millipede and a velvet worm. (otd is a “head” gene that lives in the Hox-free anterior region; lab is the arthropod equivalent of Hox1, Dfd is Hox4, and I’m not sure which of Hox6-8 ftz is currently supposed to be.) The interesting thing about this is that according to Hox genes, the entire body of the tardigrade corresponds to just the front end of arthropods and velvet worms.


In addition, one thing that is not shown on this diagram is that Abdominal-B, which normally marks the butt end of the animal, is still active in the tardigrade, predictably in the last segment (L4, that is). So if you take the Hox data at face value, a tardigrade is the arse end of an arthropod tacked straight onto its head. Weird. It’s like evolution took a perfectly ordinary velvet worm-like creature and chopped out most of its trunk.

The tardigrade data suggest that the original panarthropod was probably more like arthropods and velvet worms than tardigrades – an elongated animal with many segments. The strange tardigrade situation can’t be the ancestral one, since the Hox genes that tardigrades lack long predate the panarthropod ancestor. Now, it might be possible to lose half your Hox genes while keeping your ancestral body plan, but an unusual body plan and an unusual set of Hox genes is a bit of a big coincidence, innit?

Smith et al. point out that the loss of the Hox genes was unlikely to be the cause of the loss of the trunk region – Hox genes only specify what grows on a segment, they don’t have much say in how many segments develop in the first place. Instead, the authors reason, the loss of the trunk in the tardigrade ancestor probably made the relevant Hox genes dispensable.

Damn, this story makes me want to see the Hox genes of all those oddball lobopodians from the Cambrian. Some of them are bound to be tardigrade relatives, right?



Boothby TC et al. (2015) Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. PNAS 112:15976-15981

Smith FW et al. (2016) The compact body plan of tardigrades evolved by the loss of a large body region. Current Biology 26:224-229


*The Mammal has been pretty depressed lately. As in mired up to her head in weird energy-sucking flu. Unfortunately, writing is one of those things that the damn brain monster has eaten most of the fun out of. Also, I have a shitty normal person job at the moment, and shitty job taking up time + barely enough motivation to crawl out of bed and pretend to be human means I have, at best, one afternoon per week that I actually spend on catching up with science. That is just enough to scroll through my feeds and file away the interesting stuff, but woefully insufficient for the writing of posts, not to mention that my ability to concentrate is, to be terribly technical, absolutely fucked. It’s not an ideal state of affairs by any stretch, and I’m pretty sure that if I made more of an effort to read and write about cool things, it would pay off in the mental health department, but… well. That sort of reasonable advice is hard to hear with the oozing fog-grey suckers of that thing clamped onto my brain.

Worldbuilding. With SCIENCE!

Today, I felt like meandering around a random piece of my mind that is a bit outside my usual blogging territory. Most of my academic reading (and consequently, most of my stuff here) is in the general areas of evolutionary biology, developmental biology, palaeontology and intersections thereof. Occasionally I’ll see something about abiogenesis or exoplanets or animal cognition and read it for the coolness. However, besides being a scientist, I also happen to be an avid reader and occasional writer of fantasy fiction, and one of the most appealing aspects of that genre for me is worldbuilding.

I am fascinated by the diversity of human cultures; the myriad different ways of seeing the world and constructing identities for ourselves. I love reading novels with interesting, well thought-out cultures, and tinkering with my own world is one of my favourite pastimes. If I had unlimited money and weren’t the lazy sod I am, I’d probably be thinking about getting a cultural anthropology degree on top of my first one in evolutionary biology*. Since I have very limited money and motivation, I content myself with watching out for interesting titles in the generalist journals I read. Even as a worldbuilder, I can’t stop being a scientist, so I love seeing scientific takes on what makes cultures the way they are.

Music, the many ways thereof

The other day, for example, I bumped into an analysis of music from around the world in PNAS. (PNAS is a pretty good general journal for the occasional worldbuilding fodder.) Savage et al. (2015) searched for universal features of human music in about 300 recordings from around the world. It was particularly interesting to me because I have a culture with what I always suspected was a really weird religious prohibition relating to music. From what I can gather from this paper, my suspicion was correct: my little religious gimmick would be very unusual in the real world.

One of the main points of the study, however, is that there aren’t really any truly universal properties of music. There are exceptions even to “self-evident” rules that stem from the way our brains work, like having a regular beat or (if the music isn’t purely percussion) a scale made of discrete pitches. (So: I can do what I want with the music of my imaginary cultures, as long as I don’t make them all weird in the same way. Science says so. *smug face*)

There’s also the fact that most of the music recorded in the database is performed by men despite the fact that women are just as capable of making music. This is a valuable piece of information for a worldbuilder, one I wasn’t (consciously) aware of before I read this paper, and also one that highlights the importance of context. Me being a girl and rather acutely aware of the curses of patriarchy from a young age, I have thought up several societies that are either gender-equal or matriarchal (most of these societies are not human). How would that change the balance? If the hypothesis that male-dominated music has something to do with sexual selection is correct, should we see pretty much equal participation in cultures where both men and women are promiscuous and participate in literal mating displays? (Playing with sexuality in a fantasy world is even more fun than playing with religion! Also, an evolutionary biology degree can give you some really funky worldbuilding ideas…)

(Incidentally, Savage et al. draw a parallel between male-dominated music in humans and male-dominated vocalisations in, among other groups, songbirds. I find it curious that they didn’t mention a recent study that suggested that actually, females probably also sang in the ancestral songbird, and pointed out that this state of affairs is still the norm rather than the exception when you look at the whole group [Odom et al., 2014].)

Religions evolving

Today, I found a paper introducing a really shiny new database in PLoS ONE (which is why I decided to ramble about worldbuilding). “Pulotu” (Watts et al., 2015a) is a free database of supernatural beliefs and practices from 100+ Austronesian cultures, designed to study the cultural evolution of religion. Austronesian peoples originated from Taiwan many thousands of years ago. Today, they inhabit a huge area including Indonesia, Papua New Guinea, New Zealand, zillions of Pacific islands (Polynesians!) and Madagascar. They are a very diverse bunch in every respect, and their family tree is pretty well understood from linguistics and genetics. A decent database of those diverse cultural traits combined with the understanding of history is truly an amazing resource for those interested in how said cultural traits evolve. (Seriously, this thing looks like a goddamned gold mine.)

The authors have clearly done thorough work, using multiple sources, ethnographies written by scholars who actually met the people in question where possible, to characterise each culture. The database has three separate time focuses to distinguish the “pristine” state of a culture from what happened after contact with major religions like Hinduism or Christianity. They recorded both characteristics of religion like the types of supernatural beings worshipped and the types of rituals practised, and characteristics of the societies themselves such as how they get most of their food, and how many layers of political hierarchy they have. You can visualise these features on a map with a couple of clicks, so you can immediately see if they are randomly distributed or found in particular places.

So what can you learn about cultural evolution from this treasure trove? One example the paper gives concerns something I came across years ago when I was researching theories about the evolution of religion for an undergrad assignment. The idea is that fear of supernatural punishment, particularly the belief in “high gods” who punish immoral acts, fosters cooperation and promotes the formation of large and politically complex societies. The supernatural punishment hypothesis has been around for a while, but I think I first encountered it in Johnson (2005).

Johnson tried to test the idea by looking at correlations between belief in moralising high gods and various proxies of cooperation (e.g. size of the society, presence of money lending, centralised authorities) in a cross-cultural sample. However, correlation does not equal causation, so that kind of study leaves it unclear whether moralising gods lead to complex societies or the other way round. With a solid family tree of cultures, though, you can add a historical dimension to a cross-cultural comparison, which lets you work out which trait tends to come first and thus infer the likely direction of causality.

When the Pulotu authors did this (Watts et al., 2015b), they found that Johnson probably got his causal arrow pointing the wrong way. If moralising gods do indeed lead to complex societies, then societies with moralising gods should increase in complexity more often than societies without. What actually seems to be happening in Austronesia is that complex societies came first, and they were more likely to develop beliefs in moralising gods. Nonetheless, a more general version of the supernatural punishment hypothesis, in which agents that aren’t high gods (e.g. karma, ancestors) may do the punishing, is supported by the analysis.
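To make the logic of that test a bit more concrete, here's a very crude sketch in Python. It is emphatically not the phylogenetic analysis Watts et al. actually ran; it just counts, over some branches of a culture tree, how often initially simple societies became complex with and without moralising gods. The branch list and the `gain_rate` helper are made up for illustration and have nothing to do with the real Pulotu data.

```python
# Hypothetical branches of a culture tree, each described as
# (moralising_gods_at_start, complex_at_start, complex_at_end).
# These values are invented for illustration, NOT taken from Pulotu.
branches = [
    (False, False, True),
    (False, False, False),
    (False, False, True),
    (True, False, False),
    (True, False, False),
    (False, True, True),
    (True, True, True),
]

def gain_rate(branches, gods):
    """Fraction of initially simple branches with the given god state that ended up complex."""
    at_risk = [b for b in branches if b[0] == gods and not b[1]]
    if not at_risk:
        return None  # nothing to count
    gains = sum(1 for b in at_risk if b[2])
    return gains / len(at_risk)

with_gods = gain_rate(branches, True)
without_gods = gain_rate(branches, False)
print("gain rate with moralising gods:", with_gods)
print("gain rate without moralising gods:", without_gods)
```

If moralising gods drove complexity, you'd expect the first rate to be higher than the second. A real analysis also has to deal with uncertain ancestral states and branch lengths, which this toy happily ignores.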

That’s mostly irrelevant for worldbuilding, where the correlation alone is enough to work out what’s “realistic”, but I also find the science fascinating in its own right. And while I’ve not tried downloading the Pulotu dataset (as I said, I only found out about it today, and I’ve been writing this post since), from a brief look it’s a handy text file that appears to be useable by anyone who knows the first thing about spreadsheets. I might have to go and play with it. Just have to think of some interesting questions…

So, now you know. I’m a hopeless geek even when I’m not officially being a scientist. (Does this surprise anyone?)


*If I had unlimited money, I’d probably spend my entire life at university…


Johnson DDP (2005) God’s punishment and public goods. A test of the supernatural punishment hypothesis in 186 world cultures. Human Nature 16:410-446

Odom KJ et al. (2014) Female song is widespread and ancestral in songbirds. Nature Communications 5:3379

Savage PE et al. (2015) Statistical universals reveal the structures and functions of human music. PNAS 112:8987-8992

Watts J et al. (2015a) Pulotu: database of Austronesian supernatural beliefs and practices. PLoS ONE 10:e0136783

Watts J et al. (2015b) Broad supernatural punishment but not moralizing high gods precede the evolution of political complexity in Austronesia. Proceedings of the Royal Society B 282:20142556

In which a “living fossil’s” genome delights me

I promised myself I wouldn’t go on for thousands and thousands of words about the Lingula genome paper (I’ve got things to do, and there is a LOT of stuff in there), but I had to indulge myself a little bit. Four or five years ago when I was a final year undergrad trying to figure out things about Hox gene evolution, I would have killed for a complete brachiopod genome. Or even a complete brachiopod Hox cluster. A year or two ago, when I was trying to sweat out something resembling a PhD thesis, I would have killed for some information about the genetics of brachiopod shells that amounted to more than tables of amino acid abundances. Too late for my poor dissertations, but a brachiopod genome is finally sequenced! The paper is right here, completely free (Luo et al., 2015). Yay for labs who can afford open-access publishing!

In case you’re not familiar with Lingula, it’s this guy (image from Wikipedia):

In a classic case of looks being deceiving, it’s not a mollusc, although it does look a bit like one except for the weird white stalk sticking out of the back of its shell. Brachiopods, the phylum to which Lingula belongs, are one of those strange groups no one really knows where to place, although nowadays we are pretty sure they are somewhere in the general vicinity of molluscs, annelid worms and their ilk. Unlike bivalve molluscs, whose shell valves are on the left and right sides of the animal, the shells of brachiopods like Lingula have top and bottom valves. Lingula’s shell is also made of different materials: while bivalve shells contain calcium carbonate deposited into a mesh of chitin and silk-like proteins,* the subgroup of brachiopods Lingula belongs to uses calcium phosphate, the same mineral that dominates our bones, and a lot of collagen (again like bone). But we’ll come back to that in a moment…

One of the reasons the Lingula genome is particularly interesting is that Lingula is a classic “living fossil”. In the Paleobiology Database, there’s even an entry for a Cambrian fossil classified as Lingula, and there are plenty of entries from the next geological period. If the database is to be believed, the genus Lingula has existed for something like 500 million years, which must be some kind of record for an animal.** Is its genome similarly conservative? Or did the DNA hiding under a deceptively conservative shell design evolve as quickly as anyone’s?

In a heroic feat of self-control, I’m not spending all night poring over the paper, but I did give a couple of interesting sections a look. Naturally, the first thing I dug out was the Hox cluster hiding in the rather large supplement. This was the first clue that Lingula’s genome is definitely “living” and not at all a fossil in any sense of the word. If it were, we’d expect one neat string of Hox genes, all in the order we’re used to from other animals. Instead, what we find is two missing genes, one plucked from the middle of the cluster and tacked onto its “front” end, and two genes totally detached from the rest. It’s not too bad as Hox cluster disintegration goes – six out of nine genes are still neatly ordered – but it certainly doesn’t look like something left over from the dawn of animals.

The bigger clue that caught my eye, though, was this little family tree in Figure 2:


The red numbers on each branch indicate the number of gene families that expanded or first appeared in that lineage, and the green numbers are the families shrunk or lost. Note that our “living fossil” takes the lead in both. What I find funny is that it’s miles ahead of not only the animals generally considered “conservative” in terms of genome evolution, like the limpet Lottia and the lancelet Branchiostoma, but also the sea squirt (Ciona). Squirts are notorious for having incredibly fast-evolving genomes; then again, most of that notoriety was based on the crazily divergent sequences and often wildly scrambled order of their genes. A genome can be conservative in some ways and highly innovative in others. In fact, many of the genes involved in basic cellular functions are very slow-evolving in Lingula. (Note also: humans are pretty slow-evolving as far as gene content goes. This is not the first study to find that.)

So, Lingula, living fossil? Not so much.

The last bit I looked at was the section about shell genetics. Although it’s generally foolish to expect the shell-forming gene sets of two animals from different phyla to be similar (see my first footnote), if there are similarities, they could potentially go at least two different ways. First, brachiopods might be quite close to molluscs, which is the hypothesis Luo et al.’s own treebuilding efforts support. Like molluscs, brachiopods also have a specialised mantle that secretes shell material, though having the same name doesn’t mean the two “mantles” actually share a common origin. So who knows, some molluscan shell proteins, or shell regulatory genes, might show up in Lingula, too.

On the other hand, the composition of Lingula’s shell is more similar to our skeletons’. So, since they have to capture the same mineral, could the brachiopods share some of our skeletal proteins? The answer to both questions seems to be “mostly no”.

Molluscan shell matrix proteins, those that are actually built into the structure of the shell, are quite variable even within Mollusca. It’s probably not surprising, then, that most of the relevant genes that are even present in Lingula are not specific to the mantle, and those that are are the kinds of genes that are generally involved in the handling of calcium or the building of the stuff around cells in all kinds of contexts. Some of the regulatory mechanisms might be shared – Luo et al. report that BMP signalling seems to be going on around the edge of the mantle in baby Lingula, and this cellular signalling system is also involved in molluscan shell formation. Then again, a handful of similar signalling systems “are involved” in bloody everything in animal development, so how much we can deduce from this similarity is anyone’s guess.

As for “bone genes” – the ones that are most characteristically tied to bone are missing (disappointingly or reassuringly, take your pick). The SCPP protein family is so far known only from vertebrates, and its various members are involved in the mineralisation of bones and teeth. SCPPs originate from an ancient protein called SPARC, which seems to be generally present wherever collagen is (IIRC, it’s thought to help collagen fibres arrange themselves correctly). Lingula has a gene for SPARC all right, but nothing remotely resembling an SCPP gene.

I mentioned that the shell of Lingula is built largely on collagen, but it turns out that it isn’t “our” kind of collagen. “Collagen” is just a protein with a particular kind of repetitive sequence. Three amino acids (glycine-proline-something else, in case you’re interested) are repeated ad nauseam in the collagen chain, and these repetitive regions let the protein twist into characteristic rope-like fibres that make collagen such a wonderfully tough basis for connective tissue. Aside from the repeats they all share, collagens are a large and diverse bunch. The ones that form most of the organic matrix in bone contain a non-repetitive and rather easily recognised domain at one end, but when Luo et al. analysed the genome and the proteins extracted from the Lingula shell, they found that none of the shell collagens possessed this domain. Instead, most of them had EGF domains, which are pretty widespread in all kinds of extracellular proteins. Based on the genome sequence, Lingula has a whole little cluster of these collagens-with-EGF-domains that probably originated from brachiopod-specific gene duplications.
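If you want to see how "a particular kind of repetitive sequence" can be spotted in practice, here's a minimal Python sketch. It assumes the general Gly-X-Y form of the collagen repeat (glycine in every third position, anything in the other two) and simply looks for runs of such triplets with a regular expression. The function name, the `min_repeats` cutoff and the toy sequence are all mine, and this is not how Luo et al. identified their collagens.

```python
import re

def find_collagen_repeats(protein, min_repeats=5):
    """Return (start, end, matched_text) for runs of at least
    `min_repeats` consecutive Gly-X-Y triplets (X, Y = any residue)."""
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_repeats)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(protein)]

# Made-up toy protein: a short non-repetitive start, eight Gly-X-Y
# triplets, and a short non-repetitive tail.
toy = "MKTLL" + "GPPGPAGEKGPQ" * 2 + "CCDEGF"
hits = find_collagen_repeats(toy)
for start, end, text in hits:
    print(start, end, text)
```

A pattern like this only finds the repetitive region; telling different collagen families apart (e.g. by the domains at their ends) needs proper domain searches.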

So, to recap: Lingula is not as conservative as its looks would suggest (never judge a living fossil by its cover, right?). We also finally have actual sequences for lots of its shell proteins, which reveal that when it comes to building shells, Lingula does its own thing. Not much of a surprise, but still, knowing is a damn sight better than thinkin’ it’s probably so. We are scientists here, or what.

I am Very Pleased with this genome. (I just wish it was published five years ago 😛 )



*This, interestingly, doesn’t seem to be the general case for all molluscs. Jackson et al. (2010) compared the genes building the pearly layer of snail (abalone, to be precise) and bivalve (pearl oyster) shells, and found that the snail showed no sign of the chitin-making enzymes and silk type proteins that were so abundant in its bivalved cousins. It appears that even within molluscs, different groups have found different ways to make often very similar shell structures. However, all mollusc shells, regardless of the underlying genetics, are predominantly composed of calcium carbonate.

**You often hear about sharks, or crocodiles, or coelacanths, existing “unchanged” for 100 or 200 or whatever million years, but in reality, 200-million-year-old crocodiles aren’t even classified in the same families, let alone the same genera, as any of the living species. Again, the living coelacanth is distinct enough from its relatives in the Cretaceous, when they were last seen, to warrant its own genus in the eyes of taxonomists. I’ve no time to check up on sharks, but I’m willing to bet the situation is similar. Whether Lingula’s jaw-dropping 500-million-year tenure on earth is a result of taxonomic lumping or the shells genuinely looking that similar, I don’t know. Anyway, rant over.



Jackson DJ et al. (2010) Parallel evolution of nacre building gene sets in molluscs. Molecular Biology and Evolution 27:591-608

Luo Y-J et al. (2015) The Lingula genome provides insights into brachiopod evolution and the origin of phosphate biomineralization. Nature Communications 6:8301

The things you can tell from a pile of corpses…

I’m really late to this party, but I never claimed to be timely, and the thing about the reproductive habits of Fractofusus is too interesting not to cover.* Rangeomorphs like Fractofusus are really odd creatures. They lived in that Ediacaran twilight zone between older Precambrian seas devoid of macroscopic animals and younger Cambrian seas teeming with recognisable members of modern groups. Rangeomorphs such as Rangea, Charnia and Fractofusus itself have such a unique fractal body plan (Narbonne, 2004) that no one really knows what they are. Although they were probably not photosynthetic like plants or algae (they are abundant in deep sea sediments where there wouldn’t have been enough light), their odd body architectures are equally difficult to compare to any animal that we know.

Mitchell et al. (2015) don’t bring us any closer to the solution of that mystery; they do, however, use the ultimate power of Maths to deduce how the enigmatic creatures might have reproduced. Fractofusus is an oval-shaped thingy that could be anywhere from 1 cm to over 40 cm in length. Unlike some other rangeomorphs, it lay flat on the seafloor with no holdfasts or stalks to be seen. Fractofusus fossils are very common in the Ediacaran deposits of Newfoundland. Since there are so many of them, and there is no evidence that they were capable of movement in life, the researchers figured their spatial distribution might offer some clues as to their reproductive habits. A bit of seafloor covered in Fractofusus might look something like this (drawing from the paper):

clusters within clusters

(The lines between individuals don’t actually come from the fossils, they just represent the putative connection between a parent and its babies.)

Statistical models suggest that the fossils are not randomly distributed but clearly clustered: small specimens around medium-sized ones, which are in turn gathered around the big guys. Two out of three populations examined show these clusters-within-clusters; the third has only one layer of clustering, but it’s still far from random. As the authors note, the real populations they studied involve a lot more specimens than shown in the diagram, but they “rarefied” them a bit for clarity of illustration while keeping their general arrangement.

The study looked not only at the distances between small, medium and large specimens, but also directions – both of where the specimens were and which way they pointed. If young Fractofusus spread by floating on the waves, they’d be influenced by currents in the area. It seems the largest specimens were – they are unevenly distributed in different directions. In contrast, smaller individuals were clustered around the bigger ones without regard to direction. Small and large specimens alike pointed randomly every which way.
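For a feel of what "not randomly distributed but clearly clustered" means in numbers, here's a small Python sketch of one classic and very simple clustering statistic, the Clark-Evans nearest-neighbour ratio. I'm swapping it in purely for illustration; it is not the analysis Mitchell et al. ran, and the simulated "seafloors" below are invented. The ratio compares the average distance to the nearest neighbour with what you'd expect if the same number of points were scattered completely at random; values well below 1 indicate clustering.

```python
import math
import random

def clark_evans(points, area):
    """Observed mean nearest-neighbour distance divided by the value
    expected under complete spatial randomness (no edge correction)."""
    n = len(points)
    nearest = []
    for i, (x1, y1) in enumerate(points):
        nearest.append(min(math.hypot(x1 - x2, y1 - y2)
                           for j, (x2, y2) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

rng = random.Random(0)

# Toy seafloor 1: 200 specimens scattered at random over a unit square.
scattered = [(rng.random(), rng.random()) for _ in range(200)]

# Toy seafloor 2: 10 randomly placed "parents", each surrounded by 20
# "offspring" a short distance away (a few may land just outside the
# square, which this sketch ignores).
parents = [(rng.random(), rng.random()) for _ in range(10)]
clustered = [(px + rng.gauss(0, 0.02), py + rng.gauss(0, 0.02))
             for px, py in parents for _ in range(20)]

r_scattered = clark_evans(scattered, 1.0)
r_clustered = clark_evans(clustered, 1.0)
print("random scatter:", round(r_scattered, 2))
print("parent-offspring clusters:", round(r_clustered, 2))
```

A single number like this can't distinguish clusters-within-clusters from one layer of clustering, or say anything about direction, which is why the real study needed fancier spatial statistics.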

What does this tell us about reproduction? The authors conclude that the big specimens probably arrived on the current as waterborne youngsters, hence their arrangement along particular lines. However, once there, they must have colonised their new home in a way that doesn’t involve currents. Mitchell et al. think that way was probably stolons – tendrils that grew out from the parent and sprouted a new individual at the end. This idea is further strengthened by the fact that among thousands of specimens, not a single one shows evidence for other types of clonal reproduction – no fragments, and no budding individuals, are known. (Plus if a completely sessile organism fragments, surely the only way the pieces could spread anywhere would be by riding currents, and that would show up in their distribution.)

Naturally, none of this tells us whether Fractofusus was an animal, a fungus or something else entirely. Sending out runners is not a privilege of a particular group, and while there is evidence that the original founders of the studied populations came from far away on the waves, we have no idea what it was that floated in to take root in those pieces of ancient seafloor. Was it a larva? A spore? A small piece of adult tissue? Damned if we know. Despite what Wikipedia and news headlines would have you believe, there is nothing to suggest that sex was involved. It may have been, but the evidence is silent on that count. (Annoyingly, the news articles themselves acknowledge that. Fuck headlines is all I’m saying…)

While sometimes we gain insights into ancient reproductive habits via spectacular fossils like brooding dinosaurs or pregnant ichthyosaurs, this study is a nice reminder that in some cases, a lot can be deduced even in the absence of such blatant evidence. This was an interesting little piece of Precambrian ecology, and a few remarks in the paper suggest more to come: “Other taxa exhibit an intriguing range of non-random habits,” the penultimate paragraph says, “and our preliminary analyses indicate that Primocandelabrum and Charniodiscus may have also reproduced using stolons.”

An intriguing range of non-random habits? No citations? I wanna know what’s brewing!


*Also, I’ve got to write something so I can pat myself on the back for actually achieving something beyond getting out of bed. Let’s just say Real Life sucks, depression sucks worse, and leave it at that.



Mitchell EG et al. (2015) Reconstructing the reproductive mode of an Ediacaran macro-organism. Nature 524:343-346

Narbonne GM (2004) Modular construction of Early Ediacaran complex life forms. Science 305:1141-1144

The unexpected complexity of nothing

I don’t think I’ve covered anything theoretical in a while, so here’s an interesting modelling study that I’ve just come across in PNAS (Shah et al., 2015). It discusses a key point in evolutionary theory – that “mutations” and “fitness” don’t exist in a vacuum. More specifically, it investigates how mutations that have little effect on fitness at the time interact with other mutations in a protein under strong purifying selection. Lots of studies deal with the role of interactions (or epistasis) between mutations in adaptation and innovation, but apparently, the question is much less explored when selection is keeping things the way they are.

Shah et al.’s approach is a mixture of theory and empirical data. The protein they consider is perfectly real – it’s the amino acid-binding protein argT from the bacterium Salmonella typhimurium (yep, that salmonella), and it was chosen because its structure is well-known and relatively simple. In contrast, the mutations happen entirely on a computer, although the models used to calculate their fitness effects in a simulated population of bacteria were calibrated to match the real-world distribution of the effects of mutations under similar circumstances.

The most important of these circumstances is the fact that this protein is not actively adapting to anything. It is already well-adapted to its function, and there is nothing pushing it in a new direction. Nonetheless, mutations happen whether or not they are needed; what the authors wanted to know is whether the mutations that arise in this environment constrain the course of evolution.

The researchers took the real argT protein sequence and introduced changes. In each round, ten random mutations were proposed, only one of which made it into the next round. For each proposed mutation, they used a program designed to model protein structure to calculate the stability of the new protein. Proteins are long chains of amino acids twisted and folded into specific 3D shapes. The stability of these shapes is important because a protein that is too rigid or too floppy can’t bind the right molecules with the right strength (remember, the function of argT is to grab certain amino acids).

The simulations assumed that the real protein is pretty much optimally stable already, and that either increasing or decreasing its stability would decrease its fitness*. Protein stability was converted to fitness in such a way that a realistic percentage of mutations were neutral, kinda bad or plain lethal (you can’t have beneficial mutations here, since the original protein is assumed to be optimised for its function). Finally, the least bad mutation in each round was chosen to update the protein sequence. This procedure was repeated until the protein had accumulated 30 changes, and the whole process was replicated 100 times.
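The propose-ten-keep-one loop described above is easy to sketch in Python. Big caveat: everything about the "protein" here is a toy of my own making. Instead of real structure-modelling software, stability is a made-up sum of random per-site terms and random pairwise "contact" terms, the starting sequence is random rather than argT, and fitness is just minus the squared distance from the starting stability. Only the overall procedure (10 proposals per round, keep the least bad, 30 rounds, 100 replicates) follows the description of the study.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_model(length, n_contacts, rng):
    """Toy energy model: random per-site terms plus random pairwise
    terms for a random set of 'contacts' (purely hypothetical)."""
    site = [{a: rng.gauss(0, 1) for a in AMINO_ACIDS} for _ in range(length)]
    contacts = {}
    while len(contacts) < n_contacts:
        i, j = sorted(rng.sample(range(length), 2))
        contacts[(i, j)] = {(a, b): rng.gauss(0, 1)
                            for a in AMINO_ACIDS for b in AMINO_ACIDS}
    return site, contacts

def stability(seq, model):
    site, contacts = model
    total = sum(site[i][a] for i, a in enumerate(seq))
    total += sum(table[(seq[i], seq[j])] for (i, j), table in contacts.items())
    return total

def fitness(seq, model, optimum):
    """Fitness peaks at the optimum stability and falls off on either side."""
    return -(stability(seq, model) - optimum) ** 2

def evolve(start, model, optimum, rng, n_steps=30, n_proposals=10):
    """Each round: propose random substitutions, keep the least bad one."""
    seq = list(start)
    history = []
    for _ in range(n_steps):
        proposals = []
        for _ in range(n_proposals):
            pos = rng.randrange(len(seq))
            new = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
            trial = seq[:pos] + [new] + seq[pos + 1:]
            proposals.append((fitness(trial, model, optimum), pos, new))
        _, pos, new = max(proposals)  # highest fitness = least bad
        history.append((pos, seq[pos], new))
        seq[pos] = new
    return "".join(seq), history

rng = random.Random(1)
model = make_model(length=40, n_contacts=60, rng=rng)
start = "".join(rng.choice(AMINO_ACIDS) for _ in range(40))
optimum = stability(start, model)  # assume the starting protein is optimal
runs = [evolve(start, model, optimum, rng) for _ in range(100)]
print(len(runs), "replicates, each with", len(runs[0][1]), "substitutions")
```

Note that in this toy the same position can be hit more than once, so a replicate may end up differing from the start at fewer than 30 sites.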

With a hundred new virtual proteins in hand, the really interesting part of the experiment could begin. The grand aim of this whole study was to examine mutations in their historical context. All mutations that were added to the original argT sequence were neutral or nearly neutral at the time of their introduction – but would they be neutral if they were introduced at an earlier point in the evolutionary sequence? And would they still be neutral, or rather, reversible, after a bunch of other mutations had accumulated on top of them?

As you may have guessed, the answer to both questions is no. Even though the final 100 proteins were pretty much as good as the original, and each of the mutations that made it through had close to zero effect on fitness at the time, taking mutations out of their context and sticking them into different backgrounds showed that their lack of effect was highly contingent on the history of that particular sequence.

The graph below summarises what happens if you take mutation 16 and either shift it to an earlier point, or take it out at a later point (a similar pattern holds no matter which mutation you start from). On the vertical axis is the fitness effect of the same mutation at different points relative to its effect at the time it actually occurred. The left side of the graph is consistently below zero – at any point before its “proper” time, mutation 16 would have been more deleterious. It only worked with all 15 previous mutations already in place.


On the right side – mutation 16’s future – fitness effects rise rapidly. The more new mutations are added, the more “beneficial” (or more precisely, irreversible) mutation 16 becomes. Even though it didn’t do much at the time, as soon as other mutations come to rely on it, you can’t take it out without royally screwing up the whole protein. The mutation has become entrenched, to use the authors’ terminology. This figure is an average of all 100 simulations; the results are pretty consistent.
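Contingency and entrenchment are easier to grasp with the smallest possible example, so here's one in Python with just two mutations. All the numbers are invented: mutation "a" nudges stability slightly, mutation "b" wrecks stability on its own, and the two together interact so that "b" is harmless once "a" is in place. Fitness is simply minus the distance from the optimal stability of zero. This is a cartoon of the idea, not a reproduction of anything in Shah et al.

```python
# Made-up stability effects of two mutations and their interaction.
single = {"a": 0.1, "b": -1.0}
pair = {frozenset("ab"): 1.0}

def stability(background):
    """Sum of single-mutation effects plus any pairwise interaction terms."""
    total = sum(single[m] for m in background)
    total += sum(v for k, v in pair.items() if k <= background)
    return total

def fitness(background):
    """Optimal stability is 0; any deviation costs fitness."""
    return -abs(stability(background))

def effect_of_adding(m, background):
    return fitness(background | {m}) - fitness(background)

def effect_of_reverting(m, background):
    return fitness(background - {m}) - fitness(background)

nothing = frozenset()

# Contingency: "b" is nasty by itself but nearly neutral after "a".
b_alone = effect_of_adding("b", nothing)
b_after_a = effect_of_adding("b", frozenset("a"))

# Entrenchment: taking "a" out is fine before "b" arrives, costly afterwards.
revert_a_early = effect_of_reverting("a", frozenset("a"))
revert_a_late = effect_of_reverting("a", frozenset("ab"))

print("adding b with no a:", round(b_alone, 3))
print("adding b after a:", round(b_after_a, 3))
print("reverting a before b:", round(revert_a_early, 3))
print("reverting a after b:", round(revert_a_late, 3))
```

The same mutation gets a different fitness effect depending on what else is in the sequence, which is all that "contingent" and "entrenched" really mean here.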

Of course, there are some caveats. One of the most important is that in real populations, mutations are not necessarily fixed one at a time, and the way multiple co-existing mutations interact could be quite different from the way individual mutations affect subsequent individual mutations. Another big if is the accuracy of the software that calculates protein stability – getting from protein sequence to structure and physical/chemical properties is still notoriously difficult. In this study, considering only the first few mutations in each series (i.e. before the virtual protein diverged too far from the original with known properties) doesn’t change the main results, so the authors don’t think this is a major problem for their conclusions. There is also the fact that global protein stability, the variable used here to estimate fitness, is not the same thing as function (in this case, binding specific amino acids). However, the latter depends only on a tiny proportion of the larger structure, so global stability is probably a reasonable proxy.

It occurs to me that what Shah et al.’s study simulated is basically the evolution of irreducibly complex nothing. Here we have a protein that does the exact same thing its ancestor did (with the above caveat) despite having a rather different sequence. This utter lack of change evolved one tiny step at a time; each step dispensable, each step insignificant. Yet try to take out any of the earlier steps from the final product, and the whole edifice collapses.

Call me strange, but I find this… amusing.


*They actually repeated the entire experiment with an alternative assumption that increases in stability are neutral rather than deleterious, but they got very similar results, largely due to the fact that very few mutations actually increased the stability of argT.



Shah P et al. (2015) Contingency and entrenchment in protein evolution under purifying selection. PNAS 112:E3226–E3235

Msp130 adventures, or the Mammal does science

I’ve been writing this blog almost since I started my PhD, but the closest I actually got to writing about my own work was a long fangirl squee about fan worms. Most of my project involved describing some really basic things about a relatively unknown animal, and is probably not terribly interesting unless you’re an expert in my field (also, my brain is convinced that nearly everything I do is shit, so I don’t particularly like talking about it…). However, I do have this cool little story I’ve been burning to tell the world, and couldn’t because we wanted it published… Now it is (Szabó and Ferrier [2015]; there goes my super-secret identity, I suppose 😉 )

My story involves a family of proteins called msp130. I wish they had a more fun name than that, but they were named by sea urchin people, and unlike the fruit fly community, they don’t really seem to care about making their gene names fun. (Msp130 stands for “mesenchyme-specific protein, 130 kDa”, in case you wondered; kDa, kilodaltons, being units of molecular mass.)

It all started with a sea urchin

The original msp130 was discovered in sea urchin larvae. It is found in – or rather, on the surface of – primary mesenchyme cells (PMCs), a specialised population of cells that build the calcareous skeleton of the larva. Here’s a photo of a sea urchin embryo with PMCs stained blue, from Illies et al. (2002). At this stage, the embryo is basically a squashed ball with a hole through most of it; the hole is going to become the gut, and its opening is the future anus.


Here’s a polarised light photograph of an older larva of a sea biscuit. The skeleton is pretty much the only thing you can see, highlighted in stunning rainbow colours due to the birefringence of the mineral (Bruno Vellutini, flickr):

Msp130 turned out to be essential for skeleton formation – when researchers blocked its surface with antibodies, PMCs cultured in a dish couldn’t take up calcium and couldn’t make spicules (Carson et al., 1985; Anstrom et al., 1987). Not quite so long ago, Illies et al. (2002) found that S. purpuratus has at least three msp130 genes, and in the embryo/larva, the other two are also exclusively expressed in PMCs. This is what the first picture above shows: the blue stain appears in cells that express one of the msp130-related genes.

Anyway. A few years later, after the sequencing of the S. purpuratus genome, it turned out that there were at least seven such genes, residing in a couple of clusters in the genome (Livingston et al., 2006). However, until very recently, the msp130 family was only studied in echinoderms.

Horizons are expanded and weirdness is found

BUT, this being the genomic era, sea urchin guru Charles Ettensohn wanted to know more about these buggers – just how common are they? Where do they come from? Are they always lurking in genomes that have to produce calcified skeletons? What he found in sifting through the vast repository of sequence data that is Genbank was very interesting and somewhat puzzling: across the entire tree of life, msp130 genes only seemed to be present in echinoderms, acorn worms, lancelets, molluscs, a handful of algae… plus loads of bacteria and archaea (Ettensohn, 2014). There was no mistaking it: to someone accustomed to comparing protein sequences, the bacterial sequences very clearly were the same thing as the ones from animals and algae.

So, Ettensohn concluded, it looks like animals (and algae) probably didn’t inherit this thing directly from their common ancestor with other life forms. That would imply a lot of independent losses, and Occam’s razor dictates that we shouldn’t postulate so many hypothetical events without good reason (although, as Maeso et al. [2012] point out, animal genomes don’t seem to be quite as keen on Occam’s razor as scientists).

Instead, supposing that animals and algae repeatedly acquired these genes by horizontal gene transfer from bacteria (or each other?) seems like a simpler explanation. At least one loss probably did occur – among deuterostomes, vertebrates and sea squirts are the odd ones out in not having msp130 genes, and the most Occamific explanation of that pattern is that we just mislaid them somewhere along the line. Here’s a graphical representation of Ettensohn’s scenario from his paper – “HGT” stands for horizontal gene transfer events, and grey circles are meant to represent the extra msp130 genes that later evolved in each lineage by gene duplication:


However, Ettensohn also pointed out that whole genome-level information about most animal groups is still pretty thin on the ground (seriously, everyone, stop sequencing more stupid vertebrates. We’re all the same.) We don’t, for example, have published genomes from calcareous sponges, or from annelid worms who build calcareous tubes or have other calcareous hard parts. Like my wormies. And here’s where I come in – I happen to have a decent amount of transcriptome data (alas, no genome) from just the right kind of annelid. Better, my data are derived specifically from an organ with calcareous parts (the operculum – see my fanworm post).

Naturally, as soon as I read Ettensohn’s paper, the first thing I did was grab the sequence of the “original” msp130 protein and search my own data for a match. Ettensohn said that msp130 sequences were very easy to recognise… And yep, they are. With not much effort at all, I found a lovely, full-length msp130-like sequence in my big pile of data. Much as I hate doing molecular biology, I also managed to confirm the presence of the messenger RNA (or at least the presence of one end of it) in an actual test tube of actual RNA taken from the operculum. But that’s not really saying much re: the whole gene thievery issue – yeah, another animal fairly closely related to molluscs has an msp130 gene, and it’s active somewhere within a millimetre of a calcareous hard part. That, unfortunately, says precisely bugger all about their evolutionary origin.

But I had an idea, peeps. Introns!!!

Genes in pieces make answers come together

There is an important difference between the genes of prokaryotes like bacteria and eukaryotes like algae or animals. In the former, most genes are uninterrupted stretches of DNA. A bacterial gene is transcribed into messenger RNA, and everything in that mRNA that stands between the “start protein” and “end protein” signals is translated into a protein using the appropriate genetic code.

Most of the genes of eukaryotes, however, consist of chunks that encode parts of the protein product (exons) interrupted by chunks that get discarded during or after transcription (introns)*. So there’s a potentially easy way of telling whether a gene in two different animals came from their common ancestor or from some overly generous microbes. If they have introns in matching locations, that’s not a similarity they could have acquired just by getting the gene from the same bacterium!

I say potentially easy for at least two reasons. One, while some gene families keep their introns in the same places for a very long time, introns can come and go in evolution. They can even disappear completely under some circumstances, although something the size of msp130 does usually have at least a few. If msp130 genes have fast-evolving structures, we may not be able to tell whether molluscs and deuterostomes acquired them independently, or whether the positions of introns just changed too much since their common ancestor.

Two, introns can theoretically evolve twice in the same place – just as some parts of a genome can be hotspots for mutations, parts of a gene can be hotspots for new introns. Of course, the more similar the overall structure of two genes, the less likely “intron hotspots” become as an explanation.

I compared the exon-intron structures of all msp130 genes in a few representative species with sequenced genomes in which Ettensohn found such genes. Besides sea urchins (which are from one of the two main deuterostome lineages), I chose lancelets (from the other great branch of deuterostomes) and limpets (which are molluscs). Together, these three creatures represent all major animal lineages in which msp130 genes have been found. Alas, I couldn’t do it with my own animals, because I don’t have a genome to play with 😦 . I also checked all three algae – the two green algae on Ettensohn’s list are fairly closely related, but the third one is a brown alga separated by upwards of a billion years.

As I said, all of these species have fully sequenced genomes, but you really need two sources of data to do this kind of thing properly. A genome sequence includes the complete gene with all the introns – but without the corresponding mRNA sequences, we must use clever computer programs that search for characteristic DNA motifs and/or sequence similarity to other organisms to predict where introns begin and end. Aside from clever programs occasionally being remarkably stupid or getting confused by sequencing errors, you can hopefully see how relying on similarity doesn’t exactly provide unbiased evidence for my purposes.

Sequences derived from transcripts only contain exons, however, and not because a computer predicted them, but because they’re read from the fully edited mRNA. So aligning transcripts with genomes should tell you exactly where the introns are, although transcript data were incomplete or altogether missing for some of the genes I looked at. (I didn’t have that problem with sea urchins – Tu et al. [2012] helpfully sequenced transcripts of pretty much all urchin genes and uploaded the results to the genome browser.)
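For the programmatically inclined, the transcript-versus-genome trick can be sketched in a few lines of Python. This is a toy under loud assumptions, not the procedure used for the paper: it only handles exact matches, and the function name `locate_introns`, the `seed_len` parameter and the made-up sequences are all hypothetical. Real spliced aligners cope with sequencing errors and use splice-site signals to settle cases where an exon edge could sit a base or two either way.

```python
def locate_introns(genome, transcript, seed_len=8):
    """Infer introns by laying an exon-only transcript onto genomic DNA.

    Returns (genome_start, genome_end, transcript_pos) tuples: 0-based,
    end-exclusive genomic coordinates of each intron, plus the position
    in the transcript at which the intron was spliced out.
    Toy version: exact matches only, exons assumed to be in order.
    """
    exons = []
    g = 0  # where to resume searching in the genome
    t = 0  # how much of the transcript has been placed so far
    while t < len(transcript):
        # Find the next stretch of transcript downstream in the genome.
        seed = transcript[t:t + seed_len]
        start = genome.find(seed, g)
        if start == -1:
            raise ValueError(f"transcript position {t} not found in genome")
        # Extend the match base by base until genome and transcript disagree.
        length = 0
        while (t + length < len(transcript)
               and start + length < len(genome)
               and genome[start + length] == transcript[t + length]):
            length += 1
        exons.append((start, start + length, t))
        g = start + length
        t += length
    # Whatever genomic DNA lies between consecutive exons is an intron.
    introns = []
    for (_, prev_end, _), (next_start, _, next_t) in zip(exons, exons[1:]):
        introns.append((prev_end, next_start, next_t))
    return introns


# Made-up sequences, purely for illustration (not real msp130 data).
TOY_EXONS = ["ATGGCCAAGCTG", "CATCCGTTCGAA", "TGCACGGATTGA"]
TOY_INTRONS = ["GTAAGTTTTTCTCAG", "GTGAGTCCCCTTTAG"]
toy_genome = (TOY_EXONS[0] + TOY_INTRONS[0] + TOY_EXONS[1]
              + TOY_INTRONS[1] + TOY_EXONS[2])
toy_transcript = "".join(TOY_EXONS)
toy_result = locate_introns(toy_genome, toy_transcript)
```

The third number in each tuple is the useful one for comparing species: it says where in the coding sequence the intron sits, which is what has to line up between, say, an urchin gene and a limpet gene.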

Nonetheless, the data that did exist told us enough to doubt Ettensohn’s idea. Importantly, I found enough to piece together the entire protein-coding portion of the mRNA for two of the limpet msp130 genes – in other words, the animals that Ettensohn thought likely to have acquired the family independently from sea urchins. In total, the animal species I investigated share not just one or two but seven intron locations (an msp130 gene has maybe a dozen introns altogether). One of those is also present in the algae, and the sequence next to it is almost identical across all of the genes. There’s really no mistaking that one! A few more introns are in generally similar locations, though they don’t line up perfectly in my best alignment**.
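To make "shared intron locations" concrete, here is a minimal Python sketch of the bookkeeping involved. Everything in it is an assumption for illustration: the helper names, the tiny fake alignment and the intron positions are invented, and the `slack` parameter is just one hypothetical way of allowing for introns that sit in roughly, but not exactly, the same spot.

```python
def to_alignment_columns(aligned_seq, intron_positions):
    """Translate intron positions from ungapped to alignment coordinates.

    An intron position is the 0-based index (in the ungapped sequence)
    of the residue that immediately follows the intron.
    """
    # Column index of every non-gap character, in order.
    col_of = [col for col, ch in enumerate(aligned_seq) if ch != "-"]
    return {col_of[p] for p in intron_positions}


def shared_introns(alignment, introns, slack=0):
    """Columns of the first gene's introns that every other gene matches.

    alignment: dict of gene name -> aligned sequence (gaps as '-')
    introns:   dict of gene name -> list of ungapped intron positions
    slack:     how many columns apart two introns may be and still count
    """
    cols = {name: to_alignment_columns(seq, introns[name])
            for name, seq in alignment.items()}
    names = list(cols)
    reference = cols[names[0]]
    others = [cols[name] for name in names[1:]]
    return sorted(
        c for c in reference
        if all(any(abs(c - d) <= slack for d in other) for other in others)
    )


# Invented mini-alignment and intron positions (not real msp130 data).
toy_alignment = {
    "urchin": "MKT-LLVAGC",
    "limpet": "MKTALL-AGC",
    "alga":   "M-TALLVAG-",
}
toy_introns = {"urchin": [3, 7], "limpet": [4, 7], "alga": [3]}

all_three = shared_introns(toy_alignment, toy_introns)
animals_only = shared_introns(
    {k: toy_alignment[k] for k in ("urchin", "limpet")},
    {k: toy_introns[k] for k in ("urchin", "limpet")},
)
```

In this toy, the two animals share both intron columns while the alga shares only one, which mirrors the shape of the real result (seven shared among the animals, one of them also in algae) without reproducing any of the actual data.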

What can we conclude from this? I think we can probably say with reasonable certainty that deuterostomes and molluscs didn’t get msp130 genes from bacteria separately. Given the similarity with algae, they might not have got them from bacteria at all, although one similarly positioned intron is a lot easier to explain away as convergent evolution than seven.

As I see it, either the last common ancestor of molluscs+annelids and deuterostomes had msp130 genes and only a few of its descendants kept them, or one of the two lineages snatched them from the other after those seven introns had originated. (Animals stealing genes from other animals is relatively uncommon, as far as I know.)

…some answers, anyway…

If you put the evidence for a single origin together with the incredibly gappy distribution of this gene family, the other side of the equation is a ridiculous number of losses. Why? And what’s the deal with msp130 and calcification? Is there a deal at all? Ettensohn speculated that acquiring msp130 might have had something to do with acquiring calcareous skeletons – did it?

IMO we really don’t have enough examples to properly assess this association, and my impression is that we actually know very little about the roles of these genes. Oh, we know that some of them are pretty specific to calcification in certain echinoderms, and they seem to be around in multiple organs in molluscs given that the hundreds of RNA sequences I found had been extracted from anything from gonads to mouthparts. And, of course, at least one of them is doing something in a partly calcified body part in my annelid, though we haven’t yet checked exactly where or what.

But calcification is pretty much the only context in which msp130s have been investigated; since everyone thought they were just echinoderm “calcification genes”, no one thought to look elsewhere. What do they do in, say, lancelets, which have six of the genes but not much of a calcified skeleton that we know of? Lancelets may well have something calcareous that isn’t a skeleton – other animals with no obvious calcareous skeletons, such as arachnids or earthworms, produce little calcareous granules that might work to store calcium or get rid of a surplus. Most of the limpet transcripts I found come from testicles or ovaries, which don’t tend to calcify, but gonads are a bit special and turn on lots of random genomic shit that may or may not actually have a function. AFAIK, none of the three algae from which msp130 genes are known has a calcareous skeleton, but many other algae do.

In summary, I did some detective work and discovered something and I feel rather clever about all of that, but in the process I learned just how much more we don’t know about this obscure but intriguing little gene family.

… actually, that sounds like a fairly typical summer in science. 🙂


*Don’t ask me how that happened (it’s not even remotely my area), but now that the system exists, it does enable eukaryotes to make loads of different proteins from a single gene just by picking and choosing which exons to keep. See fruit fly Dscam, or the “brutally murdering the one gene, one protein hypothesis, forty thousand splice variants at a time” gene. Introns can also contain a variety of regulatory sequences that determine either the behaviour of their own gene or even that of a different gene, so introns are far from useless. They’re just a bit… counterintuitive.

**Aligning similar sequences is part science, part art. Often, there’s no single clear best way to align two or more genes or proteins; the various programs people have written for this job will all come up with slightly different answers, and an experienced pair of eyes will probably want to tweak all of them. Whether introns are really in the same place in two genes can therefore be a bit ambiguous, depending on the degree of sequence similarity.



Anstrom JA et al. (1987) Localization and expression of msp130, a primary mesenchyme lineage-specific cell surface protein in the sea urchin embryo. Development 101:255-265

Carson DD et al. (1985) A monoclonal antibody inhibits calcium accumulation and skeleton formation in cultured embryonic cells of the sea urchin. Cell 41:639-648

Ettensohn CA (2014) Horizontal transfer of the msp130 gene supported the evolution of metazoan biomineralization. Evolution & Development 16:139-148

Illies MR et al. (2002) Identification and developmental expression of new biomineralization proteins in the sea urchin Strongylocentrotus purpuratus. Development Genes and Evolution 212:419-431

Livingston BT et al. (2006) A genome-wide analysis of biomineralization-related proteins in the sea urchin Strongylocentrotus purpuratus. Developmental Biology 300:335-348

Maeso I et al. (2012) Widespread recurrent evolution of genomic features. Genome Biology and Evolution 4:486-500

Szabó R & Ferrier DEK (2015) Another biomineralising protostome with an msp130 gene and conservation of msp130 gene structure across Bilateria. Evolution & Development 17:195-197

Tu Q et al. (2012) Gene structure in the sea urchin Strongylocentrotus purpuratus based on transcriptome analysis. Genome Research 22:2079-2087

Putting the cart before the… snake?

Time to reexamine some assumptions (again)! And also, talk about Hox genes, because do I even need a reason?

Hox genes often come up when we look for explanations for various innovations in animal body plans – the digits of land vertebrates, the limbless abdomens of insects, the various feeding and walking and swimming appendages of crustaceans, the strongly differentiated vertebral columns of mammals, and so on.

Speaking of differentiated vertebral columns, here’s one group I’d always thought of as having pretty much the exact opposite of them: snakes. Vertebral columns are patterned, among other things, by Hox genes. Boundaries between different types of vertebrae such as cervical (neck) and thoracic (the ones bearing the ribcage) correspond to boundaries of Hox gene expression in the embryo – e.g. the thoracic region in mammals begins where HoxC6 starts being expressed.

In mammals like us, and also in archosaurs (dinosaurs/birds, crocodiles and extinct relatives thereof), these boundaries can be really obvious and sharply defined – here’s Wikipedia’s crocodile skeleton for an example:

In contrast, the spine of a snake (example from Wikipedia below) just looks like a very long ribcage with a wee tail:

Snakes, of course, are rather weird vertebrates, and weird things make us sciencey types dig for an explanation.

Since Hox genes appear to be responsible for the regionalisation of vertebral columns in mammals and archosaurs, it stands to reason that they’d also have something to do with the comparative lack of regionalisation (and the disappearance of limbs) seen in snakes and similar creatures. In a now classic paper, Cohn and Tickle (1999) observed that unlike in chicks, the Hox genes that normally define the neck and thoracic regions are kind of mashed together in embryonic pythons. Below is a simple schematic from the paper showing where three Hox genes are expressed along the body axis in these two animals. (Green is HoxB5, blue is C8, red is C6.)


As more studies examined snake embryos, others came up with different ideas about the patterning of serpentine spines. Woltering et al. (2009) had a more in-depth look at Hox gene expression in both snakes and caecilians (limbless amphibians) and saw that there are in fact regions ruled by different Hoxes in these animals, if a little fuzzier than you’d expect in a mammal or bird – but they don’t appear to translate to different anatomical regions. Here’s their summary of their findings, showing the anteriormost limit of the activity of various Hox genes in a corn snake compared to a mouse:


Such differences aside, both of the above studies operated on the assumption that the vertebral column of snakes is “deregionalised” – i.e. that it evolved by losing well-defined anatomical regions present in its ancestors. But is that actually correct? Did snakes evolve from more regionalised ancestors, and did they then lose this regionalisation?

Head and Polly (2015) argue that the assumption of deregionalisation is a bit stinky. First, that super-long ribcage of snakes does in fact divide into several regions, and these regions respect the usual boundaries of Hox expression. Second, ordinary lizard-shaped lizards (from which snakes descended back in the days of the dinosaurs) are no more regionalised than snakes.

The study is mostly a statistical analysis of the shapes of vertebrae. Using an approach called geometric morphometrics, it turned these shapes from dozens of squamate (snake and lizard) species into sets of coordinates, which could then be compared to see how much they vary along the spine and whether the variation is smooth and continuous or clustered into different regions. The authors evaluated hypotheses regarding the number of distinct regions to see which one(s) best explained the observed variation. They also compared the squamates to alligators (representing archosaurs).
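The "which number of regions explains the variation best" step is a model-selection problem, and a stripped-down version fits in a short Python sketch. To be clear about the assumptions: this is not Head and Polly's pipeline. It boils each vertebra down to a single made-up shape score, models each region as a flat plateau rather than anything fancier, and scores the candidates with a small-sample AIC (AICc), all of which are simplifications chosen here for illustration. The function names and the toy numbers are hypothetical.

```python
import math


def best_segmentation(values, k):
    """Split values into k contiguous regions minimising within-region SSE.

    Returns (sse, breakpoints), where breakpoints are the start indices
    of regions 2..k. Solved exactly by dynamic programming.
    """
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i, j):
        # Sum of squared deviations from the mean of values[i:j].
        s = prefix[j] - prefix[i]
        return max(0.0, (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i))

    inf = float("inf")
    # cost[m][j]: best SSE for the first j values split into m regions.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Walk back through the table to recover where each region starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return cost[k][n], sorted(starts)[1:]  # drop the leading 0


def aicc(sse, n, k):
    """Small-sample AIC for a k-region model (lower is better)."""
    p = 2 * k  # k region means + (k - 1) breakpoints + 1 variance term
    if n - p - 1 <= 0:
        return float("inf")
    aic = n * math.log(max(sse, 1e-12) / n) + 2 * p
    return aic + 2 * p * (p + 1) / (n - p - 1)


def choose_regions(values, max_k):
    """Try 1..max_k regions and return (best_k, breakpoints)."""
    scored = []
    for k in range(1, max_k + 1):
        sse, breaks = best_segmentation(values, k)
        scored.append((aicc(sse, len(values), k), k, breaks))
    _, best_k, best_breaks = min(scored)
    return best_k, best_breaks


# Made-up shape scores for 18 vertebrae (not data from the paper).
toy_scores = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8,
              3.1, 2.9, 3.0, 3.2, 2.8, 3.0,
              5.1, 4.9, 5.0, 5.2, 4.8, 5.0]
toy_k, toy_breaks = choose_regions(toy_scores, max_k=5)
```

The penalty term is what stops the analysis from simply declaring every vertebra its own region: adding a boundary always reduces the leftover variation a bit, so an extra region only wins if it reduces it by enough to pay for the added parameters.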

The results were partly what you’d expect. First, alligators showed much more overall variation in vertebral shape than squamates. Note that that’s all squamates – leggy lizards are nearly (though not quite) as uniform as their snake-like relatives. However, in all squamates, the best-fitting model of regionalisation was still one with either three or four distinct regions in front of the hips/cloaca, and in the majority, it was four, the same number as the alligator had.

Moreover, there appeared to be no strong support for an evolutionary pattern to the number of regions – specifically, none of the scenarios in which the origin of snake-like body plans involved the loss of one or more regions were particularly favoured by the data. There was also no systematic variation in the relative lengths of various regions; the idea that snakes in general have ridiculously long thoraxes is not supported by this analysis.

In summary, snakes might show a little less variation in vertebral shape than their closest relatives, but they certainly didn’t descend from alligator-style sharply regionalised ancestors, and they do still have regionalised spines.

Hox gene expression is not known for most of the creatures for which vertebral shapes were analysed, but such data do exist for mammals (mice, here), alligators, and corn snakes. What is known about different domains of Hox gene activation in these three animals turns out to match the anatomical boundaries defined by the models pretty well. In the mouse and alligator, Hox expression boundaries are sharp, and the borders of regions fall within one vertebra of them.

In the snake, the genetic and morphological boundaries are both gradual, but the boundaries estimated by the best model are always within the fuzzy boundary region of an appropriate Hox gene expression domain. Overall, the relationship between Hox genes and regions of the spine is pretty consistent in all three species.

To finish off, the authors make the important point that once you start turning to the fossil record and examining extinct relatives of mammals, or archosaurs, or squamates, or beasties close to the common ancestor of all three groups (collectively known as amniotes), you tend to find something less obviously regionalised than living mammals or archosaurs – check out this little figure from Head and Polly (2015) to see what they’re talking about:


(Moving across the tree, Seymouria is an early relative of amniotes but not quite an amniote; Captorhinus is similarly related to archosaurs and squamates, Uromastyx is the spiny-tailed lizard, Lichanura is a boa, Thrinaxodon is a close relative of mammals from the Triassic, and Mus, of course, is everyone’s favourite rodent. Note how alligators and mice really stand out with their ribless lower backs and suchlike.)

Although they don’t show stats for extinct creatures, Head and Polly argue that mammals and archosaurs, not snakes, are the weird ones when it comes to vertebral regionalisation. For most of amniote evolution, the norm was the more subtle version seen in living squamates. It was only during the origin of mammals and archosaurs that boundaries were sharpened and differences between regions magnified. Nice bit of convergent/parallel evolution there!



Cohn MJ & Tickle C (1999) Developmental basis of limblessness and axial patterning in snakes. Nature 399:474-479

Head JJ & Polly PD (2015) Evolution of the snake body form reveals homoplasy in amniote Hox gene function. Nature 520:86-89

Woltering JM et al. (2009) Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code. Developmental Biology 332:82-89