The ctenophore conundrum, by popular demand

So, a new ctenophore genome has just been published in Nature (Moroz et al., 2014), it makes some extraordinary claims, and my resident palaeontologist/web-buddy Dave Bapst wants my opinion 😉

Given that I already planned to have an opinion about the first ctenophore genome back in December (Ryan et al., 2013) and miserably failed to finish the post… the temptation is just too strong. (That thesis chapter draft in the other window of MS Word wasn’t going to be finished today anyway >_>)

Whatever I might seem like from my words on the internet, I’m not some kind of expert on phylogenetics, so I’m going to use a crutch. I had this idea back when I first read Ryan et al. (2013): I remember thinking that it was written almost as if Nosenko et al. (2013) had never happened, and since I’d really liked Nosenko et al. (as you can guess from the word count of this post), I was mildly indignant about that. The Nosenko paper is going to be my crutch. (No offence to Hervé Philippe and friends, but there are only so many papers I’m going to reread for an out-of-the-blue blog post 😉 )

Although I’m obviously not writing a public post specifically for a phylogeny nut, I may get somewhat technical, and I’m definitely going to get verbose.


Ctenophores. Comb jellies, sea gooseberries, Venus girdles. They are floaty, ethereal, mesmerizingly beautiful creatures, and I have it on good authority that they are also complete pains in the arse.

Here are some pretty pictures before it gets too painful 😉 Left: Mnemiopsis leidyi from Ryan et al. (2013); right: Pleurobrachia bachei from Moroz et al. (2014). And a bonus video of a Venus girdle making like an ancient nature spirit. I could watch these beasties all day.


Venus from Sandrine Ruitton on Vimeo.

The problem(s)

And now, the pain. Let’s pull out my trusty old animal phylogeny, because the question marks are once again highly appropriate. (Also, I’m hell-bent on breaking your bandwidth with PICTURES.)


Ryan et al. (2013) helpfully have a figure distilling the ideas people have had about those question marks so far:


Bi = bilaterians, Cn = cnidarians, Ct = ctenophores, Tr = Trichoplax, and Po = sponges (Porifera).

I say “helpfully,” but it’s not all that helpful after all, since pretty much every possible configuration has been proposed. Why is this such a difficult question? Here’s a quick rundown of the problems Nosenko et al.’s study found to affect the question marks:

  1. Fast-evolving protein sequences – these can cause artefacts because too much change overwrites informative changes and creates chance similarities. Excluding faster-evolving sequences from the analysis changes the tree.
  2. Sequence data that don’t conform to the simplifying assumptions of popular evolutionary models – again, this can result in chance similarities and artefacts, and using a poorer model replicates the effects of using less ideal sequences.
  3. Long-branched outgroups – these are the non-animal groups used to place the root of animals. The more distant from animals and less well-sampled the outgroup, the longer the branches it forms, which can attract fast-evolving animal lineages towards the root. In Nosenko et al.’s analyses, even the closest outgroup seemed to cause problems, and removing the outgroup altogether made the conflicts between different models and datasets disappear completely – but this isn’t exactly helpful when you’re looking for the root of the animal tree!
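To make the “too much change overwrites informative changes” point a bit more concrete, here is a minimal sketch using the textbook Jukes-Cantor-style model, which assumes every state is equally likely to replace every other (exactly the kind of simplifying assumption point 2 warns about; real protein models like WAG or CAT are much richer). The function names are my own, for illustration only. It shows the expected fraction of differing sites levelling off as the true amount of change keeps growing, and the usual correction back to a “true” distance blowing up near that ceiling.

```python
import math

def expected_p_distance(d, k=4):
    """Expected fraction of sites that differ between two sequences separated by
    d substitutions per site, under a Jukes-Cantor-style model with k equally
    likely states (k=4 for nucleotides, k=20 for amino acids)."""
    ceiling = (k - 1) / k  # two unrelated random sequences differ this often
    return ceiling * (1 - math.exp(-d / ceiling))

def corrected_distance(p, k=4):
    """Invert expected_p_distance: estimate substitutions per site from an
    observed fraction of differing sites. Undefined at or past saturation."""
    ceiling = (k - 1) / k
    if p >= ceiling:
        return math.inf
    return -ceiling * math.log(1 - p / ceiling)

# Observed differences flatten out long before the true amount of change stops growing.
for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"true d = {d:4.1f}   DNA p = {expected_p_distance(d, 4):.3f}   "
          f"protein p = {expected_p_distance(d, 20):.3f}")

# Near the ceiling, a small difference in observed p means a big difference in estimated d.
print(corrected_distance(0.70), corrected_distance(0.74))
```

Protein distances (20 states) saturate a little more slowly than DNA distances, but the same ceiling is there, and any noise in the observed differences gets amplified enormously as you approach it.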

The problem with ctenophores in particular is illustrated by this one of Nosenko et al.’s trees, made from one of their less error-prone datasets:


The ctenophore branch is not only longer overall than pretty much any other in the tree; its length is also very unevenly distributed between the loooong history common to all species and the short unique lineage of each individual species. That is bad news. And it may stay that way forever, because the last common ancestor of living ctenophores may genuinely be very recent, so there’s no way to divide up that long-ass internal branch without a time machine.

Round 1: Nosenko vs. Ryan

In fairness, the Mnemiopsis genome team probably didn’t have a whole lot of time to specifically deal with Nosenko et al.’s points (OTOH, none of those individual points were truly new). The Nosenko paper came out in January 2013, and the Mnemiopsis genome paper was received by Science in July of the same year – I imagine most of the data had been generated way before then, and you can’t just redo all your data analysis and rewrite a paper on short notice.

I’m still going to view Ryan et al. (2013) in the light of Nosenko, because regardless of the genome team’s ability to answer them, some of Nosenko et al.’s points are very relevant to the claims they make. Their biggest claim, of course, is that ctenophores are the sister group to all other animals.

In Nosenko et al.’s experiments, this placement showed up in trees where faster-evolving genes, poorer models or more distant outgroups were used, but not when the slowest-evolving gene set was analysed with the best models and the closest outgroup.

Ryan et al. acknowledge that “supermatrix analyses of the publicly available data are sensitive to gene selection, taxon sampling, model selection, and other factors [cite Nosenko].” Their data are obviously sensitive to such factors. In fact, they behave rather similarly to what I saw in the Nosenko study.

Ryan et al. used two method/model combinations – one of the models was the preferred CAT model of Nosenko et al., and the other was the OK but not great GTR model that CAT beat by miles in terms of actually fitting Nosenko et al.’s data. (Caveat: in the genome paper, the CAT and GTR models were used with different treebuilding methods, so we can’t blame the models for different results with any certainty.) Also, they analysed the data with three different outgroups.

And guess what – the ctenophores-outside-everything tree was best supported with (1) the GTR model and (2) the more distant outgroups. There is not much testing of the effect of gene choice – there were two different data sets, but they were both these massive amalgamations of everything usable, and they also included totally different samples of species.

However, here comes another nod to Nosenko et al. and all the other people who advocated trying things other than “conventional” sequence comparisons through the years. Provided you can securely identify genes across different organisms, you can also try to deduce evolutionary history based on their presences and absences rather than their precise sequences. This is not a foolproof approach because genes can be (commonly) lost or (occasionally) picked up from other organisms, but it is often regarded as less artefact-prone than sequence-based trees.
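A crude sketch of why presence/absence data aren’t automatically artefact-proof either. The matrix is entirely made up: four hypothetical taxa where A+B and C+D are the true sister pairs, but A and C have independently lost the same five genes. A simple mismatch distance (the fraction of genes present in one taxon and absent in the other; my own toy measure, not the analysis Ryan et al. actually ran) then pairs each taxon with the wrong partner.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) calls for ten genes in four made-up taxa.
# Assumed true history: A+B are sisters, C+D are sisters.
# Gene 0 was gained in the A+B ancestor, gene 1 in the C+D ancestor,
# genes 2-4 are kept by everyone, and genes 5-9 were lost independently in A and in C.
profiles = {
    "A": "1011100000",
    "B": "1011111111",
    "C": "0111100000",
    "D": "0111111111",
}

def mismatch_distance(x, y):
    """Fraction of genes present in one taxon but absent in the other."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

distances = {
    (s, t): mismatch_distance(profiles[s], profiles[t])
    for s, t in combinations(sorted(profiles), 2)
}

def nearest(taxon):
    """The taxon with the smallest gene-content distance to `taxon`."""
    others = [t for t in profiles if t != taxon]
    return min(others, key=lambda t: distances[tuple(sorted((taxon, t)))])

for pair, dist in distances.items():
    print(pair, round(dist, 2))
for taxon in profiles:
    print(taxon, "->", nearest(taxon))
```

Shared absences look just like shared history unless the method has some way of telling them apart, which is the gene-content version of the chance similarities that plague sequence data.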

But does it help with ctenophores? Like the GTR model-based sequence trees, the tree based on gene presence/absence (you obviously need complete genomes for this!) supports ctenophores being the outsider among animals:


My problem with this? Note what else it supports. The white circles indicate groupings that this method had absolutely no doubt about. And these groupings include things that frankly sound like abject nonsense. Here’s one annelid worm (the leech Helobdella) sitting next to a flatworm, while another annelid worm (Capitella) teams up with a limpet right next to a chordate. If anything, that is more controversial than the placement of ctenophores, because we thought we had it settled!

So if we’re concluding that ctenophores are basal to all other animals, why aren’t we also making a fuss about the explosion of phylum Annelida? Surely, if this method gives us strong enough conclusions to arbitrate between different sequence-based hypotheses about ctenophores, it’s strong enough to make those claims too. The cake can’t quite decide if it’s being eaten, I think.

I’m not sure what to think about the sequence trees, but I’m far more confident in my verdict on the presence/absence one. Maybe I’m just demonstrating the Dunning-Kruger effect here, but I’m not buying that tree for a second.

Overall verdict?

Not convinced. Not by a long shot.

Round 2: Nosenko vs. Moroz

The Pleurobrachia genome took me completely by surprise. I’d known Mnemiopsis was sequenced since Ryan et al. (2010). (Three years. Can you imagine the twitching?) I had no idea this other project was happening, so I nearly fell off my chair when Nature dropped it into my RSS reader yesterday. Another ctenophore genome – and another one that supports ctenophore separatism? (This hypothesis is becoming strangely popular…)

Bonus: it’s not just a genome paper, it also describes the transcriptomes of ten different ctenophores. Transcriptomes, the set of all active genes, are a little bit easier to sequence and assemble than genomes, and if you’re thorough they’ll catch most of the genes the organism has, so they can be almost as good for the analysis of gene content.

Which they kind of don’t do properly. There is a discussion of specific gene families that ctenophores lack – including many immune- and nervous system-related genes – but that’s not exactly saying much given that we know even “important” genes can be lost (case in point: the disappearing (Para)Hox genes of Trichoplax). The fact that ctenophores seem to completely lack microRNAs is interesting, but again, it doesn’t mean they never had them. Sponges do have microRNAs but don’t seem to be nearly as big on them as other animals.

As for the global analysis of gene content – I had to chase down a reference (Ptitsyn and Moroz, 2012) to understand what they actually did. As far as I can tell, there is no phylogenetic analysis involved – they just took a tree they already had, and used this method to map gene gains and losses onto that tree. Which is cool if you’re fairly sure about your tree, but pretty much meaningless when the tree is precisely the question. The Mammal is disappointed.
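Here is a toy illustration of why mapping gains and losses onto a tree you already have can’t settle the tree itself. It uses Dollo parsimony (each gene is gained once, at the last common ancestor of everything that has it, then lost as few times as possible), which is my stand-in and not necessarily the workflow Ptitsyn and Moroz (2012) describe. The gene and its distribution are hypothetical: present in sponges, cnidarians and bilaterians, absent from ctenophores and the outgroup.

```python
def leaves(tree):
    """Leaf names under a node: leaves are strings, internal nodes are tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def mrca(tree, taxa):
    """Smallest subtree containing every taxon in the non-empty set `taxa`."""
    if isinstance(tree, tuple):
        for child in tree:
            if taxa <= leaves(child):
                return mrca(child, taxa)
    return tree

def count_losses(node, present):
    """Fewest losses needed below `node` so that exactly `present` keep the gene."""
    if not leaves(node) & present:
        return 1  # nobody in this clade has the gene: one loss on its stem
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)

def dollo_map(tree, present):
    """Place a single gain at the MRCA of the gene's owners, then count losses."""
    gain_node = mrca(tree, present)
    return sorted(leaves(gain_node)), count_losses(gain_node, present)

# Hypothetical gene: present in these three groups, absent elsewhere.
present = {"sponge", "cnidarian", "bilaterian"}

ctenophores_first = ("outgroup", ("ctenophore", ("sponge", ("cnidarian", "bilaterian"))))
sponges_first = ("outgroup", ("sponge", ("ctenophore", ("cnidarian", "bilaterian"))))

for name, tree in (("ctenophores first", ctenophores_first), ("sponges first", sponges_first)):
    clade, losses = dollo_map(tree, present)
    print(f"{name}: gained in ancestor of {clade}, lost {losses} time(s)")
```

On the ctenophores-first tree the gene is a novelty that ctenophores never had; on the sponges-first tree the very same data say ctenophores lost it. The answer is baked into the tree you started with.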

One of the problems with listing genes that aren’t there or don’t work in the “expected” way in ctenophores is that even if they’re not outside everything else, it’s still a distinct possibility that these guys branched off from our lineage before cnidarians did. For example, the Pleurobrachia paper spends a lot of time on “nervous system-specific” genes like elav missing or not being expressed in neurons, and common neurotransmitters like serotonin not being used by ctenophores.

But, assuming that the tree of animals looks something like (sponges + (ctenophores + (cnidarians + bilaterians))), we wouldn’t expect ctenophore nervous systems to share every property that cnidarians and bilaterians share. Remember: (1) sponges don’t have nervous systems, so they’re not much use as a comparison, (2) cnidarians + bilaterians had a longer common ancestry than either did with ctenophores. Genes possessed by sponges PLUS cnidarians and/or bilaterians but missing from ctenophores are more suggestive, but only if you can demonstrate that they weren’t lost. (We’re kind of going in circles here…)

The other problem is that pesky last common ctenophore ancestor. If it really is very recent, then taking even all living ctenophores to represent ctenophore diversity is like taking my close family to represent human diversity. Just like my family contains pale-skinned, lactose-tolerant people, it is entirely possible that this lone surviving ctenophore lineage possesses (or lacks) important traits that aren’t at all typical of ctenophores as a whole. Ryan et al.’s supplementary data are clear that at least the Mnemiopsis genome is horribly scrambled, all trace of conserved gene neighbourhoods erased from it. That’s not exactly promising if you’re hoping for “trustworthy” animals.

The actual phylogenetic trees in Moroz et al. (2014) seem to follow an approach of throwing AAAALLL the genes at the problem. The biggest dataset contains 586 genes, compared to 122 in Nosenko et al.’s largest collection, and there is not much filtering by gene properties other than “we can tell what it is”. I have no idea how the CAT + WAG model they used compares to CAT or WAG or GTR on their own; unfortunately, the Nosenko paper doesn’t test that particular setup and this one doesn’t do any model testing. Moroz et al.’s supplementary methods claim it’s pretty good, cite something, and I’m not gonna chase down that reference. (Sorry, I’ve been poring over this for four hours at this point).

Interestingly, the support for ctenophores being apart from other animals increases when they start excluding distant outgroups. The only time it’s low is when they add all ten ctenophores and use fewer genes. Hmm. This is where I would like to hear some real experts’ opinions, because on the face of it, I can’t pinpoint anything obviously wrong. (Other than saying that chucking more genes at a problem tree is perfectly capable of making the problem worse.)
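The “more genes can make it worse” point can be sketched with a deliberately oversimplified model, which is entirely my assumption and nothing Moroz et al. did: each site independently favours the wrong tree with probability 0.55 (say, because of some unmodelled bias like long-branch attraction), and the “analysis” just picks whichever tree more sites favour. The chance of getting the wrong answer then climbs towards certainty as sites are added.

```python
import math

def prob_wrong_tree(n_sites, q_wrong=0.55):
    """Probability that a strict majority of n_sites independent sites favours
    the wrong tree, if each does so with probability q_wrong. This is a binomial
    tail, summed in log space so that large n_sites do not overflow."""
    total = 0.0
    for k in range(n_sites // 2 + 1, n_sites + 1):
        log_pmf = (math.lgamma(n_sites + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_sites - k + 1)
                   + k * math.log(q_wrong) + (n_sites - k) * math.log(1 - q_wrong))
        total += math.exp(log_pmf)
    return total

# Odd site counts avoid ties in the majority vote.
for n in (11, 101, 1001, 10001):
    print(n, round(prob_wrong_tree(n), 4))
```

Real treebuilding methods are not majority votes, but the logic carries over: more data shrinks random error, not systematic error, so support for an artefact can head towards 100%.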

TL;DR version: While I’m generally underwhelmed by the gene content stuff, I literally have no idea what to think about the trees.

I’m banking on the hope that someone else will.


And… I think that is all the opinion I’m going to have about ctenophores for a long time. Lunch was a long time ago, my brain is completely fried, and I’m not sure how much of the above actually makes sense. To be clear, I don’t really have a horse in this race, though I’d really like to know the truth. (Fat chance of that, by the looks of it…) I think I’m going to need a bit more convincing before I stop looking sideways at this idea that ctenophores are further from us than sponges. If anything is clear from recent phylogenomics papers, it’s that what data you analyse and how you analyse them makes a huge difference to the result you get, and this is happening with data and methods where it’s not necessarily easy to dismiss an approach as clearly inferior.

It’s a mess, damn it, and I’m not qualified to untangle it. Urgh.



Moroz LL et al. (2014) The ctenophore genome and the evolutionary origin of neural systems. Nature advance online publication, 21/05/2014; doi: 10.1038/nature13400

Nosenko T et al. (2013) Deep metazoan phylogeny: When different genes tell different stories. Molecular Phylogenetics and Evolution 67:223-233

Ptitsyn A & Moroz LL (2012) Computational workflow for analysis of gain and loss of genes in distantly related genomes. BMC Bioinformatics 13:S5

Ryan JF et al. (2010) The homeodomain complement of the ctenophore Mnemiopsis leidyi suggests that Ctenophora and Porifera diverged prior to the ParaHoxozoa. EvoDevo 1:9

Ryan JF et al. (2013) The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342:1242592

The use of a larva?

Hi! Long time no see!

(I think we’ve reached the point where it’s weird to say happy new year. I could swear xkcd had a pertinent chart of funny, but I couldn’t find it.)

Once upon a time, I briefly mentioned the problematic relationships of hemichordates. Since a short paper bearing on the subject came out relatively recently (i.e. in December, yes, I’m far behind the times ;)), I thought I’d revisit it.

To begin, let’s orient ourselves on my trusty old animal phylogeny.


Hemichordates are a phylum of deuterostomes, and their closest relatives appear to be echinoderms like starfish. The inside of Deuterostomia looks something like this:


Hemichordates come in two flavours: the butt-ugly (but nevertheless intriguing) acorn worm, which even the artistic eye of 19th-century zoologists couldn’t make appealing (a selection of them from Johann Wilhelm Spengel’s work below):

… and the slightly nicer-looking pterobranch. Well. They’re kind of fluffy. That counts as “nicer,” right? (A couple of Cephalodiscus from the Halanych lab below):

Acorn worms and pterobranchs have different bodies adapted to very different lifestyles. Pterobranchs are stalked, tentacled filter-feeders that often clone themselves into colonies that live together in a branching tube system. Acorn worms are solitary burrowers without tentacles, tubes or shells. Hemichordates possess features in common with vertebrates, such as gill slits, and they seem a lot less freakish than their sister phylum Echinodermata. So hemichordates are kind of the natural go-to group to look for properties of the deuterostome common ancestor.

The only problem is, to do that, you need a solid understanding of hemichordate phylogeny itself. Because there are two very different kinds of hemichordates, you have to first figure out which of those best represents their common ancestor: the sit-at-home plankton sifter or the roaming mud-eating worm. (Maybe neither. Wouldn’t that be funny.) And, as it happens, there’s some disagreement about that.

One view, espoused by the mighty zoological tome of Brusca and Brusca (2002) among others, puts acorn worms and pterobranchs as separate sister groups, and considers pterobranchs the more conservative of the two. The Bruscas write, on page 869, that “the enteropneusts [= acorn worms] have lost [their tentacles], no doubt in connection with their development of an infaunal lifestyle.” In this view, the deuterostome ancestor was a sessile filter feeder, and the long worm-like body and general moving-aboutiness of other deuterostomes is a new feature.

The other hypothesis, backed by DNA sequence data (Cannon et al., 2009)* and more recently the discovery of a tube-dwelling acorn worm from the Cambrian (Caron et al., 2013), is that pterobranchs are a weird subgroup of acorn worms and therefore unlikely to say much about our own distant ancestors.

One thing that AFAIK both camps agree on is that the ancestral acorn worm had a larva that looked nothing like an acorn worm. That’s something pretty common for marine invertebrates. Creatures as different as sea urchins and ragworms explore the seas by way of tiny, planktonic larvae that later metamorphose into a completely different animal**. (Tornaria larva of an unidentified hemichordate below by Alvaro E Migotto from the Cifonauta image database.)

However, the specific family of acorn worms that pterobranchs supposedly come from does not have such a larval stage. They develop more or less directly from fertilised eggs into mini-acorn worms.

Pterobranchs are poorly studied, so not much is known about their babies. Are they like the conventional acorn worm larva, with its distinctive body plan and curly rows of cilia? Or are they more straightforward precursors of the adult, like their presumed closest cousins? Stach (2013) describes a larva of the pterobranch Cephalodiscus gracilis that looks more like the latter. He found the minuscule creature crawling around in a colony of adult Cephalodiscus, and used thin sections and transmission electron microscopy to make a 3D reconstruction of it.

(His account of finding the baby makes me wonder how the hell he knew it did belong to Cephalodiscus. If my experience with tube-dwelling marine invertebrates is anything to go by, being found in a certain animal’s home is no guarantee that you’re related to said animal. I suppose, incomplete though they may be, older descriptions of pterobranch babies were good enough to identify the little guy?)

The image that emerges is of a rather featureless little sausage. According to Stach, it has a through gut, one full-fledged and one partially formed gill opening (asymmetry like that is not unheard of in deuterostome embryos/larvae), as well as a body cavity and a bunch of muscle cells. What it doesn’t have is any trace of the bands of cilia that “typical” acorn worm larvae use to swim and feed, nor some other structures (e.g. nerve centres) that characterise such larvae.

Taken at face value, this would suggest (assuming this is a typical pterobranch larva) that the pterobranchs-are-acorn-worms people are right. I have my reservations, and not just because a sample size of one makes me statistically nervous. Using this description as evidence for evolutionary relationships assumes that traditional larvae with ciliary bands are hard to lose. But that’s quite possibly not the case.

Echinoderm larvae, for example, have changed a lot even in the last few million years. The changes occurred many times independently, and often involved a return from a full-fledged larval stage to more direct development (Raff and Byrne, 2006). I don’t know whether acorn worms display a similar sort of flexibility. How many have even been studied in terms of development?

So: detailed internal structure of a pterobranch larva? Cool. As to the worms-first hypothesis… “consistent with” would be a better description than “supports”, I think.



*Although microRNAs beg to differ (Peterson et al., 2013).

**The history of these larvae is a mighty can of worms, or trochophores and tornariae as the case may be. I shall say no more on the matter here. 🙂



Brusca RC & Brusca GJ (2002) Invertebrates (second edition). Sinauer Associates.

Cannon JT et al. (2009) Molecular phylogeny of hemichordata, with updated status of deep-sea enteropneusts. Molecular Phylogenetics and Evolution 52:17-24

Caron J-B et al. (2013) Tubicolous enteropneusts from the Cambrian period. Nature 495:503-506

Peterson KJ et al. (2013) MicroRNAs support the monophyly of enteropneust hemichordates. Journal of Experimental Zoology B 320:368-374

Raff RA & Byrne M (2006) The active evolutionary lives of echinoderm larvae. Heredity 97:244-252

Stach T (2013) Larval anatomy of the pterobranch Cephalodiscus gracilis supports secondarily derived sessility concordant with molecular phylogenies. Naturwissenschaften 100:1187-1191

Thornbushes – it’s not just molecular data.

While I love phylogenetics, I rarely venture into the land of morphology-based phylogenetic trees.

Molecular sequences make sense to me as data. In a protein sequence, a proline is a proline, and if two proteins can acquire a proline in the same place by convergent evolution, well, you can look at large-scale patterns of amino acid substitution and estimate the chance of that. Genomes contain exactly 4 kinds of bases, they encode exactly 20 kinds of amino acids, and that’s that at least as far as conventional molecular phylogenies are concerned. Sequences are exactly the sort of neat, discrete data that you can describe and explore and simulate the heck out of to make sure that the assumptions you are making when you use them to infer relationships between genes or organisms are realistic.

Morphology, my brain says, is fuzzy and difficult and full of human subjectivity. In the anatomy of two animals, a limb and a limb can be totally different things with totally different evolutionary origins, and there’s no guarantee that you can tell them apart. Something can be “sort of” a limb, and there’s no well-defined number of ways of “limbness”.

Truth be told, morphology as a way of figuring out relationships kind of scares me.

However, I do love phylogenies. I’m also interested in the relationships of extinct creatures (where, unless they are very recently extinct, you simply don’t have molecular data to play with). Plus limitations intrigue me, not to mention that the limitations of the methods we use to arrive at conclusions have a huge practical importance. (As in: they can lead to bullshit conclusions.) Hence I thought a paper titled “When can clades be potentially resolved with morphology?” would be an interesting read.

And it absolutely was, only in a totally different way than I expected. I thought it would be all about the limitations I was thinking of – convergent evolution, defining and interpreting traits, the statistical biases of treebuilding methods, that sort of stuff. Instead, it ignored those issues completely in favour of a much more fundamental limitation. Bapst (2013) doesn’t talk about information that you or your fancy algorithms misinterpret. He talks about information that, due to the very nature of evolution, just isn’t there.

A modern classification of organisms is built out of clades: groups including all descendants of a single common ancestor. Phylogenetic trees are clades within clades within clades – or branches splitting into smaller branches splitting into twigs. A fully resolved tree consists only of two-pronged branching points. That is, if you pick any three creatures, you can tell which two of them are closer to each other than the third. (In practice, whether a branching point counts as resolved depends on its statistical support from methods such as bootstrapping. Bootstrapping basically asks how consistently your data recover the same groupings when they are resampled over and over.)
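A minimal sketch of the bootstrap idea on the smallest interesting case: four taxa and binary characters. With four taxa there are only three possible unrooted trees (splits), and a character favours one of them only if two taxa share one state and the other two share the other. Each replicate resamples characters with replacement and picks the best-supported split (ties count as unresolved); support is the fraction of replicates that recover a split. The matrix is made up, and this simple counting stands in for a full parsimony or likelihood analysis.

```python
import random
from collections import Counter

TAXA = ("A", "B", "C", "D")

# Hypothetical matrix: 7 characters group A+B, 2 group A+C, 1 is uninformative.
matrix = {
    "A": "1111111110",
    "B": "1111111000",
    "C": "0000000110",
    "D": "0000000001",
}

def split(pair, other_pair):
    """An unrooted four-taxon split such as AB|CD, independent of order."""
    return frozenset({frozenset(pair), frozenset(other_pair)})

def supported_split(column):
    """The split a binary character supports, or None if it is uninformative."""
    ones = frozenset(t for t, state in zip(TAXA, column) if state == "1")
    if len(ones) != 2:
        return None
    return split(ones, frozenset(TAXA) - ones)

def best_split(columns):
    """Split with the most supporting characters; None if nothing wins outright."""
    counts = Counter(s for s in map(supported_split, columns) if s is not None)
    ranked = counts.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def bootstrap_support(matrix, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which each split comes out on top."""
    rng = random.Random(seed)
    columns = list(zip(*(matrix[t] for t in TAXA)))  # one tuple per character
    tally = Counter()
    for _ in range(replicates):
        resampled = [rng.choice(columns) for _ in columns]
        tally[best_split(resampled)] += 1
    return {s: n / replicates for s, n in tally.items()}

support = bootstrap_support(matrix)
print(support.get(split("AB", "CD"), 0.0), support.get(split("AC", "BD"), 0.0))
```

A grouping that wins in only around half the replicates is often collapsed, which is how a “sad little bush” shows up in a real analysis.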

Clades are recognised by what their members share with one another but no one else: for example, a subgroup of dinosaurs that includes birds has feathers, which they inherited from their common ancestor. Each clade can have many such shared derived traits or synapomorphies. However, sometimes there are no synapomorphies. Take, for instance, the case of a single ancestral species “budding off” a series of descendants without changing much itself, like so:


(You could say that three-spine sticklebacks are doing exactly this – the ancestral form that lives in the sea is largely similar all over the northern hemisphere, but it keeps getting stuck in rivers and lakes and sprouting a huge variety of descendants.)

In such a scenario, Descendant 1 is kind of closer to Ancestor than Descendant 2 is, since there’s been less time since they split. However, because Ancestor didn’t change in all that time, there are no synapomorphies that unite it with D1 to the exclusion of D2. A morphology-based phylogenetic tree of these three species would be intrinsically unresolvable – no matter how much data you collect and how well you analyse them, you’re not going to get the true tree, only a sad little bush. (A molecular phylogeny may be able to resolve a history like this, since genomes aren’t going to stop evolving just because the creatures that have them look the same.)
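The budding scenario can be spelled out as a character matrix. This is a hypothetical sketch of my own, not Bapst’s simulation: an outgroup plus Ancestor, D1 and D2, where the ingroup shares some derived traits, D2 buds off first, D1 later, and each descendant then picks up traits of its own. If Ancestor changes nothing between the two budding events, no character groups two ingroup taxa against the third, however many traits you score; a single trait gained by Ancestor in that window (and inherited by D1) is enough to change that.

```python
from collections import Counter

TAXA = ("Outgroup", "Ancestor", "D1", "D2")

def budding_characters(n_ingroup, n_d1_only, n_d2_only, n_ancestor_and_d1=0):
    """Hypothetical binary characters (0 = ancestral, 1 = derived), one tuple per
    character, in the order of TAXA. n_ancestor_and_d1 counts traits Ancestor
    gained after D2 budded off but before D1 did; 0 means complete stasis."""
    return ([(0, 1, 1, 1)] * n_ingroup            # shared by the whole ingroup
            + [(0, 0, 1, 0)] * n_d1_only           # unique to D1
            + [(0, 0, 0, 1)] * n_d2_only           # unique to D2
            + [(0, 1, 1, 0)] * n_ancestor_and_d1)  # the only possible synapomorphy

def groupings(columns):
    """Count informative characters by the pair of taxa sharing the derived state.
    With four taxa, a binary character is informative only if exactly two have it."""
    return Counter(
        frozenset(t for t, state in zip(TAXA, col) if state == 1)
        for col in columns
        if sum(col) == 2
    )

print(groupings(budding_characters(5, 3, 4)))
print(groupings(budding_characters(5, 3, 4, n_ancestor_and_d1=2)))
```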

This is the sort of limitation Bapst explores through his simulations. The simulations don’t actually model the evolution of morphology itself. They compress all morphological change into “differentiation events”, i.e. the point at which two taxa become distinguishable. (He later makes the important point that “taxa” could be anything on the traditional Linnaean scale, be it species, families, classes or whatever, and his conclusions would remain the same.)*

Differentiation events then might happen in a variety of ways, illustrated by Bapst’s Figure 2 below:

In other words, there can be branching without differentiation, differentiation without branching, and anywhere in between.

The simulations investigate how many intrinsically unresolvable clades we should expect under various mixtures of the four scenarios above, combined with more or less complete sampling of the fossil record. Some of the observations I found fascinating:

  • More complete sampling actually decreases resolvability, since your dataset is then more likely to include both ancestors and their descendants.**
  • Unresolvable clades are spread evenly throughout the whole model phylogeny – they aren’t disproportionately older or younger than their well-behaved counterparts. This is very important to me because it means that intrinsic unresolvability could also affect the levels I’m most interested in, i.e. the phylum-level relationships of animals.
  • No realistic simulation, i.e. one whose parameters and results are compatible with the real fossil record, produces fully resolvable phylogenies!
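The first of those points has a simple back-of-the-envelope version. Assume (hypothetically, and far more crudely than Bapst’s simulations) that each taxon makes it into the dataset independently with probability p, and consider one ancestor that budded off k descendants. The chance of catching the ancestor together with at least one of its descendants, which is the situation blamed above for lost resolution, rises steadily with p.

```python
def p_ancestor_with_descendant(p, k):
    """Probability that an ancestor and at least one of its k descendants are all
    sampled, if every taxon is sampled independently with probability p."""
    return p * (1 - (1 - p) ** k)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"sampling probability {p:.1f}: {p_ancestor_with_descendant(p, k=3):.3f}")
```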

It’s worth noting that it’s actually close to impossible to tell whether the lack of resolution in any given real dataset is due to this intrinsic effect or some other issue. However, the take home message of this study is that however well you’ve eliminated other sources of ambiguity, you should pretty much never expect a fully resolved phylogeny if you are working with the morphology of real creatures. If you got one, you probably did something wrong!

(Considering that molecular data can be just as incapable of correctly resolving relationships under certain circumstances, I dearly hope that the problem groups are at least going to be different for the two kinds of data… :D)


*This is a distinctly punk eek-flavoured model, BTW; if morphological change is evenly spread out through time, the whole thing falls apart. But, then, if change is evenly spread through time, you wouldn’t have scenarios with unchanged ancestors like the one above, and I gather that the existence of those is an established palaeontological reality.

**However, this doesn’t mean that trees obtained from patchy fossil records will be more accurate – having a poorer sample also means potentially overlooking misleading changes like reversals to an ancestral state.



Bapst DW (2013) When can clades be potentially resolved with morphology? PLoS ONE 8:e62312

Celebrating the molecular revolution

I forgot to say happy Darwin Day yesterday, but to make up for that, I present to you Max Telford’s extremely cool way of celebrating.

In 1988, on Darwin Day, no less, a little 5-page paper was published in Science that would absolutely revolutionise the study of animal evolution. Field et al. (1988) was one of the earliest studies to apply this newfangled thing called molecular biology to the phylogeny of animals. Methods for molecular phylogenetics (or indeed any kind of phylogenetics) were extremely limited by the performance of the computers of the time, but that didn’t stop scientists from trying them. And once someone kicked this snowball, the avalanche couldn’t be stopped.

This early attempt yielded some huge surprises. Arthropods, which were thought to have arisen from segmented worms, were not closely related to any kind of worm. Brachiopods, long thought to belong to their own major group, showed up deep among worms, molluscs and other uncontroversial protostomes instead. Cnidarians such as hydras and sea anemones, and bilaterians such as ourselves, arose independently from single-celled ancestors.

Some of their conclusions – among them the last one about several origins for animals – were contradicted by more sophisticated analyses. Nevertheless, what they stirred up was the beginning of our current understanding of animal phylogeny. For the 25th anniversary of this pivotal publication, Max Telford, animal phylogeneticist extraordinaire of University College London, went back to the roots of his field and reanalysed Field et al.’s data (Telford, 2013).

Could the data and methods of the time have yielded a more accurate tree? How does a “modernised” dataset fare under the latest methods? What advances in methodology and understanding led the molecular phylogenetics of animals from the first tentative steps in the 1980s to where we are today?

Analysing the original data with methods similar to the original, of course, repeats most of the original mistakes. It’s when Telford starts tweaking things that the interesting stuff starts to happen. For example, just switching from the original method to a more complex one that was available but would have taken years to run at the time pulls all animals back together. “Updating” the analysis by using more complete sequences of the same gene, slower-evolving relatives of some original species, and modern methods impossible to run on 80s computers comes very close to today’s consensus. In other words, Field et al. basically did the best they could. Since then, data availability, careful sampling and far more computer muscle have changed some of their conclusions – but confirmed others.

Telford highlights one way in which the classics got lucky, too. Back in the eighties, sequencing nucleic acids was a difficult affair. Field et al. (1988) picked 18S ribosomal RNA mostly because it was less difficult than most others. But, as Telford points out, they also hit on a really good gene for phylogenetics. The 18S is quite long, providing an abundance of data. It has both very conserved and variable regions, so it has something to say on all levels of divergence. And, as Telford’s updated analysis shows, it can actually give reasonably accurate results on its own, which cannot always be said of single genes. For long years after Field et al. (1988), 18S rRNA continued to be used to probe into animal relationships, and had a few more revolutions up its sleeve (Aguinaldo et al., 1997; Ruiz-Trillo et al., 1999) before yielding to huge multi-gene datasets.
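Why a mix of conserved and variable regions helps can be sketched with a simple Jukes-Cantor-style model (my assumption for illustration; real 18S rRNA evolution is messier than equal rates between four bases). Two made-up site classes evolve at relative rates 0.1 and 10. At a shallow divergence the slow sites have barely changed while the fast ones carry the signal; at a deep divergence the fast sites are saturated while the slow ones are still informative.

```python
import math

def expected_p_distance(d, k=4):
    """Expected fraction of differing sites after d substitutions per site,
    assuming k equally likely states (Jukes-Cantor-style model)."""
    ceiling = (k - 1) / k
    return ceiling * (1 - math.exp(-d / ceiling))

RATES = {"slow sites": 0.1, "fast sites": 10.0}  # hypothetical relative rates

for label, depth in (("shallow split", 0.05), ("deep split", 5.0)):
    for site_class, rate in RATES.items():
        print(f"{label:13s} {site_class:10s} p = {expected_p_distance(rate * depth):.3f}")
```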

Contemplating Telford’s little historical excursion, I’m reminded of Isaac Asimov’s fantastic essay The Relativity of Wrong. We’ve come a long way from our first bumbling attempts at molecular phylogenetics. We were wrong many times, and I can guarantee you we’re still wrong about a lot of things. But I like to think that, as with the shape of the earth, we are not quite as wrong as our predecessors. Over the years, some great branches of the animal tree have crystallised from a sea of studies. With dogged determination, science approaches the truth.

I think that’s a good note to end on when we commemorate the birthday of a scientist who spent decades perfecting his theory of evolution before publishing perhaps the most important book in the history of biology. Happy belated Darwin Day! 🙂



Aguinaldo AMA et al. (1997) Evidence for a clade of arthropods and other molting animals. Nature 387:489-493

Asimov I (1989) The Relativity of Wrong. The Skeptical Inquirer 14(1):35-44

Field KG et al. (1988) Molecular phylogeny of the animal kingdom. Science 239:748-753

Ruiz-Trillo I et al. (1999) Acoel flatworms: earliest extant bilaterian metazoans, not members of platyhelminthes. Science 283:1919-1923

Telford MJ (2013) Field et al. redux. EvoDevo 4:5


When I discussed sponge microRNAs last week, I said deep animal phylogeny was difficult. Quite fortuitously, another paper went online recently that explores exactly this difficulty (Nosenko et al., 2013). Following on from the microRNA post, I’ll use this paper as an excuse/guide to discuss the tangled relationships of animals.

First of all, let’s recap the problem. My trusty old family tree of animals just so happens to be an excellent illustration:


When I first made this tree to explain what the hell I was talking about re: the Cambrian creature Nectocaris, I put in some question marks mostly out of laziness. To illustrate why the “old” Nectocaris didn’t make sense, I only needed the relationships of bilaterians among themselves. Everything outside the Bilateria was irrelevant to the little creature’s mystery, so I decided to forgo reading up on them and stay on an uninformed fence.

But, in fact, said fence is not just my half-arsed perch. I appear to share it with an entire, very much whole-arsed field. While there’s now reasonable agreement over ecdysozoans and deuterostomes and all that jazz, the non-bilaterians still wander all over the place depending on how you do your analysis. Nosenko et al. cite a number of recent large-scale studies and point out that they totally fail to agree on where to put poor Trichoplax and jellies of various kinds. The other thing they fail at is deciding how many branches sponges actually represent (the problem the microRNA study I discussed tried to tackle). To illustrate the extent of the chaos, I sketched the phylogenies that six recent studies cited by Nosenko and colleagues came up with (sponge lineages are marked by dots):


Remarkably, all six studies agree on the basic deuterostome-ecdysozoan-lophotrochozoan arrangement inside Bilateria in spite of using different sets of bilaterian species. In contrast, the non-bilaterian animals – sponges of all kinds, cnidarians, ctenophores and Trichoplax – appear in pretty much every conceivable configuration.

A plethora of pitfalls

Why? What makes these questions so difficult that analyses of 100+ genes from dozens of species, covering all major animal groups and using the best available methods, have this much trouble answering them?

Time is probably not the issue, or at least not in the simple sense of “it all happened too long ago”. The Nosenko paper brings up the example of fungi, which are roughly as ancient (or, in the context of all living things, as young) as animals. Studies that used the exact same set of genes to analyse the relationships within both groups apparently produced a nice clear tree for fungi. Animals? A whole lot of noise.

Perhaps the “tree” of animals is really more like one of Rokas and Carroll’s (2006) evolutionary bushes, its base branching so quickly that genes didn’t have time to accumulate many informative changes between one split and the next. Perhaps it even happened so fast that ancient within-species sequence variation was carried through several such splits, resulting in what population geneticists call incomplete lineage sorting: a situation where the history of genes is not the same as the history of species.
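Incomplete lineage sorting has a classic back-of-the-envelope form for the simplest case of three species. Under the standard multispecies coalescent, which idealises things (neutral, unlinked loci, no gene flow, constant population size on the internal branch; these are assumptions of the sketch, not anything measured in the studies above), the chance that a gene's tree matches the species tree ((A,B),C) depends only on the length T of the internal branch, in coalescent units (generations divided by 2N for a diploid population of size N). A minimal sketch:

```python
import math


def gene_tree_probabilities(internal_branch: float) -> dict:
    """Probabilities of the three rooted gene-tree topologies for a
    three-species tree ((A,B),C) under the multispecies coalescent.

    internal_branch: time between the two speciation events, in coalescent
    units (generations / 2N for a diploid population of size N).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # The A and B lineages fail to coalesce on the internal branch with
    # probability exp(-T); if they fail, all three pairings are equally likely.
    discordant = math.exp(-internal_branch) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for t in (0.01, 0.1, 1.0, 3.0):
    probs = gene_tree_probabilities(t)
    print(f"T = {t:<4}  P(gene tree matches species tree) = {probs['((A,B),C)']:.3f}")
```

With a very short internal branch (T = 0.01), a gene tree matches the species tree only about a third of the time, barely better than picking one of the three topologies at random. That is the bush scenario in a nutshell: the faster the splits, the more individual genes disagree with the species history.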

Perhaps we haven’t got a good enough sample of genes, animals, or both.

If early animal evolution was bush-like, only a large amount of good data has any hope of accurately resolving how it went. But finding suitable genes for phylogenetic analysis is not easy. They have to be known in all of our species. They should have unambiguous identities so we know we’re actually comparing the same gene across species. They should evolve slowly enough that chance hasn’t had time to wash away their records of relatedness.

Likewise, picking suitable species can be difficult. Aside from the availability of sequences, the two greatest problems are taxon sampling and long branches. Good taxon sampling means covering the diversity of a group. So for example, if you have to pick three vertebrates, you don’t want them all to be mammals. A mammal, a shark and, say, a bony fish would be a much more representative sample.

Long branches are the bogeyman of phylogenetics. “Long” here means many evolutionary changes compared to other lineages in your sample. Similarities in gene/protein sequences are not always due to shared ancestry: because there’s a limited number of letters in the DNA and protein alphabets, sometimes they happen just by chance. If you have two unusually long branches, they might have a lot of these chance similarities, many more than either of them shares with its true relatives by common ancestry. Some of the newer changes might also have overwritten the older similarities linking them with their real families, a problem known as saturation. The overall outcome is that long branches attract each other.
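Saturation can be put in numbers under the simplest model of DNA evolution, Jukes and Cantor's (1969) model, which assumes equal base frequencies and equal rates for all changes (far simpler than anything used in modern phylogenomics). If two sequences are separated by d substitutions per site, the expected proportion of sites at which they differ is p = 3/4 (1 − e^(−4d/3)). The sketch below shows p levelling off at 3/4: past a certain point, extra changes no longer make sequences look any more different, which is how old similarities get overwritten.

```python
import math


def jc69_p_distance(d: float) -> float:
    """Expected fraction of differing sites between two DNA sequences
    separated by d substitutions per site (Jukes-Cantor model)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Invert the model: estimate d from an observed fraction of differences p."""
    if p >= 0.75:
        # At or beyond the saturation ceiling the estimate is undefined:
        # the data no longer say how much change has happened.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:3.1f}  expected observed p = {jc69_p_distance(d):.3f}")
```

Going from d = 2 to d = 5, two and a half times as much evolution, moves the expected observed difference only from about 0.70 to 0.75. Near the ceiling, a tiny error in p turns into a huge error in the estimated d, which is one reason long branches are so hard to place.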

Last but not least, perhaps the assumptions we put into our analyses don’t actually fit the data. All phylogenetic analyses are based on a model of evolution. For molecular data, these models specify, for example, how likely different sequence changes are, and which bases or amino acids are commonest and rarest. All analyses also need a way of picking the best tree; the options range from simply choosing the one with the fewest changes to criteria based on complicated probability theory. Sometimes, models and methods still work reasonably well when their assumptions are violated, but, as you might expect, counting on that is generally a stupid idea.
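The “fewest changes” criterion (maximum parsimony) is the easiest of these to show. Below is a minimal sketch using Fitch's (1971) algorithm, which counts the minimum number of changes one alignment column needs on a given tree; the score of a tree is the sum over all columns, and the most parsimonious tree is the one with the lowest score. The four-taxon alignment and all names are invented for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a taxon name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No overlap: one change is needed somewhere below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment: columns 1 and 4 group A+B vs C+D,
# column 5 groups A+C vs B+D, columns 2 and 3 are uninformative.
ALIGNMENT = {"A": "ACGTA", "B": "ACGTT", "C": "TCGAA", "D": "TCGAT"}
TOPOLOGIES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(t, ALIGNMENT) for name, t in TOPOLOGIES.items()}
print(scores)
```

Model-based methods (maximum likelihood, Bayesian inference) replace “count the changes” with “how probable are these data, given this tree and a model of evolution”, which is exactly where the model assumptions come in.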

Nosenko et al. (2013) come to the conclusion that the issue of non-bilaterian animal phylogeny is plagued by pretty much the whole package.

Dissecting the problem

First, studies may have increased the size of their datasets by incorporating less than ideal genes. To test the effect of gene sampling, Nosenko et al. (2013) divided their collection of 122 genes into two parts. One consisted of genes involved in protein synthesis, mostly genes encoding ribosomal proteins, which all evolve very slowly. The other was a mixed bag of non-ribosomal genes with all sorts of functions and evolutionary rates.

Perhaps not surprisingly, the latter set displayed a much higher level of saturation. Accordingly, when they analysed the ribosomal dataset with models of evolution that are more prone to errors due to saturation, they got the same trees they’d seen using more accurate models on the non-ribosomal data. Clearly, saturation, gene and model choice are affecting the answers they’re getting, and they are all problems that would affect your average phylogenomic study.

Second, the authors found every indication of a serious long-branch problem. In most phylogenetic trees, the longest branch is the outgroup. Outgroups are organisms outside your group of interest (the ingroup). Similarities between the outgroup and members of the ingroup are likely to have evolved before the origin of the ingroup, therefore they can be used to locate the root of the ingroup tree. However, outgroups are rarely sampled as well as ingroups, hence they tend to form long branches, making them a liability.

In the case of animals, removing the outgroup cleared the disagreements between the different gene sets, demonstrating that some of them had been due to long-branch artefacts. (Of course, without an outgroup you don’t know which animal lineages split first, which makes this solution not much use at all for important evolutionary questions like what the common ancestor of all animals looked like.)

Third, the choice of outgroup matters: using a more distant outgroup changed the trees considerably. Ctenophores are worth special mention here. When Dunn et al. (2008) placed these jellyfish-like creatures as the sister group to all other animals, it was an odd, unexpected result. Well, ctenophore genomes evolve ridiculously fast, and there’s a good chance that their position “way out there” is an artefact of that. In Nosenko et al.’s analyses, they ended up in the Dunn position when the more saturated non-ribosomal data were used – or when the ribosomal dataset was analysed with a more distant outgroup. When everything possible was done to reduce long-branch issues, they stayed deep in the crown of the tree next to cnidarians.

Fourth, the assumptions of even the best evolutionary model don’t take into account an annoying property of protein sequences: their overall amino acid compositions can differ across lineages. Changing the entire makeup of an organism’s protein complement involves changes in evolutionary patterns that none of the models account for. Once again, those damned ctenophores are one of the problem taxa with “deviant” sequence compositions. (The even worse news is that the closest available outgroups also differ from typical animals in this respect.)
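One common way to screen for this kind of compositional heterogeneity is a Pearson chi-square test of homogeneity: compare each taxon's amino acid counts with what you would expect if every taxon shared the pooled composition. A minimal sketch with made-up toy inputs; it returns only the statistic and its degrees of freedom. Alignment sites are not independent data points, so treating this as a formal significance test is questionable; it is best read as a rough descriptive measure.

```python
from collections import Counter
from typing import Dict, Tuple


def composition_chi2(seqs: Dict[str, str], gap: str = "-") -> Tuple[float, int]:
    """Pearson chi-square statistic for differences in residue composition
    among taxa, plus its degrees of freedom. Gaps are ignored."""
    counts = {name: Counter(c for c in s if c != gap) for name, s in seqs.items()}
    alphabet = sorted({c for counter in counts.values() for c in counter})
    row_totals = {name: sum(counter.values()) for name, counter in counts.items()}
    if any(total == 0 for total in row_totals.values()):
        raise ValueError("every taxon needs at least one non-gap residue")
    col_totals = {a: sum(counter[a] for counter in counts.values()) for a in alphabet}
    grand_total = sum(row_totals.values())

    chi2 = 0.0
    for name, counter in counts.items():
        for a in alphabet:
            # Expected count if this taxon had the pooled composition.
            expected = row_totals[name] * col_totals[a] / grand_total
            chi2 += (counter[a] - expected) ** 2 / expected
    dof = (len(seqs) - 1) * (len(alphabet) - 1)
    return chi2, dof


print(composition_chi2({"taxon1": "AAKK", "taxon2": "KKAA"}))  # (0.0, 1): same composition
print(composition_chi2({"taxon1": "AAAA", "taxon2": "KKKK"}))  # (8.0, 1): no overlap at all
```

A “deviant” taxon like a ctenophore would show up as a large contribution from its own row; none of this fixes the problem, it only flags it.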

Fifth, taxon sampling is influencing what you get. For example, the more sponges Nosenko et al. included, the more support they got for sponges being a single lineage. Ctenophores probably also suffer from this problem. For one thing, they’re very poorly known in almost every way that is relevant to picking species for phylogenetic analysis.

For another, they may actually have an additional problem that is literally impossible to crack – phylogenetic analysis of ctenophores themselves and a look at their fossil record hint that most ctenophore lineages have died out, with existing species all coming from a relatively recent common ancestor. That would make the entire phylum incurably long-branched no matter how many living species you throw at your datasets!

And finally, the ribosomal dataset that was the least prone to long-branch artefacts and the most informative about the deepest branches in animal phylogeny comes with a big caveat: it’s not a random selection of genes. In fact, all of these genes are interacting parts of a single system, which means they might not evolve independently (in the statistical sense). Are they all affected by a common set of biases, and does it render them unsuitable for recovering the true history of animals? We don’t yet know.

Hope dies last…

Being the phylogeny nut that I am, I really enjoyed this dissection of a thorny problem. At the same time, the results are kind of depressing. (Especially if, like me, you’re interested in early animal evolution.) No matter how carefully you set up your analysis, biases lurk around the corner waiting to jump on you and destroy your conclusions. You have a choice between not knowing where to root the tree of animals and being screwed by the outgroup. Well-worn measures of statistical confidence can support contradictory hypotheses. Ctenophores are fucking hopeless.

Is there anything we can do about this conundrum? Nosenko et al. conclude their paper on a somewhat hopeful note. There are other methods in molecular phylogenetics than simple sequence comparison. So far these have been no more helpful than traditional sequence analysis, but we’re getting more and more full genome sequences from all over the animal kingdom. There’s more to look at than ever. Perhaps, one day, we’ll find a tool that can trim this thorny beast of a bush (or bush of beasts?) into shape.

Meanwhile, the quandary of deep animal phylogeny stands as a reminder that science is not all-powerful. The universe is a puzzle, but we have no reason to assume that nature left us enough information to solve it all. Which, as far as I’m concerned, shouldn’t stop us from trying. 😉



Dunn CW et al. (2008) Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745-749

Erwin DH et al. (2011) The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science 334:1091-1097

Nosenko T et al. (2013) Deep metazoan phylogeny: when different genes tell different stories. Molecular Phylogenetics and Evolution (in press), doi: 10.1016/j.ympev.2013.01.010

Philippe H et al. (2009) Phylogenomics revives traditional views on deep animal relationships. Current Biology 19:706-712

Pick KS et al. (2010) Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships. Molecular Biology and Evolution 27:1983-1987

Rokas A & Carroll SB (2006) Bushes in the tree of life. PLoS Biology 4:e352

Schierwater B et al. (2009) Concatenated analysis sheds light on early metazoan evolution and fuels a modern “urmetazoon” hypothesis. PLoS Biology 7:e20

Sperling EA et al. (2009) Phylogenetic-signal dissection of nuclear housekeeping genes supports the paraphyly of sponges and the monophyly of Eumetazoa. Molecular Biology and Evolution 26:2261-2274

For fuck’s sake, scientists!

Damn. Mistaking evolution for a ladder with us on top is something I fully expect from people who don’t study it for a living, but when evolutionary scientists make that mistake, it drives me apeshit. And they do it all the fucking time.

I don’t think most of them are aware of it. You’ve got to be really watching for the trap to have a chance of avoiding it. I slip every now and then, and then I spot it and rage at myself and get deeply philosophical about human nature and such. It’s such an easy and convenient thing to do. (Think of evolution as a ladder, not get philosophical, I mean.) It’s the way we’ve been conditioned to think since the first time we heard about evolution.

For most of the history of biology, no one blinked twice if you talked with culturally sanctioned anthropocentrism about “lower animals” or “higher vertebrates”. Evolution was a highway of progress, and some creatures just got further along than others. Naturally, we were speeding along right at the front.

Nowadays, I think most biologists who have to consider evolution in their work would tell you that evolution doesn’t work like that. The papers I read rarely contain such explicit references to the “march of progress”. (Can I call it the MOP?) However, that doesn’t mean the references are gone. They’ve just become so subtle that, I suspect, not even the people who make them realise they’re there.

It’s “basal lineages”. “Phylogenetically more primitive” creatures. Or “early-branching organisms”. Or “evolutionary old animals”. All of these are real terms used in real papers published this year. They aren’t restricted to bad papers. And if you stop to think about it, none of them make any goddamned sense.

Let’s picture an evolutionary tree first. I can’t really use my usual tree with all its question marks, but the one below, which I nicked from Srivastava et al. (2008), will do:


(The species from top to bottom are: brewer’s yeast, a choanoflagellate, this tentacled little guy, a sea anemone, humans, a limpet, everyone’s favourite fruit fly, the Blob, and a sponge.)

The “base” of the tree is to the left, where animals, Monosiga and fungi have their last common ancestor. (That was a long time ago.) “Basal” means close to the base. The branching point (node) that separates animals from the non-animals at the top is the basalmost node in this tree. The node that separates the sponge from the other animals is also a pretty basal node. The creature that gave rise to both sponges and other animals was a truly basal animal.

Now, which is the basal lineage?

The correct answer is “relative to what?”

Every node divides the tree into two lineages. It doesn’t make any sense to say that one of them is more basal than the other. There’s a basal node in the tree of animals. Sponges are on one side of that, the rest of the animals are on the other. If you take a vertebrate species, sponges are the last animal lineage you’ll encounter if you trace its ancestry back towards the base of the tree. If you take a sponge species, the lineage with vertebrates (and lots of other things) on it will be the last.
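The “relative to what?” point can be made mechanical. The sketch below takes a tree written as nested pairs and lists the sister lineages you meet when tracing one species’ ancestry back towards the root, nearest first. The tree is a simplified version of the one above using only five of its species; the function names are illustrative, not from any library.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

TREE: Tree = ("Yeast", ("Monosiga", ("Sponge", ("Sea anemone", ("Fruit fly", "Human")))))


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def sister_lineages(tree: Tree, target: str) -> Optional[List[Tree]]:
    """Sister clades met while tracing `target` back to the root, nearest
    first; None if `target` is not in the tree."""
    if isinstance(tree, str):
        return [] if tree == target else None
    for here, other in ((tree[0], tree[1]), (tree[1], tree[0])):
        path = sister_lineages(here, target)
        if path is not None:
            return path + [other]
    return None


def describe(tree: Tree, target: str) -> List[str]:
    path = sister_lineages(tree, target)
    if path is None:
        raise ValueError(f"{target} is not in the tree")
    return ["+".join(leaves(clade)) for clade in path]


print(describe(TREE, "Human"))
print(describe(TREE, "Sponge"))
```

From the human side, the last animal lineage before the trail leaves animals is the sponge; from the sponge side, it is the clade holding the anemone, the fly and us. Same node, opposite views, and neither lineage is “more basal” than the other.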

“Basal lineage” depends on your point of view.

Maybe actually taking the sponge point of view will help illustrate this. This tree comes from a paper about sponges (Sperling et al., 2010):


Unlike the previous tree, its branches are labelled with larger groups rather than species, but these represent more or less the same range of creatures. Monosiga from tree one is a choanoflagellate. Amphimedon is a haplosclerid demosponge, on the second branch from the bottom. Every other animal from the first tree is compressed down into that one branch labelled “Eumetazoans”. (OK, Trichoplax is not a eumetazoan, but that’s a technicality that doesn’t affect the point.) From this angle, it’s rather harder to see sponges as a basal animal lineage!

Equally, sponges are just as old as non-sponge animals, so calling them “old” is a tad dodgy. Here, you could argue that sponges have been around longer than, say, vertebrates, which is true to the best of our knowledge. In that sense, “sponges” is an older lineage than “vertebrates”. But that only means that “sponges” should be compared to “non-sponges” rather than “vertebrates”, and anyone making such comparisons should be as aware of the diversity lurking within sponges as they are of the diversity of other animals.

The “evolutionary old animals” quote actually comes from a paper that looked at stem cell genes in Hydra to understand the evolution of stem cells in animals (Hemmrich et al., 2012). It’s not comparing cnidarians (the phylum hydras belong to) to something genuinely younger than them. I can’t resist quoting the whole offending sentence:

Our observations provided new and comprehensive insight into the complex network that orchestrates patterning and tissue homeostasis in an evolutionary old animal that branched off almost 600 million years ago. (p3277)

Honestly, what does that even mean? Branched off from what?

OK, I know it means from our own ancestors. But my point is that this should not be taken for granted, and if you do take a human-centric point of view, you should bloody well make that explicit. You should not write as though evolution had some sort of “main branch” leading to us from which things split every now and then. Lineages split from each other.

You might think that I’m being pedantic just to have an excuse to rant, but the implicit views underlying examples like the above have real consequences for the study of evolution. Namely, they might lead scientists to assume that representatives of “basal” lineages got stuck in the Precambrian and could just stand in for their distant ancestors. This is dangerous.

Take sponges. Yes, in many respects they probably resemble the first animals more than we do. Chances are those ancient animals didn’t have sophisticated organs and like two hundred different cell types. However, chances also are that they were made of distinct cells rather than huge merged syncytia, and that they didn’t have elaborate skeletons made of some sort of mineral, both of which are properties of many sponges. All animals alive today had exactly the same amount of time to evolve their own quirks since their last common ancestor. We shouldn’t just assume that anything “simple” in an animal we regard as “basal” is inherited straight from that ancestor just because it fits our favourite story.

Case in point: the Amphimedon genome was found to be impoverished in many families of developmentally important “master” genes, and this fit nicely into the prevailing view of the increasing complexity of animals throughout their history (Larroux et al., 2008). But it’s likely that at least some of those genes were actually lost by Amphimedon‘s ancestors and not gained by ours (Mendivil Ramos et al., 2012). Assuming that “basal” (relative to us) means “similar to ancestor X” can very easily lead to unwarranted conclusions, and that can hinder our ability to figure out what really happened. To me, that’s a big deal.



Hemmrich G et al. (2012) Molecular signatures of the three stem cell lineages in Hydra and the emergence of stem cell function at the base of multicellularity. Molecular Biology and Evolution 29:3267-3280

Larroux C et al. (2008) Genesis and expansion of metazoan transcription factor gene classes. Molecular Biology and Evolution 25:980-996

Mendivil Ramos O et al. (2012) Ghost loci imply Hox and ParaHox existence in the last common ancestor of animals. Current Biology 22:1951-1956

Sperling EA et al. (2010) Where’s the glass? Biomarkers, molecular clocks, and microRNAs suggest a 200-Myr missing Precambrian fossil record of siliceous sponge spicules. Geobiology 8:24-36

Srivastava M et al. (2008) The Trichoplax genome and the nature of placozoans. Nature 454:955-960

DNA vs proteins – I learn something again…

If you want to use molecular sequences to uncover the relationships between organisms, you have two choices of molecule.* You can use either DNA or the proteins it encodes. I always thought, why the hell would anyone use DNA when they can also use protein?

The DNA alphabet has four different letters, versus the 20 amino acids proteins are made of. With fewer letters, there is much more danger of a chance similarity, and much more chance that multiple mutations at the same spot return it to the ancestral state, completely erasing important phylogenetic information. You could, I suppose, use codon-based models instead of single-nucleotide models, but what’s the point when you can just translate the sequence and analyse the protein instead?
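The “danger of a chance similarity” can be put in numbers. If two unrelated sequences draw their letters independently from the same frequencies, the chance that they match at a given site is the sum of the squared letter frequencies. A minimal sketch; the skewed base frequencies in the last case are hypothetical, included only to show that an uneven composition raises the rate of coincidental matches.

```python
from typing import Sequence


def chance_identity(frequencies: Sequence[float]) -> float:
    """Probability that two unrelated sites match by chance, given the
    relative frequencies of each letter (normalised here to sum to 1)."""
    total = sum(frequencies)
    return sum((f / total) ** 2 for f in frequencies)


dna = chance_identity([1] * 4)        # four equally common bases
protein = chance_identity([1] * 20)   # twenty equally common amino acids
skewed_dna = chance_identity([0.15, 0.35, 0.35, 0.15])  # hypothetical GC-rich genome

sites = 300
print(f"DNA: {dna:.2f} per site, ~{dna * sites:.0f} chance matches in {sites} sites")
print(f"Protein: {protein:.2f} per site, ~{protein * sites:.0f} chance matches in {sites} sites")
print(f"Skewed DNA: {skewed_dna:.2f} per site")
```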

Well, it seems there is a point. The crucial thing is that while DNA translates unambiguously to protein, this is not true the other way. Take a look at the genetic code table below (modified from here):

The letters in black represent RNA bases (the DNA would have T instead of U), and the coloured ones are the three-letter abbreviations of amino acids, except for Stop, which, as you might have guessed, means “end of protein, stop translating”.

The first thing to note about the table is that most amino acids are encoded by more than one DNA/RNA codon. There’s already more information here than if you simply took the protein sequence. The second point is that some amino acids have two sets of codons that aren’t easily interchangeable.

With something like glycine (bottom right box), all codons are almost the same, only differing in the third letter, which might even be irrelevant anyway due to third base wobble. Mutating between glycine’s codons is easy and unlikely to screw the organism.

In contrast, serine (red and yellow boxes) has two sets of codons that differ in both their first and second positions. It’s much easier to move within either of those sets by mutation than to jump from one set to the other. Changing either of the first two letters in any of these six codons results in a different amino acid, which has a lot more potential to wreak havoc than a mutation that leaves the protein alone.
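The serine chasm can be checked directly against the standard genetic code (NCBI translation table 1, written here with DNA letters). The sketch below splits each amino acid's codons into “islands” (a term used here purely for illustration) whose members are connected by single-nucleotide changes that never change the amino acid.

```python
from itertools import product
from typing import List

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def one_mutation_apart(a: str, b: str) -> bool:
    return sum(x != y for x, y in zip(a, b)) == 1


def synonymous_islands(aa: str) -> List[List[str]]:
    """Group an amino acid's codons into sets reachable from one another
    by single-nucleotide changes that keep the same amino acid."""
    unvisited = {codon for codon, x in CODE.items() if x == aa}
    islands = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, island = [start], []
        while stack:
            codon = stack.pop()
            island.append(codon)
            for other in sorted(unvisited):  # sorted() copies, so removal is safe
                if one_mutation_apart(codon, other):
                    unvisited.remove(other)
                    stack.append(other)
        islands.append(sorted(island))
    return islands


print("Gly:", synonymous_islands("G"))
print("Ser:", synonymous_islands("S"))
print("Arg:", synonymous_islands("R"))
split = [aa for aa in sorted(set(AMINO_ACIDS) - {"*"}) if len(synonymous_islands(aa)) > 1]
print("Amino acids with more than one island:", split)
```

Serine comes out as the only amino acid split in two. Arginine, the other sixfold-degenerate amino acid, forms a single island because CGA and AGA (and CGG and AGG) differ only at the first position.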

And apparently, that can, from a phylogenetic point of view, practically turn serine into two different amino acids. In a fairly recent Nature paper, Regier et al. (2010) investigated arthropod relationships and found that while protein-based methods gave very similar results to DNA-based methods, they often couldn’t offer as much support for these results as DNA did. That paper hints at the serine problem and mentions that a couple of the authors are working on it, and now the “working on” bit is out in PLoS ONE. (That’s how I came across this issue, in fact.)

The new analysis (Zwick et al., 2012) finds that tweaking protein-based evolutionary models so that the two kinds of serine count as different letters increases confidence in the resulting tree dramatically. The serines aren’t changing any major conclusions – if you take them out, you still get the same tree, just with lousy statistical support. But clearly, protein sequences alone were missing important evidence. In another situation, they might make the difference between a wrong answer and a right one.
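In practice, counting the two kinds of serine as different letters starts with translating into a 21-letter alphabet. A minimal sketch of that recoding step only (not of the modified evolutionary model that then has to score the 21st letter): AGT/AGC serines get their own symbol, here an arbitrary placeholder “Z”. That choice is hypothetical, not necessarily what Zwick et al. used, and since many programs read Z as “Glu or Gln”, real software would need to be told to treat it as a separate state.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1), codons ordered TTT, TTC, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SERINE_AGY = {"AGT", "AGC"}
SECOND_SERINE = "Z"  # hypothetical placeholder symbol for AGY-encoded serine


def translate(cds: str, split_serine: bool = True) -> str:
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if split_serine and codon in SERINE_AGY:
            protein.append(SECOND_SERINE)
        else:
            protein.append(CODE.get(codon, "X"))
    return "".join(protein)


# Two hypothetical sequences that differ only in which serine codon family they use.
seq1, seq2 = "ATGTCTGGA", "ATGAGCGGA"
print(translate(seq1, split_serine=False), translate(seq2, split_serine=False))
print(translate(seq1), translate(seq2))
```

With 20 letters the two sequences look identical; with 21, the serine swap that needed at least two nucleotide changes shows up as a difference.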

(Now I wonder if anyone’s done codon-based Hox gene phylogenies. Hox genes/proteins can be really difficult to classify because only a short region can be compared among all of them, and this short region evolves pretty slowly, yielding very few informative differences. But what if there’s more information hiding in the codons? There’s not an awful lot of serine in homeodomains, though, and while the other sixfold degenerate amino acid (arginine) is pretty common in them, that one doesn’t have nearly the mutational chasm that separates the two codon clusters for serine. Meh. Maybe codons wouldn’t help at all with Hoxes.)


*OK, technically, you sort of have three choices, but RNA and DNA sequences contain the exact same information, so they don’t really count as different.



Regier JC et al. (2010) Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature 463:1079-1083

Zwick A et al. (2012) Resolving discrepancy between nucleotides and amino acids in deep-level arthropod phylogenomics: differentiating serine codons in 21-amino acid models. PLoS ONE 7:e47450

So… much… STUFF!

Gods, this is what I’m faced with all the time. Someone needs to tell me how proper science bloggers pick articles to discuss, because I just get my RSS alerts, start squeeing, and end up not writing about anything because damn, I WANT TO WRITE ABOUT EVERYTHING!

I give up. I’ll just dump all the cool stuff that’s accumulated on my desktop and bookmark bar here and return to lengthy meandering whenever I don’t feel like I’ve been caught in a bloody tornado 😉

So, here is some Cool Stuff…

(1) A group measured the rate of DNA decay in 158 moa bones of known age from three sites. Really cool stuff, to go out and directly measure how ancient DNA disappears from dead things under more or less identical conditions. The unsurprising result is that DNA decays exponentially, a bit like radioactive material. This suggests that the main cause of the decay is random breaking of the strands. The surprising bit is that this happens much more slowly than previously estimated, suggesting that in ideal (read: frozen) conditions, it might be worth looking for preserved DNA in samples as old as a million years.
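Exponential decay is easy to sketch. If intact molecules disappear at a constant per-year rate k, the fraction left after t years is e^(−kt), and the half-life is ln 2 / k. If the decay comes from random breaks along the strand, a stretch of DNA survives only if none of its bonds has broken, so its decay rate scales with its length and longer fragments vanish faster. The rate values below are made up purely for illustration; they are not the estimates from Allentoft et al. (2012).

```python
import math


def fraction_remaining(age_years: float, half_life_years: float) -> float:
    """Fraction of intact molecules left after first-order (exponential) decay."""
    k = math.log(2) / half_life_years  # decay constant per year
    return math.exp(-k * age_years)


def intact_fraction(length_bp: int, breaks_per_bond_per_year: float, age_years: float) -> float:
    """Chance that a stretch of `length_bp` bases contains no break, assuming
    every bond breaks independently at the same constant rate (the number of
    bonds is approximated by the length)."""
    return math.exp(-breaks_per_bond_per_year * length_bp * age_years)


HALF_LIFE = 500.0   # hypothetical, for illustration only
RATE = 1e-5         # hypothetical per-bond break rate, per year

for age in (500, 5_000, 50_000):
    print(f"{age:>6} years: {fraction_remaining(age, HALF_LIFE):.2e} left")

for length in (50, 100, 200):
    print(f"{length} bp after 1000 years: {intact_fraction(length, RATE, 1000):.3f}")
```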

(On a side note, if you ever get a chance to see a talk by Eske Willerslev, one of the authors and a leading expert on ancient DNA, don’t miss it. The man is absolutely hilarious.)

– Allentoft ME et al. (2012) The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proceedings of the Royal Society B FirstCite article, available online 10/10/2012, doi: 10.1098/rspb.2012.1745

(2) The beaks of the finches, or mixing and matching developmental recipes. This study examines the genetic basis of beak shape in three little birds closely related to Darwin’s famous finches. The three finches, just like Darwin’s, share the same basic beak shape, only bigger or smaller. However, there seem to be two distinct developmental programs at work, using different genes and parts of the skeleton to orchestrate beak development. One of the three newly investigated species (the one most closely related to Darwin’s finches) apparently uses the same developmental program as its more famous relatives, even though its beak is shaped more like the other two birds studied here. I told you – genetics, development and homology are complicated 😉

– Mallarino R et al. (2012) Closely related bird species demonstrate flexibility between beak morphology and underlying developmental programs. PNAS 109:16222–16227

(3) Armoured fossil links worm-like molluscs to chitons. There’s a little-known group (or groups) of molluscs called aplacophorans that have only a coat of tiny spicules instead of shells and look more like worms than “proper” molluscs. Exactly where they fit into our picture of mollusc evolution has been controversial to say the least – they could represent an old lineage separate from other molluscs, they could be related to cephalopods, they could be related to chitons, they could be one group or they could be two lineages in completely different places on the tree… Well, a new fossil named Kulindroplax seems to argue for the chiton connection: the animal has the characteristic armour plates of a chiton on an aplacophoran-like body. Similar creatures have been discovered before, but this guy with its detailed 3D preservation provides the clearest evidence of the link so far.

– Sutton MD et al. (2012) A Silurian armoured aplacophoran and implications for molluscan phylogeny. Nature 490:94-97

(4) More cool fossils – this time straight from my beloved Cambrian. Nereocaris, a newly described Burgess Shale arthropod, suggests to its discoverers that the earliest arthropods weren’t predators prowling the seafloor, but swimmers, possibly filter feeders. The animal has a bivalved shell around its front end, similar to many other Cambrian swimming arthropods, and a long abdomen with paddles at the end. It bears the arthropod hallmark of a hardened and jointed exoskeleton, but it lacks specialised limbs such as antennae or mouthparts. In a cladistic analysis of arthropods and their nearest relatives, the new species comes out on the first branch within true arthropods, and the next few branches as we move towards living arthropods all contain similar shelled, swimming creatures. Since the non-arthropods closest to the real thing (i.e. anomalocaridids) were also fin-tailed swimmers, this arrangement makes the transition between them and true arthropods smoother than previously thought. It also suggests that the hard exoskeleton so characteristic of arthropods originally functioned in swimming – perhaps as an anchor for swimming muscles.

– Legg DA et al. (2012) Cambrian bivalved arthropod reveals origin of arthrodization. Proceedings of the Royal Society B FirstCite article, available online 10/10/2012, doi: 10.1098/rspb.2012.1958


And … there was also

… but it’s almost bedtime, and if I wanted to summarise every one of those, I’d be here all weekend 😦

See, this is why being a science nerd today is both amazing and frustrating. There’s just so. Much. Stuff.

Bacteria invented multicellularity – then thought better of it

I get content alerts from a whole host of journals, some specialist publications focusing on my field, some more general. The majority of even the former is stuff I couldn’t care less about, and the ratio of interesting to irrelevant from generalist journals like PNAS or Nature is lower still. Nevertheless, sometimes you stumble on a title that isn’t directly related to your main interests, but still makes the whole rummaging through the Pile of Irrelevance worth it.

This was the case with a paper (Schirrmeister et al., 2011) just published in the online, open-access journal BMC Evolutionary Biology. It’s so fresh that they haven’t even formatted it – the full text is only available as a “provisional” pdf where all the figures are dumped at the back of the file, separated from their captions (why they can’t wait to publish until the damned thing is in readable format escapes me).

The study by Schirrmeister and others deals with an unusual group of bacteria. Cyanobacteria are probably still better known as blue-green algae, even though they have nothing to do with anything else we call an alga (well, in truth they have everything to do with algae, but in a rather more interesting way, as we'll see below). If I had to pick one group of organisms that had the greatest impact on the history of our planet, cyanobacteria would be it. For more than two billion years, they have contributed huge amounts of biomass to the global carbon cycle. They are solely responsible for the oxygen-rich atmosphere of the Earth – and, by extension, for most eukaryotic life and all animals. They belong to the select group of bacteria that can fix nitrogen – a vital ingredient of DNA and proteins – straight from the air, making it available for other organisms. Without them, the world would be a vastly different place, and we wouldn't be possible. As Andrew Knoll puts it in his wonderful book Life on a Young Planet (consider this a recommendation ;)): "animals may be evolution's icing, but bacteria are the cake".

Cyanobacteria live everywhere there is light, from hot springs to the ocean to puddles to stone walls (as components of lichens). They also live inside the cells of every single eukaryote capable of photosynthesis: plants, red and green algae, brown algae, diatoms, dinoflagellates, euglenids (and any others I forgot to mention). The chloroplast is a pared down cyanobacterium – a symbiont that has lost most of its genes, but the ones that remain, together with its structure, still tell of its ancestry. Plants owe all their green splendour to these tiny buggers.

Cyanobacteria are not just immensely important, they are also quite unusual among prokaryotes. As the post title implies, they invented multicellularity. Multicellular cyanobacteria display a range of complexity. Some of them are just chains of identical cells. Others, though, have up to three different cell types. Heterocysts, thick-walled cells that ensure the oxygen-free environment that these bacteria require for nitrogen fixation, sit at regular intervals among “normal” cells, and when necessary, the “normal” cells can also differentiate into hardy resting cells that can survive bad times. The most complex cyanobacteria not only have filaments with different cell types, but also introduce branching into these filaments. This is the most complex prokaryotes get.

Filaments of an unbranched, differentiated cyanobacterium. The oversized heterocysts are quite obvious in some of them. Image by Kristian Peters, from Wikimedia Commons.

The new study raises an interesting possibility: that at least the simple form of multicellularity (i.e. undifferentiated filaments) appeared very early in the history of cyanobacteria. According to Schirrmeister et al., the vast majority of modern cyanobacteria descend from multicellular ancestors, even though a great many of them are single-celled today. Even more intriguingly, they find a lineage that might have re-evolved multicellularity after losing it. I don't pretend to fully understand the methods used to reach these conclusions, but I have to say that they are built on an impressive dataset: the group selected 58 cyanobacterial species for more detailed study from an original phylogenetic tree built from over a thousand taxa. They then constructed trees of this smaller dataset using two separate methods, and finally tried to reconstruct the ancestral states at various points in those trees, again using several different statistical methods. The analyses all agree: multicellularity is a very ancient trait in cyanobacteria, and it was lost left and right during their three-billion-year history.
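For the curious, here's roughly what "reconstructing ancestral states" means. The statistical methods the study used are beyond a blog post, so this is a much simpler stand-in: a minimal sketch of Fitch parsimony, a classic technique that asks which ancestral state explains the states of the descendants with the fewest changes. The tree, the taxon names A to F and their states are entirely made up for illustration; they are not data from Schirrmeister et al.

```python
from typing import Dict, Set, Tuple, Union

# A hypothetical rooted binary tree: a leaf is a taxon name (str),
# an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch parsimony.

    Returns the candidate state set for this node (at the root, these are the
    most parsimonious ancestral states) and the minimum number of changes
    needed within the subtree.
    """
    if isinstance(tree, str):
        # A leaf: its observed state, and no changes needed.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two children can agree, so no extra change is needed here.
        return shared, left_cost + right_cost
    # No overlap: one change must have happened on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


# Made-up example: six hypothetical lineages,
# "multi" = filamentous, "uni" = single-celled.
toy_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
toy_states = {"A": "multi", "B": "uni", "C": "multi",
              "D": "multi", "E": "uni", "F": "multi"}

root_states, changes = fitch(toy_tree, toy_states)
print(root_states, changes)  # {'multi'} 2
```

On this toy tree, the most economical story is a multicellular common ancestor with two independent losses, rather than four separate origins of multicellularity. Parsimony simply counts changes and ignores branch lengths, which is one reason model-based methods are often preferred for questions like this.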

These findings go against our ingrained view of evolution as an inexorable march towards increasing complexity. We mammals are among the most complex organisms the earth has ever produced, if not the most complex. We are assemblages of some 200 distinct cell types organised into a finely regulated machinery of a multitude of specialised organs. When we look at the large-scale patterns in the fossil record, we also see that this complexity has accumulated from much simpler beginnings over the aeons. We can be forgiven for thinking, in a characteristically self-centred way, that complexity is where evolution is intrinsically headed. But every now and then, nature reminds us that "more complicated" does not necessarily equal "favoured".

Parasites are probably best known for their tendency to become simplified – after all, if you are bathed in your host's digestion products all the time, why waste your energy on growing your own gut? However, simplification is abundant in organisms that make their own living, too. For example, two entire phyla of distinctly unsegmented, baglike worms – spoon worms and peanut worms – likely came from more sophisticated segmented worms (Struck et al., 2007). Now cyanobacteria join the club, and new questions arise in their wake. Why did they go back to unicellularity? How difficult is it for them to become multicellular? Such questions, of course, can be asked about any complex trait that followed a similar evolutionary trajectory.

Most intriguingly, these tiny microbes seem to violate another “law” of evolution, known as Dollo’s law: that once lost in a lineage, a complex trait won’t reappear. If the inferences of Schirrmeister et al. are correct, then either simple multicellularity isn’t such a big deal at all for these bacteria, or Dollo’s law isn’t as much of a law as we thought.

(Actually, the latter is probably the case however the history of cyanobacteria turns out. Dollo’s law has been questioned by others, and it was recently dealt a spectacular blow by a frog that almost certainly re-evolved teeth in its lower jaw (Wiens, 2011) after at least 200 million years of not having them.)

Evolution is a fascinating story. As the example of cyanobacterial multicellularity suggests, it can also be as complex as any good novel. I for one think this makes for a much more interesting and fulfilling narrative than simplistic listings of “what’s new” through the ages.

– – –


Schirrmeister BE, Antonelli A and Bagheri HC (2011) The origin of multicellularity in cyanobacteria. BMC Evol Biol 11:45

Struck TH et al. (2007) Annelid phylogeny and the status of Sipuncula and Echiura. BMC Evol Biol 7:57

Wiens JJ (2011) Re-evolution of lost mandibular teeth in frogs after more than 200 million years, and re-evaluation of Dollo’s Law. Evolution advance online publication, DOI: 10.1111/j.1558-5646.2011.01221.x