Thornbushes – it’s not just molecular data.

While I love phylogenetics, I rarely venture into the land of morphology-based phylogenetic trees.

Molecular sequences make sense to me as data. In a protein sequence, a proline is a proline, and if two proteins can acquire a proline in the same place by convergent evolution, well, you can look at large-scale patterns of amino acid substitution and estimate the chance of that. Genomes contain exactly 4 kinds of bases, they encode exactly 20 kinds of amino acids, and that’s that, at least as far as conventional molecular phylogenies are concerned. Sequences are exactly the sort of neat, discrete data that you can describe and explore and simulate the heck out of to make sure that the assumptions you are making when you use them to infer relationships between genes or organisms are realistic.

Morphology, my brain says, is fuzzy and difficult and full of human subjectivity. In the anatomy of two animals, a limb and a limb can be totally different things with totally different evolutionary origins, and there’s no guarantee that you can tell them apart. Something can be “sort of” a limb, and there’s no well-defined number of ways of “limbness”.

Truth be told, morphology as a way of figuring out relationships kind of scares me.

However, I do love phylogenies. I’m also interested in the relationships of extinct creatures (where, unless they are very recently extinct, you simply don’t have molecular data to play with). Plus limitations intrigue me, not to mention that the limitations of the methods we use to arrive at conclusions have a huge practical importance. (As in: they can lead to bullshit conclusions.) Hence I thought a paper titled “When can clades be potentially resolved with morphology?” would be an interesting read.

And it absolutely was, only in a totally different way than I expected. I thought it would be all about the limitations I was thinking of – convergent evolution, defining and interpreting traits, the statistical biases of treebuilding methods, that sort of stuff. Instead, it ignored those issues completely in favour of a much more fundamental limitation. Bapst (2013) doesn’t talk about information that you or your fancy algorithms misinterpret. He talks about information that, due to the very nature of evolution, just isn’t there.

A modern classification of organisms is built out of clades: groups including all descendants of a single common ancestor. Phylogenetic trees are clades within clades within clades – or branches splitting into smaller branches splitting into twigs. A fully resolved tree consists only of two-pronged branching points. That is, if you pick any three creatures, you can tell which two of them are closer to each other than to the third. (Resolution is judged by statistical support from methods such as bootstrapping, which basically asks whether all your data agree on the same tree.)
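To make the bootstrapping idea a bit more concrete, here is a minimal Python sketch. It is a toy, not how real software does it: proper bootstrapping re-runs a full tree search on every resampled dataset, whereas this version only lets characters "vote" on which two of three taxa belong together. The taxon names, the 0/1 coding (0 = ancestral, 1 = derived) and the function names are all made up for illustration.

```python
import random
from collections import Counter

def best_pair(characters):
    """Pair of taxa united by the most shared derived states, or None on a tie."""
    votes = Counter()
    for char in characters:  # each char maps taxon -> 0 (ancestral) or 1 (derived)
        derived = sorted(t for t, state in char.items() if state == 1)
        if len(derived) == 2:  # only states shared by exactly two taxa are informative
            votes[tuple(derived)] += 1
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

def bootstrap_support(characters, pair, replicates=1000, seed=0):
    """Fraction of resampled datasets (drawn with replacement) that recover `pair`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        resampled = [rng.choice(characters) for _ in characters]
        if best_pair(resampled) == pair:
            hits += 1
    return hits / replicates

# Hypothetical data: ten characters that all agree, then a conflicting mix.
clean = [{"A": 1, "B": 1, "C": 0}] * 10
mixed = [{"A": 1, "B": 1, "C": 0}] * 6 + [{"A": 0, "B": 1, "C": 1}] * 4

print(bootstrap_support(clean, ("A", "B")))  # unanimous data: every replicate agrees
print(bootstrap_support(mixed, ("A", "B")))  # conflicting data: support drops below 1
```

When every character tells the same story, resampling can't change the answer and support is 100%. Once the characters disagree, some resampled datasets tip the other way and support falls.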

Clades are recognised by what their members share with one another but no one else: for example, a subgroup of dinosaurs that includes birds has feathers, which they inherited from their common ancestor. Each clade can have many such shared derived traits or synapomorphies. However, sometimes there are no synapomorphies. Take, for instance, the case of a single ancestral species “budding off” a series of descendants without changing much itself, like so:

[Figure: an unchanged ancestor budding off a series of descendants]

(You could say that three-spine sticklebacks are doing exactly this – the ancestral form that lives in the sea is largely similar all over the northern hemisphere, but it keeps getting stuck in rivers and lakes and sprouting a huge variety of descendants.)

In such a scenario, Descendant 1 is kind of closer to Ancestor than Descendant 2 is, since there’s been less time since they split. However, because Ancestor didn’t change in all that time, there are no synapomorphies that unite it with D1 to the exclusion of D2. A morphology-based phylogenetic tree of these three species would be intrinsically unresolvable – no matter how much data you collect and how well you analyse them, you’re not going to get the true tree, only a sad little bush. (A molecular phylogeny may be able to resolve a history like this, since genomes aren’t going to stop evolving just because the creatures that have them look the same.)
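The argument can be spelled out as a tiny Python sketch. Everything here is hypothetical: each taxon is reduced to the set of derived traits it has picked up, the trait names are invented, and a pair counts as "resolvable" if at least one trait is shared by both members and by nobody else.

```python
from itertools import combinations

def synapomorphies(group, traits):
    """Derived traits shared by every taxon in `group` and by no taxon outside it."""
    inside = set.intersection(*(traits[t] for t in group))
    outside = set().union(*(traits[t] for t in traits if t not in group))
    return inside - outside

def resolvable_pairs(traits):
    """Pairs of taxa united by at least one synapomorphy."""
    return [pair for pair in combinations(sorted(traits), 2)
            if synapomorphies(pair, traits)]

# Budding: Ancestor never changes, each descendant only gains its own novelties.
budding = {
    "Ancestor": set(),
    "D1": {"d1_novelty"},
    "D2": {"d2_novelty"},
}

# Contrast: Ancestor changes after D2 splits off, and D1 inherits that change.
changing = {
    "Ancestor": {"late_change"},
    "D1": {"late_change", "d1_novelty"},
    "D2": {"d2_novelty"},
}

print(resolvable_pairs(budding))   # prints []
print(resolvable_pairs(changing))  # prints [('Ancestor', 'D1')]
```

In the budding case no pairing has anything to show for itself, so the three taxa collapse into a bush. The true grouping of Ancestor with D1 only becomes visible if the ancestral lineage changes between the two splits.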

This is the sort of limitation Bapst explores through his simulations. The simulations don’t actually model the evolution of morphology itself. They compress all morphological change into “differentiation events”, i.e. the point at which two taxa become distinguishable. (He later makes the important point that “taxa” could be anything on the traditional Linnaean scale – species, families, classes, whatever – and his conclusions would remain the same.)*

Differentiation events then might happen in a variety of ways, illustrated by Bapst’s Figure 2 below:

In other words, there can be branching without differentiation, differentiation without branching, and anywhere in between.

The simulations investigate how many intrinsically unresolvable clades we should expect under various mixtures of the four scenarios above, combined with more or less complete sampling of the fossil record. Some of the observations I found fascinating:

  • More complete sampling actually decreases resolvability, since your dataset is then more likely to include both ancestors and their descendants.**
  • Unresolvable clades are spread evenly throughout the whole model phylogeny – they aren’t disproportionately older or younger than their well-behaved counterparts. This is very important to me because it means that intrinsic unresolvability could also affect the levels I’m most interested in, i.e. the phylum-level relationships of animals.
  • No realistic simulation – i.e. one whose parameters and results are compatible with the real fossil record – produces fully resolvable phylogenies!
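The first observation in the list above can be illustrated with a toy sampling experiment in Python. This is emphatically not Bapst's simulation: it assumes pure budding (every new taxon sprouts from a randomly chosen existing one, and ancestors persist), samples each taxon independently with the same probability, and simply counts how often a sampled taxon has its direct ancestor in the sample too. The function names, tree size and sampling probabilities are all arbitrary choices of mine.

```python
import random

def simulate_budding(n_taxa, rng):
    """parent[i] is the taxon that budded off taxon i; taxon 0 is the root."""
    parent = [None]
    for taxon in range(1, n_taxa):
        parent.append(rng.randrange(taxon))  # bud from any earlier taxon
    return parent

def fraction_with_sampled_ancestor(parent, sampling_prob, rng):
    """Among sampled non-root taxa, the share whose direct ancestor was also sampled."""
    sampled = [rng.random() < sampling_prob for _ in parent]
    candidates = [t for t, hit in enumerate(sampled)
                  if hit and parent[t] is not None]
    if not candidates:
        return 0.0
    return sum(sampled[parent[t]] for t in candidates) / len(candidates)

rng = random.Random(1)
parent = simulate_budding(2000, rng)
sparse = fraction_with_sampled_ancestor(parent, 0.2, rng)
dense = fraction_with_sampled_ancestor(parent, 0.9, rng)
print(f"sparse sampling: {sparse:.2f}, dense sampling: {dense:.2f}")
```

The better the sampling, the more ancestor-descendant pairs end up in the same dataset, and those are exactly the pairs that can lack synapomorphies.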

It’s worth noting that it’s actually close to impossible to tell whether the lack of resolution in any given real dataset is due to this intrinsic effect or some other issue. However, the take-home message of this study is that no matter how well you’ve eliminated other sources of ambiguity, you should pretty much never expect a fully resolved phylogeny if you are working with the morphology of real creatures. If you got one, you probably did something wrong!

(Considering that molecular data can be just as incapable of correctly resolving relationships under certain circumstances, I dearly hope that the problem groups are at least going to be different for the two kinds of data… :D)

***

*This is a distinctly punk eek-flavoured model, BTW; if morphological change is evenly spread out through time, the whole thing falls apart. But, then, if change is evenly spread through time, you wouldn’t have scenarios with unchanged ancestors like the one above, and I gather that the existence of those is an established palaeontological reality.

**However, this doesn’t mean that trees obtained from patchy fossil records will be more accurate – having a poorer sample also means potentially overlooking misleading changes like reversals to an ancestral state.

***

Reference:

Bapst DW (2013) When can clades be potentially resolved with morphology? PLoS ONE 8:e62312


The bare bones of fins and limbs

Perhaps the central question in developmental biology is how cells that start out as identical end up making bodies with complex shapes and a multitude of different tissues. And perhaps the central question in evo-devo is how such bodies can change into other bodies during the course of evolution. A really cool paper by Zhu et al. (2010) probes a little bit at both, and shows how relatively simple rules can produce results that are surprisingly similar to what we observe in nature.

The authors modelled the development of limb (or fin) bones in vertebrates. They used a simple model made up of the following:

  1. a virtual limb bud (let me call these “simbuds” hereafter) growing continuously
  2. a signal spreading from the tip of the bud that tells “cells” to keep growing but wanes over time (mimicking the role of the apical ectodermal ridge in real limb buds)
  3. two equations describing the activity of (1) genes that make cells differentiate into bone (“activators”), and (2) genes that prevent cells from doing so (“inhibitors”)

The shape of the simbud could be set at the start, and so could the values of all the parameters in the activator and inhibitor equations.

This is much simpler than real limb development. It says nothing about cell movement, and it condenses the effect of genes other than the bone activators and inhibitors into two little parameters in the equations. Yet running it with pretty much any initial settings produces something vaguely limb-like, and some sets of parameters give you simbuds that look eerily like real limbs.

Development of a simbud mimicking a chicken wing, next to drawings of the real thing.

Or fins. Or mutant limbs. Or transitional fossils.

Fully developed simbuds resembling various fossil fins: Brachypterygius was a marine reptile from the Jurassic; the other four are more or less close relatives of tetrapods, among them the famous "fishapod" Tiktaalik.

The similarity is not perfect, of course – but the model is not perfect either. Overall, it’s still pretty amazing what a variety of very realistic limb skeletons you can get out of such a simple setup – and how much you can achieve just by varying small things like how wide the limb bud is to begin with or how strongly two gene networks interact. Evolving fins into limbs should be a piece of cake for a system like that!
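Why bud width should matter so much can be sketched with a bit of Python. To be clear, this is not the model from the paper: it is a textbook linear stability analysis of a generic two-component activator-inhibitor system, with made-up interaction strengths and diffusion rates chosen only so that patterns can form. For a bud of a given width, it asks which spatial mode (a pattern shaped like cos(n·π·x/width), i.e. n half-waves across the bud) grows fastest.

```python
import cmath
import math

def growth_rate(q, a=1.0, b=-1.0, c=2.0, d=-1.5, diff_act=1.0, diff_inh=20.0):
    """Largest real part of the eigenvalues of J - q*D, where q is the squared wavenumber.

    J = [[a, b], [c, d]] holds the (hypothetical) activator/inhibitor interactions,
    D = diag(diff_act, diff_inh) the diffusion rates; the inhibitor spreads faster.
    """
    trace = (a - diff_act * q) + (d - diff_inh * q)
    det = (a - diff_act * q) * (d - diff_inh * q) - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + root) / 2.0).real

def dominant_mode(width, max_mode=50):
    """Number of half-waves n that grows fastest on a domain with no-flux edges."""
    return max(range(max_mode + 1),
               key=lambda n: growth_rate((n * math.pi / width) ** 2))

print([dominant_mode(width) for width in (7, 14, 28)])  # prints [1, 2, 4]
```

With the interaction strengths held fixed, the preferred spacing of the pattern stays the same, so a wider bud simply fits more repeats of it. That is the flavour of the result: tweak the geometry and the same little network lays down a different number of elements.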

***

Reference

Zhu J, Zhang Y-T, Alber MS, Newman SA (2010) Bare Bones Pattern Formation: A Core Regulatory Network in Varying Geometries Reproduces Major Features of Vertebrate Limb Development and Evolution. PLoS ONE 5:e10892