While I love phylogenetics, I rarely venture into the land of morphology-based phylogenetic trees.
Molecular sequences make sense to me as data. In a protein sequence, a proline is a proline, and if two proteins can acquire a proline in the same place by convergent evolution, well, you can look at large-scale patterns of amino acid substitution and estimate the chance of that. Genomes contain exactly 4 kinds of bases, they encode exactly 20 kinds of amino acids, and that’s that, at least as far as conventional molecular phylogenies are concerned. Sequences are exactly the sort of neat, discrete data that you can describe and explore and simulate the heck out of to make sure that the assumptions you are making when you use them to infer relationships between genes or organisms are realistic.
Morphology, my brain says, is fuzzy and difficult and full of human subjectivity. In the anatomy of two animals, a limb and a limb can be totally different things with totally different evolutionary origins, and there’s no guarantee that you can tell them apart. Something can be “sort of” a limb, and there’s no well-defined number of kinds of “limbness”.
Truth be told, morphology as a way of figuring out relationships kind of scares me.
However, I do love phylogenies. I’m also interested in the relationships of extinct creatures (where, unless they are very recently extinct, you simply don’t have molecular data to play with). Plus limitations intrigue me, not to mention that the limitations of the methods we use to arrive at conclusions have a huge practical importance. (As in: they can lead to bullshit conclusions.) Hence I thought a paper titled “When can clades be potentially resolved with morphology?” would be an interesting read.
And it absolutely was, only in a totally different way than I expected. I thought it would be all about the limitations I was thinking of – convergent evolution, defining and interpreting traits, the statistical biases of treebuilding methods, that sort of stuff. Instead, it ignored those issues completely in favour of a much more fundamental limitation. Bapst (2013) doesn’t talk about information that you or your fancy algorithms misinterpret. He talks about information that, due to the very nature of evolution, just isn’t there.
A modern classification of organisms is built out of clades: groups including all descendants of a single common ancestor. Phylogenetic trees are clades within clades within clades – or branches splitting into smaller branches splitting into twigs. A fully resolved tree consists only of two-pronged branching points. That is, if you pick any three creatures, you can tell which two of them are closer to each other than either is to the third. (In practice, whether a branching point counts as resolved is judged by statistical support from methods such as bootstrapping. Bootstrapping basically asks whether random resamplings of your data keep agreeing on the same tree.)
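To make "fully resolved" concrete, here is a minimal sketch in Python. The nested-tuple representation and the function name `is_fully_resolved` are my own assumptions for illustration, not anything from the paper: a leaf is a taxon name, an internal node is a tuple of its children, and a tree is fully resolved when every internal node has exactly two children.

```python
def is_fully_resolved(tree):
    """Return True if every internal node of a rooted tree has exactly two children.

    Assumed representation (hypothetical, for illustration only):
    a leaf is a string, an internal node is a tuple of subtrees.
    """
    if isinstance(tree, str):   # a leaf has nothing left to resolve
        return True
    if len(tree) != 2:          # anything other than a two-way split is unresolved
        return False
    return all(is_fully_resolved(child) for child in tree)


# A and B are closer to each other than either is to C: fully resolved.
resolved = (("A", "B"), "C")

# All three taxa hang off the same node: an unresolved "bush" (polytomy).
bush = ("A", "B", "C")

print(is_fully_resolved(resolved))
print(is_fully_resolved(bush))
```

The bush is what you are left with when the data cannot say which two of the three taxa belong together.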
Clades are recognised by what their members share with one another but no one else: for example, a subgroup of dinosaurs that includes birds has feathers, which they inherited from their common ancestor. Each clade can have many such shared derived traits or synapomorphies. However, sometimes there are no synapomorphies. Take, for instance, the case of a single ancestral species “budding off” a series of descendants without changing much itself, like so:
(You could say that three-spine sticklebacks are doing exactly this – the ancestral form that lives in the sea is largely similar all over the northern hemisphere, but it keeps getting stuck in rivers and lakes and sprouting a huge variety of descendants.)
In such a scenario, Descendant 1 is kind of closer to Ancestor than Descendant 2 is, since there’s been less time since they split. However, because Ancestor didn’t change in all that time, there are no synapomorphies that unite it with D1 to the exclusion of D2. A morphology-based phylogenetic tree of these three species would be intrinsically unresolvable – no matter how much data you collect and how well you analyse them, you’re not going to get the true tree, only a sad little bush. (A molecular phylogeny may be able to resolve a history like this, since genomes aren’t going to stop evolving just because the creatures that have them look the same.)
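The missing-synapomorphy problem can be sketched with a toy character matrix. Everything here is invented for illustration: the four characters, their states, and the helper `synapomorphies` are assumptions of mine, with 0 standing for the ancestral state and 1 for a derived one. Ancestor never changes, and each descendant only picks up changes of its own.

```python
from itertools import combinations

# Toy character matrix (made-up data): 0 = ancestral state, 1 = derived state.
# Ancestor stays unchanged; each descendant acquires only its own novelties.
matrix = {
    "Ancestor":     [0, 0, 0, 0],
    "Descendant 1": [1, 1, 0, 0],
    "Descendant 2": [0, 0, 1, 1],
}


def synapomorphies(matrix, group):
    """List characters that are derived in every member of `group`
    and in no taxon outside it."""
    n_chars = len(next(iter(matrix.values())))
    shared = []
    for i in range(n_chars):
        derived_in_all_members = all(matrix[t][i] == 1 for t in group)
        derived_outside = any(matrix[t][i] == 1 for t in matrix if t not in group)
        if derived_in_all_members and not derived_outside:
            shared.append(i)
    return shared


# No pair of taxa is united by a shared derived trait...
for pair in combinations(matrix, 2):
    print(pair, synapomorphies(matrix, pair))

# ...even though each descendant has derived traits of its own.
print(synapomorphies(matrix, ["Descendant 1"]))
```

In this toy matrix every two-taxon grouping comes back empty, so nothing in the morphology favours (Ancestor, Descendant 1) over the alternatives, even though that is the grouping the true history implies.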
This is the sort of limitation Bapst explores through his simulations. The simulations don’t actually model the evolution of morphology itself. They compress all morphological change into “differentiation events”, i.e. the point at which two taxa become distinguishable. (He later makes the important point that “taxa” could be anything on the traditional Linnaean scale – species, families, classes, whatever – and his conclusions would remain the same.)*
Differentiation events then might happen in a variety of ways, illustrated by Bapst’s Figure 2 below:
In other words, there can be branching without differentiation, differentiation without branching, and anywhere in between.
The simulations investigate how many intrinsically unresolvable clades we should expect under various mixtures of the four scenarios above, combined with more or less complete sampling of the fossil record. Some of the observations I found fascinating:
- More complete sampling actually decreases resolvability, since your dataset is then more likely to include both ancestors and their descendants.**
- Unresolvable clades are spread evenly throughout the whole model phylogeny – they aren’t disproportionately older or younger than their well-behaved counterparts. This is very important to me because it means that intrinsic unresolvability could also affect the levels I’m most interested in, i.e. the phylum-level relationships of animals.
- No realistic simulation – i.e. those whose parameters and results are compatible with the real fossil record – produces fully resolvable phylogenies!
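The first observation, about sampling, can be illustrated with a deliberately crude Monte Carlo sketch. To be clear, this is not Bapst's simulation: it is a toy model I am assuming purely for illustration, in which every new taxon buds off a randomly chosen earlier taxon that persists unchanged, and every taxon is found in the fossil record with the same probability. The function names and parameters are hypothetical. It simply counts how often a sampled taxon's direct ancestor is also in the dataset.

```python
import random


def simulate_budding(n_taxa, rng):
    """Return a list where parent[i] is the direct ancestor of taxon i.

    Toy assumption: each new taxon buds off a randomly chosen earlier taxon,
    which carries on unchanged. Taxon 0 is the root and has no parent.
    """
    parent = [None]
    for i in range(1, n_taxa):
        parent.append(rng.randrange(i))
    return parent


def ancestor_pair_fraction(parent, p_sample, rng):
    """Fraction of sampled taxa whose direct ancestor was also sampled."""
    sampled = {i for i in range(len(parent)) if rng.random() < p_sample}
    if not sampled:
        return 0.0
    with_ancestor = sum(1 for i in sampled if parent[i] in sampled)
    return with_ancestor / len(sampled)


rng = random.Random(42)  # fixed seed so the run is repeatable
parent = simulate_budding(2000, rng)
for p in (0.1, 0.5, 0.9):
    print(p, round(ancestor_pair_fraction(parent, p, rng), 2))
```

In this toy setup, the better the sampling, the larger the share of taxa that sit in the dataset alongside their own unchanged ancestor, which is exactly the situation that produces unresolvable clades. It says nothing about the other effects the real simulations track.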
It’s worth noting that it’s actually close to impossible to tell whether the lack of resolution in any given real dataset is due to this intrinsic effect or some other issue. However, the take-home message of this study is that no matter how well you’ve eliminated other sources of ambiguity, you should pretty much never expect a fully resolved phylogeny if you are working with the morphology of real creatures. If you got one, you probably did something wrong!
(Considering that molecular data can be just as incapable of correctly resolving relationships under certain circumstances, I dearly hope that the problem groups are at least going to be different for the two kinds of data… :D)
*This is a distinctly punk eek-flavoured model, BTW; if morphological change is evenly spread out through time, the whole thing falls apart. But, then, if change is evenly spread through time, you wouldn’t have scenarios with unchanged ancestors like the one above, and I gather that the existence of those is an established palaeontological reality.
**However, this doesn’t mean that trees obtained from patchy fossil records will be more accurate – having a poorer sample also means potentially overlooking misleading changes like reversals to an ancestral state.
Bapst DW (2013) When can clades be potentially resolved with morphology? PLoS ONE 8:e62312