Slime moulds don’t play by the rules

I’m starting to think dictyostelids are seriously interesting. These are the guys whose eerily animal-like epithelial tissues prompted the idea of multicellularity being ancestral to the lineage containing animals, choanoflagellates, fungi and amoebae. (Incidentally, Parfrey and Lahr [2013] wrote a nice critical response to that hypothesis – it deserves a post of its own, but not this post.) They are used as model organisms in (evolutionary) developmental biology (Schaap, 2011), a field which is mostly dominated by animals and plants for obvious reasons.

Recently I wrote about the developmental hourglass pattern, which means that the most conserved developmental stages are not the earliest (as Karl von Baer thought at the dawn of comparative embryology), but some way into development. This pattern has been found in several animal phyla both at the morphological level and in various features of developmental gene expression, and it was recently also discovered in plants, which prompted my first post about it.

A group of researchers reckoned they should check how universal the hourglass is, and they thought the slime mould/social amoeba and honoured developmental model organism Dictyostelium would be a good place to look (Tian et al., 2013). Unlike plants and animals, which develop from a single cell, the multicellular life stage of dictyostelids is a gathering of thousands of previously independent cells that may not be genetically identical. Therefore, these tiny creatures represent a very different approach to development from our favourite lab animals. Whether or not they still show an hourglass pattern could give clues about the deeper laws that govern all developmental processes.

Dictyostelids turn out to be complete deviants in this respect. Comparisons of the genes two species of Dictyostelium use in their multicellular development show neither von Baer’s “funnel” pattern of similarity nor an hourglass. If you include single-celled stages that aren’t, strictly speaking, “developmental”, similarities of gene expression give a “reverse hourglass” with lowest similarity in the middle. If you only consider the actual multicellular developmental stages, conservation increases towards the end – an “inverted funnel”. Other measures gave Tian et al. largely consistent results – genes expressed later in development were more likely to also be present in the other species, and their sequences were more similar on average.
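To keep the four shapes straight, here is a toy sketch of my own (not anything from Tian et al.) that labels a series of per-stage conservation scores. The function name, the example numbers and the crude rule of checking where the highest and lowest scores fall are all assumptions made up for illustration; the real analyses are far more careful than this.

```python
def classify_profile(conservation):
    """Label per-stage conservation scores (higher = more conserved).

    Toy rule, invented for illustration: check whether the maximum and
    minimum sit at the first stage, the last stage, or in between.
    """
    if len(conservation) < 3:
        raise ValueError("need at least three stages")
    if max(conservation) == min(conservation):
        return "flat"
    last = len(conservation) - 1
    peak = conservation.index(max(conservation))
    trough = conservation.index(min(conservation))
    if 0 < peak < last:
        return "hourglass"          # most conserved mid-development
    if 0 < trough < last:
        return "reverse hourglass"  # least conserved mid-development
    if peak == 0:
        return "funnel"             # von Baer: earliest stages most conserved
    return "inverted funnel"        # conservation rises towards the end


# Made-up scores, ordered from the earliest stage to the latest
print(classify_profile([0.8, 0.3, 0.7]))
print(classify_profile([0.2, 0.5, 0.9]))
```

With these invented numbers, the first series has its dip in the middle and the second climbs steadily, which is roughly what the two Dictyostelium results look like depending on whether you include the single-celled stages.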

Now that we have a pattern – what could explain it? The authors speculate that an idea that had been used to explain the hourglass in animals may apply just as well to the inverted funnel of slime moulds. This idea is that the evolvability of a developmental stage depends on the interactions that occur during it. The more interactions between genes/cells/tissues, the worse the effect of a tiny screw-up and the smaller the chance of a beneficial change, hence the most interconnected developmental stages will tend to be most conserved in evolution.

In animals, goes the reasoning, early development is relatively simple, and later development is relatively modular. Early on, there’s less to screw up, whereas later, every screw-up is limited to part of the embryo. In between is the sweet spot where everything talks to everything and a small modification can have large knock-on effects. The result is the hourglass. In slime moulds, however, that later stage when the developing organism is subdivided into semi-independent modules never comes. All tissues keep communicating and affecting each other right up to the point where the multicellular body is fully developed. Thus, if you like, only the first half of the hourglass happens in these creatures.

It’s an interesting idea. I like it.



Parfrey LW & Lahr DJG (2013) Multicellularity arose several times in the evolution of eukaryotes. BioEssays advance online publication, 11/01/2013, doi: 10.1002/bies.201200143

Schaap P (2011) Evolutionary crossroads in developmental biology: Dictyostelium discoideum. Development 138:387-396

Tian X et al. (2013) Dictyostelium development shows a novel pattern of evolutionary conservation. Molecular Biology and Evolution advance online publication 16/01/2013, doi: 10.1093/molbev/mst007

Interpreting ‘omics: a good example this time

Remember how I complained that people often seem to forget the scientific method when it comes to transcriptomics? Well, I’m glad to say some scientists still remember those all-important steps between data and conclusion. When looking at the predicted functions of the genes active in these cute little baby worms* during the first three days of their lives, Kenny and Shimeld (2012) not only compared their data to a “background” dataset from a well-studied animal, but also

  • did statistical tests to confirm that the differences they saw were real,
  • discussed several possible causes for them.

… including those that weren’t biologically interesting at all, like limitations of their methods. In the end, they couldn’t really draw strong conclusions from this particular part of their analysis, but the best thing is they sound perfectly aware of the difficulty and careful not to go too far in interpretation.

Folks, this is how you should write a transcriptomics paper. Not look at a few out-of-context numbers and concoct a story around them.

*Okay, am I the only one who finds trochophores adorable? :$



Kenny NJ & Shimeld SM (2012) Additive multiple k-mer transcriptome of the keelworm Pomatoceros lamarckii (Annelida: Serpulidae) reveals annelid trochophore transcription factor cassette. Development Genes & Evolution Online First™ article available online 8/10/2012, doi: 10.1007/s00427-012-0416-6

More genes from scratch

Following on from the yeast “proto-gene” study, I’ve started paying more attention to news about gene birth from non-coding DNA. (Or “junk” DNA, if you will, though “junk” is… something of a misnomer :-P) The yeast paper explored protein-coding genes in the process of birth. This new one I found in PLoS Genetics looks at genes that have already been born, and argues that the sequences they came from were functional long before they began to serve as templates for proteins.

Paternity testing for genes

So how do you know that a protein-coding gene came from non-coding DNA? Xie et al. (2012) looked specifically for genes born along the ape lineage, that is, the group that includes gibbons, orangs, chimps, gorillas and ourselves. They searched for human genes that had no protein-coding homologue in non-apes including rhesus macaques, mice, dogs and a handful of other mammals, but did match some of the other animals’ DNA sequence. It was also important that there were no other similar sequences in the human genome itself, which might have indicated that the “new” gene actually originated by duplication.

To ascertain that the selected genes represented gene birth in the ape lineage as opposed to a dying gene in other mammals, they also looked at the sequence changes that garbled the would-be protein product of the non-ape sequences. If these are the same in all non-ape versions of a gene, that probably means that they were inherited from the common ancestor of apes and all these species, that is, the non-coding version of the gene came first. Only genes that really seemed to have been born in our lineage were kept.
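The filtering described in the last two paragraphs boils down to a handful of yes/no checks per candidate gene. Here is a minimal sketch of that logic as I understand it; the class, the field names and the example candidates are all hypothetical, and the hard part (actually working out each answer from sequence alignments) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    coding_homologue_outside_apes: bool  # protein-coding match in macaque, mouse, dog...
    dna_match_outside_apes: bool         # matching DNA sequence in those species
    paralogue_in_human: bool             # similar sequence elsewhere in the human genome
    outgroups_share_disablers: bool      # non-ape copies carry the same protein-garbling changes


def looks_de_novo(c):
    """True if the candidate passes every filter described in the text."""
    return (
        not c.coding_homologue_outside_apes  # no protein-coding version in non-apes
        and c.dna_match_outside_apes         # but the DNA is recognisably there
        and not c.paralogue_in_human         # not explained by duplication
        and c.outgroups_share_disablers      # non-coding state looks ancestral
    )


# Invented examples, purely to exercise the filter
candidates = [
    Candidate("geneA", False, True, False, True),   # passes all checks
    Candidate("geneB", False, True, True, True),    # probably a duplicate
    Candidate("geneC", False, True, False, False),  # could be gene death elsewhere
]
kept = [c.name for c in candidates if looks_de_novo(c)]
print(kept)
```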

In the end, they came up with 24 genes that passed muster. Some of these coded for proteins in both chimps and humans, others only in humans. Based on RNA-sequencing data from rhesus macaques, 20 non-coding versions of these genes were active in monkeys – and this is where things get interesting.

Function in the junkyard

The big question the team asked was this: are these non-coding RNAs just random noise in transcription, or are they already functional even without a protein product? They decided to answer this question by looking at the structure and expression patterns of the RNAs in macaques, chimps and humans.

By “structure”, they mean how the RNA is edited after it’s transcribed from the gene sitting in the DNA. The RNA from most genes in animals and other eukaryotes isn’t taken straight through protein synthesis. First, pieces called introns are chopped out and the rest (called exons) are spliced together to yield the final template for the protein.* When the researchers looked at RNA sequencing results from the three primates, they found that the non-coding sequences in macaques were cut and joined at the same points that the protein-coding human sequences were: a bunch of RNA sequences from macaques spanned both sides of a human splice site, and contained none of the intron in between. Such conservation, the authors argue, is indicative of functionality.
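The splice site evidence can be pictured like this: a read supports a splice junction if it lines up with the genome right up to the start of the intron, skips the intron entirely, and picks up again exactly where the intron ends. Below is a rough sketch under my own assumptions. The coordinate convention (0-based, end-exclusive), the function name and the example reads are invented, and real pipelines get these aligned blocks from a spliced read aligner rather than by hand.

```python
def supports_junction(blocks, intron_start, intron_end):
    """Does a read span the junction that removes [intron_start, intron_end)?

    blocks: sorted list of (start, end) genomic intervals, 0-based and
    end-exclusive, that the read aligns to. A spliced read has two or
    more blocks separated by the skipped intron.
    """
    for (_, end_first), (start_second, _) in zip(blocks, blocks[1:]):
        if end_first == intron_start and start_second == intron_end:
            return True
    return False


# Hypothetical human intron at positions 100..200
spliced_read = [(80, 100), (200, 230)]    # jumps straight over the intron
unspliced_read = [(90, 130)]              # runs on into the intron
other_site_read = [(80, 100), (210, 240)]  # spliced, but at a different site

for read in (spliced_read, unspliced_read, other_site_read):
    print(supports_junction(read, 100, 200))
```

Only the first read counts as evidence that the macaque RNA is cut and joined at the same points as the human one.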

Expression patterns also suggested that the macaque sequences weren’t just noise: when they compared the abundance of the different RNAs between different tissues in the three primates, Xie et al. found that the non-coding RNAs weren’t just expressed all over the place – they were significantly more abundant in some parts of the body than others. This pattern was consistent across species: a sequence that was most abundant in the macaque’s brain was also likely to be brain-specific in humans, even though the macaque version didn’t code for a protein and the human gene did. (By the way, a lot of the 24 were most active in the brain, which is apparently something of a trend among new human genes regardless of their mode of origin. Guess our brains evolved rather a lot in the last few million years ;))

This is very cool, but I’m kind of worried about the arguments used for functionality. Maybe this is just me being a newbie in this area, but I’m not sure that useless non-coding RNAs should be expressed all over the place. One of the salient features of the 24 genes in this study is that they are near other genes, sometimes even overlapping them. In any case, they are close enough to use the switches that normally regulate the activity of the other gene(s). That would mean that they’d be most highly expressed wherever their neighbours are, which doesn’t depend on them having a function. If some of them happened to acquire protein-coding potential by mutation, presumably it’d only be kept by natural selection if the resulting proteins did something useful in those places, hence the conservation of expression patterns.

Likewise, splice sites may well arise by accident (they aren’t all that complex), and they don’t have to disappear just because a mutation somewhere else makes the sequence suitable for protein synthesis. Though in fact, splice site conservation sounds more convincing than expression conservation to me as far as arguments for function go. Because splice sites can come and go quite easily, there’s no reason they should be particularly conserved between any two sequences unless they’re important. And the splice sites can only be important if the sequence they’re in does something. Who cares where you cut a strand of random RNA that’ll only end up eaten by housekeeping enzymes anyway?

So, while I get all excited about the whole new genes side of things, and I love this sort of genomic detective work, I think I still have to sleep on that point about function coming before protein. It’s a pity they didn’t check if any of the “non-coding” RNAs in macaques (and chimps) were occasionally translated into a protein, albeit a smaller one than the human counterpart. The yeast people did that and it was awesome, and it would have been such an informative thing to do in this case.

(Also, it would have been darn cool if they’d tried knocking out some of them and seeing if they got screwed up monkeys, but let’s be realistic. Macaques don’t breed like mice, they take a hell of a long time to grow up, and we’re kinda reluctant to mistreat our close relatives like that. You’d have to be comic book supervillain insane to embark on that experiment.)


*This may sound like a weird way to run a genome, but it’s actually quite good for making more than one product from the same gene. It’s pretty important in real life – nearly all human genes with multiple exons are spliced in at least two different ways, and many genetic diseases originate from messed up splicing.



Xie C et al. (2012) Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genetics 8:e1002942

Genes from scratch

When we talk about evolutionary novelty, especially if the talking is to non-specialists, gene duplication is all the rage. From the sophistication of vertebrate blood clotting to the seemingly pointless complexity of a yeast proton pump (Finnigan et al., 2012), accidentally copied genes are undoubtedly an important source of new stuff in evolution. But copying and tweaking is not the only way new genes can arise. Sometimes, new genes really are new.

I admit, I wasn’t nearly excited enough about this possibility until this paper landed in my RSS reader a while back. Toll-Riera et al. (2012) find that the boring repetitive DNA that my gut feeling would’ve dismissed as true “junk” may actually be a great source of new proteins. First, it’s a good theoretical source. Long stretches of repetitive sequence are less likely than random sequence to suddenly and unceremoniously end in a stop codon* and translate to a short and useless amino acid sequence. Second, it appears that younger proteins do contain more repetitive sequence than old ones. What’s more, the repeats are often found within the regions that confer function on proteins. They aren’t just useless filler.
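The stop codon argument is easy to put numbers on. This is a back-of-the-envelope sketch of mine, not a calculation from Toll-Riera et al.: it assumes all 64 codons are equally likely in "random" sequence (real genomes are not that tidy), and the function names are made up. The point is that random sequence runs into a stop codon sooner or later, whereas a tandem repeat either hits one almost immediately in a given reading frame or never does.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def p_no_stop(n_codons):
    """Chance that n random codons (all 64 equally likely) contain no stop."""
    return (61 / 64) ** n_codons


def stop_free_frames(unit):
    """Reading frames (0, 1, 2) in which a tandem repeat of `unit` never hits a stop.

    The codons of a repeat cycle with a period of at most 3 * len(unit)
    nucleotides, so checking one full period per frame is enough.
    """
    if not unit:
        raise ValueError("repeat unit must not be empty")
    unit = unit.upper()
    k = len(unit)
    seq = unit * 6  # long enough to cover one period from any frame
    frames = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, frame + 3 * k, 3)]
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


print(p_no_stop(100))           # a bit under 1% for a 100-codon random stretch
print(stop_free_frames("CAG"))  # open in every frame
print(stop_free_frames("TAG"))  # closed in frame 0 only
```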

So, okay, a lot of proteins seem to come from pieces of “junk” DNA. How?

Maybe they arise from random gene expression noise and turn into proper genes gradually, say Carvunis et al. (2012). It has been known for a while that DNA that doesn’t belong to traditionally recognised genes quite often gets transcribed into RNA in cells. Sometimes, these random bits of RNA may even be translated into an amino acid chain. If some of these accidents are actually useful, the researchers reasoned, they could create a selection pressure to turn the DNA that produced them into a proper gene.

They took this idea and applied it in a study of open reading frames (ORFs) in the yeast genome. An “ORF” is jargon for a stretch of DNA that isn’t interrupted by stop codons. In theory, any ORF could make a “meaningful” piece of protein. Most ORFs that aren’t genes are short, often just a handful of codons; and most ORFs known to be genes are long, with hundreds of codons. The team argued that if random ORFs can give rise to genes, there should be plenty of transitional forms.
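If "a stretch of DNA that isn't interrupted by stop codons" sounds abstract, here is a minimal ORF finder that takes that definition literally. It is only a sketch with simplifying assumptions of my own: it scans the three forward reading frames only, ignores the reverse strand, and does not require a start codon, all of which a serious ORF finder would typically handle. The function name and the example sequence are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for every stop-free run of codons.

    start and end are 0-based, end-exclusive positions in dna.
    Forward strand only; no start codon is required (a simplification).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        # Walk through the complete codons in this frame
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, i))
                run_start = i + 3  # next run begins after the stop
        # Close the run that reaches the end of the sequence
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, end))
    return orfs


for frame, start, end in find_orfs("ATGAAATAGCCC"):
    print(frame, start, end, (end - start) // 3, "codons")
```

Even this twelve-letter toy sequence has a short stop-free run in each frame, which gives a feel for why a whole genome contains hundreds of thousands of little ORFs.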

To test this, they first classified all the hundreds of thousands of ORFs in the yeast genome according to their evolutionary age. The ones that were conserved in all of the yeast species they used for comparison were given a score of 10, and ORFs that only brewer’s yeast had were called zeroes. (Most known genes belong to classes 5-10, meaning they evolved quite far back on the yeast family tree.) The next step was to pick the Class Zero ORFs that were actually transcribed and translated, so might be in the pool of potential “proto-genes”. They found this set of “0+” ORFs by analysing RNA sequencing data in both happy yeast cells and yeast deprived of food, just to make sure they caught any sequences that only acted like genes under some circumstances. In addition, they also checked which of those RNAs were associated with ribosomes, the sites of translation. These filtering steps left over a thousand little ORFs that don’t belong to known genes, are completely unique to Saccharomyces cerevisiae, expressed, and probably translated.
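Here is roughly how I picture the scoring and filtering. Treat it as a cartoon: I am assuming the conservation level is set by the most distant comparison species in which a match turns up, and the field names and example ORFs are hypothetical. The actual criteria in Carvunis et al. are more involved.

```python
def conservation_level(has_match):
    """Score an ORF by how far out on the family tree it can be found.

    has_match: list of booleans, one per comparison species, ordered
    from the closest relative to the most distant. Returns 0 if no
    species has a match, otherwise the 1-based position of the most
    distant species that does (an assumed scoring rule).
    """
    level = 0
    for position, found in enumerate(has_match, start=1):
        if found:
            level = position
    return level


def is_zero_plus(orf):
    """Class Zero ORFs that also show signs of being expressed and translated."""
    return orf["level"] == 0 and orf["transcribed"] and orf["on_ribosomes"]


# Invented ORFs, purely for illustration
orfs = [
    {"name": "orf1", "level": conservation_level([False] * 10),
     "transcribed": True, "on_ribosomes": True},
    {"name": "orf2", "level": conservation_level([False] * 10),
     "transcribed": True, "on_ribosomes": False},
    {"name": "orf3", "level": conservation_level([True] * 10),
     "transcribed": True, "on_ribosomes": True},
]
zero_plus = [orf["name"] for orf in orfs if is_zero_plus(orf)]
print(zero_plus)
```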

Going up the conservation scale, ORFs become increasingly gene-like. The older ones are longer, their RNA copies are more abundant, and more of them appear constrained by natural selection. (Interestingly, when you translate them, the more gene-like ORFs produce less ordered protein structures. Not sure what to make of that.) Proper genes are also better suited to get ribosomes to translate them. Conservation classes 1-4, those ORFs that are shared only by closely related Saccharomyces species, are intermediate in all of these properties (and some more) between the zeroes and the older ORFs.

There is one more thing about this study that definitely bears mentioning. When you count how many new gene duplicates this yeast species has versus how many new, potentially functional, random ORFs, the latter come out on top by far. Between them, S. cerevisiae and its closest sister species apparently have somewhere between one and five newly duplicated genes. The same duo also came up with nineteen new ORFs that are under selection and therefore probably functional. Potentially, these random little sequences people might have dismissed as background noise not long ago are more potent sources of new genes than the celebrated gene duplication.

I don’t know about you, but that absolutely fascinates me.


P.S.: Incidentally, this is all about protein-coding genes. However, thousands of genes in your own genome do NOT encode proteins. They include genes for the good old RNA components of the translation machinery, ribosomal and transfer RNA, but there are also other RNA genes with transcripts involved in everything from keeping parasitic DNA in check to editing the messenger RNAs of other genes. I kind of want to find out how these RNAs form and acquire functions. Also, when we are quite happy to call a piece of DNA that doesn’t have a protein product a “gene”, and cells are swarming with RNA that doesn’t come from things traditionally called “genes”, and some of this RNA actually does encode proteins, what does that do to the definition of a “gene”??


*Gotta love the mnemonics on that page. I didn’t think three three-letter combinations would be that hard to remember, but I have to admit I chuckled at “U Are Gone”.



Carvunis A-R et al. (2012) Proto-genes and de novo gene birth. Nature advance online publication, doi: 10.1038/nature11184

Finnigan GC et al. (2012) Evolution of increased complexity in a molecular machine. Nature 481:360-364

Toll-Riera M et al. (2012) Role of low-complexity sequences in the formation of novel protein-coding sequences. Molecular Biology and Evolution 29:883-886