I’ve been writing this blog almost since I started my PhD, but the closest I actually got to writing about my own work was a long fangirl squee about fan worms. Most of my project involved describing some really basic things about a relatively unknown animal, which are probably not terribly interesting unless you’re an expert in my field (also, my brain is convinced that nearly everything I do is shit, so I don’t particularly like talking about it…). However, I do have this cool little story I’ve been burning to tell the world, and couldn’t because we wanted it published… Now it is (Szabó and Ferrier, 2015; there goes my super-secret identity, I suppose ;) )
My story involves a family of proteins called msp130. I wish they had a more fun name than that, but they were named by sea urchin people, and unlike the fruit fly community, they don’t really seem to care about making their gene names fun. (Msp130 stands for “mesenchyme-specific protein, 130 kDa”, in case you wondered; kDa, kilodaltons, being units of molecular mass.)
It all started with a sea urchin
The original msp130 was discovered in sea urchin larvae. It is found in – or rather, on the surface of – primary mesenchyme cells (PMCs), a specialised population of cells that build the calcareous skeleton of the larva. Here’s a photo of a sea urchin embryo with PMCs stained blue, from Illies et al. (2002). At this stage, the embryo is basically a squashed ball with a hole through most of it; the hole is going to become the gut, and its opening is the future anus.
Here’s a polarised light photograph of an older larva of a sea biscuit. The skeleton is pretty much the only thing you can see, highlighted in stunning rainbow colours due to the birefringence of the mineral (Bruno Vellutini, flickr):
Msp130 turned out to be essential for skeleton formation – when researchers blocked the protein on the cell surface with antibodies, PMCs cultured in a dish couldn’t take up calcium and couldn’t make spicules (Carson et al., 1985; Anstrom et al., 1987). Not quite so long ago, Illies et al. (2002) found that the purple sea urchin Strongylocentrotus purpuratus has at least three msp130 genes, and in the embryo/larva, the other two are also exclusively expressed in PMCs. This is what the first picture above shows: the blue stain appears in cells that express one of the msp130-related genes.
Anyway. A few years later, after the sequencing of the S. purpuratus genome, it turned out that there were at least seven such genes, residing in a couple of clusters in the genome (Livingston et al., 2006). However, until very recently, the msp130 family was only studied in echinoderms.
Horizons are expanded and weirdness is found
BUT, this being the genomic era, sea urchin guru Charles Ettensohn wanted to know more about these buggers – just how common are they? Where do they come from? Are they always lurking in genomes that have to produce calcified skeletons? What he found in sifting through the vast repository of sequence data that is GenBank was very interesting and somewhat puzzling: across the entire tree of life, msp130 genes only seemed to be present in echinoderms, acorn worms, lancelets, molluscs, a handful of algae… plus loads of bacteria and archaea (Ettensohn, 2014). There was no mistaking it: to someone accustomed to comparing protein sequences, the bacterial sequences very clearly were the same thing as the ones from animals and algae.
So, Ettensohn concluded, it looks like animals (and algae) probably didn’t inherit this thing directly from their common ancestor with other life forms. That would imply a lot of independent losses, and Occam’s razor dictates that we shouldn’t postulate so many hypothetical events without good reason (although, as Maeso et al. (2012) point out, animal genomes don’t seem to be quite as keen on Occam’s razor as scientists).
Instead, supposing that animals and algae repeatedly acquired these genes by horizontal gene transfer from bacteria (or each other?) seems like a simpler explanation. At least one loss probably did occur – among deuterostomes, vertebrates and sea squirts are the odd ones out in not having msp130 genes, and the most Occamific explanation of that pattern is that we just mislaid them somewhere along the line. Here’s a graphical representation of Ettensohn’s scenario from his paper – “HGT” stands for horizontal gene transfer events, and grey circles are meant to represent the extra msp130 genes that later evolved in each lineage by gene duplication:
However, Ettensohn also pointed out that whole genome-level information about most animal groups is still pretty thin on the ground (seriously, everyone, stop sequencing more stupid vertebrates. We’re all the same.) We don’t, for example, have published genomes from calcareous sponges, or from annelid worms who build calcareous tubes or have other calcareous hard parts. Like my wormies. And here’s where I come in – I happen to have a decent amount of transcriptome data (alas, no genome) from just the right kind of annelid. Better, my data are derived specifically from an organ with calcareous parts (the operculum – see my fanworm post).
Naturally, as soon as I read Ettensohn’s paper, the first thing I did was grab the sequence of the “original” msp130 protein and search my own data for a match. Ettensohn said that msp130 sequences were very easy to recognise… And yep, they are. With not much effort at all, I found a lovely, full-length msp130-like sequence in my big pile of data. Much as I hate doing molecular biology, I also managed to confirm the presence of the messenger RNA (or at least the presence of one end of it) in an actual test tube of actual RNA taken from the operculum. But that’s not really saying much re: the whole gene thievery issue – yeah, another animal fairly closely related to molluscs has an msp130 gene, and it’s active somewhere within a millimetre of a calcareous hard part. That, unfortunately, says precisely bugger all about their evolutionary origin.
But I had an idea, peeps. Introns!!!
Genes in pieces make answers come together
There is an important difference between the genes of prokaryotes like bacteria and eukaryotes like algae or animals. In the former, most genes are uninterrupted stretches of DNA. A bacterial gene is transcribed into messenger RNA, and everything in that mRNA that stands between the “start protein” and “end protein” signals is translated into a protein using the appropriate genetic code.
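To make the “start protein” to “end protein” idea concrete, here is a minimal Python sketch. It assumes the standard genetic code and treats the first ATG as the start signal, which is a simplification (real bacteria also rely on other signals to pick a start site). The function name and the toy sequence are made up for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(mrna_as_dna: str) -> str:
    """Translate from the first ATG up to the first in-frame stop codon.

    Assumes the sequence contains only A, C, G and T (written as DNA).
    """
    seq = mrna_as_dna.upper()
    start = seq.find("ATG")  # simplification: first ATG is the start signal
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: the "end protein" signal
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence, not a real gene.
toy_gene = "GGATGAAAGGTTGGTAACC"
toy_protein = translate_first_orf(toy_gene)
print(toy_protein)
```

The point of the toy is that nothing gets cut out between the two signals: every codon in between ends up in the protein.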
Most of the genes of eukaryotes, however, consist of chunks that encode parts of the protein product (exons) interrupted by chunks that get discarded during or after transcription (introns)*. So there’s a potentially easy way of telling whether a gene in two different animals came from their common ancestor or from some overly generous microbes. If they have introns in matching locations, that’s not a similarity they could have acquired just by getting the gene from the same bacterium!
I say potentially easy for at least two reasons. One, while some gene families keep their introns in the same places for a very long time, introns can come and go in evolution. They can even disappear completely under some circumstances, although something the size of msp130 does usually have at least a few. If msp130 genes have fast-evolving structures, we may not be able to tell whether molluscs and deuterostomes acquired them independently, or whether the positions of introns just changed too much since their common ancestor.
Two, introns can theoretically evolve twice in the same place – just as some parts of a genome can be hotspots for mutations, parts of a gene can be hotspots for new introns. Of course, the more similar the overall structure of two genes, the less likely “intron hotspots” become as an explanation.
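For the computationally inclined, the basic logic of “do these genes have introns in matching locations?” can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the sequences, species labels and intron sites below are invented, an intron site is recorded simply as the number of amino acids encoded before the intron (ignoring where exactly within a codon it falls), and the alignment is taken as given.

```python
def intron_columns(aligned_protein: str, intron_sites: list[int]) -> set[int]:
    """Map intron sites to alignment columns.

    An intron site n means "the intron falls after the n-th residue".
    Gaps in the alignment are written as '-'.
    """
    wanted = set(intron_sites)
    residues_seen = 0
    columns = set()
    for column, char in enumerate(aligned_protein):
        if char != "-":
            residues_seen += 1
            if residues_seen in wanted:
                columns.add(column)
    return columns


def shared_intron_columns(alignment: dict[str, str],
                          introns: dict[str, list[int]]) -> set[int]:
    """Return alignment columns where every gene has an intron."""
    per_gene = [intron_columns(alignment[name], introns[name]) for name in alignment]
    return set.intersection(*per_gene)


# Invented toy data: two aligned protein fragments and their intron sites.
toy_alignment = {
    "urchin": "MKT-LLAGW",
    "limpet": "MKTALL-GW",
}
toy_introns = {
    "urchin": [3, 6],
    "limpet": [3, 6],
}
toy_shared = shared_intron_columns(toy_alignment, toy_introns)
print(toy_shared)
```

In this toy, both genes have an intron after residue 3 and after residue 6, but only the first one lands in the same alignment column. The second is a near miss caused by the gaps, which is exactly the sort of thing that depends on how you choose to align the sequences.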
I compared the exon-intron structures of all msp130 genes in a few representative species with sequenced genomes in which Ettensohn found such genes. Besides sea urchins (which are from one of the two main deuterostome lineages), I chose lancelets (from the other great branch of deuterostomes) and limpets (which are molluscs). Together, these three creatures represent all major animal lineages in which msp130 genes have been found. Alas, I couldn’t do it with my own animals, because I don’t have a genome to play with :( . I also checked all three algae – the two green algae on Ettensohn’s list are fairly closely related, but the third one is a brown alga separated by upwards of a billion years.
As I said, all of these species have fully sequenced genomes, but you really need two sources of data to do this kind of thing properly. A genome sequence includes the complete gene with all the introns – but without the corresponding mRNA sequences, we must use clever computer programs that search for characteristic DNA motifs and/or sequence similarity to other organisms to predict where introns begin and end. Aside from clever programs occasionally being remarkably stupid or getting confused by sequencing errors, you can hopefully see how relying on similarity doesn’t exactly provide unbiased evidence for my purposes.
Sequences derived from transcripts only contain exons, however, and not because a computer predicted them, but because they’re read from the fully edited mRNA. So aligning transcripts with genomes should tell you exactly where the introns are, although transcript data were incomplete or altogether missing for some of the genes I looked at. (I didn’t have that problem with sea urchins – Tu et al. (2012) helpfully sequenced transcripts of pretty much all urchin genes and uploaded the results to the genome browser.)
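Here is a toy Python sketch of what “aligning a transcript with a genome” boils down to. It is not the method used in the paper: it assumes perfect, error-free matches and greedily extends each exon as far as it will go, whereas real spliced alignment has to cope with mismatches and ambiguous boundaries. The sequences and function names are invented for illustration.

```python
def find_exon_blocks(genome: str, transcript: str, min_exon: int = 4):
    """Greedily place a transcript on a genome as a series of exact-match exons.

    Returns a list of half-open (start, end) genome coordinates, one per exon.
    Assumes min_exon >= 1 and that the two sequences agree perfectly.
    """
    blocks = []
    g_pos = 0  # where to resume searching in the genome
    t_pos = 0  # how much of the transcript has been placed so far
    while t_pos < len(transcript):
        seed = transcript[t_pos:t_pos + min_exon]
        g_start = genome.find(seed, g_pos)
        if g_start == -1:
            raise ValueError("transcript cannot be placed on this genome")
        # Extend the match for as long as genome and transcript agree.
        length = 0
        while (t_pos + length < len(transcript)
               and g_start + length < len(genome)
               and genome[g_start + length] == transcript[t_pos + length]):
            length += 1
        blocks.append((g_start, g_start + length))
        g_pos = g_start + length
        t_pos += length
    return blocks


def introns_between(blocks):
    """Introns are simply the genomic gaps between consecutive exon blocks."""
    return [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]


# Invented toy gene: exon, intron (starting GT and ending AG), exon.
toy_genome = "ATGGCCAAA" + "GTAAGTTTTCAG" + "CTGGATTGA"
toy_transcript = "ATGGCCAAA" + "CTGGATTGA"

toy_blocks = find_exon_blocks(toy_genome, toy_transcript)
toy_introns_found = introns_between(toy_blocks)
print(toy_blocks, toy_introns_found)
```

The nice thing about this kind of evidence is that no prediction is involved: whatever is in the genome but missing from the transcript must have been spliced out. Note that the greedy extension can overshoot when an intron happens to start with the same bases as the next exon, which is one reason real intron boundaries sometimes need a second look.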
Nonetheless, the data that did exist told us enough to doubt Ettensohn’s idea. Importantly, I found enough to piece together the entire protein-coding portion of the mRNA for two of the limpet msp130 genes – in other words, the animals that Ettensohn thought likely to have acquired the family independently from sea urchins. In total, the animal species I investigated share not just one or two but seven intron locations (an msp130 gene has maybe a dozen introns altogether). One of those is also present in the algae, and the sequence next to it is almost identical across all of the genes. There’s really no mistaking that one! A few more introns are in generally similar locations, though they don’t line up perfectly in my best alignment**.
What can we conclude from this? I think we can probably say with reasonable certainty that deuterostomes and molluscs didn’t get msp130 genes from bacteria separately. Given the similarity with algae, they might not have got them from bacteria at all, although one similarly positioned intron is a lot easier to explain away as convergent evolution.
As I see it, either the last common ancestor of molluscs+annelids and deuterostomes had msp130 genes and only a few of its descendants kept them, or one of the two lineages snatched them from the other after those seven introns had originated. (Animals stealing genes from other animals is relatively uncommon, as far as I know.)
…some answers, anyway…
If you put the evidence for a single origin together with the incredibly gappy distribution of this gene family, the other side of the equation is a ridiculous number of losses. Why? And what’s the deal with msp130 and calcification? Is there a deal at all? Ettensohn speculated that acquiring msp130 might have had something to do with acquiring calcareous skeletons – did it?
IMO we really don’t have enough examples to properly assess this association, and my impression is that we actually know very little about the roles of these genes. Oh, we know that some of them are pretty specific to calcification in certain echinoderms, and they seem to be around in multiple organs in molluscs given that the hundreds of RNA sequences I found had been extracted from anything from gonads to mouthparts. And, of course, at least one of them is doing something in a partly calcified body part in my annelid, though we haven’t yet checked exactly where or what.
But calcification is pretty much the only context in which msp130s have been investigated; since everyone thought they were just echinoderm “calcification genes”, no one thought to look elsewhere. What do they do in, say, lancelets, which have six of the genes but not much of a calcified skeleton that we know of? Lancelets may well have something calcareous that isn’t a skeleton – other animals with no obvious calcareous skeletons, such as arachnids or earthworms, produce little calcareous granules that might work to store calcium or get rid of a surplus. Most of the limpet transcripts I found come from testicles or ovaries, which don’t tend to calcify, but gonads are a bit special and turn on lots of random genomic shit that may or may not actually have a function. AFAIK, none of the three algae from which msp130 genes are known has a calcareous skeleton, but many other algae do.
In summary, I did some detective work and discovered something and I feel rather clever about all of that, but in the process I learned just how much more we don’t know about this obscure but intriguing little gene family.
… actually, that sounds like a fairly typical summer in science. :)
*Don’t ask me how that happened (it’s not even remotely my area), but now that the system exists, it does enable eukaryotes to make loads of different proteins from a single gene just by picking and choosing which exons to keep. See fruit fly Dscam, or the “brutally murdering the one gene, one protein hypothesis, forty thousand splice variants at a time” gene. Introns can also contain a variety of regulatory sequences that determine either the behaviour of their own gene or even that of a different gene, so introns are far from useless. They’re just a bit… counterintuitive.
**Aligning similar sequences is part science, part art. Often, there’s no single clear best way to align two or more genes or proteins; the various programs people have written for this job will all come up with slightly different answers, and an experienced pair of eyes will probably want to tweak all of them. Whether introns are really in the same place in two genes can therefore be a bit ambiguous, depending on the degree of sequence similarity.
Anstrom JA et al. (1987) Localization and expression of msp130, a primary mesenchyme lineage-specific cell surface protein in the sea urchin embryo. Development 101:255-265
Carson DD et al. (1985) A monoclonal antibody inhibits calcium accumulation and skeleton formation in cultured embryonic cells of the sea urchin. Cell 41:639-648
Ettensohn CA (2014) Horizontal transfer of the msp130 gene supported the evolution of metazoan biomineralization. Evolution & Development 16:139-148
Illies MR et al. (2002) Identification and developmental expression of new biomineralization proteins in the sea urchin Strongylocentrotus purpuratus. Development Genes and Evolution 212:419-431
Livingston BT et al. (2006) A genome-wide analysis of biomineralization-related proteins in the sea urchin Strongylocentrotus purpuratus. Developmental Biology 300:335-348
Maeso I et al. (2012) Widespread recurrent evolution of genomic features. Genome Biology and Evolution 4:486-500
Szabó R & Ferrier DEK (2015) Another biomineralising protostome with an msp130 gene and conservation of msp130 gene structure across Bilateria. Evolution & Development 17:195-197
Tu Q et al. (2012) Gene structure in the sea urchin Strongylocentrotus purpuratus based on transcriptome analysis. Genome Research 22:2079-2087