New genes, new tricks, part 2

In my previous post, I marvelled over the strange and unexpected way duplicated genes behave in fruit flies. The second study I wanted to discuss is also about new fruit fly genes gaining new functions, but unlike the other one, it’s about new genes that didn’t come from pre-existing genes.

Reinhardt et al. (2013) wasn’t the best written paper I’ve read, and I had some difficulty figuring out exactly what was going on in places, but there is some interesting stuff in there nonetheless.

The authors investigated six recently evolved new protein-coding genes in Drosophila. They wanted to know how they came about and managed to stick. For example, did they first originate as non-coding RNA genes? Did they gain a function through their RNA copies alone before they began to encode a protein? Or did they first awaken from the no man’s land between old genes with protein-coding potential already present?

This harkens back to one of the papers about new genes that I’d previously discussed. Xie et al. (2012) found that the genes for several human-specific proteins began life (and function?) as RNA genes expressed in particular tissues in ancestral primates. What about the six fly genes the new study investigated?

Reinhardt et al.’s illustration of the two routes to protein-coding geneness is below. Starting with an inactive stretch of DNA (black line), you need two things: (1) an “on” switch or promoter (green box), which causes the transcription of RNA (blue) from the region, and (2) a sequence that can be translated into a decent-length protein (an open reading frame or ORF, pink box). These two can theoretically appear in either order.
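If “a sequence that can be translated into a decent-length protein” sounds a bit abstract, here is a minimal sketch of what scanning for an ORF might look like. To be clear, this is my own toy version and not the pipeline from the paper: the function name `find_orfs`, the `min_codons` threshold and the decision to look only at the forward strand are all assumptions made for illustration.

```python
# Toy ORF scan (an illustration, NOT the method used by Reinhardt et al.).
# Assumptions: forward strand only, ATG as the only start codon,
# the three standard stop codons, and an arbitrary length threshold.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=30):
    """Return (start, end, n_codons) for each ATG...stop stretch.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    n_codons counts the ATG and everything up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # keep scanning for the next ORF in this frame
    return orfs


# A made-up sequence: two junk bases, ATG, four GCT codons, a stop, two junk bases.
example = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(example, min_codons=3))  # -> [(2, 20, 5)]
```

A real analysis would also check the reverse strand and splicing, and would want some evidence that the ORF is conserved or translated, which is exactly the sort of thing the rest of the paper worries about.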

Before we get into the meat of the paper, let’s borrow the Drosophila family tree from the 12 genomes project page:

D. melanogaster, third from the top, is the species that has been used for every variety of biological investigation for over a hundred years, and also the focus of this study. However, the other species were also used for comparison, to see exactly where and how the genes originated.

Five of the six genes had a relatively long history, with similar sequences being found in D. yakuba and erecta or even further out in D. ananassae. Three of them were not only there in those species, but could also potentially make a nice protein. In two genes, the sequence or part of it was recognisable all the way to ananassae, but it only had long sensible ORFs in melanogaster itself.

In terms of activity… well, first of all I think they screwed up Figure 2. Supposedly, the names of the species in which transcription of these genes was detected are bolded, but actually, all the names are bolded in all the trees, which doesn’t agree with what they say (or with the green dots signifying the origin of transcription in the same figure). Anyway, assuming the bolding was a mistake and the green dots are in the right place, it sounds like four of the six genes were already active in the common ancestor of melanogaster and yakuba or earlier, while another two were only turned on in the melanogaster/sechellia/simulans lineage.

The order of events varies from gene to gene: four genes had good solid ORFs right from the start, while two were transcribed before they were suitable protein templates. The authors note that we can’t actually be sure whether or not the first four developed an ORF before they became active. To be certain of that, we would need more distantly related species with a matching ORF that isn’t transcribed, but in all four cases the species lacking expression of the gene also totally lack any trace of the sequence. So, while the remaining two genes provide positive evidence for the transcription-first scenario, the jury is still out on the ORF-first option.
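The logic of that inference is simple enough to write down. Below is a rough sketch, assuming each species can be scored with three yes/no observations (is the sequence there, does it have a long ORF, is it transcribed). The function name `order_of_events`, the dictionary layout and the species labels are hypothetical, and the example states are invented, not taken from the paper.

```python
# Toy version of the "which came first" reasoning (my own sketch, invented data).
def order_of_events(species_states):
    """Guess whether transcription or the ORF arose first for one gene.

    species_states maps a species label to a dict with boolean keys
    'sequence', 'orf' and 'transcribed'.
    """
    # Species that transcribe the region but have no long ORF
    # point to transcription arising first.
    rna_without_orf = [
        name for name, s in species_states.items()
        if s["sequence"] and s["transcribed"] and not s["orf"]
    ]
    # Species with an intact but silent ORF would point to the ORF arising first.
    orf_without_rna = [
        name for name, s in species_states.items()
        if s["sequence"] and s["orf"] and not s["transcribed"]
    ]
    if rna_without_orf and orf_without_rna:
        return "conflicting"
    if rna_without_orf:
        return "transcription first"
    if orf_without_rna:
        return "ORF first"
    # Every species has either both features or no sequence at all,
    # so the order cannot be resolved.
    return "undetermined"


# Invented example: a relative transcribes the region but lacks the ORF.
gene_x = {
    "focal": {"sequence": True, "orf": True, "transcribed": True},
    "relative": {"sequence": True, "orf": False, "transcribed": True},
    "outgroup": {"sequence": False, "orf": False, "transcribed": False},
}
print(order_of_events(gene_x))
```

The four genes with old ORFs correspond to the last branch: wherever the gene isn’t expressed, the sequence is missing altogether, so the function has nothing to go on and neither do we.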

In D. melanogaster, the presence of the protein product was confirmed for the four genes with the oldest ORFs. The two youngest may still be translated: the protein data came only from embryos, and in fact all six genes contain short signals that are normally associated with the transport of proteins to specific parts of the cell. You might reason that a gene that never makes a protein doesn’t need such signals, but nevertheless, the authors couldn’t positively confirm the existence of these proteins without data from other life stages.

Where these genes are active brings us back to a common theme we encountered in the previous post. In adult D. melanogaster, all six are most strongly expressed in the testicles, and the products of one of them are exclusive to those organs. Likewise, male larvae show more expression of all six genes than females do. The other species show basically the same pattern.

What do these genes do? Actually, do they do anything? Being expressed, even being translated to protein, doesn’t necessarily equate to having a function. Luckily, “function” is not terribly difficult to test for in fruit flies. There are lots of clever tricks that allow you to manipulate their genes and look at the consequences. In this case, Reinhardt et al. bred flies where these genes were turned off. If I understood them correctly, they managed to do this for five genes, four of which resulted in very dead flies. Weirdly, for all four, the affected flies died at the same life stage, just before hatching from the pupa.

With a different strategy that produced only partial knock-down of the genes, they got themselves some grown-up survivors, which allowed them to test the effect of the genes on male fertility (a sensible question given where these genes are most active). Out of three knock-downs with surviving adults of both sexes, only one showed a serious effect, and that was the one that produced generally crappy, short-lived weakling males anyway. So while these genes are active in the testicles and might disproportionately affect males, they don’t seem to have much to do with fertility per se.

In general, the results sound like new genes that come from random bits of DNA can very quickly become essential to the organism, and it also sounds very much like an overabundance of transcripts in the testicles doesn’t mean that that’s where their function lies – it’s probably more that all kinds of things are expressed in testicles, and these genes are still expressed there because that’s how they started their lives.

Something big missing from the study is actually testing when these genes became functional – we’re told when they became expressed and when they started making a protein, but without manipulating them in relevant non-melanogaster species, it’s impossible to tell whether either of those means function. *disappointed pout*

And what’s up with those four genes that were necessary for the flies’ survival? The knock-downs all did their killing at the same stage. I don’t know what to think about that, and the authors don’t really offer an explanation beyond describing control experiments to make sure the deaths weren’t an unfortunate side-effect of the manipulation itself. Is there something about the development of adults that attracts new genes? Is the process of metamorphosis especially sensitive to even minor mess-ups? (More sensitive than early embryonic development?) Intuitively, I’d find the first possibility more likely, but gods know intuition is a poor guide to reality…

***

References:

Reinhardt JA et al. (2013) De novo ORFs in Drosophila are important to organismal fitness and evolved rapidly from previously non-coding sequences. PLoS Genetics 9:e1003860

Xie C et al. (2012) Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genetics 8:e1002942

New genes, new tricks

I’ve previously written about the birth of new genes. Since new genes are cool, and I just found two recent papers on them, you’re getting more of them.

Part 1: how to survive duplication

Technically, the first paper isn’t about new new genes: Assis and Bachtrog (2013) examined recently duplicated genes in fruit flies. But screw technicalities, what they’re saying makes my eyes pop.

When a gene is accidentally copied, a variety of possible fates can await it. Most of the time, the extra copy just dies. Some mechanisms of gene duplication just take the gene without the regulatory elements it needs to function properly. Even if the new copy works, it’s still redundant, so there’s nothing stopping mutations from destroying it over time. However, sometimes redundancy is removed before the new gene breaks irrevocably, and both copies are kept. This can, in theory, happen in a number of ways. Because I’m feeling lazy, let me just quote them from the paper (square brackets are mine, because I hate repeatedly typing out long ugly words :)):

Four processes can result in the evolutionary preservation of duplicate genes: conservation, neofunctionalization, subfunctionalization, and specialization. Under conservation, ancestral functions are maintained in both copies, likely because increased gene dosage is beneficial (1). Under neofunctionalization [NF], one copy retains its ancestral functions, and the other acquires a novel function (1). Under subfunctionalization [SF], mutations damage different functions of each copy, such that both copies are required to preserve all ancestral gene functions (9, 10). Finally, under specialization, subfunctionalization and neofunctionalization act in concert, producing two copies that are functionally distinct from each other and from the ancestral gene (11).

We might add a variation on NF, too: Proulx and Phillips (2006) theorised that differences in function that arise in different alleles (variants) of a single gene can turn duplication into an advantage, turning the conventional duplication-first, new function-next scenario on its head.

Either way, genomes contain lots of duplicated genes, there’s no question about that. What isn’t nearly as well understood is the relative importance of various mechanisms in producing all these duplicates. It’s much easier to theorise about mechanisms than to test the theories. Since evolution doesn’t stop once a new gene has earned its place in the genome, it can be hard to disentangle the mechanism(s) responsible for its preservation from the stuff that happened to it later. Also, to really assess the relative role of different mechanisms, you’ve got to look at whole genomes.

(Assis and Bachtrog say that this hasn’t been done before, and then go right on to cite He and Zhang [2005], which is a genome-wide study of SF and NF. I guess it doesn’t look at all the mechanisms…)

Assis and Bachtrog used the amazing resource that is the 12 Drosophila genomes project, focusing on D. melanogaster and D. pseudoobscura to find slightly under 300 pairs of genes that duplicated after the divergence of those two species. Since Drosophila genomes are very well-studied, they were able to identify the “parent” and “child” in each pair based on where they sit on their chromosomes. They then also extracted thousands of unduplicated genes from the melanogaster and pseudoobscura genomes, to use as a measure of background divergence between the two species.

To measure changes in gene function, they compared the expression of parent and child genes to each other and to the “ancestral” copy (i.e. the unduplicated gene in the other species) in different parts of the body (if a gene is suddenly turned on somewhere it wasn’t before, it’s probably doing something new!).
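To make the comparison concrete, here is a toy decision rule for sorting a gene pair into the categories quoted earlier. It is a sketch under my own assumptions, not a reimplementation of the paper: I’m assuming expression is summarised as one number per tissue, that profiles are compared by Euclidean distance after scaling them to sum to 1, and that “more different than expected” means exceeding a cutoff taken from the unduplicated genes. The names `classify_pair`, `distance` and the cutoff of 0.2 are made up for the example.

```python
import math


def relative_expression(profile):
    """Scale per-tissue expression values so they sum to 1."""
    total = sum(profile)
    if total == 0:
        return [0.0] * len(profile)
    return [x / total for x in profile]


def distance(a, b):
    """Euclidean distance between two relative expression profiles."""
    ra, rb = relative_expression(a), relative_expression(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))


def classify_pair(parent, child, ancestor, cutoff):
    """Sort a duplicate pair into one of the preservation categories.

    cutoff stands in for the background divergence seen in unduplicated genes
    (an assumption; how it is derived is not modelled here).
    """
    combined = [p + c for p, c in zip(parent, child)]
    parent_diverged = distance(parent, ancestor) > cutoff
    child_diverged = distance(child, ancestor) > cutoff
    if not parent_diverged and not child_diverged:
        return "conservation"
    if child_diverged and not parent_diverged:
        return "neofunctionalization (child)"
    if parent_diverged and not child_diverged:
        return "neofunctionalization (parent)"
    # Both copies changed: if together they still add up to the ancestral
    # pattern, the old job was split; otherwise something new appeared too.
    if distance(combined, ancestor) > cutoff:
        return "specialization"
    return "subfunctionalization"


# Invented profiles over three tissues: head, ovary, testis.
ancestor = [10, 10, 0]
print(classify_pair([10, 10, 0], [0, 0, 10], ancestor, cutoff=0.2))
# -> neofunctionalization (child)
print(classify_pair([10, 0, 0], [0, 10, 0], ancestor, cutoff=0.2))
# -> subfunctionalization
```

The first invented pair is the testis-only child copy that keeps coming up in this post; the second is the textbook SF case where each copy keeps half of the ancestral expression.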

Long story short, it turned out that in the majority of cases (167/281) the child copy behaved much more differently from the “ancestor” than expected, while the parent copy stayed pretty close. These child copies also showed faster sequence evolution than their parents. This means that NF – and specifically that of the new copy – is the most common fate of newly duplicated genes in these animals. There’s also a fair number of gene pairs where both copies gained new functions or both stuck with the old ones, but only three where both copies lost functions. Pure SF, which very influential studies like Force et al. (1999) championed as the dominant mode of duplicate gene survival, appears to be an incredibly rare occurrence in fruit flies!

A few paragraphs ago I mentioned the caveat that duplicated genes don’t stop evolving just because they’ve managed to survive. Well, the advantage of having all these Drosophila genomes is that you can further break down “young” duplicates into narrower age groups, using the species that fall between melanogaster and pseudoobscura on the tree. However, looking at this breakdown doesn’t change the general pattern – NF of the child copy is the most common and SF is rare or nonexistent in even the youngest age groups, along both the melanogaster and the pseudoobscura lineages.

So what exactly is going on here?

Part of the difference in expression patterns between parent/ancestral and child copies is because these new genes are turned on in the testicles, which might give us a big clue. Testicles, you see, are a bit anarchical. Things that are normally kept silent in the genome, like various kinds of parasitic DNA, wake up and run wild during the making of sperm. If you remember my throwaway reference to duplication mechanisms that cut the gene off from its old regulatory elements – well, the balls are a place where even such lost and lonely genes get a second chance.

The genomic anarchy of testes is also one of the reasons these duplications happen in the first place; the aforementioned mechanism involves those bits of parasitic DNA that copy and paste themselves via an RNA intermediate. The enzymes they use to reverse transcribe this RNA into DNA and insert it back into the genome aren’t particularly discerning, and they’ll happily do their thing on a piece of RNA that isn’t the parasite. Indeed, slightly more NFed child genes than you’d expect originated via RNA, although it’s worth noting that more than half of them still didn’t. So while the testes look like a good place for new gene copies to find a use, they aren’t totally responsible for their origins.

Why is there so little SF among these genes?

This is the Obvious Question; my jaw nearly landed on my desk when I saw the numbers. The authors have two hypotheses, both of which may be true at the same time.

First, SF assumes that the two copies have the same functions to begin with. This is not necessarily true when just a small segment of DNA is duplicated – even when it’s not just a bare gene you’re copying, the new copy might lose part of its old regulatory elements and/or land next to new ones, not to mention Proulx and Phillips’s idea of new functions appearing before duplication. So maybe SF is more common after wholesale duplications of entire genomes, and Drosophila species didn’t have any of those recently.

Secondly, SF happens by genetic drift, which is a random process that works much better in small populations. Fruit flies aren’t known for their small populations, and therefore the dominant evolutionary force acting on their genomes will be selection.

This makes sense to me, but the degree to which NF dominates the picture is still pretty amazing. I wonder what you’d get if you applied the same methods to different species. Would species with smaller populations, or those that recently duplicated their whole genomes, show more evidence for SF as you’d expect if the above reasoning is correct? Or would the data slaughter all those seemingly reasonable explanations? What would you see in parthenogenetic species that have no males (and testicles)?

Part two, with really new genes, hopefully coming soon…

***

References:

Assis R & Bachtrog D (2013) Neofunctionalization of young genes in Drosophila. PNAS 110:17409-17414

He X & Zhang J (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169:1157-1164

Force A et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545

Proulx SR & Phillips PC (2006) Allelic divergence precedes and promotes gene duplication. Evolution 60:881-892

“Same” function, but the devil is in the details.

Aaaaaand todaaaaay, ladies and, um, other kinds of people…. Hox genes!

Considering that I did my Honours project on them and I think they are made of awesome, I’m kind of shocked by the general lack of them here*. Hmmmmmm. Well, having just found Sambrani et al. (2013), I think today is a good time to do something about that.

Hox genes in general are “what goes where” type regulators of development. In bilaterian animals, they tend to work along the head to tail axis of the embryo. (Cnidarians like sea anemones also have them, but the situation re: main body axis and Hox genes in cnidarians is a leeeetle less clear. And heaven knows what sort of weird things happened with the rest of the animals.)

Hox genes are responsible for one of the peculiarities of the insect body plan. Unlike many other arthropods, insects have leg-free abdomens. On the left below is a poor little lobster with legs or related appendages all the way down, plus a bonus clutch of eggs (Arnstein Rønning, Wikimedia Commons). To her right is a bland, boring insect abdomen (Hans Hillewaert, Wikimedia Commons).

As I said, Hox genes are responsible for the difference. Three of them are expressed in various segments of the abdomen of a developing insect: Ultrabithorax (Ubx), Abdominal-A and Abdominal-B. I’m going to whip out that amazing fluorescent image of Hox gene expression in a fruit fly embryo from Lemons and McGinnis (2006) because aside from being cool as hell, it also happens to be a good illustration:

(The embryo is folded back on itself, so the Abd-B-expressing tail end is right next to the Hox gene-free head)

In insects, all three can turn off the expression of the leg “master” gene distal-less (dll). However, they turn out to do so through two different mechanisms. Ubx and Abd-A proteins have long been known to team up with the distantly related Extradenticle (Exd) and Homothorax (Hth). With their partners, the Hoxes can sit on a regulatory region belonging to the dll gene and prevent its activation.

Sambrani et al. were curious whether Abd-B works in the same way. Sure enough, Abd-B also represses dll wherever it shows up. However, when it comes to interacting with Exd and Hth, differences start to emerge. For starters, those two aren’t even present in the rear end of the abdomen, where Abd-B does its business. When the researchers took the regulatory region of dll and threw various combinations of proteins at it, they found that (1) Abd-B is perfectly capable of binding the DNA on its own, (2) Exd, Hth or engrailed (another Hox cofactor) didn’t improve this ability at all, and (3) Hth alone or in combination with the others actually inhibited the binding of Abd-B to the dll regulatory sequence.

Interestingly, dll repression in the anterior and posterior abdominal segments requires the exact same bits of regulatory DNA even though different proteins are involved. It looks like in the posterior segments, Abd-B actually takes over an “Exd” binding site – maybe that’s how it can do the job without getting Exd itself involved.

Furthermore, while the DNA-binding ability of Abd-B is crucial to its ability to kill dll expression, the same is not the case for Ubx. The authors speculate that cooperation with Exd and Hth kind of exempts Ubx from having to bind the regulatory sequences itself, while Abd-B, being on its own, can’t afford to slack off like that. The paper illustrates the idea with such a deliciously ugly pair of drawings that I feel compelled to post it:

(I know they’re going for colour-matching with the fluorescent images, but unfortunately glowy greens and reds that look good on a black background kind of just hurt my eyes on white.)

I don’t really have a point to make here. (There doesn’t always have to be a point, right?) There’s absolutely nothing surprising about the fact that different Hox genes evolved the same overall function in different ways – after all, they existed as separate entities long before insects lost their buttward legs. I just think Hox genes are cool, and this was an interesting look into the nuts and bolts of how they work. And that’s that.

Cheerio!

***

*Well, aside from this one I’ve written three posts about them and a couple more where they are mentioned. That’s maybe not that bad considering how many different things I’m interested in.

***

References:

Lemons D & McGinnis W (2006) Genomic evolution of Hox gene clusters. Science 313:1918-1922

Sambrani N et al. (2013) Distinct molecular strategies for Hox-mediated limb suppression in Drosophila: From cooperativity to dispensability/antagonism in TALE partnership. PLoS Genetics 9:e1003307