Molecular Biology 101: Synteny, Conservation and two wheat genomes


Somehow, between going to the Netherlands, the Easter break, a week-long lab course and a conference talk to write, I managed to miss not just one but two really interesting, exciting and useful papers in Nature. (Incidentally, I try not to write too much on here related to my PhD: I’m always a little scared that I’ll end up saying similar things about papers in my literature review and then get pulled up for plagiarism or something, but these are too interesting to miss.) But I digress.

Sequencing the wheat A and D genomes

Two weeks ago a consortium of Chinese and American scientists published two papers about sequencing the progenitors of both the A and the D genomes of bread wheat. (Quick recap for the uninitiated: wheat is a hexaploid, i.e. instead of having one maternal and one paternal copy of each chromosome, that is, 2 in total, it has 3 pairs of each, one pair from each of its three ancestral genomes, making its genome AABBDD.) This is pretty big news for a couple of reasons:

– As I’ve explained in the past, producing an assembled wheat genome sequence is a long and arduous process. The genome is huge and full (2/3 to 4/5 full!) of repeating sequences that don’t code for genes. Sequencing the genome physically, chromosome by chromosome, is taking the IWGSC a long time, and in the meantime all data is useful data, even if it’s not perfectly assembled.

– The D genome in particular is really interesting, because hexaploid wheat can grow in a crazy range of environments, and the D genome is thought to be a big part of the reason why. We expect a lot of cool and useful genes to be on the D genome.

– Actually, the A genome is pretty interesting too. There are a lot of QTLs (quantitative trait loci: regions of the genome associated with interesting quantitative traits like plant height and yield) in the A genome.

Even though the sequences they have aren’t in a full annotated map, there’s still an awful lot we can tell about them, largely because of two key concepts in molecular biology:

Conservation is the idea of ‘If it ain’t broke, don’t fix it’. As species evolve and diverge the most important proteins are the ones least likely to change. This is true even within particular genes: if a part of the protein interacts with another molecule (e.g. if it allows the protein to bind to DNA or another protein) then it may be consistently unchanged between related species. This means that even without an assembled wheat sequence, it is possible to compare wheat DNA to the DNA of rice, Brachypodium, and barley and detect genes of similar function.

Synteny is the idea that genes tend to stay in the same order throughout evolution. If Gene A is next to Gene B and Gene C in Plant 1, then it will probably be found next to them in a related species, Plant 2. This turns out to be the case with the wheat genomes and Brachypodium. Because the genes are in more or less the same order, we can use Brachy as a ‘reference sequence’ and align the contigs to that, helping us to figure out which contigs go next to one another even if they don’t actually overlap.
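To make the reference trick a bit more concrete, here is a minimal Python sketch. It is my own toy illustration, not the pipeline from either paper. It assumes each contig has already been matched to its best hit on a Brachypodium chromosome; the contig names, the positions and the function name order_contigs_by_reference are all made up for the example.

```python
def order_contigs_by_reference(contig_hits):
    """Order contigs by where their best match sits in a reference genome.

    contig_hits maps a contig name to a (reference_chromosome, position)
    tuple. Both the names and the positions here are hypothetical.
    """
    # Tuples sort by chromosome first, then by position along it.
    return sorted(contig_hits, key=lambda name: contig_hits[name])


# Hypothetical best hits of four wheat contigs against Brachypodium.
hits = {
    "contig_17": ("Bd1", 5200),
    "contig_03": ("Bd1", 1200),
    "contig_42": ("Bd2", 300),
    "contig_08": ("Bd1", 3100),
}

ordered = order_contigs_by_reference(hits)
print(ordered)
```

None of these contigs needs to overlap with its neighbour: the order comes entirely from the reference, which is exactly why synteny is so handy for a draft assembly. Where synteny has broken down (genes do occasionally move), an ordering like this will be wrong, so it is a best guess rather than a proof.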

Syntenic relationships between the seven hypothetical chromosomes of T. urartu (1A–7A) and the five chromosomes of B. distachyon (Bd1–Bd5). The deletion bin maps of bread wheat 1A–7A chromosomes are noted at the bottom. Jia, J., Zhao, S., et al. (2013) Nature 496, 91–95


Method to the Madness

So, about these papers. In both, the authors have taken one of the presumed parents of wheat (Triticum urartu for the A genome, and Aegilops tauschii* for the D genome) and shotgun sequenced it. This means that instead of painstakingly sequencing big bits of DNA in a known order, you chop the DNA up (shear it) and then sequence the small bits. You then use an algorithm to look for overlaps between the bits and join them up to make contigs. (I explained that a little more here). The resulting sequences are then compared to other known sequences (from similar plants).

A contig is a series of contiguous (overlapping) sequences that, when put together, give the original sequence of the gene, bin or chromosome of interest.
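If you like to see these things in code, here is a toy Python sketch of the "look for overlaps and join them up" step. It uses a simple greedy merge, which is an assumption on my part for illustration: real assemblers (including whatever the authors used) are far more sophisticated and have to cope with sequencing errors and repeats. The reads and the function names overlap_length and greedy_assemble are invented for the example.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: the remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up short reads that tile a 13-base stretch of DNA.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

The repeats are what make this hard in wheat: if the same stretch of sequence turns up in thousands of places, a read ending in that stretch overlaps equally well with lots of different neighbours, and a greedy approach like this one will happily join the wrong ones.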


In addition, the authors used RNA-Seq. This is where you take RNA (i.e. a molecule like DNA that corresponds only to the gene sequences, with none of the repeating rubbish in between them), turn it into DNA (using an enzyme called reverse transcriptase) and sequence that. The point is that a) you get a ‘gene enriched’ dataset (i.e. more copies of the bits of the genome that are actually interesting) and that b) if you do lots of different tissue types, e.g. leaf vs root, or flowering plant vs just-germinating plant, then you can compare which genes are expressed at different times.

And they found…

  • 34,879 genes on the A genome and 43,150 genes on the D genome. (Well, strictly speaking protein-coding gene regions, but provided the coverage is good enough, those are more or less the same thing.)
  • 2,989,540 SNPs between two lines of the A genome progenitor and 151,083 SNPs between two lines of the D genome progenitor. (These are potentially useful for breeding because you can use them to track genes and traits.) Also, over 160,000 SSRs (simple sequence repeats, aka microsatellites), of which 33% are specific to the A genome in wheat (so they can be used to look at just one genome in a breeding population) and 29% are polymorphic in wheat (so can be used as markers).
  • The A genome progenitor, T. urartu, has around 6000 more genes than the A genome in wheat. (This is expected, as polyploids frequently lose copies of genes).
  • Both the A and the D genome contain several unique disease resistance genes. The A genome boasts 593 R proteins (the products of disease resistance genes), versus just 197 in Brachypodium and 460 in rice; and there are twice as many R gene analogues in the D genome as in rice, and six times as many as in maize.
  • The D genome has an unusually large number of genes for disease resistance, abiotic stress tolerance, cold tolerance and grain quality (compared to other crops and grasses). There are also 485 cytochrome P450 genes (linked to dealing with abiotic stress, e.g. through detoxification) in Aegilops tauschii (cf. 333 in rice and 262 in Brachypodium) and many more MYB-related genes.
  • Aegilops tauschii has several cold-tolerance genes, and several of those that are specific to it (i.e. not expressed in related species) are constitutively expressed (switched on all the time).
  • Most genes in the D genome progenitor have undergone purifying selection (i.e. they have low Ka/Ks ratios, meaning that changes which alter the protein are rarer than silent ones).
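A quick aside on that last point, since Ka/Ks is just a ratio. Ka is the rate of DNA changes that alter the protein (non-synonymous) and Ks is the rate of changes that don't (synonymous). A ratio well below 1 suggests purifying selection, around 1 suggests neutral drift, and above 1 suggests positive selection. Here is a small Python sketch of that rule of thumb. The example rates and the tolerance cut-off are invented for illustration, and classify_selection is a hypothetical helper, not anything from the paper.

```python
def classify_selection(ka, ks, tolerance=0.1):
    """Return (Ka/Ks ratio, rough interpretation).

    ka: non-synonymous substitution rate, ks: synonymous substitution rate.
    `tolerance` is an arbitrary illustrative band around 1 for 'neutral'.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        label = "purifying"
    elif ratio > 1 + tolerance:
        label = "positive"
    else:
        label = "neutral"
    return ratio, label


# Made-up rates for three hypothetical genes.
conserved = classify_selection(ka=0.25, ks=0.5)
drifting = classify_selection(ka=0.5, ks=0.5)
fast_evolving = classify_selection(ka=1.0, ks=0.5)
print(conserved, drifting, fast_evolving)
```

In real analyses the rates themselves have to be estimated from aligned coding sequences and the cut-offs are tested statistically, so treat this as the idea in miniature rather than a method.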

Summary:

Draft sequences from two of the three parents of wheat have been published. The data tell us some interesting things about how wheat has evolved, and why it is so robust, as well as providing useful markers for breeding.

* Incidentally, one of my guinea pigs was very nearly called Aegilops. Somehow I figured that and Triticum sounded kinda fantasy-novel like, and nobody would ever know that it was actually really geeky. Then I decided to try to pretend to be a normal human being for once.

Ling, H-Q., Zhao, S., et al. (2013) Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 496, 87–90 doi:10.1038/nature11997

Jia, J., Zhao, S., et al. (2013) Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91–95 doi:10.1038/nature12028
