In case you hadn’t gathered from all of the posts about the Rothamsted wheat trials, I have a bit of a soft spot for wheat. Wheat isn’t just globally important (as one of the Big Three staples, providing around 20% of our calories and a decent amount of protein for a cereal), it’s also really interesting. Bread wheat is a hexaploid: it carries three separate genomes (A, B and D), which makes it incredibly genetically diverse and therefore able to grow in a really wide variety of climates. That also means that it can undergo some nifty genetic changes: if one copy of a gene starts to evolve in a potentially-cool-but-potentially-hazardous way, there’s usually another ‘back up’ copy, allowing more divergent evolution than in a diploid like rice.
An unassembled wheat genome
Just over two years ago, a consortium made up of researchers from the Universities of Liverpool and Bristol and three institutes (NIAB, JIC and Rothamsted) released draft sequence coverage of the wheat genome. It’s kind of hard to explain how big a deal this is, but to give you some context, the Human Genome Project was over 10 years in the making, and came with a $3 billion price tag. The wheat genome is around five or six times bigger than the human genome and, because it contains multiple copies of very similar genes, it is much harder to assemble. (Think of it like having three jigsaws of watercolour landscapes in the same box. While having three times as many pieces is a big problem, having bits that could belong to any of the three jigsaws is probably worse.) It also contains an awful lot of repeated sequence, which confuses the hell out of most assembly algorithms.
The unassembled jigsaw
Anyway, the fact that they had managed to sequence more or less the entire genome was sort of a massive deal. All of the data is publicly available in a database called CerealsDB, which anybody can access and BLAST (basically, search with a sequence to see if anything matches). This means that if you work on a gene in rice, you can check it against the wheat sequence to see how well it is conserved. Or if you have a cDNA clone, you can compare it with the genomic sequence in the database to figure out how big the introns are.
But the bits of the puzzle still hadn’t been put together. A powerful algorithm could search through them, and lots of other algorithms could be used to construct most-likely contigs (i.e. contiguous overlapping sequences), but this would be a lot of work for individual researchers.
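To give a feel for what "constructing contigs" means, here is a minimal sketch of greedy overlap merging in Python. It is a toy illustration only, not the method the consortium used: the function names, the `min_len` threshold and the example reads are all made up for this post, and real assemblers have to cope with sequencing errors, repeats and billions of reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_contigs(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Hypothetical helper for illustration: stops when no pair overlaps
    by at least `min_len` bases and returns whatever contigs are left.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


# Three made-up overlapping reads collapse into a single contig.
print(greedy_contigs(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

With three near-identical genomes and lots of repeats, many reads overlap equally well with several partners, which is exactly where a simple greedy approach like this one starts joining the wrong jigsaw pieces.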
And now for the big news
As of 6pm yesterday, the consortium has published a new paper in Nature, which does some of the major analysis on this huge dataset. The main points are:
– The dataset contains around 94 000 – 96 000 genes: 28 000 in the A genome, 38 000 in the B genome and 36 000 in the D genome.
– There is a high degree of similarity between wheat and Brachypodium (an important model grass species with a small genome), but some regions are especially low in synteny.
– Gene copy loss is substantial (i.e. since becoming a polyploid, wheat has lost plenty of duplicated genes), but actually significantly lower than in maize and oilseed rape (although they are much older polyploids, so this could just be a time thing).
– Over 130 000 homoeologous SNPs (i.e. single base changes that identify whether a sequence comes from the A, B or D genome) have been identified.
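For anyone wondering what a homoeologous SNP looks like in practice, here is a minimal Python sketch. It assumes you already have the A, B and D copies of a gene aligned to the same length; the function name and the example sequences are invented for illustration and this is not the pipeline used in the paper.

```python
def homoeologous_snps(copies):
    """Find positions where aligned A, B and D gene copies differ.

    `copies` maps a genome label to an aligned sequence. Columns with
    gaps ('-') or unknown bases ('N') are skipped. Returns a list of
    (position, {genome: base}) tuples, using 0-based positions.
    """
    if len({len(seq) for seq in copies.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, bases in enumerate(zip(*copies.values())):
        if "-" in bases or "N" in bases:
            continue
        if len(set(bases)) > 1:
            snps.append((pos, dict(zip(copies, bases))))
    return snps


# Made-up aligned copies of one gene from each genome.
example = {
    "A": "ATGGCGTACC",
    "B": "ATGACGTACC",
    "D": "ATGGCGTTCC",
}
for pos, bases in homoeologous_snps(example):
    print(pos, bases)
```

In this toy example, position 3 picks out the B copy and position 7 picks out the D copy, which is the kind of marker that lets you tell which genome a stretch of sequence came from.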
So what all of this means is basically that a huge genetic resource has become available for wheat researchers everywhere, which is also incredibly useful for scientists working on other cereals, other crops, and plants in general. The SNP data in particular is an absolute goldmine for anyone studying variation, and has huge implications for anyone working in the breeding industry.
Edit: Speaking of which, the BBSRC just tweeted a link to comments from Keith Norman (chairman of Crop Club) and Peter Jack at RAGT saying, slightly more coherently than I just did, that this is awesome news: http://ht.ly/fFSpk
Analysis of the bread wheat genome using whole-genome shotgun sequencing
Brenchley, R., Spannagl, M., Pfeifer, M., Barker, G.L.A. et al. (2012)
Nature 491, 705–710 doi:10.1038/nature11650