There’s a new PhD student in our lab, and I’ve been trying to sort her out with a list of useful tools and websites. I figured that there are probably plenty of other newbies in the field who are fumbling their way around as I did when I started and could benefit from access to my bookmarks bar, so here goes:
When you first start dealing with sequence, you may find it useful to know that there are tools for converting all sequence types (e.g. FASTA) to RAW format, which is the format lots of alignment software needs its input in (I genuinely did it by hand a few times before I found that). There are also, obviously, tools for reverse complementing sequence and for translating nucleotide sequence into protein sequence.
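If you're curious what those conversion tools are doing under the hood, here is a minimal Python sketch using only the standard library and the standard genetic code. The function names (`fasta_to_raw`, `reverse_complement`, `translate`) are my own for illustration and don't come from any particular website or package, and ambiguity codes are not handled properly, so treat it as a sketch rather than a replacement for the real tools.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fasta_to_raw(fasta_text):
    """Drop FASTA header lines (starting with '>') and join the rest."""
    lines = fasta_text.strip().splitlines()
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate frame 1; codons not in the table become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


raw = fasta_to_raw(">demo sequence\nATGGCC\nTAA\n")
print(raw)
print(reverse_complement(raw))
print(translate(raw))
```

Stop codons come out as `*`, which is the usual convention in translation tools.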
BLAST (Basic Local Alignment Search Tool) is the ‘search for a gene or a protein’ tool that everybody uses across all fields. Use it to find out what an unknown transcript sequence codes for, or how similar it is to the same gene in other species. (Use BLASTN for nucleotide against nucleotide, BLASTP for protein against protein, and BLASTX or TBLASTN for searching one with the other: BLASTX takes a nucleotide query against a protein database, TBLASTN a protein query against a nucleotide database.)
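If you can never remember which flavour is which, the choice only depends on what your query is and what you are searching against. A small lookup sketch: the program names are the standard BLAST ones, but the function name `pick_blast_program` is hypothetical and just for illustration.

```python
# Map (query type, database type) to the usual BLAST program name.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated first
    ("protein", "nucleotide"): "tblastn",  # database is translated first
}


def pick_blast_program(query_type, database_type):
    """Return the BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} vs {database_type!r}"
        ) from None


print(pick_blast_program("nucleotide", "protein"))
```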
CerealsDB is the wheat-only search engine maintained by the University of Bristol. You can search it for 454 genomic sequence (from the University of Liverpool), e.g. if you’re building a genomic contig to match your cDNA sequence; for ESTs, which give a summary of where your gene might be expressed; or for SNPs (although this doesn’t yet tell you which varieties the SNPs are between).
Once you have some sequence, Clustal W2 is a handy alignment tool until you get your hands on proper alignment software like Chromas, DNAman or Sequencher (the latest version does Next Gen but I’m still on 4.10). This is assuming you’re using <50 sequences: we’re not talking Next Gen data here. You’ll need all sequences going in the same direction, so use a reverse complement tool (see above) on any that aren’t. It works for protein sequences too.
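Getting everything into the same orientation is easy to check by eye for a couple of sequences, but tedious for more. One rough way to automate it, sketched below, is to count how many short subsequences (k-mers) a sequence shares with a reference in each orientation and flip it if the reverse complement matches better. This is my own illustration rather than anything Clustal does for you; the function names and the default `k=6` are arbitrary assumptions.

```python
def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, seq, k=6):
    """Return seq, flipped if its reverse complement shares more
    k-mers with the reference than the sequence as given."""
    ref_kmers = kmers(reference, k)
    flipped = reverse_complement(seq)
    forward_hits = len(ref_kmers & kmers(seq, k))
    reverse_hits = len(ref_kmers & kmers(flipped, k))
    return flipped if reverse_hits > forward_hits else seq.upper()


reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
backwards_read = reverse_complement(reference[5:30])
print(orient_like(reference, backwards_read))
```

Very short or very divergent sequences may share no k-mers with the reference at all, in which case this leaves them as they are.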
To find out a bit more about the function of a protein (e.g. what domains it has), use InterProScan (which searches other databases like PANTHER).
For comparative genomics you should be aware of Gramene, which allows you to search genomic data for a bunch of plants together. That links to the European Nucleotide Archive, which covers similar ground to the BLAST database and the PlantsDB databases, hosted at MIPS, for if you’re interested in things other than wheat.
You can get access to the shotgun sequencing data from the IWGSC for particular chromosomes, although you need to request access and get a password. This is useful for mapping your gene of interest.
Has anybody got others that I’ve missed?
Posted in Biology, Genetics, Grad school, Science
Tagged alignment, bioinformatics, BLAST, genetics, genomics, protein, SNPs, wheat
This has been a bad year for farmers: last year’s wet summer and then the cold winter that just won’t end have scuppered one harvest and probably knocked this year’s right down too. Even when conditions are closer to ideal than they have been this year, farmers and breeders fight an uphill battle trying to prevent a significant proportion of the crop being lost to various pathogens. When it comes to wheat that means rust: black rust, brown rust and yellow rust. Where it strikes, yield losses are likely to be around 20% in susceptible varieties, and the problem is getting much worse. Most resistance to black rust (Puccinia graminis) is conferred by a single gene, which a new strain of rust (Ug99) managed to overcome in much the same way as MRSA became resistant to methicillin in our hospitals.
Now scientists from Norwich, Cambridge and the USA are trying to find out how some kinds of a similar disease, yellow rust (Puccinia striiformis or PST), are able to overcome the plant’s natural defences and infect.
Posted in Biology, Genetics, Science
Tagged biology, comparative genomics, genetics, genomics, In the news, microbiology, next gen, pathology, qPCR, rust, science, wheat
Somehow, between going to the Netherlands, the Easter break, a week-long lab course and writing a conference talk, I managed to miss not just one, but two really interesting, exciting and useful papers in Nature. (Incidentally, I try not to write too much on here related to my PhD: I’m always a little scared that I’ll end up saying similar things about papers in my literature review and then being pulled up for plagiarism or something, but these are too interesting to miss.) But I digress.
Sequencing the wheat A and D genomes
Two weeks ago a consortium of Chinese and American scientists published two papers about sequencing both the A and the D genome progenitors of bread wheat. (Quick re-cap for the uninitiated: wheat is a hexaploid, i.e. instead of having one maternal and one paternal copy of each chromosome, that is 2 in total, it has 3 pairs of each, making its genome AABBDD.) This is pretty big news for a couple of reasons.
Posted in Biology, Genetics, Molecular Biology 101, Science
Tagged biology, breeding, comparative genomics, conservation, crops, expression, genes, genetic markers, genetics, genomics, In the news, journal, next gen, paper, science, sequencing, wheat
Barley and Wheat are pretty similar. After all they’re sisters… No really. In a flash of food-security related brilliance I named my guinea pigs Wheat and Barley. Let it never be said that I don’t take my PhD seriously…
But seriously: wheat and barley are both cereals; important food crops, important feed crops, vital for producing my two favourite beverages (20% of the worldwide barley yield goes for malting), not to mention bread… Wheat tends to get a lot more coverage though: the world grows only around a quarter as much barley as it does wheat, and the amount spent on barley-related science is therefore always likely to be a bit lower.
In case you hadn’t gathered from all of the posts about the Rothamsted wheat trials, I have a bit of a soft spot for wheat. Wheat isn’t just globally important (as one of the Big Three staples, providing around 20% of our calories and a decent amount of protein for a cereal), it’s also really interesting. It has three genomes, making it incredibly genetically diverse, and therefore able to grow in a really wide variety of climates. That also means that it can undergo some nifty genetic changes: if one copy of a gene starts to evolve in a potentially-cool-but-potentially-hazardous way, there’s usually another ‘back up’ copy, allowing more divergent evolution than in a diploid like rice.
Map of wheat production (average percentage of land used for its production times average yield in each grid cell) across the world compiled by the University of Minnesota Institute on the Environment with data from: Monfreda, C., N. Ramankutty, and J.A. Foley. 2008. Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000. Global Biogeochemical Cycles 22: GB1022
On June 26th 2000 the consortium of researchers working on the Human Genome Project announced they had created their first ‘working draft’ of the human genome sequence. It had taken a decade to achieve, with groups in at least 5 countries collaborating, cost $3 billion and provided 7-fold coverage (i.e. on average, for any single point on the genome they had seven sequences).
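That coverage figure is just the total number of bases sequenced divided by the size of the genome. A quick sketch of the arithmetic, assuming a human genome of roughly 3 billion bases; the 500-base read length is a made-up round number for illustration, not a figure from the project.

```python
import math

# Approximate human genome size in bases (a round-number assumption).
GENOME_SIZE = 3_000_000_000


def mean_coverage(n_reads, read_length, genome_size=GENOME_SIZE):
    """Average number of reads covering any one position."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size=GENOME_SIZE):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical 500-base reads, aiming for 7-fold coverage.
n = reads_needed(7, 500)
print(n, mean_coverage(n, 500))
```

Bear in mind this is an average: real reads are not spread evenly, so some regions end up with far more than seven sequences and some with none.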
Twelve years have gone by, and you might think that we would be busy sequencing other things. (Well, we sort of are: bonobos, bananas…) But actually we’re still looking at human DNA. We might know what humans look like at a molecular level ‘on average’, but ‘on average’ doesn’t tell you why some people are more prone to cancer, or why some people are more likely to get Alzheimer’s, or why some people can taste PROP. It doesn’t tell you why some people can live to 100 either, in spite of having genes that make them prone to clotting disorders or dementia or heart disease.
In 2006, non-profit organisation X-prize announced that they would give a $10 million prize to the company that could sequence (and assemble) the entire genomes of 100 centenarians in 30 days, at a cost of less than $1000 per genome and with an unprecedented level of accuracy. Given the HGP cost around $3 billion to complete, that’s quite an ask!
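To put numbers on ‘quite an ask’, here is the back-of-the-envelope arithmetic using only the figures quoted above. The variable and function names are mine, and comparing a whole project’s cost with a per-genome target is a rough comparison rather than a like-for-like one.

```python
HGP_COST = 3_000_000_000       # approximate cost of the HGP, in dollars
TARGET_COST_PER_GENOME = 1000  # prize limit per genome, in dollars
N_GENOMES = 100                # genomes to be sequenced for the prize


def fold_reduction(old_cost, new_cost):
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


def total_budget(cost_per_genome, n_genomes):
    """Maximum spend on sequencing across all genomes."""
    return cost_per_genome * n_genomes


print(fold_reduction(HGP_COST, TARGET_COST_PER_GENOME))
print(total_budget(TARGET_COST_PER_GENOME, N_GENOMES))
```

So the target per genome is three million times cheaper than the first draft, and all 100 genomes together have to come in under $100,000.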
The general idea is that by comparing these genomes, particular areas of interest might ‘pop out’ as being associated with longevity. 100 people is actually a pretty small sample size, but it’s a starting point and may identify several candidate genes that can then be studied by other (cheaper!) methods.
The first entrant
For the first time, technology has progressed to a place where achieving the aims of the X prize really is a viable possibility. The Ion Proton sequencer (which, with a bit of luck, we’ll be buying soon-ish – squee!) might just be up to the job, and as of this week its inventors are the first company to actually sign up to the challenge. There’s still almost a year left for other companies to get on board, but Illumina (makers of the GA II and the new-to-the-market HiSeq) have already come out and said that they don’t plan to participate. Oxford Nanopore and Complete Genomics are still not making any decisions about whether or not to enter: presumably because they’re waiting to see what state their technology is in 10 months from now.
As per the good old $1000 genome, $10 000 analysis script, most of the time isn’t actually going to be spent sequencing the genomes. The vast majority of the 30 days will be spent assembling the sequencing fragments into longer strings that match the known order of genes in the human chromosomes: the biggest time constraint for the competitors is getting the sequencing done as fast as possible, to leave four whole weeks for genome assembly. We’ve been hearing for a while now that the $1000 genome is right around the corner, so it’s exciting to see Ion Torrent putting their money where their mouth is. Come next October we can see whether they’re right.
Here are a few odds and ends that have caught my eye over the last day or two to finish the week with.
The 523 Mb draft genome for banana, Musa acuminata, has been released (more on that later) in a paper in Nature by D’Hont et al., and with it this awesome Venn diagram comparing the genes that are homologous between banana and some other important plants. (Arabidopsis is the model species for all plants, the equivalent of a lab mouse; Brachypodium is the model for grasses; Oryza is rice.)
Posted in Biology, Genetics, Just me, Science
Tagged crops, genetics, genomics, other blogs, science, science communication, science journalism, scientists, weekly round up, women in science