There’s a new PhD student in our lab, and I’ve been trying to sort her out with a list of useful tools and websites. I figured that there are probably plenty of other newbies in the field who are fumbling their way around as I did when I started and could benefit from access to my bookmarks bar, so here goes:
When you first start dealing with sequence, you may find it useful to know that there are tools for converting sequence in any format (e.g. FASTA) to RAW format, which is what lots of alignment software needs as input (I genuinely did it by hand a few times before I found that). There are also, obviously, tools for reverse-complementing sequence and for translating nucleotide sequence into protein sequence.
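If you'd rather not depend on a website, the conversions mentioned above each take only a few lines of Python. This is a minimal sketch, not any particular tool's code. It assumes plain DNA written with A, C, G, T (and N). It translates only the first reading frame, using the standard genetic code. The function names are my own.

```python
from itertools import product

def fasta_to_raw(fasta_text: str) -> str:
    """Drop FASTA header lines (starting with '>') and join the sequence lines."""
    lines = fasta_text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">")).upper()

# Each base is swapped for its pairing partner; N (unknown) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the other strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Standard genetic code: codons enumerated with bases in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq: str) -> str:
    """Translate frame 1; '*' marks a stop codon, 'X' any codon containing N etc."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

raw = fasta_to_raw(">my_gene\nATGGCC\nTTTTAA\n")
print(raw, translate(raw), reverse_complement(raw))
```

For anything serious, a library such as Biopython already does all three.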
BLAST (Basic Local Alignment Search Tool) is the ‘search for a gene or a protein’ tool that everybody uses, across all fields. Use it to find out what an unknown transcript sequence codes for, or how similar it is to the same gene in other species. (Use BLASTN to search nucleotides against nucleotides and BLASTP for protein against protein. BLASTX searches a translated nucleotide query against proteins, and tBLASTN does the reverse, searching a protein query against translated nucleotide sequences.)
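BLAST itself is a heuristic. It looks for short exact word matches and then extends them, which is what makes it fast enough to search whole databases. The local alignment score it approximates can be computed exactly with the classic Smith-Waterman dynamic programme, sketched below. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST's defaults.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty."""
    cols = len(b) + 1
    prev = [0] * cols  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            # Flooring at 0 is what makes it *local*: a bad region restarts the alignment.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman("TTTTACGTTTT", "GGGGACGTGGGG"))
```

Here only the shared ACGT core scores, and the unrelated flanks are ignored. That is exactly why local alignment suits finding a conserved gene inside otherwise different sequence.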
CerealsDB is the wheat-only search engine maintained by the University of Bristol. You can use it to search three kinds of data:

- 454 genomic sequence (from the University of Liverpool), e.g. if you’re building a genomic contig to match your cDNA sequence.
- ESTs, which give a summary of where your gene might be expressed.
- SNPs, although this doesn’t yet tell you which varieties the SNPs are between.
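A SNP is just a single-base difference between two versions of the same sequence, for example the same gene from two varieties. Given two sequences that are already aligned, listing the SNPs is a short loop. This sketch uses made-up sequences. It simply skips gaps and Ns rather than trying to interpret them.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in a, base in b) wherever two aligned sequences differ.

    Assumes both sequences are already aligned to the same length. Gap ('-') and
    unknown ('N') positions are skipped rather than called as SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (x, y) in enumerate(zip(seq_a.upper(), seq_b.upper()), start=1):
        if x != y and x not in "-N" and y not in "-N":
            snps.append((pos, x, y))
    return snps

# Hypothetical aligned fragments of one gene from two varieties
print(find_snps("ACGTACGT", "ACGAACGC"))
```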
Once you have some sequence, ClustalW2 is a handy alignment tool until you get your hands on proper alignment software such as Chromas, DNAman or Sequencher. (The latest version of Sequencher handles Next Gen data, but I’m still on 4.10.) All of this assumes you’re aligning fewer than 50 sequences; we’re not talking Next Gen data here. You’ll need all your sequences running in the same direction, so use the reverse-complement tool above first. ClustalW2 works for protein sequences too.
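Getting everything facing the same way is easy to automate if you have a reference sequence. In this rough sketch, `orient_to_reference` is a hypothetical helper of my own, and the k-mer size is chosen purely for illustration. Each read is either kept as-is or reverse-complemented, whichever version shares more short words (k-mers) with the reference.

```python
def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the result."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def kmers(seq: str, k: int) -> set[str]:
    """All overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def orient_to_reference(reference: str, reads: list[str], k: int = 8) -> list[str]:
    """Flip any read whose reverse complement matches the reference better."""
    ref_kmers = kmers(reference.upper(), k)
    oriented = []
    for read in reads:
        forward = read.upper()
        reverse = reverse_complement(forward)
        shared_fwd = len(kmers(forward, k) & ref_kmers)
        shared_rev = len(kmers(reverse, k) & ref_kmers)
        oriented.append(reverse if shared_rev > shared_fwd else forward)
    return oriented
```

Reads that share no k-mers with the reference in either orientation are left untouched. For short reads, a smaller `k` may be needed.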
To find out a bit more about the function of a protein (e.g. what domains it has), use InterProScan, which searches member databases such as PANTHER.
For comparative genomics you should be aware of Gramene, which lets you search genomic data for a bunch of plant species together. It links to two other resources:

- The European Nucleotide Archive, which covers similar ground to the BLAST database.
- The PlantsDB databases hosted at MIPS, which are useful if you’re interested in species other than wheat.
You can get access to the shotgun sequencing data for particular chromosomes from the International Wheat Genome Sequencing Consortium (IWGSC), although you need to request access and get a password. This is useful for mapping your gene of interest.
Has anybody got others that I’ve missed?
This has been a bad year for farmers. Last year’s wet summer and the cold winter that just won’t end have scuppered one harvest and have probably knocked this year’s right down too.

Even in better conditions than this year’s, farmers and breeders fight an uphill battle to stop a significant proportion of the crop being lost to various pathogens. When it comes to wheat, that means rust: black rust, brown rust and yellow rust. Where it strikes, yield losses are likely to be around 20% in susceptible varieties, and the problem is getting much worse.

Much of the resistance to black rust (Puccinia graminis f. sp. tritici) relied on a single gene, Sr31. A new virulent race of the rust, Ug99, managed to overcome it, in much the same way as MRSA became resistant to methicillin in our hospitals.
Now scientists from Norwich, Cambridge and the USA are trying to find out how some strains of a related disease, yellow rust (Puccinia striiformis f. sp. tritici, or PST), are able to overcome the plant’s natural defences and infect it.
Somehow, between going to the Netherlands, the Easter break, a week-long lab course and writing a conference talk, I managed to miss not just one but two really interesting, exciting and useful papers in Nature. (Incidentally, I try not to write too much on here related to my PhD. I’m always a little scared that I’ll end up saying similar things about papers in my literature review and then be pulled up for plagiarism or something, but these are too interesting to miss.) But I digress.
Sequencing the wheat A and D genomes
Two weeks ago a consortium of Chinese and American scientists published two papers on sequencing the A and D genome progenitors of bread wheat.

(Quick recap for the uninitiated: wheat is a hexaploid. Instead of having just one maternal and one paternal copy of each chromosome, i.e. two in total, it has three pairs, one pair from each of three ancestral genomes. That makes its genome constitution AABBDD.)

This is pretty big news for a couple of reasons.
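The bookkeeping is easier to see written out. Each of the three genomes has seven chromosomes. So bread wheat has 21 distinct chromosomes, and 42 in total once you count both copies of each (2n = 6x = 42). Here is a throwaway Python sketch that lists them.

```python
# Bread wheat: three genomes (A, B, D), seven chromosome groups in each,
# and two copies (one maternal, one paternal) of every chromosome.
GENOMES = ("A", "B", "D")
GROUPS = range(1, 8)

# Chromosomes in the same group (e.g. 1A, 1B, 1D) are homoeologues:
# related "back up" versions inherited from the three ancestral genomes.
chromosomes = [f"{group}{genome}" for group in GROUPS for genome in GENOMES]
somatic_count = 2 * len(chromosomes)

print(chromosomes[:3])                  # ['1A', '1B', '1D']
print(len(chromosomes), somatic_count)  # 21 42
```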
In case you hadn’t gathered from all of the posts about the Rothamsted wheat trials, I have a bit of a soft spot for wheat. Wheat isn’t just globally important (as one of the Big Three staples, providing around 20% of our calories and a decent amount of protein for a cereal), it’s also really interesting. It has three genomes, making it incredibly genetically diverse, and therefore able to grow in a really wide variety of climates. That also means that it can undergo some nifty genetic changes: if one copy of a gene starts to evolve in a potentially-cool-but-potentially-hazardous way, there’s usually another ‘back up’ copy, allowing more divergent evolution than in a diploid like rice.
Map of wheat production across the world (average percentage of land used for wheat production times average yield in each grid cell), compiled by the University of Minnesota Institute on the Environment with data from: Monfreda, C., N. Ramankutty, and J.A. Foley. 2008. Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000. Global Biogeochemical Cycles 22: GB1022.
“Why are you in work so late?!” asked my housemate’s boyfriend when I explained via Facebook chat that that was where I was.
“I got really into writing. And now I’ve been here so long that the qPCR I planned to run overnight is done, so I may as well do the analysis!” I replied.
“Dedication to the cause!” came the reply. “But I have no idea what qPCR is…”
I pride myself on talking about science far too much and boring everyone around me, especially housemates and their boyfriends. So how have I possibly avoided explaining what qPCR is?!
DNA exists in relatively small quantities compared with how much we need to do molecular biology. In order to work with it (or, for that matter, check whether it is there) we need to make more of it.

A particularly awesome feature of DNA makes this possible: it is a double-stranded molecule. If you could straighten out the helix it twists itself into, it would look like a ladder. The two sides are complementary copies of one another, because each base only pairs with one partner: A with T, and C with G. (Once you have finished marvelling open-mouthed at my non-existent artistic skills you will, I’m sure, spot that pink always pairs with orange etc.)
This means that if you split the molecule in two, it’s possible to rebuild the missing side from the side you have. Your cells do this every time they copy their DNA before dividing. Because each new molecule keeps one of the original strands, the process is called semi-conservative replication.
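For anyone who thinks better in code, here’s a toy model of that idea. It writes both strands left to right, base under base. It ignores the fact that real strands run in opposite directions (antiparallel) and that real replication needs a whole team of enzymes. The function names are just for illustration.

```python
# Each base has exactly one partner: that is what makes rebuilding possible.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner_strand(strand: str) -> str:
    """Rebuild the complementary strand base by base (written under the original)."""
    return "".join(PAIRS[base] for base in strand)

def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Split a double strand and build a new partner for each old strand.

    Each daughter keeps one original strand (semi-conservative),
    yet both daughters end up identical to the parent.
    """
    top, bottom = duplex
    return [(top, partner_strand(top)), (partner_strand(bottom), bottom)]

print(replicate(("ATGC", "TACG")))
```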
The Polymerase Chain Reaction
We can recreate this in the lab with a reaction called PCR (Polymerase Chain Reaction), which copies a chosen stretch of DNA over and over. PCR can best be summed up by my favourite geeky advert of all time. (A geeky advert so immense that for my first four months as a PhD student I drank from a Bio-Rad mug for no other reason.)
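The ‘chain reaction’ part is simple arithmetic. Each cycle at best doubles the number of copies of the target, so after n cycles you have N = N0(1 + E)^n, where E is the efficiency (1 for perfect doubling). qPCR watches the product build up as it goes. The cycle at which the signal crosses a threshold (the Ct) depends on how much template you started with. Below is a minimal sketch. It assumes constant efficiency and ignores the plateau that real reactions hit in late cycles.

```python
import math

def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of the target after a number of cycles (efficiency 1.0 = perfect doubling)."""
    return n0 * (1 + efficiency) ** cycles

def cycles_to_threshold(n0: float, threshold: float, efficiency: float = 1.0) -> float:
    """Cycles needed to reach a detection threshold: the idea behind a qPCR Ct value."""
    return math.log(threshold / n0) / math.log(1 + efficiency)

# One starting molecule, 30 perfect cycles: 2**30, a little over a billion copies.
print(copies_after(1, 30))
# Ten times less template crosses the same threshold later.
print(cycles_to_threshold(100, 1e9) - cycles_to_threshold(1000, 1e9))
```

With perfect doubling, ten times less starting template should cross the threshold about 3.3 cycles later (log2 10 ≈ 3.32). This is why Ct values can be used to compare how much of a transcript different samples contain.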
In case that was all a bit much for you, here’s a quick recap: