There’s a new PhD student in our lab, and I’ve been trying to sort her out with a list of useful tools and websites. I figured that there are probably plenty of other newbies in the field who are fumbling their way around as I did when I started and could benefit from access to my bookmarks bar, so here goes:
When you first start dealing with sequence, you may find it useful to know that there are tools for converting formatted sequence (e.g. FASTA) to raw format, which is what a lot of alignment software expects as input (I genuinely did it by hand a few times before I found that). There are also tools for reverse complementing a sequence and for translating nucleotides into protein sequence.
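If you’d rather see what those conversion tools are doing under the hood, here is a minimal Python sketch of the three operations. The function names are my own and purely illustrative; it assumes plain A/C/G/T sequence, the standard genetic code, and translation in the first reading frame only.

```python
# Illustrative sketch only: function names are hypothetical, not from any tool.

# Lookup used to complement each base (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def fasta_to_raw(fasta_text):
    """Drop header lines (those starting with '>') and join the rest."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))

def reverse_complement(seq):
    """Complement every base, then reverse the whole string."""
    return seq.translate(COMPLEMENT)[::-1]

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate frame 1; '*' marks a stop codon, 'X' an unrecognised codon."""
    seq = seq.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)

raw = fasta_to_raw(">example\nATGGCC\nAAATAA\n")
print(raw)                      # ATGGCCAAATAA
print(reverse_complement(raw))  # TTATTTGGCCAT
print(translate(raw))           # MAK*
```

For real work the web tools (or a library such as Biopython) handle ambiguity codes and alternative reading frames, which this sketch deliberately ignores.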
BLAST (Basic Local Alignment Search Tool) is the ‘search for a gene or a protein’ tool that everybody uses across all fields. Use it to find out what an unknown transcript sequence codes for, or how similar it is to the same gene in other species. (Use BLASTN for nucleotide against nucleotide, BLASTP for protein against protein, and BLASTX or TBLASTN for searching for one with the other: BLASTX takes a nucleotide query against a protein database, TBLASTN a protein query against a translated nucleotide database.)
CerealsDB is the wheat-only search engine maintained by the University of Bristol. You can search 454 genomic sequence (from the University of Liverpool), e.g. if you’re building a genomic contig to match your cDNA sequence; ESTs, which give you an idea of where your gene might be expressed; or SNPs (although this doesn’t yet tell you which varieties the SNPs are between).
Once you have some sequence, ClustalW2 is a handy alignment tool until you get your hands on proper alignment software like Chromas, DNAMAN or Sequencher (the latest version does Next Gen but I’m still on 4.10). This is assuming you’re using <50 sequences; we’re not talking Next Gen data here. You’ll need all sequences going in the same direction, so reverse complement any that aren’t, using a tool like the one mentioned above. It works for protein sequences too.
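If you’re not sure which of your reads are the wrong way round, one rough way to check is to count how many short subsequences (k-mers) a read shares with a reference sequence in each orientation, and keep whichever orientation shares more. The sketch below is just that heuristic, not what any of the packages above actually do; the function names and the choice of k=8 are my own assumptions.

```python
# Illustrative heuristic only: names and the default k are assumptions.

def reverse_complement(seq):
    """Complement every base, then reverse the whole string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def orient_like(reference, seq, k=8):
    """Return seq in whichever orientation shares more k-mers with reference."""
    ref_kmers = kmers(reference, k)
    forward_hits = len(ref_kmers & kmers(seq, k))
    flipped = reverse_complement(seq)
    reverse_hits = len(ref_kmers & kmers(flipped, k))
    # Ties keep the original orientation.
    return flipped if reverse_hits > forward_hits else seq

reference = "ATGGCCAAATAAGGCTTACGATCG"
read = reverse_complement(reference[2:20])   # a read sequenced off the other strand
print(orient_like(reference, read) == reference[2:20])
```

This only works when the read really does overlap the reference and is similar enough to share exact k-mers; for divergent sequences, trust the alignment software instead.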
To find out a bit more about the function of a protein (e.g. what domains it has), use InterProScan, which searches a number of other databases such as PANTHER.
For comparative genomics you should be aware of Gramene, which allows you to search genomic data for a bunch of plants together. It links to the European Nucleotide Archive, which covers similar ground to the BLAST database, and to the PlantsDB databases hosted at MIPS, in case you’re interested in things other than wheat.
You can get access to the shotgun sequencing data from the IWGSC (the International Wheat Genome Sequencing Consortium) for particular chromosomes, although you need to request access and get a password. This is useful for mapping your gene of interest.
Has anybody got others that I’ve missed?