How do I use What Rat is That?
If you have a tissue sample from a rat, you can obtain an identification of the species in two steps:
- Use standard molecular laboratory techniques to obtain nucleotide sequence from the mtDNA control region (5' end), mtDNA cytochrome b (5' end) or mtDNA cytochrome oxidase I (5' end).
- Submit the sequence to this site and select the appropriate reference sequence dataset for comparison. The advanced cluster search option lets you perform a bootstrap analysis, while the maximum likelihood option performs a more rigorous statistical analysis when placing your query sequence on the tree. Both the advanced cluster and maximum likelihood options will send you the results by email.
Submitting a Sequence
To submit a sequence for analysis:
- click on the Simple search link
- paste your sequence into the Data Entry window
- select the reference dataset and the genomic locus
- click on the Submit button
>mysample
ACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAGCTGAAGGAATC
GTAGAAATTAAACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAG
CTGAAGGAATCTGTAGAAATTAA
or
ACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAGCTGAAGGAATC
GTAGAAATTAAACCATAATAGTACAGCTGAAGGAATCTGTAGAAATTAAACCATAATAGTACAG
CTGAAGGAATCTGTAGAAATTAA
Only one sequence may be submitted at a time.
If your sequence contains illegal characters, that is those not included in the IUPAC ambiguity codes, then it will be rejected with an error message. If your sequence does contain any of the ambiguity codes, then they will be used both in aligning the sequence and in calculating evolutionary distances.
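Before submitting, the same checks can be run locally. The following is a minimal Python sketch, not the site's own code: the function name `read_single_sequence` is hypothetical, and it assumes that input is upper-cased and that gap characters such as `-` count as illegal, neither of which is stated on this page.

```python
# Hypothetical pre-submission check; not the site's implementation.
# Assumption: only the IUPAC nucleotide letters are legal (no gaps, no digits).
IUPAC_CODES = set("ACGTURYMKSWHBVDN")

def read_single_sequence(text):
    """Return (name, sequence) from FASTA or raw input holding one sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    headers = [ln for ln in lines if ln.startswith(">")]
    if len(headers) > 1:
        raise ValueError("only one sequence may be submitted at a time")
    # Raw sequence input has no header line, so the name is None.
    name = headers[0][1:].strip() if headers else None
    seq = "".join(ln for ln in lines if not ln.startswith(">")).upper()
    illegal = sorted(set(seq) - IUPAC_CODES)
    if illegal:
        raise ValueError("illegal characters: " + "".join(illegal))
    return name, seq

name, seq = read_single_sequence(">mysample\nACCATAATAG\nTACAGCTGAA\n")
print(name, len(seq))
```

A sequence containing ambiguity codes such as `N` or `R` passes this check, matching the behaviour described above.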
Your sequence will be analysed automatically. Please wait about 15 seconds and then click the Retrieve Results button to view your results. It will take longer for results to become available if full alignment and/or bootstrap resampling are requested.
Reference Datasets
Domain | Cytochrome Oxidase I (v2) = COI_v2 | Cytochrome b (v2) = CytB_v2 | Control Region (v2) = DLoop_v2 | Cytochrome Oxidase I (v3) = COI_v3 | Cytochrome b (v3) = CytB_v3 | Control Region (v3) = DLoop_v3 |
---|---|---|---|---|---|---|
Rattus | Link | Link | Link | Link | Link | Link |
Pos 1-250 | N/A | N/A | N/A | N/A | N/A | Link |
Pos 1-200 | N/A | N/A | N/A | Link | N/A | N/A |
Pos 101-300 | N/A | N/A | N/A | Link | N/A | N/A |
Pos 201-400 | N/A | N/A | N/A | Link | N/A | N/A |
Pos 301-500 | N/A | N/A | N/A | Link | N/A | N/A |
Pos 401-600 | N/A | N/A | N/A | Link | N/A | N/A |
Pos 501-end | N/A | N/A | N/A | Link | N/A | N/A |
IUPAC Nucleotide Codes
Ambiguous | Symbol | Meaning | Origin of designation |
---|---|---|---|
G | G | Guanine | |
A | A | Adenine | |
T | T | Thymine | |
C | C | Cytosine | |
U | U | Uracil | |
X | R | G or A | puRine |
X | Y | T or C | pYrimidine |
X | M | A or C | aMino |
X | K | G or T | Keto |
X | S | G or C | Strong interaction (3 H bonds) |
X | W | A or T | Weak interaction (2 H bonds) |
X | H | A or C or T | not-G, H follows G in the alphabet |
X | B | G or T or C | not-A, B follows A |
X | V | G or C or A | not-T (not-U), V follows U |
X | D | G or A or T | not-C, D follows C |
X | N | G or A or T or C | aNy |
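The table above can be written as a lookup, which makes it easy to ask whether two symbols could represent the same base. This is an illustrative Python sketch only: the names `IUPAC` and `compatible` are hypothetical, U is treated as equivalent to T as an assumption, and how the site itself weights ambiguity codes when computing distances is not described here.

```python
# IUPAC nucleotide codes from the table above: symbol -> bases it may stand for.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C", "U": "U",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}

def bases(symbol):
    """Set of bases a symbol may represent (assumption: U is read as T)."""
    return set(IUPAC[symbol.upper()].replace("U", "T"))

def compatible(x, y):
    """True if the two symbols share at least one possible base."""
    return bool(bases(x) & bases(y))

print(compatible("R", "A"))  # R is G or A, so it overlaps with A
print(compatible("R", "Y"))  # a purine code and a pyrimidine code never overlap
```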
Sequence Alignment
The sequence input by the user is aligned with the chosen reference set of sequences by a simple profile alignment (Gribskov et al. 1987, 1990, Gribskov and Veretnik 1996). Clustal X implements a more sophisticated method which allows the user to specify local gap costs and other parameter values (Thompson et al. 1997). To optimize system performance, the reference sequences have been prealigned. The parameters used in the alignment are displayed with the dataset information.
Calculation of Evolutionary Distances
The evolutionary distances among all of the aligned sequences, reference and submitted, are then calculated using the F84 model (Felsenstein 1984; Kishino and Hasegawa 1989). The parameter values used are those displayed with the dataset information.
Building the Phylogenetic Tree
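Tree building on this site uses the Neighbor-Joining algorithm (Saitou and Nei 1987). As a rough illustration of how that algorithm turns a distance matrix into a tree, here is a minimal Python sketch. It is not the site's implementation: the function name and input format are hypothetical, the distances are a small made-up additive example rather than F84 distances, and the result is left unrooted (no outgroup rooting is attempted).

```python
from itertools import combinations

def neighbor_joining(names, distances):
    """Return an unrooted Newick string from pairwise distances.

    distances maps (name_a, name_b) tuples to a distance; needs >= 2 names.
    """
    d = {frozenset(pair): float(v) for pair, v in distances.items()}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from node i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r[i] - r[j].
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to new node
        lj = dij - li                                 # branch from j to new node
        u = f"({i}:{li:g},{j}:{lj:g})"                # new node, labelled by its subtree
        for k in nodes:
            if k != i and k != j:
                d[frozenset((u, k))] = (d[frozenset((i, k))]
                                        + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    a, b = nodes
    dab = d[frozenset((a, b))]
    # The last two nodes are joined by a single branch of length dab.
    return f"({a}:{dab:g},{b}:0);"

example = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining("abcde", example))
# -> ((c:4,(a:2,b:3):3):2,(d:2,e:1):0);
```

Because the example distances are additive, the path lengths in the resulting tree reproduce the input exactly (for instance, a to c is 2 + 3 + 4 = 9).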
A phylogenetic tree is built to include the members of the reference set of sequences chosen by the user and the sequence that the user has submitted. The tree is built using the Neighbor-Joining (NJ) algorithm (Saitou and Nei 1987) and rooted using an outgroup appropriate for each dataset.
Advanced search and bootstrapping
The Advanced search window adds additional functions to the search process:
Bootstrapping
To perform a bootstrap analysis:
- click on the Advanced search link
- paste your sequence into the Data Entry window
- select the reference dataset and genomic locus
- select the number of bootstrap replicates you require
- optionally enter an email address to which the results will be sent
- click on the Submit button
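Each bootstrap replicate requested in the steps above corresponds to one pseudoreplicate alignment, built by sampling alignment columns with replacement. A minimal Python sketch of that resampling step follows; the function name and the toy alignment are hypothetical, and this is not the site's code.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns drawn with replacement.

    alignment maps sequence names to aligned sequences of equal length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every sequence, so columns stay intact.
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

alignment = {"query": "ACGTAC", "ref1": "ACGTTC", "ref2": "ACCTTC"}  # toy data
rng = random.Random(1)  # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

A tree is then built from each pseudoreplicate in the same way as for the original alignment.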
Emailed response
You can choose to have the results sent to you by email. If you enter an email address, you can close your browser once the search has been submitted.
Maximum Likelihood Analysis
The reference alignment, and the associated phylogenetic tree, are considered to be prior knowledge about the relationships among the reference organisms. Potentially, the query sequence can be joined to that tree on any branch. We seek the connection point that has the highest statistical likelihood, thereby giving the maximum likelihood estimate of the relationship between the query and reference sequences. The maximum likelihood connection point is represented in the output by a dashed branch. For a particular connection point, the determined likelihood score is the maximum likelihood estimate under the associated topology (that is, all the branch lengths are re-optimised for each connection point).
The Shimodaira-Hasegawa (SH) test is used for assessing a confidence limit on the connection point with the highest expected likelihood. The expected likelihood of a connection point is the expectation of its likelihood under the true process of evolution (treating the likelihood as a random variable). The SH test calculates such a confidence limit by simulating replicate datasets under an approximation of the least favourable configuration (LFC), in which all connection points have equivalent expected likelihoods, and comparing the observed differences in likelihood with the expected distribution of likelihoods under the LFC.
The implementation used here simulates 1000 non-parametric bootstraps and uses the RELL approximation (Shimodaira and Hasegawa 1999). Branches that represent connection points within the confidence limit are coloured red. A critical value of α = 0.05 is used (95% confidence limit).
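To make the procedure more concrete, the following Python sketch shows one common way to carry out an SH test with RELL resampling. It is an illustration under stated assumptions, not the site's implementation: the function name `sh_test` and its input format are hypothetical, the per-site log-likelihoods for each connection point are assumed to have been computed elsewhere (after branch lengths were re-optimised), and a replicate is counted when its statistic is greater than or equal to the observed difference.

```python
import random

def sh_test(site_lnl, n_boot=1000, alpha=0.05, seed=0):
    """site_lnl maps each connection point to its per-site log-likelihoods.

    Returns (p_values, confidence_set).
    """
    rng = random.Random(seed)
    points = list(site_lnl)
    n_sites = len(site_lnl[points[0]])
    total = {p: sum(site_lnl[p]) for p in points}
    best = max(total.values())
    observed = {p: best - total[p] for p in points}  # difference from the best point

    # RELL: resample sites and re-sum the existing per-site log-likelihoods,
    # instead of re-estimating parameters for every replicate.
    boot = {p: [] for p in points}
    for _ in range(n_boot):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        for p in points:
            boot[p].append(sum(site_lnl[p][s] for s in sites))

    # Centring gives every connection point the same expected value, which
    # approximates the least favourable configuration.
    centred = {}
    for p in points:
        mean = sum(boot[p]) / n_boot
        centred[p] = [x - mean for x in boot[p]]

    p_values = {}
    for p in points:
        count = 0
        for b in range(n_boot):
            top = max(centred[q][b] for q in points)
            if top - centred[p][b] >= observed[p]:
                count += 1
        p_values[p] = count / n_boot
    confidence_set = {p for p in points if p_values[p] >= alpha}
    return p_values, confidence_set

# Toy input: branch_B is worse than branch_A by the same amount at every site.
toy = {"branch_A": [-1.0, -2.0, -1.5, -3.0],
       "branch_B": [-6.0, -7.0, -6.5, -8.0]}
p_values, confidence_set = sh_test(toy)
```

In this toy case only `branch_A` remains in the confidence set, which corresponds to the branches that would be coloured red in the output.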
The Results
The results will be displayed first as a phylogenetic tree in which the differences between sequences are proportional to the lengths of the horizontal branches separating the tips. The names of the reference species are colour-coded to help you identify close relatives. To save a copy of the tree as a PNG-format file, right-click (PC) or control-click (Mac) on the image and choose Download Image to Disk, or similar, from the pop-up menu.
If you have performed a bootstrap analysis, the resulting phylogenetic tree will display numbers at some of the nodes. These numbers are the percentage of bootstrap pseudoreplicates that contain the clade formed by the subtree starting at that node. This measure of bootstrap support is displayed only when at least 50% of the pseudoreplicates contain the clade. The phylogenetic tree displayed is the estimated tree, and not the consensus of the bootstrap pseudoreplicate trees.
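The bootstrap percentages described above can be sketched in Python as follows. This is an illustration only, with hypothetical names and toy data: each tree is represented simply as a set of clades, and each clade as a frozenset of tip names.

```python
def bootstrap_support(estimated_clades, replicate_clades, threshold=50.0):
    """Percentage of replicate trees containing each clade of the estimated tree.

    Only clades with support of at least `threshold` percent are returned.
    """
    n = len(replicate_clades)
    labels = {}
    for clade in estimated_clades:
        hits = sum(1 for clades in replicate_clades if clade in clades)
        pct = 100.0 * hits / n
        if pct >= threshold:
            labels[clade] = pct
    return labels

ab, cd, abc = frozenset("ab"), frozenset("cd"), frozenset("abc")
estimated = {ab, cd, abc}                    # clades of the estimated tree
replicates = [{ab, cd}, {ab, cd}, {ab, abc}, {frozenset("ac")}]
support = bootstrap_support(estimated, replicates)
# ab: 75.0, cd: 50.0; abc (25%) falls below the threshold and gets no label
```

Support is looked up for the clades of the estimated tree only, which mirrors the point that the displayed tree is not a consensus of the pseudoreplicate trees.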
If you scroll further down past the tree, you will also find a table showing the evolutionary distances between the user-submitted sequence and each of the sequences in the reference set. Sites having IUPAC ambiguity codes are included in the calculation of evolutionary distances. To save the contents of the table to disk, select all of the table, copy it, open a text file document on your computer (e.g. Notepad or SimpleText) and then paste it in.
If you scroll down further again, there is a text version of the phylogenetic tree in Newick format. To save this to disk, select the contents of the text box in which it is displayed, copy it, open a text file document on your computer (e.g. Notepad or SimpleText) and then paste it in.
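Once the Newick text has been saved, it can be processed with a short script. As a small Python sketch, the tip names can be listed with a regular expression. The function name and the example tree are hypothetical, and the pattern assumes unquoted labels that contain no parentheses, colons, commas, semicolons or spaces.

```python
import re

def tip_names(newick):
    """List tip labels: names that directly follow '(' or ',' in the string."""
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)

tree = "((query:0.01,Rattus_rattus:0.02):0.03,Rattus_norvegicus:0.05);"
print(tip_names(tree))
# -> ['query', 'Rattus_rattus', 'Rattus_norvegicus']
```

Labels that follow a closing parenthesis, such as bootstrap values on internal nodes, are not matched by this pattern.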
You can fine-tune your analysis by clicking on the Submit a sequence link to return to the Data Entry page, where you can choose a different reference set.