Scan the database for hits with the compiled list



ence residue

BLAST method for proteins

2. Scan the database for hits with the compiled list of words. Two approaches:
   - Use an index of all possible words (for w=4, this needs an array of size 20^4 = 160,000). The array index can be compressed using pointers to save space.
   - Use a finite state machine (the approach actually used): calculate a state transition table that tells what state to go to based on the next character in the sequence.
3a. Extend hits to form HSPs (high-scoring segment pairs).
3b. BLAST2, or gapped BLAST, uses an approach similar to FASTA to combine hits before trying to extend them as in 3a.
4. Compare the score for each HSP to a threshold S to decide whether to keep it.
5. Proceed to estimating statistical significance (see below).

BLAST method for DNA

1. Make a list of all contiguous w-mers in the query sequence (often w=12).
2. Compress the database by packing 4 nucleotides into a single byte (use an auxiliary table to tell you where sequences start and stop within the compressed database).
   - Doesn't allow for unspecified bases (wildcards)
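The first approach in protein step 2 (an index of all possible words) might be sketched as follows. This is a minimal illustration, not BLAST's actual code: the function names, the residue ordering, and the (word, query offset) input format are all assumptions made for the example, and the compiled word list is taken as given rather than generated from a scoring matrix.

```python
# Sketch of a direct-address word index for w=4 over the 20 amino acids.
# All names here are hypothetical; the residue ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def word_to_index(word):
    """Read a word as a base-20 number, giving a slot in [0, 20**len(word))."""
    idx = 0
    for ch in word:
        idx = idx * 20 + CODE[ch]
    return idx


def build_index(compiled_words, w=4):
    """Build a table with one slot per possible word (20**4 = 160,000 for w=4).

    compiled_words is an iterable of (word, query_offset) pairs.
    Empty slots stay None, so most of the table holds no list at all.
    """
    table = [None] * (20 ** w)
    for word, query_offset in compiled_words:
        slot = word_to_index(word)
        if table[slot] is None:
            table[slot] = []
        table[slot].append(query_offset)
    return table


def scan(database_seq, table, w=4):
    """Slide a window of width w along the database and collect hits.

    Returns a list of (query_offset, database_offset) pairs.
    Windows containing a character outside the 20 residues are skipped.
    """
    hits = []
    for db_offset in range(len(database_seq) - w + 1):
        window = database_seq[db_offset:db_offset + w]
        if any(ch not in CODE for ch in window):
            continue
        entry = table[word_to_index(window)]
        if entry:
            for query_offset in entry:
                hits.append((query_offset, db_offset))
    return hits


# Usage with a made-up word list and database sequence.
words = [("MKVL", 0), ("KVLA", 1)]
table = build_index(words)
hits = scan("GGMKVLAGG", table)
```

Each lookup costs one array access per database position, which is the appeal of the index; the price is the 160,000-slot table, most of which is empty for a short query. That sparsity is what the notes' remark about compressing the array with pointers addresses.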

This note was uploaded on 01/13/2012 for the course BIO 101 taught by Professor Staff during the Fall '10 term at DePaul.
