BLAST Method for DNA

3. Compress the w-mers from the query sequence the same way.
4. Search the compressed database for matches with the compressed w-mers.
   - Since all frames of the query sequence are considered separately, any match of length w >= 11 must contain a match of length 8 that lies on a byte boundary of one of the w-mers from the query sequence. Thus can scan a (packed) byte at a time, improving speed 4-fold over comparing one nucleotide at a time.

Problem: if query sequence has a stretch of unusual base composition (e.g., A-T rich) or a repeated sequence element (e.g., Alu sequence) there will be many hits with "uninteresting" regions.

Solution:
   - During compression of the database, tabulate frequencies of all 8-tuples.
   - Make a list of those occurring very frequently (more frequently than expected by chance).
   - Remove these words from the query list of w-mers before searching database.
   - Remove words matching a sublibrary of repeated sequences (but...
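
The packed scan and the frequency filter described on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BLAST code. The 2-bit encoding (A=0, C=1, G=2, T=3), all function names, and the `factor` and `min_count` thresholds are hypothetical. The sketch also assumes that 8-tuples are counted at every offset, and that a w-mer is dropped when it contains any over-represented 8-tuple.

```python
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding

def pack(seq):
    """Pack 4 nucleotides per byte; a trailing partial byte is dropped."""
    out = bytearray()
    for i in range(0, len(seq) - len(seq) % 4, 4):
        b = 0
        for ch in seq[i:i + 4]:
            b = (b << 2) | CODE[ch]
        out.append(b)
    return bytes(out)

def seed_hits(query, db):
    """Find byte-aligned 8-nucleotide matches, scanning the packed db a byte at a time.

    The query is packed in all 4 frames, so every 8-mer of the query
    appears as two consecutive bytes in exactly one frame.
    Returns (query_position, db_position) pairs.
    """
    seeds = {}
    for frame in range(4):
        pq = pack(query[frame:])
        for j in range(len(pq) - 1):
            seeds.setdefault(pq[j:j + 2], []).append(frame + 4 * j)
    packed_db = pack(db)
    hits = []
    for j in range(len(packed_db) - 1):
        for qpos in seeds.get(packed_db[j:j + 2], ()):
            hits.append((qpos, 4 * j))
    return hits

def frequent_tuples(db, k=8, factor=10.0, min_count=5):
    """8-tuples seen far more often than a uniform base model predicts.

    factor and min_count are illustrative thresholds, not BLAST's values.
    """
    n_windows = max(len(db) - k + 1, 0)
    counts = Counter(db[i:i + k] for i in range(n_windows))
    expected = n_windows / 4 ** k
    cutoff = max(min_count, factor * expected)
    return {t for t, c in counts.items() if c > cutoff}

def filter_query_words(query, frequent, w=11, k=8):
    """Keep only the query w-mers that contain no over-represented k-tuple."""
    words = [query[i:i + w] for i in range(len(query) - w + 1)]
    return [
        word for word in words
        if not any(word[i:i + k] in frequent for i in range(w - k + 1))
    ]
```

For example, with an 11-nucleotide match planted in both sequences, `seed_hits("CCGATTACAGGCTCC", "TTTGATTACAGGCTTTTTTT")` reports the byte-aligned 8-mer inside that match, which a real search would then try to extend in both directions.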
