Compress the w-mers from the query sequence


ards)

BLAST Method for DNA

3. Compress the w-mers from the query sequence the same way.
4. Search the compressed database for matches with the compressed w-mers.
   - Since all frames of the query sequence are considered separately, any match of length w >= 11 must contain a match of length 8 that lies on a byte boundary of one of the w-mers from the query sequence. Thus we can scan a (packed) byte at a time, improving speed 4-fold over comparing one nucleotide at a time.

BLAST Method for DNA

Problem: if the query sequence has a stretch of unusual base composition (e.g., A-T rich) or a repeated sequence element (e.g., an Alu sequence), there will be many hits with "uninteresting" regions.

BLAST Method for DNA

Solution:
   - During compression of the database, tabulate frequencies of all 8-tuples.
   - Make a list of those occurring very frequently (more frequently than expected by chance).
   - Remove these words from the query list of w-mers before searching the database.
   - Remove words matching a sublibrary of repeated sequences (but...
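
The packed scan and the frequent-word filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BLAST code: the 2-bit encoding (A=0, C=1, G=2, T=3), the function names, and the `factor` and `min_count` thresholds are all hypothetical, sequences are assumed to contain only uppercase A, C, G, T, and 8-tuples are counted at every position rather than only on byte boundaries.

```python
from collections import Counter

# Assumed 2-bit encoding: 4 nucleotides fit in one byte.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(seq):
    """Pack nucleotides 4 per byte; a trailing partial byte is dropped."""
    out = bytearray()
    for i in range(0, len(seq) - len(seq) % 4, 4):
        b = 0
        for ch in seq[i:i + 4]:
            b = (b << 2) | CODE[ch]
        out.append(b)
    return bytes(out)

def overrepresented_8mers(db, factor=10.0, min_count=2):
    """8-tuples seen more often than `factor` times the count expected
    under uniform base composition (hypothetical threshold choice)."""
    counts = Counter(db[i:i + 8] for i in range(len(db) - 7))
    expected = (len(db) - 7) / 4 ** 8
    threshold = max(factor * expected, min_count)
    return {word for word, n in counts.items() if n > threshold}

def query_8mers(query, exclude=frozenset()):
    """Map each packed 8-mer of the query (all frames) to its offsets,
    skipping words listed in `exclude`."""
    table = {}
    for q in range(len(query) - 7):
        word = query[q:q + 8]
        if word not in exclude:
            table.setdefault(pack(word), []).append(q)
    return table

def scan(db, query, w=11, exclude=frozenset()):
    """Scan the packed database one byte (4 nucleotides) at a time.
    Each byte-aligned 8-mer hit is extended on the unpacked sequences
    and kept only if the exact match reaches length w."""
    packed_db = pack(db)
    table = query_8mers(query, exclude)
    hits = set()
    for b in range(len(packed_db) - 1):
        d = 4 * b  # nucleotide offset of this byte in the database
        for q in table.get(packed_db[b:b + 2], ()):
            left = 0
            while q - left > 0 and d - left > 0 and query[q - left - 1] == db[d - left - 1]:
                left += 1
            right = 8
            while q + right < len(query) and d + right < len(db) and query[q + right] == db[d + right]:
                right += 1
            if left + right >= w:
                hits.add((q - left, d - left, left + right))
    return sorted(hits)  # (query offset, database offset, match length)
```

For example, `scan(db, query, exclude=overrepresented_8mers(db))` would drop the over-represented words from the query list before the search. The extension step is there because a byte-aligned 8-mer hit is necessary but not sufficient for a match of length w.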

This note was uploaded on 01/13/2012 for the course BIO 101 taught by Professor Staff during the Fall '10 term at DePaul.
