BLAST Method for DNA
3. Compress the w-mers from the query sequence the same way.
4. Search the compressed database for matches with the compressed w-mers.
   - Since all four frames (byte offsets) of the query sequence are considered separately, any match of length w ≥ 11 must contain an 8-nucleotide match that starts on a byte boundary in the database and therefore lines up with the byte boundaries of one of the compressed query frames. (In the worst case the match starts 3 nucleotides before a byte boundary, and 3 + 8 = 11.) Thus the database can be scanned one packed byte at a time, a 4-fold speedup over comparing one nucleotide at a time.
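Steps 3 and 4 and the byte-boundary argument can be sketched in Python as below. This is a minimal illustration under stated assumptions, not BLAST's actual code: the 2-bit encoding (A=0, C=1, G=2, T=3), the most-significant-first packing order, and the function names `pack`, `query_seeds`, `scan`, and `contains_aligned_seed` are all hypothetical, and ambiguous bases such as N are not handled. The query is packed in all four frames so that every 8-nt query word is byte-aligned in exactly one frame, while the database is packed once and scanned a byte at a time.

```python
# Minimal sketch. Assumptions (not from the slides): 2-bit codes A=0, C=1,
# G=2, T=3, packed most-significant-first; only A/C/G/T; names hypothetical.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
NT_PER_BYTE = 4   # 2 bits per nucleotide -> 4 nucleotides per byte
SEED_BYTES = 2    # an 8-nt word occupies two packed bytes


def pack(seq: str) -> bytes:
    """Pack DNA 4 nucleotides per byte; a trailing partial byte is dropped."""
    out = bytearray()
    for i in range(0, len(seq) - NT_PER_BYTE + 1, NT_PER_BYTE):
        byte = 0
        for ch in seq[i:i + NT_PER_BYTE]:
            byte = (byte << 2) | CODE[ch]
        out.append(byte)
    return bytes(out)


def query_seeds(query: str) -> dict[bytes, list[int]]:
    """Step 3: pack the query in each of the 4 frames and index every
    byte-aligned 8-nt word by its start position in the query."""
    seeds: dict[bytes, list[int]] = {}
    for frame in range(NT_PER_BYTE):
        packed = pack(query[frame:])
        for j in range(len(packed) - SEED_BYTES + 1):
            word = packed[j:j + SEED_BYTES]
            seeds.setdefault(word, []).append(frame + j * NT_PER_BYTE)
    return seeds


def scan(db: str, query: str) -> list[tuple[int, int]]:
    """Step 4: walk the packed database one byte at a time, looking up the
    2-byte word that starts at each byte boundary.
    Returns (database position, query position) pairs for 8-nt seed hits."""
    packed_db = pack(db)
    seeds = query_seeds(query)
    hits = []
    for i in range(len(packed_db) - SEED_BYTES + 1):
        for qpos in seeds.get(packed_db[i:i + SEED_BYTES], []):
            hits.append((i * NT_PER_BYTE, qpos))
    return hits


def contains_aligned_seed(start: int, w: int) -> bool:
    """Does a match covering database positions [start, start + w) contain
    two whole packed bytes (8 nt starting on a byte boundary)?"""
    first_boundary = -(-start // NT_PER_BYTE) * NT_PER_BYTE  # round up to a multiple of 4
    return first_boundary + 2 * NT_PER_BYTE <= start + w
```

For example, with the query `"ACGGATCCAGA"` placed at database offset 1 in `"TACGGATCCAGATTTT"`, `scan` reports the single seed hit `(4, 3)`: the 11-nt match does not start on a byte boundary, but its last 8 nucleotides fill database bytes 1 and 2. Because `scan` advances one packed byte per iteration, it looks at roughly len(db)/4 positions instead of len(db).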
- Problem: if the query sequence has a stretch of unusual base composition (e.g., A-T rich) or a repeated sequence element (e.g., an Alu sequence), there will be many "uninteresting" hits.
   - During compression of the database, tabulate the frequencies of all 8-tuples.
   - Make a list of those occurring very frequently (more frequently than expected by chance).
   - Remove these words from the query list of w-mers before searching the database.
   - Remove words matching a sublibrary of repeated
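The tabulate-and-remove idea for over-represented 8-tuples can be sketched as follows. The assumptions are labelled: "more frequently than expected by chance" is modelled here as exceeding a multiple of the count expected for uniform, independent bases (total / 4^8); the thresholds `factor` and `min_count` and the function names are hypothetical; and the query word list is taken to be the overlapping 8-mers of the query. Matching against a repeat sublibrary is not shown.

```python
from collections import Counter

K = 8  # the slides tabulate 8-tuples


def kmer_counts(db_seqs: list[str], k: int = K) -> Counter:
    """Tabulate every overlapping k-tuple in the database sequences."""
    counts: Counter = Counter()
    for seq in db_seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def overrepresented(counts: Counter, k: int = K,
                    factor: float = 10.0, min_count: int = 2) -> set[str]:
    """Words seen more often than chance predicts. Chance is modelled
    (assumption) as uniform, independent bases, so each word is expected
    total / 4**k times. `factor` and `min_count` are hypothetical knobs."""
    total = sum(counts.values())
    expected = total / 4 ** k
    return {w for w, c in counts.items()
            if c >= min_count and c > factor * expected}


def filter_query_words(query: str, frequent: set[str],
                       k: int = K) -> dict[int, str]:
    """Query words (keyed by start position) left after removing the
    frequent database words."""
    words = {}
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        if word not in frequent:
            words[i] = word
    return words
```

As a small usage example, `kmer_counts(["ATATATATATAT", "GGCATCCA"])` flags `ATATATAT` (3 occurrences) and `TATATATA` (2) as frequent under the default thresholds, so filtering the query `"GGCATATATATC"` drops the word at position 3 and keeps positions 0, 1, 2, and 4. The `min_count` floor is there only so that a tiny database does not flag every word that appears once.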