423f11-bwt

423f11-bwt - Burrows-Wheeler Transform CMSC 423 Motivation...


Burrows-Wheeler Transform
CMSC 423

Motivation - Short Read Mapping

[Figure: A Cow Genome]

Sequencing technologies produce millions of "reads", each a random, short substring of the genome. If we already know the genome of one cow, we can get reads from a second cow and map them onto the known cow genome. This requires millions of string searches in a long string.

Tools: Bowtie; BWA, Bioinformatics (2009) 25(14):1754-1760.

Bowtie Performance: Langmead et al. (2008). Maq & SOAP build a hash table of locations of k-mers.

Burrows-Wheeler Transform

A text transform that is useful for compression and search.

Rotations of banana$:

    banana$
    anana$b
    nana$ba
    ana$ban
    na$bana
    a$banan
    $banana

Sorted rotations, with the last column set apart:

    $banan a
    a$bana n
    ana$ba n
    anana$ b
    banana $
    na$ban a
    nana$b a

BWT(banana$) = annb$aa

The BWT tends to put runs of the same character together, which makes compression work well. "bzip" is based on this.

Another Example

Rotations of appellee$:

    appellee$
    ppellee$a
    pellee$ap
    ellee$app
    llee$appe
    lee$appel
    ee$appell
    e$appelle
    $appellee

Sorted rotations, with the last column set apart:

    $appelle e
    appellee $
    e$appell e
    ee$appel l
    ellee$ap p
    lee$appe l
    llee$app e
    pellee$a p
    ppellee$ a

BWT(appellee$) = e$elplepa

The BWT doesn't always improve the compressibility...
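The sorted-rotation construction used in the banana$ and appellee$ examples can be sketched in Python as follows. This is a minimal illustration, not an efficient implementation: the function name `bwt` is chosen here for illustration, and the code assumes the input already ends with a unique `$` terminator that sorts before every other character (true for `$` versus lowercase letters in ASCII). Building all rotations takes quadratic space, so practical tools use suffix-array-based constructions instead.

```python
def bwt(text):
    """Return the Burrows-Wheeler Transform of text (naive sorted-rotation method).

    Assumption: text ends with a unique '$' that sorts before all other characters.
    """
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' terminator")
    # Build every cyclic rotation of the text.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # Sort the rotations lexicographically.
    rotations.sort()
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(row[-1] for row in rotations)


if __name__ == "__main__":
    for word in ("banana$", "appellee$"):
        print(word, "->", bwt(word))
```

Calling `bwt("banana$")` reproduces the last column of the sorted matrix shown for banana$.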
Recovering the string

The last column of the sorted rotation matrix of appellee$ is the BWT. Starting from that column alone:

    BWT column:                          e  $  e  l  p  l  e  p  a
    sort the BWT column -> first column: $  a  e  e  e  l  l  p  p

    prepend the BWT column to the first column, then sort these 2 columns
    -> first 2 columns:                  $a ap e$ ee el le ll pe pp

    prepend the BWT column again, then sort these 3 columns
    -> first 3 columns:                  $ap app e$a ee$ ell lee lle pel ppe

Inverse BWT

    def inverseBWT(s):
        B = [s1, s2, s3, ..., sn]
        repeat n-1 times:
            sort B
            for j = 1..n: prepend sj to B[j]
        return row of B that ends with $

Each round lengthens every row by one character, so after n-1 rounds every row is a full rotation of the text, and the rotation that ends with $ is the original string.

Another BWT Example

Rotations of dogwood$:

    dogwood$
    ogwood$d
    gwood$do
    wood$dog
    ood$dogw
    od$dogwo
    d$dogwoo
    $dogwood

Sorted rotations, with the last column set apart:

    $dogwoo d
    d$dogwo o
    dogwood $
    gwood$d o
    od$dogw o
    ogwood$ d
    ood$dog w
    wood$do g

BWT(dogwood$) = do$oodwg

Another BWT Example

Inverting do$oodwg. The BWT column is d o $ o o d w g. Each row below is the result of prepending the BWT column to the previous row and sorting:

    1 column:  $        d        d        g        o        o        o        w
    2 columns: $d       d$       do       gw       od       og       oo       wo
    3 columns: $do      d$d      dog      gwo      od$      ogw      ood      woo
    4 columns: $dog     d$do     dogw     gwoo     od$d     ogwo     ood$     wood
    5 columns: $dogw    d$dog    dogwo    gwood    od$do    ogwoo    ood$d    wood$
    6 columns: $dogwo   d$dogw   dogwoo   gwood$   od$dog   ogwood   ood$do   wood$d
    7 columns: $dogwoo  d$dogwo  dogwood  gwood$d  od$dogw  ogwood$  ood$dog  wood$do
    8 columns: $dogwood d$dogwoo dogwood$ gwood$do od$dogwo ogwood$d ood$dogw wood$dog

Prepend, Sort, Prepend, Sort, Pr...
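The prepend-and-sort procedure for recovering the string can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the slides' exact code: the name `inverse_bwt` is chosen here, the input is assumed to contain exactly one `$` terminator, and the rows start empty so that n rounds of "prepend, then sort" rebuild the whole sorted rotation matrix. It runs in roughly O(n^2 log n) time, so it is only suitable for small examples.

```python
def inverse_bwt(last_column):
    """Recover the original text from its BWT by repeated prepend-and-sort.

    Assumption: last_column contains exactly one '$', the end-of-text marker.
    """
    n = len(last_column)
    # Start with n empty rows; each round adds one character to the front of every row.
    rows = [""] * n
    for _ in range(n):
        # Prepend the BWT column to the current rows, then sort the result.
        rows = sorted(ch + row for ch, row in zip(last_column, rows))
    # rows now holds every rotation in sorted order; the original ends with '$'.
    return next(row for row in rows if row.endswith("$"))


if __name__ == "__main__":
    print(inverse_bwt("do$oodwg"))
    print(inverse_bwt("e$elplepa"))
```

After k rounds, `rows` holds the first k columns of the sorted rotation matrix, which matches the column-by-column walkthrough for do$oodwg.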

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
