Suffix Arrays
CMSC 423
Suffix Arrays
• Even though suffix trees use O(n) space, the constant hidden by the big-O notation is somewhat “big”: ≈ 20 bytes / character in good implementations.
• If you have a 10 Gb genome (10 billion bases), 20 bytes / character = 200 GB to store your suffix tree. “Linear,” but large.
• Suffix arrays are a more space-efficient way to store the suffixes. They can do most of what suffix trees can do, just a bit more slowly.
• A slight space vs. time trade-off.
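To make the space side of the trade-off concrete, here is a small back-of-the-envelope sketch. It takes the slide's 20 bytes / character figure for a suffix tree and assumes a plain suffix array that stores exactly one integer index per suffix and nothing else, using 4-byte unsigned integers when every index fits and 8-byte integers otherwise. The helper names (`index_width_bytes`, `storage_gb`) are hypothetical, chosen only for this illustration.

```python
GENOME_LENGTH = 10 * 10**9   # 10 Gb genome = 10 billion characters (from the slide)
TREE_BYTES_PER_CHAR = 20     # slide's estimate for a good suffix-tree implementation


def index_width_bytes(n: int) -> int:
    """Bytes per entry for an array of indices 1..n.

    Assumption: indices are stored as 4-byte unsigned ints when they fit,
    otherwise as 8-byte ints.
    """
    return 4 if n <= 2**32 - 1 else 8


def storage_gb(n: int) -> tuple[float, float]:
    """Return (suffix tree GB, suffix array GB) for a text of n characters."""
    tree = n * TREE_BYTES_PER_CHAR
    array = n * index_width_bytes(n)  # one integer per suffix, no extra tables
    return tree / 10**9, array / 10**9


tree_gb, array_gb = storage_gb(GENOME_LENGTH)
print(f"suffix tree ~ {tree_gb:.0f} GB, suffix array ~ {array_gb:.0f} GB")
# suffix tree ~ 200 GB, suffix array ~ 80 GB
```

A 10-billion-character text is too long for 32-bit indices, so even the plain array needs 8 bytes per entry under this assumption. It still comes out well under the suffix tree's 200 GB.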
Example Suffix Array
• Idea: lexicographically sort all the suffixes.
• Store the starting indices of the suffixes in an array.

s = attcatg$

index of suffix   suffix of s
1                 attcatg$
2                 ttcatg$
3                 tcatg$
4                 catg$
5                 atg$
6                 tg$
7                 g$
8                 $

Sort the suffixes alphabetically ($ sorts before every letter); the indices just “come along for the ride”:

sorted suffix     index
$                 8
atg$              5
attcatg$          1
catg$             4
g$                7
tcatg$            3
tg$               6
ttcatg$           2

Suffix array of s: 8 5 1 4 7 3 6 2
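The “sort the suffixes, let the indices come along for the ride” idea can be sketched directly in Python. This is a naive illustration, not an efficient construction: it sorts start positions by the suffix string they begin, which costs O(n² log n) time in the worst case and, because `sorted` materializes every key, O(n²) memory. The function name `suffix_array` is hypothetical. It returns 1-based indices to match the slide, and it relies on `'$'` (ASCII 36) comparing below the lowercase letters so that `$` sorts first.

```python
def suffix_array(s: str) -> list[int]:
    """Naive suffix array: sort start positions by the suffix they begin.

    Returns 1-based starting indices to match the slide's numbering.
    """
    # Sort 0-based start positions i by the suffix s[i:]; Python compares
    # strings lexicographically by code point, so '$' sorts before 'a'..'z'.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # Convert to 1-based indices as used on the slide.
    return [i + 1 for i in order]


print(suffix_array("attcatg$"))  # [8, 5, 1, 4, 7, 3, 6, 2]
```

The output reproduces the index column of the sorted table. Note that the array holds only the integers; the suffixes themselves are never stored, since any suffix can be read back from s starting at its index.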
This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.