Suffix Arrays
CMSC 423
• Even though suffix trees use O(n) space, the constant hidden by the big-O notation is large: about 20 bytes per character in good implementations.
• With a genome of 10 billion bases (10 Gb), 20 bytes/character × 10^10 characters = 200 GB to store your suffix tree. “Linear,” but large.
• Suffix arrays are a more space-efficient way to store the suffixes; they can do most of what suffix trees can do, just somewhat more slowly.
• A slight space vs. time tradeoff.
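The space arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a measurement: the 20 bytes/character figure is the slide's estimate for suffix trees, while the suffix array size rests on our own assumption of one fixed-width integer per suffix (4 bytes when 32-bit indices suffice, 8 bytes otherwise), not counting the text itself. The function names are hypothetical.

```python
def suffix_tree_bytes(n: int, bytes_per_char: int = 20) -> int:
    """Rough suffix-tree size using the ~20 bytes/character estimate from the slide."""
    return n * bytes_per_char


def suffix_array_bytes(n: int) -> int:
    """Size of the index array alone: one integer per suffix.

    Assumption: 0-based positions 0..n-1 fit in a 4-byte (32-bit) integer
    when n <= 2**32; otherwise use 8-byte (64-bit) integers.
    The text itself is not counted.
    """
    bytes_per_index = 4 if n <= 2**32 else 8
    return n * bytes_per_index


GB = 10**9  # decimal gigabytes

n = 10 * 10**9  # a genome of 10 billion characters
print(suffix_tree_bytes(n) / GB)   # 200.0
print(suffix_array_bytes(n) / GB)  # 80.0
```

Under these assumptions, 10^10 exceeds 2^32, so the suffix array needs 8-byte indices and takes about 80 GB, compared with about 200 GB for the suffix tree.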
Example Suffix Array
• Idea: lexicographically sort all the suffixes.
• Store the starting indices of the suffixes in an array.
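A minimal sketch of this idea in Python, assuming 1-based indices to match the example and a `$` sentinel that sorts before every letter (true under Python's code-point string comparison, since `$` is U+0024 and the lowercase letters come later). The name `build_suffix_array` is hypothetical. It literally sorts the suffixes, so it is slow: each comparison can scan O(n) characters, and building the sort keys copies every suffix (O(n^2) memory). It illustrates the definition, not a practical construction algorithm.

```python
def build_suffix_array(s: str) -> list[int]:
    """Naive suffix array: sort suffix start positions by the suffix they begin.

    Returns 1-based starting indices, matching the slide's numbering.
    Meant for illustration on small strings, not on whole genomes.
    """
    n = len(s)
    # Sort 0-based start positions, using the suffix text s[i:] as the sort key.
    # The sentinel '$' compares smaller than any lowercase letter.
    order = sorted(range(n), key=lambda i: s[i:])
    # Shift to 1-based indices so the result lines up with the slide.
    return [i + 1 for i in order]


print(build_suffix_array("attcatg$"))  # [8, 5, 1, 4, 7, 3, 6, 2]
```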
s = attcatg$

index  suffix of s
1      attcatg$
2      ttcatg$
3      tcatg$
4      catg$
5      atg$
6      tg$
7      g$
8      $

Sort the suffixes alphabetically; the indices just “come along for the ride”:

index  suffix of s
8      $
5      atg$
1      attcatg$
4      catg$
7      g$
3      tcatg$
6      tg$
2      ttcatg$
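Because the array stores only starting indices, the sorted suffixes can be recovered from s and the array whenever they are needed. Below is a small Python check of the example, using the array from the slide. `suffixes_from_array` and `is_suffix_array` are hypothetical helper names, and indices are 1-based as on the slide.

```python
s = "attcatg$"
sa = [8, 5, 1, 4, 7, 3, 6, 2]  # suffix array from the slide (1-based)


def suffixes_from_array(s: str, sa: list[int]) -> list[str]:
    """Recover the suffixes, in array order, from the text and its 1-based indices."""
    return [s[i - 1:] for i in sa]


def is_suffix_array(s: str, sa: list[int]) -> bool:
    """Check that sa lists every start position exactly once, in sorted suffix order."""
    if sorted(sa) != list(range(1, len(s) + 1)):
        return False
    suffixes = suffixes_from_array(s, sa)
    # Suffixes of one string are all distinct, so each must be strictly smaller than the next.
    return all(a < b for a, b in zip(suffixes, suffixes[1:]))


for i, suf in zip(sa, suffixes_from_array(s, sa)):
    print(i, suf)
print(is_suffix_array(s, sa))  # True
```

Only `s` and the integer array need to be kept; each suffix is just a slice starting at its stored index.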
