
# suffix - 1 Suffix Trees and Suffix Arrays 1.1 Basic Definitions and...


1 Suffix Trees and Suffix Arrays

Srinivas Aluru, Iowa State University

- 1.1 Basic Definitions and Properties
- 1.2 Linear Time Construction Algorithms
  - Suffix Trees vs. Suffix Arrays
  - Linear Time Construction of Suffix Trees
  - Linear Time Construction of Suffix Arrays
  - Space Issues
- 1.3 Applications
  - Pattern Matching
  - Longest Common Substrings
  - Text Compression
  - String Containment
  - Suffix-Prefix Overlaps
- 1.4 Lowest Common Ancestors
- 1.5 Advanced Applications
  - Suffix Links from Lowest Common Ancestors
  - Approximate Pattern Matching
  - Maximal Palindromes

## 1.1 Basic Definitions and Properties

Suffix trees and suffix arrays are versatile data structures fundamental to string processing applications. Let $s'$ denote a string over the alphabet $\Sigma$. Let $\$ \notin \Sigma$ be a unique termination character, and let $s = s'\$$ be the string resulting from appending \$ to $s'$. We use the following notation: $|s|$ denotes the size of $s$, $s[i]$ denotes the $i$th character of $s$, and $s[i..j]$ denotes the substring $s[i]\,s[i+1] \ldots s[j]$. Let $\mathit{suff}_i = s[i]\,s[i+1] \ldots s[|s|]$ be the suffix of $s$ starting at the $i$th position.

The suffix tree of $s$, denoted $ST(s)$ or simply $ST$, is a compacted trie of all suffixes of the string $s$. Let $|s| = n$. The tree has the following properties:

1. The tree has $n$ leaves, labelled $1 \ldots n$, one corresponding to each suffix of $s$.
2. Each internal node has at least 2 children.
3. Each edge in the tree is labelled with a substring of $s$.
4. The concatenation of edge labels from the root to the leaf labelled $i$ is $\mathit{suff}_i$.
5. The labels of the edges connecting a node with its children start with different characters.

The paths from the root to the leaves labelled $i$ and $j$ coincide up to the longest common prefix of $\mathit{suff}_i$ and $\mathit{suff}_j$, at which point they bifurcate.
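The five properties can be checked on a small example. The following is a minimal sketch, not the linear-time construction discussed later in the chapter: it inserts the suffixes one at a time, which takes quadratic time in the worst case. The names `Node`, `build_suffix_tree`, and `walk` are hypothetical, and storing each edge label as a pair of indices into the string is one possible choice of representation.

```python
class Node:
    """A suffix tree node; the edge entering it is labelled s[start:end]."""

    def __init__(self, start=0, end=0, leaf=None):
        self.start = start    # 0-based, half-open slice of s for the edge label
        self.end = end
        self.leaf = leaf      # 1-based suffix number for a leaf, None otherwise
        self.children = {}    # first character of child edge label -> Node


def build_suffix_tree(s0, terminator="$"):
    """Naive construction (assumption: terminator does not occur in s0)."""
    assert terminator not in s0
    s = s0 + terminator
    n = len(s)
    root = Node()
    for i in range(n):                 # insert suffix i+1 (1-based numbering)
        node, j = root, i
        while True:
            child = node.children.get(s[j])
            if child is None:          # no edge starts with s[j]: add a leaf
                node.children[s[j]] = Node(j, n, leaf=i + 1)
                break
            k = child.start            # walk down the edge while it matches
            while k < child.end and s[k] == s[j]:
                k += 1
                j += 1
            if k == child.end:         # whole edge matched: continue below it
                node = child
                continue
            # Mismatch inside the edge: split it with a new internal node.
            mid = Node(child.start, k)
            node.children[s[mid.start]] = mid
            child.start = k
            mid.children[s[k]] = child
            mid.children[s[j]] = Node(j, n, leaf=i + 1)
            break
    return s, root


def walk(node, s, prefix=""):
    """Yield (path label from the root, node) for the subtree, in sorted order."""
    label = prefix + s[node.start:node.end]
    yield label, node
    for ch in sorted(node.children):
        yield from walk(node.children[ch], s, label)
```

Calling `build_suffix_tree("mississippi")` and iterating over `walk(root, s)` visits every node together with the concatenated edge labels on its path, so properties 1, 2, and 4 can be verified directly by counting leaves, counting children, and comparing each leaf's path label with the corresponding suffix.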
If a suffix of the string is a prefix of another, longer suffix, the shorter suffix must end at an internal node rather than at a leaf as desired. It is to avoid this possibility that the unique termination character is added to the end of the string. Keeping this in mind, we use the notation $ST(s')$ to denote the suffix tree of the string obtained by appending \$ to $s'$.

0-8493-8597-0/01/\$0.00+\$1.50 © 2001 by CRC Press, LLC
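A short check illustrates why the termination character is needed. This is a minimal sketch; the function name `suffixes_that_are_prefixes` is hypothetical, and positions are 1-based to match the notation above.

```python
def suffixes_that_are_prefixes(s):
    """Return the 1-based start positions i for which the suffix starting at i
    is a prefix of some other (necessarily longer) suffix of s."""
    n = len(s)
    sufs = [s[i:] for i in range(n)]
    return [
        i + 1
        for i in range(n)
        if any(j != i and sufs[j].startswith(sufs[i]) for j in range(n))
    ]


without_terminator = suffixes_that_are_prefixes("mississippi")
with_terminator = suffixes_that_are_prefixes("mississippi$")
```

For `mississippi`, the one-character suffix `i` at position 11 is a prefix of longer suffixes such as `ippi`, so it could not end at a leaf. Once `$` is appended, no suffix is a prefix of another, and every suffix gets its own leaf.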

FIGURE 1.1: Suffix tree, suffix array and Lcp array of the string mississippi. The suffix links in the tree are given by x → z → y → u → r, v → r, and w → r.

SA: 12 11 8 5 2 1 10 9 7 4 6 3

Lcp: 0 1 1 4 0 0 1 0 2 1 3

As each internal node has at least 2 children, an $n$-leaf suffix tree has at most $n - 1$ internal nodes. Because of property (5), the maximum number of children per node is bounded by $|\Sigma| + 1$. Except for the edge labels, the size of the tree is $O(n)$. In order to allow a linear space representation of the tree, each edge label is represented by a pair of integers denoting the starting and ending positions, respectively, of the substring describing the edge label. If the edge label corresponds to a repeated substring, the indices corresponding to any occurrence of the substring may be used.

The suffix tree of the string
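The suffix array and Lcp array named in Figure 1.1 can be reproduced with a direct, non-linear-time computation. This is a minimal sketch under two assumptions that this excerpt does not spell out: `$` sorts before every letter (as it does in ASCII), and `Lcp[i]` is the length of the longest common prefix of the suffixes at consecutive positions `i` and `i + 1` of the suffix array. The function names are hypothetical.

```python
def suffix_array(s):
    """1-based start positions of the suffixes of s in lexicographic order.
    Sorting explicit suffix strings is simple but far from linear time."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


def lcp_array(s, sa):
    """Longest-common-prefix lengths between lexicographically adjacent suffixes."""

    def lcp(a, b):
        k = 0
        while k < len(a) and k < len(b) and a[k] == b[k]:
            k += 1
        return k

    return [lcp(s[sa[i] - 1:], s[sa[i + 1] - 1:]) for i in range(len(sa) - 1)]


s = "mississippi$"
sa = suffix_array(s)
lcp = lcp_array(s, sa)
```

The suffix array lists the leaves of the suffix tree in lexicographic order of their suffixes, and each Lcp entry equals the string depth of the node at which the root-to-leaf paths of two adjacent suffixes bifurcate.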