Lecture12_SuffixTrees

# Lecture12_SuffixTrees - String algorithms 6.046...


String algorithms II. Prof. Manolis Kellis. 6.046 – Introduction to Algorithms – Spring ’05. Lecture 12.

## String algorithms

- Last time: Exact string matching
  - Naïve algorithm
  - Fundamental pre-processing
    - Knuth-Morris-Pratt / Boyer-Moore / Z-algorithm
  - Semi-numerical string matching
    - Rabin-Karp algorithm
- Today: String matching II
  - Suffix trees
  - Linear-time construction
  - Applications
- Recitation:
  - More on suffix trees
  - Finite state machines
  - Regular expression matching

## Where have we gotten so far?

- Last time
  - Fundamental preprocessing in linear time
  - Searching for pattern P in time linear in the text: O(|T|)
- Today’s challenge: Can we do better?
  - Searching for any pattern P in time linear in the pattern: O(|P|)
  - After pre-processing the text once

[Figure: a text T of length n and a pattern P of length m; example pattern P = ‘Knuth’.]

## More involved pre-processing step

- Fundamental pre-processing only searched for:
  - Common prefix / suffix at any position
  - Redundancy with beginning/end of string
- Suffix trees
  - Redundancy across all substrings
    - starting at every position
    - over the remainder of the list
- Example:
  - Suffix tree of xabxac

[Figure: suffix tree of xabxac.]

## Suffix tree definition

Definition: Suffix tree T for string S (of length n)

- Rooted, directed tree T with n leaves, numbered 1..n
- Path to leaf i spells out the suffix S[i..], by concatenating edge labels
- Common prefixes share common paths and diverge to form internal nodes

⇒ Effectively exhibits common prefixes of every suffix

⇒ Explores full substring redundancy structure of S

[Figure: suffix tree of xabxac, with leaves numbered by the start position of each suffix: 1 xabxac, 2 abxac, 3 bxac, 4 xac, 5 ac, 6 c.]

## Exact string matching with suffix trees

- Given the suffix tree for text T
- Search pattern P in O(|P|) time
  - For every character in P, traverse the appropriate path of the tree, reading one character each time
  - If P is not found in a path, P does not occur in T
  - If P is found in its entirety, then the occurrences of P in T are exactly the leaves in the subtree below the point where the match ends
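The search procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the linear-time construction the lecture refers to: it builds an uncompressed suffix trie (one character per edge) in quadratic time, and it appends a terminator character `$` that is assumed not to occur in the text so that every suffix ends at its own leaf. The names `Node`, `build_suffix_trie`, and `find_occurrences` are hypothetical and do not come from the lecture.

```python
class Node:
    """One node of an uncompressed suffix trie (one character per edge)."""

    def __init__(self):
        self.children = {}  # character -> child Node
        self.leaf = None    # 1-based start of the suffix; set only on leaves


def build_suffix_trie(s, terminator="$"):
    """Insert every suffix of s + terminator into a trie.

    Assumption: this is the naive O(n^2) build, used only for illustration.
    """
    if terminator in s:
        raise ValueError("terminator must not occur in the string")
    text = s + terminator
    root = Node()
    for i in range(len(s)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, Node())
        node.leaf = i + 1  # leaf i + 1 spells out the suffix S[i+1..]
    return root


def find_occurrences(root, pattern):
    """Return the sorted 1-based start positions of pattern in the text."""
    node = root
    for ch in pattern:  # read one pattern character per step
        node = node.children.get(ch)
        if node is None:  # fell off the tree: pattern does not occur
            return []
    # Pattern matched in full: every leaf below this node is an occurrence.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.leaf is not None:
            found.append(current.leaf)
        stack.extend(current.children.values())
    return sorted(found)


root = build_suffix_trie("xabxac")
print(find_occurrences(root, "xa"))  # [1, 4]
print(find_occurrences(root, "xb"))  # []
```

The walk down the tree reads each pattern character once, which is the O(|P|) part of the search. Collecting the leaves costs extra time proportional to the size of the subtree, which can be large in this uncompressed trie; a compressed suffix tree keeps that step proportional to the number of occurrences.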


## This note was uploaded on 01/20/2012 for the course CS 6.006 taught by Professor Erik Demaine during the Spring '08 term at MIT.
