Lecture12_SuffixTrees

Lecture12_SuffixTrees - String algorithms 6.046...

String algorithms II
Prof. Manolis Kellis
6.046 – Introduction to Algorithms – Spring ’05
Lecture 12

String algorithms
• Last time: exact string matching
  – Naïve algorithm
  – Fundamental preprocessing
    • Knuth-Morris-Pratt / Boyer-Moore / Z-algorithm
  – Semi-numerical string matching
    • Rabin-Karp algorithm
• Today: string matching II
  – Suffix trees
  – Linear-time construction
  – Applications
• Recitation:
  – More on suffix trees
  – Finite state machines
  – Regular expression matching

Where have we gotten so far?
• Last time
  – Fundamental preprocessing in linear time
  – Searching for a pattern P in time linear in the text: O(|T|)
• Today’s challenge: can we do better?
  – Searching for any pattern P in time linear in the pattern: O(|P|)
  – After preprocessing the text only once
[Figure: a text T of length n and a pattern P of length m]

More involved preprocessing step
• Fundamental preprocessing only searched for:
  – Common prefix / suffix at any position
  – Redundancy with the beginning/end of the string
• Suffix trees
  – Redundancy across all substrings
    • starting at every position
    • over the remainder of the string
• Example:
  – Suffix tree of xabxac
[Figure: suffix tree of xabxac]

Suffix tree definition
Definition: a suffix tree for a string S (of length n) is
  – a rooted, directed tree with exactly n leaves, numbered 1..n,
  – in which the path from the root to leaf i spells out the suffix S[i..n] by concatenating edge labels,
  – and in which common prefixes of suffixes share a common path, diverging only at internal nodes (no two edges out of a node begin with the same character).
⇒ Effectively exhibits the common prefixes of every suffix
⇒ Exposes the full substring-redundancy structure of S
(Having n leaves requires that no suffix be a prefix of another; a unique end marker such as $ is usually appended to guarantee this. In xabxac the final c is unique, so no marker is needed.)
[Figure: suffix tree of xabxac, leaves labeled by suffix start position: 1 xabxac, 2 abxac, 3 bxac, 4 xac, 5 ac, 6 c]

Exact string matching with suffix trees
• Given the suffix tree for text T
• Search for pattern P in O(|P|) time (for a fixed alphabet)
  – Starting at the root, follow the path that matches P, reading one character of P at a time
  – If the path cannot be continued, P does not occur in T
  – If all of P is matched, the occurrences of P in T are exactly the leaves in the subtree below the point where the match ends: leaf i marks an occurrence starting at position i (reporting k occurrences adds O(k) time)
  – Example: in the tree for xabxac, P = xa ends at the internal node whose subtree holds leaves 1 and 4, and xa occurs exactly at positions 1 and 4
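The definition and the search procedure above can be made concrete with a minimal Python sketch. It rests on several assumptions that are not from the slides: it builds the tree naively in O(n^2) time by inserting suffixes one at a time (not the linear-time construction listed in the outline), it appends a unique terminator `$` so that every suffix ends at its own leaf, and the names `Node`, `build_suffix_tree` and `find_occurrences` are hypothetical. Children are kept in a dict keyed by the first character of each edge label, which gives the constant-time branching the O(|P|) bound relies on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Outgoing edges keyed by the first character of their label:
    # first_char -> (edge_label, child). No two edges share a first char.
    children: dict[str, tuple[str, Node]] = field(default_factory=dict)
    # For a leaf: 1-based start position of the suffix spelled by the
    # root-to-leaf path. None for internal nodes.
    leaf: int | None = None


def build_suffix_tree(s: str, terminator: str = "$") -> Node:
    """Naive O(n^2) construction: insert the suffixes of s + terminator one by one."""
    if len(terminator) != 1 or terminator in s:
        raise ValueError("terminator must be one character that does not occur in s")
    text = s + terminator  # unique end marker: no suffix is a prefix of another
    root = Node()
    for i in range(len(text)):
        node, j = root, i  # j = next unmatched position of the suffix text[i:]
        while True:
            edge = node.children.get(text[j])
            if edge is None:
                # No edge starts with text[j]: attach the rest as a new leaf.
                node.children[text[j]] = (text[j:], Node(leaf=i + 1))
                break
            label, child = edge
            k = 0  # length of the match between label and text[j:]
            while k < len(label) and label[k] == text[j + k]:
                k += 1
            if k == len(label):
                node, j = child, j + k  # whole edge matched: descend
                continue
            # Mismatch inside the edge: split it at offset k with a new internal node.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            mid.children[text[j + k]] = (text[j + k:], Node(leaf=i + 1))
            node.children[text[j]] = (label[:k], mid)
            break
    return root


def find_occurrences(root: Node, pattern: str) -> list[int]:
    """Sorted 1-based start positions of a non-empty pattern.

    The walk reads each character of the pattern at most once.
    """
    node, i = root, 0
    while i < len(pattern):
        edge = node.children.get(pattern[i])
        if edge is None:
            return []  # path cannot be continued: pattern does not occur
        label, child = edge
        for ch in label:
            if i == len(pattern):
                break  # pattern ends partway along this edge
            if ch != pattern[i]:
                return []
            i += 1
        node = child
    # Every leaf in the subtree below the end of the match is one occurrence.
    hits, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.leaf is not None:
            hits.append(cur.leaf)
        stack.extend(child for _, child in cur.children.values())
    return sorted(hits)


if __name__ == "__main__":
    tree = build_suffix_tree("xabxac")
    print(find_occurrences(tree, "xa"))  # [1, 4]
```

Storing edge labels as copied strings keeps the sketch short but can use O(n^2) space; practical implementations store each label as a (start, end) pair of indices into the text, which keeps the tree at O(n) space. Because of the appended `$`, the tree has n + 1 leaves; leaf n + 1 (the suffix `$` alone) is never reached by a pattern drawn from S.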

This note was uploaded on 01/20/2012 for the course CS 6.006 taught by Professor Erik Demaine during the Spring '08 term at MIT.
