StringAlg - STRING ALGORITHMS (Cormen, Leiserson, Rivest,...

STRING ALGORITHMS
(Cormen, Leiserson, Rivest, and Stein, 2001, ISBN: 0-07-013151-1 (McGraw Hill), Chapter 32, p. 906)

String processing problem

Input: Two strings T and P.
Problem: Find whether P is a substring of T.

Example (1):
Input: T = gtgatcagatcact, P = tca
Output: Yes. gtga tca ga tca ct, shift = 4, 9

Example (2):
Input: T = 189342670893, P = 1673
Output: No.

Naïve Algorithm (T, P)
    suppose n = length(T), m = length(P);
    for shift s = 0 through n-m do
        if (P[1..m] == T[s+1..s+m]) then   // actually a for-loop runs here
            print shift s;
End algorithm.

Complexity: O((n-m+1)m), or O(max{nm, m^2}).

A special note: we allow O(k+1)-type notation so that a boundary case (k = 0) yields O(1), i.e., constant time, rather than an O(0) term.

Note: Too many character comparisons are repeated.

Rabin-Karp scheme

Consider each character as a digit in a radix system, e.g., the English alphabet as radix-26. Take each m-length "number" starting at shift = 0 through (n-m). So, for T = gtgatcagatcact in radix-4 (a/0, t/1, g/2, c/3), with m = 3:
    gtg = '212' in base-4 = 32+4+2 = 38 in decimal,
    tga = '120' in base-4 = 16+8+0 = 24 in decimal,
    ...
Then compare each of these with P, number-wise.
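The two ideas above can be sketched in Python: the naive shift-by-shift comparison, and reading an m-length window as a radix-d number. This is only an illustration; the names naive_match, window_value, and DIGITS are hypothetical, and shifts are 0-based to match the examples.

```python
def naive_match(T, P):
    """Return every 0-based shift s at which P occurs in T."""
    n, m = len(T), len(P)
    # The slice comparison hides the inner for-loop over the m characters.
    return [s for s in range(n - m + 1) if T[s:s + m] == P]


# Digit assignment taken from the radix-4 example (a/0, t/1, g/2, c/3).
DIGITS = {"a": 0, "t": 1, "g": 2, "c": 3}


def window_value(window, digits=DIGITS):
    """Interpret a string as a number in radix len(digits)."""
    d = len(digits)
    value = 0
    for ch in window:
        value = value * d + digits[ch]  # shift left one digit, add the new one
    return value


print(naive_match("gtgatcagatcact", "tca"))      # [4, 9]
print(naive_match("189342670893", "1673"))       # []
print(window_value("gtg"), window_value("tga"))  # 38 24
```

Comparing window_value of each window against window_value(P) gives the same answers as naive_match, but recomputing every window from scratch still costs O(m) per shift.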
Advantage: The number for each window can be computed from the previous one, reusing old results.

Consider decimals: 4359 and 3592
    3592 = (4359 - 4*1000)*10 + 2

General formula, in radix-d:
    t_{s+1} = d (t_s - d^{m-1} T[s+1]) + T[s+m+1],
where t_s is the number corresponding to the substring T[s+1..s+m]. Note that m is the size of P.

The first-pass scheme:
(1) preprocess to obtain the (n-m+1) numbers on T and 1 for P,
(2) compare the number for P with those computed on T.

Problem: each number may be too large to compare directly.
Solution: Hash, i.e., use modular arithmetic with respect to a prime q.

New recurrence formula:
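The modular recurrence itself is not reproduced above. A minimal sketch, assuming it takes the standard textbook form t_{s+1} = (d (t_s - T[s+1] h) + T[s+m+1]) mod q with h = d^{m-1} mod q, is given below. The function names, the radix d = 256 (character codes as digits), and the prime q = 101 are illustrative assumptions, not values from the notes. Indices are 0-based, so T[s] is the leaving character and T[s+m] the entering one.

```python
def roll(t, old_digit, new_digit, d, m):
    """Exact (non-modular) update: drop the leading digit, append a new one."""
    return d * (t - d ** (m - 1) * old_digit) + new_digit


def rabin_karp(T, P, d=256, q=101):
    """Return all 0-based shifts where P occurs in T (assumed d and q)."""
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []  # assumption: an empty pattern is treated as "no match"
    h = pow(d, m - 1, q)  # weight of the leading digit, reduced mod q
    p = t = 0
    for i in range(m):  # hash of P and of the first window of T
        p = (d * p + ord(P[i])) % q
        t = (d * t + ord(T[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes may be a collision, so confirm character by character.
        if p == t and T[s:s + m] == P:
            shifts.append(s)
        if s < n - m:
            t = (d * (t - ord(T[s]) * h) + ord(T[s + m])) % q
    return shifts


print(roll(4359, 4, 2, 10, 4))               # 3592
print(rabin_karp("gtgatcagatcact", "tca"))   # [4, 9]
print(rabin_karp("189342670893", "1673"))    # []
```

Because distinct windows can share a hash value modulo q, the explicit comparison after a hash hit is what keeps the result correct; a larger prime q only reduces how often that check is triggered by a collision.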

This note was uploaded on 02/10/2012 for the course CSE 5211 taught by Professor Dmitra during the Spring '12 term at FIT.
