15_PatternMatching - CSC 30155 Wednesday 02/11/10 Dr....

CSC 30155
Wednesday 02/11/10
Dr. Daniel Hughes
daniel.hughes@xjtlu.edu.cn

The Plan for Today
- The Pattern Matching Problem (20 mins)
- Brute Force Pattern Matching (30 mins)
- FSA-based Pattern Matching (45 mins)

Note on Student Feedback
- Few comments this week; you all seemed to enjoy the Longest Common Subsequence and Lossless Compression material.
- There was a request for more examples. I will try to provide more this week.

Supporting Reading
- Optional reading: Cormen et al., Introduction to Algorithms, MIT Press, 2001, Chapter 32: String Matching (32.1)

The Pattern Matching Problem

String Matching
- Finding all occurrences of a pattern in a text is a problem that arises frequently in various contexts:
  - Text Editing: searching for a particular word in a document that is being edited.
  - Language Analysis: counting the frequency and proximity of words to attribute authorship.
  - DNA Mapping: finding a pattern of nucleotides { a, t, g, c } (a gene) in an animal's genetic code (a genome).
  - WWW searching: finding text occurrences in the union of all web pages on the Internet.
- Why we need efficient algorithms:
  - The British National Corpus (BNC) alone is 100 million words.
  - The Human Genome has about 3 billion base pairs.
  - The WWW is over 100 billion pages.

Problem Definition
- Given an alphabet $A$, a text $T$ (an array of $n$ characters from $A$) and a pattern $P$ (another array of $m \le n$ characters from $A$), we say that $P$ occurs with shift $s$ in $T$ (or $P$ occurs beginning at position $s + 1$ in $T$) if:
  $0 \le s \le n - m$ and $T[s + j] = P[j]$ for $1 \le j \le m$
- A shift $s$ is valid if $P$ occurs with shift $s$ in $T$, and invalid otherwise. The string-matching problem is thus the problem of finding all valid shifts for a given choice of $P$ and $T$.

A Short History of String Algorithms
- 19XX: Brute Force Algorithms
- 1970: Cook Theorizes Linear-Time Algorithm
- 1976: Knuth-Morris-Pratt Derive Linear Algorithm
- 1977: Boyer and Moore Derive an Alternative
- 1970: Cook proved a theoretical result showing that any language recognized by a two-way deterministic pushdown automaton can be recognized in linear time on a random-access machine. This result implies that the string-matching problem can be solved in O(n + m) time. No algorithm was explicitly provided.

A Short History of String Algorithms...
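The problem definition above (finding every valid shift of P in T) can be checked directly from its two conditions. The following is a minimal Python sketch, not the lecture's own code: the function name `valid_shifts` is hypothetical, and it uses 0-based indexing, so a returned shift s means the pattern begins at position s + 1 in the 1-based notation of the definition.

```python
def valid_shifts(text, pattern):
    """Return all shifts s with 0 <= s <= n - m at which pattern occurs in text.

    Assumption: 0-based indexing, so the condition T[s + j] = P[j] for
    1 <= j <= m becomes text[s + j] == pattern[j] for 0 <= j < m.
    """
    n, m = len(text), len(pattern)
    shifts = []
    # Try every candidate shift allowed by 0 <= s <= n - m.
    for s in range(n - m + 1):
        # The shift is valid only if all m characters agree.
        if all(text[s + j] == pattern[j] for j in range(m)):
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    print(valid_shifts("abcabaabcabac", "abaa"))  # [3]
    print(valid_shifts("gattaca", "a"))           # [1, 4, 6]
```

This checks each shift independently, so it only illustrates the definition; it makes no attempt at the O(n + m) bound mentioned in the history above.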

This note was uploaded on 05/22/2011 for the course CSC 30155 taught by Professor Garyli during the Spring '11 term at University of Liverpool.
