JustinWiseman-Approximate String Matching

Approximate String Matching
A Guided Tour to Approximate String Matching
Gonzalo Navarro
Justin Wiseman
8/1/11

Outline
- Definition of approximate string matching (ASM)
- Applications of ASM
- Algorithms
- Conclusion

Approximate string matching
- Approximate string matching is the process of matching strings while allowing for errors.

The edit distance
- Strings are compared based on how close they are
- This closeness is called the edit distance
- The edit distance is the minimum total cost of the operations required to transform one string into the other

Levenshtein / edit distance
- Named after Vladimir Levenshtein, who introduced the distance in 1965
- Accounts for three basic operations: insertions, deletions, and replacements
- In the simplified version, all operations have a cost of 1
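
The simplified (unit-cost) version described above can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration, not code from the slides; the function name `levenshtein` and the example strings are assumptions chosen for the sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (insert, delete, replace)."""
    # prev[j] is the distance between the characters of a processed so far
    # (initially none) and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # replace ca by cb, or free match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("survey", "surgery"))  # prints 2
```

Here "survey" becomes "surgery" by replacing "v" with "g" and inserting "r", so the distance is 2. The sketch keeps only two rows of the table, so memory grows with the length of `b` rather than with the product of both lengths.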

Other distance algorithms
- Hamming distance: allows only substitutions, with a cost of one each
- Episode distance: allows only insertions, with a cost of one each
- Longest Common Subsequence distance: allows only insertions and deletions, costing one each
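
The three restricted distances listed above can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the Hamming distance is treated as undefined for strings of different lengths, and the episode distance is reported as infinite when the first string cannot be turned into the second by insertions alone.

```python
import math


def hamming(a: str, b: str) -> int:
    """Number of positions at which a and b differ (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def episode_distance(a: str, b: str) -> float:
    """Insertions needed to turn a into b, or infinity if impossible."""
    it = iter(b)
    # a can be extended to b by insertions only if a is a subsequence of b.
    if all(ch in it for ch in a):
        return len(b) - len(a)
    return math.inf


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb
                        else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b."""
    # Every character outside a longest common subsequence is
    # deleted from a or inserted from b.
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(hamming("GATTACA", "GACTATA"))          # prints 2
print(episode_distance("suey", "surgery"))    # prints 3
print(episode_distance("survey", "surgery"))  # prints inf
print(lcs_distance("survey", "surgery"))      # prints 3
```

Note that the episode distance is not symmetric: "suey" can be extended to "surgery", but "surgery" cannot be turned into "suey" by insertions.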

Outline
- What is approximate string matching (ASM)?
- What are the applications of ASM?
- Algorithms
- Conclusion

Applications
- Computational biology
- Signal processing
- Information retrieval

Computational biology
- DNA is composed of Adenine, Cytosine, Guanine, and Thymine (A, C, G, T)
- One can think of the set {A, C, G, T} as the alphabet for DNA sequences
- Used to find specific or similar DNA sequences
- Knowing how different two sequences are can give

Signal processing
- Used heavily in speech recognition software
- Error correction for received signals

Information retrieval
- Spell checkers
- Search engines
- Web searches (Google)
- Personal files (agrep for Unix)
- Searching texts with errors, such as digitized books
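
Searching a text that contains errors can be sketched with the classical dynamic-programming search usually attributed to Sellers: it is the edit-distance table with one change, namely that a match may start at any text position at zero cost. The function name `approx_search` and its parameters are assumptions made for this sketch, and it is not meant to reflect how agrep or any search engine is implemented.

```python
def approx_search(pattern: str, text: str, k: int) -> list:
    """End positions (1-based) in text where pattern matches with <= k edits."""
    m = len(pattern)
    # col[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    hits = []
    for pos, tc in enumerate(text, start=1):
        new = [0]  # the empty pattern matches anywhere at zero cost
        for i in range(1, m + 1):
            new.append(min(
                col[i] + 1,      # tc is an extra character in the text
                new[i - 1] + 1,  # pattern[i - 1] is missing from the text
                col[i - 1] + (pattern[i - 1] != tc),  # replace or match
            ))
        col = new
        if col[m] <= k:
            hits.append(pos)
    return hits


print(approx_search("survey", "surgery", 2))  # prints [5, 6, 7]
print(approx_search("survey", "surgery", 1))  # prints []
```

With up to two errors allowed, "survey" matches substrings of "surgery" ending at positions 5, 6, and 7; with only one error allowed there is no match. The running time is proportional to the pattern length times the text length.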

Outline
- What is approximate string matching (ASM)?
- What are the applications of ASM?

This note was uploaded on 07/30/2011 for the course COP 4810 taught by Professor Staff during the Spring '11 term at University of Central Florida.
