A Guided Tour to Approximate String Matching

GONZALO NAVARRO
University of Chile

We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.

Categories and Subject Descriptors: F.2.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems - Pattern matching, Computations on discrete structures; H.3.3 [Information storage and retrieval]: Information search and retrieval - Search process

General Terms: Algorithms

Additional Key Words and Phrases: Edit distance, Levenshtein distance, online string matching, text searching allowing errors

1. INTRODUCTION

This work focuses on the problem of string matching that allows errors, also called approximate string matching. The general goal is to perform string matching of a pattern in a text where one or both of them have suffered some kind of (undesirable) corruption. Some examples are recovering the original signals after their transmission over noisy channels, finding DNA subsequences after possible mutations, and text searching where there are typing or spelling errors.

Partially supported by Fondecyt grant 1-990627.
Author’s address: Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile, e-mail: [email protected]
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept, ACM Inc., 1515 Broadway, New York, NY 10036 USA, fax +1 (212) 869-0481, or [email protected]
© 2001 ACM 0360-0300/01/0300-0031 $5.00

The problem, in its most general form, is to find the positions in a text where a given pattern occurs, allowing a limited number of “errors” in the matches. Each application uses a different error model, which defines how different two strings are. The idea for this “distance” between strings is to make it small when one of the strings is likely to be an erroneous variant of the other under the error model in use.
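As a concrete illustration of the problem statement, the following is a minimal Python sketch under one particular error model: the edit (Levenshtein) distance named in the abstract, assuming that each insertion, deletion, or substitution costs exactly one error. That unit-cost assumption, the function names edit_distance and approximate_matches, and the parameter k (the maximum number of errors allowed) are illustrative choices, not taken from the text above. The search routine uses the classical dynamic-programming formulation in which a match may start at any text position; it is meant only to make the definition tangible, not to represent the faster algorithms the survey compares.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approximate_matches(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based text positions where some substring ending there
    is within edit distance k of the pattern."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for pos, tc in enumerate(text, start=1):
        new = [0]  # zero cost for the empty pattern: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            new.append(min(col[i] + 1,          # skip the text character
                           new[i - 1] + 1,      # skip the pattern character
                           col[i - 1] + cost))  # match or substitute
        col = new
        if col[m] <= k:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    print(edit_distance("survey", "surgery"))           # 2
    print(approximate_matches("survey", "surgery", 2))  # [5, 6, 7]
```

Here the distance between "survey" and "surgery" is 2 (one substitution and one insertion), so searching with k = 2 reports every text position at which an occurrence with at most two errors ends. The search takes time proportional to the pattern length times the text length and keeps only one column of the table in memory.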