Search and Decoding in Speech Recognition
Regular Expressions and Automata
24 August 2009
Veton Kpuska

Outline
  Introduction
  Regular Expressions
    Basic Regular Expression Patterns
    Disjunction, Grouping and Precedence
    Examples
    Advanced Operators
    Regular Expression Substitution, Memory and ELIZA
  Finite-State Automata
    Using an FSA to Recognize Sheeptalk
    Formal Languages
    Example
    Non-Deterministic FSAs
    Using an NFSA to Accept Strings
    Recognition as Search
    Relating Deterministic and Non-Deterministic Automata
  Regular Languages and FSAs
  Summary

Introduction
A regular expression (RE) is a language for specifying text search strings. Regular expressions were first developed by Kleene (1956). A search requires:
  a pattern: a formula, written in a special language, that specifies a simple class of strings;
  a corpus: a body of text to search through.

Introduction
Imagine that you have become a passionate fan of woodchucks. Desiring more information on this celebrated woodland creature, you turn to your favorite Web browser and type in "woodchuck". Your browser returns a few sites. You have a flash of inspiration and type in "woodchucks". Instead of searching twice, you would rather have typed a single search command specifying something like "woodchuck with an optional final s". Or perhaps you want to find all the prices in some document, that is, all strings that look like $199, $25, or $24.99.

In this chapter we introduce the regular expression, the standard notation for characterizing text sequences. Regular expressions are used to specify text strings in situations like this Web search example and in other information retrieval applications, but they also play an important role in word processing, in computing frequencies from corpora, and in other such tasks.

Introduction
Regular expressions can be implemented via a finite-state automaton (FSA).
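The woodchuck and price searches described in the introduction can be sketched with Python's standard re module. This is a minimal illustration, not the course's own code: the sample corpus string is made up, and the [Ww] character class (to also catch the capitalized form) and the exactly-two-digit cents group are assumptions added for the example.

```python
import re

# Hypothetical sample corpus: a small body of text to search through.
corpus = "I saw a woodchuck. Woodchucks hibernate. Prices: $199, $25 and $24.99."

# '?' makes the preceding 's' optional, so one pattern covers both forms.
# [Ww] is a character class (a disjunction) that also accepts a capital W.
# '\b' anchors the match at word boundaries.
woodchuck_re = re.compile(r"\b[Ww]oodchucks?\b")

# '\$' is an escaped literal dollar sign, '[0-9]+' is one or more digits,
# and '(?:\.[0-9]{2})?' is an optional '.' followed by exactly two digits.
price_re = re.compile(r"\$[0-9]+(?:\.[0-9]{2})?")

def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

if __name__ == "__main__":
    print(find_all(woodchuck_re, corpus))
    print(find_all(price_re, corpus))
```

One pattern per query replaces the two separate searches for "woodchuck" and "woodchucks"; the price pattern likewise matches all three sample price shapes without listing them individually.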
The finite-state automaton is one of the most significant tools of computational linguistics. Its variations (finite-state transducers, Hidden Markov Models, and N-gram grammars) are important components of the speech recognition and synthesis, spell-checking, and information extraction applications introduced in later chapters.

Regular Expressions and Automata: Regular Expressions

Regular Expressions
Formally, a regular expression is an algebraic notation for characterizing a set of strings. Thus regular expressions can be used to specify search strings as well as to define a language in a formal way. ...
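To make the link between regular expressions and finite-state automata concrete, here is a minimal sketch of a deterministic FSA driven by a state-transition table. It assumes the "sheeptalk" language named in the outline is the set of strings baa!, baaa!, baaaa!, ... (a b, two or more a's, then !), i.e. the regular expression baa+!; the state numbering and the function name d_recognize are illustrative choices, not taken from these slides.

```python
import re

# Transition table for a deterministic FSA accepting /baa+!/ ("sheeptalk").
# Keys are (state, input symbol); a missing entry means "no transition".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of additional a's
    (3, "!"): 4,
}
START_STATE = 0
ACCEPT_STATES = {4}

def d_recognize(tape: str) -> bool:
    """Read the tape one symbol at a time; accept iff we end in an accept state."""
    state = START_STATE
    for symbol in tape:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:  # no legal transition: reject immediately
            return False
        state = next_state
    return state in ACCEPT_STATES

# The same language written as a regular expression, for comparison.
SHEEPTALK_RE = re.compile(r"baa+!")

if __name__ == "__main__":
    for s in ["baa!", "baaaaa!", "ba!", "baa", "abaa!"]:
        # The automaton and the regular expression should agree on membership.
        print(s, d_recognize(s), SHEEPTALK_RE.fullmatch(s) is not None)
```

The table-driven design keeps the recognizer generic: the loop never changes, and a different regular language only needs a different TRANSITIONS table and set of accept states. Both recognizers define the same set of strings, which is the sense in which a regular expression "defines a language".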
This note was uploaded on 02/11/2012 for the course ECE 5527 taught by Professor Staff during the Fall '11 term at FIT.

