DRAFT — Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition. Daniel Jurafsky & James H. Martin. Copyright © 2006, All rights reserved. Draft of September 18, 2007. Do not cite without permission.

6 HIDDEN MARKOV AND MAXIMUM ENTROPY MODELS

Numquam ponenda est pluralitas sine necessitate
Plurality should never be proposed unless needed
William of Occam

Her sister was called Tatiana.
For the first time with such a name
the tender pages of a novel
we'll whimsically grace.
Pushkin, Eugene Onegin, in the Nabokov translation

Alexander Pushkin's novel in verse, Eugene Onegin, serialized in the early 19th century, tells of the young dandy Onegin, his rejection of the love of young Tatiana, his duel with his friend Lenski, and his later regret for both mistakes. But the novel is mainly beloved for its style and structure rather than its plot. Among other interesting structural innovations, the novel is written in a form now known as the Onegin stanza, iambic tetrameter with an unusual rhyme scheme. These elements have caused complications and controversy in its translation into other languages. Many of the translations have been in verse, but Nabokov famously translated it strictly literally into English prose. The issue of its translation, and the tension between literal and verse translations, have inspired much commentary (see for example Hofstadter (1997)). In 1913 A. A. Markov asked a less controversial question about Pushkin's text: could we use frequency counts from the text to help compute the probability that the next letter in sequence would be a vowel? In this chapter we introduce two important classes of statistical models for processing text and speech, both descendants of Markov's models. One of them is the Hidden Markov Model (HMM).
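Markov's question can be answered with simple bigram counts over the letters of a text. The following sketch (not from the text; the corpus string and function name are illustrative) estimates the probability that the next letter is a vowel, conditioned on whether the current letter is a vowel:

```python
# A minimal sketch of Markov's 1913 question: from letter-bigram
# frequency counts, estimate P(next letter is a vowel | current
# letter is a vowel) and P(next letter is a vowel | current letter
# is a consonant). The text used here is just a toy example.
from collections import Counter

VOWELS = set("aeiou")

def vowel_transition_probs(text):
    """Return {True: P(vowel | after vowel), False: P(vowel | after consonant)}."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter()  # (current_is_vowel, next_is_vowel) -> bigram count
    for cur, nxt in zip(letters, letters[1:]):
        counts[(cur in VOWELS, nxt in VOWELS)] += 1
    probs = {}
    for cur_is_vowel in (True, False):
        total = counts[(cur_is_vowel, True)] + counts[(cur_is_vowel, False)]
        probs[cur_is_vowel] = counts[(cur_is_vowel, True)] / total if total else 0.0
    return probs

probs = vowel_transition_probs("Her sister was called Tatiana")
```

Markov ran exactly this kind of tabulation by hand over 20,000 letters of Eugene Onegin; the point is that the vowel/consonant sequence is modeled by the identity of the current letter alone, the defining property of a Markov chain.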
The other is the Maximum Entropy model (MaxEnt), and particularly a Markov-related variant of MaxEnt called the Maximum Entropy Markov Model (MEMM). All of these are machine learning models. We have already touched on some aspects of machine learning; indeed we briefly introduced the Hidden Markov Model in the previous chapter, and the N-gram model in the chapter before that. In this chapter we give a more complete and formal introduction to these two important models. HMMs and MEMMs are both sequence classifiers. A sequence classifier or sequence labeler is a model whose job is to assign some label or class to each unit in a sequence. The finite-state transducer we studied in Ch. 3 is a kind of non-probabilistic sequence classifier, for example transducing from sequences of words to sequences of morphemes. The HMM and MEMM extend this notion by being probabilistic sequence classifiers; given a sequence of units (words, letters, morphemes, sentences, whatever) their job is to compute a probability distribution over possible labels and choose the...
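To make the idea of a probabilistic sequence classifier concrete, here is a toy sketch (the probability values, words, and tag names are invented for illustration, not taken from the text): it scores candidate label sequences for a word sequence with HMM-style transition and emission probabilities and picks the highest-scoring one.

```python
# Toy probabilistic sequence classifier: score each candidate tag
# sequence as a product of transition probabilities P(tag | previous tag)
# and emission probabilities P(word | tag), then choose the best.
# All numbers below are hand-set, purely illustrative values.

trans = {("START", "DET"): 0.6, ("START", "NOUN"): 0.4,
         ("DET", "NOUN"): 0.9, ("DET", "DET"): 0.1,
         ("NOUN", "NOUN"): 0.3, ("NOUN", "DET"): 0.7}
emit = {("DET", "the"): 0.7, ("NOUN", "the"): 0.01,
        ("DET", "dog"): 0.01, ("NOUN", "dog"): 0.5}

def score(words, tags):
    """Joint probability of a (word sequence, tag sequence) pair."""
    p, prev = 1.0, "START"
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
        prev = t
    return p

words = ["the", "dog"]
candidates = [("DET", "NOUN"), ("NOUN", "DET")]
best = max(candidates, key=lambda tags: score(words, tags))
# best == ("DET", "NOUN")
```

Enumerating all candidate tag sequences is exponential in the sentence length; the dynamic-programming algorithms introduced later in the chapter (Viterbi decoding) find the best sequence efficiently without explicit enumeration.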