jurafsky&martin_3rdEd_17 (1).pdf

# The intuition of Bayesian classification is to use



The intuition of Bayesian classification is to use Bayes’ rule to transform Eq. 5.1 into a set of other probabilities. Bayes’ rule is presented in Eq. 5.2; it gives us a way to break down any conditional probability $P(a \mid b)$ into three other probabilities:

$$P(a \mid b) = \frac{P(b \mid a)\,P(a)}{P(b)} \tag{5.2}$$

We can then substitute Eq. 5.2 into Eq. 5.1 to get Eq. 5.3:

$$\hat{w} = \underset{w \in V}{\operatorname{argmax}}\; \frac{P(x \mid w)\,P(w)}{P(x)} \tag{5.3}$$

We can conveniently simplify Eq. 5.3 by dropping the denominator $P(x)$. Why is that? Since we are choosing a potential correction word out of all words, we will be computing $\frac{P(x \mid w)\,P(w)}{P(x)}$ for each word. But $P(x)$ doesn’t change for each word; we are always asking about the most likely word for the same observed error $x$, which must have the same probability $P(x)$. Thus, we can choose the word that maximizes this simpler formula:

$$\hat{w} = \underset{w \in V}{\operatorname{argmax}}\; P(x \mid w)\,P(w) \tag{5.4}$$

To summarize, the noisy channel model says that we have some true underlying word $w$, and we have a noisy channel that modifies the word into some possible misspelled observed surface form. The **likelihood** or **channel model** of the noisy channel producing any particular observation sequence $x$ is modeled by $P(x \mid w)$. The **prior probability** of a hidden word is modeled by $P(w)$. We can compute the most probable word $\hat{w}$ given that we’ve seen some observed misspelling $x$ by multiplying the prior $P(w)$ and the likelihood $P(x \mid w)$ and choosing the word for which this product is greatest.

We apply the noisy channel approach to correcting non-word spelling errors by taking any word not in our spell dictionary, generating a list of **candidate words**, ranking them according to Eq. 5.4, and picking the highest-ranked one. We can modify Eq. 5.4 to refer to this list of candidate words $C$ instead of the full vocabulary $V$ as follows:

$$\hat{w} = \underset{w \in C}{\operatorname{argmax}}\; \overbrace{P(x \mid w)}^{\text{channel model}}\; \overbrace{P(w)}^{\text{prior}} \tag{5.5}$$

The noisy channel algorithm is shown in Fig. 5.2.

64 Chapter 5: Spelling Correction and the Noisy Channel

    function NOISY CHANNEL SPELLING(word x, dict D, lm, editprob) returns correction
      if x ∉ D
        candidates, edits ← all strings at edit distance 1 from x that are ∈ D, and their edit
        for each c, e in candidates, edits
          channel ← editprob(e)
          prior ← lm(c)
          score[c] = log channel + log prior
        return argmax_c score[c]

Figure 5.2 Noisy channel model for spelling correction for unknown words.

To see the details of the computation of the likelihood and the prior (language model), let’s walk through an example, applying the algorithm to the example misspelling *acress*. The first stage of the algorithm proposes candidate corrections by finding words that have a similar spelling to the input word. Analysis of spelling error data has shown that the majority of spelling errors consist of a single-letter change, and so we often make the simplifying assumption that these candidates have an edit distance of 1 from the error word. To find this list of candidates we’ll use the minimum edit distance algorithm introduced in Chapter 2, but extended so that
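The algorithm of Fig. 5.2 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the helper `edits1`, the word counts, and the per-operation edit probabilities are all made up for the example, and candidates are enumerated by brute force rather than with the extended minimum edit distance algorithm the text refers to. Returning `x` unchanged when it is already in the dictionary is also an assumption, since the figure leaves that case unspecified.

```python
import math
import string


def edits1(x):
    """Yield (candidate, op) pairs for every string one edit away from x.

    op names the single operation applied to x to reach the candidate:
    "delete", "transpose", "replace", or "insert" (hypothetical labels).
    """
    for i in range(len(x) + 1):
        left, right = x[:i], x[i:]
        if right:
            yield left + right[1:], "delete"
        if len(right) > 1:
            yield left + right[1] + right[0] + right[2:], "transpose"
        for ch in string.ascii_lowercase:
            if right and ch != right[0]:
                yield left + ch + right[1:], "replace"
            yield left + ch + right, "insert"


def noisy_channel_spelling(x, dictionary, lm, editprob):
    """Return the candidate c maximizing log editprob(e) + log lm(c)."""
    if x in dictionary:
        return x  # assumption: known words are left alone
    scores = {}
    for c, e in edits1(x):
        if c not in dictionary:
            continue
        score = math.log(editprob(e)) + math.log(lm(c))
        # A candidate may be reachable by several edits; keep its best score.
        scores[c] = max(score, scores.get(c, -math.inf))
    return max(scores, key=scores.get) if scores else None


# Toy data, invented purely for illustration (not real corpus counts).
counts = {"actress": 10, "across": 120, "acres": 30,
          "cress": 5, "caress": 4, "access": 90}
total = sum(counts.values())


def lm(w):
    """Unigram prior P(w) estimated from the toy counts."""
    return counts[w] / total


def editprob(op):
    """Made-up channel probability that depends only on the edit type."""
    return {"delete": 0.3, "insert": 0.3, "replace": 0.3, "transpose": 0.1}[op]


print(noisy_channel_spelling("acress", counts, lm, editprob))  # across
```

With these invented numbers the channel probabilities are nearly uniform, so the prior dominates and the most frequent candidate wins. A realistic channel model would condition `editprob` on the specific letters involved in each edit rather than on the edit type alone.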