Jurafsky & Martin, 3rd Ed.

Consider the job of predicting the next word in this sentence, assuming we are interpolating a bigram and a unigram model:

I can't see without my reading ______.

The word glasses seems much more likely to follow here than, say, the word Kong, so we'd like our unigram model to prefer glasses. But in fact it's Kong that is more common, since Hong Kong is a very frequent phrase. A standard unigram model will therefore assign Kong a higher probability than glasses. We would like to capture the intuition that although Kong is frequent, it is frequent mainly in the phrase Hong Kong, that is, after the word Hong. The word glasses has a much wider distribution.

In other words, instead of $P(w)$, which answers the question "How likely is $w$?", we'd like to create a unigram model that we might call $P_{\text{CONTINUATION}}$, which answers the question "How likely is $w$ to appear as a novel continuation?" How can we estimate this probability of seeing the word $w$ as a novel continuation, in a new unseen context? The Kneser-Ney intuition is to base our estimate of $P_{\text{CONTINUATION}}$ on the number of different contexts word $w$ has appeared in, that is, the number of bigram types it completes. Every bigram type was a novel continuation the first time it was seen. We hypothesize that words that have appeared in more contexts in the past are more likely to appear in some new context as well. The number of times a word $w$ appears as a novel continuation can be expressed as:

$$P_{\text{CONTINUATION}}(w) \propto |\{v : C(vw) > 0\}| \tag{4.29}$$

To turn this count into a probability, we normalize by the total number of word bigram types. In summary:

$$P_{\text{CONTINUATION}}(w) = \frac{|\{v : C(vw) > 0\}|}{|\{(u', w') : C(u'w') > 0\}|} \tag{4.30}$$

An alternative metaphor for an equivalent formulation is to use the number of word types seen to precede $w$ (Eq. 4.29 repeated):

$$P_{\text{CONTINUATION}}(w) \propto |\{v : C(vw) > 0\}| \tag{4.31}$$

normalized by the number of word types preceding each word, summed over all words:

$$P_{\text{CONTINUATION}}(w) = \frac{|\{v : C(vw) > 0\}|}{\sum_{w'} |\{v : C(vw') > 0\}|} \tag{4.32}$$

The two denominators are equal, because summing the preceding-type counts over every $w'$ counts each bigram type $(v, w')$ exactly once. A frequent word (Kong) occurring in only one context (Hong) will have a low continuation probability.

The final equation for Interpolated Kneser-Ney smoothing for bigrams is then:

$$P_{\text{KN}}(w_i \mid w_{i-1}) = \frac{\max(C(w_{i-1} w_i) - d, 0)}{C(w_{i-1})} + \lambda(w_{i-1})\, P_{\text{CONTINUATION}}(w_i) \tag{4.33}$$

The $\lambda$ is a normalizing constant that is used to distribute the probability mass we've discounted:

$$\lambda(w_{i-1}) = \frac{d}{\sum_v C(w_{i-1} v)}\, |\{w : C(w_{i-1} w) > 0\}| \tag{4.34}$$
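Eqs. 4.29 to 4.32 are easy to check on a small example. The following is a minimal Python sketch, assuming a hypothetical six-sentence toy corpus in which Kong is more frequent than glasses but only ever follows Hong; the helper names `bigram_counts` and `continuation_probs` are illustrative, not from the text. Bigrams are counted within sentences only.

```python
from collections import Counter

# Hypothetical toy corpus: "kong" is frequent but only ever follows "hong",
# while "glasses" is rarer but follows three different words.
SENTENCES = [
    "hong kong is in asia".split(),
    "i love hong kong".split(),
    "hong kong hong kong".split(),
    "my reading glasses".split(),
    "his new glasses".split(),
    "cheap glasses".split(),
]


def bigram_counts(sentences):
    """C(vw): bigram token counts, taken within each sentence only."""
    counts = Counter()
    for s in sentences:
        counts.update(zip(s, s[1:]))
    return counts


def continuation_probs(bigrams):
    """P_CONTINUATION(w) as in Eq. 4.30."""
    # Keys of `bigrams` are bigram types, so this counts |{v : C(vw) > 0}| for each w.
    left_types = Counter(w for (_v, w) in bigrams)
    # Denominator of Eq. 4.30: the number of bigram types.
    n_types = len(bigrams)
    # Eq. 4.32's denominator (sum over w' of preceding-type counts) is the same number.
    assert sum(left_types.values()) == n_types
    return {w: n / n_types for w, n in left_types.items()}


unigrams = Counter(w for s in SENTENCES for w in s)
bigrams = bigram_counts(SENTENCES)
p_cont = continuation_probs(bigrams)

print(unigrams["kong"], unigrams["glasses"])  # raw frequency: kong 4, glasses 3
print(p_cont["kong"], p_cont["glasses"])      # continuation: 1/12 vs 3/12
```

In this corpus Kong occurs 4 times and glasses 3 times, so a plain unigram model prefers Kong, yet Kong completes only 1 of the 12 bigram types while glasses completes 3, giving continuation probabilities of 1/12 and 3/12.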

The first term, $\frac{d}{\sum_v C(w_{i-1} v)}$, is the normalized discount. The second term, $|\{w : C(w_{i-1} w) > 0\}|$, is the number of word types that can follow $w_{i-1}$ or, equivalently, the number of word types that we discounted; in other words, the number of times we applied the normalized discount.

The general recursive formulation is as follows:

$$P_{\text{KN}}(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\left(c_{\text{KN}}(w_{i-n+1}^{i}) - d,\ 0\right)}{\sum_v c_{\text{KN}}(w_{i-n+1}^{i-1} v)} + \lambda(w_{i-n+1}^{i-1})\, P_{\text{KN}}(w_i \mid w_{i-n+2}^{i-1}) \tag{4.35}$$

where the definition of the count $c_{\text{KN}}$ depends on whether we are counting the highest-order N-gram being interpolated (for example, the trigram if we are interpolating trigram, bigram, and unigram) or one of the lower-order N-grams (bigram or
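The account of $\lambda$ above (a normalized discount, applied once per follower type) can be checked numerically. Below is a minimal Python sketch of the bigram case only (Eqs. 4.33 and 4.34), not the recursion of Eq. 4.35. The toy corpus, the function names `train` and `p_kn`, and the discount `d = 0.75` are all assumptions for illustration; this passage does not fix a value for $d$. One design choice: the sketch uses $\sum_v C(w_{i-1} v)$ as the denominator of both terms. This equals $C(w_{i-1})$ whenever every occurrence of $w_{i-1}$ is followed by another word, and using it throughout is what makes each conditional distribution sum to exactly 1 here.

```python
from collections import Counter

# Hypothetical toy corpus: "kong" only ever follows "hong",
# while "glasses" follows three different words.
SENTENCES = [
    "hong kong is in asia".split(),
    "i love hong kong".split(),
    "hong kong hong kong".split(),
    "my reading glasses".split(),
    "his new glasses".split(),
    "cheap glasses".split(),
]
D = 0.75  # assumed absolute discount d; not specified in this passage


def train(sentences):
    """Collect the counts needed by Eqs. 4.33 and 4.34 (bigrams within sentences)."""
    bigrams = Counter()
    for s in sentences:
        bigrams.update(zip(s, s[1:]))
    context_total = Counter()  # sum_v C(w_{i-1} v)
    followers = Counter()      # |{w : C(w_{i-1} w) > 0}|
    left_types = Counter()     # |{v : C(v w) > 0}|
    for (v, w), c in bigrams.items():
        context_total[v] += c
        followers[v] += 1
        left_types[w] += 1
    n_types = len(bigrams)     # number of bigram types (Eq. 4.30 denominator)
    p_cont = {w: n / n_types for w, n in left_types.items()}
    return bigrams, context_total, followers, p_cont


def p_kn(w, prev, model, d=D):
    """Interpolated Kneser-Ney bigram probability P_KN(w | prev)."""
    bigrams, context_total, followers, p_cont = model
    total = context_total[prev]
    if total == 0:
        # Hypothetical fallback (not from the text): unseen context, use P_CONTINUATION alone.
        return p_cont.get(w, 0.0)
    # First term of Eq. 4.33, with sum_v C(prev v) as the denominator.
    discounted = max(bigrams[(prev, w)] - d, 0) / total
    # Eq. 4.34: normalized discount times the number of times it was applied.
    lam = (d / total) * followers[prev]
    return discounted + lam * p_cont.get(w, 0.0)


model = train(SENTENCES)
vocab = list(model[3])  # words with nonzero continuation probability

print(p_kn("glasses", "reading", model), p_kn("kong", "reading", model))
for prev in ("reading", "hong", "kong"):
    # Each conditional distribution should sum to 1, up to floating-point rounding.
    print(prev, sum(p_kn(w, prev, model) for w in vocab))
```

After reading, glasses gets 0.4375 while Kong gets only about 0.0625, even though Kong is the more frequent word. The printed sums come out at 1.0 up to rounding because each seen bigram gives up $d$, and $\lambda$ hands exactly that total back, spread according to $P_{\text{CONTINUATION}}$.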