jurafsky&martin_3rdEd_17 (1).pdf

Chapter 4: Language Modeling with N-grams

In this chapter we take up the somewhat less fraught topic of predicting words. What word, for example, is likely to follow

Please turn your homework ...

Hopefully, most of you concluded that a very likely word is “in”, or possibly “over”, but probably not “refrigerator” or “the”. In the following sections we will formalize this intuition by introducing models that assign a probability to each possible next word. The same models will also serve to assign a probability to an entire sentence. Such a model, for example, could predict that the following sequence has a much higher probability of appearing in a text:

all of a sudden I notice three guys standing on the sidewalk

than does this same set of words in a different order:

on guys all I of notice sidewalk three a sudden standing the

Why would you want to predict upcoming words, or assign probabilities to sentences? Probabilities are essential in any task in which we have to identify words in noisy, ambiguous input, like speech recognition or handwriting recognition. In the movie Take the Money and Run, Woody Allen tries to rob a bank with a sloppily written hold-up note that the teller incorrectly reads as “I have a gub”. As Russell and Norvig (2002) point out, a language processing system could avoid making this mistake by using the knowledge that the sequence “I have a gun” is far more probable than the non-word “I have a gub” or even “I have a gull”.

In spelling correction, we need to find and correct spelling errors like “Their are two midterms in this class”, in which “There” was mistyped as “Their”. A sentence starting with the phrase “There are” will be much more probable than one starting with “Their are”, allowing a spellchecker to both detect and correct these errors.

Assigning probabilities to sequences of words is also essential in machine translation. Suppose we are translating a Chinese source sentence:

他 向 记者 介绍了 主要 内容
He to reporters introduced main content

As part of the process we might have built the following set of potential rough English translations:

he introduced reporters to the main contents of the statement
he briefed to reporters the main contents of the statement
he briefed reporters on the main contents of the statement

A probabilistic model of word sequences could suggest that “briefed reporters on” is a more probable English phrase than “briefed to reporters” (which has an awkward “to” after “briefed”) or “introduced reporters to” (which uses a verb that is less fluent English in this context), allowing us to correctly select the third sentence above.

Probabilities are also important for augmentative communication (Newell et al., 1998) systems. People like the physicist Stephen Hawking who are unable to physically talk or sign can instead use simple movements to select words from a menu to be spoken by the system. Word prediction can be used to suggest likely words for the menu.
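
The two uses described in this section, scoring a whole word sequence and suggesting a likely next word, can be illustrated with a very small sketch. The code below is not the chapter's formal model. It assumes a bigram model with add-one smoothing trained on a made-up six-sentence corpus, and the names CORPUS, train_bigrams, sentence_logprob, and predict_next are hypothetical, chosen only for this illustration. It also assumes every scored word appears in the toy vocabulary; unknown words are not handled properly.

```python
import math
from collections import Counter

# Toy training text: an assumption for illustration, not data from the chapter.
CORPUS = [
    "there are two midterms in this class",
    "there are three guys standing on the sidewalk",
    "their class is over",
    "please turn your homework in",
    "please turn your homework in today",
    "please turn your homework over",
]

def train_bigrams(sentences):
    """Count context words and adjacent word pairs, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    # Possible "next" tokens: all training words plus the end-of-sentence marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab

def sentence_logprob(sentence, unigrams, bigrams, vocab):
    """Log probability of a sentence under the add-one-smoothed bigram counts."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))
    return total

def predict_next(prev, unigrams, bigrams, vocab, k=2):
    """Return the k most frequent followers of prev (ties broken alphabetically)."""
    return sorted(vocab, key=lambda w: (-bigrams[(prev, w)], w))[:k]

unigrams, bigrams, vocab = train_bigrams(CORPUS)

# Spelling-correction style comparison: which opening is more probable?
good = sentence_logprob("there are two midterms in this class", unigrams, bigrams, vocab)
bad = sentence_logprob("their are two midterms in this class", unigrams, bigrams, vocab)
print(good > bad)

# Word-prediction style query: what tends to follow "homework" in the toy corpus?
print(predict_next("homework", unigrams, bigrams, vocab))
```

On this toy corpus the sentence beginning “there are” scores higher than the one beginning “their are”, and the suggested followers of “homework” are “in” and “over”. With a corpus this small the numbers mean little; the sketch only shows the shape of the computation.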