Natural Language Processing

Statistical Natural Language Processing

The explosion of on-line text allows us to gather statistical data about language use.

The most common applications include part-of-speech tagging and parsing. These usually require a pre-tagged or pre-parsed training corpus!

Statistical data can be gathered for explicit knowledge (e.g., common expressions), search control knowledge (e.g., to guide a chart parser), or preferences (e.g., the likelihoods of alternatives).

Statistical methods can also be used to gather domain-specific knowledge that reveals strong preferences within a domain.
Ex: in transportation texts the word “train” is usually a noun, but in educational texts “train” is usually a verb.

The Basics of Probability Theory

Intuitively, the probability of an event is the likelihood that it will occur.

Probabilities are usually described in terms of a random variable that can range over a set of values.
Ex: You want to measure the likelihood that a tossed coin will land heads-side up. We would use a random variable TOSS and write the probability as P(TOSS=h). There are two equally likely outcomes, heads (h) and tails (t), so P(TOSS=h) = 1/2 = 0.5

A conditional probability is the likelihood of an event X given that a condition Y is true, expressed as P(X | Y).

Conditional Probabilities and Bayes’ Rule

Conditional probability is defined as:
P(X | Y) = P(X & Y) / P(Y), which requires P(Y) > 0

Example: if P(Elvis seen & UFO seen) = .01 and P(UFO seen) = .02 and P(Elvis seen) = .03
then P(Elvis seen | UFO seen) = .01/.02 = .5
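
The definition of conditional probability can be checked with a few lines of Python. This is only a minimal sketch: the function name `conditional_probability` and the variable names are illustrative assumptions, not part of the original slides, and the numbers are the ones given in the Elvis/UFO example.

```python
def conditional_probability(p_joint: float, p_condition: float) -> float:
    """Return P(X | Y) = P(X & Y) / P(Y).

    Hypothetical helper for illustration; P(Y) must be positive,
    otherwise the conditional probability is undefined.
    """
    if p_condition <= 0.0:
        raise ValueError("P(Y) must be positive for P(X | Y) to be defined")
    return p_joint / p_condition


# Figures taken from the Elvis/UFO example in the text.
p_elvis_and_ufo = 0.01
p_ufo = 0.02
p_elvis = 0.03

# P(Elvis seen | UFO seen) = .01 / .02
p_elvis_given_ufo = conditional_probability(p_elvis_and_ufo, p_ufo)
print(f"P(Elvis seen | UFO seen) = {p_elvis_given_ufo:.2f}")

# Conditioning the other way round uses P(Elvis seen) as the denominator.
# This value is not computed in the slides; it follows from the same definition.
p_ufo_given_elvis = conditional_probability(p_elvis_and_ufo, p_elvis)
print(f"P(UFO seen | Elvis seen) = {p_ufo_given_elvis:.2f}")
```

Swapping the conditioning event changes the denominator, so P(X | Y) and P(Y | X) generally differ even though both share the same joint probability P(X & Y).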

