lect9-noisychannel.ppt - Noisy Channel, N-gra...

Lecture #9
Introduction to Natural Language Processing
CMPSCI 585, Fall 2007
University of Massachusetts Amherst
Andrew McCallum
Today's Main Points
• Application of the Noisy Channel Model
• Markov Models – including the Markov property definition
• Smoothing – Laplace (Lidstone's, held-out, Good-Turing)

Noisy Channel Model
• Optimize the encoder for throughput and accuracy.
  – compression: remove all redundancy
  – accuracy: add controlled redundancy
• Capacity: the rate at which we can transmit information with arbitrarily low probability of error in W'.
[Figure: W (message from a finite alphabet) → Encoder → Y (input to channel) → Channel p(x|y) → X (output from channel) → Decoder → W' (attempt to reconstruct the message based on the output)]
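
The slide stops at the diagram; the decoder's decision rule is not written out there. With the labeling above (channel p(x|y), prior p(y)), the standard Bayes decoding rule is:

    ŷ = argmax_y p(y|x) = argmax_y p(x|y) p(y) / p(x) = argmax_y p(x|y) p(y)

since p(x) is constant across candidate inputs y.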

Noisy Channel in NLP
[Figure: Y (input to channel) → Channel p(x|y) → X (output from channel) → Decoder → Ŷ (attempt to reconstruct the input based on the output)]

Application                         | Input (y)          | Output (x)                | p(y)                      | p(x|y)
Machine Translation                 | L1 word sequences  | L2 word sequences         | p(L1) in a language model | translation model
Optical Character Recognition (OCR) | actual text        | text with OCR errors      | prob. of language text    | model of OCR errors
Part of Speech (POS) tagging        | POS tag sequences  | English word sequence     | prob. of POS sequences    | probability of word given tag
Speech Recognition                  | word sequences     | acoustic speech signal    | prob. of word sequences   | acoustic model
Document classification            | class label        | word sequence in document | class prior probability   | language model (channel prob.) for each class
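
A minimal sketch of noisy-channel decoding for one row of the table (the OCR-style setting), applying argmax_y p(x|y) p(y) over a handful of candidates. The strings and probabilities below are made up purely for illustration:

    # Observed channel output (e.g. OCR'd text) and candidate inputs y.
    observed_x = "tbe cat"

    prior_p_y = {            # p(y): language-model prior over candidate texts (toy values)
        "the cat": 0.020,
        "tbe cat": 0.00001,
        "toe cat": 0.0005,
    }
    channel_p_x_given_y = {  # p(x|y): how likely each candidate is corrupted into observed_x (toy values)
        "the cat": 0.10,
        "tbe cat": 0.90,
        "toe cat": 0.10,
    }

    # Noisy-channel decoding: pick the y that maximizes p(x|y) * p(y)
    best_y = max(prior_p_y, key=lambda y: channel_p_x_given_y[y] * prior_p_y[y])
    print(f"decode({observed_x!r}) -> {best_y!r}")  # -> 'the cat'

Even though the channel model prefers leaving "tbe cat" unchanged, the prior p(y) makes "the cat" the winning reconstruction, which is exactly the division of labor the table describes.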

Probabilistic Language Modeling
• Assigns a probability p(t) to a word sequence t = w1 w2 w3 w4 w5 w6
• Chain rule and joint/conditional probabilities for text t:
  p(t) = p(w1) p(w2|w1) p(w3|w1 w2) p(w4|w1 w2 w3) p(w5|w1 ... w4) p(w6|w1 ... w5)
• The chain rule leads to a history-based model: we predict following things from past things.
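
A minimal sketch of that chain-rule product in code. The conditional probability table and the three-word example below are hypothetical, chosen only to make the multiplication concrete:

    # Chain rule: p(t) = p(w1) * p(w2|w1) * p(w3|w1 w2) * ...
    # Toy full-history conditional probabilities, keyed by the history tuple.
    cond_prob = {
        ("<s>",): {"in": 0.10},
        ("<s>", "in"): {"both": 0.02},
        ("<s>", "in", "both"): {"cases": 0.05},
    }

    def chain_rule_prob(words):
        """Multiply p(w_i | w_1 ... w_{i-1}) over the whole sequence."""
        p = 1.0
        history = ("<s>",)
        for w in words:
            p *= cond_prob[history][w]   # conditional on the full history so far
            history = history + (w,)     # the history grows with every word
        return p

    print(chain_rule_prob(["in", "both", "cases"]))  # 0.10 * 0.02 * 0.05 = 1e-04

Full-history tables like this grow combinatorially with sentence length, which is the practical reason the next slide limits the context to an n-gram window.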

n-gram models – the classic example of a statistical model of language
• Each word is predicted according to a conditional distribution based on limited context.
• Conditional Probability Table (CPT): p(X|“both”)
  – p(of|both) = 0.066
  – p(to|both) = 0.041
  – p(in|both) = 0.038
• a.k.a. Markov (chain) models – sequences of random variables in which the future variable is determined by the present variable, but is independent of the way in which the present state arose from its predecessors
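
CPT values like these come from corpus counts. Here is a minimal sketch of estimating such a table by maximum likelihood (relative frequency) from a toy corpus; the corpus and the resulting numbers are illustrative, not the data behind the slide's figures:

    from collections import Counter, defaultdict

    # Tiny toy corpus; real CPTs such as p(of|both) = 0.066 come from much larger text.
    corpus = [
        "<s> in both cases the model works </s>",
        "<s> both of the models work </s>",
        "<s> we tried both of them </s>",
    ]

    bigram_counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for prev, cur in zip(tokens, tokens[1:]):
            bigram_counts[prev][cur] += 1

    # Maximum-likelihood estimate: p(w|prev) = count(prev, w) / count(prev, .)
    cpt = {
        prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
        for prev, nexts in bigram_counts.items()
    }

    print(cpt["both"])  # {'cases': 0.333..., 'of': 0.666...}

Unsmoothed relative frequencies like these assign zero probability to unseen pairs, which is why the smoothing methods listed under "Today's Main Points" matter.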

1-gram model
• Simplest linear graphical model
• Words are random variables; arrows are direct dependencies between them (CPTs)
• These simple engineering models have been amazingly successful.
[Figure: a chain of word variables W over the sequence "<s> In both ??", with ?? the position being predicted]
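
Predicting the ?? position reduces to a lookup in the CPT from the previous slide. A minimal sketch, assuming the first-order Markov (bigram) conditioning described there, so only the most recent word "both" matters; the three probabilities are the ones shown on that slide, and a real table would cover the full vocabulary:

    # p(X|"both") from the n-gram slide; real models have an entry for every vocabulary word.
    cpt_both = {"of": 0.066, "to": 0.041, "in": 0.038}

    context = ["<s>", "In", "both"]

    # Under the Markov assumption, the prediction depends only on the last word of the context.
    prediction = max(cpt_both, key=cpt_both.get)
    print(f"most likely word after {' '.join(context)!r}: {prediction}")  # -> of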