# lect8-infotheory.ppt: Classification & Information Theory


## Classification & Information Theory

Lecture #8, Introduction to Natural Language Processing
CMPSCI 585, Fall 2007, University of Massachusetts Amherst
Andrew McCallum

## Today's Main Points

- Automatically categorizing text
  - Parameter estimation and smoothing: a general recipe for a statistical CompLing model
  - Building a spam filter
- Information theory
  - What is information? How can you measure it?
  - Entropy, cross entropy, information gain
## Maximum Likelihood Parameter Estimation: Binomial

- Toss a coin 100 times, observe r heads
- Assume a binomial distribution
  - Order doesn't matter; successive flips are independent
  - One parameter, q (the probability of flipping a head)
  - The binomial gives p(r|n,q); we know r and n
  - Find argmax_q p(r|n,q)
## Maximum Likelihood Parameter Estimation: Binomial (continued)

Maximizing the likelihood with respect to q:

$$\text{likelihood} = p(R = r \mid n, q) = \binom{n}{r} q^r (1-q)^{n-r}$$

$$L = \log p(r \mid n, q) \propto \log\!\big(q^r (1-q)^{n-r}\big) = r \log q + (n-r)\log(1-q)$$

$$\frac{\partial L}{\partial q} = \frac{r}{q} - \frac{n-r}{1-q} = 0 \;\Rightarrow\; r(1-q) = (n-r)q \;\Rightarrow\; q = \frac{r}{n}$$

Our familiar ratio of counts is the maximum likelihood estimate! (Notes for board)
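As a sanity check, the closed-form result q = r/n can be confirmed numerically: a brute-force grid search over q should land on the same maximum. A minimal sketch in Python (the function names are mine, not from the lecture):

```python
import math

def log_likelihood(q, n, r):
    # Binomial log-likelihood, dropping the constant n-choose-r term
    return r * math.log(q) + (n - r) * math.log(1 - q)

def mle_by_grid(n, r, steps=100000):
    # Brute-force search: evaluate L at q = 1/steps, 2/steps, ... and keep the best
    best_q, best_ll = None, -math.inf
    for i in range(1, steps):
        q = i / steps
        ll = log_likelihood(q, n, r)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

# 100 flips, 30 heads: the grid maximum agrees with r/n = 0.3
print(mle_by_grid(100, 30))
```

Dropping the binomial coefficient is safe because it does not depend on q, so it shifts the log-likelihood without moving its maximum.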
## Binomial Parameter Estimation Examples

- Make 1000 coin flips, observe 300 heads: P(Heads) = 300/1000
- Make 3 coin flips, observe 2 heads: P(Heads) = 2/3 ??
- Make 1 coin flip, observe 1 tail: P(Heads) = 0 ???
- Make 0 coin flips: P(Heads) = ???

We have some "prior" belief about P(Heads) before we see any data. After seeing some data, we have a "posterior" belief.
## Maximum A Posteriori Parameter Estimation

- We've been finding the parameters that maximize p(data|parameters), not the parameters that maximize p(parameters|data) (parameters are random variables!)

$$p(q \mid n, r) = \frac{p(r \mid n, q)\, p(q \mid n)}{p(r \mid n)} = \frac{p(r \mid n, q)\, p(q)}{p(r \mid n)}$$

where the denominator p(r|n) is a constant.

- And let p(q) = 2q(1-q)
## Maximum A Posteriori Parameter Estimation: Binomial

$$\text{posterior} \propto p(r \mid n, q)\, p(q) = \binom{n}{r} q^r (1-q)^{n-r} \cdot 2q(1-q)$$

$$L = \log(\text{posterior}) \propto \log\!\big(q^{r+1} (1-q)^{n-r+1}\big) = (r+1)\log q + (n-r+1)\log(1-q)$$

$$\frac{\partial L}{\partial q} = \frac{r+1}{q} - \frac{n-r+1}{1-q} = 0 \;\Rightarrow\; (r+1)(1-q) = (n-r+1)q \;\Rightarrow\; q = \frac{r+1}{n+2}$$
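The result q = (r+1)/(n+2) is exactly add-one (Laplace) smoothing, and it fixes the pathological small-sample estimates from the earlier slide. A short sketch (the function name is mine, not from the lecture):

```python
def map_estimate(n, r):
    # MAP estimate under the slide's prior p(q) = 2q(1-q): add-one smoothing
    return (r + 1) / (n + 2)

# Revisiting the earlier slide's troublesome cases:
print(map_estimate(3, 2))  # 0.6: close to the MLE of 2/3, pulled toward 1/2
print(map_estimate(1, 0))  # ~0.333: no longer a hard 0 for P(Heads)
print(map_estimate(0, 0))  # 0.5: with no data, the prior alone decides
```

Note that as n grows, the +1 and +2 terms matter less and less, so the MAP estimate converges to the MLE r/n.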
## Bayesian Decision Theory

- We can use such techniques for choosing among models: which among several models best explains the data?
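One way to make this concrete (my own illustration, not from the slides): with two candidate coin models and a uniform prior over models, Bayes' rule turns each model's likelihood into a posterior probability, and we choose the model with the higher posterior.

```python
import math

def binom_pmf(r, n, q):
    # p(r | n, q) for a binomial distribution
    return math.comb(n, r) * q**r * (1 - q)**(n - r)

# Hypothetical data and models: 70 heads in 100 flips; fair vs. biased coin
n, r = 100, 70
models = {"fair (q=0.5)": 0.5, "biased (q=0.7)": 0.7}

# With a uniform prior over models, the posterior is proportional to the likelihood
evidence = {name: binom_pmf(r, n, q) for name, q in models.items()}
total = sum(evidence.values())
posteriors = {name: e / total for name, e in evidence.items()}
print(posteriors)  # the biased model explains this data far better
```

The same machinery extends to spam filtering: "spam" and "ham" are the two models, and we pick whichever assigns the observed message the higher posterior.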

## This note was uploaded on 02/22/2012 for the course CMPSCI 585 taught by Professor Staff during the Fall '08 term at UMass (Amherst).
