CS440 Introduction to Artificial Intelligence
Lecture 20: Bayesian learning; conjugate priors
Julia Hockenmaier
[email protected]
3324 Siebel Center
Office Hours: Thu, 2:00-3:00pm
http://www.cs.uiuc.edu/class/sp11/cs440
CS598JHM: Advanced NLP

The binomial distribution
If p is the probability of heads, the probability of getting exactly k heads in n independent yes/no trials is given by the binomial distribution Bin(n,p):
  P(k heads) = (n choose k) p^k (1-p)^(n-k) = n!/(k!(n-k)!) · p^k (1-p)^(n-k)
Expectation: E(Bin(n,p)) = np
Variance: var(Bin(n,p)) = np(1-p)
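The pmf and its moments are easy to sanity-check numerically; this is a minimal sketch of my own (example values n = 10, p = 0.3 are not from the slides):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k heads) = (n choose k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))             # should equal n*p = 3.0
var = sum((k - mean)**2 * q for k, q in enumerate(pmf))  # should equal n*p*(1-p) = 2.1
```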
Binomial likelihood
Given data D = HTTHTT, what is the probability θ of heads?
[Figure: likelihood L(θ; D=(#Heads,#Tails)) of the binomial, plotted for D = (5,5), (3,7), and (2,8).]

Parameter estimation
What distribution does p (the probability of heads) have, given that the data D consists of #H heads and #T tails?
- Maximum likelihood estimation (MLE): use the θ which has the highest likelihood P(D|θ):
    θ_MLE = argmax_θ P(D|θ)
- Maximum a posteriori (MAP): use the θ which has the highest posterior probability P(θ|D):
    θ_MAP = argmax_θ P(θ|D) = argmax_θ P(θ)P(D|θ)
- Bayesian estimation: integrate over all θ, i.e. compute the expectation of θ given D:
    P(x=H|D) = ∫₀¹ P(x=H|θ) P(θ|D) dθ = E[θ|D]
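The estimators can be compared on D = HTTHTT; the sketch below is my own illustration (the closed forms it checks against are derived on later slides, with the Bayesian column assuming a uniform prior):

```python
# D = HTTHTT -> H = 2 heads, T = 4 tails.
H, T = 2, 4

def likelihood(theta):
    # P(D|theta) for a fixed coin-flip sequence: theta^H (1-theta)^T
    return theta**H * (1 - theta)**T

# MLE by grid search over theta in [0, 1]
grid = [i / 1000 for i in range(1001)]
theta_mle = max(grid, key=likelihood)
# closed form (derived on the next slide): H / (H + T) = 1/3

# Bayesian estimate under a uniform prior
# (shown later to be (H+1)/(H+T+2))
theta_bayes = (H + 1) / (H + T + 2)
```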
Maximum likelihood estimation
Maximum likelihood estimation (MLE): find the θ which maximizes the likelihood P(D|θ):
  θ* = argmax_θ P(D|θ) = argmax_θ θ^H (1-θ)^T = H/(H+T)

Bayesian statistics
Data D provides evidence for or against our beliefs. We update our belief θ based on the evidence we see:
  P(θ|D) = P(θ)P(D|θ) / ∫ P(θ)P(D|θ) dθ
(Posterior = Prior × Likelihood / Marginal Likelihood, where the marginal likelihood is P(D).)

Bayesian estimation
Given a prior P(θ) and a likelihood P(D|θ), what is the posterior P(θ|D)?
- The posterior is proportional to prior × likelihood:
    P(θ|D) ∝ P(θ) P(D|θ)
- The likelihood of a binomial is:
    P(D|θ) = θ^H (1-θ)^T
- If the prior P(θ) is proportional to powers of θ and (1-θ), the posterior will also be proportional to powers of θ and (1-θ):
    P(θ) ∝ θ^a (1-θ)^b
    ⇒ P(θ|D) ∝ θ^a (1-θ)^b · θ^H (1-θ)^T = θ^(a+H) (1-θ)^(b+T)

In search of a prior...
How do we choose the prior P(θ)? We would like something of the form
  P(θ) ∝ θ^a (1-θ)^b
But this looks just like the binomial:
  P(k heads) = (n choose k) p^k (1-p)^(n-k) = n!/(k!(n-k)!) · p^k (1-p)^(n-k)
... except that k is an integer and θ is a real with 0 < θ < 1.

The Gamma function
The Gamma function Γ(x) is the generalization of the factorial x! (or rather, of (x-1)!) to the reals:
  Γ(α) = ∫₀^∞ x^(α-1) e^(-x) dx   for α > 0
For x > 1, Γ(x) = (x-1)·Γ(x-1).
For positive integers, Γ(x) = (x-1)!.
[Figure: plot of Γ(x) for 0 < x ≤ 4]
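Both properties can be checked with Python's standard library; a quick sketch of my own, not part of the slides:

```python
from math import gamma, factorial

# For positive integers, Γ(x) = (x-1)!
for x in range(1, 10):
    assert abs(gamma(x) - factorial(x - 1)) < 1e-6

# The recurrence Γ(x) = (x-1)·Γ(x-1) also holds for non-integer x > 1
x = 3.7
assert abs(gamma(x) - (x - 1) * gamma(x - 1)) < 1e-9
```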
The Beta distribution
A random variable X (0 < X < 1) has a Beta distribution with (hyper)parameters α (α > 0) and β (β > 0) if X has a continuous distribution with probability density function
  P(x|α,β) = Γ(α+β)/(Γ(α)Γ(β)) · x^(α-1) (1-x)^(β-1)
The first term is a normalization factor (to obtain a distribution):
  ∫₀¹ x^(α-1) (1-x)^(β-1) dx = Γ(α)Γ(β)/Γ(α+β)
Expectation: E[X] = α/(α+β)

Beta(α,β) with α > 1, β > 1: unimodal.
[Figure: Beta(1.5,1.5), Beta(3,1.5), Beta(3,3), Beta(20,20), Beta(3,20)]
Beta(α,β) with α < 1, β < 1: U-shaped.
[Figure: Beta(0.1,0.1), Beta(0.1,0.5), Beta(0.5,0.5)]

Beta(α,β) with α = β: symmetric; α = β = 1 is the uniform distribution.
[Figure: Beta(0.1,0.1), Beta(1,1), Beta(2,2)]
Beta(α,β) with α < 1, β > 1: strictly decreasing.
[Figure: Beta(0.1,1.5), Beta(0.5,1.5), Beta(0.5,2)]

Beta(α,β) with α = 1, β > 1:
- α = 1, 1 < β < 2: strictly concave
- α = 1, β = 2: straight line
- α = 1, β > 2: strictly convex
[Figure: Beta(1,1.5), Beta(1,2), Beta(1,3)]
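The density and the shape claims above can be verified numerically. This is my own stdlib-only sketch, reusing parameter values from the plot legends:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta density: Γ(a+b)/(Γ(a)Γ(b)) · x^(a-1) (1-x)^(b-1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

xs = [i / 100 for i in range(1, 100)]

# α = β: symmetric around x = 1/2
assert all(abs(beta_pdf(x, 3, 3) - beta_pdf(1 - x, 3, 3)) < 1e-9 for x in xs)

# α < 1, β > 1: strictly decreasing on (0, 1)
vals = [beta_pdf(x, 0.5, 1.5) for x in xs]
assert all(v1 > v2 for v1, v2 in zip(vals, vals[1:]))

# α = 1, β = 2: the straight line 2(1-x)
assert abs(beta_pdf(0.25, 1, 2) - 1.5) < 1e-9
```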
Beta as prior for binomial
Given a prior P(θ|α,β) = Beta(α,β) and data D = (H,T), what is our posterior?
  P(θ|α,β,H,T) ∝ P(H,T|θ) P(θ|α,β)
               ∝ θ^H (1-θ)^T · θ^(α-1) (1-θ)^(β-1)
               = θ^(H+α-1) (1-θ)^(T+β-1)
With normalization:
  P(θ|α,β,H,T) = Γ(H+α+T+β)/(Γ(H+α)Γ(T+β)) · θ^(H+α-1) (1-θ)^(T+β-1) = Beta(α+H, β+T)

So, what do we predict?
Our Bayesian estimate for the next coin flip P(x=H|D):
  P(x=H|D) = ∫₀¹ P(x=H|θ) P(θ|D) dθ = ∫₀¹ θ P(θ|D) dθ = E[θ|D] = E[Beta(H+α, T+β)] = (H+α)/(H+α+T+β)
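The conjugate update and the predictive probability can be sketched directly. The prior parameters below are my own example; D = HTTHTT is the data used earlier in the lecture:

```python
from math import gamma

def beta_pdf(x, a, b):
    # Beta density: Γ(a+b)/(Γ(a)Γ(b)) · x^(a-1) (1-x)^(b-1)
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

# Prior Beta(α, β) and data D = HTTHTT: H = 2, T = 4.
alpha, beta = 2.0, 2.0
H, T = 2, 4

# Conjugacy: the posterior is Beta(α + H, β + T).
post_a, post_b = alpha + H, beta + T

# Predictive P(x=H|D) = E[θ|D] = (H+α)/(H+α+T+β)
p_heads = post_a / (post_a + post_b)   # = 4/10 = 0.4

# Numerical check that this matches the integral ∫ θ·P(θ|D) dθ (midpoint rule)
n = 20000
E = sum((i + 0.5) / n * beta_pdf((i + 0.5) / n, post_a, post_b) for i in range(n)) / n
assert abs(E - p_heads) < 1e-4
```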
Conjugate priors
The Beta distribution is a conjugate prior to the binomial: the resulting posterior is also a Beta distribution. We can interpret its parameters α, β as pseudocounts:
  P(H|D) = (H+α)/(H+α+T+β)
All members of the exponential family of distributions have conjugate priors. Examples:
- Multinomial: conjugate prior = Dirichlet
- Gaussian: conjugate prior = Gaussian

Multinomials: Dirichlet prior
Multinomial distribution: the probability of observing each possible outcome c_i exactly x_i times in a sequence of n trials:
  P(X₁=x₁, ..., X_K=x_K) = n!/(x₁!···x_K!) · θ₁^(x₁) ··· θ_K^(x_K)   if Σ_{i=1}^K x_i = n
Dirichlet prior:
  Dir(θ|α₁,...,α_K) = Γ(α₁+...+α_K)/(Γ(α₁)···Γ(α_K)) · ∏_{k=1}^K θ_k^(α_k-1)
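For concreteness, here is a small sketch of the multinomial pmf and the Dirichlet mean (the functions and example counts are my own, not from the slides); with K = 2 the multinomial reduces to the binomial and the Dirichlet mean to the Beta mean α/(α+β):

```python
from math import factorial

def multinomial_pmf(counts, thetas):
    """n!/(x1!···xK!) · θ1^x1 ··· θK^xK, assuming sum(counts) = n."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)
    p = float(coef)
    for x, th in zip(counts, thetas):
        p *= th**x
    return p

def dirichlet_mean(alphas):
    """E[θk] under Dir(α1,...,αK) is αk / Σj αj."""
    s = sum(alphas)
    return [a / s for a in alphas]

# K = 2 recovers the binomial pmf: C(6,2)·0.3²·0.7⁴
p = multinomial_pmf([2, 4], [0.3, 0.7])
# Uniform Dirichlet(1,1,1): mean is the uniform distribution over 3 outcomes
m = dirichlet_mean([1.0, 1.0, 1.0])
```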
More about conjugate priors
- We can interpret the hyperparameters as "pseudocounts".
- Sequential estimation (updating counts after each observation) gives the same results as batch estimation.
- Add-one smoothing (Laplace smoothing) = uniform prior.
- On average, more data leads to a sharper posterior (sharper = lower variance).
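The sequential-equals-batch point is easy to demonstrate for the Beta-binomial case; this sketch uses my own example data and the uniform Beta(1,1) prior that corresponds to add-one smoothing:

```python
# Uniform prior Beta(1,1); observe D = HTTHTT one flip at a time.
alpha, beta = 1.0, 1.0
data = "HTTHTT"

a, b = alpha, beta
for obs in data:                 # sequential updates
    if obs == "H":
        a += 1
    else:
        b += 1

H, T = data.count("H"), data.count("T")
assert (a, b) == (alpha + H, beta + T)   # identical to one batch update

# Posterior-mean estimate = add-one smoothed relative frequency
p_heads = a / (a + b)            # (H+1)/(H+T+2) = 3/8
```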
This note was uploaded on 10/13/2011 for the course CS 440, taught by Professor S. Levinson during the Spring '08 term, at the University of Illinois, Urbana-Champaign.