CSC 411 / CSC D11        Estimation

7 Estimation
We now consider the problem of determining unknown parameters of the world based on measurements. The general problem is one of inference, which describes the probabilities of these unknown parameters. Given a model, these probabilities can be derived using Bayes' Rule. The simplest use of these probabilities is to perform estimation, in which we attempt to come up with single "best" estimates of the unknown parameters.
7.1 Learning a binomial distribution
For a simple example, we return to coin-flipping. We flip a coin $N$ times, with the result of the $i$th flip denoted by a variable $c_i$: "$c_i = \text{heads}$" means that the $i$th flip came up heads. The probability that the coin lands heads on any given trial is given by a parameter $\theta$. We have no prior knowledge as to the value of $\theta$, and so our prior distribution on $\theta$ is uniform.¹ In other words, we describe $\theta$ as coming from a uniform distribution from 0 to 1, so $p(\theta) = 1$; we believe that all values of $\theta$ are equally likely if we have not seen any data. We further assume that the individual coin flips are independent, i.e., $P(c_{1:N} \,|\, \theta) = \prod_i P(c_i \,|\, \theta)$. (The notation "$c_{1:N}$" indicates the set of observations $\{c_1, \dots, c_N\}$.) We can summarize this model as follows:
Model: Coin-Flipping

    $\theta \sim \mathcal{U}(0, 1)$
    $P(c = \text{heads}) = \theta$
    $P(c_{1:N} \,|\, \theta) = \prod_i P(c_i \,|\, \theta)$        (1)
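As a concrete illustration (not part of the original notes), the generative process in Equation (1) can be simulated in a few lines of Python. The function name `sample_coin_flipping` is my own; the model itself is exactly as stated above: draw $\theta$ uniformly, then flip independently.

```python
import random

def sample_coin_flipping(N, rng=random):
    """Sample from the Coin-Flipping model: draw theta ~ U(0, 1),
    then N independent flips with P(c = heads) = theta."""
    theta = rng.uniform(0.0, 1.0)
    flips = ["heads" if rng.random() < theta else "tails" for _ in range(N)]
    return theta, flips

theta, flips = sample_coin_flipping(1000)
# By independence, the heads count concentrates near 1000 * theta.
print(theta, flips.count("heads"))
```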
Suppose we wish to learn about a coin by flipping it 1000 times and observing the results $c_{1:1000}$, in which the coin landed heads 750 times. What is our belief about $\theta$, given this data? We now need to solve for $p(\theta \,|\, c_{1:1000})$, i.e., our belief about $\theta$ after seeing the 1000 coin flips. To do this, we apply the basic rules of probability theory, beginning with the Product Rule:
    $P(c_{1:1000}, \theta) = P(c_{1:1000} \,|\, \theta)\, p(\theta) = p(\theta \,|\, c_{1:1000})\, P(c_{1:1000})$        (2)
Solving for the desired quantity gives:
    $p(\theta \,|\, c_{1:1000}) = \frac{P(c_{1:1000} \,|\, \theta)\, p(\theta)}{P(c_{1:1000})}$        (3)
The numerator may be written using

    $P(c_{1:1000} \,|\, \theta)\, p(\theta) = \prod_i P(c_i \,|\, \theta) = \theta^{750} (1 - \theta)^{1000 - 750}$        (4)
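As an aside (not from the notes), the unnormalized posterior in Equation (4) is easy to evaluate numerically. A sketch, working in log space to avoid underflow with exponents this large; the function and grid here are my own illustration:

```python
import math

H, T = 750, 250  # observed heads and tails

def log_unnorm_posterior(theta, H=H, T=T):
    """Log of theta^H * (1 - theta)^T, the unnormalized posterior
    under the uniform prior p(theta) = 1."""
    return H * math.log(theta) + T * math.log(1.0 - theta)

# Evaluate on a grid of theta values strictly inside (0, 1).
grid = [i / 1000.0 for i in range(1, 1000)]
best = max(grid, key=log_unnorm_posterior)
print(best)  # the maximum is at theta = 0.75, the observed fraction of heads
```

The maximizer $H/(H+T) = 0.75$ matches the observed fraction of heads, previewing the maximum-likelihood estimate.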
¹We would usually expect a coin to be fair, i.e., the prior distribution for $\theta$ is peaked near 0.5.
Copyright © 2009 Aaron Hertzmann and David Fleet
[Figure: two posterior curves over $\theta \in [0, 1]$, labelled H=1,T=0 and H=750,T=250.]

Figure 1: Posterior probability of $\theta$ from two different experiments: one with a single coin flip (landing heads), and 1000 coin flips (750 of which land heads). Note that the latter distribution is much more peaked. (Note: the vertical scale is wrong on these plots; they should integrate to 1.)
The denominator may be solved for by the marginalization rule:

    $P(c_{1:1000}) = \int_0^1 P(c_{1:1000}, \theta)\, d\theta = \int_0^1 \theta^{750} (1 - \theta)^{1000 - 750}\, d\theta = Z$        (5)
where $Z$ is a constant (evaluating it requires more advanced math, but it is not necessary for our purposes).