is completely determined by its mean and its variance. Before we can attempt to model an
unknown density with a density from this family, we must first decide whether the density is
Gaussian. Maximum likelihood model selection can then be used to estimate μ and σ from a
sample. The form of the Gaussian density makes finding the maximum likelihood parameters
very easy. The log likelihood of a sample a of a Gaussian density is
\log \ell(a) = \sum_{x_a \in a} \log P(X = x_a)    (2.36)

             = \sum_{x_a \in a} \log g(x_a - \mu, \sigma)    (2.37)

             = \sum_{x_a \in a} \left[ \log \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2} \left( \frac{x_a - \mu}{\sigma} \right)^2 \right] .    (2.38)
The most likely μ minimizes \sum_{x_a \in a} (x_a - \mu)^2, a quadratic function of μ. Differentiating and solving for zeroes, we find that the most likely μ is

\hat{\mu} = \frac{1}{N_a} \sum_{x_a \in a} x_a .
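The derivation above can be checked numerically. The following is a minimal sketch (assuming NumPy, with a hypothetical 100-point sample standing in for the sample a in the text): the sample mean minimizes the quadratic sum of squared deviations, so perturbing it in either direction can only increase the criterion.

```python
import numpy as np

# A hypothetical sample standing in for the sample a in the text.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100)

def sum_sq(mu, xs):
    # The quadratic criterion: sum over x_a of (x_a - mu)^2.
    return float(np.sum((xs - mu) ** 2))

# Closed-form maximum likelihood estimate: the sample mean.
mu_ml = float(np.mean(sample))

# Perturbing mu in either direction increases the sum of squares,
# confirming that the sample mean is the minimizer.
assert sum_sq(mu_ml, sample) < sum_sq(mu_ml - 0.1, sample)
assert sum_sq(mu_ml, sample) < sum_sq(mu_ml + 0.1, sample)
```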
2.4. MODELING DENSITIES   AITR 1548

[Figure 2.1: Three views of a Gaussian density with a mean of 0.0 and a variance of 1.0. First, a sample of 100 points drawn from the density; each point is represented by a vertical black line. Second, the density of the true Gaussian. Third, the density of the Gaussian estimated from the sample (mean = 0.045, variance = 0.869).]

This is a very satisfying result. We have proven that the most likely estimate for the mean, μ,
is the mean of the sample. A very similar argument can be used to prove that the maximum
likelihood estimate for the variance, σ², is the sample variance:

\hat{\sigma}^2 = \frac{1}{N_a} \sum_{x_a \in a} (x_a - \mu)^2 .

Figure 2.1 displays a 100 point sample drawn from a Gaussian density. The true density
is shown together with the most likely model. Because the sample mean and sample variance
are not perfect measures of the true mean and variance, the most likely model is not perfect.
The accuracy of the estimated mean and variance gets better as the sample size increases.
Even for a sample of 100 points there is significant variability in the estimated model for
different samples. Figure 2.2 shows ten different estimates from ten different samples of the
same density.
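This variability can be reproduced with a short sketch (assuming NumPy; the seed and sample layout are illustrative, not from the original experiment): fit ten independent 100-point samples and record the maximum likelihood mean and variance of each.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fit ten independent 100-point samples from N(0, 1), recording the
# maximum likelihood mean and variance of each (as in Figure 2.2).
mus, variances = [], []
for _ in range(10):
    xs = rng.normal(0.0, 1.0, size=100)
    mu = float(xs.mean())                        # ML mean = sample mean
    mus.append(mu)
    variances.append(float(np.mean((xs - mu) ** 2)))  # ML variance, 1/N normalization

# The ten estimates scatter around the true values 0.0 and 1.0;
# the spread shrinks as the sample size grows.
```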
Paul A. Viola   CHAPTER 2. PROBABILITY AND ENTROPY

[Figure 2.2: The maximum likelihood density estimates for ten different samples of 100 points drawn from the same Gaussian.]

2.4.2 Other Parametric Densities
Finding the most likely Gaussian model for a sample is a very efficient operation. The mean
and the variance are trivially computable in linear time. Efficient estimation is a property
shared by all of the exponential densities, a class of densities which includes the Gaussian
density, the Gamma density, and others. For all other types of densities it is not possible
to find maximum likelihood parameter estimates directly from statistics of the sample. The
most likely set of parameters must be determined by a search process. Since there are an
infinite number of possible parameter values, finding values for these parameters that are
optimal is not straightforward. Generally problems of this sort are solved using an iterative
refinement process known as gradient descent. The gradient descent procedure is described
in Appendix A.1.
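The shape of such a search can be sketched as follows. This is not the procedure from Appendix A.1, just an illustrative gradient ascent on the Gaussian log likelihood in μ (a parameter that actually has a closed form, which makes the result easy to verify); the same loop structure applies when no closed form exists.

```python
import numpy as np

rng = np.random.default_rng(2)
xs = rng.normal(1.5, 1.0, size=200)

# Gradient ascent on the Gaussian log likelihood in mu (sigma held at 1).
# d(log likelihood)/d(mu) = sum(x_a - mu); each update nudges mu uphill.
mu = 0.0
step = 0.01 / len(xs)          # small step, scaled by sample size
for _ in range(1000):
    grad = float(np.sum(xs - mu))
    mu += step * grad

# mu converges to the sample mean, the known maximum likelihood answer.
```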
The Gaussian density has many advantages. Why not use it to model every sample? The
simple answer is that not all real densities are Gaussian. In fact, many real densities are far
from Gaussian. One of the strongest limitations of the Gaussian, and of all the exponential
densities, is that they are unimodal: they have a single peak. Modeling densities that have
multiple peaks as if they had a single peak is foolhardy. Figure 2.3 shows an attempt to fit
a two peaked function with a single Gaussian. In many situations it may seem as though
the simplicity and efficiency that arise from using a Gaussian density outweigh the added
accuracy that arises from using a more accurate model. As we will see, this is a temptation
to which many have succumbed.

[Figure 2.3: Three views of a density constructed from a combination of two Gaussians. The Gaussians have variance 0.3 and means of 2.0 and −2.0 respectively. As before, the sample contains 100 points. The maximum likelihood Gaussian has mean 0.213 and variance 3.824.]
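The failure mode in Figure 2.3 is easy to reproduce. A minimal sketch, assuming NumPy and a synthetic bimodal sample (the seed and exact counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
# A bimodal sample mirroring Figure 2.3: two Gaussians with
# variance 0.3 and means -2.0 and +2.0, 50 points each.
xs = np.concatenate([rng.normal(-2.0, np.sqrt(0.3), size=50),
                     rng.normal(+2.0, np.sqrt(0.3), size=50)])

mu = float(xs.mean())
var = float(np.mean((xs - mu) ** 2))

# The single-Gaussian ML fit lands between the peaks (mu near 0) with
# an inflated variance (roughly 4 + 0.3, dominated by the peak
# separation), placing most of its mass where the data has little.
```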
Once the decision to use a more complex model has been made, the set of possible model
densities is literally infinite. In terms of accuracy this is an unambiguous advantage. Density
can be modeled by any function that can be guaranteed to integrate to one. The most
common model after the simple Gaussian is a mixture of Gaussians:

M(x; W) = \sum_{i=1}^{N} c_i \, g(x - \mu_i, \sigma_i) ,    (2.39)

here W represents the collection of parameters {N, {μ_i}, {σ_i}, {c_i}}. When \sum_i c_i = 1, the
mixture model is guaranteed to integrate to one. A mixture density need not be unimodal;
it may have as many as N peaks. Figure 2.3 contains a graph of a mixture-of-Gaussians
density with two equal components. Given a large number of Gaussians, almost any density
can be modeled accurately. As before, maximum likelihood can be used to se...
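Equation (2.39) can be sketched directly; the function names below are illustrative, and the two equal components match Figure 2.3. The check at the end confirms numerically that when the weights sum to one, the mixture integrates to one.

```python
import numpy as np

def gaussian(x, mu, sigma):
    # Unit-area Gaussian density g(x - mu, sigma).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture(x, cs, mus, sigmas):
    # M(x; W) = sum_i c_i g(x - mu_i, sigma_i), equation (2.39).
    return sum(c * gaussian(x, m, s) for c, m, s in zip(cs, mus, sigmas))

# Two equal components as in Figure 2.3; the weights sum to one.
cs, mus, sigmas = [0.5, 0.5], [-2.0, 2.0], [np.sqrt(0.3)] * 2

# Numerical check that the mixture integrates to one (Riemann sum on
# a grid wide enough that the tails are negligible).
grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]
area = float(np.sum(mixture(grid, cs, mus, sigmas)) * dx)
```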