A Gaussian density is completely determined by its mean and its variance. Before we can attempt to model an unknown density with a density from this family, we must first decide if the density is Gaussian. Maximum likelihood model selection can then be used to estimate \mu and \sigma from a sample. The form of the Gaussian density makes finding the maximum likelihood parameters very easy. The log likelihood of a sample a of a Gaussian density is

    \log \ell(a) = \sum_{x_a \in a} \log P(x_a)                                                   (2.36)
                 = \sum_{x_a \in a} \log g(x_a - \mu)                                             (2.37)
                 = \sum_{x_a \in a} \left[ \log \frac{1}{\sqrt{2\pi\sigma}} - \frac{1}{2} \frac{(x_a - \mu)^2}{\sigma} \right]   (2.38)

The most likely \mu minimizes

    \sum_{x_a \in a} (x_a - \mu)^2 ,

a quadratic function of \mu. Differentiating and solving for zeroes, we find that the most likely \mu is

    \mu = \frac{1}{N_a} \sum_{x_a \in a} x_a ,

where N_a is the number of points in the sample a.

2.4. MODELING DENSITIES                                                              AI-TR 1548

Figure 2.1: Three views of a Gaussian density with a mean of 0.0 and a variance of 1.0. First, a sample of 100 points drawn from the density; each point is represented by a vertical black line. Second, the density of the true Gaussian. Third, the density of the Gaussian estimated from the sample (mean = 0.045, variance = 0.869).

This is a very satisfying result. We have proven that the most likely estimate for the mean, \mu, is the mean of the sample. A very similar argument can be used to prove that the maximum likelihood estimate for the variance, \sigma, is the sample variance:

    \sigma = \frac{1}{N_a} \sum_{x_a \in a} (x_a - \mu)^2 .

Figure 2.1 displays a 100 point sample drawn from a Gaussian density. The true density is shown together with the most likely model. Because the sample mean and sample variance are not perfect measures of the true mean and variance, the most likely model is not perfect. The accuracy of the estimated mean and variance improves as the sample size increases. Even for a sample of 100 points there is significant variability in the estimated model for different samples. Figure 2.2 shows ten different estimates from ten different samples of the same density.

Paul A. Viola                                         CHAPTER 2. PROBABILITY AND ENTROPY
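The closed-form maximum likelihood estimates derived above (sample mean and sample variance) can be checked numerically. A minimal Python sketch, with function names of my own choosing:

```python
import math

def ml_gaussian_fit(sample):
    """Maximum likelihood estimates for a 1-D Gaussian:
    the sample mean and the (biased, divide-by-N) sample variance."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n
    return mu, var

def log_likelihood(sample, mu, var):
    """Log likelihood of the sample under a Gaussian with mean mu
    and variance var, i.e. the sum in equation (2.38)."""
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (x - mu) ** 2 / (2 * var)
               for x in sample)
```

For any fixed variance, the log likelihood is a downward-opening quadratic in the mean, so nudging the mean away from the sample mean can only lower it; that is exactly the minimization argument in the text.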
Figure 2.2: The maximum likelihood density estimates for ten different samples of 100 points drawn from the same Gaussian.

2.4.2 Other Parametric Densities

Finding the most likely Gaussian model for a sample is a very efficient operation. The mean and the variance are trivially computable in linear time. Efficient estimation is a property shared by all of the exponential densities, a class of densities which includes the Gaussian density, the Gamma density, and others. For all other types of densities it is not possible to find maximum likelihood parameter estimates directly from statistics of the sample. The most likely set of parameters must be determined by a search process. Since there are an infinite number of possible parameter values, finding optimal values for these parameters is not straightforward. Generally, problems of this sort are solved using an iterative refinement process known as gradient descent. The gradient descent procedure is described in Appendix A.1.

The Gaussian density has many advantages. Why not use it to model every sample? The simple answer is that not all real densities are Gaussian. In fact, many real densities are far from Gaussian. One of the strongest limitations of the Gaussian, and of all the exponential densities, is that they are unimodal: they have a single peak. Modeling densities that have multiple peaks as if they had a single peak is foolhardy. Figure 2.3 shows an attempt to fit a two-peaked function with a single Gaussian. In many situations it may seem as though the simplicity and efficiency that arise from using a Gaussian density outweigh the added accuracy that arises from using a more accurate model. As we will see, this is a temptation to which many have succumbed.
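The iterative refinement described above can be sketched in a few lines. This toy Python version (names and parameters are my own, not from Appendix A.1 of the source) repeatedly steps a parameter against the gradient of an objective:

```python
def gradient_descent(grad, x0, rate=0.1, steps=200):
    """Iteratively refine x by stepping opposite the gradient of the
    objective; for maximum likelihood, the objective would be the
    negative log likelihood of the sample."""
    x = x0
    for _ in range(steps):
        x = x - rate * grad(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0)
```

Unlike the closed-form Gaussian estimates, this search only converges toward a local optimum, and the step size must be chosen with some care.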
Figure 2.3: Three views of a density constructed from a combination of two Gaussians. The Gaussians have variance 0.3 and means of 2.0 and -2.0 respectively. As before the sample contains 100 points. The maximum likelihood Gaussian has mean 0.213 and variance 3.824.

Once the decision to use a more complex model has been made, the set of possible model densities is literally infinite. In terms of accuracy this is an unambiguous advantage. A density can be modeled by any function that can be guaranteed to integrate to one. The most common model after the simple Gaussian is a mixture of Gaussians:

    M(x; W) = \sum_{i=1}^{N} c_i \, g_i(x - \mu_i) ,                                          (2.39)

where W represents the collection of parameters \{N, \{\mu_i\}, \{\sigma_i\}, \{c_i\}\}. When \sum_i c_i = 1, the mixture model is guaranteed to integrate to one. A mixture density need not be unimodal; it may have as many as N peaks. Figure 2.3 contains a graph of a mixture of Gaussian density with two equal components. Given a large number of Gaussians, almost any density can be modeled accurately. As before, maximum likelihood can be used to se...
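A mixture density along the lines of equation (2.39) is easy to evaluate directly; a minimal Python sketch (my own function names), using the two-component mixture of Figure 2.3 as the example:

```python
import math

def gaussian(x, mu, var):
    """Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture(x, weights, means, variances):
    """Mixture of Gaussians in the spirit of equation (2.39); it
    integrates to one whenever the weights sum to one."""
    return sum(c * gaussian(x, m, v)
               for c, m, v in zip(weights, means, variances))

# Two equal components with variance 0.3 and means +/- 2.0, as in Figure 2.3.
w, mus, vs = [0.5, 0.5], [2.0, -2.0], [0.3, 0.3]
```

A quick numerical integral of this bimodal density over a wide interval comes out very close to one, which is the normalization property the weight constraint buys.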