

Paul A. Viola. CHAPTER 2. PROBABILITY AND ENTROPY. 2.4. MODELING DENSITIES. AI-TR 1548.

Therefore the minimum empirical entropy should be obtained when the variance is zero.

This difficulty only arises when the sample from which the density is estimated is the same as the sample with which empirical entropy is calculated. If these two samples are different, and the density is not degenerate in some way, then no point should appear in both samples. As the variance of the smoothing functions tends to zero, the density at all points that are not in the Parzen sample tends to zero. As a result the empirical entropy should tend toward positive infinity as the variance tends to zero. This effectively precludes the solution where the variance of the smoothing functions is zero.

[Figure 2.9: A log plot of negative log likelihood versus the width of the smoothing functions. Near the minimum, the log likelihood is not terribly sensitive to the width. Values within a factor of 10 are all roughly equivalent. Vertical axis: log likelihood, 1 to 5.5. Horizontal axis: logarithmic, 0.001 to 100.]

We can simulate having two different samples by a process called cross-validation. Cross-validation splits a single sample $a$ into two samples. One sample has a single point $\{x\}$ and the second contains the remaining points $a - \{x\}$. There are $N_a$ different ways to split the sample in two parts. Rather than draw two different samples, we use the $N_a$ different split samples. In each case the larger sample, of size $N_a - 1$, is used for the Parzen estimate, and the smaller sample is used to estimate the entropy. Estimating log likelihood or empirical entropy with two samples $a$ and $b$ yields

$$\log \ell(b) = -N_b h_b(X) = -N_b E_b[\log P(X, a)] \tag{2.53}$$

versus

$$-N_a E_a[\log P(x_a, a - \{x_a\})] \tag{2.54}$$

using cross-validation. The cross-validated empirical entropy is an unbiased estimate of the two-sample empirical entropy.

The Quality of the Parzen Estimate

One way to evaluate the quality of the Parzen estimate is to evaluate the standard deviation of its estimate.
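The leave-one-out cross-validation procedure behind Equation 2.54 can be sketched in a few lines. This is only an illustration under stated assumptions: the Gaussian smoothing function, the unit-Gaussian test sample, the grid of widths, and all function names are hypothetical choices, and the empirical entropy is taken to be the negative mean log density. It compares scoring the sample against an estimate built from the same sample with scoring each point against the remaining $N_a - 1$ points.

```python
import numpy as np

def gaussian_kernel(u, sigma):
    """One-dimensional Gaussian smoothing function with standard deviation sigma."""
    return np.exp(-0.5 * (u / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def resubstitution_entropy(a, sigma):
    """Empirical entropy when sample a both builds and scores the Parzen estimate."""
    kernels = gaussian_kernel(a[:, None] - a[None, :], sigma)
    density = kernels.mean(axis=1)  # includes each point's own kernel
    return -np.mean(np.log(density))

def cross_validated_entropy(a, sigma):
    """Leave-one-out empirical entropy: score each x_a against a - {x_a}."""
    n_a = len(a)
    kernels = gaussian_kernel(a[:, None] - a[None, :], sigma)
    np.fill_diagonal(kernels, 0.0)             # remove x_a from its own estimate
    density = kernels.sum(axis=1) / (n_a - 1)  # Parzen estimate from N_a - 1 points
    with np.errstate(divide="ignore"):         # log(0) = -inf is the expected blow-up
        return -np.mean(np.log(density))

rng = np.random.default_rng(0)
a = rng.normal(size=200)                       # hypothetical sample from a unit Gaussian
sigmas = np.array([0.001, 0.01, 0.1, 1.0, 10.0])
resub = np.array([resubstitution_entropy(a, s) for s in sigmas])
crossval = np.array([cross_validated_entropy(a, s) for s in sigmas])
best_sigma = sigmas[np.argmin(crossval)]

for s, r, c in zip(sigmas, resub, crossval):
    print(f"sigma={s:<6} resubstitution={r:8.3f} cross-validated={c:8.3f}")
```

With the same-sample score, shrinking the width keeps lowering the entropy, whereas the cross-validated score grows without bound for very narrow smoothing functions and reaches its minimum at an intermediate width.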
Another, perhaps more useful, statistic is the standard deviation normalized by the mean,

$$\frac{\sigma(X)}{E[X]} \tag{2.55}$$

The normalized standard deviation measures expected deviation from the mean as a function of the overall scale. For many types of problems, when the mean of a variable is large, small deviations about the mean are usually unimportant. But when the mean is very small, a small deviation can make a big difference. Normalized standard deviation is a good measure to use when the log of a variable is important, as with log likelihood and entropy. Using the constant and linear terms of the Taylor expansion of the logarithm and assuming that the standard deviation of $X$ is small,

$$\sigma(\log X) \approx \frac{\sigma(X)}{E[X]} \tag{2.56}$$

The standard deviation of the Parzen density estimate at a point $x$ is a function of the total number of sample points used to estimate the density. The normalized standard deviation of a Parzen estimate is

$$\frac{\sigma(P(x, a))}{E[P(x, a)]} = \frac{\sigma(P(x, a))}{p(x)} \tag{2.57}$$

where both the standard deviation and the expectation are taken over the space of possible samples. The two expressions are equal whenever $P(x, a)$ is an unbiased estimator for $p(x)$.

The standard deviation of the Parzen estimate can be computed exactly when the smoothing functions are box car functions. The Parzen estimate is then the number of sample points that fall into the box car window divided by the total number of sample points,

$$P(x, a) = \frac{1}{N_a} \sum_{x_a \in a} R(x - x_a) = \frac{k N_{in}}{N_a} \tag{2.58}$$

where $N_{in}$ is the number of points for which $R(x - x_a)$ is non-zero, and $k = R(0)$ is chosen so that $P$ integrates to one. The standard deviation of the Parzen estimate is

$$\sigma(P(x, a)) = \frac{k}{N_a} \sigma(N_{in}) \tag{2.59}$$

Assuming each point in the sample is independent, then

$$\sigma(P(x, a)) = \frac{k}{N_a} \sqrt{N_a \frac{P(x, a)}{k} \left(1 - \frac{P(x, a)}{k}\right)} \tag{2.60}$$

The normalized standard deviation of the Parzen estimate is then

\begin{align}
\frac{\sigma(P(x, a))}{E[P(x, a)]} &= \frac{1}{E[P(x, a)]} \frac{k}{N_a} \sqrt{N_a \frac{P(x, a)}{k} \left(1 - \frac{P(x, a)}{k}\right)} \tag{2.61} \\
&\approx \frac{1}{P(x, a)} \frac{k}{N_a} \sqrt{N_a \frac{P(x, a)}{k} \left(1 - \frac{P(x, a)}{k}\right)} \tag{2.62} \\
&= \frac{1}{\sqrt{N_a}} \sqrt{\frac{k \left(1 - \frac{P(x, a)}{k}\right)}{P(x, a)}} \tag{2.63}
\end{align}

The Parzen estimate has a larger normalized standard deviation at points where the estimated probability is small. The Parzen density estimate converges to the true density at a rate proportional to $1/\sqrt{N_a}$.

The definition of Parzen window estimation can be generalized to higher dimensions by replacing the one-dimensional smoothing functions by their $d$-dimensional counterparts (see Section 2.4 for a $d$-dimensional Gaussian). Though the definition of Parzen estimation is the same for any number of dimensions, the behavior of the algorithm can be very different. As the number of dimensions grows, the number of data points required rapidly increases. In $d$ dimensions, the window of support of a Gaussian smoothing function is a $d$-dimensional sphere whose radius $r$ is a function of its standard deviation. The volume of a $d$-dimensional sphere of radius $r$ is $V_d r^d$, where $V_d$ is a constant dependent only on the dimension. Assuming that all of the sample data is c...
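The box car result in Equation 2.63 can be checked numerically. The sketch below is an illustration, not part of the original analysis: it assumes a hypothetical piecewise-constant density (0.9 on [0, 1) and 0.1 on [1, 2)), a window width of 0.2, and made-up function names. It draws many independent samples, forms the box car Parzen estimate at one high-density and one low-density point, and compares the measured ratio of standard deviation to mean with the prediction, using the known density value in place of $P(x, a)$.

```python
import numpy as np

def draw_sample(rng, n_a):
    """Hypothetical density: 0.9 on [0, 1) and 0.1 on [1, 2)."""
    u = rng.random(n_a)
    in_dense_half = rng.random(n_a) < 0.9
    return np.where(in_dense_half, u, 1.0 + u)

def boxcar_parzen(x, a, width):
    """Box car Parzen estimate k * N_in / N_a, with k = R(0) = 1 / width."""
    k = 1.0 / width
    n_in = np.count_nonzero(np.abs(a - x) < width / 2.0)
    return k * n_in / len(a)

def predicted_normalized_std(p, n_a, width):
    """Right-hand side of Equation 2.63, with the density value p in place of P(x, a)."""
    k = 1.0 / width
    return np.sqrt(k * (1.0 - p / k) / p) / np.sqrt(n_a)

rng = np.random.default_rng(0)
n_a, width, trials = 500, 0.2, 4000
measured, predicted = {}, {}
for x, p in ((0.5, 0.9), (1.5, 0.1)):
    # Repeat the estimate over many independent samples of size N_a.
    estimates = np.array([boxcar_parzen(x, draw_sample(rng, n_a), width)
                          for _ in range(trials)])
    measured[x] = estimates.std() / estimates.mean()
    predicted[x] = predicted_normalized_std(p, n_a, width)
    print(f"x={x}: measured={measured[x]:.4f} predicted={predicted[x]:.4f}")
```

Under these assumptions the measured and predicted values should agree to within sampling error, and the low-density point at x = 1.5 shows the larger normalized standard deviation.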