Therefore the minimum empirical entropy should be obtained when the variance is zero. This difficulty only arises when the sample from which the density is estimated is the same as the sample with which empirical entropy is calculated. If these two samples are different, and the density is not degenerate in some way, then no point should appear in both samples. As the variance of the smoothing functions tends to zero the density at all points that are not in the Parzen sample tends to zero. As a result the empirical entropy should tend toward positive infinity as the variance tends to zero. This effectively precludes the solution where the variance of the smoothing functions is zero.

47 Paul A. Viola CHAPTER 2. PROBABILITY AND ENTROPY

[Figure 2.9 plot: vertical axis "Log Likelihood", ticks from 1 to 5.5; horizontal axis logarithmic, ticks from 0.001 to 100.]

Figure 2.9: A log plot of negative log likelihood versus the width of the smoothing functions. Near the minimum, the log likelihood is not terribly sensitive to the width. Values within a factor of 10 are all roughly equivalent.

We can simulate having two different samples by a process called cross-validation. Cross-validation splits a single sample $a$ into two samples. One sample has a single point $\{x\}$ and the second contains the remaining points $a - \{x\}$. There are $N_a$ different ways to split the sample in two parts. Rather than draw two different samples, we use the $N_a$ different split samples. In each case the larger sample, of size $N_a - 1$, is used for the Parzen estimate, and the smaller sample is used to estimate the entropy. Estimating log likelihood or empirical entropy with two samples $a$ and $b$ yields

$$ \log \ell(b) = -N_b h_b(X) = -N_b E_b[\log P(X; a)], \tag{2.53} $$

versus

$$ -N_a E_a[\log P(x_a; a - \{x_a\})], \tag{2.54} $$

using cross-validation. The cross-validated empirical entropy is an unbiased estimate of the two-sample empirical entropy.
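
As a rough illustration of the two estimates above, the following Python sketch compares the negative log likelihood scored on the Parzen sample itself with the cross-validated score of (2.54), in which each point $x_a$ is scored against $a - \{x_a\}$. The Gaussian smoothing function, the synthetic sample, the grid of variances, and every function name are assumptions made for this example and do not come from the text.

```python
import math
import random


def gaussian(u, var):
    """One-dimensional Gaussian smoothing function with variance `var`."""
    return math.exp(-u * u / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def parzen(x, sample, var):
    """Parzen estimate at x: the mean of Gaussians centred on the sample points."""
    return sum(gaussian(x - xa, var) for xa in sample) / len(sample)


def resubstitution_nll(sample, var):
    """Negative log likelihood when one sample plays both roles."""
    return -sum(math.log(parzen(x, sample, var)) for x in sample)


def cross_validated_nll(sample, var):
    """Negative log likelihood with each point scored against the other points."""
    total = 0.0
    for i, x in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        p = parzen(x, rest, var)
        # At tiny variances the estimate can underflow to exactly zero.
        total += -math.log(p) if p > 0.0 else float("inf")
    return total


# Hypothetical data: 50 draws from a unit Gaussian, fixed seed for repeatability.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
variances = [10.0 ** e for e in range(-6, 3)]

for var in variances:
    print(f"{var:12.6f} {resubstitution_nll(sample, var):12.2f} "
          f"{cross_validated_nll(sample, var):12.2f}")

best_resub = min(variances, key=lambda v: resubstitution_nll(sample, v))
best_cv = min(variances, key=lambda v: cross_validated_nll(sample, v))
print("best variance, same sample:", best_resub)
print("best variance, cross-validated:", best_cv)
```

On data like this the same-sample score keeps falling as the variance shrinks, so it selects the smallest variance on the grid, while the cross-validated score blows up there and selects an interior value.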
48 2.4. MODELING DENSITIES AITR 1548

The Quality of the Parzen Estimate

One way to evaluate the quality of the Parzen estimate is to evaluate the standard deviation of its estimate. Another, perhaps more useful statistic is the standard deviation normalized by the mean

$$ \frac{\sigma(X)}{E(X)}. \tag{2.55} $$

The normalized standard deviation measures expected deviation from the mean as a function of the overall scale. For many types of problems, when the mean of a variable is large, small deviations about the mean are usually unimportant. But when the mean is very small, a small deviation can make a big difference. Normalized standard deviation is a good measure to use when the log of a variable is important (like log likelihood and entropy). Using the constant and linear terms of the Taylor expansion of logarithm and assuming that the standard deviation of $X$ is small,

$$ \sigma(\log X) \approx \frac{\sigma(X)}{E(X)}. \tag{2.56} $$

The standard deviation of the Parzen density estimate at a point $x$ is a function of the total number of sample points used to estimate the density. The normalized standard deviation of a Parzen estimate is

$$ \frac{\sigma(P(x; a))}{E(P(x; a))} = \frac{\sigma(P(x; a))}{p(X)}, \tag{2.57} $$

where both the standard deviation and the expectation are taken over the space of possible samples. The two expressions are equal whenever $P(x; a)$ is an unbiased estimator for $p(X)$.
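
The first-order approximation (2.56) can be checked numerically. The sketch below is only an illustration: the two test distributions (a narrow Gaussian around 5 and a wide uniform on [0.5, 9.5]) and the helper names are assumptions chosen for the example. It compares the standard deviation of $\log X$ with the normalized standard deviation of $X$ in a case where the spread is small relative to the mean and in a case where it is not.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def normalized_std(values):
    """Standard deviation divided by the mean, as in (2.55)."""
    return std(values) / mean(values)


def std_of_log(values):
    """Standard deviation of log X; all values must be positive."""
    return std([math.log(v) for v in values])


rng = random.Random(1)
# Hypothetical variables: both have mean 5, only the spread differs.
small = [rng.gauss(5.0, 0.1) for _ in range(20000)]
wide = [rng.uniform(0.5, 9.5) for _ in range(20000)]

print("small spread:", normalized_std(small), std_of_log(small))
print("wide spread: ", normalized_std(wide), std_of_log(wide))
```

The two numbers should agree closely for the narrow variable and drift apart for the wide one, which is why the text requires the standard deviation of $X$ to be small.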
The standard deviation of the Parzen estimate can be computed exactly when the smoothing functions are box car functions. The Parzen estimate is then the number of sample points that fall into the box car window divided by the total number of sample points,

$$ P(x; a) = \frac{1}{N_a} \sum_{x_a \in a} R(x - x_a) = \frac{k N_{in}}{N_a}, \tag{2.58} $$

where $N_{in}$ is the number of points for which $R(x - x_a)$ is nonzero, and $k = R(0)$ is chosen so that $P$ integrates to one. The standard deviation of the Parzen estimate is

$$ \sigma(P(x; a)) = \frac{k}{N_a} \sigma(N_{in}). \tag{2.59} $$

Assuming each point in the sample is independent, then

$$ \sigma(P(x; a)) = \frac{k}{N_a} \sqrt{N_a \frac{P(x; a)}{k} \left(1 - \frac{P(x; a)}{k}\right)}. \tag{2.60} $$

The normalized standard deviation of the Parzen estimate is then

\begin{align}
\frac{\sigma(P(x; a))}{E(P(x; a))} &= \frac{1}{E(P(x; a))} \frac{k}{N_a} \sqrt{N_a \frac{P(x; a)}{k} \left(1 - \frac{P(x; a)}{k}\right)} \tag{2.61} \\
&\approx \frac{1}{P(x; a)} \frac{k}{N_a} \sqrt{N_a \frac{P(x; a)}{k} \left(1 - \frac{P(x; a)}{k}\right)} \tag{2.62} \\
&= \frac{\sqrt{k}}{\sqrt{N_a}} \sqrt{\frac{1 - P(x; a)/k}{P(x; a)}}. \tag{2.63}
\end{align}

The Parzen estimate has a larger normalized standard deviation at points where the estimated probability is small. The Parzen density estimate converges to the true density at a rate proportional to $1/\sqrt{N_a}$.
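
The box car result can be compared with a small Monte Carlo experiment. The following sketch makes several assumptions that are not in the text: the true density is uniform on [0, 1], the estimate is evaluated at $x = 0.5$, the window width is 0.1 (so $k = 10$), and the true density value 1 is plugged in for $P(x; a)$ in the closed-form expression. The function names are hypothetical.

```python
import math
import random


def boxcar_parzen(x, sample, width):
    """Box car Parzen estimate: k * N_in / N_a with k = 1 / width."""
    k = 1.0 / width
    n_in = sum(1 for xa in sample if abs(x - xa) <= width / 2.0)
    return k * n_in / len(sample)


def predicted_normalized_std(p, k, n_a):
    """Closed form sqrt(k / N_a) * sqrt((1 - p / k) / p)."""
    return math.sqrt(k / n_a) * math.sqrt((1.0 - p / k) / p)


def simulated_normalized_std(x, n_a, width, trials, rng):
    """Std / mean of the estimate over many independently drawn samples."""
    estimates = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n_a)]  # uniform on [0, 1)
        estimates.append(boxcar_parzen(x, sample, width))
    m = sum(estimates) / trials
    var = sum((e - m) ** 2 for e in estimates) / trials
    return math.sqrt(var) / m


rng = random.Random(2)
width = 0.1
k = 1.0 / width
results = {}
for n_a in (100, 400):
    sim = simulated_normalized_std(0.5, n_a, width, 2000, rng)
    pred = predicted_normalized_std(1.0, k, n_a)
    results[n_a] = (sim, pred)
    print(n_a, "simulated:", round(sim, 3), "predicted:", round(pred, 3))
```

Under these assumptions the predicted values are 0.30 for $N_a = 100$ and 0.15 for $N_a = 400$, so quadrupling the sample size halves the normalized standard deviation, in line with the $1/\sqrt{N_a}$ rate.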
The definition of Parzen window estimation can be generalized to higher dimensions by replacing the one-dimensional smoothing functions by their $d$-dimensional counterparts (see Section 2.4 for a $d$-dimensional Gaussian). Though the definition of Parzen estimation is the same for any number of dimensions, the behavior of the algorithm can be very different. As the number of dimensions grows the number of data points required rapidly increases. In $d$ dimensions, the window of support of a Gaussian smoothing function is a $d$-dimensional sphere whose radius $r$ is a function of its standard deviation. The volume of a $d$-dimensional sphere of radius $r$ is $V_d r^d$, where $V_d$ is a constant dependent only on the dimension. Assuming that all of the sample data is c...