e over X. We show that the integral is approximated by
the Parzen estimator,
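$$
\begin{aligned}
p(X = x) &= \int_{-\infty}^{\infty} p(X = x \mid \tilde{X} = x')\, p(\tilde{X} = x')\, dx' \\
&= \int_{-\infty}^{\infty} p_\eta(x - x')\, p(\tilde{X} = x')\, dx' \\
&= E_{\tilde{X}}\left[ p_\eta(x - \tilde{X}) \right] \\
&\approx E_{\tilde{a}}\left[ p_\eta(x - x_a) \right] \\
&= \frac{1}{N_{\tilde{a}}} \sum_{x_a \in \tilde{a}} p_\eta(x - x_a) ,
\end{aligned}
$$
where $\tilde{a}$ is a sample of $\tilde{X}$ and $p_\eta$ is the density of the noise. The probability of the uncorrupted random variable $X$ is
approximated by the Parzen estimate constructed from the samples of $\tilde{X}$, where the smoothing
function is the density function of the noise. The probability of the corrupted random variable
can be derived from a very similar argument,
$$
p(\tilde{X} = x) \approx \frac{1}{N_{\tilde{a}}} \sum_{x_a \in \tilde{a}} (p_\eta * p_\eta)(x - x_a) .
$$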
The probability of a noise-corrupted random variable $\tilde{X}$ is approximated by the Parzen
estimate using the smoothing function $p_\eta * p_\eta$, the convolution of the noise density $p_\eta$ with itself. This result is independent of the density
of $X$. Often the noise $\eta$ is Gaussian, a very common assumption that we will return to in our
discussions of entropy. The smoothing function is then a Gaussian density that has twice the
variance of $\eta$ (that is, $\sqrt{2}$ times its standard deviation).

Finding the Best Smoothing Functions
As we have seen, when a priori information about the density is available Parzen estimation
will converge to the correct density. Moreover, when we know either that the density is smooth
or that it has been perturbed by noise, it is possible to find the correct smoothing function.
In the absence of a priori information, the quality of the Parzen estimate is dependent on the
variance of the smoothing functions. Figures 2.7 and 2.8 display the dependence of the
density estimate on this variance. Each shows the Parzen estimates computed from a 100-point sample
as the variance is changed. Notice that the actual density function that results is very dependent on
the variance. The qualitative nature of this dependence varies across the range of variances
shown. When the variance of the smoothing function is small, less than 0.1, the resulting
density changes very rapidly as variance is changed. Above 0.2, small changes in variance do
not change the resulting density nearly as rapidly.

45 Paul A. Viola CHAPTER 2. PROBABILITY AND ENTROPY

Figure 2.7: Five plots of the Parzen density estimates derived from a 100-point sample of
a Gaussian. The Gaussian has variance 1.0 and mean 0.0. The different estimates use a
different value for the variance of the component smoothing functions. The variances used
range over a factor of 256, from 0.005 to 1.28.
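The dependence on variance described above can be reproduced in a few lines. The following is a minimal sketch, not the code behind the figures: the function name `parzen_density`, the random seed, the evaluation grid, and the factor-of-four spacing between the five variances are all assumptions made for illustration. It builds a Gaussian Parzen estimate from a 100-point sample of a unit Gaussian and measures how "spiky" each estimate is by its total variation.

```python
import numpy as np

def parzen_density(x, sample, variance):
    """Gaussian Parzen estimate: the average of Gaussian kernels,
    one centered at each sample point, all sharing one variance."""
    x = np.asarray(x, dtype=float)[:, None]              # shape (M, 1)
    centers = np.asarray(sample, dtype=float)[None, :]   # shape (1, N)
    kernels = np.exp(-(x - centers) ** 2 / (2.0 * variance))
    kernels /= np.sqrt(2.0 * np.pi * variance)
    return kernels.mean(axis=1)                          # shape (M,)

rng = np.random.default_rng(0)               # seed is arbitrary (assumption)
sample = rng.normal(0.0, 1.0, size=100)      # 100 points, mean 0.0, variance 1.0
grid = np.linspace(-6.0, 6.0, 2401)          # evaluation points (assumption)
variances = [0.005, 0.02, 0.08, 0.32, 1.28]  # assumed spacing: factor of 4

estimates = {v: parzen_density(grid, sample, v) for v in variances}

# Total variation of each curve: large when the estimate is a forest of
# narrow bumps, small when it is a single smooth hump.
roughness = {v: float(np.abs(np.diff(p)).sum()) for v, p in estimates.items()}

for v in variances:
    print(f"variance={v:<6} total variation={roughness[v]:.3f}")
```

Every estimate integrates to one regardless of the variance; only its shape changes, from a set of narrow bumps at small variance to a single broad hump at large variance.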
Selection of the correct variance for the smoothing functions need not be a hit or miss
process. Much in the same way that likelihood can be used to find the parameters of a
Gaussian to fit a sample, likelihood can be used to find the variance of the Gaussians that
make up the Parzen estimate. In general it is possible to compute the best variance for each
Gaussian in the Parzen density estimate separately. This process requires a great deal of time
and data. Since we wish to preserve the simplicity of the Parzen estimate, a single variance
will be used for all of the smoothing functions.
Recall that likelihood is maximized when empirical entropy is minimized (see Section 2.3.1).
Since subsequent chapters will focus on empirical entropy, we will use empirical entropy to
estimate the optimal variance. Figure 2.9 graphs the empirical entropy of the sample versus
variance. The sample used in this graph is the same as was used to estimate the densities in
Figures 2.7 and 2.8. The broad minimum in entropy at a variance of 0.25 implies that the Parzen density
estimate is not critically dependent on variance. The variance need only be within a factor
of ten of the optimal variance.
46 2.4. MODELING DENSITIES AI-TR 1548

Figure 2.8: A parametric surface plot of the Parzen density versus variance (this is the same
data shown in the previous graph). The horizontal and vertical axes are the location and
density respectively. Variance changes with depth in the graph. Here variance ranges from
0.80 to 0.01.
The true entropy of a Gaussian with variance 1.0 is 1.419. The optimal Parzen density
estimate has an empirical entropy of 1.47. This close agreement is not a coincidence. It is
argued in the next chapter that the true entropy of a density can be effectively estimated
from a Parzen density estimate.
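As a check on the figure of 1.419, the differential entropy of a Gaussian has a closed form. Assuming entropy is measured in nats (natural logarithm), which the quoted value is consistent with, and writing $\sigma^2$ for the variance (a symbol introduced here for illustration), a variance of 1.0 gives:

$$
h(X) = \tfrac{1}{2} \ln\left( 2 \pi e \, \sigma^2 \right) = \tfrac{1}{2} \ln\left( 2 \pi e \right) \approx 1.419 .
$$

The empirical value of 1.47 therefore differs from the true entropy by about 0.05 nats.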
There is a small technical note that should not be overlooked. We must be careful
whenever the same sample is used both to construct the Parzen estimate and to estimate
entropy. Recall that the most likely, or lowest entropy, density estimate for a sample is a
collection of delta functions centered at each point from the sample (see Section 2.2.1). We also know
that this delta function density will have an entropy of negative infinity. The Parzen density
is very similar in form to the delta function density. It too centers a function at each point
from the sample. In the limit as the variance of the smoothing functions tends towards zero,
each smoothing function approximates a delta function. T...
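The limiting behavior described in this paragraph can be observed numerically. The sketch below is an illustration only: the function name `resubstitution_entropy`, the seed, and the list of variances are assumptions. It scores a Gaussian Parzen estimate on the very sample used to build it, keeping each point's own kernel in the sum, and shows the empirical entropy falling without an apparent lower bound as the variance shrinks.

```python
import numpy as np

def resubstitution_entropy(sample, variance):
    """Empirical entropy, in nats, of a Gaussian Parzen estimate scored
    on the same sample that was used to construct it (illustrative helper)."""
    a = np.asarray(sample, dtype=float)
    diff = a[:, None] - a[None, :]
    k = np.exp(-diff ** 2 / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)
    density = k.mean(axis=1)      # each point's own kernel is included
    return float(-np.mean(np.log(density)))

rng = np.random.default_rng(0)                # seed is arbitrary (assumption)
sample = rng.normal(0.0, 1.0, size=100)
variances = [1e-2, 1e-4, 1e-6, 1e-8]          # shrinking toward zero
entropies = [resubstitution_entropy(sample, v) for v in variances]

for v, h in zip(variances, entropies):
    print(f"variance={v:.0e}  empirical entropy={h:.3f}")
```

Once the kernels are much narrower than the spacing between sample points, each point sees essentially only its own kernel, so the estimate behaves like $\ln N + \tfrac{1}{2}\ln(2\pi \cdot \text{variance})$ and keeps decreasing as the variance goes to zero.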