The densities of the different projections are shown in Figure 3.8.

3.5 Conclusion
This chapter has presented a new technique for estimating the entropy of a distribution
called EMMA. Provided the density being approximated is smooth, we have proven that
the technique will converge to the correct entropy estimate. Moreover, we have presented
a computationally efficient stochastic technique for manipulating entropy. For reasonable
sample sizes, the technique is not guaranteed to optimize true entropy. Instead it optimizes
a very similar statistic that retains all of the salient characteristics of entropy.

Paul A. Viola CHAPTER 3. EMPIRICAL ENTROPY MANIPULATION AND STOCHASTIC GRADIENT DESCENT

Figure 3.6: The Parzen density estimates of Y_PCA and Y_ECA from the previous graph.
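The Parzen estimates referred to in Figure 3.6 can be sketched in a few lines. The following is a generic Gaussian-kernel Parzen window estimate, not the thesis code; the sample points, kernel width, and function name are invented for illustration.

```python
import math

def parzen_density(x, sample, sigma=0.25):
    """Parzen window estimate of p(x): the average of Gaussian kernels
    centered on each sample point, with kernel width sigma."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return sum(norm * math.exp(-0.5 * ((x - s) / sigma) ** 2)
               for s in sample) / len(sample)

# A small bimodal sample: the estimate is high at the modes and
# low in the valley between them.
sample = [-1.1, -0.9, -1.0, 0.9, 1.1, 1.0]
print(parzen_density(1.0, sample))   # high: near a mode
print(parzen_density(0.0, sample))   # low: between the modes
```

Because the estimate is an average of densities, it integrates to one; the kernel width sigma plays the same smoothing role discussed in the convergence argument above.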
We have also described a simple application of EMMA. EMMA enables us to find low
dimensional projections of higher dimensional data that minimize or maximize entropy.

3.5. CONCLUSION AI-TR 1548

Figure 3.7: A scatter plot of a 400 point sample from a two dimensional density (the ECA-MIN projection axis is marked in the original figure). Each cluster
has very high kurtosis along the horizontal axis. See text for description of projection axes.
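The projection-finding application illustrated by these figures can be sketched concretely: project 2-D data onto a direction, estimate the empirical entropy of the projection with a leave-one-out Parzen estimate, and search for the direction of minimum entropy. This is a toy stand-in for EMMA, using a coarse grid search over angles rather than stochastic gradient descent; the data, kernel width, and names are invented for illustration.

```python
import math, random

def entropy_of_projection(points, theta, sigma=0.3):
    """Empirical entropy of the data projected onto direction theta:
    h ~ -(1/N) * sum_i log p(y_i), with p a leave-one-out Parzen estimate."""
    c, s = math.cos(theta), math.sin(theta)
    y = [c * px + s * py for px, py in points]
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    h = 0.0
    for i, yi in enumerate(y):
        p = sum(norm * math.exp(-0.5 * ((yi - yj) / sigma) ** 2)
                for j, yj in enumerate(y) if j != i) / (len(y) - 1)
        h -= math.log(p)
    return h / len(y)

random.seed(0)
# Two tight clusters separated along the x axis, with large spread in y:
# projecting onto x yields two sharp modes (low entropy); projecting
# onto y yields one broad blur (high entropy).
points = [(random.gauss(m, 0.2), random.gauss(0.0, 1.0))
          for m in (-2.0, 2.0) for _ in range(50)]
thetas = [k * math.pi / 36 for k in range(36)]
best = min(thetas, key=lambda t: entropy_of_projection(points, t))
print(best)  # aligned with the x axis (theta near 0 or pi)
```

Maximizing instead of minimizing the same statistic would select the most "unstructured" projection; EMMA itself manipulates this estimate with stochastic gradient steps rather than an exhaustive search.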
Figure 3.8: The densities along various projection axes (the PCA density is plotted against position).
Chapter 4
Matching and Alignment
This chapter is perhaps the most important in this thesis. Previous chapters have presented
the mathematics and algorithms that underlie the computation of empirical entropy. We
have already seen that empirical entropy can be used to define a new algorithm for finding
the most informative projection of a distribution. This chapter will show that matching
and alignment can also be formulated as an entropy problem. In addition, we will discuss
the intuition behind our framework and suggest some simplified schemes that reflect these
intuitions. Throughout this chapter a number of synthetic alignment problems will drive our
discussion.
We will begin with a rederivation of correlation as a maximum likelihood method. This
derivation will make clear the assumptions under which correlation will work, and when it
may fail. We will then attempt to generalize correlation so that it will work with a wider set of
inputs. While this generalization is theoretically straightforward, it will prove computationally
expensive.

Dropping our focus on correlation, we will define an intuitive approach to alignment
which is efficiently computable. Using this intuition we will then define an approximation to
a maximum likelihood technique that is both concrete and computable. Finally we will draw
a parallel between this technique and mutual information. Experimental data from synthetic
alignment problems will help us evaluate the proposed alignment techniques.
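The correlation-as-maximum-likelihood rederivation previewed above can be illustrated with a toy 1-D example: if the observed image is a shifted copy of the model plus i.i.d. Gaussian noise, the maximum likelihood shift is the one minimizing the sum of squared differences (equivalently, maximizing correlation). The signals and the `ssd` helper below are invented for illustration.

```python
def ssd(model, image, shift):
    """Sum of squared differences between the image and the shifted model.
    Under i.i.d. Gaussian noise, log-likelihood = -SSD / (2 * sigma^2) + const,
    so minimizing SSD over shifts is maximum likelihood pose estimation."""
    return sum((image[i] - model[i - shift]) ** 2
               for i in range(len(image))
               if 0 <= i - shift < len(model))

# A model signal and an observation of it shifted right by 3 samples.
model = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0]
image = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1]
best_shift = min(range(-4, 5), key=lambda t: ssd(model, image, t))
print(best_shift)  # 3: the maximum likelihood pose under Gaussian noise
```

The assumptions this sketch makes explicit, Gaussian noise and an identity imaging function, are exactly the ones whose failure motivates the generalizations developed in the rest of the chapter.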
This chapter will conclude with an entirely different motivation for the use of mutual
information in alignment. We will show how the alignment problem can be thought of as
a Minimum Description Length problem [Rissanen, 1978; Leclerc, 1988]. This formulation
will naturally focus on the task of coding efficiency and entropy minimization. A very similar
set of alignment equations will arise from these considerations.

4.1 Alignment
We are given two signals of time or space: u(x) and v(y). We will call u(x) the model.
Often it is a description of a physical object that has been computed with great care. For
example, in one of our experiments the model is an accurate three dimensional description of
a skull. The second signal, v(y), is an image of the model. In general the form and even the
coordinate systems of the model and image can be very different. For example, one of our
3D models is a collection of three dimensional points and normals; the corresponding image
is a two dimensional array of intensities. It is assumed that v(y) is an observation of u(x),
for example that v(y) is a picture of the skull u(x).
The relationship between u(x) and v(y) is based on the physics of imaging. The process
of constructing an observation has two separate components. The first component is called
a transformation, or pose, denoted T. It relates the coordinate frame of the model, x, to
the coordinate frame of the image, y. The transformation tells us which part of the model
is responsible for a particular pixel of the image. The second component is the imaging
function, F(u(x), q). The imaging function determines the value of image point v(T(x)). In
general a pixel's value may be a function both of the model and other exogenous factors.
For example, an image of an object depends not only on the object but also on the lighting.
The parameter, q, collects all of the exogenous influences into a single vector. The complete
imaging model is then:

    v(T(x)) = F(u(x), q) + η ,
    v(y) = F(u(T^{-1}(y)), q) + η ,

where η is a random variable that models noise in the imaging process.
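The imaging model can be made concrete with a small sketch, assuming T is a 1-D integer translation, F is a pointwise gain controlled by the exogenous parameter q (standing in for lighting), and the noise is Gaussian. All names and values here are illustrative, not the thesis implementation.

```python
import random

def synthesize_image(u, T_shift, F, q, sigma=0.05, rng=random):
    """Generate v(y) = F(u(T^-1(y)), q) + eta for a 1-D signal u,
    where T translates by T_shift and eta ~ N(0, sigma^2)."""
    v = []
    for y in range(len(u)):
        x = y - T_shift                      # x = T^-1(y) for a pure translation
        model_value = u[x] if 0 <= x < len(u) else 0.0
        v.append(F(model_value, q) + rng.gauss(0.0, sigma))
    return v

random.seed(1)
u = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
# F scales the model by an exogenous "lighting" gain q.
v = synthesize_image(u, T_shift=2, F=lambda m, q: q * m, q=0.5, sigma=0.01)
print([round(val, 2) for val in v])  # roughly u shifted right by 2 and halved
```

Alignment, in these terms, is the inverse problem: given u and v, recover T (and possibly q) despite the noise η.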
For a number of practical problems, the trans...