... Section 5.1.3, because neither edges, nor their precursor, gradient magnitude, are stable in image position
with respect to lighting changes (see Figure 5.16). While our technique works well using
only shading, it also works well in domains having surface property discontinuities and
silhouette information (see Section 5.1.1).

Paul A. Viola CHAPTER 7. CONCLUSION

7.1 Related Work
Alignment by extremizing properties of the joint signal has been used by Hill and Hawkes
(Hill et al., 1994) to align MRI, CT, and other medical image modalities. They use third-order
moments to characterize the clustering of the joint data. We believe that mutual information
is perhaps a more direct measure of the salient property of the joint data at alignment, and
demonstrate an efficient means of estimating and extremizing it.
There are many schemes that represent models and images by collections of edges and
define a distance metric between them that is proportional to the number of edges that coincide
(see the excellent survey articles: Besl and Jain, 1985; Chin and Dyer, 1986). A smooth,
optimizable version of this metric can be defined by introducing a penalty both for unmatched
edges and for the distance between those that are matched (Lowe, 1985; Wells III, 1992b;
Huttenlocher et al., 1991). This metric can then be used both for image-model comparison
and for pose refinement. Edge-based metrics can work under a variety of different lighting
conditions, but they make two very strong assumptions: the edges that arise are stable under
changes in lighting, and the models are well described as a collection of edges. Clearly,
smoothly curved objects are a real problem for these techniques. As we alluded to before, Wells
has performed a number of experiments where he attempts to match edges that are extracted
under varying lighting. In general, for even moderately curved objects, the number of unstable
and therefore unreliable edges is problematic. Faces, cars, fruit and a myriad of other objects
have proven to be very difficult to model using edges.
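To make the idea of a penalty on both unmatched edges and matched-edge distance concrete, here is a minimal sketch of one such cost. It is not the formulation of any of the cited papers: the function name `edge_match_cost`, the nearest-neighbor matching rule, and the parameters `max_dist` and `unmatched_penalty` are all assumptions chosen for illustration.

```python
import math

def edge_match_cost(model_edges, image_edges, max_dist=3.0, unmatched_penalty=None):
    """Sum a per-point penalty over model edge points (hypothetical cost).

    A model point whose nearest image edge point lies within max_dist is
    treated as matched and pays the squared distance. Otherwise it is
    treated as unmatched and pays a fixed penalty. With the default
    penalty of max_dist**2 the cost is continuous at the threshold.
    """
    if unmatched_penalty is None:
        unmatched_penalty = max_dist ** 2
    cost = 0.0
    for mx, my in model_edges:
        # Distance to the closest image edge point (infinite if there are none).
        nearest = min(
            (math.hypot(mx - ix, my - iy) for ix, iy in image_edges),
            default=math.inf,
        )
        if nearest <= max_dist:
            cost += nearest ** 2
        else:
            cost += unmatched_penalty
    return cost

# Two model points: one lies 1 pixel from an image edge, one has no nearby edge.
model = [(0.0, 0.0), (10.0, 0.0)]
image = [(0.0, 1.0), (50.0, 50.0)]
cost = edge_match_cost(model, image)
```

A pose refinement loop would transform `model_edges` by a candidate pose and search for the pose that minimizes this cost. The cost is only as reliable as the extracted edges, which is exactly the weakness discussed above.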
Others use more direct techniques to build models. Generally these approaches revolve
around the use of the image itself as an object model. Objects need not have edges to be well
represented in this way, but care must be taken to deal with changes in lighting and pose.
Turk and Pentland have used a large collection of face images to train a system to construct
representations that are invariant to some changes in lighting and pose (Turk and Pentland,
1991). These representations are a projection onto the largest eigenvectors of the distribution
of images within the collection. Their system addresses the problem of recognition rather
than alignment, and as a result much of the emphasis and many of the results are different.
For instance, it is not clear how much variation in pose can be handled by their system.
We do not see a straightforward extension of this or similar eigenspace work to the problem
of pose refinement. On a related note, Shashua has shown that all of the images, under
different lighting, of a Lambertian surface are a linear combination of any three of the images
(Shashua, 1992). This also bears a clear relation to the work of Turk and Pentland in that
the eigenvectors of a set of images of an object should span this three-dimensional space.
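The linear-combination property can be checked numerically with a small sketch. The assumptions here are ours, not the source's: shading is modeled as albedo times the dot product of surface normal and light vector, shadows are ignored (so intensities may be negative), the three basis lights are linearly independent, and all names (`render`, `solve3`, the specific light vectors) are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def render(normals, albedos, light):
    # Lambertian shading without shadows: intensity = albedo * (normal . light).
    return [rho * dot(n, light) for n, rho in zip(normals, albedos)]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(cols, rhs):
    """Solve c0*cols[0] + c1*cols[1] + c2*cols[2] = rhs by Cramer's rule."""
    m = [[cols[c][r] for c in range(3)] for r in range(3)]
    d = det3(m)
    coeffs = []
    for k in range(3):
        # Replace column k with the right-hand side.
        mk = [[rhs[r] if c == k else m[r][c] for c in range(3)] for r in range(3)]
        coeffs.append(det3(mk) / d)
    return coeffs

# A synthetic "surface": random unit normals and albedos, one per pixel.
rng = random.Random(0)
normals = []
for _ in range(50):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    length = sum(x * x for x in v) ** 0.5
    normals.append([x / length for x in v])
albedos = [rng.uniform(0.2, 1.0) for _ in normals]

# Three basis images under linearly independent lights, plus a novel light.
lights = [(1.0, 0.0, 0.2), (0.0, 1.0, 0.3), (0.3, 0.2, 1.0)]
new_light = (0.5, -0.4, 0.8)
basis = [render(normals, albedos, s) for s in lights]

# Express the new light in terms of the basis lights, then reuse the same
# coefficients to combine the basis images.
coeffs = solve3(lights, new_light)
predicted = [sum(c * img[p] for c, img in zip(coeffs, basis))
             for p in range(len(normals))]
actual = render(normals, albedos, new_light)
max_error = max(abs(p - a) for p, a in zip(predicted, actual))
```

Under these assumptions `max_error` is at the level of floating-point round-off, because the shading model is linear in the light vector. Clamping negative values to model attached shadows would break the exact equality.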
Entropy is playing an ever-increasing role within the field of neural networks. We know
of no work on the alignment of models and images, but there has been work using entropy
and information in vision problems. None of these techniques uses a nonparametric scheme
for density/entropy estimation as we do. In most cases the distributions are assumed to be
either binomial or Gaussian. This both simplifies and limits such approaches.
Linsker has used the concept of information maximization to motivate a theory of
development in the primary visual cortex (Linsker, 1986). He has been able to predict the
development of receptive fields that are very reminiscent of the ones found in the primate
visual cortex. He uses a Gaussian model both for the signal and the noise.
Becker and Hinton have used the maximization of mutual information as a framework for
learning different low-level processing algorithms such as disparity estimation and curvature
estimation (Becker and Hinton, 1992). They assume that the signals whose mutual
information is to be maximized are Gaussian. In addition, they assume that the only joint information
between images is the information that they wish to extract (i.e. they train their disparity
detectors on random dot stereograms).
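The appeal of the Gaussian assumption is that mutual information then has a closed form. For two jointly Gaussian scalar signals with correlation coefficient rho, the mutual information is -0.5 * ln(1 - rho^2) nats. The sketch below applies that standard identity to sample data; it is not Becker and Hinton's actual objective, and the function names and synthetic signals are assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def gaussian_mutual_information(xs, ys):
    """Mutual information in nats, valid only if (x, y) is jointly Gaussian.

    Diverges as |rho| approaches 1, so perfectly correlated inputs will
    raise an error from math.log.
    """
    rho = pearson(xs, ys)
    return -0.5 * math.log(1.0 - rho * rho)

# Synthetic signals: y is x plus independent unit-variance noise, so the
# true rho^2 is 0.5 and the true mutual information is 0.5 * ln(2).
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]
zs = [rng.gauss(0.0, 1.0) for _ in xs]  # independent of xs

mi_dependent = gaussian_mutual_information(xs, ys)
mi_independent = gaussian_mutual_information(xs, zs)
```

The estimate depends on the data only through a second-order statistic, which is what makes it simple and also what limits it: any dependence between the signals that does not show up as linear correlation is invisible to it.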
Finally, Bell has used a measure of information to separate signals that have been linearly
mixed together (Bell and Sejnowski, 1995). His technique assumes that the different mixed
signals carry little mutual information. While he does not assume that the distribution has
a particular functional form, he does assume that the distribution is well matched to a
pre-selected transfer function. For example, a Gaussian is well matched to the logistic function
because applying a correctly positioned and scaled logistic function results in a uniform
distribution.

7.2 A Parallel with Geometric...