95. Their system can estimate the emotional content of these faces, i.e.,
it can determine that you have drawn a very happy face that is somewhat surprised. Their
system works by constructing a non-rigid transformation that maps a novel drawing onto a
hand-drawn "neutral" face (see Figure 6.10). The shape of the non-rigid transformation
determines the emotion of the face.

142 6.2. ALIGNMENT OF LINE DRAWINGS AI-TR 1548

Figure 6.6: At left is the MRI scan shown with a modified intensity scale. The inhomogeneity
is now much more apparent. In the center is the corrected scan shown using the same intensity
scale. At right is the estimated correction field.

Figure 6.7: A coronal slice from an MRI scan of a head.

Jones and Poggio use a representation for non-rigid transformations that is called flow. A
flow is an image of displacement vectors. They search for a flow that minimizes the difference
between the base image b(x) and the novel image n(x),
\[ C(f) = \sum_x \big( n(x) - b(x + f(x)) \big)^2 , \]
where f is the flow image and the summation is over all of the pixels in the novel image. The
problem of non-rigidly transforming one image into another has been very heavily studied in
computer vision. In most previous work flow is represented directly as an image of displacement
vectors. The search problem is then conditioned with the addition of a smoothness prior
over flow images. Jones and Poggio decompose flow into a linear combination of component
flows,
\[ C(\{\alpha_i\}) = \sum_x \Big( n(x) - b\big(x + \sum_i \alpha_i f_i(x)\big) \Big)^2 . \]
The search for flow then becomes an unconstrained optimization over the parameters $\alpha_i$.

143 Paul A. Viola CHAPTER 6. OTHER APPLICATIONS OF EMMA

Figure 6.8: The distribution of pixel values in the MRI scan before and after correction
(plot legend: Corrupted, Corrected). The solid line is the original distribution. The dashed
line is the corrected distribution.

Figure 6.9: The same scan but with a modified intensity scale. Points above the intensity for
grey matter appear white. Points below the intensity for white matter appear black. There
is a linear scale between. Notice that the lower part of the image is darker in the uncorrected
image. On the right is the estimated bias.

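As a sketch of what this optimization involves, the gradient of the parameterized cost with respect to each coefficient follows from the chain rule. The derivation below assumes that the base image b is differentiable (for example, because it is interpolated between pixels). It writes $\alpha_i$ for the coefficients and $f_i$ for the component flows; the residual $r(x)$ is a name introduced here only for brevity.

\[
C(\{\alpha_i\}) = \sum_x r(x)^2 , \qquad
r(x) = n(x) - b\Big(x + \sum_i \alpha_i f_i(x)\Big),
\]
\[
\frac{\partial C}{\partial \alpha_j}
  = -2 \sum_x r(x) \, \nabla b\Big(x + \sum_i \alpha_i f_i(x)\Big) \cdot f_j(x) .
\]

Each pixel contributes one term to the sum, so an estimate of the gradient can be formed from any subset of the pixels.
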
Figure 6.10: The first face is the "neutral" face. The others are "frown", "narrow eyes",
"surprise", "eyebrows", and "smile".

Each component flow represents a different type of emotion. For example, one kind of
flow might transform a neutral face into a smiling face. Another might make a face look
more surprised. These flows can be mixed together to produce images that have a combination
of properties. Given a novel image, a set of coefficients $\alpha_i$ is determined that provides
the closest match with the neutral face. These coefficients determine to what extent the face
is smiling or frowning. Figure 6.11 shows several novel images and the best reconstruction
obtained by transforming the neutral face.
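
Determining the coefficients for a novel image amounts to minimizing the squared difference between the novel image and the warped base image. The following is a minimal sketch of one way to do this, using stochastic gradient descent over randomly sampled pixels. It is not the authors' implementation: the one-dimensional toy signal, the two component flows (a shift and a stretch), linear interpolation, the step size, and all function names are assumptions made purely for illustration.

```python
import math
import random

def sample(signal, pos):
    """Linearly interpolate `signal` at real-valued `pos`; return (value, slope)."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)   # clamp to the signal's extent
    i = min(int(pos), len(signal) - 2)
    slope = signal[i + 1] - signal[i]
    return signal[i] + (pos - i) * slope, slope

def displacement(alphas, flows, x):
    """Displacement at pixel x: the linear combination of component flows."""
    return sum(a * f[x] for a, f in zip(alphas, flows))

def cost(novel, base, flows, alphas, pixels):
    """Mean squared difference between the novel and the warped base signal."""
    total = 0.0
    for x in pixels:
        warped, _ = sample(base, x + displacement(alphas, flows, x))
        total += (novel[x] - warped) ** 2
    return total / len(pixels)

def sgd_fit(novel, base, flows, pixels, steps=500, batch=16, rate=2.0, seed=0):
    """Estimate the coefficients from gradients computed on random pixel subsets."""
    rng = random.Random(seed)
    alphas = [0.0] * len(flows)
    for _ in range(steps):
        grad = [0.0] * len(flows)
        for x in rng.sample(pixels, batch):
            warped, slope = sample(base, x + displacement(alphas, flows, x))
            residual = novel[x] - warped
            for j, f in enumerate(flows):
                # d(residual^2)/d(alpha_j) = -2 * residual * slope * f_j(x)
                grad[j] += -2.0 * residual * slope * f[x] / batch
        alphas = [a - rate * g for a, g in zip(alphas, grad)]
    return alphas

# Toy data (hypothetical): a sinusoidal base signal and two component flows.
N = 64
base = [math.sin(2 * math.pi * x / 16.0) for x in range(N)]
flows = [[1.0] * N,                                   # uniform shift
         [(x - N / 2) / (N / 2) for x in range(N)]]   # stretch about the center
true_alphas = [1.5, -1.0]
novel = [sample(base, x + displacement(true_alphas, flows, x))[0] for x in range(N)]

pixels = list(range(4, N - 4))   # interior pixels, away from the clamped border
fitted = sgd_fit(novel, base, flows, pixels)
cost_before = cost(novel, base, flows, [0.0, 0.0], pixels)
cost_after = cost(novel, base, flows, fitted, pixels)
```

On this toy problem `cost_after` should come out far smaller than `cost_before`. Because each update touches only a small batch of pixels, a step costs a fraction of a full pass over the image.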
In their paper Jones and Poggio use Levenberg-Marquardt, a second order gradient descent
procedure, to determine the flow parameters (see William H. Press and Vetterling, 1992
for an excellent discussion of optimization techniques). Together we have replaced this
technique with a much simpler stochastic gradient descent procedure. Over the course of many
experiments stochastic alignment has improved running times from 60 seconds to 2 seconds.
The quality of the minima found with stochastic gradient descent is equivalent to, if not
better than, that of the minima found with Levenberg-Marquardt. Moreover, stochastic gradient
descent rarely if ever gets stuck in local minima. There were a number of cases where
Levenberg-Marquardt converged far from the best solution. On these same problems, stochastic
gradient descent almost always finds a good solution.

Figure 6.11: A series of novel faces with the best reconstruction displayed below.

Chapter 7

Maximization of mutual information appears to be a new and powerful means of performing
local alignment of objects and images. In a typical vision application it is an intensity-based,
rather than a feature-based, method. While intensity-based, it is more robust than traditional
correlation, as shown by the insensitivity to lighting demonstrated in the experiment of
Section 5.1.3. In addition, the method is insensitive to negating the image data, as well
as a variety of non-linear transformations, which would defeat conventional intensity-based
correlation.
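
The insensitivity to negation can be illustrated with a small sketch. It estimates mutual information from a simple joint histogram, which is an assumption made for this illustration and not necessarily the estimator used in this work. The synthetic data, bin count, and function names are likewise hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(u, v, bins=8, levels=256):
    """Histogram estimate of the mutual information (in nats) between u and v."""
    width = levels // bins
    pairs = [(a // width, b // width) for a, b in zip(u, v)]
    n = len(pairs)
    joint = Counter(pairs)
    count_u = Counter(a for a, _ in pairs)
    count_v = Counter(b for _, b in pairs)
    return sum((c / n) * math.log((c / n) / ((count_u[a] / n) * (count_v[b] / n)))
               for (a, b), c in joint.items())

def correlation(u, v):
    """Normalized (Pearson) correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

# Synthetic 8-bit "model" values and a noisy "image" of them (toy data).
rng = random.Random(0)
model = [rng.randrange(256) for _ in range(2000)]
image = [min(255, max(0, m + rng.randrange(-10, 11))) for m in model]
negated = [255 - p for p in image]   # photographic negative of the image

mi_plain = mutual_information(model, image)
mi_negated = mutual_information(model, negated)
corr_plain = correlation(model, image)
corr_negated = correlation(model, negated)
```

Negation only permutes the histogram bins, so the two mutual information estimates agree, while the correlation changes sign. An alignment procedure that maximizes correlation would therefore be misled by the negated image.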
The weaknesses of intensity correlation may be corrected, to some extent, by performing
correlations on the magnitude of the brightness gradient. This, as well as edge-based matching
techniques, can perform well on objects having discontinuous surface properties, or useful
silhouettes. These approaches work because the image counterparts of these discontinuities
are reasonably stable with respect to illumination.
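
A toy illustration of why gradient magnitude helps: in the sketch below, a one-dimensional signal with two step edges is compared with its contrast-reversed copy. The signal, the choice of contrast reversal as the change in appearance, and the function names are assumptions made for this example.

```python
import math

def correlation(u, v):
    """Normalized (Pearson) correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

def gradient_magnitude(signal):
    """Absolute forward difference between neighboring samples."""
    return [abs(signal[i + 1] - signal[i]) for i in range(len(signal) - 1)]

# A signal with two discontinuities, and a contrast-reversed version of it.
scene = [10] * 8 + [200] * 8 + [60] * 8
reversed_scene = [255 - p for p in scene]

intensity_corr = correlation(scene, reversed_scene)
gradient_corr = correlation(gradient_magnitude(scene),
                            gradient_magnitude(reversed_scene))
```

The intensities are anti-correlated, but the edges stay in the same places with the same strength, so the gradient magnitudes remain strongly correlated.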
Gradient magnitude correlation, as well as edge-based methods, can have serious difficulties in domains lacking discontinuities, such as the example shown in Se...