increase the mutual information, then the effect of the first term in the brackets
may be interpreted as acting to increase the squared distance between pairs of samples that
are nearby in image intensity, while the second term acts to decrease the squared distance
between pairs of samples that are nearby in both image intensity and the model properties.
It is important to emphasize that distances are in the space of values (intensities, brightness,
or surface properties), rather than coordinate locations.
The term \frac{d}{dT}(v_i - v_j) will generally involve gradients of the image intensities and the
derivative of the transformed coordinates with respect to the transformation. In the simple case
that T is a linear operator, we obtain the following outer product expression:

    \frac{d}{dT} v(T(x)) = \nabla v(T(x)) \, x^T .
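As a sanity check, the outer product form can be verified numerically against finite differences. The sketch below is illustrative only: the Gaussian-bump intensity function, the transformation matrix, and the sample point are all made-up assumptions, not quantities from the text.

```python
import numpy as np

# Hypothetical smooth "image" v(y): a 2-D Gaussian bump.
# (An assumption for illustration; any differentiable intensity would do.)
def v(y):
    return float(np.exp(-0.5 * np.dot(y, y)))

def grad_v(y):
    # Analytic gradient of the Gaussian bump: -y * v(y).
    return -y * v(y)

T = np.array([[1.1, 0.2],
              [-0.1, 0.9]])   # assumed linear transformation (pose)
x = np.array([0.3, -0.7])     # assumed sample coordinate

# Outer product expression: d v(T x) / dT = grad_v(T x) x^T
analytic = np.outer(grad_v(T @ x), x)

# Central finite-difference estimate of the same derivative, entry by entry.
eps = 1e-6
numeric = np.zeros_like(T)
for i in range(2):
    for j in range(2):
        dT = np.zeros_like(T)
        dT[i, j] = eps
        numeric[i, j] = (v((T + dT) @ x) - v((T - dT) @ x)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```

Each entry of the finite-difference matrix is \partial v / \partial y_i \cdot x_j, which is exactly the (i, j) entry of the outer product above.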
4.4 Matching and Minimum Description Length
There is another, entirely different, motivation for using mutual information as an alignment
metric. Alignment, and many other vision problems, can be reformulated as minimum description length (MDL) problems (Rissanen, 1978; Leclerc, 1988). MDL can provide us with
some new insight into the problem of alignment and help us derive a missing, and often useful,
term in the alignment equations.
The standard framework of MDL involves a sender and a receiver communicating descriptions of images. Given that the sender and the receiver have an agreed-upon language
for describing images, the sender's goal is to find a message that will accurately describe an
image in the fewest bits. The concept of description length is clearly related to the code
length introduced in Section 2.2.
For the problem of alignment we will assume that the sender and the receiver share the
same set of object models. The sender's goal is to communicate an image of one of these
Paul A. Viola CHAPTER 4. MATCHING AND ALIGNMENT
models. Knowing nothing else, the sender could simply ignore the models and send a message
describing the entire image. This would require a message that is on average as long as the
entropy of the image. However, whenever the image is an observation of a model, a more
efficient approach is possible. For example, the sender could send the pose of the model
and a description for how to render it. From these the receiver can reconstruct the part of
the original image in which the model lies. To send the entire image, the sender need only
encode the errors in this reconstruction, if there are any, and any part of the image that is
unexplained by the model. Alignment can be thought of as the process by which the sender
attempts to find the model pose that minimizes the code length of the overall message.
The encoding of the entire image has several parts: (1) a message describing the pose;
(2) a message describing the imaging function; (3) a message describing the errors in the
reconstruction; and (4) a message describing the parts of the image unexplained by the
model. The length of each part of the message is proportional to its entropy. We can assume
that poses are uniformly distributed, and that sending a pose incurs some small uniform
cost. The length of part (4) is the entropy of the image that is unexplained. Parts (2) and (3)
can be interpreted in two ways. We can assume that the imaging function can be sent with
a fixed or small cost. Part (3) is then proportional to the conditional entropy of the image
given the model and imaging function. This is precisely what was estimated and minimized
with weighted neighbor alignment. A second interpretation comes from EMMA. EMMA
estimates the joint entropy of the model and image, h(u, v). The conditional entropy of the
image given the model can be computed as h(v|u) = h(u, v) - h(u). Since the entropy of the
model is fixed, minimizing the joint entropy minimizes the conditional entropy. In both cases
entropy-based alignment, as proposed in the first part of this chapter, minimizes the cost of
sending parts (1), (2), and (3). MDL suggests that we must also minimize the entropy of the
unmodeled part of the image.
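The chain rule h(v|u) = h(u, v) - h(u) used above can be checked on a small discrete example. The joint distribution below is made-up for illustration; the identity holds for any joint distribution.

```python
import numpy as np

# Toy joint distribution p(u, v) over three discrete model values u (rows)
# and three image intensities v (columns). Made-up numbers for illustration.
p_uv = np.array([[0.20, 0.05, 0.00],
                 [0.05, 0.30, 0.05],
                 [0.00, 0.05, 0.30]])
assert np.isclose(p_uv.sum(), 1.0)

def entropy(p):
    # Shannon entropy in bits, skipping zero-probability entries.
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

h_joint = entropy(p_uv.ravel())     # h(u, v)
p_u = p_uv.sum(axis=1)              # marginal over v
h_u = entropy(p_u)                  # h(u)

# Conditional entropy computed directly: sum_u p(u) h(v | u)
h_v_given_u = sum(p_u[i] * entropy(p_uv[i] / p_u[i]) for i in range(len(p_u)))

# Chain rule: h(v|u) = h(u, v) - h(u)
assert np.isclose(h_v_given_u, h_joint - h_u)
```

Since h(u) does not depend on the pose, any pose change that lowers the joint entropy lowers the conditional entropy by exactly the same amount, which is why joint-entropy and conditional-entropy alignment coincide here.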
In the previous information theoretic formulation there was no concept of pixels or of
the proportion of the image explained by the model. In fact, in the previous formulation the
entropy of the explained part of the image could get larger as the model shrank. For example,
assume that the model covers a contiguous region of an image where most of the pixels have
constant value. At the center of this region is a small patch containing varied pixels. Recall
that the image is sampled at points that are projected from the model. Most of the model
points will project into the region of constant intensity and a few will project onto the varied
patch. The resulting distribution of image pixels, because it has many samples of the same
4.4. MATCHING AND MINIMUM DESCRIPTION LENGTH AI-TR 1548
value, has fairly low entropy. If the model were shrunk to cover only the varied patch, then
all of the points from the model would fall in the varied region. The new distribution of pixel
values will hav...