transformation, a model for
the imaging process could be used to predict the image that will result. If we had a good
imaging model, then deciding whether an image contained a particular model at a given pose
would be straightforward: compute the predicted image and compare it to the actual image
directly. Given a perfect imaging model the two images will be identical, or close to it. Of
course, finding the correct alignment remains a challenge.
The relationship between an object model, no matter how accurate, and the object's
image is a complex one. The appearance of a small patch of a surface is a function of the
surface properties, the patch's orientation, the position of the lights, and the position of the
observer. Given a model u(x) and an image v(y) we can formulate an imaging equation,

    v(T(x)) = F(u(x), q),                                    (1.1)

or equivalently,

    v(y) = F(u(T⁻¹(y)), q).                                  (1.2)
The imaging equation is separable into two distinct components. The first component is
called a transformation, or pose, denoted T. It relates the coordinate frame of the model
to the coordinate frame of the image. The transformation tells us which point in the model
is responsible for a particular point in the image. The second component is the imaging
function, F(u(x), q). The imaging function determines the value of image point v(T(x)). In
general a pixel's value may be a function both of the model and of other exogenous factors.
For example, an image of a three-dimensional object depends not only on the object but also
on the lighting. The parameter q collects all of the exogenous influences into a single vector.
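The two components can be made concrete in a short sketch: to predict the image, map each image coordinate back through the inverse pose T⁻¹ to find the responsible model point, then apply the imaging function F. The toy model, pose, and imaging function below are all illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def predict_image(u, T_inv, F, q, shape):
    """Sketch of the imaging equation v(y) = F(u(T^{-1}(y)), q).

    u     : function mapping model coordinates to model values
    T_inv : inverse pose, mapping image coordinates to model coordinates
    F     : imaging function, applied pointwise here for simplicity
    q     : exogenous parameters (e.g. lighting), collected in one value
    """
    v = np.zeros(shape)
    for iy in range(shape[0]):
        for ix in range(shape[1]):
            x = T_inv((iy, ix))        # which model point explains this pixel
            v[iy, ix] = F(u(x), q)     # imaging function fixes the pixel value
    return v

# Toy example (all names are hypothetical illustrations):
model = np.arange(16.0).reshape(4, 4)
u = lambda x: model[x[0] % 4, x[1] % 4]   # model lookup with wraparound
T_inv = lambda y: (y[0] - 1, y[1])        # inverse of a one-pixel vertical shift
F = lambda m, q: q * m                    # trivial imaging function: gain q
v = predict_image(u, T_inv, F, q=2.0, shape=(4, 4))
```

The separation mirrors the text: T_inv carries all of the geometry, while F carries everything about how model values become intensities.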
One reason that it is, in principle, possible to define F is that the image does convey
information about the model. Clearly if there were no mutual information between u and v,
there could be no meaningful F. We propose to finesse the problem of finding and computing
F by dealing with this mutual information directly. We will present an algorithm that aligns
by maximizing the mutual information between model and image. It requires no a priori
model of the relationship between surface properties and scene intensities; it assumes only
that the model tells more about the scene when it is correctly aligned.

1.1.1 An Alignment Example
One of the alignment problems that we will address involves finding the pose of a three-dimensional
object that appears in a video image. This problem involves comparing two
very different kinds of representations: a three-dimensional model of the shape of the object
and a video image of that object. For example, Figure 1.1 contains a video image of an
example object on the left and a depth map of that same object on the right; the object in
question is a person's head: Ron. A depth map is an image that displays the depth from
the camera to every visible point on the object model. A depth map is a complete description
of the shape of the object, at least of its visible parts.
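A depth map of this kind can be sketched as a simple z-buffer: project each model point onto the image grid and keep the distance of the nearest point in each cell. The orthographic projection and every name below are illustrative assumptions, not the thesis's rendering machinery.

```python
import numpy as np

def depth_map(points, width, height):
    """Minimal z-buffer sketch: orthographically project 3D points
    (x, y, z) onto a width x height grid, keeping the depth of the
    nearest (smallest z) point that lands in each cell. Cells that no
    point reaches stay at infinity (no visible surface there)."""
    depth = np.full((height, width), np.inf)
    for x, y, z in points:
        ix, iy = int(round(x)), int(round(y))
        if 0 <= ix < width and 0 <= iy < height:
            depth[iy, ix] = min(depth[iy, ix], z)  # nearer point wins
    return depth

# Two points compete for cell (1, 1); the nearer one (z = 3) survives.
pts = [(1, 1, 5.0), (1, 1, 3.0), (2, 2, 7.0)]
d = depth_map(pts, 4, 4)
```

Displaying such an array with near values white and far values black gives exactly the kind of picture shown on the right of Figure 1.1.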
From the depth map alone it might be difficult to see that the image and the model are
aligned. The task can be made much easier, at least for us, if we simulate the imaging
process and construct an image from the 3D model. Figure 1.2 contains two computer
graphics renderings of the object model. These synthetic images are constructed assuming
that the 3D model has a Lambertian surface and that the lighting comes from the right. It
is almost immediately obvious that the model on the left is more closely aligned to the true
image than the model on the right. Unfortunately, what we find trivial is very difficult for a
computer. The intensities of the true video image and the synthetic images are very different.
The true image and the correct model image are in fact uncorrelated. Yet any person can
glance at these images and decide that both are images of a head and that both heads are
looking in roughly the same direction. The human visual system is capable of ignoring the
superficial differences that arise from changes in illumination and surface properties.
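This is exactly where mutual information helps: two images can be uncorrelated in intensity yet still share a great deal of information. The quantity the alignment algorithm maximizes can be estimated from a joint histogram of the two images. The estimator below is a generic histogram sketch, not the thesis's own estimator; it illustrates that an image shares high mutual information with itself and almost none with unrelated noise.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of the mutual information I(a; b) between two
    equally sized images, in nats. I(a; b) = sum p(a,b) log(p(a,b) /
    (p(a) p(b))), so it is zero when the images are independent."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()                    # joint distribution
    p_a = p_ab.sum(axis=1, keepdims=True)         # marginal of a
    p_b = p_ab.sum(axis=0, keepdims=True)         # marginal of b
    nz = p_ab > 0                                 # avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
self_mi = mutual_information(img, img)               # high: fully dependent
noise_mi = mutual_information(img, rng.random((32, 32)))  # near zero
```

An alignment search would evaluate such a score for many candidate poses of the model against the video image and keep the pose with the highest score, with no model of F required.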
It is not easy to build an automated alignment procedure that can make this kind of
comparison. It is harder still to construct a system that can find the correct model pose.
We have built such a system. That system selected the pose of the model shown at left in
Figure 1.2.
As mentioned above, the synthetic images of Ron were generated under the assumption
that the model surface is Lambertian and the lighting is from the right. Lambert's law is
perhaps the simplest model of surface reflectivity. It is an accurate model of the reflectance
of a matte
surface.

Figure 1.1: Two different views of Ron. On the left is a video image. On the right is a
depth map of a model of Ron. A depth map describes the distance to each of the visible
points of the model. White denotes points that are closer, black further.

Figure 1.2: At left is a computer graphics rendering of a 3D model of Ron. The position of
the model is the same as the position of the actual head. At right is a rende...
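The rendering step described above can be sketched directly from a depth map: estimate surface normals from the depth gradient, then apply Lambert's law, which says the brightness of a matte patch is the cosine of the angle between its normal and the light direction. The details below (normal estimation, light convention) are assumptions for illustration, not the thesis's renderer.

```python
import numpy as np

def lambertian_shade(depth, light_dir):
    """Render a Lambertian image from a depth map. Normals of the
    surface z = depth(x, y) are proportional to (-dz/dx, -dz/dy, 1);
    intensity is max(0, n . l), per Lambert's law."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)   # unit normals
    l = np.asarray(light_dir, float)
    l /= np.linalg.norm(l)                           # unit light direction
    return np.clip(n @ l, 0.0, None)                 # clamp self-shadowed points

# A plane sloping toward a light from the right shades uniformly.
y, x = np.mgrid[0:8, 0:8]
depth = 10.0 - 0.5 * x                   # depth decreases to the right
img = lambertian_shade(depth, light_dir=(1.0, 0.0, 1.0))
```

Shading a depth map like the one in Figure 1.1 this way, with the light placed to the right, yields synthetic views of the kind shown in Figure 1.2.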