that covers the entire chin area. The final alignment is very
close to the correct one despite the occlusion. Figure 5.10 shows an initial and final pose for a
more complex occlusion. In this image we have replaced a rectangular window with another
randomly chosen window of the image. The source of the rectangle is near the bottom of the
image. In a number of experiments, we have found that alignment to occluded images can
require more time for convergence.

Paul A. Viola CHAPTER 5. ALIGNMENT EXPERIMENTS

Figure 5.8: Final pose of the skull model after alignment.

Figure 5.9: An image including an artificial occlusion. White spots denote the pose of the
model. On the left is the initial pose, on the right is the final pose.

Figure 5.10: An image including an artificial occlusion. White spots denote the pose of the
model. On the left is the initial pose, on the right is the final pose.
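
The artificial occlusion described for Figure 5.10 (overwriting a rectangular window with another, randomly chosen window of the same image) can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function name, the list-of-rows image representation, and the uniform choice of source position are all assumptions.

```python
import random

def occlude_with_random_window(image, top, left, height, width, rng=None):
    """Return (occluded_copy, source_corner) for a list-of-rows image.

    The destination window starts at (top, left). The source window has the
    same size and is drawn uniformly from all positions that fit inside the
    image (an assumption; the text only says "randomly chosen").
    """
    rng = rng or random.Random()
    rows, cols = len(image), len(image[0])
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("destination window must lie inside the image")
    src_top = rng.randint(0, rows - height)
    src_left = rng.randint(0, cols - width)
    # Copy the source patch first so an overlapping destination cannot alter it.
    patch = [row[src_left:src_left + width]
             for row in image[src_top:src_top + height]]
    occluded = [list(row) for row in image]
    for dy in range(height):
        occluded[top + dy][left:left + width] = patch[dy]
    return occluded, (src_top, src_left)
```

The original image is left unmodified, so the same frame can be aligned with and without the occlusion for comparison.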
5.1. ALIGNMENT OF 3D OBJECTS TO VIDEO AI-TR 1548

5.1.2 Alignment of Head Model
We have repeated many of the skull experiments with a three dimensional model of a human head. This model was obtained from a Cyberware scan of the subject that was taken
approximately two years before the video images³. A Cyberware scan is a complete three
dimensional representation of the shape of the subject's head in cylindrical coordinates. The
surface normals were computed from the surface by smoothing and differencing neighboring

The experiments in this section are designed to answer two questions: (1) Will the same
techniques and parameters work with two different types of models and images? (2) Is it
possible to use the pose refinement procedure to track a moving object in a video sequence?
Figure 5.11 shows an image of the head and a rendering of the model.
How are the face experiments different from the skull experiments? Firstly, the face
model is much smoother than the skull model. There really aren't any creases or points of
high curvature. As a result it is much less likely that an edge-based system could construct
a representation either of the image or the model that would be stable under changes in
illumination. Secondly, the albedo of the actual object is not exactly constant. The face
contains eyebrows, lips and other regions where the albedo is not the same. As a result this
is a test of EMMA's ability to handle objects where the assumption of constant albedo is
violated. Thirdly, not all of the occluding contours of the object are present in the model.
The model is truncated both at the chin and the forehead. As a result experiments with this
model demonstrate that EMMA can work even when the occluding contours of the image and
model are not in agreement.
In the previous experiment, projecting points from the model into the image was sufficient
to describe the model pose. Since the head model is very smooth and some occluding contours
are missing, simply projecting the model points into the image is not sufficient to determine
the quality of an alignment. For our experiments with the head model we will display the
original image, augmented with model points, alongside a rendered image of the model.
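
To make the display concrete, the sketch below applies a pose to a set of model points and projects them into the image plane. It is only an illustration under stated assumptions: the y axis is taken as vertical, the camera is an ideal pinhole looking down the z axis, and the function names and focal length are hypothetical rather than taken from the thesis.

```python
import math

def rotate_about_vertical(point, degrees):
    """Rotate a 3D point (x, y, z) about the y axis, assumed here to be vertical."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))

def apply_pose(points, degrees, translation):
    """Rotate model points about the vertical axis, then translate them."""
    tx, ty, tz = translation
    posed = []
    for p in points:
        x, y, z = rotate_about_vertical(p, degrees)
        posed.append((x + tx, y + ty, z + tz))
    return posed

def project(points, focal_length):
    """Ideal pinhole projection; every point must have z > 0."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in points]

# Hypothetical model points in millimeters, centered on the model origin.
model_points = [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0), (0.0, 80.0, 20.0)]
# A perturbed pose: 30 degrees about the vertical, 40 mm to the right,
# with the model placed 1000 mm in front of the camera (assumed distance).
perturbed = apply_pose(model_points, 30.0, (40.0, 0.0, 1000.0))
image_points = project(perturbed, focal_length=500.0)
```

Projected points of this kind are what the white spots in the figures represent; the rendered image alongside them supplies the shading cues that the points alone do not convey.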
Figures 5.11 and 5.12 show the model before and after alignment. In this experiment the
model has been rotated 30 degrees around the vertical and translated 40 millimeters to the
right. Figures 5.13 and 5.14 show another experiment where EMMA alignment corrects for
a 150 millimeter translation in depth.

³ Thanks to Ron Kikinis for providing the Cyberware scan and for allowing me to take the images of him.

Figure 5.11: An initial incorrect pose. The model has been rotated 30 degrees about the
vertical and translated 40 millimeters to the right. On the left is an image of the head along
with a collection of points projected from the model. On the right is a rendering of the model
in the same pose.

Figure 5.12: The final aligned poses. On the left is an image of the head along with a
collection of points projected from the model. On the right is a rendering of the model in the
same pose.
We have also tested EMMA alignment on a video sequence digitized from a video tape.
The sequence was taken at the same time as the other images, though the camera and the
lens were different. Ten frames were acquired from a video tape at 3 frames per second. The
quality of the resulting images is very low. The images were degraded both by their storage
on video tape and by the frame grabber that was used. It was somewhat surprising that these
images worked nearly as well as the higher quality still frames.
Motion in the video sequence was tracked by sequentially aligning the model to each of
the frames. The starting pose for each frame was obtained by using the final estimated pose
from the previous frame. The starting pose for the first frame was hand selected so that
EMMA alignment could acquire a good initial alignment. The sequence and pose estimates
are displayed in Figure 5.15.
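
The tracking scheme just described amounts to a loop that warm-starts each alignment from the previous result. The sketch below shows only that control flow: `align` is a hypothetical stand-in for a single-frame pose refinement routine, and the one-dimensional toy aligner with a fixed capture range is an assumption made purely for illustration.

```python
def track_sequence(frames, initial_pose, align):
    """Align a model to each frame in turn.

    `align(frame, start_pose)` must return the refined pose for that frame.
    Each frame starts from the final pose estimated for the previous frame;
    the first frame starts from `initial_pose`.
    """
    poses = []
    pose = initial_pose
    for frame in frames:
        pose = align(frame, pose)
        poses.append(pose)
    return poses

def toy_align(frame, start_pose, capture_range=5):
    """Toy 1-D aligner: recovers the true position only if the start is close."""
    return frame if abs(frame - start_pose) <= capture_range else start_pose

# Each toy "frame" is just the true position of a slowly moving object.
frames = [10, 13, 16, 19]
tracked = track_sequence(frames, initial_pose=8, align=toy_align)
# For comparison: restart every frame from the same hand-selected pose.
restarted = [toy_align(f, 8) for f in frames]
```

In the toy setting, restarting every frame from the hand-selected pose fails once the object has moved beyond the capture range, whereas the warm-started loop keeps each starting pose close to the correct one.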