s estimating the conditional entropy of an image given a model. While it is more general than
correlation, weighted neighbor likelihood is still limited. It can be generalized still further,
yielding a technique called EMMA alignment. This technique explicitly estimates the mutual
information between an image and model. A number of synthetic experiments demonstrate
that mutual information is a very flexible measure of alignment.
In the final section of this chapter the concept of minimum description length is used as
yet another motivation for mutual information as an alignment measure. This alternative
framework encourages the model to explain as much of the image as possible.

Chapter 5
Alignment Experiments
This chapter contains a number of experiments designed to demonstrate that alignment by
maximization of mutual information is a practical technique. The previous chapter contained
a very general definition of alignment. Though a procedure for adjusting the pose parameters
during alignment was derived, many of the details regarding representations and implementations were left out. Each experiment described in this chapter will include both these needed
details and a general discussion of the experimental framework.
By the end of this chapter we will have developed some familiarity with the application
of EMMA alignment. The chapter will conclude with a section describing explicit limitations
of this approach. In addition to describing problems for which EMMA alignment is
poorly suited, that section will emphasize that EMMA alignment by itself is not a complete
object recognition system.
For clarity, the parameters and assumptions that underlie EMMA alignment have been
identified. We have broken the process of setting up an experiment into discrete steps that
can be applied to a wide variety of alignment problems (see Table 5.1). Along with the
description of each experiment we will include a similar table with a specific realization for
each step.

AITR 1548

1. Choose a model and image representation, i.e., define u and
v. Define an interpolation scheme for sampling v at non-integral coordinates.
2. Choose a scheme for sampling the model, i.e., define x.
3. Determine the space of possible aligning transformations and its
concrete representation, i.e., define T. The definition of the
random variables u(x) and v(T(x)) is now complete.
4. Derive an expression for dv(y)/dy.
5. Pick a metric for computing distances between pairs of samples
of u(x), v(T(x)), and {u(x), v(T(x))}.
6. Pick the variance for the component densities.
7. Choose a value for p_min.
8. Determine the number of samples used to estimate the distribution, and the number used to estimate the entropy.
9. Pick a parameter update rate. In general the update rate will
decrease with time.

Table 5.1: The process for setting up an alignment.

5.1 Alignment of 3D Objects to Video
In our first experiment we will return to the example described in the introduction: alignment
of a three-dimensional object to a video image. In all of our alignment experiments we will
assume that the entire object has the same surface properties. We can then treat surface
property as yet another exogenous variable.
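The numeric choices in Table 5.1 (steps 6 through 9) can be gathered into a single record, which makes it easier to compare one experiment's settings with another's. The sketch below is only an illustration: the class name, the field names, and the particular decay schedule for the update rate are assumptions made for this example, and the values shown are placeholders rather than settings taken from any experiment. Steps 1 through 5 are structural choices (representations, sampling, transformation, derivative, metric) and are left out.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSetup:
    """Numeric choices from steps 6-9 of Table 5.1 (all names are hypothetical)."""

    variance: float          # step 6: variance of the component densities
    p_min: float             # step 7: value chosen for p_min
    n_density_samples: int   # step 8: samples used to estimate the distribution
    n_entropy_samples: int   # step 8: samples used to estimate the entropy
    initial_rate: float      # step 9: parameter update rate at step 0
    decay_steps: float       # assumed time constant of the rate decay

    def update_rate(self, step: int) -> float:
        """Assumed schedule: the rate falls off as 1 / (1 + step / decay_steps)."""
        return self.initial_rate / (1.0 + step / self.decay_steps)


# Placeholder values for illustration only; none of them come from the text.
setup = AlignmentSetup(
    variance=0.1,
    p_min=0.01,
    n_density_samples=50,
    n_entropy_samples=50,
    initial_rate=1.0,
    decay_steps=100.0,
)
```

Any schedule that decreases with time would satisfy step 9; the hyperbolic decay used here is simply one common choice.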
Following Table 5.1:
1. Models are a collection of points that lie on the surface of the object. We chose this
representation because it is capable of representing any shape including smoothly curved
or irregular forms. It is equally capable of representing objects with flat faces such as
polyhedra. The models have been constructed so the distribution of surface points is
as close to uniform as possible. Associated with each surface point is the local surface
normal, a unit vector perpendicular to the surface. The models used have between 7000
and 65,000 points. Video images are represented as simple two-dimensional arrays.
Paul A. Viola
2. The random variable x that is used to sample the model and image is defined from the
model. A trial of x is a randomly selected model point. The value of the trial is the
3D location of that model point. We sample the points of the model uniformly.
3. The transformation space is the space of rigid three-dimensional translations and
rotations followed by perspective projection. The overall transformation is a concatenation
of rotations and translations, each acting on the predefined "center" of the object.

Because of self-occlusion, not every point on the model is visible. Visibility is
determined by a Z-buffer rendering of the model. Z-buffer rendering takes each point in the
model and projects it into the image. When multiple points fall onto the same pixel,
only the point that is nearest is considered visible. As pose changes, some points become
visible and others become invisible. In theory Z-buffering needs to be repeated every
time the pose of an object changes. Unfortunately, Z-buffering takes time proportional
to the size of the model. This cost is far...
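
The visibility test described above can be sketched in a few lines. This is a minimal illustration and not the implementation used in these experiments. It assumes a pinhole camera at the origin looking down the +z axis, a hypothetical focal length `focal` measured in pixels, and a pose given by a rotation matrix and a translation applied about the object's center. All function and parameter names are introduced here for the example.

```python
import math


def rotate_y(theta):
    """3x3 rotation matrix about the y axis, as row-major nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transform(point, rotation, translation, center):
    """Rotate a 3D point about `center`, then translate it."""
    d = [point[i] - center[i] for i in range(3)]
    return [
        sum(rotation[r][k] * d[k] for k in range(3)) + center[r] + translation[r]
        for r in range(3)
    ]


def project(point, focal, width, height):
    """Assumed pinhole projection to a (row, col) pixel, or None if off-image."""
    x, y, z = point
    if z <= 0.0:
        return None  # the point lies behind the camera
    col = int(round(focal * x / z + width / 2.0))
    row = int(round(focal * y / z + height / 2.0))
    if 0 <= col < width and 0 <= row < height:
        return (row, col)
    return None


def visible_points(points, rotation, translation, center, focal, width, height):
    """Return the indices of model points that survive the Z-buffer test."""
    zbuffer = {}  # pixel -> (depth, index of the nearest point seen so far)
    for index, p in enumerate(points):
        q = transform(p, rotation, translation, center)
        pixel = project(q, focal, width, height)
        if pixel is None:
            continue
        # Keep only the nearest point among those that land on this pixel.
        if pixel not in zbuffer or q[2] < zbuffer[pixel][0]:
            zbuffer[pixel] = (q[2], index)
    return {index for _, index in zbuffer.values()}
```

Every model point is visited once per call, so the running time grows linearly with the number of points, in line with the observation that Z-buffering takes time proportional to the size of the model. Calling `visible_points` again with a different rotation shows how points that were hidden can become visible as the pose changes.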