...experiments have we pre-segmented the
image. The initial poses often project the model into regions of the image that contain a
significant amount of clutter. EMMA reliably settles on a pose where few if any of the model
points project onto the background.
In answer to the second question, EMMA requires roughly 35 seconds on a Sun SparcStation5 for each of the alignments shown above. Run times are identical because we have
chosen to use a fixed number of update iterations for each alignment experiment. In some
cases an accurate alignment was obtained well before the full number of iterations had been
completed. In others it appeared that the final alignment could have been improved if the
number of iterations were increased.
There are few if any principled results on the convergence of stochastic approximation.
Convergence detection is a subtle issue. For example, EMMA does not make a direct estimate
of the mutual information between model and image. During alignment only a stochastic
estimate of the gradient is available. It may be possible to construct an ad hoc procedure
that would be able to detect convergence. Alignment could then be continued until the pose
estimate had converged.
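One such ad hoc procedure can be sketched as follows. This is an illustration only and not part of EMMA: the class name ConvergenceMonitor, the window length and the tolerance are all assumptions. Because each update follows a noisy gradient estimate, the sketch compares the mean pose over two consecutive windows of iterations, rather than two consecutive poses, and reports convergence once the two means differ by less than the tolerance.

```python
import math
from collections import deque


def _mean_pose(poses):
    """Component-wise mean of a list of equal-length pose vectors."""
    n = len(poses)
    return [sum(p[i] for p in poses) / n for i in range(len(poses[0]))]


class ConvergenceMonitor:
    """Hypothetical ad hoc convergence test for a noisy pose sequence.

    Assumes the pose is a flat vector whose components are scaled so that
    a single Euclidean tolerance is meaningful for all of them.
    """

    def __init__(self, window=50, tol=1e-3):
        self.window = window
        self.tol = tol
        # Keep only the two most recent windows of pose estimates.
        self.history = deque(maxlen=2 * window)

    def update(self, pose):
        """Record one pose estimate; return True once the windowed mean has settled."""
        self.history.append(tuple(pose))
        if len(self.history) < 2 * self.window:
            return False  # not enough iterations observed yet
        recent = list(self.history)
        older = _mean_pose(recent[:self.window])
        newer = _mean_pose(recent[self.window:])
        return math.dist(older, newer) < self.tol
```

In use, update would be called once per iteration with the current pose estimate, and alignment would stop the first time it returns True. A test of this kind can be fooled by a slow drift smaller than the tolerance, so the window and tolerance would need tuning.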
5.1. ALIGNMENT OF 3D OBJECTS TO VIDEO AI-TR 1548
Figure 5.5: Final pose of the skull model after alignment.
From an analysis of the program's memory access and computation patterns, we conclude
that an implementation on a digital signal processor would be as much as 100 times faster
than our current implementation. One major issue is cache performance. Because EMMA
randomly accesses each of the points in the image and model, much time is wasted flushing
and refilling the cache. The cache on a general purpose processor is often fairly limited. Most
digital signal processors include a large quantity of fast SRAM, eliminating the need for a
cache. For random memory accesses a digital signal processor should be approximately 5
times faster than a conventional computer. The inner loop of the EMMA derivative estimation
procedure is dominated by simple floating point operations. Modern digital signal processors
can execute these instructions 10 to 20 times faster than conventional computers. Together
these advantages should lead to an overall improvement in speed of a factor of between 50 and 100.
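The quoted range is the product of the two factors. A minimal sketch of that arithmetic, applied to the 35 second run time reported earlier; treating the two gains as compounding multiplicatively is the reading assumed here, and the function name is hypothetical.

```python
def projected_run_time(current_seconds, memory_speedup, compute_speedup):
    """Return (overall speed-up, projected run time in seconds).

    Assumes the memory-access and floating point gains simply multiply.
    """
    overall = memory_speedup * compute_speedup
    return overall, current_seconds / overall


# Memory speed-up of 5, floating point speed-up of 10 to 20, 35 s baseline.
for compute in (10, 20):
    overall, seconds = projected_run_time(35.0, 5, compute)
    print(f"overall speed-up {overall}x, about {seconds:.2f} s per alignment")
```

Under these assumptions a 35 second alignment would take roughly 0.35 to 0.7 seconds.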
A number of randomized experiments were performed to determine the reliability, accuracy and repeatability of alignment. This data is reported in Table 5.3. An initial alignment
was performed to establish a base pose. This pose, shown in Figure 5.5, is used as a point
of reference. A set of randomized experiments was performed where the base pose is first
perturbed, and then EMMA is used to re-align the image and model. The perturbation is
computed as follows: a random uniformly distributed offset is added to each translational
axis, labeled ΔT, and then the model is rotated about a randomly selected axis by a random
uniformly selected angle Δθ. There were four experiments, each including 50 random initial poses. The distribution of the final and initial poses can be compared by comparing the
variance of the location of the centroid, computed separately in X, Y and Z. Furthermore, the
average angular rotation from the true pose is computed, labeled |Δθ|. Finally, the number
of poses that failed to converge near the correct solution is reported. The final statistics are
only evaluated over the poses that converged near the correct solution.
Paul A. Viola CHAPTER 5. ALIGNMENT EXPERIMENTS
Figure 5.6: Final pose of the skull model after alignment.
Table 5.3: Skull Results Table. The final column contains the percentage of poses that
successfully converged to a pose near the correct pose.
[Table body damaged in extraction. Surviving column labels: INITIAL, FINAL, Y, Z, |Δθ|, mm. Surviving entries, in extraction order:]
10; 20    20; 40
5.94    14.83
5.56    6.11    5.11    .61
18.00   16.82   5.88    1.80
12.04   10.77   11.56   1.11
15.46   14.466  28.70   1.87
2.22    5.49    78
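The perturbation and the summary statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: the function names are hypothetical, the rotation is taken about the model centroid, the angle is drawn uniformly from a symmetric range, and the axis is drawn uniformly from the unit sphere. None of these details are given in the text.

```python
import math
import random
import statistics


def random_unit_axis(rng):
    """Draw a rotation axis uniformly from the unit sphere (assumed distribution)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def rotate(point, axis, angle):
    """Rotate a 3D point about a unit axis through the origin (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * p for a, p in zip(axis, point))
    cross = [
        axis[1] * point[2] - axis[2] * point[1],
        axis[2] * point[0] - axis[0] * point[2],
        axis[0] * point[1] - axis[1] * point[0],
    ]
    return [p * c + cr * s + a * dot * (1.0 - c)
            for p, cr, a in zip(point, cross, axis)]


def perturb_pose(points, max_offset, max_angle_deg, rng):
    """Offset each translational axis uniformly, then rotate about a random axis."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    offset = [rng.uniform(-max_offset, max_offset) for _ in range(3)]
    axis = random_unit_axis(rng)
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    moved = []
    for p in points:
        local = [p[i] - centroid[i] for i in range(3)]
        r = rotate(local, axis, angle)  # rotation about the centroid (assumption)
        moved.append([r[i] + centroid[i] + offset[i] for i in range(3)])
    return moved, offset, math.degrees(abs(angle))


def summarize(centroids, angle_errors_deg, converged):
    """Per-axis centroid variance and mean |angle error| over converged runs only,
    plus the percentage of runs that converged. Needs at least one converged run."""
    kept = [i for i, ok in enumerate(converged) if ok]
    variance = [statistics.pvariance([centroids[i][a] for i in kept])
                for a in range(3)]
    mean_angle = statistics.mean([angle_errors_deg[i] for i in kept])
    return variance, mean_angle, 100.0 * len(kept) / len(converged)
```

Calling perturb_pose 50 times from the base pose would produce one experiment's worth of initial poses. What counts as converging "near the correct pose" is left to the caller, since the text does not define a threshold.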
These experiments demonstrate that the alignment procedure is reliable when the initial
pose is close to the "correct" pose. Outside of this range gradient descent, by itself, is not
capable of converging to the correct solution. The capture range is not unreasonably small,
however. Translations as large as half the diameter of the skull can be accommodated, as
can rotations in the plane of up to 45 degrees. Empirically it seems that alignment is most
sensitive to rotation in depth. This is not terribly surprising since only the visible points play
a role in the calculation of the derivative. As a result, when the chin is hidden the derivative
gives no information about how to move the chin out from behind the rest of the skull.
Figure 5.7: Final pose of the skull model after alignment.
Finally, we have done a number of experiments to demonstrate that EMMA alignment
can deal with occlusion. Figure 5.9 shows an initial and final alignment for an image that
includes an artificial occlusion...