Tracking and Modeling Non-Rigid Objects with Rank Constraints

Lorenzo Torresani, Danny B. Yang, Eugene J. Alexander, Christoph Bregler
{ltorresa, dbyang, bregler}@cs.stanford.edu, [email protected]
Computer Science Department and Mechanical Engineering Department
Stanford University, Stanford, CA 94305

0-7695-1272-0/01 $10.00 © 2001 IEEE

Abstract

This paper presents a novel solution for flow-based
tracking and 3D reconstruction of deforming objects in monocular image sequences. A non-rigid 3D object undergoing rotation and deformation can be effectively approximated using a linear combination of 3D basis shapes. This puts a bound on the rank of the tracking matrix. The rank constraint is used to achieve robust and precise low-level optical flow estimation without prior knowledge of the 3D shape of the object. The bound on the rank is also exploited to handle occlusion at the tracking level, leading to the possibility of recovering the complete trajectories of occluded/disoccluded points. Following the same low-rank principle, the resulting flow matrix can be factored to get the 3D pose, configuration coefficients, and 3D basis shapes. The flow matrix is factored in an iterative manner, looping between solving for pose, configuration, and basis shapes. The flow-based tracking is applied to several video sequences and provides the input to the 3D non-rigid reconstruction task. Additional results on synthetic data and comparisons to ground truth complete the experiments.

1. Introduction

This paper addresses the problem of 3D tracking and
model acquisition of non-rigid motion in video sequences. We are specifically concerned with human motion, which is a challenging domain. Standard low-level tracking schemes usually fail due to local ambiguities and noise. Most recent approaches overcome this problem with the use of a model. In those techniques, optical flow vectors or the motion of feature locations can be constrained by a low degree-of-freedom parametric model. For instance, to track joint angles of human limb segments, an approximate kinematic chain model can be used. These models, however, lose many details that cannot be recovered by simple cylinder or sphere shape models and fixed-axis rotations. Non-rigid torso motions,
deforming shoe motions, or subtle facial skin motions are problem areas.

Alternatively, such non-rigid motions can be captured with basis-shape models that are learned from example data. Most of the previous work is based on PCA techniques applied to 2D or 3D training data. For example, human face deformations have been tracked in 2D and 3D with such models. For 3D domains, prior models are acquired using stereo cameras or cyber-scan hardware. Carefully labeled data have to be provided to derive the PCA-based models.

We are interested in cases where no such 3D models are available, or existing models are too restricted and would not be able to recover all subtleties. The input to our technique is a single-view video recording of an arbitrary deforming object, and the output is the 3D motion and a 3D shape model parameterized by its modes of non-rigid deformation. We are facing three very challenging problems:

1. Without a model, how can we reliably track ambiguous and noisy local features in this domain?

2. Without point feature tracks or robust optical flow, how can we derive a model?

3. Given reliable 2D tracks, how can we recover 3D non-rigid motion and shape structure?

We have previously demonstrated that single-view 2D
point tracks are enough to recover 3D non-rigid motion and structure by exploiting low-rank constraints [7]. Based on the same assumption, we show in this paper that it is also possible to constrain the low-level flow estimation and to handle occlusion without any model assumption. Irani [14] has demonstrated that model-free low-rank constraints can be applied to overcome local ambiguities in flow estimation for rigid scenes. We show that this can be extended to 3D non-rigid tracking and model acquisition. Our new techniques do not need 2D point tracks, can deal with ambiguous and noisy local features, and can handle occlusion. By exploiting the low-rank constraints in low-level tracking and in 3D non-rigid model acquisition, we are able to solve all three challenges mentioned above in one unified manner. We demonstrate the technique by tracking several video sequences and deriving 3D deformable models from those measurements.

2 Previous Work

Many non-rigid tracking solutions have been proposed
previously. As mentioned earlier, most techniques use an a-priori model. Examples are [16, 5, 9, 19, 3, 4]. Most of these techniques model 2D non-rigid motion, but some of these approaches also recover 3D pose and deformations based on a 3D model. The 3D model is obtained from 3D scanning devices [6], stereo cameras [10], or multi-view reconstruction [18, 11]. The multi-view reconstruction is based on the assumption that for a specific deformed configuration all views are sampled at the same time. This is equivalent to the structure-from-motion problem, which assumes rigidity between the different views [22]. Extensions have been proposed, such as the multibody factorization method of Costeira and Kanade [8] that relaxes the rigidity constraint. In this method, K independently moving objects are allowed, which results in a tracking matrix of rank 3K and a permutation algorithm that identifies the submatrix corresponding to each object. More recently, Bascle and Blake [1] proposed a method for factoring facial expressions and pose during tracking. Although it exploits the bilinearity of 3D pose and non-rigid object configuration, it again requires a set of basis images selected before the factorization is performed. The discovery of these basis images is not part of their algorithm.

In addition, most techniques treat low-level tracking and 3D structural constraints independently. In the following section we describe how we can track and reconstruct non-rigid motions from single views without prior models.

3 Technical Approach

The central theme in this paper is the exploitation of
rank bounds for recovering 3D non-rigid motion. We first describe in general why and under what circumstances 3D non-rigid motion puts rank bounds on 2D image motion (section 3.1). We then detail how these bounds can be used to constrain low-level tracking in a model-free fashion (section 3.2). We then describe how this technique can also be used for prediction of occluded features (section 3.3), and we then introduce three techniques that are able to reconstruct 3D deformable shapes and their motion from those 2D measurements (sections 3.4.1, 3.4.2, and 3.4.3).

3.1 Low-rank constraints for non-rigid motion

Given a sequence of F video frames, the optical flow of
P pixels can be coded into two F × P matrices, U and V. Each row of U holds all x-displacements of all P locations for a specific time frame, and each row of V holds all y-displacements for a specific time frame. It has been shown that if U and V describe a 3D rigid motion, the rank of the stacked matrix [U; V] has an upper bound, which depends on the assumed camera model (for example, for an orthographic camera model the rank is r ≤ 4, while for a perspective camera model the rank is r ≤ 8) [22, 14]. This rank constraint derives from the fact that [U; V] can be factored into two matrices, Q · S: the 2F × r matrix Q describes the relative pose between camera and object for each time frame, and the r × P matrix S describes the 3D structure of the scene, which is invariant to camera and object motion.

Previously we have shown that non-rigid object motion
can also be factored into two matrices [7], but of a rank r that is higher than the bounds for the rigid case. Assuming the 3D non-rigid motion can be approximated by a set of K modes of variation, the 3D shape of a specific object configuration can be expressed as a linear combination of K basis shapes (S_1, S_2, ..., S_K). Each basis shape S_i is a 3 × P matrix describing P points. The shape of a specific configuration is a linear combination of this basis set:

$$S = \sum_{i=1}^{K} l_i \cdot S_i \qquad (1)$$

Assuming weak-perspective projection, at a specific time frame t the P points of a configuration S are projected onto 2D image points (u_{t,i}, v_{t,i}):

$$\begin{bmatrix} u_{t,1} & \cdots & u_{t,P} \\ v_{t,1} & \cdots & v_{t,P} \end{bmatrix} = R_t \cdot \left( \sum_{i=1}^{K} l_{t,i} S_i \right) + T_t \qquad (2)$$

$$R_t = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \end{bmatrix} \qquad (3)$$

where R_t contains the first two rows of the full 3D camera rotation matrix, and T_t is the camera translation. The weak-perspective scaling (f / Z_avg) of the projection is implicitly coded in l_{t,1}, ..., l_{t,K}. As in [22], we can eliminate T_t by subtracting the mean of all 2D points, and henceforth can
assume that S is centered at the origin.

Weak-perspective projection is in practice a good approximation if the perspective effects between the closest and furthest point on the object surface are small. Extending this framework to full-perspective projection is straightforward using an iterative extension. All experiments reported here
assume weak-perspective projection.

We can rewrite the linear combination in (2) as a matrix multiplication:

$$\begin{bmatrix} u_{t,1} & \cdots & u_{t,P} \\ v_{t,1} & \cdots & v_{t,P} \end{bmatrix} = \begin{bmatrix} l_{t,1} R_t & \cdots & l_{t,K} R_t \end{bmatrix} \cdot \begin{bmatrix} S_1 \\ \vdots \\ S_K \end{bmatrix} \qquad (4)$$

We stack all point tracks from time frame 1 to F into one large 2F × P measurement matrix W. Using (4) we can write:

$$W = \underbrace{\begin{bmatrix} l_{1,1} R_1 & \cdots & l_{1,K} R_1 \\ l_{2,1} R_2 & \cdots & l_{2,K} R_2 \\ \vdots & & \vdots \\ l_{F,1} R_F & \cdots & l_{F,K} R_F \end{bmatrix}}_{Q} \cdot \underbrace{\begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_K \end{bmatrix}}_{B} \qquad (5)$$

Since Q is a 2F × 3K matrix and B is a 3K × P matrix, in the noise-free case W has a rank r ≤ 3K.

In the following sections we describe how this rank bound on W can be exploited for 1) constrained low-level tracking, 2) recovery of occluded feature locations, and 3) 3D reconstruction of pose, non-rigid deformations, and key shapes.

3.2 Basis Flow

The previous analysis tells us why W is rank bounded and how W can be factored. In this section we discuss how to derive the optical flow matrix W from an image sequence and how the rank bound can be used to disambiguate the local flow.

Features can usually be tracked reliably with local methods, such as Lucas-Kanade [17] and extensions [21, 2], if they contain a distinctive high-contrast pattern with 2D texture, such as corner features. For traditional rigid shape reconstruction, only a few feature locations are necessary.
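As an aside, the rank bound of section 3.1 is easy to confirm numerically. The following sketch (NumPy; the basis shapes, weights, and rotations are random stand-ins, not data from the paper) assembles W exactly as in equation (5) and checks that rank(W) ≤ 3K:

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 60, 40, 3          # frames, points, basis shapes

# Random 3xP basis shapes S_i and per-frame configuration weights l_{t,i}.
S = rng.standard_normal((K, 3, P))
L = rng.standard_normal((F, K))

def two_rows_of_rotation():
    # First two rows of a random 3D rotation (orthonormalized via QR).
    q = np.linalg.qr(rng.standard_normal((3, 3)))[0]
    return q[:2]

# Stack W frame by frame as in equation (5): W_t = R_t @ (sum_i l_{t,i} S_i).
W = np.zeros((2 * F, P))
for t in range(F):
    R_t = two_rows_of_rotation()
    W[2 * t:2 * t + 2] = R_t @ np.tensordot(L[t], S, axes=1)

print(np.linalg.matrix_rank(W), "<=", 3 * K)   # rank stays at or below 3K
```

With noise added to W the numerical rank rises above 3K, which is one reason the tracking stage below fits a rank-r basis by SVD rather than requiring exact low rank.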
Non-rigid objects go through much more severe motion variations, hence many more features need to be tracked. In the extreme case it might be desirable to track every pixel location. Unfortunately, many objects that we are interested in, including the human body, do not have many of those very reliable features.

Our solution to the tracking dilemma builds on a technique introduced in [14] that exploits rank constraints for optical flow estimation in the case of rigid motion. Since W is assumed to have rank r, all P columns of W can be modeled as linear combinations of r "basis tracks", the columns of Q. The basis is not uniquely defined, but if there are more than r points whose trajectories over the F frames can be reliably estimated, then we can compute with SVD the first r eigenvectors Q of the reduced tracking matrix W_reliable. This 2F × r matrix Q is an initial estimate of the basis for all P tracks. Our next task is to estimate all P tracks (the entire W) using this eigenbase Q and additional local image constraints.

As in the original Lucas-Kanade tracking, we assume
that a small image patch centered at a track-point location will not change its appearance drastically between two consecutive frames. Therefore the local patch flow [u, v] can be computed by solving the following well-known equation [17]:

$$[\, u_t \;\; v_t \,] \cdot \begin{bmatrix} c & d \\ d & e \end{bmatrix} = [\, g \;\; h \,] \qquad (6)$$

where $\begin{bmatrix} c & d \\ d & e \end{bmatrix} = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}$ is the second-moment matrix of the local image patch in the first frame, $g = \sum I_x I_t$ and $h = \sum I_y I_t$ (for further details see [17, 21, 2]).

If all F × P flow vectors across the entire image sequence
are coded relative to one single image template, the following equation system can be written [14]:

$$[\, U \;|\; V \,] \cdot \begin{bmatrix} C & D \\ D & E \end{bmatrix} = [\, G \;|\; H \,] \qquad (7)$$

where C, D, E are diagonal P × P matrices that contain the corresponding c, d, and e values for each of the P local image patches. Accordingly, G and H are F × P matrices that contain the g and h values for all P local patches across all F time frames. This system of equations is a rewriting of the Lucas-Kanade linearization for every flow vector, with no additional constraints yet applied. The number of free variables is equal to the number of constraints. If a local patch has no 2D texture, the single equation describing its motion in the system will only provide an accurate estimate of its normal flow (aperture problem).

Now we split Q into Q_u, which contains all even rows of Q, and Q_v, which contains all odd rows of Q. Since Q is a basis for W, there must exist some r × P matrix B̃ for which the following equations hold:

$$Q_u \tilde{B} = U, \qquad Q_v \tilde{B} = V \qquad (8)$$
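Before deriving (9), the basis-track idea behind (8) can be checked numerically. The sketch below (NumPy, on a synthetic rank-r track matrix; an illustration only, not the paper's image-based pipeline) estimates Q from a reliably tracked subset of columns and verifies that this basis spans all P tracks:

```python
import numpy as np

rng = np.random.default_rng(1)
F, P, r = 50, 30, 6

# Synthetic rank-r track matrix, standing in for real stacked point tracks.
W = rng.standard_normal((2 * F, r)) @ rng.standard_normal((r, P))

# "Reliable" subset: more than r columns that can be tracked with confidence.
reliable = W[:, :10]

# First r left singular vectors of the reliable tracks give the basis Q.
Q = np.linalg.svd(reliable, full_matrices=False)[0][:, :r]

# Every column of W is a linear combination of the r basis tracks.
B_tilde = np.linalg.lstsq(Q, W, rcond=None)[0]      # r x P coefficients
print(np.allclose(Q @ B_tilde, W))                  # True: Q spans all tracks
```

In the actual method the coefficients are of course not fit to known tracks; they are constrained through the image data, which is what equation (9) expresses.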
Using (7) we can write [14]:

$$[\, Q_u \tilde{B} \;|\; Q_v \tilde{B} \,] \cdot \begin{bmatrix} C & D \\ D & E \end{bmatrix} = [\, G \;|\; H \,] \qquad (9)$$

This is a system with r × P unknowns (the entries in B̃) and 2F × P equations. For long tracks (2F ≫ r) the system is very overconstrained (in contrast to (7)). We can exploit this redundancy to derive the optical flow for points difficult to track and for features along 1D edges.

Since [G | H] is computed based on the Lucas-Kanade linearization, the resulting flow [U | V] = [Q_u · B̃ | Q_v · B̃] will only be a first approximation. We rewarp all images of the sequence using the new flow and then iterate equation (9).

3.3 Dealing with Occlusion

By reordering the elements of B̃ into an r · P-dimensional
vector b, equation (9) can be rewritten in the form:

$$L_{2FP \times rP} \cdot b_{rP \times 1} = m_{2FP \times 1} \qquad (10)$$

where now each row describes one point in one particular frame. If we have occlusion, or the tracker used for initialization has lost some points at certain time frames, then the corresponding entries in the m vector will not be measurable. We eliminate those rows from the L matrix and the m vector. If the number of missing points is not overly large, we are still left with an overconstrained system that can give us an accurate solution for b. As long as the disappearing features are visible in enough frames, the product Q · B̃ also provides a good prediction of the displacements for the
missing points.

3.4 3D Reconstruction

As mentioned earlier, the factorization of W into Q and B is not unique. Any invertible r × r matrix A applied to Q and B in the following way leads to an alternative factorization:

$$Q_a = Q \cdot A, \qquad B_a = A^{-1} \cdot B \qquad (11)$$

Q_a and B_a multiplied together approximate W with the same sum-of-squared error as Q and B.

Using SVD, we compute a Q̂ (with orthonormal columns) and B̂. In general Q̂ will not comply to the structure we described in (5):

$$Q = \begin{bmatrix} Q_1 \\ \vdots \\ Q_F \end{bmatrix}, \quad \text{with} \quad Q_t = [\, l_{t,1} R_t \;\; \cdots \;\; l_{t,K} R_t \,] \qquad (12)$$

For the general case, transforming Q̂ into a Q that complies to those constraints cannot be done with a linear least-squares technique. For the specific case of rigid scenes, each sub-block is equal to the first 2 rows of a rotation matrix (Q_t = R_t). Tomasi-Kanade [22] suggested a linear approximation schema to find an A that enforces the sub-blocks of Q to comply to rotation matrices.

3.4.1 Sub-block factorization

For the non-rigid case, we previously proposed a second factorization step on each sub-block that transforms every Q̂_t into a Q_t that complies to the constraints (5) [7]. Q_t can be rewritten as:

$$Q_t = [\, l_1 R_t \;\; \cdots \;\; l_K R_t \,] = \begin{bmatrix} l_1 r_1 & l_1 r_2 & l_1 r_3 & \cdots & l_K r_1 & l_K r_2 & l_K r_3 \\ l_1 r_4 & l_1 r_5 & l_1 r_6 & \cdots & l_K r_4 & l_K r_5 & l_K r_6 \end{bmatrix}$$

We reorder the elements of Q_t into a new matrix Q̄_t:

$$\bar{Q}_t = \begin{bmatrix} l_1 r_1 & l_1 r_2 & l_1 r_3 & l_1 r_4 & l_1 r_5 & l_1 r_6 \\ l_2 r_1 & l_2 r_2 & l_2 r_3 & l_2 r_4 & l_2 r_5 & l_2 r_6 \\ \vdots & & & & & \vdots \\ l_K r_1 & l_K r_2 & l_K r_3 & l_K r_4 & l_K r_5 & l_K r_6 \end{bmatrix}$$

which shows that Q̄_t is of rank 1 and can be factored into the pose R_t and configuration weights l_t by SVD.

After the second factorization step is applied to each of the individual sub-blocks Q̄_t, a non-linear optimization over the entire time sequence is performed to find one invertible matrix A that orthonormalizes all of the sub-blocks. The result is that each sub-block is a scaled rotation matrix. In the presence of noise and ambiguities, the second and higher eigenvalues of many sub-blocks do not vanish. In those cases, this results in bad rank-1 approximations and bad estimates for R_t. We therefore propose a second alternative in the next section that overcomes this limitation.

3.4.2 Iterative Optimization

Instead of local factorizations on the sub-blocks, we propose a new iterative technique that solves (5) directly.

Many non-rigid objects have a dominant rigid component, and we take advantage of this to get an initial estimate for all pose matrices (R_1, ..., R_F). Given an initial guess of the pose at each time frame, we can solve for the configuration weights and the basis shapes.

To initialize the pose, we factor W into a 2F × 3 rigid pose matrix Q̂_rig and a 3 × P matrix B̂_rig (as originally done by Tomasi-Kanade). As usual, we transform Q̂_rig into a matrix Q_rig whose sub-blocks are all weak-perspective rotation matrices (as outlined in section 3.4.1).

Using Q_rig as an initial guess for the pose of the non-rigid shape, we solve for the non-rigid l_{t,i} and B terms in (5). We do this iteratively by first initializing the l_{t,i} randomly and then iterating between solving for B, then for the l_{t,i}, and then refining R_t again¹:

1. Given all R_t and l_{t,i} terms (the Q matrix), equation (5) can be used to find the linear least-squares fit of B.

2. Given B and all R_t, we can solve for all l_{t,i} with linear least-squares.

3. Given B and L, we can rewrite (5) to:

$$W_t = R_t \cdot \sum_k l_{t,k} S_k \qquad (13)$$

Solving for all R_t such that they fit this equation and remain rotation matrices can be done by parameterizing R_t with exponential coordinates. A full rotation matrix can be described by 3 variables [ω_x, ω_y, ω_z] as:

$$R(\omega) = \exp \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} \qquad (14)$$

Assume ω̂ is the estimate of R_t at the previous iteration; we can then linearize (13) around the previous estimate to:

$$W_t = \begin{bmatrix} 1 & -\omega'_z & \omega'_y \\ \omega'_z & 1 & -\omega'_x \end{bmatrix} R(\hat{\omega}) \sum_k l_{t,k} S_k \qquad (15)$$

and solve for a new ω'. We then update R(ω) := R(ω')R(ω̂) and iterate².

We iterate all 3 steps until convergence.

¹ Alternatively, we can use the sub-block factorization described in section 3.4.1 for initialization.

Similar to the technique described in section 3.3, we can easily handle missing entries in W when points are occluded or are lost by the tracker. B and L are overconstrained, so we leave out the missing data points and solve the linear fit
as before.

3.4.3 Multi-View Input

Another extension of this factorization technique is the incorporation of multi-view inputs from M cameras. This enlarges the input matrix W to size 2FM × P:

$$W = \begin{bmatrix} W_{1,1} \\ W_{1,2} \\ \vdots \\ W_{F,M} \end{bmatrix}, \quad \text{with} \quad W_{t,v} = \begin{bmatrix} u^{t,v}_1 & \cdots & u^{t,v}_P \\ v^{t,v}_1 & \cdots & v^{t,v}_P \end{bmatrix} \qquad (16)$$

As before, we assume that W_{t,v} can be described by a 2 × 3 pose matrix R_{t,v}, by K deformation coefficients l_{t,1}, l_{t,2}, ..., l_{t,K}, and a 3K × P key-shape matrix B. Assuming the cameras are synchronized, an additional constraint for the multi-view case is that all M views share the same deformation coefficients for a particular time frame t:

$$W_t = [\, l_{t,1} R_t \;\; l_{t,2} R_t \;\; \cdots \;\; l_{t,K} R_t \,] \cdot B \qquad (17)$$

$$R_t = \begin{bmatrix} R_{t,1} \\ \vdots \\ R_{t,M} \end{bmatrix} \qquad (18)$$
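A quick numerical check of this structure (NumPy; static synthetic cameras and shapes, with object rotation omitted for brevity, so this is an illustration rather than the paper's optimization): because all views share the deformation coefficients, the stacked multi-view W keeps the same 3K rank bound as the single-view case:

```python
import numpy as np

rng = np.random.default_rng(3)
F, P, K, M = 30, 20, 2, 3     # frames, points, basis shapes, cameras

S = rng.standard_normal((K, 3, P))                  # basis shapes
L = rng.standard_normal((F, K))                     # weights, shared by all views
R_cam = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(M)]

# Each camera v observes the SAME deformed shape sum_i l_{t,i} S_i at time t
# (fixed cameras, so R_{t,v} = R_v for all t in this sketch).
blocks = []
for t in range(F):
    shape_t = np.tensordot(L[t], S, axes=1)         # 3 x P deformed shape
    for v in range(M):
        blocks.append(R_cam[v] @ shape_t)
W = np.vstack(blocks)                               # (2*F*M) x P, as in eq. (16)

# Shared deformation coefficients keep the rank bounded by 3K, as for one view.
print(np.linalg.matrix_rank(W), "<=", 3 * K)
```

If each view had independent coefficients, the bound would grow with M; the shared-coefficient constraint of (17) is what keeps the multi-view problem as compact as the monocular one.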
Similar to our previous 2-step factorization, we can factor W into Q and B complying to this new structure. Furthermore, we can enforce another constraint if we assume that all M cameras remain fixed relative to each other: the relative rotation between all R_{t,v}'s in the R_t sub-block of Q is constant over time. This is enforced with a nonlinear iterative optimization after the 2-step factorization.

² A future extension of this algorithm will deal with an iterative version for true perspective models. However, we like to point out that for the orthographic case there also exist several closed-form solutions, including Horn's technique [12, 13] and an SVD-based method proposed by Ruderman [20], which we will include in an extended technical report.

3.4.4 Shape Regularization

If there is not enough out-of-plane rotation, the Z values of
B can be ill-conditioned. For instance, a small non-rigid deformation in X and Y can also be explained by a small out-of-image-plane rigid rotation of a shape with large Z values. Another problem area is that if the low-level features have almost no image texture, the corresponding 3D point in the shape matrix B is not defined.

We can overcome these problems by regularizing the shape matrix B during the iterative optimization. A simple term can be added to the least-squares fit of B in section 3.4.2, but also to the least-squares fit of B̃ in section 3.2. If point locations i and j are neighbors (we can determine that with a local nearest-neighbor algorithm or Delaunay triangulation), we add the following term for each neighbor pair: α_ij (b_{*,i} − b_{*,j})². This pulls ambiguous points closer to the average of their neighbors. It is similar to the smoothness terms in snake-based contour tracking optimizations [15].
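As a toy illustration of such regularization rows (a hypothetical 1-D least-squares system, not the paper's full tracker), an otherwise unconstrained point is pulled to the average of its neighbors:

```python
import numpy as np

# Toy system A x ~ y for three 1-D points; point 2 has no data constraint
# (an all-zero row, like a feature with no image texture).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],       # nothing constrains the middle point
              [0.0, 0.0, 1.0]])
y = np.array([2.0, 0.0, 4.0])

# Regularization rows alpha * (x_i - x_j) = 0 for neighbor pairs (1,2), (2,3).
alpha = 0.1
reg = alpha * np.array([[1.0, -1.0, 0.0],
                        [0.0, 1.0, -1.0]])
A_aug = np.vstack([A, reg])
y_aug = np.concatenate([y, np.zeros(2)])

x = np.linalg.lstsq(A_aug, y_aug, rcond=None)[0]
print(x)    # the undefined middle point lands at the average of its neighbors
```

The augmented system stays linear, which is exactly why the global fit described next remains a least-squares problem.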
α_ij can be inversely proportional to the distance between the two points i and j. The global least-squares fit remains linear, since we only need to add additional linear equations of the form α_ij b_{*,i} − α_ij b_{*,j} = 0 to the least-squares systems in sections 3.2 and 3.4.2.

4 Experimental Results

4.1 Rank-Constrained Tracking

We tested the rank-constraint technique for optical flow
estimation on a 500-frame-long video sequence of a deforming shoe (Figure 1). The recordings are challenging due to changes in the object appearance that are caused by the large rotations and deformations as well as by variations in illumination.

In our examples, a set of 30 reliable features was initially tracked using an implementation of the technique of Lucas-Kanade employing affine transformations for the patches centered at these points. We updated the reference patch of each point every 10 frames in order to accommodate the changes in feature appearance over our long sequence. We assumed our multi-frame approach capable of recovering the possible drifting introduced on some of the tracks by the frequent update of the template for the points. 80 additional features were then selected along 1D edges in the reference frame. We produced a first approximate initialization of their displacements by linear extrapolation from the motion of the reliable points. We used the resulting W matrix as initialization for our tracking technique based on rank constraints. We experimented with different values for the rank and achieved the best solution by setting rank r = 9. We employed the classic pyramidal approach in smoothing the images and ran several iterations of the multi-frame method.

Figure 1. Example tracks of the shoe sequence. The blue circles are reliable points, the red crosses are features with 1D texture that have been recovered using rank constraints.

We could track robustly and very accurately most of the 110
points throughout the whole sequence. In order to correct drifting of some of the edge features, we incorporated the additional regularization described in section 3.4.4. Figure 1 shows the features tracked for several frames.

In the last part of the image sequence many of the distinctive points that we have used to derive the optical flow field are progressively occluded, while some disappear and then become visible again with the variations in the motion. These difficult frames were used to demonstrate the ability of our solution to cope with lost features and occlusion. For this experiment we manually labeled in each frame features that were not visible, incorrectly tracked, or lost by the Lucas-Kanade initialization, and then used the approach described in section 3.3 to recover their position. Figure 2 shows the estimated position of some of these features before, during, and after their temporary occlusion. The algorithm is successful in reconstructing their complete trajectories. Future experiments will also use an automatic track
termination test.

4.2 3D Reconstruction

Given the estimated Q and B of those 500 tracked monocular image frames, we then applied our reconstruction technique described in section 3.4.2. Figure 3 shows some example frames and the reconstructed non-rigid 3D shapes overlayed. See http://movement.stanford.edu/nonrig for MPEG or QuickTime video showing the entire sequence reconstructed.

Figure 2. The black rectangular markers are predicted locations of disappearing features. The algorithm can recover the complete trajectories of the temporarily occluded points.

Figure 3. 3D reconstruction of corresponding 2D tracks from monocular video sequence. Please check video to see all details.

We also applied the new reconstruction technique to a video recording of a deforming human torso (Figure 4). The included video 841_tyab.mpg (also at http://movement.stanford.edu/nonrig) shows the deforming
shapes in 3D. As you can tell, again the pose changes and the shape deformations were recovered successfully.

Figure 4. 3D reconstruction of corresponding 2D tracks from monocular video sequence. Please check video to see all details.

We applied this technique and the previously reported solution [7] to several other video recordings, including a giraffe recording and human face tracking. All those reconstructions recovered the pose and shape deformations nicely. Since we do not have ground-truth data available, we can only speculate how good the quantitative performance is. Therefore we also tested this technique on several artificial datasets with ground truth, and on multi-view recordings with 2 calibrated cameras that allowed us to get a good estimate of the ground truth by using triangulation.

4.3 Performance on Artificial Data

The artificial data sets were based on random basis shapes, artificial superquadrics, 3D data from 2-view recordings of a deforming face, and 2-view recordings of the shoe sequence. All errors reported here are computed in percentage points: the average distance of the reconstructed point to the correct point divided by the size of the shape.

The random basis shapes were generated by sampling
points uniformly inside a unit cube. The first basis shape was given the largest weight so that the overall shape has a strong rigid component. Artificial data were generated from 5, 10, and 20 random basis shapes rotating over 300 frames. The overall maximum rotation in any axis was 90 degrees, and Gaussian noise was added to the final tracking matrix W. The results of running the iterative optimization are shown in figure 5(a,b). Both the 3D error and the Z error decrease as the number of basis shapes (K) used in the optimization increases. For the data generated from 5 basis shapes, the 3D error and Z error level off after K = 5. This makes sense because only 5 basis shapes are required to describe the data, and the iterative optimization finds the 5 basis shapes.

The superquadric data were generated with three of the octants deforming independently. The same rotation from the random basis shapes was applied to it to generate 300

Figure 5. Iterative optimization on monocular artificial data. Plots show how 3D and Z errors vary as the number of basis shapes (K) used in the iterative optimization is increased. (a,b) random basis shapes, (a) shows 3D error and (b) shows Z error. x is the number of basis shapes used to generate the data. (c) superquadric. (d) deforming face.

frames for the tracking matrix W. The average 3D and Z
errors are plotted in figure 5(c).

The 3D face data were taken from a stereo reconstruction of a deforming face. The same rotation was applied to this to generate the 300 frames for the tracking matrix W. The average 3D and Z errors are plotted in figure 5(d). The iterative optimization was also tested with occlusion. Those points on the face which should be occluded in a certain pose were labelled as such and not included in W. As a result, 15% of the points in W were removed. The average 3D and Z errors for the reconstruction with occlusion are also plotted in figure 5(d).

4.4 Performance on Real Data

Table 1 shows the monocular view reconstruction errors
for the shoe sequence. A second camera was used for triangulation to get the ground truth so that we could compare our monocular reconstruction. This reconstruction is the most challenging task, since it tests the entire system from video input to 3D output. In the artificial experiments we ran the algorithms for each K 30 times and reported the median error; for the shoe recording we only ran it once for each K. This explains the larger random fluctuations in the errors.

Table 1. Top: 3D reconstruction performance on monocular shoe sequence. Bottom: 3D reconstruction performance on 2-view face sequence.

Overall, the small reconstruction errors tell us that this technique is indeed able to accurately recover non-rigid
deformations from monocular image sequences.

We also ran the multi-view reconstruction on the face data, and achieved again reasonable error rates, as can be seen in Table 1.

5 Discussion

We have shown how to exploit low-rank constraints for low-level tracking, for prediction of missing low-level features, and for non-rigid reconstruction. We have demonstrated those techniques on several video sequences and have shown that good 3D non-rigid reconstructions can be achieved. We further quantified the performance of single-view reconstructions with the use of additional calibrated views and simulations on artificial data.

We have not yet addressed the issue of discovering how many basis shapes are needed. One possible solution is that
the user defines an upper threshold on how much reprojection error is allowed; K is increased until the error is below that threshold.

Another interesting aspect that is currently under investigation is the bias of this technique. In many cases 3D rotation can be compensated by some degrees of freedom of the basis shape set. Despite this ambiguity, our technique has a strong bias towards representing as much as possible with the rotation matrix, but we would like to further study this ambiguity.

Reconstructing non-rigid models from single-view video recordings has many potential applications. For example, we intend to apply this technique to our image-based facial and full-body animation system and to a model-based tracking system.

Acknowledgements

We would like to thank James Davis, Erika Chuang, and Ajit Chaudhari for providing advice and data. This research was funded in part by the National Science Foundation and
the Stanford BIO-X program.

References

[1] B. Bascle and A. Blake. Separability of pose and expression in facial tracking and animation. In ICCV, 1998.
[2] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV, 1992.
[3] M. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In ICCV, 1995.
[4] M. Black, Y. Yacoob, A. D. Jepson, and D. J. Fleet. Learning parameterized models of image motion. In CVPR, 1997.
[5] A. Blake, M. Isard, and D. Reynard. Learning to track the visual motion of contours. In J. Artificial Intelligence, 1995.
[6] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, 1999.
[7] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3D shape from image streams. In CVPR, 2000.
[8] J. Costeira and T. Kanade. A multibody factorization method for motion analysis. Int. J. of Computer Vision, pages 159-180, Sep 1998.
[9] D. DeCarlo and D. Metaxas. Deformable model-based shape and motion analysis from images using motion residual error. In ICCV, 1998.
[10] S. Gokturk, J. Bouguet, and R. Grzeszczuk. A data-driven model for monocular face tracking. In ICCV, 2001.
[11] B. Guenter, C. Grimm, D. Wood, H. Malvar, and F. Pighin. Making faces. In SIGGRAPH, 1998.
[12] B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A, 4(4), 1987.
[13] B. K. P. Horn, H. M. Hilden, and S. Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America A, 5(7), 1988.
[14] M. Irani. Multi-frame optical flow estimation using subspace constraints. In ICCV, 1999.
[15] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int. J. of Computer Vision, 1(4):321-331, 1987.
[16] A. Lanitis, C. J. Taylor, T. F. Cootes, and T. Ahmed. Automatic interpretation of human faces and hand gestures using flexible models. In International Workshop on Automatic Face and Gesture Recognition, 1995.
[17] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. 7th Int. Joint Conf. on Artificial Intelligence, 1981.
[18] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin. Synthesizing realistic facial expressions from photographs. In SIGGRAPH, 1998.
[19] F. Pighin, D. H. Salesin, and R. Szeliski. Resynthesizing facial animation through 3D model-based tracking. In ICCV, 1999.
[20] D. L. Ruderman. Private communication.
[21] J. Shi and C. Tomasi. Good features to track. In CVPR, 1994.
[22] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. Int. J. of Computer Vision, 9(2):137-154, 1992.