Tracking and Modeling Non-Rigid Objects with Rank Constraints

Lorenzo Torresani, Danny B. Yang, Christoph Bregler
{ltorresa, dbyang, bregler}@cs.stanford.edu
Computer Science Department, Stanford University, Stanford, CA 94305

Eugene J. Alexander
[email protected]
Mechanical Engineering Department, Stanford University, Stanford, CA 94305

0-7695-1272-0/01 $10.00 © 2001 IEEE

Abstract

This paper presents a novel solution for flow-based tracking and 3D reconstruction of deforming objects in monocular image sequences. A non-rigid 3D object undergoing rotation and deformation can be effectively approximated using a linear combination of 3D basis shapes. This puts a bound on the rank of the tracking matrix. The rank constraint is used to achieve robust and precise low-level optical flow estimation without prior knowledge of the 3D shape of the object. The bound on the rank is also exploited to handle occlusion at the tracking level, leading to the possibility of recovering the complete trajectories of occluded/disoccluded points. Following the same low-rank principle, the resulting flow matrix can be factored to get the 3D pose, configuration coefficients, and 3D basis shapes. The flow matrix is factored in an iterative manner, looping between solving for pose, configuration, and basis shapes. The flow-based tracking is applied to several video sequences and provides the input to the 3D non-rigid reconstruction task. Additional results on synthetic data and comparisons to ground truth complete the experiments.

1. Introduction

This paper addresses the problem of 3D tracking and model acquisition of non-rigid motion in video sequences. We are specifically concerned with human motion, which is a challenging domain. Standard low-level tracking schemes usually fail due to local ambiguities and noise. Most recent approaches overcome this problem with the use of a model. In those techniques, optical flow vectors or the motion of feature locations can be constrained by a low degree-of-freedom parametric model. For instance, to track joint angles of human limb segments, an approximate kinematic chain model can be used. Such models lose many details that cannot be recovered by simple cylinder or sphere shape models and fixed-axis rotations. Non-rigid torso motions, deforming shoe motions, or subtle facial skin motions are problem areas. Alternatively, such non-rigid motions can be captured with basis-shape models that are learned from example data. Most of the previous work is based on PCA techniques applied to 2D or 3D training data. For example, human face deformations have been tracked in 2D and 3D with such models. For 3D domains, prior models are acquired using stereo cameras or cyber-scan hardware. Carefully labeled data have to be provided to derive the PCA-based models.

We are interested in cases where no such 3D models are available, or existing models are too restricted and would not be able to recover all subtleties. The input to our technique is a single-view video recording of an arbitrary deforming object, and the output is the 3D motion and a 3D shape model parameterized by its modes of non-rigid deformation. We are facing three very challenging problems:

1. Without a model, how can we reliably track ambiguous and noisy local features in this domain?

2. Without point feature tracks or robust optical flow, how can we derive a model?

3. Given reliable 2D tracks, how can we recover 3D non-rigid motion and shape structure?

We have previously demonstrated that single-view 2D point tracks are enough to recover 3D non-rigid motion and structure by exploiting low-rank constraints [7]. Based on the same assumption, we show in this paper that it is also possible to constrain the low-level flow estimation and to handle occlusion without any model assumption.
Irani [14] has demonstrated that model-free low-rank constraints can be applied to overcome local ambiguities in flow estimation for rigid scenes. We show that this can be extended to 3D non-rigid tracking and model acquisition. Our new techniques do not need 2D point tracks, can deal with ambiguous and noisy local features, and can handle occlusion. By exploiting the low-rank constraints in low-level tracking and in 3D non-rigid model acquisition, we are able to solve all three challenges mentioned above in one unified manner. We demonstrate the technique by tracking several video sequences and by deriving 3D deformable models from those measurements.

2. Previous Work

Many non-rigid tracking solutions have been proposed previously. As mentioned earlier, most techniques use an a-priori model. Examples are [16, 5, 9, 19, 3, 4]. Most of these techniques model 2D non-rigid motion, but some of these approaches also recover 3D pose and deformations based on a 3D model. The 3D model is obtained from 3D scanning devices [6], stereo cameras [10], or multi-view reconstruction [18, 11]. The multi-view reconstruction is based on the assumption that for a specific deformed configuration all views are sampled at the same time. This is equivalent to the structure-from-motion problem, which assumes rigidity between the different views [22]. Extensions have been proposed, such as the multi-body factorization method of Costeira and Kanade [8], which relaxes the rigidity constraint. In this method, K independently moving objects are allowed, which results in a tracking matrix of rank 3K and a permutation algorithm that identifies the submatrix corresponding to each object. More recently, Bascle and Blake [1] proposed a method for factoring facial expressions and pose during tracking. Although it exploits the bilinearity of 3D pose and non-rigid object configuration, it again requires a set of basis images selected before factorization is performed.
The discovery of these basis images is not part of their algorithm. In addition, most techniques treat low-level tracking and 3D structural constraints independently. In the following sections we describe how we can track and reconstruct non-rigid motions from single views without prior models.

3. Technical Approach

The central theme in this paper is the exploitation of rank bounds for recovering 3D non-rigid motion. We first describe in general why and under what circumstances 3D non-rigid motion puts rank bounds on 2D image motion (section 3.1). We then detail how these bounds can be used to constrain low-level tracking in a model-free fashion (section 3.2). We then describe how this technique can also be used for the prediction of occluded features (section 3.3), and we then introduce three techniques that are able to reconstruct 3D deformable shapes and their motion from those 2D measurements (sections 3.4.1, 3.4.2, and 3.4.3).

3.1 Low-rank constraints for non-rigid motion

Given a sequence of F video frames, the optical flow of P pixels can be coded into two F x P matrices, U and V. Each row of U holds all x-displacements of all P locations for a specific time frame, and each row of V holds all y-displacements for a specific time frame. It has been shown that if U and V describe a 3D rigid motion, the rank of [U; V] has an upper bound, which depends on the assumed camera model (for example, for an orthographic camera model the rank is r <= 4, while for a perspective camera model the rank is r <= 8) [22, 14]. This rank constraint derives from the fact that [U; V] can be factored into two matrices, Q x S: Q^{2F x r} describes the relative pose between camera and object for each time frame, and S^{r x P} describes the 3D structure of the scene, which is invariant to camera and object motion. Previously we have shown that non-rigid object motion can also be factored into two matrices [7], but of a rank r that is higher than the bounds for the rigid case.
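The rigid-case bound is easy to verify numerically. The following is an illustrative sketch assuming NumPy (synthetic data, not the paper's experiments): under orthography the stacked matrix [U; V] factors as [R | T] times [S; 1], so its rank cannot exceed 4.

```python
# Sketch (assumes NumPy; not the authors' code): the stacked flow matrix
# [U; V] of a rigid orthographic scene factors as [R | T] . [S; 1],
# so its rank is at most 4.
import numpy as np

rng = np.random.default_rng(0)
F, P = 40, 60
S = rng.standard_normal((3, P))             # rigid 3D structure (fixed)

U_rows, V_rows = [], []
for t in range(F):
    Q_full, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    R_t = Q_full[:2]                        # first two rows of a rotation
    T_t = rng.standard_normal(2)            # 2D translation
    uv = R_t @ S + T_t[:, None]             # 2 x P projection of frame t
    U_rows.append(uv[0])
    V_rows.append(uv[1])
W = np.vstack([np.vstack(U_rows), np.vstack(V_rows)])   # 2F x P

sv = np.linalg.svd(W, compute_uv=False)
rank = int(np.sum(sv > 1e-8 * sv[0]))
print(rank)     # at most 4 in the noise-free case
```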
Assuming the 3D non-rigid motion can be approximated by a set of K modes of variation, the 3D shape of a specific object configuration can be expressed as a linear combination of K basis shapes (S_1, S_2, ..., S_K). Each basis shape S_i is a 3 x P matrix describing P points. The shape of a specific configuration is a linear combination of this basis set:

    S = \sum_{i=1}^{K} l_i \cdot S_i    (1)

Assuming weak-perspective projection, at a specific time frame t the P points of a configuration S are projected onto 2D image points (u_{t,i}, v_{t,i}):

    \begin{bmatrix} u_{t,1} & \cdots & u_{t,P} \\ v_{t,1} & \cdots & v_{t,P} \end{bmatrix} = R_t \cdot \left( \sum_{i=1}^{K} l_{t,i} \cdot S_i \right) + T_t    (2)

    R_t = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \end{bmatrix}    (3)

where R_t contains the first two rows of the full 3D camera rotation matrix, and T_t is the camera translation. The weak-perspective scaling (f / Z_avg) of the projection is implicitly coded in l_{t,1}, ..., l_{t,K}. As in [22], we can eliminate T_t by subtracting the mean of all 2D points, and henceforth can assume that S is centered at the origin.

Weak-perspective projection is in practice a good approximation if the perspective effects between the closest and furthest points on the object surface are small. Extending this framework to full-perspective projection is straightforward using an iterative extension. All experiments reported here assume weak-perspective projection.

We can rewrite the linear combination in (2) as a matrix multiplication:

    \begin{bmatrix} u_{t,1} & \cdots & u_{t,P} \\ v_{t,1} & \cdots & v_{t,P} \end{bmatrix} = \left[\, l_{t,1} R_t \mid \cdots \mid l_{t,K} R_t \,\right] \cdot \begin{bmatrix} S_1 \\ \vdots \\ S_K \end{bmatrix}    (4)

We stack all point tracks from time frame 1 to F into one large measurement 2F x P matrix W. Using (4) we can write:

    W = \begin{bmatrix} l_{1,1} R_1 & \cdots & l_{1,K} R_1 \\ \vdots & & \vdots \\ l_{F,1} R_F & \cdots & l_{F,K} R_F \end{bmatrix} \cdot \begin{bmatrix} S_1 \\ \vdots \\ S_K \end{bmatrix} = Q \cdot B    (5)

Since Q is a 2F x 3K matrix and B is a 3K x P matrix, in the noise-free case W has a rank r <= 3K.

In the following sections we describe how this rank bound on W can be exploited for 1) constrained low-level tracking, 2) recovery of occluded feature locations, and 3) 3D reconstruction of pose, non-rigid deformations, and key shapes.
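As a numerical sanity check of the bound in (5), the factorization can be simulated on synthetic data. The following sketch (assuming NumPy; not the paper's experiments) builds W frame by frame from K basis shapes, random configuration weights, and random weak-perspective poses, then checks that the rank never exceeds 3K:

```python
# Sketch (assumes NumPy; synthetic data): build W = Q . B as in (5) from
# K basis shapes and random weak-perspective poses, then verify rank <= 3K.
import numpy as np

rng = np.random.default_rng(0)
F, P, K = 30, 50, 3                      # frames, points, basis shapes
S = rng.standard_normal((K, 3, P))       # basis shapes S_1..S_K (each 3 x P)

blocks = []
for t in range(F):
    Q_full, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    R_t = Q_full[:2]                     # first two rows of a rotation matrix
    l = rng.standard_normal(K)           # configuration weights l_{t,k}
    shape = np.tensordot(l, S, axes=1)   # sum_k l_k S_k  (3 x P)
    blocks.append(R_t @ shape)           # 2 x P projection of frame t
W = np.vstack(blocks)                    # 2F x P measurement matrix

sv = np.linalg.svd(W, compute_uv=False)
rank = int(np.sum(sv > 1e-8 * sv[0]))
print(rank)     # at most 3K = 9 in the noise-free case
```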
3.2 Basis Flow

The previous analysis tells us why W is rank-bounded and how W can be factored. In this section we discuss how to derive the optical flow matrix W from an image sequence and how the rank bound can be used to disambiguate the local flow.

Features can usually be tracked reliably with local methods, such as Lucas-Kanade [17] and extensions [21, 2], if they contain a distinctive high-contrast pattern with 2D texture, such as corner features. For traditional rigid shape reconstruction, only a few feature locations are necessary. Non-rigid objects go through much more severe motion variations, hence many more features need to be tracked. In the extreme case it might be desirable to track every pixel location. Unfortunately, many objects that we are interested in, including the human body, do not have many of those very reliable features.

Our solution to the tracking dilemma builds on a technique introduced in [14] that exploits rank constraints for optical flow estimation in the case of rigid motion. Since W is assumed to have rank r, all P columns of W can be modeled as linear combinations of r "basis tracks". The basis is not uniquely defined, but if there are more than r points whose trajectories over the F frames can be reliably estimated, then we can compute with SVD the first r eigenvectors Q of the reduced tracking matrix W_reliable. Q^{2F x r} is an initial estimate of the basis for all P tracks. Our next task is to estimate all P tracks (the entire W) using this eigenbasis Q and additional local image constraints.

As in the original Lucas-Kanade tracking, we assume that a small image patch centered at a track-point location will not change its appearance drastically between two consecutive frames. Therefore the local patch flow [u, v] can be computed by solving the following well-known equation [17]:

    [\, u_{t,i}, v_{t,i} \,] \cdot \begin{bmatrix} c & d \\ d & e \end{bmatrix} = [\, g, h \,]    (6)
where \begin{bmatrix} c & d \\ d & e \end{bmatrix} = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} is the second-moment matrix of the local image patch in the first frame, g = \sum I_x I_t, and h = \sum I_y I_t (for further details see [17, 21, 2]).

If all F x P flow vectors across the entire image sequence are coded relative to one single image template, the following equation system can be written [14]:

    [\, U \mid V \,] \cdot \begin{bmatrix} C & D \\ D & E \end{bmatrix} = [\, G \mid H \,]    (7)

where C, D, and E are diagonal P x P matrices that contain the corresponding c, d, and e values for each of the P local image patches. Accordingly, G and H are F x P matrices that contain the g and h values for all P local patches across all F time frames. This system of equations is a rewriting of the Lucas-Kanade linearization for every flow vector, with no additional constraints yet applied. The number of free variables is equal to the number of constraints. If a local patch has no 2D texture, the single equation describing its motion in the system will only provide an accurate estimate of its normal flow (aperture problem).

Now we split Q into Q_u, which contains all even rows of Q, and Q_v, which contains all odd rows of Q. Since Q is a basis for W, there must exist some r x P matrix \tilde{B} for which the following equations hold:

    Q_u \cdot \tilde{B} = U, \qquad Q_v \cdot \tilde{B} = V    (8)

Using (7) we can write [14]:

    [\, Q_u \tilde{B} \mid Q_v \tilde{B} \,] \cdot \begin{bmatrix} C & D \\ D & E \end{bmatrix} = [\, G \mid H \,]    (9)

This is a system with r x P unknowns (the entries in \tilde{B}) and 2F x P equations. For long tracks (2F >> r) the system is very over-constrained (in contrast to (7)). We can exploit this redundancy to derive the optical flow for points that are difficult to track and for features along 1D edges.

Since [G | H] is computed based on the Lucas-Kanade linearization, the resulting flow [U | V] = [Q_u \tilde{B} | Q_v \tilde{B}] will only be a first approximation. We rewarp all images of the sequence using the new flow and then iterate equation (9).

3.3 Dealing with Occlusion

By reordering the elements of \tilde{B} into an rP-dimensional vector \tilde{b},
equation (9) can be rewritten in the form:

    L^{2FP \times rP} \cdot \tilde{b}^{rP \times 1} = m^{2FP \times 1}    (10)

where now each row describes one point in one particular frame. If we have occlusion, or the tracker used for initialization has lost some points at certain time frames, then the corresponding entries in the m vector will not be measurable. We eliminate those rows from the L matrix and the m vector. If the number of missing points is not overly large, we are still left with an over-constrained system that can give us an accurate solution for \tilde{b}. As long as the disappearing features are visible in enough frames, the product Q \cdot \tilde{B} also provides a good prediction of the displacements for the missing points.

3.4 3D Reconstruction

As mentioned earlier, the factorization of W into Q and B is not unique. Any invertible r x r matrix A applied to Q and B in the following way leads to an alternative factorization:

    Q_a = Q \cdot A, \qquad B_a = A^{-1} \cdot B    (11)

Q_a and B_a multiplied together approximate W with the same sum-of-squared error as Q and B. Using SVD, we compute a \hat{Q} (with orthonormal columns) and \hat{B}. In general \hat{Q} will not comply with the structure we described in (5):

    Q = \begin{bmatrix} Q_1 \\ \vdots \\ Q_F \end{bmatrix}, \quad \text{with} \quad Q_t = [\, l_{t,1} R_t \mid \cdots \mid l_{t,K} R_t \,]    (12)

For the general case, transforming \hat{Q} into a Q that complies with those constraints cannot be done with a linear least-squares technique. For the specific case of rigid scenes, each sub-block is equal to the first two rows of a rotation matrix (Q_t = R_t). Tomasi-Kanade [22] suggested a linear approximation scheme to find an A that forces the sub-blocks of Q to comply with rotation matrices.

3.4.1 Sub-block factorization

For the non-rigid case, we previously proposed a second factorization step on each sub-block that transforms every \hat{Q}_t into a Q_t that complies with the constraints (5) [7].
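This second factorization step can be sketched numerically. The following is an illustrative NumPy sketch (variable names are ours, not the paper's): a noise-free sub-block, reordered so that row k holds l_k times the six pose entries, is rank 1 and factors by SVD into the weights and the pose rows.

```python
# Sketch (assumes NumPy; illustrative names): a noise-free sub-block
# Q_t = [l_1 R_t | ... | l_K R_t], reordered so that row k holds
# l_k * [r1..r6], is rank 1 and factors by SVD into weights and pose.
import numpy as np

rng = np.random.default_rng(3)
K = 3
R_t = rng.standard_normal((2, 3))        # two pose rows [r1 r2 r3; r4 r5 r6]
weights = rng.standard_normal(K)         # configuration weights l_1..l_K

Q_t = np.hstack([w * R_t for w in weights])                          # 2 x 3K
Q_bar = np.stack([Q_t[:, 3*k:3*k+3].reshape(6) for k in range(K)])   # K x 6

U, s, Vt = np.linalg.svd(Q_bar)
l_hat = U[:, 0] * s[0]       # recovered weights (up to a sign/scale ambiguity)
pose_hat = Vt[0]             # recovered [r1..r6] (up to the inverse ambiguity)

# rank-1 check: the second singular value vanishes for a noise-free block
print(s[1] / s[0])
```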
Q_t can be rewritten as:

    Q_t = [\, l_1 R_t \mid \cdots \mid l_K R_t \,] = \begin{bmatrix} l_1 r_1 & l_1 r_2 & l_1 r_3 & \cdots & l_K r_1 & l_K r_2 & l_K r_3 \\ l_1 r_4 & l_1 r_5 & l_1 r_6 & \cdots & l_K r_4 & l_K r_5 & l_K r_6 \end{bmatrix}

We reorder the elements of Q_t into a new matrix \bar{Q}_t:

    \bar{Q}_t = \begin{bmatrix} l_1 r_1 & l_1 r_2 & l_1 r_3 & l_1 r_4 & l_1 r_5 & l_1 r_6 \\ l_2 r_1 & l_2 r_2 & l_2 r_3 & l_2 r_4 & l_2 r_5 & l_2 r_6 \\ \vdots & & & & & \vdots \\ l_K r_1 & l_K r_2 & l_K r_3 & l_K r_4 & l_K r_5 & l_K r_6 \end{bmatrix}

which shows that \bar{Q}_t is of rank 1 and can be factored into the pose R_t and the configuration weights l_i by SVD.

After the second factorization step is applied to each of the individual sub-blocks Q_t, a non-linear optimization over the entire time sequence is performed to find one invertible matrix A that orthonormalizes all of the sub-blocks. The result is that each sub-block is a scaled rotation matrix. In the presence of noise and ambiguities, the second and higher eigenvalues of many sub-blocks do not vanish. In those cases, this results in bad rank-1 approximations and bad estimates for R_t. We therefore propose a second alternative in the next section that overcomes this limitation.

3.4.2 Iterative Optimization

Instead of local factorizations on the sub-blocks, we propose a new iterative technique that solves (5) directly. Many non-rigid objects have a dominant rigid component, and we take advantage of this to get an initial estimate for all pose matrices (R_1, ..., R_F). Given an initial guess of the pose at each time frame, we can solve for the configuration weights and the basis shapes.

To initialize the pose, we factor W into a 2F x 3 rigid pose matrix \hat{Q}_rig and a 3 x P matrix \hat{B}_rig (as originally done by Tomasi-Kanade). As usual, we transform \hat{Q}_rig into a matrix Q_rig whose sub-blocks are all weak-perspective rotation matrices (as outlined in section 3.4.1). Using Q_rig as an initial guess for the pose of the non-rigid shape, we solve for the non-rigid l_{t,i} and B terms in (5). We do this iteratively by first initializing l_{t,i} randomly and then iterating between solving for B, then for l_{t,i}, and then refining R_t again¹.

1.
Given all R_t and l_{t,i} terms (the Q matrix), equation (5) can be used to find the linear least-squares fit of B.

2. Given B and all R_t, we can solve for all l_{t,i} with linear least-squares.

3. Given B and L, we can rewrite (5) as:

    \begin{bmatrix} u_t \\ v_t \end{bmatrix} = R_t \cdot \sum_k l_{t,k} S_k    (13)

Solving for all R_t such that they fit this equation and remain rotation matrices can be done by parameterizing R_t with exponential coordinates. A full rotation matrix can be described by 3 variables (\omega_x, \omega_y, \omega_z) as:

    R(\omega) = \exp \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix}    (14)

Assume \hat{R} is the estimate of R_t at the previous iteration; we can then linearize (13) around the previous estimate to:

    \begin{bmatrix} u_t \\ v_t \end{bmatrix} = \begin{bmatrix} 1 & -\omega'_z & \omega'_y \\ \omega'_z & 1 & -\omega'_x \end{bmatrix} \cdot \hat{R} \cdot \sum_k l_{t,k} S_k    (15)

and solve for a new \omega'. We then update R(\omega) = R(\omega') \cdot \hat{R} and iterate².

We iterate all three steps until convergence. Similar to the technique described in section 3.3, we can easily handle missing entries in W when points are occluded or are lost by the tracker. B and L are over-constrained, so we leave out the missing data points and solve the linear fit as before.

¹Alternatively, we can use the sub-block factorization described in section 3.4.1 for initialization.

3.4.3 Multi-View Input

Another extension of this factorization technique is the incorporation of multi-view inputs from M cameras. This enlarges the input matrix W to size 2FM x P:

    W = \begin{bmatrix} W_{1,1} \\ \vdots \\ W_{1,M} \\ \vdots \\ W_{F,1} \\ \vdots \\ W_{F,M} \end{bmatrix}, \quad W_{t,m} = \begin{bmatrix} u_{t,m,1} & \cdots & u_{t,m,P} \\ v_{t,m,1} & \cdots & v_{t,m,P} \end{bmatrix}    (16)

As before, we assume that W_{t,m} can be described by a 2 x 3 pose matrix R_{t,m}, by K deformation coefficients l_{t,1}, l_{t,2}, ..., l_{t,K}, and a 3K x P key-shape matrix B. Assuming the cameras are synchronized, an additional constraint for the multi-view case is that all M views share the same deformation coefficients for a particular time frame t:

    W_t = [\, l_{t,1} \cdot R_t \mid l_{t,2} \cdot R_t \mid \cdots \mid l_{t,K} \cdot R_t \,] \cdot B    (17)

    R_t = \begin{bmatrix} R_{t,1} \\ \vdots \\ R_{t,M} \end{bmatrix}    (18)

Similar to our previous two-step factorization, we can factor W into Q and B complying with this new structure.
Furthermore, we can enforce another constraint if we assume that all M cameras remain fixed relative to each other: the relative rotation between all R_{t,m}'s in the R_t sub-block of Q is constant over time. This is enforced with a non-linear iterative optimization after the two-step factorization.

²A future extension of this algorithm will deal with an iterative version for true perspective models. However, we would like to point out that for the orthographic case there also exist several closed-form solutions, including Horn's technique [12, 13] and an SVD-based method proposed by Ruderman [20], which we will include in an extended technical report.

3.4.4 Shape Regularization

If there is not enough out-of-plane rotation, the Z values of B can be ill-conditioned. For instance, a small non-rigid deformation in X and Y can also be explained by a small out-of-image-plane rigid rotation of a shape with...
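The rotation update in step 3 of section 3.4.2 (equation (14)) can be sketched with Rodrigues' formula for the matrix exponential of a skew-symmetric matrix. This is an illustrative NumPy version, not the authors' implementation:

```python
# Sketch (assumes NumPy; not the authors' code): exponential-coordinate
# rotation R(omega) = exp([omega]_x) via Rodrigues' formula (eq. 14),
# used to update the pose estimate as R <- R(omega') . R_hat.
import numpy as np

def rotation(omega):
    """Rodrigues' formula for the exponential of the skew matrix of omega."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    wx, wy, wz = omega / theta
    K = np.array([[0.0, -wz,  wy],
                  [wz,  0.0, -wx],
                  [-wy, wx,  0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

R_hat = rotation(np.array([0.1, -0.2, 0.3]))    # previous pose estimate
omega_new = np.array([0.01, 0.02, -0.01])       # small correction omega'
R_new = rotation(omega_new) @ R_hat             # update R(omega') . R_hat

# the update stays a rotation matrix: orthonormal with determinant +1
print(np.allclose(R_new @ R_new.T, np.eye(3)))
```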