

Tracking and Modeling Non-Rigid Objects with Rank Constraints

Lorenzo Torresani, Danny B. Yang, Christoph Bregler
{ltorresa, dbyang, bregler}@cs.stanford.edu
Computer Science Department, Stanford University, Stanford, CA 94305

Eugene J. Alexander
[email protected]
Mechanical Engineering Department, Stanford University, Stanford, CA 94305

Abstract

This paper presents a novel solution for flow-based tracking and 3D reconstruction of deforming objects in monocular image sequences. A non-rigid 3D object undergoing rotation and deformation can be effectively approximated using a linear combination of 3D basis shapes. This puts a bound on the rank of the tracking matrix. The rank constraint is used to achieve robust and precise low-level optical flow estimation without prior knowledge of the 3D shape of the object. The bound on the rank is also exploited to handle occlusion at the tracking level, leading to the possibility of recovering the complete trajectories of occluded/disoccluded points. Following the same low-rank principle, the resulting flow matrix can be factored to get the 3D pose, configuration coefficients, and 3D basis shapes. The flow matrix is factored in an iterative manner, looping between solving for pose, configuration, and basis shapes. The flow-based tracking is applied to several video sequences and provides the input to the 3D non-rigid reconstruction task. Additional results on synthetic data and comparisons to ground truth complete the experiments.

1 Introduction

This paper addresses the problem of 3D tracking and model acquisition of non-rigid motion in video sequences. We are specifically concerned with human motion, which is a challenging domain. Standard low-level tracking schemes usually fail due to local ambiguities and noise. Most recent approaches overcome this problem with the use of a model. In those techniques optical flow vectors or the motion of feature locations can be constrained by a low degree-of-freedom parametric model. For instance, to track joint angles of human limb segments an approximate kinematic chain model can be used. Such models lose many details that cannot be recovered by simple cylinder or sphere shape models and fixed axis rotations. Non-rigid torso motions, deforming shoe motions, or subtle facial skin motions are problem areas. Alternatively, such non-rigid motions can be captured with basis-shape models that are learned from example data. Most of the previous work is based on PCA techniques applied to 2D or 3D training data. For example, human face deformations have been tracked in 2D and 3D with such models. For 3D domains, prior models are acquired using stereo cameras or cyber-scan hardware. Carefully labeled data have to be provided to derive the PCA based models.

We are interested in cases where no such 3D models are available, or existing models are too restricted and would not be able to recover all subtleties. The input to our technique is a single-view video recording of an arbitrary deforming object, and the output is the 3D motion and a 3D shape model parameterized by its modes of non-rigid deformation. We are facing three very challenging problems:

1. Without a model, how can we reliably track ambiguous and noisy local features in this domain?

2. Without point feature tracks or robust optical flow, how can we derive a model?

3. Given reliable 2D tracks, how can we recover 3D non-rigid motion and shape structure?
We have previously demonstrated that single-view 2D point tracks are enough to recover 3D non-rigid motion and structure by exploiting low-rank constraints [7]. Based on the same assumption, we show in this paper that it is also possible to constrain the low-level flow estimation and to handle occlusion without any model assumption. Irani [14] has demonstrated that model-free low-rank constraints can be applied to overcome local ambiguities in flow estimation for rigid scenes. We show that this can be extended to 3D non-rigid tracking and model acquisition. Our new techniques do not need 2D point tracks, can deal with ambiguous and noisy local features, and can handle occlusion. By exploiting the low-rank constraints in low-level tracking and in 3D non-rigid model acquisition we are able to solve all three challenges mentioned above in one unified manner. We demonstrate the technique by tracking several video sequences and deriving 3D deformable models from those measurements.

2 Previous Work

Many non-rigid tracking solutions have been proposed previously. As mentioned earlier, most techniques use an a-priori model. Examples are [16, 5, 9, 19, 3, 4]. Most of these techniques model 2D non-rigid motion, but some of these approaches also recover 3D pose and deformations based on a 3D model. The 3D model is obtained from 3D scanning devices [6], stereo cameras [10], or multi-view reconstruction [18, 11]. The multi-view reconstruction is based on the assumption that for a specific deformed configuration all views are sampled at the same time. This is equivalent to the structure from motion problem, which assumes rigidity between the different views [22]. Extensions have been proposed, such as the multi-body factorization method of Costeira and Kanade [8] that relaxes the rigidity constraint. In this method, K independently moving objects are allowed, which results in a tracking matrix of rank 3K and a permutation algorithm that identifies the submatrix corresponding to each object. More recently, Bascle and Blake [1] proposed a method for factoring facial expressions and pose during tracking. Although it exploits the bilinearity of 3D pose and non-rigid object configuration, it again requires a set of basis images selected before factorization is performed. The discovery of these basis images is not part of their algorithm. In addition, most techniques treat low-level tracking and 3D structural constraints independently.

In the following section we describe how we can track and reconstruct non-rigid motions from single views without prior models.

3 Technical Approach

The central theme in this paper is the exploitation of rank bounds for recovering 3D non-rigid motion. We first describe in general why and under what circumstances 3D non-rigid motion puts rank bounds on 2D image motion (section 3.1). We then detail how these bounds can be used to constrain low-level tracking in a model-free fashion (section 3.2). We then describe how this technique can also be used for prediction of occluded features (section 3.3), and we introduce three techniques that are able to reconstruct 3D deformable shapes and their motion from those 2D measurements (sections 3.4.1, 3.4.2, and 3.4.3).
3.1 Low-rank constraints for non-rigid motion

Given a sequence of F video frames, the optical flow of P pixels can be coded into two F x P matrices, U and V. Each row of U holds all x-displacements of all P locations for a specific time frame, and each row of V holds all y-displacements for a specific time frame. It has been shown that if U and V describe a 3D rigid motion, the rank of [U; V] has an upper bound, which depends on the assumed camera model (for example, for an orthographic camera model the rank is r <= 4, while for a perspective camera model the rank is r <= 8) [22, 14]. This rank constraint derives from the fact that [U; V] can be factored into two matrices, Q · S, where Q (2F x r) describes the relative pose between camera and object for each time frame, and S (r x P) describes the 3D structure of the scene, which is invariant to camera and object motion.

Previously we have shown that non-rigid object motion can also be factored into two matrices [7], but of a rank r that is higher than the bounds for the rigid case. Assuming the 3D non-rigid motion can be approximated by a set of K modes of variation, the 3D shape of a specific object configuration can be expressed as a linear combination of K basis shapes (S_1, S_2, ..., S_K). Each basis shape S_k is a 3 x P matrix describing P points. The shape of a specific configuration is a linear combination of this basis set:

\[ S = \sum_{k=1}^{K} l_k \cdot S_k \qquad (1) \]

Assuming weak-perspective projection, at a specific time frame t the P points of a configuration S are projected onto 2D image points (u_{t,i}, v_{t,i}):

\[ \begin{bmatrix} u_{t,1} & \cdots & u_{t,P} \\ v_{t,1} & \cdots & v_{t,P} \end{bmatrix} = R_t \cdot \left( \sum_{k=1}^{K} l_{t,k} \cdot S_k \right) + T_t \qquad (2) \]

\[ R_t = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \end{bmatrix} \qquad (3) \]

where R_t contains the first two rows of the full 3D camera rotation matrix, and T_t is the camera translation. The weak-perspective scaling (f / Z_avg) of the projection is implicitly coded in l_{t,1}, ..., l_{t,K}. As in [22], we can eliminate T_t by subtracting the mean of all 2D points, and henceforth can assume that S is centered at the origin. Weak-perspective projection is in practice a good approximation if the perspective effects between the closest and furthest points on the object surface are small. Extending this framework to full-perspective projection is straightforward using an iterative extension. All experiments reported here assume weak-perspective projection.

We can rewrite the linear combination in (2) as a matrix multiplication:

\[ \begin{bmatrix} u_{t,1} & \cdots & u_{t,P} \\ v_{t,1} & \cdots & v_{t,P} \end{bmatrix} = \left[\, l_{t,1} R_t \mid \cdots \mid l_{t,K} R_t \,\right] \cdot \begin{bmatrix} S_1 \\ \vdots \\ S_K \end{bmatrix} \qquad (4) \]

We stack all point tracks from time frame 1 to F into one large 2F x P measurement matrix W. Using (4) we can write:

\[ W = \begin{bmatrix} l_{1,1} R_1 & \cdots & l_{1,K} R_1 \\ l_{2,1} R_2 & \cdots & l_{2,K} R_2 \\ \vdots & & \vdots \\ l_{F,1} R_F & \cdots & l_{F,K} R_F \end{bmatrix} \cdot \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_K \end{bmatrix} = Q \cdot B \qquad (5) \]

Since Q is a 2F x 3K matrix and B is a 3K x P matrix, in the noise-free case W has a rank r <= 3K.

In the following sections we describe how this rank bound on W can be exploited for 1) constrained low-level tracking, 2) recovery of occluded feature locations, and 3) 3D reconstruction of pose, non-rigid deformations, and key shapes.
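To make the rank bound concrete, here is a minimal numpy sketch (our own illustration, not code from the paper; all names are ours) that synthesizes a noise-free W according to (5) from random poses, weights, and basis shapes, and confirms numerically that its rank is at most 3K:

```python
import numpy as np

def first_two_rows_of_random_rotation():
    # QR of a random matrix gives a random orthonormal matrix; keep 2 rows.
    q, _ = np.linalg.qr(np.random.randn(3, 3))
    return q[:2, :]

F, P, K = 100, 50, 3              # frames, points, basis shapes
B = np.random.randn(3 * K, P)     # stacked basis shapes S_1..S_K (3K x P)

blocks = []
for t in range(F):
    R_t = first_two_rows_of_random_rotation()
    l_t = np.random.randn(K)      # configuration weights l_{t,1..K}
    # Q_t = [l_{t,1} R_t | ... | l_{t,K} R_t], a 2 x 3K block of eq. (5)
    blocks.append(np.hstack([l_t[k] * R_t for k in range(K)]))
Q = np.vstack(blocks)             # 2F x 3K
W = Q @ B                         # 2F x P measurement matrix

print(np.linalg.matrix_rank(W))   # prints at most 3K = 9 (noise-free case)
```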
3.2 Basis Flow

The previous analysis tells us why W is rank bounded and how W can be factored. In this section we discuss how to derive the optical flow matrix W from an image sequence and how the rank bound can be used to disambiguate the local flow.

Features can usually be tracked reliably with local methods, such as Lucas-Kanade [17] and extensions [21, 2], if they contain a distinctive high-contrast pattern with 2D texture, such as corner features. For traditional rigid shape reconstruction, only a few feature locations are necessary. Non-rigid objects go through much more severe motion variations, hence many more features need to be tracked. In the extreme case it might be desirable to track every pixel location. Unfortunately, many objects that we are interested in, including the human body, do not have many of those very reliable features.

Our solution to the tracking dilemma builds on a technique introduced in [14] that exploits rank constraints for optical flow estimation in the case of rigid motion. Since W is assumed to have rank r, all P columns of W can be modeled as a linear combination of r "basis tracks". The basis is not uniquely defined, but if there are more than r points whose trajectories over the F frames can be reliably estimated, then we can compute with SVD the first r eigenvectors Q of the reduced tracking matrix W_reliable. Q (2F x r) is an initial estimate of the basis for all P tracks. Our next task is to estimate all P tracks (the entire W) using this eigenbase Q and additional local image constraints.

As in the original Lucas-Kanade tracking, we assume that a small image patch centered at a track-point location will not change its appearance drastically between two consecutive frames. Therefore the local patch flow [u, v] can be computed by solving the following well known equation [17]:

\[ [\,u_{t,i}, v_{t,i}\,] \cdot \begin{bmatrix} c & d \\ d & e \end{bmatrix} = [\,g, h\,] \qquad (6) \]

where \( \begin{bmatrix} c & d \\ d & e \end{bmatrix} = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \) is the second moment matrix of the local image patch in the first frame, \( g = \sum I_x I_t \), and \( h = \sum I_y I_t \) (for further details see [17, 21, 2]). If all F x P flow vectors across the entire image sequence are coded relative to one single image template, the following equation system can be written [14]:

\[ [\,U \mid V\,] \cdot \begin{bmatrix} C & D \\ D & E \end{bmatrix} = [\,G \mid H\,] \qquad (7) \]

where C, D, E are diagonal P x P matrices that contain the corresponding c, d, and e values for each of the P local image patches. Accordingly, G and H are F x P matrices that contain the g and h values for all P local patches across all F time frames. This system of equations is a rewriting of the Lucas-Kanade linearization for every flow vector, with no additional constraints yet applied. The number of free variables is equal to the number of constraints. If a local patch has no 2D texture, the single equation describing its motion in the system will only provide an accurate estimate of its normal flow (aperture problem).

Now we split Q into Q_u, which contains all even rows of Q, and Q_v, which contains all odd rows of Q. Since Q is a basis for W, there must exist some r x P matrix \(\tilde{B}\) for which the following equations hold:

\[ Q_u \cdot \tilde{B} = U, \qquad Q_v \cdot \tilde{B} = V \qquad (8) \]

Using (7) we can write [14]:

\[ [\,Q_u \cdot \tilde{B} \mid Q_v \cdot \tilde{B}\,] \cdot \begin{bmatrix} C & D \\ D & E \end{bmatrix} = [\,G \mid H\,] \qquad (9) \]

This is a system with r x P unknowns (the entries in \(\tilde{B}\)) and 2F x P equations. For long tracks (F >> r) the system is very over-constrained, in contrast to (7). We can exploit this redundancy to derive the optical flow for points that are difficult to track and for features along 1D edges. Since [G|H] is computed based on the Lucas-Kanade linearization, the resulting flow \([U \mid V] = [Q_u \cdot \tilde{B} \mid Q_v \cdot \tilde{B}]\) will only be a first approximation. We rewarp all images of the sequence using the new flow and then iterate equation (9).
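For illustration, here is a minimal sketch of solving (9) for \(\tilde{B}\) in the least-squares sense, one column (point) at a time (our own code, assuming Q, the per-patch moments c, d, e, and the matrices G, H have already been computed; the row-interleaving convention is an assumption):

```python
import numpy as np

def solve_basis_flow(Q, c, d, e, G, H, visible=None):
    """Least-squares solve of eq. (9) for B~ (r x P), one point at a time.

    Q: 2F x r basis tracks; c, d, e: length-P second-moment entries per patch;
    G, H: F x P matrices of g and h values; visible: optional F x P boolean
    mask marking frames where each point is measurable.
    """
    # Split Q into its u-part and v-part (slicing assumes the eq. (5)
    # interleaving of u- and v-rows; adjust to match the actual stacking).
    Qu, Qv = Q[0::2, :], Q[1::2, :]
    r = Qu.shape[1]
    P = G.shape[1]
    B_tilde = np.zeros((r, P))
    for i in range(P):
        # Column i of (9) gives two scalar equations per frame:
        #   (c_i Qu + d_i Qv) B~[:, i] = G[:, i]
        #   (d_i Qu + e_i Qv) B~[:, i] = H[:, i]
        A = np.vstack([c[i] * Qu + d[i] * Qv,
                       d[i] * Qu + e[i] * Qv])
        rhs = np.concatenate([G[:, i], H[:, i]])
        if visible is not None:
            keep = np.concatenate([visible[:, i], visible[:, i]])
            A, rhs = A[keep], rhs[keep]
        B_tilde[:, i], *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return B_tilde
```

Dropping the rows of unmeasurable frames, as the `visible` mask does here, is exactly the row-elimination mechanism described in the next section.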
3.3 Dealing with Occlusion

By reordering the elements of \(\tilde{B}\) into an rP-dimensional vector b, equation (9) can be rewritten in the form:

\[ L^{2FP \times rP} \cdot b^{rP \times 1} = m^{2FP \times 1} \qquad (10) \]

where now each row describes one point in one particular frame. If we have occlusion, or the tracker used for initialization has lost some points at certain time frames, then the corresponding entries in the m vector will not be measurable. We eliminate those rows from the L matrix and the m vector. If the number of missing points is not overly large, we are still left with an over-constrained system that can give us an accurate solution for \(\tilde{B}\). As long as the disappearing features are visible in enough frames, the product \(Q \cdot \tilde{B}\) also provides a good prediction of the displacements for the missing points.

3.4 3D Reconstruction

As mentioned earlier, the factorization of W into Q and B is not unique. Any invertible r x r matrix A applied to Q and B in the following way leads to an alternative factorization:

\[ Q_a = Q \cdot A, \qquad B_a = A^{-1} \cdot B \qquad (11) \]

Q_a and B_a multiplied together approximate W with the same sum-of-squared error as Q and B. Using SVD, we compute a \(\hat{Q}\) (with orthonormal columns) and \(\hat{B}\). In general \(\hat{Q}\) will not comply with the structure we described in (5):

\[ \tilde{Q} = \begin{bmatrix} \tilde{Q}_1 \\ \tilde{Q}_2 \\ \vdots \\ \tilde{Q}_F \end{bmatrix}, \quad \text{with } \tilde{Q}_t = \left[\, l_{t,1} R_t \mid \cdots \mid l_{t,K} R_t \,\right] \qquad (12) \]

For the general case, transforming \(\hat{Q}\) into a \(\tilde{Q}\) that complies with those constraints cannot be done with a linear least-squares technique. For the specific case of rigid scenes, each sub-block is equal to the first two rows of a rotation matrix (\(\tilde{Q}_t = R_t\)). Tomasi-Kanade [22] suggested a linear approximation schema to find an A that enforces the sub-blocks of Q to comply with rotation matrices.

3.4.1 Sub-block factorization

For the non-rigid case, we previously proposed a second factorization step on each sub-block that transforms every \(\hat{Q}_t\) into a \(\tilde{Q}_t\) that complies with the constraints (5) [7]. \(\tilde{Q}_t\) can be rewritten as:

\[ \tilde{Q}_t = \left[\, l_{t,1} R_t \mid \cdots \mid l_{t,K} R_t \,\right] = \begin{bmatrix} l_{t,1} r_1 & l_{t,1} r_2 & l_{t,1} r_3 & \cdots & l_{t,K} r_1 & l_{t,K} r_2 & l_{t,K} r_3 \\ l_{t,1} r_4 & l_{t,1} r_5 & l_{t,1} r_6 & \cdots & l_{t,K} r_4 & l_{t,K} r_5 & l_{t,K} r_6 \end{bmatrix} \]

We reorder the elements of \(\tilde{Q}_t\) into a new matrix \(\bar{Q}_t\):

\[ \bar{Q}_t = \begin{bmatrix} l_{t,1} r_1 & l_{t,1} r_2 & l_{t,1} r_3 & l_{t,1} r_4 & l_{t,1} r_5 & l_{t,1} r_6 \\ l_{t,2} r_1 & l_{t,2} r_2 & l_{t,2} r_3 & l_{t,2} r_4 & l_{t,2} r_5 & l_{t,2} r_6 \\ \vdots & & & & & \vdots \\ l_{t,K} r_1 & l_{t,K} r_2 & l_{t,K} r_3 & l_{t,K} r_4 & l_{t,K} r_5 & l_{t,K} r_6 \end{bmatrix} = \begin{bmatrix} l_{t,1} \\ l_{t,2} \\ \vdots \\ l_{t,K} \end{bmatrix} \cdot \left[\, r_1 \; r_2 \; r_3 \; r_4 \; r_5 \; r_6 \,\right] \]

which shows that \(\bar{Q}_t\) is of rank 1 and can be factored into the pose R_t and configuration weights l_{t,k} by SVD.

After the second factorization step is applied to each of the individual sub-blocks \(\tilde{Q}_t\), a non-linear optimization over the entire time sequence is performed to find one invertible matrix A that orthonormalizes all of the sub-blocks. The result is that each sub-block is a scaled rotation matrix. In the presence of noise and ambiguities, the second and higher eigenvalues of many sub-blocks do not vanish. In those cases, this results in bad rank-1 approximations and bad estimates for R_t. We therefore propose a second alternative in the next section that overcomes this limitation.
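A minimal sketch of this sub-block step (our own illustration; variable names are ours): reorder a 2 x 3K sub-block into the K x 6 matrix \(\bar{Q}_t\) above, take its best rank-1 approximation by SVD, and read off the configuration weights and the pose rows.

```python
import numpy as np

def factor_subblock(Qt, K):
    """Rank-1 factorization of one 2 x 3K sub-block Q_t = [l_1 R | ... | l_K R]."""
    # Reorder into K x 6: row k is l_k * [r1 r2 r3 r4 r5 r6].
    Qbar = np.stack([np.concatenate([Qt[0, 3*k:3*k+3],
                                     Qt[1, 3*k:3*k+3]]) for k in range(K)])
    u, s, vt = np.linalg.svd(Qbar)
    l = u[:, 0] * s[0]          # configuration weights l_{t,1..K}
    R = vt[0, :].reshape(2, 3)  # first two rows of the scaled rotation matrix
    return l, R
```

Note that the split of the singular value between l and R is only determined up to scale and sign; in the paper's pipeline, the subsequent non-linear optimization over A resolves this ambiguity.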
3.4.2 Iterative Optimization

Instead of local factorizations on the sub-blocks, we propose a new iterative technique that solves (5) directly. Many non-rigid objects have a dominant rigid component, and we take advantage of this to get an initial estimate for all pose matrices (R_1, ..., R_F). Given an initial guess of the pose at each time frame, we can solve for the configuration weights and the basis shapes.

To initialize the pose, we factor W into a 2F x 3 rigid pose matrix Q_rig and a 3 x P matrix B_rig (as originally done by Tomasi-Kanade). As usual, we transform Q_rig into a matrix whose sub-blocks are all weak-perspective rotation matrices (as outlined in section 3.4.1). Using this as an initial guess for the pose of the non-rigid shape, we solve for the non-rigid l_{t,k} and B terms in (5). We do this iteratively by first initializing the l_{t,k} randomly and then looping between solving for B, then for the l_{t,k}, and then refining R_t again(1):

1. Given all R_t and l_{t,k} terms (the Q matrix), equation (5) can be used to find the linear least-squares fit of B.

2. Given B and all R_t, we can solve for all l_{t,k} with linear least-squares.

3. Given B and L, we can rewrite (5) as:

\[ W_t = R_t \cdot \sum_k l_{t,k} S_k \qquad (13) \]

Solving for all R_t such that they fit this equation and remain rotation matrices can be done by parameterizing R_t with exponential coordinates. A full rotation matrix can be described by 3 variables \(\omega = (\omega_x, \omega_y, \omega_z)\) as:

\[ R(\omega) = \exp \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} \qquad (14) \]

Assuming \(\hat{\omega}\) is the estimate of R_t at the previous iteration, we can then linearize (13) around the previous estimate to:

\[ W_t = \begin{bmatrix} 1 & -\omega'_z & \omega'_y \\ \omega'_z & 1 & -\omega'_x \end{bmatrix} R(\hat{\omega}) \sum_k l_{t,k} S_k \qquad (15) \]

and solve for a new \(\omega'\). We then update \(R(\omega) = R(\omega') R(\hat{\omega})\) and iterate(2) (see the sketch after this section).

We iterate all 3 steps until convergence. Similar to the technique described in section 3.3, we can easily handle missing entries in W when points are occluded or are lost by the tracker. B and L are over-constrained, so we leave out the missing data points and solve the linear fit as before.

(1) Alternatively, we can use the sub-block factorization described in section 3.4.1 for initialization.

(2) A future extension of this algorithm will deal with an iterative version for true perspective models. However, we would like to point out that for the orthographic case there also exist several closed-form solutions, including Horn's technique [12, 13] and an SVD-based method proposed by Ruderman [20], which we will include in an extended technical report.
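Here is a sketch of the step-3 pose update under the stated linearization (our own code, not the authors'; it assumes numpy/scipy, and Wt, R_prev, l_t, and S_list are the quantities of eqs. (13)-(15)):

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Skew-symmetric matrix of eq. (14)."""
    wx, wy, wz = w
    return np.array([[0, -wz, wy],
                     [wz, 0, -wx],
                     [-wy, wx, 0]])

def refine_pose(Wt, R_prev, l_t, S_list):
    """One linearized update of R_t (step 3), following eqs. (13)-(15).

    Wt: 2 x P measurements of frame t; R_prev: current 3 x 3 rotation estimate;
    l_t: length-K weights; S_list: list of K basis shapes, each 3 x P.
    """
    shape = sum(l * S for l, S in zip(l_t, S_list))   # 3 x P current shape
    M = R_prev @ shape                                # rotated shape, 3 x P
    # Linearization (15): Wt ~ [[1, -wz, wy], [wz, 1, -wx]] @ M. Per point:
    #   Wt_u - M0 =  wy*M2 - wz*M1
    #   Wt_v - M1 = -wx*M2 + wz*M0
    P = Wt.shape[1]
    A = np.zeros((2 * P, 3))
    b = np.zeros(2 * P)
    A[0::2] = np.stack([np.zeros(P), M[2], -M[1]], axis=1)
    b[0::2] = Wt[0] - M[0]
    A[1::2] = np.stack([-M[2], np.zeros(P), M[0]], axis=1)
    b[1::2] = Wt[1] - M[1]
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return expm(hat(w)) @ R_prev                      # R(w') R(w-hat)
```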
3.4.3 Multi-View Input

Another extension of this factorization technique is the incorporation of multi-view inputs from M cameras. This enlarges the input matrix W to size 2FM x P:

\[ W = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_F \end{bmatrix}, \quad W_t = \begin{bmatrix} W_{t,1} \\ W_{t,2} \\ \vdots \\ W_{t,M} \end{bmatrix}, \quad W_{t,m} = \begin{bmatrix} u^m_{t,1} & \cdots & u^m_{t,P} \\ v^m_{t,1} & \cdots & v^m_{t,P} \end{bmatrix} \qquad (16) \]

As before, we assume that W_{t,m} can be described by a 2 x 3 pose matrix R_{t,m}, by K deformation coefficients l_{t,1}, l_{t,2}, ..., l_{t,K}, and by a 3K x P key-shape matrix B. Assuming the cameras are synchronized, an additional constraint for the multi-view case is that all M views share the same deformation coefficients for a particular time frame t:

\[ W_t = \left[\, l_{t,1} \cdot R_t \mid l_{t,2} \cdot R_t \mid \cdots \mid l_{t,K} \cdot R_t \,\right] \cdot B \qquad (17) \]

\[ R_t = \begin{bmatrix} R_{t,1} \\ R_{t,2} \\ \vdots \\ R_{t,M} \end{bmatrix} \qquad (18) \]

Similar to our previous 2-step factorization, we can factor W into Q and B complying with this new structure. Furthermore, we can enforce another constraint if we assume that all M cameras remain fixed relative to each other: the relative rotation between all R_{t,m}'s in the R_t sub-block of Q is constant over time. This is enforced with a non-linear iterative optimization after the 2-step factorization.

3.4.4 Shape Regularization

If there is not enough out-of-plane rotation, the Z values of B can be ill-conditioned. For instance, a small non-rigid deformation in X and Y can also be explained by a small out-of-image-plane rigid rotation of a shape with large Z values. Another problem area is that if the low-level features have almost no image texture, the corresponding 3D point in the shape matrix B is not defined.

We can overcome these problems by regularizing the shape matrix B during the iterative optimization. A simple term can be added to the least-squares fit of B in section 3.4.2, and also to the least-squares fit of \(\tilde{B}\) in section 3.2. If point locations i and j are neighbors (we can determine that with a local nearest-neighbor algorithm or Delaunay triangulation), we add a term \(\alpha_{ij}(b_i - b_j)^2\) for each neighbor pair, where b_i denotes the entries of B for point i. This pulls ambiguous points closer to the average of their neighbors. It is similar to the smoothness terms in snake-based contour tracking optimizations [15]. \(\alpha_{ij}\) can be inversely proportional to the distance between the two points i and j. The global least-squares fit remains linear, since we only need to add additional linear equations of the form \(\alpha_{ij} \cdot b_i - \alpha_{ij} \cdot b_j = 0\) to the least-squares systems in sections 3.2 and 3.4.2.
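Schematically (our own sketch, shown for one coordinate row of B across all P points; A and b stand for whatever linear data constraints the fit already contains), the regularizer just appends one weighted row per neighbor pair:

```python
import numpy as np

def append_regularizer(A, b, pairs, alphas, P):
    """Append rows alpha_ij * (x_i - x_j) = 0 to a linear system A x = b,
    where x (length P) is one coordinate row of the shape matrix B."""
    extra = np.zeros((len(pairs), P))
    for row, ((i, j), a) in enumerate(zip(pairs, alphas)):
        extra[row, i], extra[row, j] = a, -a
    A_aug = np.vstack([A, extra])
    b_aug = np.concatenate([b, np.zeros(len(pairs))])
    return A_aug, b_aug
```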
4 Experimental Results

4.1 Rank-Constrained Tracking

We tested the rank-constraint technique for optical flow estimation on a 500-frame-long video sequence of a deforming shoe (Figure 1). The recordings are challenging due to changes in the object appearance that are caused by the large rotations and deformations as well as by variations in illumination.

In our examples a set of 30 reliable features was initially tracked using an implementation of the technique of Lucas-Kanade, employing affine transformations for the patches centered at these points. We updated the reference patch of each point every 10 frames in order to accommodate the changes in feature appearance over our long sequence. We assumed our multi-frame approach capable of recovering the possible drift introduced on some of the tracks by the frequent update of the template for the points. 80 additional features were then selected along 1D edges in the reference frame. We produced a first approximate initialization of their displacements by linear extrapolation from the motion of the reliable points. We used the resulting W matrix as initialization for our tracking technique based on rank constraints. We experimented with different values for the rank and achieved the best solution by setting rank r = 9. We employed the classic pyramidal approach in smoothing the images and ran several iterations of the multi-frame method.

Figure 1. Example tracks of the shoe sequence. The blue circles are reliable points; the red crosses are features with 1D texture that have been recovered using rank constraints.

We could track robustly and very accurately most of the 110 points throughout the whole sequence. In order to correct drift of some of the edge features we incorporated the additional regularization described in section 3.4.4. Figure 1 shows the features tracked for several frames.

In the last part of the image sequence many of the distinctive points that we have used to derive the optical flow field are progressively occluded, while some disappear and then become visible again with the variations in the motion. These difficult frames were used to demonstrate the ability of our solution to cope with lost features and occlusion. For this experiment we manually labeled in each frame features that were not visible, incorrectly tracked, or lost by the Lucas-Kanade initialization, and then used the approach described in section 3.3 to recover their position. Figure 2 shows the estimated position of some of these features before, during, and after their temporary occlusion. The algorithm is successful in reconstructing their complete trajectories. Future experiments will also use an automatic track termination test.

Figure 2. The black rectangular markers are predicted locations of disappearing features. The algorithm can recover the complete trajectories of the temporarily occluded points.

4.2 3D Reconstruction

Given the estimated Q and B of those 500 tracked monocular image frames, we then applied our reconstruction technique described in section 3.4.2. Figure 3 shows some example frames with the reconstructed non-rigid 3D shapes overlaid. See http://movement.stanford.edu/nonrig for mpeg or quicktime video showing the entire sequence reconstructed.

Figure 3. 3D reconstruction of corresponding 2D tracks from monocular video sequence. Please check video to see all details.

We also applied the new reconstruction technique to a video recording of a deforming human torso (Figure 4). The included video 841_tyab.mpg (also at http://movement.stanford.edu/nonrig) shows the deforming shapes in 3D. As you can tell, again the pose changes and the shape deformations were recovered successfully.

Figure 4. 3D reconstruction of corresponding 2D tracks from monocular video sequence. Please check video to see all details.

We applied this technique and the previously reported solution [7] to several other video recordings, including a giraffe recording and human face tracking. All those reconstructions recovered the pose and shape deformations nicely. Since we do not have ground-truth data available, we can only speculate how good the quantitative performance is. Therefore we also tested this technique on several artificial datasets with ground truth, and on multi-view recordings with 2 calibrated cameras that allowed us to get a good estimate of the ground truth by using triangulation.

4.3 Performance on Artificial Data

The artificial data sets were based on random basis shapes, artificial superquadrics, 3D data from 2-view recordings of a deforming face, and 2-view recordings of the shoe sequence. All errors reported here are computed in percentage points: the average distance of the reconstructed point to the correct point, divided by the size of the shape.
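For concreteness, here is a sketch of this metric as we read it (our own code; interpreting "size of the shape" as the largest extent of the ground-truth bounding box is our assumption, not spelled out in the paper):

```python
import numpy as np

def percent_error(X_rec, X_gt):
    """Average reconstructed-to-true point distance as a percentage of shape size.

    X_rec, X_gt: 3 x P reconstructed and ground-truth point sets.
    """
    dists = np.linalg.norm(X_rec - X_gt, axis=0)
    size = np.max(X_gt.max(axis=1) - X_gt.min(axis=1))  # assumed size measure
    return 100.0 * dists.mean() / size
```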
The random basis shapes were generated by sampling points uniformly inside a unit cube. The first basis shape was given the largest weight so that the overall shape has a strong rigid component. Artificial data were generated from 5, 10, and 20 random basis shapes rotating over 300 frames. The overall maximum rotation about any axis was 90 degrees, and Gaussian noise was added to the final tracking matrix W. The results of running the iterative optimization are shown in Figure 5(a,b). Both the 3D error and the Z error decrease as the number of basis shapes (K) used in the optimization increases. For the data generated from 5 basis shapes, the 3D error and Z error level off after K = 5. This makes sense because only 5 basis shapes are required to describe the data, and the iterative optimization finds the 5 basis shapes.

The superquadric data were generated with three of the octants deforming independently. The same rotation from the random basis shapes was applied to it to generate 300 frames for the tracking matrix W. The average 3D and Z errors are plotted in Figure 5(c).

The 3D face data were taken from a stereo reconstruction of a deforming face. The same rotation was applied to this to generate the 300 frames for the tracking matrix W. The average 3D and Z errors are plotted in Figure 5(d). The iterative optimization was also tested with occlusion. Those points on the face which should be occluded in a certain pose were labelled as such and not included in W. As a result, 15% of the points in W were removed. The average 3D and Z errors for the reconstruction with occlusion are also plotted in Figure 5(d).

Figure 5. Iterative optimization on monocular artificial data. Plots show how 3D and Z errors vary as the number of basis shapes (K) used in the iterative optimization is increased. (a,b) random basis shapes: (a) shows 3D error and (b) shows Z error; x is the number of basis shapes used to generate the data. (c) superquadric. (d) deforming face.

4.4 Performance on Real Data

Table 1 shows the monocular-view reconstruction errors for the shoe sequence. A second camera was used for triangulation to get the ground truth so that we could compare our monocular reconstruction. This reconstruction is the most challenging task, since it tests the entire system from video input to 3D output. In the artificial experiments we ran the algorithms 30 times for each K and reported the median error; for the shoe recording we only ran it once for each K. This explains the larger random fluctuations in the errors. Overall, the small reconstruction errors tell us that this technique is indeed able to accurately recover non-rigid deformations from monocular image sequences. We also ran the multi-view reconstruction on the face data, and again achieved reasonable error rates, as can be seen in Table 1.

Table 1. Top: 3D reconstruction performance on monocular shoe sequence. Bottom: 3D reconstruction performance on 2-view face sequence. [The numerical entries for K = 2, ..., 8 are not legible in this scan.]

5 Discussion

We have shown how to exploit low-rank constraints for low-level tracking, for prediction of missing low-level features, and for non-rigid reconstruction. We have demonstrated those techniques on several video sequences and have shown that good 3D non-rigid reconstructions can be achieved. We further quantified the performance of single-view reconstructions with the use of additional calibrated views and simulations on artificial data.

We have not yet addressed the issue of discovering how many basis shapes are needed. One possible solution is that the user defines an upper threshold on how much reprojection error is allowed; K is increased until the error is below the threshold.

Another interesting aspect that is currently under investigation is the bias of this technique. In many cases 3D rotation can be compensated by some degrees of freedom of the basis-shape set. Despite this ambiguity, our technique has a strong bias towards representing as much as possible with the rotation matrix, but we would like to study these ambiguities further.

Reconstructing non-rigid models from single-view video recordings has many potential applications. For example, we intend to apply this technique to our image-based facial and full-body animation system and to a model-based tracking system.

Acknowledgements

We would like to thank James Davis, Erika Chuang, and Ajit Chaudhari for providing advice and data. This research was funded in part by the National Science Foundation and the Stanford BIO-X program.

References

[1] B. Bascle and A. Blake. Separability of pose and expression in facial tracking and animation. In ICCV, 1998.

[2] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV, 1992.

[3] M. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In ICCV, 1995.

[4] M. Black, Y. Yacoob, A. D. Jepson, and D. J. Fleet. Learning parameterized models of image motion. In CVPR, 1997.

[5] A. Blake, M. Isard, and D. Reynard. Learning to track the visual motion of contours. J. Artificial Intelligence, 1995.

[6] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, 1999.

[7] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3D shape from image streams. In CVPR, 2000.
[8] J. Costeira and T. Kanade. A multi-body factorization method for motion analysis. Int. J. of Computer Vision, pages 159-180, Sep 1998.

[9] D. DeCarlo and D. Metaxas. Deformable model-based shape and motion analysis from images using motion residual error. In ICCV, 1998.

[10] S. Gokturk, J.-Y. Bouguet, and R. Grzeszczuk. A data-driven model for monocular face tracking. In ICCV, 2001.

[11] B. Guenter, C. Grimm, D. Wood, H. Malvar, and F. Pighin. Making faces. In SIGGRAPH, 1998.

[12] B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A, 4(4), 1987.

[13] B. K. P. Horn, H. M. Hilden, and S. Negahdaripour. Closed-form solution of absolute orientation using orthonormal matrices. Journal of the Optical Society of America A, 5(7), 1988.

[14] M. Irani. Multi-frame optical flow estimation using subspace constraints. In ICCV, 1999.

[15] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int. J. of Computer Vision, 1(4):321-331, 1987.

[16] A. Lanitis, C. J. Taylor, T. F. Cootes, and T. Ahmed. Automatic interpretation of human faces and hand gestures using flexible models. In International Workshop on Automatic Face- and Gesture-Recognition, 1995.

[17] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. 7th Int. Joint Conf. on Artificial Intelligence, 1981.

[18] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin. Synthesizing realistic facial expressions from photographs. In SIGGRAPH, 1998.

[19] F. Pighin, D. H. Salesin, and R. Szeliski. Resynthesizing facial animation through 3D model-based tracking. In ICCV, 1999.

[20] D. L. Ruderman. Private communication.

[21] J. Shi and C. Tomasi. Good features to track. In CVPR, 1994.

[22] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. Int. J. of Computer Vision, 9(2):137-154, 1992.