yilmaz_iccv_2005 - Recognizing Human Actions in Videos...

Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras

Alper Yilmaz, School of Computer Science, University of Central Florida, yilmaz@cs.ucf.edu
Mubarak Shah, School of Computer Science, University of Central Florida, shah@cs.ucf.edu

Abstract

Most work in action recognition deals with sequences acquired by stationary cameras with fixed viewpoints. Due to camera motion, the trajectories of the body parts contain not only the motion of the performing actor but also the motion of the camera. In addition to the camera motion, different viewpoints of the same action in different environments result in different trajectories, which cannot be matched using standard approaches. To handle these problems, we propose to use the multi-view geometry between two actions. However, the well-known epipolar geometry of static scenes, where the cameras are stationary, is not suitable for our task. Thus, we propose to extend the standard epipolar geometry to the geometry of dynamic scenes, where the cameras are moving. We demonstrate the versatility of the proposed geometric approach for recognition of actions in a number of challenging sequences.

1. Introduction

During the last two decades, a large number of research articles have been published on the recognition of human actions. This popularity is mainly due to the occurrence of actions in many real-world applications such as surveillance, video classification, and content-based retrieval. These tasks still remain outstanding challenges in the vision community.

A common approach taken by researchers is to perform action recognition in 2-D using, for example, motion trajectories [16], optical flow vectors [4], and silhouettes [3]. For instance, for recognizing facial expressions, Black and Yacoob [2] computed the affine motion of the bounding boxes around the eyes, the eyebrows, and the mouth.
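To make the affine-motion idea concrete, the 2-D affine parameters mentioned above can be recovered from point correspondences by linear least squares. The sketch below is illustrative only and assumes generic corresponding points, not the specific tracking used by Black and Yacoob; the function name `estimate_affine` is hypothetical.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares estimate of the 2x3 affine transform mapping
    src points to dst points; src and dst are (N, 2) arrays, N >= 3."""
    n = src.shape[0]
    # Design matrix: each row is [x, y, 1] for a source point.
    A = np.hstack([src, np.ones((n, 1))])
    # Solve A @ params = dst in the least-squares sense.
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params.T  # 2x3: rows express x' and y' as affine functions of (x, y)

# Example: a pure translation by (2, 3) is recovered exactly.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src + np.array([2.0, 3.0])
M = estimate_affine(src, dst)
```

With at least three non-collinear correspondences the six affine parameters are determined; tracking these parameters over time yields the motion descriptors used by such 2-D approaches.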
The variations in the affine parameters were shown to capture facial changes during an expression. Yang et al. [21] also used the affine transformation computed between corresponding segments in consecutive frames for sign language recognition. Before their work, the same problem was addressed by Starner and Pentland [17], who used the bounding boxes around the hands to train HMMs modeling the states of the hand during the action. Efros et al. [4] used the optical flow computed in the bounding boxes of the objects to represent actions. Similarly, Polana and Nelson [14] generated statistics of the normal flow from the spatio-temporal cube to represent the motion content during an action. Laptev and Lindeberg [12] used temporal and spatial image gradients to find descriptors of an action. Instead of using trajectories or bounding boxes, Bobick and Davis [3] used object silhouettes to model the action. A stack of such silhouettes, which provides a motion history, was called a temporal template.
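A temporal template of the kind described by Bobick and Davis can be sketched as a motion history image: recent silhouette pixels are bright and older ones decay linearly. This is a simplified illustration, not the authors' exact formulation; the function name `motion_history` and the decay scheme are assumptions.

```python
import numpy as np

def motion_history(silhouettes, tau):
    """Build a motion history image from a list of binary (H, W)
    silhouette masks. Pixels active in the most recent frame get the
    maximum value tau; inactive pixels decay by 1 per frame toward 0."""
    h, w = silhouettes[0].shape
    mhi = np.zeros((h, w))
    for sil in silhouettes:
        mhi = np.where(sil > 0, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalize to [0, 1]
```

The resulting single image summarizes where and how recently motion occurred, which is what makes the stacked-silhouette representation usable as a compact action descriptor.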

This note was uploaded on 06/13/2011 for the course CAP 6412 taught by Professor Staff during the Spring '08 term at University of Central Florida.

