Free Viewpoint Action Recognition using Motion History Volumes

Daniel Weinland¹, Remi Ronfard, Edmond Boyer
Perception-GRAVIR, INRIA Rhone-Alpes, 38334 Montbonnot Saint Martin, France.
Email addresses: [email protected] (Daniel Weinland), [email protected] (Remi Ronfard), [email protected] (Edmond Boyer).
¹ D. Weinland is supported by a grant from the European Community under the EST Marie-Curie Project Visitor.
Preprint submitted to Elsevier Science, 16 October 2006.

Abstract

Action recognition is an important and challenging topic in computer vision, with applications including video surveillance, automated cinematography and the understanding of social interaction. Yet most current work in gesture or action interpretation remains rooted in view-dependent representations. This paper introduces Motion History Volumes (MHV) as a free-viewpoint representation for human actions, for the case of multiple calibrated, background-subtracted video cameras. We present algorithms for computing, aligning and comparing MHVs of different actions performed by different people from a variety of viewpoints. Alignment and comparison are performed efficiently using Fourier transforms in cylindrical coordinates around the vertical axis. Results indicate that this representation can be used to learn and recognize basic human action classes independently of gender, body size and viewpoint.

Key words: action recognition, view invariance, volumetric reconstruction

1 Introduction

Recognizing actions of human actors from video is an important topic in computer vision, with many fundamental applications in video surveillance, video indexing and the social sciences. According to Neumann et al., and from a computational perspective, actions are best defined as four-dimensional patterns in space and in time.
Video recordings of actions can similarly be defined as three-dimensional patterns in image space and in time. They result from the perspective projection of the world action onto the image plane at each time instant. Recognizing actions from a single video is, however, hampered by the unavoidable fact that self-occlusions hide parts of the action from the camera. That the human brain can recognize actions from a single viewpoint should not obscure two facts: actions are firmly four-dimensional, and the mental models of actions supporting recognition may also be four-dimensional.

In this paper, we investigate how to build spatio-temporal models of human actions that can support categorization and recognition of simple action classes, independently of viewpoint, actor gender and body size. We use multiple cameras and shape-from-silhouette techniques. We split action recognition into two separate tasks. The first task is the extraction of motion descriptors from visual input. The second is the classification of these descriptors into various levels of action classes, from simple gestures and postures to prim-...
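The abstract states that MHVs are aligned and compared using Fourier transforms in cylindrical coordinates around the vertical axis. The property behind this can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions:

- The volume has already been resampled on a cylindrical grid indexed (r, θ, z).
- The vertical axis passes through the actor's centre.
- Only rotations by a whole number of angular bins are handled.

Under those assumptions, rotating the actor about the vertical axis becomes a circular shift along the θ axis. Such a shift leaves the magnitude of the DFT along θ unchanged. The shift itself can be recovered from the peak of the circular cross-correlation. The function names `rotation_invariant_descriptor` and `estimate_rotation`, and the random stand-in volume, are hypothetical.

```python
import numpy as np


def rotation_invariant_descriptor(volume):
    """Magnitude of the DFT along the angular axis of a cylindrical volume.

    Assumes `volume` has shape (n_r, n_theta, n_z). A rotation about the
    vertical axis by a whole number of angular bins is a circular shift
    along axis 1; such a shift only changes the phase of the DFT, so the
    magnitude returned here is unchanged.
    """
    return np.abs(np.fft.fft(volume, axis=1))


def estimate_rotation(reference, rotated):
    """Estimate the angular shift (in bins) that maps `reference` onto `rotated`.

    Computes the circular cross-correlation along the angular axis in the
    Fourier domain, summed over the radial and height axes, and returns the
    lag with the highest correlation.
    """
    f_ref = np.fft.fft(reference, axis=1)
    f_rot = np.fft.fft(rotated, axis=1)
    # Cross-power spectrum, summed over r (axis 0) and z (axis 2).
    cross = (f_rot * np.conj(f_ref)).sum(axis=(0, 2))
    corr = np.real(np.fft.ifft(cross))
    return int(np.argmax(corr))


# Usage with a hypothetical random volume standing in for an MHV.
rng = np.random.default_rng(0)
mhv = rng.random((8, 64, 16))             # (n_r, n_theta, n_z)
turned = np.roll(mhv, 10, axis=1)         # rotate by 10 of 64 angular bins

print(np.allclose(rotation_invariant_descriptor(mhv),
                  rotation_invariant_descriptor(turned)))  # True
print(estimate_rotation(mhv, turned))                      # 10
```

Discarding the phase gives a descriptor that can be compared across viewpoints without any explicit alignment. Keeping the phase, as `estimate_rotation` does, recovers the relative orientation instead. This excerpt does not say which of the two the full method relies on.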