2007_iccv_laptev - Retrieving actions in movies Ivan Laptev...

Retrieving actions in movies

Ivan Laptev and Patrick Pérez
IRISA / INRIA Rennes, Campus universitaire de Beaulieu, 35042 Rennes Cedex, France
{ilaptev | perez}@irisa.fr

To appear in Proc. ICCV 2007

Abstract

We address recognition and localization of human actions in realistic scenarios. In contrast to the previous work studying human actions in controlled settings, here we train and test algorithms on real movies with substantial variation of actions in terms of subject appearance, motion, surrounding scenes, viewing angles and spatio-temporal extents. We introduce a new annotated human action dataset and use it to evaluate several existing methods. We in particular focus on boosted space-time window classifiers and introduce “keyframe priming” that combines discriminative models of human motion and shape within an action. Keyframe priming is shown to significantly improve the performance of action detection. We present detection results for the action class “drinking” evaluated on two episodes of the movie “Coffee and Cigarettes”.

1. Introduction

Human actions are frequent and essential events within the content of feature films, documentaries, commercials, personal videos, and so forth. “Did Frodo throw the ring into the volcano?” “Did Trinity kiss Neo?” The answers to these and many other questions are hidden exclusively in the visual representation of human actions. Automatic recognition of human actions, hence, is crucial for video search applications and is particularly urged by the rapidly growing amounts of professional and personal video data (BBC Motion Gallery, YouTube, Video Google).

Interpretation of human actions is a well recognized problem in computer vision [2, 3, 1, 6, 9, 10, 16, 17, 19, 20, 22, 25, 26, 27]. It is a difficult problem due to the individual variations of people in expression, posture, motion and clothing; perspective effects and camera motions; illumination variations; occlusions and disocclusions; and the distracting effect of the scene surroundings. Figure 1 illustrates some of these difficulties on examples of drinking and smoking actions from the movie “Coffee and Cigarettes”.

Figure 1. Examples of two action classes (drinking and smoking) from the movie “Coffee and Cigarettes”. Note the high within-class variability of actions in terms of object appearance (top) and human motion (bottom). Note also the similarity of both action classes in the gross motion and the posture of people.

To delimit the problem, previous work used a number of simplifying assumptions, for example (a) restricted camera motion; (b) specific scene context, e.g. in field sports or surveillance scenes; (c) reliable spatial segmentation; and (d) restricted variation of view points. Notably, action recognition has not yet been addressed in unrestricted scenarios such as in feature films....

This note was uploaded on 06/13/2011 for the course CAP 6412 taught by Professor Staff during the Spring '08 term at University of Central Florida.
