Recognizing Human Actions Using Multiple Features

Jingen Liu, Computer Vision Lab, University of Central Florida, email@example.com
Saad Ali, Computer Vision Lab, University of Central Florida, firstname.lastname@example.org
Mubarak Shah, Computer Vision Lab, University of Central Florida, email@example.com

Abstract

In this paper, we propose a framework that fuses multiple features for improved action recognition in videos. The fusion of multiple features is important for recognizing actions, as a single-feature representation is often not enough to capture the imaging variations (view-point, illumination, etc.) and the attributes of individuals (size, age, gender, etc.). Hence, we use two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) a quantized vocabulary of spin-images, which aims to capture the shape deformation of the actor by considering actions as 3D objects (x, y, t). To optimally combine these features, we treat the different features as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities. The graph is then embedded into a k-dimensional space subject to the criterion that similar nodes have Euclidean coordinates which are closer to each other. This is achieved by converting this constraint into a minimization problem whose solution is given by the eigenvectors of the graph Laplacian matrix. This procedure is known as Fiedler Embedding. The performance of the proposed framework is tested on publicly available data sets. The results demonstrate that the fusion of multiple features helps achieve improved performance, and allows retrieval of meaningful features and videos from the embedding space.

1. Introduction

Action recognition in videos is an important area of research in the field of computer vision.
The ever-growing interest in characterizing human actions is fuelled, in part, by the increasing number of real-world applications such as action/event-centric video retrieval, activity monitoring in surveillance scenarios, sports video analysis, smart rooms, human-computer interaction, etc. The classification of human actions has remained a challenging problem due to the sheer amount of variation in the imaging conditions (view-point, illumination, etc.) and the attributes of the individual (size, age, gender, etc.) performing the action.

In general, approaches for human action recognition can be categorized on the basis of their representation. Some leading representations include learned geometrical models of human body parts, space-time pattern templates, appearance or region features, shape or form features, interest-point based representations, and motion/optical-flow patterns. Specifically, [16, 17] utilized geometrical models of human body parts, where an action is recognized by searching for static postures in the image that match the target action. Popular shape-based representations include edges [...
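The Fiedler Embedding described in the abstract can be sketched as follows. This is a minimal illustration on a hand-built toy affinity matrix, not the paper's actual graph (which links feature vocabularies and videos); the function name and the toy weights are assumptions for demonstration only.

```python
import numpy as np

def fiedler_embedding(W, k):
    """Embed graph nodes with symmetric affinity matrix W into k dimensions.

    Coordinates come from the eigenvectors of the graph Laplacian
    L = D - W associated with the k smallest non-zero eigenvalues,
    so strongly connected nodes receive nearby coordinates.
    """
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue ~ 0).
    return vecs[:, 1:k + 1]

# Toy graph: two tight clusters {0, 1, 2} and {3, 4, 5},
# joined by a single weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1             # weak bridge between the clusters

Y = fiedler_embedding(W, k=2)
# Along the first (Fiedler) dimension, nodes in the same cluster
# fall on the same side of zero, nodes across the bridge on opposite sides.
```

The key design point, as the abstract notes, is that the pairwise-similarity constraint becomes a minimization problem solved in closed form by the Laplacian's eigenvectors, so no iterative layout procedure is needed.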