Abstract
Conference Title: 2015 15th International Conference on Intelligent Systems Design and Applications (ISDA)
Conference Start Date: 2015, Dec. 14
Conference End Date: 2015, Dec. 16
Conference Location: Marrakech, Morocco

This paper presents the structured Fisher vector encoding method, a new video representation that improves on the classical Fisher vector (FV) for human action recognition. The proposed representation is based on the local structural organization of features, obtained by building graphs of trajectories. It preserves more information in the feature encoding process through local spatial pooling and refines the representation in the global pooling. Local spatio-temporal information is exploited by representing the relationships among video trajectories as local graphs of trajectories using a multi-scale Delaunay triangulation. Experiments on the human action recognition datasets Hollywood2 and HMDB51 show the effectiveness of the proposed approach.
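
To make the idea of a graph of trajectories concrete, the following minimal sketch builds a single-scale Delaunay triangulation over a handful of 2D points and returns its edges. It is an illustration under stated assumptions, not the paper's implementation: the function names, the sample coordinates, and the choice of one 2D position per trajectory are all hypothetical, and the brute-force empty-circumcircle test is used only because it is short and dependency-free. The multi-scale construction and the Fisher vector encoding are not reproduced here.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    # The sign of det depends on the triangle's orientation, so normalise it.
    return det * orientation(a, b, c) > 0


def delaunay_edges(points):
    """Brute-force Delaunay triangulation; returns edges as index pairs (i, j), i < j.

    A triangle is kept when no other point lies strictly inside its
    circumcircle. Suitable only for small point sets in general position.
    """
    edges = set()
    n = len(points)
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        others = (points[m] for m in range(n) if m not in (i, j, k))
        if any(in_circumcircle(a, b, c, p) for p in others):
            continue
        edges.update({(i, j), (i, k), (j, k)})
    return edges


# Hypothetical positions of five trajectories in one frame (assumed data).
trajectory_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
graph_edges = delaunay_edges(trajectory_points)
print(sorted(graph_edges))
```

In this toy configuration the central point is linked to every corner and the corners are linked along the sides of the square, so each trajectory is connected only to its spatial neighbours. For realistic numbers of trajectories, a library routine such as `scipy.spatial.Delaunay` would be the more practical choice.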