Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

Abstract

Previous work on action recognition has focused on adapting hand-designed local features, such as SIFT or HOG, from static images to the video domain. In this paper, we propose using unsupervised feature learning as a way to learn features directly from video data. More specifically, we present an extension of the Independent Subspace Analysis algorithm to learn invariant spatio-temporal features from unlabeled video data. We discovered that, despite its simplicity, this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations. By replacing hand-designed features with our learned features, we achieve classification results superior to all previous published results on the Hollywood2, UCF, KTH and YouTube action recognition datasets. On the challenging Hollywood2 and YouTube action datasets we obtain 53.3% and 75.8% respectively, which are approximately 5% better than the current best published results. Further benefits of this method, such as the ease of training and the efficiency of training and prediction, will also be discussed. You can download our code and learned spatio-temporal features here: http://ai.stanford.edu/~wzou/.
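As a rough illustration of the kind of feature the abstract describes, the sketch below computes pooled responses in the standard Independent Subspace Analysis form: linear filters, squaring, then a square root over each subspace. It is a minimal sketch under stated assumptions, not the authors' released code. The names `isa_features` and `subspace_pooling_matrix`, the patch size, and the filter counts are all hypothetical, and the filters are random orthonormal rows rather than filters learned from video.

```python
import numpy as np

def subspace_pooling_matrix(num_filters, subspace_size):
    """Fixed 0/1 matrix V that groups adjacent filters into subspaces (hypothetical helper)."""
    assert num_filters % subspace_size == 0
    num_subspaces = num_filters // subspace_size
    V = np.zeros((num_subspaces, num_filters))
    for i in range(num_subspaces):
        V[i, i * subspace_size:(i + 1) * subspace_size] = 1.0
    return V

def isa_features(x, W, V):
    """Pooled ISA responses p = sqrt(V (W x)^2) for a batch of flattened patches.

    x: (n_patches, d) inputs, assumed already whitened
    W: (k, d) linear filters (learned in real ISA; random in this sketch)
    V: (m, k) fixed pooling matrix
    """
    simple = (x @ W.T) ** 2         # squared filter responses, shape (n_patches, k)
    return np.sqrt(simple @ V.T)    # pooled over each subspace, shape (n_patches, m)

rng = np.random.default_rng(0)
d = 4 * 4 * 3                       # assumed toy patch: 4x4 pixels over 3 frames
x = rng.standard_normal((5, d))     # stand-in for whitened video patches

# Random filters with orthonormal rows, standing in for learned ISA filters.
Q, _ = np.linalg.qr(rng.standard_normal((d, 8)))
W = Q.T                             # shape (8, d)
V = subspace_pooling_matrix(num_filters=8, subspace_size=2)

feats = isa_features(x, W, V)       # shape (5, 4)
```

Because each response is an energy pooled over a subspace, it does not change when the input's sign is flipped, which is one simple form of the invariance the abstract refers to. Stacking and convolution would apply the same computation to larger blocks built from these outputs.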

Keywords

Computer science, Artificial intelligence, Scale-invariant feature transform, Subspace topology, Pattern recognition (psychology), Action recognition, Feature learning, Invariant (physics), Machine learning, Feature extraction

Affiliated Institutions

Stanford University US

Related Publications

Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks

Maxime Oquab , Léon Bottou , Ivan Laptev +1 more

Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success o...

2014 3151 citations

Residual Dense Network for Image Super-Resolution

Yulun Zhang , Yapeng Tian , Yu Kong +2 more

A very deep convolutional neural network (CNN) has recently achieved great success for image super-resolution (SR) and offered hierarchical features as well. However, most deep ...

2018 3866 citations

Unsupervised Feature Learning via Non-parametric Instance Discrimination

Zhirong Wu , Yuanjun Xiong , Stella X. Yu +1 more

Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether...

2018 3435 citations

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Sijie Yan , Yuanjun Xiong , Dahua Lin

Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts o...

2018 Proceedings of the AAAI Conference on... 4453 citations

Beta Process Joint Dictionary Learning for Coupled Feature Spaces with Application to Single Image Super-Resolution

Li He , Hairong Qi , Russell Zaretzki

This paper addresses the problem of learning over-complete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship between the ...

2013 179 citations

Publication Info

Year: 2011
Type: article
Citations: 982
Access: Closed

Cite This

APA Style

Quoc V. Le, Will Y. Zou, Serena Yeung et al. (2011). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. https://doi.org/10.1109/cvpr.2011.5995496

Identifiers

DOI: 10.1109/cvpr.2011.5995496