Articulated Human Detection with Flexible Mixtures of Parts

Abstract

We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, nonoriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: 1) They efficiently model articulation by sharing computation across similar warps, 2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and 3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity) with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other, while we present an extensive diagnostic evaluation that suggests that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets while being orders of magnitude faster.
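The abstract notes that tree-structured relations allow efficient optimization with dynamic programming. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each part has a small, explicit list of candidate states, where one state stands for one (location, mixture type) pair. It also assumes the spatial and co-occurrence terms have already been summed into a single pairwise table per edge. The function name `best_configuration` and its arguments are hypothetical, chosen for illustration only.

```python
import numpy as np

def best_configuration(parent, unary, pairwise):
    """Max-sum dynamic programming on a tree of parts (illustrative sketch).

    parent[i]   : index of part i's parent, -1 for the root (part 0);
                  parts are ordered so that parent[i] < i.
    unary[i]    : array (S_i,), local score of each candidate state of part i.
                  A state stands for one (location, mixture type) pair.
    pairwise[i] : array (S_parent, S_i), relation score between part i and its
                  parent (assumed to hold spatial plus co-occurrence terms).
                  The entry for the root (i = 0) is unused.
    Returns (best total score, list with the chosen state per part).
    """
    n = len(parent)
    belief = [np.asarray(u, dtype=float).copy() for u in unary]
    argmax_child = [None] * n

    # Leaves-to-root pass: children have larger indices than their parents,
    # so belief[i] already includes all messages from part i's own children.
    for i in range(n - 1, 0, -1):
        table = np.asarray(pairwise[i]) + belief[i][None, :]  # (S_parent, S_i)
        argmax_child[i] = table.argmax(axis=1)
        belief[parent[i]] += table.max(axis=1)

    # Root-to-leaves pass: read off the maximizing state of every part.
    states = [0] * n
    states[0] = int(belief[0].argmax())
    for i in range(1, n):
        states[i] = int(argmax_child[i][states[parent[i]]])
    return float(belief[0].max()), states

# Toy example: 4 parts, 3 candidate states each, random scores.
parent = [-1, 0, 0, 1]
rng = np.random.default_rng(0)
unary = [rng.normal(size=3) for _ in parent]
pairwise = [None] + [rng.normal(size=(3, 3)) for _ in parent[1:]]
score, states = best_configuration(parent, unary, pairwise)
```

This naive version costs time proportional to the product of parent and child state counts per edge, so it scales linearly in the number of parts rather than exponentially in it. It makes no attempt to reproduce any further speedups a full detector would need when searching over all image locations and scales.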

Keywords

Computer science, ENCODE, Artificial intelligence, Spatial relation, Solver, Representation (politics), Computation, Pattern recognition (psychology), Computer vision, Rigidity (electromagnetism), Object detection, Algorithm

Affiliated Institutions

University of California, Irvine (US)

Related Publications

Convolutional Pose Machines

Shih-En Wei, Varun Ramakrishna, Takeo Kanade +1 more

Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be...

2016 2728 citations

Multi-source Deep Learning for Human Pose Estimation

Wanli Ouyang, Xiao Chu, Xiaogang Wang

Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation. This paper proposes to build a multi-source d...

2014 273 citations

DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang +4 more

This paper considers the task of articulated human pose estimation of multiple people in real world images. We propose an approach that jointly solves the tasks of detection and...

2016 1069 citations

Poselet Conditioned Pictorial Structures

Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler +1 more

In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human...

2013 382 citations

Pictorial structures revisited: People detection and articulated pose estimation

Mykhaylo Andriluka, Stefan Roth, Bernt Schiele

Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and oft...

2009 2009 IEEE Conference on Computer Visi... 805 citations

Publication Info

Year: 2012
Type: article
Volume: 35
Issue: 12
Pages: 2878-2890
Citations: 857
Access: Closed

Cite This

APA Style

Yi Yang, Deva Ramanan (2012). Articulated Human Detection with Flexible Mixtures of Parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2878-2890. https://doi.org/10.1109/tpami.2012.261

Identifiers

DOI: 10.1109/tpami.2012.261