Abstract

Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
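To make the idea in the abstract concrete, here is a minimal sketch of a sequential architecture with stages that operate on belief maps from previous stages and a loss applied at every stage (intermediate supervision). This is an illustrative PyTorch sketch under assumed settings, not the paper's exact architecture: the class names (`Stage`, `SequentialPoseNet`), layer sizes, channel counts, and the MSE target format are placeholders chosen for clarity.

```python
# Illustrative sketch (not the paper's exact architecture): sequential
# convolutional stages that refine per-part belief maps, trained with a
# loss at every stage so gradients reach early stages directly.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Stage(nn.Module):
    """One refinement stage: image features + previous beliefs -> new beliefs."""
    def __init__(self, feat_channels, num_parts):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_channels + num_parts, 128, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_parts, kernel_size=1),  # one belief map per part
        )

    def forward(self, feats, prev_beliefs):
        # Condition on the previous stage's beliefs via channel-wise concatenation.
        return self.net(torch.cat([feats, prev_beliefs], dim=1))


class SequentialPoseNet(nn.Module):
    def __init__(self, num_stages=3, feat_channels=32, num_parts=14):
        super().__init__()
        # Shared image feature extractor (downsamples to belief-map resolution).
        self.features = nn.Sequential(
            nn.Conv2d(3, feat_channels, kernel_size=9, stride=8, padding=4),
            nn.ReLU(inplace=True),
        )
        # Stage 1 predicts beliefs from image evidence alone.
        self.stage1 = nn.Conv2d(feat_channels, num_parts, kernel_size=1)
        # Later stages refine beliefs using image features + previous beliefs.
        self.refine = nn.ModuleList(
            Stage(feat_channels, num_parts) for _ in range(num_stages - 1)
        )

    def forward(self, images):
        feats = self.features(images)
        beliefs = [self.stage1(feats)]
        for stage in self.refine:
            beliefs.append(stage(feats, beliefs[-1]))
        return beliefs  # list of per-stage belief maps


def intermediate_supervision_loss(stage_beliefs, target_beliefs):
    # Supervise every stage, not just the last one, so back-propagated
    # gradients are replenished at intermediate depths.
    return sum(F.mse_loss(b, target_beliefs) for b in stage_beliefs)


if __name__ == "__main__":
    model = SequentialPoseNet()
    images = torch.randn(2, 3, 368, 368)    # batch of RGB crops
    beliefs = model(images)                  # one belief-map tensor per stage
    targets = torch.rand_like(beliefs[-1])   # placeholder part-location targets
    loss = intermediate_supervision_loss(beliefs, targets)
    loss.backward()
    print([b.shape for b in beliefs], loss.item())
```

The key design point the sketch illustrates is that no explicit graphical-model inference step is needed: each stage's convolutions see the previous stage's belief maps over a wide receptive field, which is how long-range dependencies between parts are modeled implicitly, and the per-stage loss is what keeps gradients from vanishing in the deep composed network.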

Keywords

Pose, Computer science, Inference, Artificial intelligence, Task (project management), Machine learning, Range (aeronautics), Image (mathematics), Convolutional neural network, Convolution (computer science), Function (biology), Graphical model, Pattern recognition (psychology), Artificial neural network

Publication Info

Year
2016
Type
Conference paper (CVPR 2016)
Citations
2728
Access
Closed

Citation Metrics

2728 (OpenAlex)

Cite This

Shih-En Wei, Varun Ramakrishna, Takeo Kanade et al. (2016). Convolutional Pose Machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2016.511

Identifiers

DOI
10.1109/cvpr.2016.511