Abstract

Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset.
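The translation estimate described above can be illustrated with a short sketch: given a predicted 2D object center and its predicted distance from the camera, the 3D translation follows from inverting the pinhole projection. The intrinsic values below are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def backproject_center(center_px, Tz, fx, fy, px, py):
    """Recover the 3D translation T = (Tx, Ty, Tz) from a predicted
    2D object center (cx, cy) and predicted distance Tz, by inverting
    the pinhole projection cx = fx*Tx/Tz + px, cy = fy*Ty/Tz + py."""
    cx, cy = center_px
    Tx = (cx - px) * Tz / fx
    Ty = (cy - py) * Tz / fy
    return np.array([Tx, Ty, Tz])

# Hypothetical camera intrinsics and a predicted center at (400, 300)
# pixels with a predicted distance of 1.0 m.
T = backproject_center((400.0, 300.0), 1.0,
                       fx=572.4, fy=573.6, px=320.0, py=240.0)
```

This is why predicting the center pixel plus a scalar distance suffices: the two remaining translation components are fully determined by the camera intrinsics.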

Keywords

Artificial intelligence, Computer science, Computer vision, Pattern recognition, Convolutional neural network, Object detection, 3D pose estimation, Pose, Clutter, Quaternion, Rotation, Translation, Mathematics

Publication Info

Year: 2018
Type: preprint
Citations: 1986
Access: Closed

Citation Metrics

OpenAlex: 1986
Influential: 461
CrossRef: 1353

Cite This

Yu Xiang, Tanner Schmidt, Venkatraman Narayanan et al. (2018). PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. Robotics: Science and Systems XIV. https://doi.org/10.15607/rss.2018.xiv.019

Identifiers

DOI: 10.15607/rss.2018.xiv.019
arXiv: 1711.00199
