RGB-D Object Recognition via Incorporating Latent Data Structure and Prior Knowledge

Abstract

For RGB-D object recognition, identifying suitable image representations is important for boosting recognition performance. In this work, we propose a novel representation learning method for RGB-D images that jointly incorporates the underlying data structure and the prior knowledge of the data. Specifically, convolutional neural networks (CNNs) are employed to learn image representations by exploiting the underlying data structure. To handle the limited number of RGB and depth images available for object recognition, the multi-level feature hierarchies of a CNN trained on ImageNet are transferred to learn rich generic feature representations for RGB and depth images, while the labeled images are also leveraged. In addition, we propose a novel deep auto-encoder (DAE) to exploit the prior knowledge, which avoids the expensive computational cost of optimization in feature encoding. The final image representations are obtained by integrating the two types of representations. To verify the effectiveness of the proposed method, we conduct extensive experiments on two publicly available RGB-D datasets. The results compare favorably with state-of-the-art approaches and demonstrate the advantages of the proposed method.
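The abstract states that the final representation integrates a CNN-based representation with a DAE-based one, but it does not say how. As a minimal sketch only, the snippet below assumes the simplest plausible rule: L2-normalize each feature vector and concatenate them. The function names `l2_normalize` and `fuse_representations` and the toy vectors are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def fuse_representations(cnn_feat: Sequence[float],
                         dae_feat: Sequence[float]) -> List[float]:
    """Assumed fusion rule: concatenate the two L2-normalized representations.

    The paper's abstract only says the two representations are "integrated";
    concatenation is an assumption made here for illustration.
    """
    return l2_normalize(cnn_feat) + l2_normalize(dae_feat)


if __name__ == "__main__":
    # Toy stand-ins for a CNN feature vector and a DAE code (hypothetical values).
    cnn_feat = [3.0, 4.0]
    dae_feat = [0.0, 2.0, 0.0]
    fused = fuse_representations(cnn_feat, dae_feat)
    print(len(fused), fused)
```

Normalizing before concatenation keeps either representation from dominating a downstream classifier purely because of its scale. Other fusion rules, such as weighted sums or a learned fusion layer, would fit the abstract's description equally well.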

Keywords

Computer science, Artificial intelligence, RGB color model, Feature learning, Convolutional neural network, Pattern recognition (psychology), Feature (linguistics), Representation (politics), Object (grammar), Encoding (memory), Cognitive neuroscience of visual object recognition, Feature extraction, Computer vision, Deep learning, Encoder, Autoencoder

Publication Info

Year: 2015
Type: article
Volume: 17
Issue: 11
Pages: 1899-1908
Citations: 60
Access: Closed

Cite This

APA Style

Jinhui Tang, Lu Jin, Zechao Li, et al. (2015). RGB-D Object Recognition via Incorporating Latent Data Structure and Prior Knowledge. IEEE Transactions on Multimedia, 17(11), 1899-1908. https://doi.org/10.1109/tmm.2015.2476660

Identifiers

DOI: 10.1109/tmm.2015.2476660