Abstract
The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation tasks. One critical limitation of existing BoW models is that much semantic information is lost during codebook generation, an important step of BoW, because the codebook is typically built simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not be distributed in clusters in Euclidean space, primarily due to the semantic gap between low-level features and high-level semantics. In this paper, we propose a novel scheme for learning optimized BoW models that aims to map semantically related features to the same visual words. In particular, we treat the distance between semantically identical features as a measure of the semantic gap and learn an optimized codebook by minimizing this gap, with the goal of minimizing the loss of semantics. We refer to this novel codebook as the semantics-preserving codebook (SPC) and the corresponding model as the Semantics-Preserving Bag-of-Words (SPBoW) model. Extensive experiments on image annotation and object detection tasks with public testbeds from MIT's LabelMe and the PASCAL VOC challenge databases show that the proposed SPC learning scheme is effective for optimizing the codebook generation process and that the SPBoW model greatly enhances the performance of the existing BoW model.
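As a rough illustration only (not the paper's actual formulation), the sketch below builds a plain Euclidean k-means codebook, which is the baseline the abstract criticizes, and then computes a crude proxy for the "semantic gap": the fraction of semantically identical feature pairs that end up assigned to different visual words. The toy data, the pair set, and all function names are hypothetical; the SPC learning scheme itself is not reproduced here.

import numpy as np

def kmeans_codebook(features, k, iters=20, seed=0):
    """Plain k-means codebook: clusters visual features in Euclidean space."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest visual word (Euclidean distance)
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # move each visual word to the mean of its assigned features
        for j in range(k):
            members = features[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, assign

def semantic_gap(assign, same_label_pairs):
    """Fraction of semantically identical feature pairs split across visual
    words; a crude stand-in for the gap that SPC learning seeks to minimize."""
    split = sum(assign[i] != assign[j] for i, j in same_label_pairs)
    return split / max(len(same_label_pairs), 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(200, 16))          # toy local descriptors
    pairs = [(i, i + 100) for i in range(100)]  # toy "same semantics" pairs
    centers, assign = kmeans_codebook(feats, k=10)
    print("semantic gap of plain codebook:", semantic_gap(assign, pairs))

A semantics-preserving codebook would, in this picture, be one that drives the printed gap toward zero while still quantizing the feature space usefully.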
Publication Info
- Year: 2010
- Type: article
- Volume: 19
- Issue: 7
- Pages: 1908-1920
Identifiers
- DOI: 10.1109/tip.2010.2045169