Near Duplicate Image Detection: min-Hash and tf-idf Weighting

Abstract

This paper proposes two novel image similarity measures for fast indexing via locality sensitive hashing. The similarity measures are applied and evaluated in the context of near duplicate image detection. The proposed method uses a visual vocabulary of vector quantized local feature descriptors (SIFT) and, for retrieval, exploits enhanced min-Hash techniques. In standard min-Hash, an approximate set intersection between document descriptors is used as the similarity measure. We propose an efficient way of exploiting more sophisticated similarity measures that have proven to be essential in image and particular object retrieval. The proposed similarity measures do not require extra computational effort compared to the original measure. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. The method requires only a small amount of data to be stored for each image. We demonstrate our method on the TrecVid 2006 data set, which contains approximately 146K key frames, and also on the challenging University of Kentucky image retrieval database.
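To make the baseline concrete, the following is a minimal sketch of standard min-Hash over sets of visual word ids, where the fraction of matching min-Hash values estimates the set overlap (intersection over union). It illustrates only the standard measure the abstract refers to, not the two enhanced measures the paper proposes. The function names, the use of explicit random permutations as hash functions, and the toy vocabulary size are assumptions made for illustration.

```python
import random


def set_overlap(a, b):
    """Exact set overlap: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_permutations(num_hashes, vocab_size, seed=0):
    """Build num_hashes random permutations of the vocabulary ids.

    Assumption: each min-Hash function is an explicit permutation, which is
    only practical for a small toy vocabulary.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(vocab_size))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash_signature(words, perms):
    """One min-Hash value per permutation; words must be a non-empty set."""
    return [min(perm[w] for w in words) for perm in perms]


def estimate_overlap(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    perms = make_permutations(num_hashes=512, vocab_size=1000, seed=0)
    image_a = set(range(0, 100))     # hypothetical visual words of image A
    image_b = set(range(50, 150))    # hypothetical visual words of image B
    sig_a = minhash_signature(image_a, perms)
    sig_b = minhash_signature(image_b, perms)
    print("exact overlap:   ", set_overlap(image_a, image_b))
    print("min-Hash estimate:", estimate_overlap(sig_a, sig_b))
```

Only the fixed-length signature needs to be kept per image, which is consistent with the abstract's point about small per-image storage. The estimate becomes more accurate as the number of hash functions grows.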

Keywords

Computer science; Hash function; Weighting; Pattern recognition (psychology); Artificial intelligence; Medicine

Publication Info

Year: 2008
Type: article
Pages: 50.1-50.10
Citations: 465
Access: Closed

Cite This

APA Style

Ondřej Chum, James Philbin, Andrew Zisserman (2008). Near Duplicate Image Detection: min-Hash and tf-idf Weighting. Proceedings of the British Machine Vision Conference 2008, 50.1-50.10. https://doi.org/10.5244/c.22.50

Identifiers

DOI: 10.5244/c.22.50
