Abstract

This paper proposes two novel image similarity measures for fast indexing via locality sensitive hashing. The similarity measures are applied and evaluated in the context of near duplicate image detection. The proposed method uses a visual vocabulary of vector quantized local feature descriptors (SIFT) and exploits enhanced min-Hash techniques for retrieval. Standard min-Hash uses an approximate set intersection between document descriptors as a similarity measure. We propose an efficient way of exploiting more sophisticated similarity measures that have proven to be essential in image and particular object retrieval. The proposed similarity measures do not require extra computational effort compared to the original measure. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. The method requires only a small amount of data to be stored for each image. We demonstrate our method on the TrecVid 2006 data set, which contains approximately 146K key frames, and also on the challenging University of Kentucky image retrieval database.
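
For readers unfamiliar with the baseline, the following is a minimal sketch of standard min-Hash only, not of the weighted similarity measures the paper proposes. It assumes each image is represented as a set of visual-word ids, and it models each hash function as an explicit random permutation of the vocabulary. The function names, the vocabulary size, and the number of hash functions are illustrative assumptions and do not come from the paper.

```python
import random


def make_permutations(num_hashes, vocab_size, seed=0):
    """Build random permutations of the visual-word ids (one per hash function).

    Explicit permutations are an illustrative assumption; they keep the
    sketch simple but are not meant to be memory efficient.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(vocab_size))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def min_hash_sketch(words, perms):
    """For each permutation, keep the smallest permuted id among the image's words."""
    return [min(perm[w] for w in words) for perm in perms]


def estimate_overlap(sketch_a, sketch_b):
    """Fraction of agreeing min-Hashes, an estimate of |A n B| / |A u B|."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_overlap(a, b):
    """Exact set overlap (intersection over union), for comparison."""
    return len(a & b) / len(a | b)


# Two hypothetical images sharing half of their visual words.
perms = make_permutations(num_hashes=512, vocab_size=1000)
image_a = set(range(0, 100))
image_b = set(range(50, 150))

estimate = estimate_overlap(min_hash_sketch(image_a, perms),
                            min_hash_sketch(image_b, perms))
exact = exact_overlap(image_a, image_b)
print(f"estimated overlap: {estimate:.3f}, exact overlap: {exact:.3f}")
```

Only the short list of minimum values has to be stored per image, which is consistent with the small per-image storage the abstract mentions. The estimate approaches the exact overlap as the number of hash functions grows.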

Keywords

Computer science, Hash function, Weighting, Pattern recognition (psychology), Artificial intelligence, Medicine

Publication Info

Year: 2008
Type: article
Pages: 50.1-50.10
Citations: 465
Access: Closed

Citation Metrics

OpenAlex: 465
Influential: 32
CrossRef: 280

Cite This

Ondřej Chum, James Philbin, Andrew Zisserman (2008). Near Duplicate Image Detection: min-Hash and tf-idf Weighting. Proceedings of the British Machine Vision Conference 2008, 50.1-50.10. https://doi.org/10.5244/c.22.50

Identifiers

DOI
10.5244/c.22.50
