Approximate search as a source coding problem, with application to large scale image retrieval

Herve Jegou
INRIA Rennes, France

Image recognition, which is used in many applications such as copy detection or location recognition, requires to be able to handle and search into large databases of descriptors, typically in the order of billions of vectors. This raises an efficiency problem, but also the problem of memory resources.

In this talk, I will first show that the search problem can be cast into a source coding framework, where the database vectors are approximated by quantization. The Euclidean distance between a query vector and a database vector is estimated in an asymmetric manner based on the quantized database descriptors. The method is advantageously combined with an inverted file to avoid exhaustive search, and used either for local or global descriptors.

I will then discuss some recent improvement of this method in a context of precise image matching, and show the importance of the nearest neighbor search accuracy for image search.