Visual Search: From ‘More is Better’ to ‘Less is More’

Cees Snoek (University of Amsterdam, The Netherlands)

Progress in visual search has doubled in just three years. Thanks to breakthroughs in computer vision and pattern recognition for representing images, such as the bag of SIFT codewords, it is now possible to search a massive video or image collection for the presence of visual categories like 'person', 'boat', and 'beach', or indeed almost any concept, with an accuracy that is good enough for real-world deployment.
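
To make the representation concrete, the following is a minimal sketch of a bag-of-visual-words pipeline in Python. The codebook size and function names are illustrative, and local descriptor extraction (e.g. SIFT) is assumed to happen elsewhere.

```python
# A minimal sketch of the bag-of-visual-words representation: local
# descriptors (e.g. 128-d SIFT vectors) are quantized against a learned
# codebook, and an image becomes a normalized histogram of codeword
# counts. Sizes and names here are illustrative, not the talk's setup.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors: np.ndarray, n_words: int = 4000) -> KMeans:
    """Cluster a sample of local descriptors into visual codewords."""
    return KMeans(n_clusters=n_words, n_init=1, random_state=0).fit(descriptors)

def encode(image_descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Map one image's descriptors to codewords and count occurrences."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalize the histogram
```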

These image representations initially discarded much of the visual information by selecting only salient points in the image, but a dense analysis of the image proved to be better. In recent years, research in visual search has lived by this mantra of ‘more is better’: more pixels, more color features, larger codebooks, and more kernels for better retrieval. In this talk we take the opposite stance, defending the view that for visual search less can be more as well. We highlight three algorithms for reducing image representations. Algorithm I considers supervised codebook construction via selection of the most informative codewords, leading to codebooks 99% smaller than those common today for query-by-example. In Algorithm II, we propose convex reduced linear kernels that use only a portion of the feature dimensions to reconstruct an efficient linear kernel for visual concept search.
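
A hedged sketch of the idea behind Algorithm I: score each codeword's usefulness for the labeled categories and keep only a small fraction. The talk does not specify the selection criterion; mutual information is used below purely as a stand-in, and the function name is hypothetical.

```python
# Sketch of supervised codeword selection: keep only the codewords that
# are informative for the labels, shrinking the codebook by ~99%.
# Mutual information is an assumed stand-in criterion, not the talk's.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_codewords(histograms: np.ndarray, labels: np.ndarray,
                     keep_fraction: float = 0.01) -> np.ndarray:
    """Return indices of the top codewords by relevance to the labels."""
    scores = mutual_info_classif(histograms, labels, random_state=0)
    n_keep = max(1, int(keep_fraction * histograms.shape[1]))
    return np.argsort(scores)[::-1][:n_keep]

# Query-by-example then matches images on the reduced histograms:
# reduced = histograms[:, select_codewords(histograms, labels)]
```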
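
Algorithm II can be sketched along similar lines: fit non-negative weights so that a small set of reweighted dimensions reproduces the full linear kernel. The non-negative least-squares objective below is one convex formulation, chosen for illustration; the talk's exact objective may differ.

```python
# Sketch of a reduced linear kernel: approximate K = X X^T with a
# non-negatively weighted sum over few feature dimensions. NNLS gives
# a convex fit; practical only for modest dimensionality d.
import numpy as np
from scipy.optimize import nnls

def reduced_linear_kernel(X: np.ndarray, n_dims: int):
    """Weight each dimension's rank-one kernel so few dimensions suffice."""
    n, d = X.shape
    K_full = X @ X.T
    # Column j holds vec(x_j x_j^T), the kernel contribution of dim j.
    M = np.stack([np.outer(X[:, j], X[:, j]).ravel() for j in range(d)], axis=1)
    w, _ = nnls(M, K_full.ravel())        # convex fit with w >= 0
    keep = np.argsort(w)[::-1][:n_dims]   # most useful dimensions
    Xr = X[:, keep] * np.sqrt(w[keep])    # reweighted reduced features
    return Xr @ Xr.T, keep                # approximate kernel + kept dims
```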

Finally, in Algorithm III we consider videos represented by a few hundred concept detectors, a representation that outperforms large-scale bag-of-words approaches for event retrieval. Apart from reducing the image representation, the three algorithms also bring explanatory power to visual search, without being instructed to do so.
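
As a sketch of this representation, the snippet below pools per-frame detector scores into one short concept vector per video and ranks videos by cosine similarity to a query event. The detector names, pooling choice, and similarity measure are illustrative assumptions.

```python
# Sketch of Algorithm III's representation: a video becomes a short
# vector of concept-detector scores, and event retrieval ranks videos
# by similarity to a query event in this concept space.
import numpy as np

CONCEPTS = ["person", "boat", "beach"]  # in practice a few hundred detectors

def concept_vector(frame_scores: np.ndarray) -> np.ndarray:
    """Pool per-frame detector scores (frames x concepts) into one vector."""
    return frame_scores.mean(axis=0)

def rank_videos(query_vec: np.ndarray, video_vecs: np.ndarray) -> np.ndarray:
    """Return video indices sorted by cosine similarity to the query event."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    V = video_vecs / (np.linalg.norm(video_vecs, axis=1, keepdims=True) + 1e-12)
    return np.argsort(V @ q)[::-1]
```

Because each dimension corresponds to a named concept, the highest-scoring dimensions of a retrieved video double as a human-readable rationale, which is one way the explanatory power mentioned above can surface.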