Large Scale Image Clustering and Object Discovery

Ondrej Chum
(Center for Machine Perception, CTU Prague, Czech Republic)

Abstract:

We propose a randomized data mining method that finds clusters of spatially overlapping images. Our approach avoids brute-force computation of similarity between all image pairs. The core of the method is a randomized algorithm for fast proposal of image clusters, the so-called cluster seeds. Efficient methods for cluster seed discovery are discussed: min-Hash, weighted min-Hash, and geometric min-Hash.
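
To illustrate how seed proposal can avoid comparing all image pairs, the following is a minimal sketch of plain min-Hash only (not the weighted or geometric variants). It assumes each image is represented as a set of non-negative integer visual-word IDs; the names `find_seeds`, `num_sketches`, and `sketch_size`, the linear hash family, and the default parameter values are illustrative assumptions, not details taken from the abstract.

```python
import random
from collections import defaultdict
from itertools import combinations

P = 2_147_483_647  # a large prime; word IDs are assumed to be smaller than P


def make_hash_params(count, seed=0):
    """Draw (a, b) pairs defining hash functions h(w) = (a * w + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(count)]


def min_hashes(words, params):
    """One min-Hash per hash function: the smallest hash value over the set."""
    return tuple(min((a * w + b) % P for w in words) for a, b in params)


def find_seeds(images, num_sketches=64, sketch_size=2, seed=0):
    """Propose candidate image pairs (seeds) via sketch collisions.

    images: dict mapping image id -> non-empty set of visual-word IDs.
    Two images become a seed if they agree on all min-Hashes of at least
    one sketch, so no all-pairs similarity computation is needed.
    """
    params = make_hash_params(num_sketches * sketch_size, seed)
    tables = [defaultdict(list) for _ in range(num_sketches)]
    for image_id, words in images.items():
        if not words:
            continue  # an empty image has no min-Hash
        mh = min_hashes(words, params)
        for t in range(num_sketches):
            # A sketch is a tuple of sketch_size consecutive min-Hashes.
            sketch = mh[t * sketch_size:(t + 1) * sketch_size]
            tables[t][sketch].append(image_id)

    seeds = set()
    for table in tables:
        for bucket in table.values():
            # Every pair sharing a bucket is a candidate seed.
            seeds.update(combinations(sorted(bucket), 2))
    return seeds
```

Under these assumptions, images with identical word sets always collide, images with disjoint word sets never do, and the chance of a collision for partially overlapping images grows with their set overlap.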

The seeds are then used as visual queries to obtain clusters which are formed as transitive closures of sets of partially overlapping images that include the seed. We show that the probability of finding a seed for an image cluster rapidly increases with the size of the cluster.
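
The transitive-closure step can be sketched as a breadth-first expansion from the seed. This is only an illustration: `query` is a hypothetical callable standing in for an image retrieval engine that returns the IDs of images spatially overlapping a given image, and `grow_cluster` is an assumed name.

```python
from collections import deque


def grow_cluster(seed_images, query):
    """Expand a seed into a cluster by repeated visual queries.

    seed_images: iterable of image ids forming the seed.
    query: hypothetical function, image id -> iterable of overlapping image ids.
    Returns the set of all images reachable from the seed through chains
    of partially overlapping images.
    """
    cluster = set(seed_images)
    frontier = deque(cluster)
    while frontier:
        image_id = frontier.popleft()
        for match in query(image_id):
            if match not in cluster:
                cluster.add(match)
                frontier.append(match)  # newly found images are queried in turn
    return cluster
```

Each image is queried at most once, so the number of retrieval calls is bounded by the size of the resulting cluster.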

Related Papers: