CMP events

Ondřej Chum presents Fast Computation of min-Hash Signatures for Image Collections

On 2012-06-12 11:00 at G205, Karlovo náměstí 13, Praha 2
A new method for highly efficient min-Hash generation
for document collections is proposed. It exploits the inverted
file structure which is available in many applications
based on a bag or a set of words. Fast min-Hash generation
is important in applications such as image clustering
where good recall and precision require a large number of
min-Hash signatures.
Using the set-of-words representation, the novel exact
min-Hash generation algorithm achieves approximately a
50-fold speed-up on two datasets with 10^5 and 10^6 images,
respectively. We also propose an approximate min-Hash
assignment process which reaches a more than 200-fold
speed-up at the cost of missing about 2-3% of matches.
We also experimentally show that the method generalizes
to other modalities with significantly different statistics.
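
To make the idea of exploiting the inverted file concrete, here is a minimal sketch of one way an exact min-Hash could be computed from it. It is an illustration under stated assumptions, not necessarily the algorithm presented in the talk: each hash function is modelled as a random permutation of the vocabulary, words are visited in increasing rank, and the first word that reaches a document becomes its min-Hash. The function names (`build_inverted_file`, `make_rank`, `minhash_direct`, `minhash_inverted`) and the toy data are hypothetical.

```python
import random


def build_inverted_file(docs):
    """Map each word to the list of ids of the documents containing it."""
    inverted = {}
    for doc_id, words in enumerate(docs):
        for w in words:
            inverted.setdefault(w, []).append(doc_id)
    return inverted


def make_rank(vocabulary, seed):
    """Model one hash function as a random permutation of the vocabulary."""
    words = sorted(vocabulary)
    ranks = list(range(len(words)))
    random.Random(seed).shuffle(ranks)
    return dict(zip(words, ranks))


def minhash_direct(docs, rank):
    """Baseline: scan every word of every document and keep the lowest rank."""
    return [min(words, key=rank.__getitem__, default=None) for words in docs]


def minhash_inverted(docs, inverted, rank):
    """Inverted-file variant (illustrative assumption, see the text above).

    Words are visited in increasing rank. A document that has no min-Hash
    yet receives the current word; later words cannot have a lower rank,
    so the assignment is final. The loop stops once every non-empty
    document is assigned, usually well before the vocabulary is exhausted.
    """
    result = [None] * len(docs)
    remaining = sum(1 for words in docs if words)
    for w in sorted(inverted, key=rank.__getitem__):
        for doc_id in inverted[w]:
            if result[doc_id] is None:
                result[doc_id] = w
                remaining -= 1
        if remaining == 0:
            break
    return result


if __name__ == "__main__":
    # Toy collection: each document is a set of integer visual-word ids.
    docs = [{1, 2, 5}, {2, 3}, {3, 4, 5}, {1}]
    inverted = build_inverted_file(docs)
    rank = make_rank(inverted, seed=0)
    agree = minhash_inverted(docs, inverted, rank) == minhash_direct(docs, rank)
    print("inverted-file result matches the direct scan:", agree)
```

In this sketch a signature with several min-Hashes would be obtained by repeating the call with different seeds. The early exit is where the saving would come from on a large collection, since most documents tend to be assigned after only a fraction of the words has been visited.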