Ondřej Chum presents Fast Computation of min-Hash Signatures for Image Collections
On 2012-06-12 11:00
at G205, Karlovo náměstí 13, Praha 2
A new method for highly efficient min-Hash generation
for document collections is proposed. It exploits the inverted
file structure which is available in many applications
based on a bag or a set of words. Fast min-Hash generation
is important in applications such as image clustering,
where good recall and precision require a large number of
min-Hash signatures.
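The idea of driving min-Hash generation from the inverted file can be illustrated with a minimal Python sketch. It assumes a set-of-words representation with non-empty documents, and it models each min-Hash function as a random ordering of the vocabulary, where the hash value of a word is its position in that ordering. All names (`random_orderings`, `build_inverted_file`, `minhash_naive`, `minhash_inverted`) are hypothetical. This sketches the general technique, not necessarily the exact algorithm presented in the talk. Words are visited in increasing hash value and their posting lists are scanned. The first time a document is hit, that value is its min-Hash, and processing of that function stops once every document has been assigned.

```python
import random
from collections import defaultdict

def random_orderings(vocab_size, num_hashes, seed=0):
    """Model each min-Hash function as a random ordering of the vocabulary."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(num_hashes):
        order = list(range(vocab_size))
        rng.shuffle(order)              # order[i] = word whose hash value is i
        rank = [0] * vocab_size
        for i, w in enumerate(order):
            rank[w] = i                 # rank[w] = hash value of word w
        funcs.append((order, rank))
    return funcs

def build_inverted_file(docs):
    """Map each word to the list of documents (by index) that contain it."""
    inv = defaultdict(list)
    for d, doc in enumerate(docs):
        for w in doc:
            inv[w].append(d)
    return inv

def minhash_naive(docs, funcs):
    """Baseline: evaluate every hash function on every word of every document."""
    return [[min(rank[w] for w in doc) for _, rank in funcs] for doc in docs]

def minhash_inverted(docs, funcs, inv):
    """Exact min-Hash driven by the inverted file (assumes non-empty documents)."""
    num_docs = len(docs)
    sigs = [[None] * len(funcs) for _ in range(num_docs)]
    for k, (order, _) in enumerate(funcs):
        remaining = num_docs
        for h, w in enumerate(order):       # visit words in increasing hash value
            for d in inv.get(w, ()):
                if sigs[d][k] is None:      # first hit is the document's minimum
                    sigs[d][k] = h
                    remaining -= 1
            if remaining == 0:
                break                       # every document assigned: stop early
    return sigs
```

For example, with `docs = [{0, 3, 7}, {1, 3}, {2, 5, 7, 9}, {4, 8}]` and `funcs = random_orderings(10, 5)`, `minhash_inverted(docs, funcs, build_inverted_file(docs))` returns the same signatures as `minhash_naive(docs, funcs)`. The saving comes from touching each posting only until all documents have been assigned, rather than hashing every word of every document.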
Using the set-of-words representation, the novel exact
min-Hash generation algorithm achieves approximately a
50-fold speed-up on two datasets with 10^5 and 10^6 images,
respectively. We also propose an approximate min-Hash
assignment process that reaches more than a 200-fold
speed-up at the cost of missing about 2-3% of matches.
We also experimentally show that the method generalizes
to other modalities with significantly different statistics.
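To make the accuracy trade-off of an approximate min-Hash assignment concrete, here is a minimal Python sketch of one simple truncation heuristic. It is an assumption for illustration and not necessarily the approximation proposed in the talk: for each hash function, only the `max_words` words with the lowest hash values are visited through the inverted file. The function names and the `max_words` parameter are hypothetical. Because words are visited in increasing hash value, every assigned value is still the exact min-Hash. Documents that contain none of the visited words simply receive no value (`None`), which is how such a shortcut can lose some matches.

```python
import random
from collections import defaultdict

def build_inverted_file(docs):
    """Map each word to the list of documents (by index) that contain it."""
    inv = defaultdict(list)
    for d, doc in enumerate(docs):
        for w in doc:
            inv[w].append(d)
    return inv

def minhash_truncated(num_docs, vocab_size, inv, num_hashes, max_words, seed=0):
    """Hypothetical approximate min-Hash: visit only the max_words lowest-hashed
    words per function; documents containing none of them get None."""
    rng = random.Random(seed)
    sigs = [[None] * num_hashes for _ in range(num_docs)]
    for k in range(num_hashes):
        order = list(range(vocab_size))
        rng.shuffle(order)                  # order[i] = word whose hash value is i
        for h, w in enumerate(order[:max_words]):
            for d in inv.get(w, ()):
                if sigs[d][k] is None:      # first hit is the exact minimum
                    sigs[d][k] = h
    return sigs
```

With the same `seed`, calling it with `max_words` equal to the vocabulary size gives the full signatures. Calling it with a smaller budget returns a subset of those values, with `None` marking the min-Hashes that were skipped.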
External WWW: http://cmp.felk.cvut.cz/~chum/papers/chum-cvpr12.pdf