Visual Retrieval with Compact Image Representations
Ph.D. Thesis defended in 2019 at the Czech Technical University in Prague

Author: Filip Radenović
Supervisor: Doc. Mgr. Ondřej Chum, Ph.D.

Sources: [ pdf | bib | presentation ]

Abstract


This thesis addresses the problem of visual retrieval in large-scale image datasets, where the goal is to find all images of an object instance. The object is specified by a query image, which can be a photograph, painting, edge map, human-drawn sketch, etc. Solutions to this problem have a wide range of applications, such as place or location recognition, copyright-violation detection, product search, and 3D reconstruction. Visual retrieval of an object instance is a challenging task, as the representation of the object's appearance has to handle significant viewpoint, scale, and illumination changes; heavy occlusions; and different image modalities (photograph, painting, cartoon, sketch). At the same time, the search has to be performed online, i.e., when a user submits the query, the response should be immediate, even when searching through millions of images. Towards this goal, we propose methods for compact image representation that achieve high accuracy while maintaining low memory and computational requirements.
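The online-search setting above can be sketched with a toy example: once every image is reduced to a short L2-normalized descriptor, ranking an entire database against a query collapses into a single matrix-vector product. All sizes and names below are illustrative, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database: N images, each represented by a compact D-dim descriptor.
N, D = 1000, 128
db = rng.normal(size=(N, D)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # L2-normalize each row

# Query descriptor, normalized the same way.
q = rng.normal(size=D).astype(np.float32)
q /= np.linalg.norm(q)

# With unit-norm vectors, cosine similarity is just a dot product, so ranking
# the whole database costs one matrix-vector product.
scores = db @ q
ranked = np.argsort(-scores)  # database indices, most similar first
```

With millions of images this exhaustive scan is typically replaced or accelerated by approximate nearest-neighbor indexing, but the compact, normalized descriptor is what makes either approach feasible in memory.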

A number of image-retrieval-related problems are stated, studied, and resolved in the thesis. Two conceptually different approaches to compact image representation are proposed. First, we propose a method of joint dimensionality reduction of multiple vocabularies for a bag-of-words-based compact representation. Second, we propose a method to fine-tune convolutional neural networks (CNNs) for compact image retrieval from a large collection of unordered images in a fully automated manner. We additionally show that a CNN trained on edge maps of landmark images, instead of photographs, improves performance in cases where shape carries the dominant information. The proposed compact representations are evaluated on a range of different tasks, providing improvements on challenging cases of instance image retrieval, generic sketch-based image retrieval and its fine-grained counterpart, and domain generalization.
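One common way to turn a CNN's last convolutional feature map into a single compact descriptor, associated with this line of work, is generalized-mean (GeM) pooling followed by L2 normalization. The NumPy sketch below shows only the pooling step, with illustrative sizes (C=512 channels on a 7x7 grid) standing in for a real backbone's output:

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dimensions.

    x: feature map of shape (C, H, W), e.g. the last conv layer of a CNN.
    p = 1 recovers average pooling; large p approaches max pooling.
    """
    x = np.clip(x, eps, None)  # keep values positive for the p-th root
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)

# Toy feature map standing in for a CNN backbone output.
rng = np.random.default_rng(1)
feats = rng.random((512, 7, 7)).astype(np.float32)

desc = gem_pool(feats)
desc = desc / np.linalg.norm(desc)  # L2-normalize -> compact 512-D descriptor
```

The pooling exponent p interpolates between average and max pooling and can itself be learned during the fine-tuning stage; the normalized output is the per-image vector used for similarity search.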

We also address the issue of image retrieval benchmarking. We extend the standard and popular Oxford Buildings and Paris datasets with novel annotations, protocols, and queries. The new protocols allow a fair comparison between different methods, including those that use a dataset pre-processing stage. An extensive comparison of state-of-the-art methods is performed on the new benchmark. The results show that image retrieval is far from being solved.

Finally, we introduce the concept of a targeted mismatch attack on deep-learning-based retrieval systems, which generates an adversarial image to conceal the query image. The adversarial image looks nothing like the intended query, yet leads to identical or very similar retrieval results. We evaluate the attacks on standard retrieval benchmarks and compare the results retrieved with the original and the adversarial image.
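The core idea of such an attack can be illustrated on a deliberately simplified, fully linear "descriptor extractor" standing in for a CNN (this is a toy sketch, not the method in the thesis): projected gradient descent on the pixels of an unrelated carrier image drives its descriptor toward that of a hidden target query. All names, sizes, and the learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

D_in, D_out = 256, 32
W = rng.normal(size=(D_out, D_in)).astype(np.float32) / np.sqrt(D_in)

def descriptor(x):
    # Stand-in for a deep descriptor network: here just a linear map.
    return W @ x

target = rng.random(D_in).astype(np.float32)  # hidden query "image"
adv = rng.random(D_in).astype(np.float32)     # unrelated carrier "image"
t_desc = descriptor(target)

# Minimize ||f(adv) - f(target)||^2 by gradient descent on the pixels of adv,
# projecting back into the valid pixel range after each step.
lr = 0.05
for _ in range(500):
    grad = 2.0 * W.T @ (descriptor(adv) - t_desc)  # closed-form gradient
    adv = np.clip(adv - lr * grad, 0.0, 1.0)

# The adversarial image still differs from the target in pixel space,
# but its descriptor, and hence its retrieval results, nearly matches.
pixel_gap = float(np.linalg.norm(adv - target))
desc_gap = float(np.linalg.norm(descriptor(adv) - t_desc))
```

Against a real system the extractor is a deep network and the gradient comes from backpropagation, but the structure of the optimization is the same: match the target's descriptor while the image itself reveals nothing about the query.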

Contributions


Publications


This thesis builds on results previously published in the following publications, ordered chronologically. Citation counts were obtained from Google Scholar on Feb 17, 2019.

The following publications were not included in the thesis, in order to keep it focused and easier to follow.