Visual Retrieval with Compact Image Representations
Ph.D. Thesis defended in 2019 at the Czech Technical University in Prague

Author: Filip Radenović
Supervisor: Doc. Mgr. Ondřej Chum, Ph.D.

Sources: [ pdf | bib | presentation ]

Abstract


This thesis addresses the problem of visual retrieval in large-scale image datasets, where the goal is to find all images of an object instance. The object is specified by a query image, which can be a photograph, painting, edge map, human-drawn sketch, etc. Solutions to this problem have a wide range of applications, such as place or location recognition, copyright-violation detection, product search, and 3D reconstruction. Visual retrieval of an object instance is a challenging task, as the representation of the object's appearance has to handle significant viewpoint, scale, and illumination changes; heavy occlusions; and different image modalities (photograph, painting, cartoon, sketch). At the same time, the search has to be performed online, i.e., when a user submits the query, the response should be immediate, even when searching through millions of images. Towards this goal, we propose methods for compact image representation that achieve high accuracy while maintaining low memory and computational requirements.
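The online-search setting above can be sketched with a toy example: once every image is reduced to a short L2-normalized descriptor, ranking an entire database against a query collapses into a single matrix-vector product. All sizes and names below are illustrative, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database: N images, each represented by a compact D-dim descriptor.
N, D = 1000, 128
db = rng.normal(size=(N, D)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # L2-normalize each row

# Query descriptor, normalized the same way.
q = rng.normal(size=D).astype(np.float32)
q /= np.linalg.norm(q)

# With unit-norm vectors, cosine similarity is just a dot product, so ranking
# the whole database costs one matrix-vector product.
scores = db @ q
ranked = np.argsort(-scores)  # database indices, most similar first
```

With millions of images this exhaustive scan is typically replaced or accelerated by approximate nearest-neighbor indexing, but the compact, normalized descriptor is what makes either approach feasible in memory.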

A number of image-retrieval-related problems are stated, studied, and resolved in the thesis. Two conceptually different approaches to compact image representation are proposed. First, we propose a method of joint dimensionality reduction of multiple vocabularies for a bag-of-words-based compact representation. Second, we propose a method to fine-tune convolutional neural networks (CNNs) for compact image retrieval from a large collection of unordered images in a fully automated manner. We additionally show that a CNN trained on edge maps of landmark images, instead of photographs, improves performance in cases where shape carries the dominant information. The proposed compact representations are evaluated on a range of different tasks, providing improvements on challenging cases of instance image retrieval, generic sketch-based image retrieval and its fine-grained counterpart, and domain generalization.
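One common way to turn a CNN's last convolutional feature map into a single compact descriptor, associated with this line of work, is generalized-mean (GeM) pooling followed by L2 normalization. The NumPy sketch below shows only the pooling step, with illustrative sizes (C=512 channels on a 7x7 grid) standing in for a real backbone's output:

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dimensions.

    x: feature map of shape (C, H, W), e.g. the last conv layer of a CNN.
    p = 1 recovers average pooling; large p approaches max pooling.
    """
    x = np.clip(x, eps, None)  # keep values positive for the p-th root
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)

# Toy feature map standing in for a CNN backbone output.
rng = np.random.default_rng(1)
feats = rng.random((512, 7, 7)).astype(np.float32)

desc = gem_pool(feats)
desc = desc / np.linalg.norm(desc)  # L2-normalize -> compact 512-D descriptor
```

The pooling exponent p interpolates between average and max pooling and can itself be learned during the fine-tuning stage; the normalized output is the per-image vector used for similarity search.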

We also address the issue of image retrieval benchmarking. We extend the standard and popular Oxford Buildings and Paris datasets with novel annotations, protocols, and queries. The new protocols allow a fair comparison between different methods, including those that use a dataset pre-processing stage. An extensive comparison of state-of-the-art methods is performed on the new benchmark. The results show that image retrieval is far from being solved.

Finally, we introduce the concept of a targeted mismatch attack on deep-learning-based retrieval systems, which generates an adversarial image to conceal the query image. The adversarial image looks nothing like the intended query, yet leads to identical or very similar retrieval results. We evaluate the attacks on standard retrieval benchmarks and compare the results retrieved with the original and the adversarial image.
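The core idea of such an attack can be illustrated on a deliberately simplified, fully linear "descriptor extractor" standing in for a CNN (this is a toy sketch, not the method in the thesis): projected gradient descent on the pixels of an unrelated carrier image drives its descriptor toward that of a hidden target query. All names, sizes, and the learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

D_in, D_out = 256, 32
W = rng.normal(size=(D_out, D_in)).astype(np.float32) / np.sqrt(D_in)

def descriptor(x):
    # Stand-in for a deep descriptor network: here just a linear map.
    return W @ x

target = rng.random(D_in).astype(np.float32)  # hidden query "image"
adv = rng.random(D_in).astype(np.float32)     # unrelated carrier "image"
t_desc = descriptor(target)

# Minimize ||f(adv) - f(target)||^2 by gradient descent on the pixels of adv,
# projecting back into the valid pixel range after each step.
lr = 0.05
for _ in range(500):
    grad = 2.0 * W.T @ (descriptor(adv) - t_desc)  # closed-form gradient
    adv = np.clip(adv - lr * grad, 0.0, 1.0)

# The adversarial image still differs from the target in pixel space,
# but its descriptor, and hence its retrieval results, nearly matches.
pixel_gap = float(np.linalg.norm(adv - target))
desc_gap = float(np.linalg.norm(descriptor(adv) - t_desc))
```

Against a real system the extractor is a deep network and the gradient comes from backpropagation, but the structure of the optimization is the same: match the target's descriptor while the image itself reveals nothing about the query.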

Contributions


Publications


This thesis builds on results previously published in the following publications, ordered chronologically. Citation counts were obtained from Google Scholar on Feb 17, 2019.

The following publications were not included in the thesis, in order to keep it focused and easier to follow.