Efficient Object Detection through Selective Search and FLAIR

(University of Amsterdam, The Netherlands)

This talk is about making object detection more efficient. First, it addresses the problem of generating possible object locations for use in object recognition through selective search. Selective search combines the strengths of both exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations.

Our selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 98% recall and an average intersection over union above 0.8. The reduced number of locations compared to an exhaustive search enables the use of stronger machine learning techniques and stronger appearance models for object recognition, such as the Fisher vector encoding. However, encoding selective search boxes with the Fisher vector is still a major computational bottleneck.
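
The two quality measures quoted above, recall and intersection over union, can be made concrete with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and counts a ground-truth object as recalled when its best-matching proposal overlaps it by at least 0.5. The box format, the function names, and the 0.5 threshold are illustrative assumptions, not details taken from the talk.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_overlaps(ground_truth: List[Box], proposals: List[Box]) -> List[float]:
    """For each ground-truth box, the IoU of its best-matching proposal."""
    return [max((iou(g, p) for p in proposals), default=0.0) for g in ground_truth]


def recall(ground_truth: List[Box], proposals: List[Box], threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes covered by some proposal at IoU >= threshold.

    The default threshold of 0.5 is an assumption made for this sketch.
    """
    overlaps = best_overlaps(ground_truth, proposals)
    if not overlaps:
        return 0.0
    return sum(o >= threshold for o in overlaps) / len(overlaps)


def mean_best_overlap(ground_truth: List[Box], proposals: List[Box]) -> float:
    """Average, over ground-truth boxes, of the best proposal IoU."""
    overlaps = best_overlaps(ground_truth, proposals)
    return sum(overlaps) / len(overlaps) if overlaps else 0.0
```

Under these assumptions, a proposal set is scored by calling recall(ground_truth, proposals) and mean_best_overlap(ground_truth, proposals) on the annotated boxes of each image. A set of locations is useful when both values are high while the number of proposals stays small.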

To address this bottleneck, we introduce FLAIR, a Fast, Local Area Independent Representation, which accelerates encoding by 18x without any approximations. Fisher with FLAIR was the key component of our winning entry in the ImageNet 2013 Detection task. Finally, we will compare Selective Search and Fisher with FLAIR to the current state of the art in object detection, with surprising results.