Has the (Large-Scale) Image-based Localization Problem been solved?

Torsten Sattler (ETH Zurich, Switzerland)

Abstract:

In recent years, tremendous progress has been made in Structure-from-Motion, allowing us to reconstruct large environments from (hundreds of) thousands of images in little time. We can therefore use these large-scale point clouds for structure-based image localization, where the aim is to determine, relative to the point cloud, the camera pose from which a novel image was taken. Efficient and reliable localization in turn enables many interesting Computer Vision applications, ranging from navigation for autonomous vehicles to large-scale Augmented Reality on mobile devices.

Over the last couple of years, multiple image-based localization approaches have been proposed that handle large-scale scenes both efficiently and effectively: they localize query images taken under a wide range of viewing conditions in very little time. Very recently, we have shown that these methods can be leveraged to make large-scale, real-time localization feasible on mobile devices such as tablets and smartphones. This recent success naturally raises the question of whether we have already solved the image-based localization problem. In this talk, we will argue that this is not the case. Starting with a detailed overview of our current localization pipeline, we will analyze the shortcomings of the current state of the art in image-based localization in terms of computational cost, scalability, robustness, and the ability to handle certain scene elements.