Where is the Camera? - From Space Resection to Large Scale Geolocalization


Kevin Koeser (GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany)

In photogrammetry, space resection techniques and camera pose estimation algorithms were developed to compute the position and orientation from which a photo was taken, given known correspondences between 2D image points (e.g. features) and 3D world points (e.g. ground control points).
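Space resection inverts the pinhole projection: given pixel observations of known 3D points, it recovers the camera rotation R and translation t. As a minimal sketch of the forward model being inverted, the following projects a world point and measures the reprojection error that pose estimation methods typically minimize or threshold. It assumes an ideal pinhole camera without lens distortion (focal length f in pixels, principal point (cx, cy)) and the convention X_cam = R X + t; the function names are hypothetical and not taken from any particular library.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]

def project(X: Vec3, R: Mat3, t: Vec3, f: float, cx: float, cy: float) -> Tuple[float, float]:
    """Hypothetical helper: project world point X to pixels for an ideal pinhole camera."""
    # World -> camera coordinates (assumed convention): X_cam = R X + t
    xc, yc, zc = (sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        raise ValueError("point lies behind the camera")
    # Perspective division followed by the (assumed distortion-free) intrinsics
    return (f * xc / zc + cx, f * yc / zc + cy)

def reprojection_error(x_obs: Tuple[float, float], X: Vec3, R: Mat3, t: Vec3,
                       f: float, cx: float, cy: float) -> float:
    """Pixel distance between an observed image point and the projected 3D point."""
    u, v = project(X, R, t, f, cx, cy)
    return math.hypot(x_obs[0] - u, x_obs[1] - v)
```

For example, with R the identity, t = (0, 0, 5), f = 100 and principal point (320, 240), the point (1, 2, 5) projects to (330, 260). A resection solver searches for the R and t that make such errors small for all given 2D-3D correspondences.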

When not even an approximate pose is known and automated approaches are needed, associating image data with 3D reference model data becomes difficult and ambiguous: robust methods are required to select the pose hypothesis with the best support, where hypotheses are usually generated from as few image features as possible, assuming that some of them are good and unique.
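The hypothesize-and-verify idea can be illustrated with a generic RANSAC loop. To keep the sketch short and self-contained, the model here is a 2D translation between matched points rather than a full camera pose, so a single correspondence is already a minimal sample. This stand-in model, the function name ransac_translation and its parameters are assumptions for illustration, not the resection method discussed in the talk.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def ransac_translation(matches: List[Tuple[Point, Point]], iterations: int = 100,
                       threshold: float = 1.0,
                       seed: int = 0) -> Tuple[Optional[Point], List[int]]:
    """Hypothetical sketch: find t with q ≈ p + t for matches (p, q); return (t, inlier indices)."""
    rng = random.Random(seed)
    best_t: Optional[Point] = None
    best_inliers: List[int] = []
    for _ in range(iterations):
        # Hypothesis from a minimal sample: one correspondence fixes a 2D translation.
        p, q = rng.choice(matches)
        t = (q[0] - p[0], q[1] - p[1])
        # Support: all correspondences consistent with the hypothesis within the threshold.
        inliers = [i for i, (a, b) in enumerate(matches)
                   if (b[0] - a[0] - t[0]) ** 2 + (b[1] - a[1] - t[1]) ** 2 <= threshold ** 2]
        # Keep the hypothesis with the largest support.
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

For real space resection, the hypothesis step would call a minimal pose solver (e.g. a P3P solver) and support would be counted via reprojection error; the loop structure stays the same.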

To this end, I will discuss a minimal solution for space resection using affine features. However, when the database (the 3D model) grows to the size of a city, a country or the whole planet (say, for intelligence, archive or forensic applications), classical robust pose estimation quickly becomes computationally infeasible, and the image may not contain any globally unique feature from which a pose hypothesis could be generated. Instead, it may become necessary to consider all features, or the whole image at once, to narrow down the search space. This is often done by quantizing feature descriptors into "visual words" and analyzing their distribution in the image, with or without considering the spatial relations between the features.
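The visual-word idea can be sketched as follows, assuming a precomputed vocabulary (in practice often obtained by k-means clustering of descriptors), hard nearest-word assignment and tf-idf weighted cosine similarity for ranking database images. The short list returned by the hypothetical shortlist function is where unlikely candidate locations would be discarded before local alignment and verification. Descriptors are toy 2D tuples instead of real high-dimensional descriptors such as SIFT, and spatial relations between features are ignored.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]

def quantize(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Hard-assign a descriptor to its nearest visual word (squared Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda w: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[w])))

def bag_of_words(descriptors: List[Vector], vocabulary: List[Vector]) -> Counter:
    """Histogram of visual-word occurrences in one image."""
    return Counter(quantize(d, vocabulary) for d in descriptors)

def idf_weights(db: Dict[str, Counter], n_words: int) -> List[float]:
    """Inverse document frequency: words present in every image get weight 0."""
    n = len(db)
    weights = []
    for w in range(n_words):
        df = sum(1 for hist in db.values() if w in hist)
        weights.append(math.log(n / df) if df else 0.0)
    return weights

def tfidf(hist: Counter, idf: List[float]) -> Dict[int, float]:
    """Term frequency (normalized histogram) times inverse document frequency."""
    total = sum(hist.values())
    return {w: (c / total) * idf[w] for w, c in hist.items()}

def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(query: List[Vector], db: Dict[str, Counter],
              vocabulary: List[Vector], k: int) -> List[str]:
    """Hypothetical helper: keep the k most similar database images, drop the rest."""
    idf = idf_weights(db, len(vocabulary))
    q = tfidf(bag_of_words(query, vocabulary), idf)
    scores = {name: cosine(q, tfidf(hist, idf)) for name, hist in db.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Only the images in the returned short list would then be passed on to the more expensive geometric verification, for instance a RANSAC-based pose estimation.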

In any case, discarding the vast majority of unlikely candidate locations not only improves efficiency and makes these approaches feasible, but also reduces geolocalization to a local alignment and verification problem. Different properties and characteristics can be exploited, and must be considered, for urban scenes versus countryside imagery. For cities, street-level footage showing planar facades is often available, and I will discuss our city-scale system that exploits these structures (San Francisco Landmark Dataset).

In contrast, in the countryside, vegetation, weather and seasons change the appearance of the scenery, so plain SIFT matching usually does not work. Also, photos do not densely cover the countryside (they are usually only available for scenic views and landmarks), or they are taken from an aerial perspective. On the other hand, digital elevation models and maps are available here and can help, and I will discuss our work in this direction (e.g. Switzerland data). I will conclude with some open issues for a real world-scale geolocalization system.