|Martin Bujňák: Algebraic Solutions to Abslute Pose Problems
Estimating internal and external camera calibration is a very basic element in many computer vision applications. Camera localization, structure from motion, scene reconstruction, object localization, tracking and recognition are just a few examples of such applications. This thesis focuses on minimal algorithms for estimating camera calibration, i.e. algorithms which use all possible constraints and minimal number of inputs, for example point correspondences between 2D and 3D space, to calculate the camera pose and other camera parameters such as unknown focal length or coefficients modeling lens distortion.
In this work, we first study the absolute pose problem for a calibrated camera, which was an intensively studied problem in the past and many solutions were already developed. The problem itself can be formulated as a simple system of polynomial equations. Researches in the past focused on how to solve this problem, searched for different solutions, compared numerical stability, speed, or studied how to calculate the camera pose from more than three 2D-to-3D point correspondences. We review the state-of-the-art and present our own formulations to this problem based on the well known invariants and properties of the problem. We provide solutions to our formulations using different methods for solving system of polynomial equations. Next we provide solutions to the absolute pose for a camera without complete internal calibration or for a camera where some additional information about the scene is known. In particular, absolute pose of a camera calibrated up to an unknown focal length or a camera with unknown focal length and unknown radial distortion. Furthermore, we describe special cases when some of the scene or camera priors are known, for example, scene is planar, scene is non-planar or when vertical direction of a camera is known from a gyroscope or a vanishing point. The main contribution of this thesis is in finding minimal or finding optimal solutions to these problems. We formulate all studied problems, from very basic relations between 3D space and 2D measurements, show different formulations and how can different invariants be helpful in reducing the number of unknowns or to simplify the problem. All our formulations lead to systems of polynomial equations. We show how to solve these systems using methods for solving systems of polynomial equations, which we developed during our research. Solution the problems are evaluated with high level of detail and with focus on important properties such as numerical stability and resistance to noise in data. We compare our new solvers with the state-of-the-art on synthetic and real data.
In this thesis we further present a general method which speeds up most of the presented solvers and can be also used to speed up other solvers based on eigenvalue computation. We have also found the connection between methods for converting basis of an ideal to a basis with respect to the lexicographic ordering and calculation of the characteristic polynomial of an action matrix.
Figure 1: Illustration to the camera pose estimation problem.
In this thesis we explore the anatomy of the camera pose estimation (Figure 1) from a set of n correspondences between 3D points and their 2D projections. It is a very old problem, known as Perspective-n-Point (PnP) problem and first papers considering this topic date back to 1841.
In our work we focus on minimal solutions, i.e. when n is the smallest possible value sufficient to solve the problem. This number, n, depends on addi- tional information about the camera we have. For modern digital cameras we can assume square pixels, no skew, the principal point close to the image center. Moreover, the focal length is sometimes stored with the image data. Also, data from inertial units attached to the camera became available. Adopting calibration constraints has several advantages. First, the minimal number of points needed to solve the absolute pose is reduced. Second, since fewer parameters are estimated, the results are more stable.
We show that n = 4 is enough to estimate the camera pose for a zooming camera and a zooming camera with unknown radial distortion in the image plane. We show how efficient solvers for these problems can be created by considering planar and non-planar scenes separately. The knowledge of camera vertical direction, e.g. extracted from a single vanishing point or read from an accelerometer mounted to the camera, can further simplify the problem. We show that n = 2 is enough for a calibrated camera and n = 3 for a camera with unknown focal length and radial distortion in this case.
In this work we focus on mathematical formulations of absolute pose problems, transfer them to systems of polynomial equations, and we show how to solve these systems by hand or by using automatized methods we developed. Finally, we show how to reduce computational complexity of solvers by focusing only on feasible solutions. This enables real-time use of solvers which were considered slow before.
The contribution of the thesis is related to the problem of absolute camera pose estimation from a minimal set of point correspondences between 2D and 3D space. This thesis also studies the absolute camera pose problem when some additional information is available, e.g. when scene is planar or when camera vertical direction is known. In particular, the main contribution of this dissertation work is in formulating and developing new solutions to camera absolute pose problems:
Absolute pose of a calibrated camera
Figure 2: The camera pose problem illustration. Reference points Xi in the
world space are projected to points xi in the camera image plane.
The basic mathematical relation describing projection of a set of 3D points Xi to their images xi measured in the image plane under the pinhole camera model can be described in terms of a camera calibration matrix K, a rotation matrix R and a translation vector t which satisfy following relation:
where alphas are the unknown scalar factors. The matrix K is supposed to be known in the case of calibrated P3P problem and the goal is to compute R and t given three point correspondences.
To simplify the problem, 3D points Xi are usually expressed in the camera coordinate system (Figure 2). Then the rigid transformation between these sets of points, i.e. rotation R and translation t, is searched for. The cosine law is often used to calculate the distances from the camera center to the 3D points in the camera coordinate systems, given the angles between the rays of sights and the distances between the 3D points. This leads to three polynomial equations in three unknowns. Alternatively, world space 3D points Xi can be expressed in the camera coordinate system using unknown scalars i from Equation 1. A system of polynomial equations can be build using the fact that Euclidean distance between 3D points is preserved under a rigid transform, i.e.
Even a simpler system of polynomial equations can be build by considering ratios of these distances. We solve above systems using our automatic generator. Since three points always lie in a plane, we can calculate a homography between reference 3D points Xi and their projections xi. Having a calibrated camera gives us constrains on the structure of the homography, which can be solved in a closed form. Unknown pose, R and t, can be derived directly from the computed homography.
Absolute pose of a camera with an unknown focal length
Figure 3: Relations between 2D and 3D space when focal length is unknown.
Unlike some internal camera parameters, which can be safely set for most of present digital cameras to some prior value, the effective focal length can significantly change when using a zooming camera and thus needs to be estimated from images together with the camera position and orientation, Figure 3.
The equations arising from distances and ratios of distances invariants, we used to solve P3P problem, now also contain unknown focal length f what results in more complicated systems of polynomial equations. Simpler systems can be obtained by considering planar and non-planar cases separately. Splitting does not bring any practical obstacle as 3D points are always known in advance. Inspired by Josephson et.al. we also show how to solve this problem directly using quaternions.
Four point correspondences already over-constrain this problem and hence we can create different solvers by choosing its formulation, a subset of constrains and a method of solution. By creating and testing several thousands of such such solvers we show that solvers with very different behavior can be obtained, Figure 4.
Figure 4: From the left, numerical stability, accuracy w.r.t 1px noise level and
speed of growing of inlier set in a real experiment for various solvers.
Unknown focal length and radial distortion
Figure 5: Image distorted by radial lens distortion (Left). A greater inlier
set is gained faster even on slightly distorted images (Right).
Including the radial distortion estimation into camera pose calculation in the early stage of the reconstruction process allows a broader variety of cameras to be handled compared to when only the focal length is calculated. Even using a simple distortion model, like the division model, can rapidly help finding more correct matches between 2D and 3D space, Figure 5 (Right).
The first minimal solver estimating both the radial distortion and the focal length was presented by Josephson et.al. A more efficient solution can be obtained by decomposing the problem for non-planar and for planar scenes. These solvers can be joined into a general solver which gives comparable or even better results than Josephson's general solver for planar as well as non-planar scenes.
Known vertical direction
Figure 6: Knowing the vertical direcion.
Knowing the vertical direction which is equivalent to knowing the camera rotation around two axes gracefully simplifies the absolute pose problem, Figure 6. This is important since more and more cameras become equipped with inertial measurement units (IMUs) like gyroscopes and accelerometers, compasses, or GPS devices. Information from these sensors is often stored along with the image data. This could be even more the case with modern cellular phones and other smart devices.
Given the rotation of the calibrated camera around two axes and the images of two known 3D reference points, absolute position of the camera and the rotation of the camera around the third axis (y-axis) can be estimated in a closed form. Similarly, given distorted images of three known 3D reference points in the camera with unknown focal length, the absolute position of the camera, the rotation of the camera around the third axis (y-axis), the unknown focal length and the parameter of radial distortion can be estimated in closed form too.
In both cases, knowing the vertical direction leads to very fast, efficient, and numerically stable closed-form solutions.
Figure 7: Speed profile of relative 5pt algorithm.
Minimal solvers are often used frequently inside RANSAC cycle. Therefore, it is very important to maximize their efficiency.
Recently, it has been demonstrated that the numerical stability of Groebner basis solvers can be improved, e.g., by reordering columns in elimination templates, using QR or LU decomposition or by “extending the basis”. Also, several methods for reducing the size of elimination templates, and in this way speeding up Grobner basis solvers and improving their stability, were proposed. Usually it is not G-J elimination or LU decomposition of the elimination templates but the eigenvalue computation that takes the largest fraction of time in many solvers, see Figure 7.
We propose several methods for speeding up minimal solvers based on the Grobner basis and the action matrix computation. We show that such solvers can be significantly sped up by replacing the eigenvalue computations with the computation of the characteristic polynomial of an action matrix followed by the calculation of its roots using Sturm-sequences. We demonstrate how effective the proposed methods are on three important computer vision problems.
The thesis contributions are solutions to important absolute pose problems, some of them previously unsolved. In particular, we proposed three completely new formulations for calibrated camera based on distances, ratios of distances invariants and the simple direct solution based on homography. Next, we ex- tended these formulations to a zooming camera, i.e. with unknown focal length. We presented and analyzed several general, planar and non-planar solvers. Fur- ther, we developed more efficient and numerically stable solution for zooming camera with unknown focal length and radial distortion. Given a camera verti- cal direction, we solved absolute pose problems from two points for a calibrated camera and from three points for a radially distorted camera with an unknown focal length. We have demonstrated the quality and effectiveness of all pro- posed solvers on both synthetic and real data. Finally, we have shown how to create fast and efficient solvers by removing eigenvalues and eigenvectors calculation.
|by Martin Bujnak © 2014. All rights reserved.