Algebraic solutions to absolute pose problems

Martin Bujňák: Algebraic Solutions to Abslute Pose Problems

Cz/En

Center for Machine Perception
Department of Cybernetics
Faculty of Electrical Engineering
Czech Technical University in Prague

Computational and Cognitive
Vision Systems: A Training
European Network

Estimating internal and external camera calibration is a very basic element in many computer vision applications. Camera localization, structure from motion, scene reconstruction, object localization, tracking and recognition are just a few examples of such applications. This thesis focuses on minimal algorithms for estimating camera calibration, i.e. algorithms which use all possible constraints and minimal number of inputs, for example point correspondences between 2D and 3D space, to calculate the camera pose and other camera parameters such as unknown focal length or coefficients modeling lens distortion.
In this work, we first study the absolute pose problem for a calibrated camera, which was an intensively studied problem in the past and many solutions were already developed. The problem itself can be formulated as a simple system of polynomial equations. Researches in the past focused on how to solve this problem, searched for different solutions, compared numerical stability, speed, or studied how to calculate the camera pose from more than three 2D-to-3D point correspondences. We review the state-of-the-art and present our own formulations to this problem based on the well known invariants and properties of the problem. We provide solutions to our formulations using different methods for solving system of polynomial equations. Next we provide solutions to the absolute pose for a camera without complete internal calibration or for a camera where some additional information about the scene is known. In particular, absolute pose of a camera calibrated up to an unknown focal length or a camera with unknown focal length and unknown radial distortion. Furthermore, we describe special cases when some of the scene or camera priors are known, for example, scene is planar, scene is non-planar or when vertical direction of a camera is known from a gyroscope or a vanishing point. The main contribution of this thesis is in finding minimal or finding optimal solutions to these problems. We formulate all studied problems, from very basic relations between 3D space and 2D measurements, show different formulations and how can different invariants be helpful in reducing the number of unknowns or to simplify the problem. All our formulations lead to systems of polynomial equations. We show how to solve these systems using methods for solving systems of polynomial equations, which we developed during our research. Solution the problems are evaluated with high level of detail and with focus on important properties such as numerical stability and resistance to noise in data. We compare our new solvers with the state-of-the-art on synthetic and real data.
In this thesis we further present a general method which speeds up most of the presented solvers and can be also used to speed up other solvers based on eigenvalue computation. We have also found the connection between methods for converting basis of an ideal to a basis with respect to the lexicographic ordering and calculation of the characteristic polynomial of an action matrix.

Figure 1: Illustration to the camera pose estimation problem.

In this thesis we explore the anatomy of the camera pose estimation (Figure 1) from a set of n correspondences between 3D points and their 2D projections. It is a very old problem, known as Perspective-n-Point (PnP) problem and first papers considering this topic date back to 1841.
     In our work we focus on minimal solutions, i.e. when n is the smallest possible value sufficient to solve the problem. This number, n, depends on addi- tional information about the camera we have. For modern digital cameras we can assume square pixels, no skew, the principal point close to the image center. Moreover, the focal length is sometimes stored with the image data. Also, data from inertial units attached to the camera became available. Adopting calibration constraints has several advantages. First, the minimal number of points needed to solve the absolute pose is reduced. Second, since fewer parameters are estimated, the results are more stable.
     We show that n = 4 is enough to estimate the camera pose for a zooming camera and a zooming camera with unknown radial distortion in the image plane. We show how efficient solvers for these problems can be created by considering planar and non-planar scenes separately. The knowledge of camera vertical direction, e.g. extracted from a single vanishing point or read from an accelerometer mounted to the camera, can further simplify the problem. We show that n = 2 is enough for a calibrated camera and n = 3 for a camera with unknown focal length and radial distortion in this case.
     In this work we focus on mathematical formulations of absolute pose problems, transfer them to systems of polynomial equations, and we show how to solve these systems by hand or by using automatized methods we developed. Finally, we show how to reduce computational complexity of solvers by focusing only on feasible solutions. This enables real-time use of solvers which were considered slow before.

The contribution of the thesis is related to the problem of absolute camera pose estimation from a minimal set of point correspondences between 2D and 3D space. This thesis also studies the absolute camera pose problem when some additional information is available, e.g. when scene is planar or when camera vertical direction is known. In particular, the main contribution of this dissertation work is in formulating and developing new solutions to camera absolute pose problems:

Absolute pose solvers for a calibrated camera. We show basic mathematical and algebraical relations, useful constraints and invariants and also how to build a system of polynomial equations which can be used to easily solve this P3P problem. We derive new solutions to P3P problem based on (1) distances between points in 3D space, (2) ratios of these distances, and we also derived (3) a direct solution to the problem (calculating camera matrix directly) based on the relation between projected points lying on the plane. Next, we build on these basic relations and derive more complicated solvers which gradually relax input calibration assumptions.
Absolute pose solvers for a camera with unknown focal length is our next contribution. Compared to work we published at CVPR 2008, we present several different approaches to this problem. We exhaustively searched and analyzed over hundred thousands of possible solvers, picked the best solvers which can benefit from the fact that four point correspondences over-constrain this problem. We found solvers which provide much better performance on a noisy data compared to solvers presented in our previous work published at CVPR 2008. We also show that the problem can be simplified by considering planar and non-planar scenes separately. Moreover, this formulation allows calculation of aspect ratio or camera skew from the given set of four 2D-to-3D point correspondences as well.
Fast absolute pose solver for a camera with unknown focal length and unknown radial distortion. We achieved speed and efficiency by splitting the problem into two sub-problems, i.e. for planar and non-planar scenes. We demonstrate in experiments that the general purpose solver can be created by combining solvers of these two sub-problems..
Absolute pose solver for a camera with known vertical direction e.g. aligned with the ground plane, read from a vanishing point or from an inertial sensor. We derive closed form solutions to two previously unsolved problems. The first one is for a completely calibrated camera, where we show that two point correspondences are enough to calculate the camera pose. The second one uses three point correspondences to calculate the camera pose, unknown focal length and unknown radial distortion. Both solutions are very fast, efficient, numerically stable and closed-form.
Solvers optimization is our next contribution. We show how the theory of solving system of polynomial equations using the action matrices is connected to the characteristic polynomial calculation and Cayley-Hamilton theorem. We developed a fast matrix version of popular FGML algorithm used for converting different Groebner bases to a basis w.r.t. the lexicographic ordering. Next, we show how the conversion process is related to the characteristic polynomial calculation of an action matrix using Krylov's method. In fact, one of the polynomials in the lexicographic ordering is the characteristic polynomial of the action matrix. Based on this result we show how to build faster and numerically stable Groebner basis solvers.

In the thesis we show how to formulate and solve the above camera absolute pose problems. We show that it is often sufficient to use simple methods to solve seemingly complicated problems. Therefore, we show step-by-step how to get from a problem formulation to a final solution using popular algebraic methods. For more complicated problems we show how to use our Groebner basis solver generator, which we recently developed. We also focus on technical details of the solver creation process, describe how to work in finite prime field arithmetic, how to create cameras and 3D scenes in the finite prime field, how to cope with over-constrained systems and also how to further tune speed of solvers.

Figure 2: The camera pose problem illustration. Reference points Xi in the
world space are projected to points xi in the camera image plane.

The basic mathematical relation describing projection of a set of 3D points Xi to their images xi measured in the image plane under the pinhole camera model can be described in terms of a camera calibration matrix K, a rotation matrix R and a translation vector t which satisfy following relation:

where alphas are the unknown scalar factors. The matrix K is supposed to be known in the case of calibrated P3P problem and the goal is to compute R and t given three point correspondences.
To simplify the problem, 3D points Xi are usually expressed in the camera coordinate system (Figure 2). Then the rigid transformation between these sets of points, i.e. rotation R and translation t, is searched for. The cosine law is often used to calculate the distances from the camera center to the 3D points in the camera coordinate systems, given the angles between the rays of sights and the distances between the 3D points. This leads to three polynomial equations in three unknowns. Alternatively, world space 3D points Xi can be expressed in the camera coordinate system using unknown scalars i from Equation 1. A system of polynomial equations can be build using the fact that Euclidean distance between 3D points is preserved under a rigid transform, i.e.

Even a simpler system of polynomial equations can be build by considering ratios of these distances. We solve above systems using our automatic generator. Since three points always lie in a plane, we can calculate a homography between reference 3D points Xi and their projections xi. Having a calibrated camera gives us constrains on the structure of the homography, which can be solved in a closed form. Unknown pose, R and t, can be derived directly from the computed homography.

Figure 3: Relations between 2D and 3D space when focal length is unknown.

Unlike some internal camera parameters, which can be safely set for most of present digital cameras to some prior value, the effective focal length can significantly change when using a zooming camera and thus needs to be estimated from images together with the camera position and orientation, Figure 3.
The equations arising from distances and ratios of distances invariants, we used to solve P3P problem, now also contain unknown focal length f what results in more complicated systems of polynomial equations. Simpler systems can be obtained by considering planar and non-planar cases separately. Splitting does not bring any practical obstacle as 3D points are always known in advance. Inspired by Josephson et.al. we also show how to solve this problem directly using quaternions.
Four point correspondences already over-constrain this problem and hence we can create different solvers by choosing its formulation, a subset of constrains and a method of solution. By creating and testing several thousands of such such solvers we show that solvers with very different behavior can be obtained, Figure 4.

Figure 4: From the left, numerical stability, accuracy w.r.t 1px noise level and
speed of growing of inlier set in a real experiment for various solvers.

Figure 5: Image distorted by radial lens distortion (Left). A greater inlier
set is gained faster even on slightly distorted images (Right).

Including the radial distortion estimation into camera pose calculation in the early stage of the reconstruction process allows a broader variety of cameras to be handled compared to when only the focal length is calculated. Even using a simple distortion model, like the division model, can rapidly help finding more correct matches between 2D and 3D space, Figure 5 (Right).
The first minimal solver estimating both the radial distortion and the focal length was presented by Josephson et.al. A more efficient solution can be obtained by decomposing the problem for non-planar and for planar scenes. These solvers can be joined into a general solver which gives comparable or even better results than Josephson's general solver for planar as well as non-planar scenes.

Figure 6: Knowing the vertical direcion.

Knowing the vertical direction which is equivalent to knowing the camera rotation around two axes gracefully simplifies the absolute pose problem, Figure 6. This is important since more and more cameras become equipped with inertial measurement units (IMUs) like gyroscopes and accelerometers, compasses, or GPS devices. Information from these sensors is often stored along with the image data. This could be even more the case with modern cellular phones and other smart devices.
Given the rotation of the calibrated camera around two axes and the images of two known 3D reference points, absolute position of the camera and the rotation of the camera around the third axis (y-axis) can be estimated in a closed form. Similarly, given distorted images of three known 3D reference points in the camera with unknown focal length, the absolute position of the camera, the rotation of the camera around the third axis (y-axis), the unknown focal length and the parameter of radial distortion can be estimated in closed form too.
In both cases, knowing the vertical direction leads to very fast, efficient, and numerically stable closed-form solutions.

Figure 7: Speed profile of relative 5pt algorithm.

Minimal solvers are often used frequently inside RANSAC cycle. Therefore, it is very important to maximize their efficiency.
Recently, it has been demonstrated that the numerical stability of Groebner basis solvers can be improved, e.g., by reordering columns in elimination templates, using QR or LU decomposition or by “extending the basis”. Also, several methods for reducing the size of elimination templates, and in this way speeding up Grobner basis solvers and improving their stability, were proposed. Usually it is not G-J elimination or LU decomposition of the elimination templates but the eigenvalue computation that takes the largest fraction of time in many solvers, see Figure 7.
We propose several methods for speeding up minimal solvers based on the Grobner basis and the action matrix computation. We show that such solvers can be significantly sped up by replacing the eigenvalue computations with the computation of the characteristic polynomial of an action matrix followed by the calculation of its roots using Sturm-sequences. We demonstrate how effective the proposed methods are on three important computer vision problems.

The thesis contributions are solutions to important absolute pose problems, some of them previously unsolved. In particular, we proposed three completely new formulations for calibrated camera based on distances, ratios of distances invariants and the simple direct solution based on homography. Next, we ex- tended these formulations to a zooming camera, i.e. with unknown focal length. We presented and analyzed several general, planar and non-planar solvers. Fur- ther, we developed more efficient and numerically stable solution for zooming camera with unknown focal length and radial distortion. Given a camera verti- cal direction, we solved absolute pose problems from two points for a calibrated camera and from three points for a radially distorted camera with an unknown focal length. We have demonstrated the quality and effectiveness of all pro- posed solvers on both synthetic and real data. Finally, we have shown how to create fast and efficient solvers by removing eigenvalues and eigenvectors calculation.

Journal articles

M. Bujnak, Z. Kukelova, and T. Pajdla. Efficient solutions to the absolute pose of cameras with unknown focal length and radial distortion by decomposition to planar and non-planar cases. IPSJ Transaction on Computer vision and Application (CVA), 4:78–86, May 2012.

Conference papers (included in thesis)

M. Bujnak, Z. Kukelova, and T. Pajdla. Making Minimal Solvers Fast. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12), 2012. pdf
M. Bujnak, Z. Kukelova, and T. Pajdla. New efficient solution to the absolute pose problem for camera with unknown focal length and radial distortion. In 10th Asian Conference on Computer Vision (ACCV'10), volume 6492 of Lecture Notes in Computer Science, pages 11–24, 2011. pdf
Z. Kukelova, M. Bujnak, and T. Pajdla. Closed-form solutions to minimal absolute pose problems with known vertical direction. In 10th Asian Conference on Computer Vision (ACCV'10), volume 6493 of Lecture Notes in Computer Science, pages 216–229, 2011. pdf
M. Bujnak, Z. Kukelova, and T. Pajdla. A general solution to the p4p problem for camera with unknown focal length. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08), Vols 1-12, pages 3506–3513, 2008. pdf
Z. Kukelova, M. Bujnak, and T. Pajdla. Automatic Generator of Minimal Problem Solvers. In 10th European Conference on Computer Vision (ECCV'08), volume 5304 of Lecture Notes in Computer Science, pages 302–315, 2008. pdf

Other related publications

Z. Kukelova, M. Bujnak, T. Pajdla, Real-time solution to the absolute pose problem with unknown radial distortion and focal length, In IEEE International Conference on Computer Vision (ICCV'13), Sydney, Australia, 2013. pdf
Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigenvalue solutions to the 5-pt and 6-pt relative pose problems. In British Machine Vision Conference (BMVC'08), 2008. pdf
Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigenvalue solutions to minimal problems in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1381–1393, 2012. (Spotlight paper, IF 4.795) html

Download thesis (6.4MB pdf)