Structure and motion



The structure and motion recovery approach described in this text can easily be adapted to the needs of AR applications. However, the methods of Chapters 4 and 5 implicitly assume that two consecutive views are not 'too close'. When this assumption is violated, e.g. for two consecutive images in a video sequence, the computation of the fundamental matrix F, and therefore the determination of the corner matches between the two images, becomes an ill-conditioned problem. Even if the matches could be found exactly, updating motion and structure would remain ill-conditioned, because the triangulation of newly reconstructed 3D points is very inaccurate, as depicted in Figure 8.15.

**Figure 8.15:** If the images are chosen too close to each other, the position and orientation of the camera have not changed much. Uncertainties in the image corners then lead to a large uncertainty ellipsoid around the reconstructed point (left). If the images are taken further apart, the camera position and orientation may differ more from one image to the next, leading to a smaller uncertainty on the position of the reconstructed point (right).
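The effect shown in Figure 8.15 can be checked numerically. The following is a minimal sketch, not the method used in this text: it assumes a simplified rectified two-view geometry in which depth follows from disparity as Z = f b / d, and all names and default values (`depth_spread`, the focal length of 1000 pixels, the corner noise of 0.5 pixels) are illustrative assumptions.

```python
import numpy as np


def depth_spread(baseline, depth=10.0, focal=1000.0, pixel_sigma=0.5,
                 trials=20000, seed=0):
    """Monte Carlo estimate of the depth standard deviation.

    Assumes a rectified two-view setup (a simplification for illustration):
    camera 2 is camera 1 translated by `baseline` along the x-axis, and the
    3D point lies on the optical axis of camera 1 at distance `depth`.
    """
    rng = np.random.default_rng(seed)

    # Noise-free image x-coordinates of the point in both views (pixels).
    x1 = 0.0
    x2 = -focal * baseline / depth

    # Perturb the detected corner position independently in each image.
    noisy_x1 = x1 + rng.normal(0.0, pixel_sigma, trials)
    noisy_x2 = x2 + rng.normal(0.0, pixel_sigma, trials)

    # Triangulate: for rectified views, depth = focal * baseline / disparity.
    disparity = noisy_x1 - noisy_x2
    depths = focal * baseline / disparity
    return float(np.std(depths))


if __name__ == "__main__":
    for b in (0.1, 0.5, 1.0):
        print(f"baseline {b:4.1f}: depth std = {depth_spread(b):.4f}")
```

Under these assumptions, a first-order error propagation gives a depth standard deviation of roughly Z² σ_d / (f b), where σ_d is the standard deviation of the disparity, so the uncertainty grows as the baseline b shrinks. Comparing `depth_spread(0.1)` with `depth_spread(1.0)` should reflect this, with the narrow baseline giving the markedly larger spread.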

We solved this problem by running through the video sequence a first time to build up an accurate but crude 3D reconstruction of the real environment. Accuracy is obtained by using key-frames that are sufficiently separated from each other in the video sequence (see Figure 8.16). Structure and motion are extracted for these key-frames only, so the first pass yields a crude but accurate 3D structure along with the calibration of the key-frames. In a second pass, each remaining image is calibrated using the 2D-3D corner matches it has with the two key-frames between which it lies in the video sequence. No new 3D structure points are reconstructed for these images, as such points would probably be ill-conditioned owing to the closeness of the image under scrutiny to its neighboring key-frames. This leads to both a robust determination of the reconstructed 3D environment and the calibration of each image within the video sequence.
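The bookkeeping behind the two passes can be sketched as follows. The text does not state how 'sufficiently separated' is measured, so the criterion used here (an accumulated per-frame motion measure, such as the median corner displacement in pixels, reaching a threshold) is an assumption made for illustration only, and the names `select_keyframes` and `bracketing_keyframes` are hypothetical.

```python
from bisect import bisect_right


def select_keyframes(displacements, min_motion):
    """Pick key-frame indices with enough accumulated motion between them.

    displacements[i] is an assumed motion measure between frame i and
    frame i + 1 (for instance the median corner displacement in pixels).
    """
    keyframes = [0]
    accumulated = 0.0
    for i, step in enumerate(displacements):
        accumulated += step
        if accumulated >= min_motion:
            keyframes.append(i + 1)
            accumulated = 0.0

    # Simplification: always close the sequence with a key-frame so that
    # every remaining frame lies between two key-frames. This last
    # key-frame may be closer to its predecessor than min_motion.
    last = len(displacements)
    if keyframes[-1] != last:
        keyframes.append(last)
    return keyframes


def bracketing_keyframes(frame, keyframes):
    """Return the two key-frames between which a non-key frame lies."""
    pos = bisect_right(keyframes, frame)
    return keyframes[pos - 1], keyframes[pos]


if __name__ == "__main__":
    motion = [1.0] * 10              # 11 frames with uniform motion
    keys = select_keyframes(motion, min_motion=4.0)
    for frame in range(len(motion) + 1):
        if frame not in keys:
            print(frame, bracketing_keyframes(frame, keys))
```

In the first pass only the selected key-frames would contribute 3D structure. In the second pass each remaining frame is paired with its two bracketing key-frames, whose reconstructed points provide the 2D-3D matches.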

**Figure 8.16:** The small dots in the background represent the recovered crude 3D environment. The larger dark spots represent the camera positions of key-frames in the video stream. The lighter spots represent the camera positions of the remaining frames.
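Calibrating a remaining frame from 2D-3D matches amounts to camera resection. The text does not specify the algorithm at this point, so the sketch below swaps in the standard direct linear transform (DLT), which estimates a 3x4 projection matrix from at least six matches. It omits coordinate normalization and robust outlier rejection, both of which a practical system would likely need, and the function names `resect_camera_dlt` and `project` are hypothetical.

```python
import numpy as np


def resect_camera_dlt(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P from 2D-3D matches (linear DLT).

    points_3d: (N, 3) reconstructed points, points_2d: (N, 2) image corners.
    P is recovered only up to scale, which does not affect projection.
    """
    X = np.asarray(points_3d, dtype=float)
    x = np.asarray(points_2d, dtype=float)
    if X.shape[0] < 6:
        raise ValueError("need at least 6 matches for the linear solution")

    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous 3D points
    zeros = np.zeros(4)
    rows = []
    for Xi, (u, v) in zip(Xh, x):
        # u = (p1 . X) / (p3 . X)  ->  p1 . X - u * (p3 . X) = 0
        rows.append(np.hstack([Xi, zeros, -u * Xi]))
        # v = (p2 . X) / (p3 . X)  ->  p2 . X - v * (p3 . X) = 0
        rows.append(np.hstack([zeros, Xi, -v * Xi]))
    A = np.vstack(rows)

    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)


def project(P, points_3d):
    """Project (N, 3) points with a 3x4 matrix P to (N, 2) image points."""
    X = np.asarray(points_3d, dtype=float)
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    proj = Xh @ P.T
    return proj[:, :2] / proj[:, 2:3]
```

Because only the camera is estimated and the 3D points from the key-frame pass are held fixed, the small baseline between a frame and its neighboring key-frames does not enter the estimate in the way it would for triangulating new points.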


Marc Pollefeys 2000-07-12