Efficient, Causal Camera Tracking In Unprepared Environments

Manolis Lourakis and Antonis Argyros
Institute of Computer Science,
Foundation for Research and Technology - Hellas,
Heraklion, Crete, Greece

[ Reconstructions Gallery :: Augmented Video :: Contact Address ]

Brief description: This work addresses the problem of tracking the 3D pose of a camera from the images it acquires while moving freely in unmodeled, arbitrary environments. A novel feature-based approach for camera tracking is proposed, intended to facilitate tracking in on-line, time-critical applications such as video see-through augmented reality. In contrast to several existing methods that operate in a batch, off-line mode and assume that the whole video sequence is available before tracking commences, the proposed method processes images incrementally. At its core lies a feature-based 3D plane tracking technique, which permits the estimation of the homographies induced by a virtual 3D plane between successive image pairs. Knowledge of these homographies allows the corresponding projection matrices encoding camera motion to be expressed in a common projective frame and, therefore, to be recovered directly, without estimating 3D structure. The projective camera matrices are then upgraded to Euclidean and used for recovering structure, which is in turn employed for refining the projection matrices through local bundle adjustment. The proposed approach is causal, tolerates erroneous and missing feature matches, does not require modifications of the environment and has realistic computational requirements.
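
The role of the plane-induced homographies described above can be illustrated with a small numerical sketch. It is not the authors' implementation and only mirrors a standard result from multiple-view geometry. It assumes calibrated (normalized) cameras, a hypothetical plane n.X = d expressed in the first camera frame, and made-up motion values. Under these assumptions the homography is H = R + t n^T / d, the epipole in the second image is t, and the camera pair [I | 0], [H | e2] reproduces the true second-view projections once each point X is given the projective fourth coordinate rho = 1 - n.X / d.

```python
import math

def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def rot_y(angle):
    """Rotation by `angle` radians about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def plane_homography(R, t, n, d):
    """H = R + t n^T / d for the plane n.X = d (first-camera coordinates)."""
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

# Hypothetical motion between two views: X2 = R X1 + t.
R = rot_y(math.radians(5.0))
t = [0.2, 0.0, 0.05]
# Hypothetical virtual plane; it need not coincide with a physical surface.
n, d = [0.0, 0.0, 1.0], 4.0

H = plane_homography(R, t, n, d)
e2 = t  # epipole in image 2: projection of the first camera centre

def project_true(X):
    """Second-view projection with the true camera [R | t]."""
    RX = matvec(R, X)
    return [RX[i] + t[i] for i in range(3)]

def project_projective(X):
    """Second-view projection with the projective camera [H | e2]."""
    rho = 1.0 - dot(n, X) / d  # zero for points lying on the plane
    HX = matvec(H, X)
    return [HX[i] + rho * e2[i] for i in range(3)]

# One point on the plane (z = 4) and two points off it.
points = [[0.5, -0.3, 4.0], [1.0, 0.4, 6.5], [-0.7, 0.2, 3.1]]
max_err = max(
    abs(a - b)
    for X in points
    for a, b in zip(project_true(X), project_projective(X))
)
print("largest difference between the two projections:", max_err)
```

The two projections agree up to floating-point rounding, which is why a homography and an epipole suffice to write down a camera matrix in the projective frame defined by the plane.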

A detailed description of the approach can be found in ICS/FORTH Technical Report #324, Sep. 2003. A shorter version, entitled "Vision-based Camera Motion Recovery for Augmented Reality", was presented at the 2004 Computer Graphics International Conference (CGI'04). Additionally, a journal version titled "Efficient, Causal Camera Tracking In Unprepared Environments" has been published in the Computer Vision and Image Understanding journal, and a demo video titled "Camera Matchmoving in Unprepared, Unknown Environments" has been included in the CVPR'05 video proceedings.

Reconstructions Gallery

Sample experimental results from the application of the proposed camera tracker to a variety of image sequences are shown below. For each sequence, a VRML file illustrating the recovered motion and structure is provided. Dots correspond to 3D points, red pyramids to camera locations and green polylines to camera trajectories. Running times were measured on a laptop with an Intel P4 processor running at 2.5 GHz. Roughly 80% of the reported execution time is spent on detecting and matching image corners.
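
To put the reported timings in perspective, the short calculation below converts the average per-frame running times listed in the gallery into frame rates and splits each time according to the "roughly 80%" figure quoted above. The split is an estimate derived from that approximate percentage, not a measurement. The calibration object sequence is left out because its reported time excludes corner extraction and matching.

```python
# Average running time per frame (ms), copied from the gallery entries.
times_ms = {
    "INRIA MOVI house": 485.05,
    "Digital Air's cooks": 946.89,
    "Oxford's basement": 308.3,
    "Leuven's Arenberg castle": 1173.2,
    "Leuven's Sagalassos site": 987.062,
    "Leuven's Beguinages": 1164.12,
    "Office desk": 681.20,
}

CORNER_SHARE = 0.80  # "roughly 80%" per the text; treated here as an approximation

def summarize(ms):
    """Return (frames per second, estimated ms on corners, estimated ms on the rest)."""
    return 1000.0 / ms, CORNER_SHARE * ms, (1.0 - CORNER_SHARE) * ms

for name, ms in times_ms.items():
    fps, corners, rest = summarize(ms)
    print(f"{name:26s} {fps:5.2f} fps  corners ~{corners:6.1f} ms  rest ~{rest:6.1f} ms")
```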

We recommend using VRMLview to inspect VRML models.

INRIA MOVI house

Images of a model house, acquired by a fixed camera as the house, placed on a turntable, made a full revolution around its vertical axis.
  • Frames used: 59
  • Average number of matched corners per pair: 197.7
  • Average number of matched corners per triplet: 127.96
  • Average running time per frame (ms): 485.05
  • VRML model.

Digital Air's cooks

"Frozen time" sequence captured with Digital Air's TimeTrack camera.
  • Frames used: 27
  • Average number of matched corners per pair: 486.49
  • Average number of matched corners per triplet: 354.05
  • Average running time per frame (ms): 946.89
  • VRML model.

Oxford's basement

Sequence acquired by a camera mounted on a mobile robot as it approached the scene while smoothly turning left.
  • Frames used: 11
  • Average number of matched corners per pair: 144.1
  • Average number of matched corners per triplet: 91.0
  • Average running time per frame (ms): 308.3
  • VRML model.

Leuven's Arenberg castle

Sequence shot with a handheld camera, exhibiting relatively large interframe translational motion, with the epipoles located outside the images.
  • Frames used: 22
  • Average number of matched corners per pair: 483.43
  • Average number of matched corners per triplet: 373.65
  • Average running time per frame (ms): 1173.2
  • VRML model.

Leuven's Sagalassos site

Sequence shot with a camcorder; the frames are characterized by very small interframe motion. The imaged scene contains two dominant planes, relative to which the camera moves laterally.
  • Frames used: 26
  • Average number of matched corners per pair: 481.94
  • Average number of matched corners per triplet: 331.31
  • Average running time per frame (ms): 987.062
  • VRML model.

Leuven's Beguinages

Sequence with small interframe motion, shot with a camcorder as the operator approached the scene. Because the camera moves forward, the angle between the triangulating 3D lines is small, making structure recovery challenging.
  • Frames used: 11
  • Average number of matched corners per pair: 382.89
  • Average number of matched corners per triplet: 240.37
  • Average running time per frame (ms): 1164.12
  • VRML model.
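
The difficulty noted for this sequence can be made concrete with a rough calculation. The sketch below compares the angle between the two viewing rays of a single 3D point for a lateral and for a forward camera displacement of the same length. The point position and the displacement are hypothetical values chosen for illustration, not quantities taken from the sequence.

```python
import math

def triangulation_angle(c1, c2, X):
    """Angle (degrees) between the rays from camera centres c1 and c2 to point X."""
    r1 = [x - c for x, c in zip(X, c1)]
    r2 = [x - c for x, c in zip(X, c2)]
    norm1 = math.sqrt(sum(v * v for v in r1))
    norm2 = math.sqrt(sum(v * v for v in r2))
    cos_a = sum(a * b for a, b in zip(r1, r2)) / (norm1 * norm2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

# Hypothetical scene point, 10 units in front of the first camera and
# slightly off the optical (z) axis.
X = [0.5, 0.0, 10.0]
origin = [0.0, 0.0, 0.0]
step = 0.2  # hypothetical camera displacement between frames

lateral = triangulation_angle(origin, [step, 0.0, 0.0], X)  # sideways motion
forward = triangulation_angle(origin, [0.0, 0.0, step], X)  # motion towards the scene

print(f"lateral motion: {lateral:.3f} deg, forward motion: {forward:.3f} deg")
```

For these values the forward-motion angle is more than an order of magnitude smaller than the lateral one, so small errors in the image measurements translate into much larger depth errors.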

Office desk

Sequence shot with a FireWire webcam undergoing complex motion, resulting in large changes in the field of view.
  • Frames used: 46
  • Average number of matched corners per pair: 330.22
  • Average number of matched corners per triplet: 211.23
  • Average running time per frame (ms): 681.20
  • VRML model.

Calibration object

Images of a two-face calibration object that were acquired with a consumer digital camera. Corners were determined as the intersections of line segments fitted to the calibration grids.
  • Frames used: 27
  • Average number of matched corners per pair: 722
  • Average number of matched corners per triplet: 722
  • Average running time per frame (ms): 382.7 (not including corner extraction and matching)
  • VRML model.
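
Intersecting fitted lines, as done for the corners of this sequence, is compact in homogeneous coordinates: a line through two image points is their cross product, and the intersection of two lines is again a cross product. The sketch below assumes the lines have already been fitted and simply defines each one by two hypothetical points; the fitting procedure used for the calibration grids is not described here and is not reproduced.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are (nearly) parallel.

    The fixed threshold `eps` is a simplification; it is not scale invariant.
    """
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)

# Hypothetical grid lines: one horizontal (y = 3) and one vertical (x = 1).
row = line_through((0.0, 3.0), (5.0, 3.0))
col = line_through((1.0, 0.0), (1.0, 2.0))
corner = intersect(row, col)
print("corner at", corner)
```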

Augmented Video

In addition to VRML reconstructions, the tracking results were used to augment the original sequences with artificial 3D objects. To achieve this, the estimated camera trajectories were exported to 3DSMax using MaxScript, and the augmented sequences were then generated with 3DSMax's rendering engine, using the original sequence as a background. The initial alignment of the coordinate systems employed by the camera tracker and 3DSMax was achieved interactively, by manually rotating and translating them until they lined up. The placement of the artificial graphical objects in the scene was guided by the structure information that the camera tracker also provides.
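
Aligning the two coordinate systems amounts to applying one rigid transformation to every tracked camera and 3D point. The following is a minimal sketch in Python, not the MaxScript used for the actual export; the alignment rotation and translation, the camera pose and the scene point are hypothetical values. A camera is represented by its world-to-camera rotation R and centre C, so that a world point X has camera coordinates R (X - C).

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rot_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def align_point(A, b, X):
    """Map a tracker-frame point into the renderer frame: X' = A X + b."""
    AX = matvec(A, X)
    return [AX[i] + b[i] for i in range(3)]

def align_camera(A, b, R, C):
    """Re-express a camera (rotation R, centre C) in the renderer frame."""
    return matmul(R, transpose(A)), align_point(A, b, C)

def to_camera(R, C, X):
    """Coordinates of world point X in the camera frame: R (X - C)."""
    return matvec(R, [X[i] - C[i] for i in range(3)])

# Hypothetical alignment, standing in for the operator's manual adjustment.
A, b = rot_z(math.radians(30.0)), [1.0, -2.0, 0.5]
# Hypothetical tracked camera and reconstructed scene point.
R, C = rot_z(math.radians(-10.0)), [0.3, 0.1, -1.0]
X = [0.5, 0.2, 4.0]

R2, C2 = align_camera(A, b, R, C)
before = to_camera(R, C, X)
after = to_camera(R2, C2, align_point(A, b, X))
residual = max(abs(p - q) for p, q in zip(before, after))
print("change in camera-frame coordinates:", residual)
```

Because cameras and points are moved together, every point keeps its coordinates relative to each camera, so the rendered views stay consistent with the original frames.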

A ~16 MB video augmenting (among others) the above sequences is available. Note that the frame dimensions have been reduced to keep the video's file size small.

Contact Address

For any questions, please contact M. Lourakis by e-mail.
