Visual tracking is one of the most important applications of Computer Vision. Several tracking systems have been developed, which either focus mainly on the tracking of targets moving on a plane or attempt to reduce the 3-dimensional tracking problem to the tracking of a set of characteristic points of the target. In complex visual situations, these approaches are seriously handicapped by segmentation and point-correspondence problems.
A mathematical theory for visual tracking of a three-dimensional target moving rigidly in 3-D is presented here, and it is shown how a monocular observer can track an initially foveated object and keep it stationary in the center of the observer's visual field. The aim is to develop correspondence-free tracking schemes and to take advantage of the dynamic segmentation capabilities inherent in the optical flow formalism. Moreover, a general tracking criterion, the Tracking Constraint, is derived, which reduces tracking to an appropriate optimization problem. The connection of our tracking strategies with the Active Vision Paradigm is shown to provide a solution to the Egomotion problem.
In the first part of this work, tracking strategies based on the assumption that the optical flow field is known are examined, and tracking is formulated both as a constrained optimization problem and as a penalized least-squares problem. In the second part, tracking strategies based on the recovery of the 3-D motion of the target are devised, under the assumption that the shape of the target is known. A correspondence-free scheme is derived that relies on global information about the scene (provided by linear features of the image) in order to bypass the ill-posed problem of computing the spatial derivatives of the image intensity function; estimating the 3-D motion of the target then amounts to solving a linear system of equations. An important feature of these tracking strategies is that they do not require continuous segmentation of the image in order to locate the target. Provided that the target is sufficiently textured, dynamic segmentation using temporal derivatives of the linear features provides sufficient information for the tracking phase. Therefore, this approach is expected to perform best where previous ones perform worst, namely in a complex visual environment.
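As a rough illustration of the penalized least-squares formulation mentioned above, the following Python sketch estimates a camera rotation that cancels a known optical flow field around the fovea. It is not the scheme derived in this work. It assumes a pinhole camera with unit focal length, the standard rotational flow equations (sign conventions vary between texts), and a simple quadratic penalty on the rotation. The names `rotational_flow_matrix`, `compensating_rotation`, and the weight `lam` are hypothetical.

```python
import numpy as np


def rotational_flow_matrix(x, y):
    """2x3 matrix mapping a camera rotation (wx, wy, wz) to the image flow
    it induces at image point (x, y). Assumes unit focal length."""
    return np.array([
        [x * y, -(1.0 + x * x), y],
        [1.0 + y * y, -x * y, -x],
    ])


def compensating_rotation(points, flow, lam=1e-3):
    """Penalized least squares (assumed form, for illustration only):
    minimize sum_i ||flow_i + B_i w||^2 + lam * ||w||^2 over w,
    i.e. find the rotation whose induced flow cancels the observed flow."""
    B = np.vstack([rotational_flow_matrix(x, y) for x, y in points])  # (2N, 3)
    b = -np.asarray(flow, dtype=float).reshape(-1)                    # (2N,)
    # Normal equations of the ridge-regularized problem.
    return np.linalg.solve(B.T @ B + lam * np.eye(3), B.T @ b)


# Synthetic check: build a flow field that a known rotation cancels exactly,
# then recover that rotation from the flow samples near the image center.
xs = np.linspace(-0.1, 0.1, 5)
points = [(x, y) for x in xs for y in xs]
w_true = np.array([0.02, -0.01, 0.005])
flow = np.array([-(rotational_flow_matrix(x, y) @ w_true) for x, y in points])
w_hat = compensating_rotation(points, flow, lam=1e-9)
residual = flow + np.array([rotational_flow_matrix(x, y) @ w_hat for x, y in points])
```

With a very small `lam` the recovered rotation is close to the one used to generate the synthetic flow, and the residual flow near the center is close to zero. A larger `lam` trades residual image motion for smaller camera rotations.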
Experimental results for the algorithms presented here demonstrate their robustness in the presence of noise.
D.P. Tsakiris, "Visual Tracking Strategies", University of Maryland, College Park, 1988. (Also available as Institute for Systems Research Technical Report M.S. 88-8, 1988).