We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the 3D structure and appearance of hypothesized instances of a hand model and the actual hand observations. This optimization problem is solved effectively using a variant of Particle Swarm Optimization (PSO). The proposed method requires neither special markers nor a complex image acquisition setup. Being model-based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time (12 Hz).
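To make the optimization view concrete, the following is a minimal sketch of a generic, inertia-weight PSO in plain Python. It is not the specific PSO variant or the GPU implementation used in this work. The objective here is a toy stand-in for the model-to-observation discrepancy: a squared distance to a hypothetical 4-parameter target, whereas the actual method searches the full hand pose space and scores rendered hand hypotheses against Kinect data. All names and parameter values (`pso_minimize`, `toy_discrepancy`, the swarm size, the coefficients) are illustrative assumptions.

```python
import random


def pso_minimize(objective, bounds, num_particles=40, num_generations=100,
                 inertia=0.72, c_personal=1.49, c_global=1.49, seed=0):
    """Generic PSO: returns (best_position, best_value) found within `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial hypotheses (particles) inside the search bounds.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(num_particles)]
    velocities = [[0.0] * dim for _ in range(num_particles)]

    # Each particle remembers its own best position; the swarm shares a global best.
    best_pos = [p[:] for p in positions]
    best_val = [objective(p) for p in positions]
    g = min(range(num_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(num_generations):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clamp it to the allowed parameter range.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)

            val = objective(positions[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, positions[i][:]
                if val < global_val:
                    global_val, global_pos = val, positions[i][:]

    return global_pos, global_val


# Hypothetical "true" parameters standing in for the observed hand pose.
TARGET = [0.3, -0.5, 1.0, 0.2]


def toy_discrepancy(params):
    """Toy objective: squared distance between hypothesis and target parameters."""
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))


estimate, error = pso_minimize(toy_discrepancy, bounds=[(-2.0, 2.0)] * 4)
```

In a tracking setting, one would typically run such a search once per frame and initialize the swarm around the previous frame's solution, which is what makes a small number of particles and generations sufficient.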
In this work we extend our earlier approach (PEHI) for markerless and efficient 26-DOF hand pose recovery (ACCV 2010). PEHI was a generative, multiview method for 3D hand pose recovery. In the new approach, instead of exploiting 2D visual cues obtained by a multicamera setup, we employ 2D and 3D visual cues obtained from a single RGB-D sensor. It turns out that this (a) improves the accuracy of hand tracking, (b) reduces the complexity and the cost of the required camera setup, (c) improves tolerance to variations in lighting conditions, and (d) drastically improves computational performance. An executable demo version of our Kinect 3D hand tracking software can be downloaded here.
Graphical illustration of the proposed method. A Kinect RGB image (a) and the corresponding depth map (b). The hand is segmented (c) by jointly considering skin color and depth. The proposed method fits the employed hand model (d) to this observation, recovering the hand articulation (e).
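The segmentation step (c) can be illustrated with a small sketch that keeps a pixel only if it looks like skin and also lies within a depth band in front of the sensor. This is an assumption-laden illustration, not the segmentation used in this work: the RGB skin rule is a crude, commonly used heuristic, and the function names and the 500 to 1000 mm depth band are hypothetical choices.

```python
def is_skin_rgb(r, g, b):
    """Crude RGB skin heuristic (an assumption, not the authors' skin classifier)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def segment_hand(rgb, depth_mm, near_mm=500, far_mm=1000):
    """Return a boolean mask: True where a pixel is skin-colored AND inside the depth band.

    rgb      -- 2D list of (r, g, b) tuples with 0..255 channel values
    depth_mm -- 2D list of depths in millimetres, same shape as rgb
                (0 is treated as "no reading" and falls outside the band)
    """
    mask = []
    for rgb_row, depth_row in zip(rgb, depth_mm):
        mask.append([is_skin_rgb(*pixel) and near_mm <= depth <= far_mm
                     for pixel, depth in zip(rgb_row, depth_row)])
    return mask


# Tiny 2x2 example: only the top-left pixel is both skin-colored and in range.
rgb_image = [[(200, 140, 120), (200, 140, 120)],
             [(30, 30, 30), (200, 140, 120)]]
depth_image = [[700, 2500],
               [700, 0]]
hand_mask = segment_hand(rgb_image, depth_image)
```

Combining the two cues is what removes skin-colored background objects (rejected by depth) and nearby non-skin objects (rejected by color).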
Other related problems treated in terms of the same framework:
Quantitative evaluation of the performance of the method with respect to (a) the PSO parameters, (b) the distance from the sensor, (c) noise, and (d) viewpoint variation (see paper for details).
A video with sample results on full-DOF tracking of articulated hands based on the Kinect.
The electronic versions of the above publications can be downloaded from my publications page.