An executable demo version of our Kinect 3D hand tracking software can be downloaded here.

Brief description

We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the 3D structure and appearance of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method requires neither special markers nor a complex image acquisition setup. Being model based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time (12 Hz).
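To make the optimization view concrete, the following is a minimal sketch of a textbook PSO loop in plain Python. It is not the PSO variant used in our implementation, and everything in it is an assumption made for illustration: the function names `pso_minimize` and `toy_discrepancy`, the 4-parameter search space, the swarm size, the number of generations and the coefficient values. The toy objective simply measures the squared distance to a fixed "observed" parameter vector, standing in for the discrepancy between a rendered hand hypothesis and the actual Kinect observation.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=32, n_generations=60,
                 inertia=0.7298, c_cognitive=1.49618, c_social=1.49618, seed=0):
    """Minimize `objective` over the box [lower, upper] with a textbook PSO.

    All default values are illustrative assumptions, not the settings of
    the tracker described in the text.
    """
    rng = random.Random(seed)
    dim = len(lower)
    # Each particle is one hypothesized parameter vector.
    positions = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    leader = min(range(n_particles), key=lambda i: personal_score[i])
    global_best = personal_best[leader][:]
    global_score = personal_score[leader]

    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp to the allowed parameter range.
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(max(moved, lower[d]), upper[d])
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


# Hypothetical stand-in for the "observation": a fixed parameter vector.
OBSERVED = [0.3, -0.2, 0.5, 0.1]


def toy_discrepancy(hypothesis):
    """Squared distance between a hypothesis and the toy observation."""
    return sum((h - o) ** 2 for h, o in zip(hypothesis, OBSERVED))


if __name__ == "__main__":
    best, score = pso_minimize(toy_discrepancy, [-1.0] * 4, [1.0] * 4)
    print(best, score)
```

In the actual method the objective is far more expensive than this toy function, since every hypothesis has to be compared against the observed depth and skin maps, which is why the prototype evaluates hypotheses on the GPU.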

In this work we extend our earlier approach (PEHI) for markerless and efficient 26-DOF hand pose recovery (ACCV 2010). PEHI was a generative, multiview method for 3D hand pose recovery. In the new approach, instead of exploiting 2D visual cues obtained by a multicamera setup, we employ 2D and 3D visual cues resulting from a single RGB-D sensor. It turns out that this (a) improves the accuracy of hand tracking, (b) reduces the complexity and the cost of the required camera setup, (c) improves tolerance to variations in lighting conditions and (d) drastically improves computational performance.

Graphical illustration of the method

Graphical illustration of the proposed method. A Kinect RGB image (a) and the corresponding depth map (b). The hand is segmented (c) by jointly considering skin color and depth. The proposed method fits the employed hand model (d) to this observation, recovering the hand articulation (e).
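The segmentation step (c) can be sketched as follows. This is an illustration only, under stated assumptions: the skin test is a simple, widely used RGB threshold heuristic and not the skin color detector of our implementation, and the names `is_skin`, `segment_hand` and `depth_band_mm`, the 250 mm depth band and the convention that a depth of 0 means "no measurement" are all hypothetical choices made for this example.

```python
def is_skin(r, g, b):
    """Crude RGB skin heuristic (a stand-in, not the detector used in the paper)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def segment_hand(rgb, depth_mm, depth_band_mm=250):
    """Return a boolean mask of pixels that are skin colored AND close in depth.

    rgb      -- rows of (r, g, b) tuples with values in 0..255
    depth_mm -- rows of depth values in millimetres, 0 meaning no measurement
    A pixel is kept if it is skin colored, has a valid depth, and lies within
    `depth_band_mm` of the nearest skin-colored pixel (assumed to be the hand).
    """
    height, width = len(depth_mm), len(depth_mm[0])
    skin = [[is_skin(*rgb[y][x]) and depth_mm[y][x] > 0 for x in range(width)]
            for y in range(height)]
    skin_depths = [depth_mm[y][x]
                   for y in range(height) for x in range(width) if skin[y][x]]
    if not skin_depths:
        return [[False] * width for _ in range(height)]
    nearest = min(skin_depths)
    return [[skin[y][x] and depth_mm[y][x] - nearest <= depth_band_mm
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    SKIN, BLUE = (220, 170, 140), (30, 30, 200)
    rgb = [[SKIN, SKIN, BLUE],
           [SKIN, BLUE, SKIN]]
    depth = [[600, 1200, 600],
             [0, 650, 700]]
    for row in segment_hand(rgb, depth):
        print(row)
```

Combining the two cues removes skin-colored regions that are far behind the hand (such as a face in the background) as well as nearby objects that are not skin colored, which neither cue could do on its own.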

Sample results

Quantitative evaluation of the performance of the method with respect to (a) the PSO parameters, (b) the distance from the sensor, (c) noise and (d) viewpoint variation (see paper for details).

A video with sample results on full DOF tracking of articulated hands based on the Kinect.

  • Scalable 3D tracking of multiple interacting objects (CVPR 2014a).
  • Evolutionary quasi-random search for hand articulations tracking (CVPR 2014b).
  • Physically plausible 3D scene tracking: the single actor hypothesis (CVPR 2013).
  • Tracking the articulated motion of two strongly interacting hands (CVPR 2012).
  • Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints (ICCV 2011).


Relevant publications

  • I. Oikonomidis, N. Kyriazis and A.A. Argyros, “Efficient model-based 3D tracking of hand articulations using Kinect”, in Proceedings of the 22nd British Machine Vision Conference, BMVC’2011, University of Dundee, UK, Aug. 29-Sep. 1, 2011.
  • I. Oikonomidis, N. Kyriazis and A.A. Argyros, “Tracking the articulated motion of two strongly interacting hands”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, Rhode Island, USA, June 18-20, 2012.
  • I. Oikonomidis, N. Kyriazis and A.A. Argyros, “Markerless and Efficient 26-DOF Hand Pose Recovery”, in Proceedings of the 10th Asian Conference on Computer Vision, ACCV’2010, Part III, LNCS 6494, pp. 744–757, Queenstown, New Zealand, Nov. 8-12, 2010.
  • I. Oikonomidis, N. Kyriazis and A.A. Argyros, “Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints”, in Proceedings of the 13th IEEE International Conference on Computer Vision, ICCV’2011, Barcelona, Spain, Nov. 6-13, 2011.
  • N. Kyriazis, I. Oikonomidis and A.A. Argyros, “A GPU-powered computational framework for efficient 3D model-based vision”, Technical Report TR420, ICS-FORTH, Jul. 2011.

The electronic versions of the above publications can be downloaded from my publications page.