MocapNET: Ensemble of SNN Encoders for 3D Human Pose Estimation in RGB Images

Brief description

We present MocapNET, an ensemble of SNN encoders that estimates the 3D human body pose from 2D joint estimations extracted from monocular RGB images. MocapNET provides an efficient divide-and-conquer strategy for supervised learning. It outputs skeletal information directly in the BVH format, which can be rendered in real time or imported into most popular 3D animation software without any additional processing. The proposed architecture achieves 3D human pose estimation at state-of-the-art rates of 400 Hz using only CPU processing.
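To illustrate what "skeletal information in the BVH format" looks like, the following is a minimal sketch that serializes per-frame joint values into BVH text. The two-joint skeleton, the joint names `hip` and `chest`, the offsets, and the function `write_minimal_bvh` are hypothetical choices made for illustration; they are not MocapNET's actual skeleton or code.

```python
def write_minimal_bvh(frames, frame_time=1.0 / 30.0):
    """Return BVH text for a toy two-joint skeleton (hypothetical, not MocapNET's).

    Each frame must hold 9 values: 6 channels for the root joint
    (3 positions + 3 rotations) followed by 3 rotations for the child joint.
    """
    channels_per_frame = 9
    lines = [
        "HIERARCHY",
        "ROOT hip",
        "{",
        "  OFFSET 0.0 0.0 0.0",
        "  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation",
        "  JOINT chest",
        "  {",
        "    OFFSET 0.0 10.0 0.0",
        "    CHANNELS 3 Zrotation Xrotation Yrotation",
        "    End Site",
        "    {",
        "      OFFSET 0.0 10.0 0.0",
        "    }",
        "  }",
        "}",
        "MOTION",
        f"Frames: {len(frames)}",
        f"Frame Time: {frame_time:.6f}",
    ]
    for frame in frames:
        if len(frame) != channels_per_frame:
            raise ValueError(f"expected {channels_per_frame} values per frame")
        # One line per frame, values ordered as the CHANNELS declarations above.
        lines.append(" ".join(f"{value:.4f}" for value in frame))
    return "\n".join(lines) + "\n"


# Two frames: a rest pose, then a small rotation of both joints.
bvh_text = write_minimal_bvh([
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 10, 0, 0, 5, 0, 0],
])
```

Because the pose is stored as a hierarchy plus per-frame channel values, text of this shape can be saved with a `.bvh` extension and opened by animation tools that support the format.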

Sample results

Video with description and experimental results

Main web page

Check the GitHub page of Ammar Qammaz.


  • Ammar Qammaz, Antonis A. Argyros
  • We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Quadro P6000 GPU used for the execution of this research. This work was partially supported by EU H2020 projects Mingei (Grant No 822336) and Co4Robots (Grant No 731869).

Relevant publications

  • A. Qammaz, A.A. Argyros, “MocapNET: Ensemble of SNN Encoders for 3D Human Pose Estimation in RGB Images”, British Machine Vision Conference (BMVC 2019), Cardiff, UK, September 2019.

The electronic versions of the above publications can be downloaded from my publications page.