Dissertations
PhD Thesis
Author: M. Sigalas, Supervisor: Professor Panos Trahanias
"Full-body pose tracking under severe occlusions: the Top View Reprojection approach". University of Crete, Department of Computer Science, Heraklion, Greece, June 2015.
Abstract -- Marker-less articulated human body pose recovery and tracking is a challenging
problem of great importance, with strong theoretical and practical implications.
The recent introduction of low-cost depth cameras triggered a number of interesting
new works, pushing forward the state of the art. However, despite the remarkable
progress, estimating the body pose in realistic, complex scenarios is still an open
research task.
In this thesis we propose and develop a markerless model-based method to recover
and track the full body pose, from RGB-D sequences, in arbitrary scenarios where
users can freely enter or leave the scene, move, act and interact with other users or
the environment. Our research focuses mainly on the problem of handling occlusions,
either across body parts belonging to the same user, or across different users. At
the same time, we attempt to tackle additional important issues encountered in the
problem at hand, such as dealing with the large diversity of human bodies or the
unconstrained initialization of tracking.
Towards this goal, we introduce the novel concept of Top View Reprojection
(TVR) of cylindrical objects, which uniquely defines the pose of a cylinder based
on certain quantitative appearance properties of its Top View, i.e. the view aligned
with the cylinder's main axis. Based on this, the problem of estimating the pose
of a cylindrical object becomes that of estimating the corresponding Top View.
Interestingly, the developed formulation of TVR remains unaffected by factors
such as noisy or missing data.
Capitalising on the TVR concept, we represent the human body by a cylinder-based
model, consisting of 11 body parts. The body is uniformly treated within the
TVR framework following a local optimization technique; body parts, represented
as cylinders, are examined in a top-to-bottom sequential order, starting from the
head. For each body part a set of hypotheses is generated and tracked over time
by a Particle Filter (PF). To evaluate each hypothesis, we employ a novel metric
that considers the virtual Top View of the corresponding body part. The latter,
in conjunction with regular depth information, effectively copes with difficult and
ambiguous cases, such as severe inter- and intra-person occlusions.
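
The per-part hypothesis tracking described above can be illustrated with a generic bootstrap particle filter. The sketch below is not the thesis implementation: it assumes a one-dimensional state (for instance a single joint angle), Gaussian motion noise, and a Gaussian likelihood standing in for the Top View metric. The function name and the noise parameters are hypothetical.

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         motion_std=0.05, obs_std=0.1):
    """One predict / update / resample cycle of a bootstrap particle filter."""
    # Predict: perturb every hypothesis with Gaussian motion noise.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Update: reweight each hypothesis by how well it explains the observation.
    # (Assumption: a Gaussian likelihood replaces the thesis's Top View metric.)
    likelihoods = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
                   for p in predicted]
    new_weights = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(new_weights)
    if total == 0.0:
        # All hypotheses are implausible: fall back to uniform weights.
        new_weights = [1.0 / len(predicted)] * len(predicted)
    else:
        new_weights = [w / total for w in new_weights]

    # Estimate: weighted mean of the hypotheses.
    estimate = sum(p * w for p, w in zip(predicted, new_weights))

    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=new_weights, k=len(predicted))
    uniform = [1.0 / len(resampled)] * len(resampled)
    return resampled, uniform, estimate


# Usage: track a constant (synthetic) observation from a broad initial guess.
rng = random.Random(0)
particles = [rng.uniform(-1.0, 1.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for _ in range(10):
    particles, weights, estimate = particle_filter_step(
        particles, weights, observation=0.3, rng=rng)
```

Because many hypotheses are kept alive at once, a few poor observations do not necessarily derail the estimate, which is the property the abstract relies on for ambiguous, occluded cases.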
For evaluation purposes, we conducted several series of experiments addressing
realistic scenarios of gradually increasing difficulty, involving a varying number of users
interacting with each other. We further compared the performance of the proposed
method against that of state-of-the-art approaches using public or self-collected
datasets with ground truth annotation. The presented quantitative and qualitative
results attest to the effectiveness of our approach.
@PHDTHESIS{sigalas2015full, AUTHOR = {Sigalas, M.}, MONTH = jun, YEAR = 2015, TITLE = {Full-body pose tracking under severe occlusions: the Top View Reprojection approach}, ADDRESS = {Heraklion, Greece}, SCHOOL = {University of Crete, Department of Computer Science}, TYPE = {PhD Thesis}, URL = {http://www.didaktorika.gr/eadd/handle/10442/36130?locale=en}, HTYPE = {thesis} }
MSc Thesis
Author: M. Sigalas, Supervisor: Professor Panos Trahanias
"Probabilistic Gesture Recognition". University of Crete, Department of Computer Science, Heraklion, Greece, December 2008.
Abstract -- Communication through gestures is a crucial and common form of interaction in human societies. Gestures not only allow us to interact with other people and objects but, in some cases, substitute for every other form of communication, as they do for deaf people. Computers, meanwhile, have become an inseparable part of our society, influencing many aspects of our daily lives in terms of communication and interaction. The field of informatics has advanced at tremendous speed, mostly in the last few decades, enabling new forms of Human-Computer Interaction (HCI) which fully exploit the dynamics of hand gestures.
In this thesis, a probabilistic approach to Hand Gesture Recognition is proposed. Based on the assumption that various common gestures can be modeled without the need for high-level information, the proposed approach reduces the complexity of the problem by lowering the dimensionality of the parameter space that describes the configuration of the arm.
The methodology for tracking these parameters extracts a robust representation of the arm's pose and yields an efficient spatio-temporal gesture model. Initially, skin-colored blobs are detected in the images. Since the highest detected skin-colored blob is usually the head, the height of the actor is easily calculated, which leads to an estimate of the size of the limbs with the aid of simple anthropometric proportions. Inverse kinematics equations then provide an initial estimate of the arm's parameters, which are tracked over time with particle filters. Because particle filters track multiple hypotheses simultaneously, the tracker can recover from erroneous estimates. To ensure time invariance and prevent discontinuities, the extracted parameters are filtered according to their relevance to previous outputs, resulting in smooth parameter sequences, which are in turn used to model each hand gesture.
The final step, gesture recognition, consists of a set of neural networks, each responsible for recognizing a single gesture. Using multiple neural networks, instead of a single global one, eliminates possible ambiguities due to overlapping gesture paths. Since there is no prior knowledge of which gesture is being performed, the parameter sequences are fed to all neural networks simultaneously. Appropriate supervised training of the networks ensures that only one network at a time produces a high output, resulting in the successful recognition of the performed gesture.
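
The one-network-per-gesture decision rule can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: each trained network is replaced by a single logistic unit, the parameter sequence is reduced to a fixed-length feature vector, and the gesture names, weights, and 0.5 threshold are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GestureScorer:
    """Stand-in for one per-gesture network (assumption: one logistic unit)."""

    def __init__(self, name, weights, bias):
        self.name = name
        self.weights = weights
        self.bias = bias

    def score(self, features):
        # Output in (0, 1); high values mean "this is my gesture".
        activation = sum(w * f for w, f in zip(self.weights, features))
        return sigmoid(activation + self.bias)


def recognise(scorers, features, threshold=0.5):
    """Feed the same features to every scorer and keep the strongest response."""
    scores = {s.name: s.score(features) for s in scorers}
    best = max(scores, key=scores.get)
    # Report no gesture if no scorer responds strongly enough.
    label = best if scores[best] >= threshold else None
    return label, scores


# Usage with two hypothetical gestures and hand-picked weights.
scorers = [
    GestureScorer("wave", weights=[4.0, -4.0], bias=-1.0),
    GestureScorer("point", weights=[-4.0, 4.0], bias=-1.0),
]
label, scores = recognise(scorers, [1.0, 0.0])
no_label, _ = recognise(scorers, [0.0, 0.0])
```

Keeping one scorer per gesture means a new gesture can be added by training one more scorer, without retraining the others.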
@MASTERSTHESIS{msigalas_msc_thesis, AUTHOR = {Sigalas, M.}, MONTH = DEC, YEAR = 2008, TITLE = {Probabilistic Gesture Recognition}, ADDRESS = {Heraklion, Greece}, SCHOOL = {University of Crete, Department of Computer Science}, TYPE = {MSc Thesis}, URL = {http://users.ics.forth.gr/~msigalas/test/dissert/msc_thesis.pdf}, HTYPE = {thesis} }