Victoria Manousaki

Computer Vision & Robotics Lab · Institute of Computer Science, FORTH
Computer Science Department, University of Crete
vmanous@ics.forth.gr

Hi! I'm currently a Ph.D. student at the Computer Science Department of the University of Crete and a research assistant at the Computer Vision & Robotics Lab (CVRL) at ICS-FORTH, advised by Prof. Antonis Argyros. My interests lie in Computer Vision, and more specifically in the prediction of actions and objects in human-object interaction scenarios.


Research

VLMAH: Visual-Linguistic Modeling of Action History for Effective Action Anticipation
Victoria Manousaki, Konstantinos Bacharidis, Konstantinos Papoutsakis, Antonis Argyros
Workshop on Assistive Computer Vision and Robotics (ACVR), ICCV 2023.

Although existing methods for action anticipation have shown considerably improved performance on the predictability of future events in videos, the way they exploit information related to past actions is constrained by time duration and encoding complexity. This paper addresses the task of action anticipation by taking into consideration the history of all executed actions throughout long, procedural activities. A novel approach noted as Visual-Linguistic Modeling of Action History (VLMAH) is proposed that fuses the immediate past in the form of visual features as well as the distant past based on a cost-effective form of linguistic constructs (semantic labels of the nouns, verbs, or actions). Our approach generates accurate near-future action predictions during procedural activities by leveraging information on the long- and short-term past. Extensive experimental evaluation was conducted on three challenging video datasets containing procedural activities, namely the Meccano, the Assembly-101, and the 50Salads. The obtained results validate the importance of incorporating long-term action history for action anticipation and document the significant improvement of the state-of-the-art Top-1 accuracy performance.
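As a rough illustration of the fusion idea (not the paper's actual architecture), the distant past can be summarized as a normalized bag of action labels and concatenated with a visual feature of the immediate past. The vocabulary and the concatenation-based fusion below are hypothetical simplifications:

```python
import numpy as np

# Hypothetical action vocabulary; the real method uses dataset-specific labels.
VOCAB = ["pick", "place", "screw", "align"]

def encode_history(action_labels):
    """Summarize the distant past as a normalized histogram of action labels."""
    hist = np.zeros(len(VOCAB))
    for label in action_labels:
        hist[VOCAB.index(label)] += 1.0
    return hist / max(hist.sum(), 1.0)

def fuse(visual_feature, action_labels):
    """Fuse short-term visual evidence with long-term linguistic history."""
    return np.concatenate([visual_feature, encode_history(action_labels)])
```

A downstream classifier would then predict the next action from the fused vector.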

Partial Alignment of Time Series for Action and Activity Prediction
Victoria Manousaki, Antonis Argyros
To appear in a Springer book, 2023.

The temporal alignment of two complete action/activity sequences has been the focus of interest in many research works. However, the problem of partially aligning an incomplete sequence to a complete one has not been sufficiently explored. Very effective alignment algorithms such as Dynamic Time Warping (DTW) and Soft Dynamic Time Warping (S-DTW) are not capable of handling incomplete sequences. To overcome this limitation the Open-End DTW (OE-DTW) and the Open-Begin-End DTW (OBE-DTW) algorithms were introduced. The OE-DTW has the capability to align sequences with common begin points but unknown ending points, while the OBE-DTW has the ability to align unsegmented sequences. We focus on two new alignment algorithms, namely the Open-End Soft DTW (OE-S-DTW) and the Open-Begin-End Soft DTW (OBE-S-DTW) which combine the partial alignment capabilities of OE-DTW and OBE-DTW with those of Soft DTW (S-DTW). Specifically, these algorithms have the segregational capabilities of DTW combined with the soft-minimum operator of the S-DTW algorithm that results in improved, differentiable alignment in the case of continuous, unsegmented actions/activities. The developed algorithms are well-suited tools for addressing the problem of action prediction. By properly matching and aligning an on-going, incomplete action/activity sequence to prototype, complete ones, we may gain insight in what comes next in the on-going action/activity. The proposed algorithms are evaluated on the MHAD, MHAD101-v/-s, MSR Daily Activities and CAD-120 datasets and are shown to outperform relevant state of the art approaches.
Keywords: Segregational Soft Dynamic Time Warping · Temporal Alignment · Action Prediction · Activity Prediction · Duration Prognosis · Graphs
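A minimal NumPy sketch of the two key ingredients, the soft-minimum operator of S-DTW and an open-end relaxation that lets the alignment stop at any reference frame, is given below. The actual segregational formulation in the chapter differs in detail:

```python
import numpy as np

def softmin(values, gamma):
    """Differentiable soft minimum: -gamma * log(sum(exp(-v / gamma)))."""
    v = np.asarray(values) / -gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))

def oe_soft_dtw(query, reference, gamma=1.0):
    """Align an (incomplete) query against a prefix of a complete reference.

    Standard S-DTW recursion, except that the alignment may end at any
    reference frame (open end): the score is the soft minimum over the
    last row of the accumulated-cost matrix.
    """
    n, m = len(query), len(reference)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (query[i - 1] - reference[j - 1]) ** 2
            R[i, j] = cost + softmin([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    # Open end: free choice of the ending column in the reference.
    return softmin(R[n, 1:], gamma)
```

As gamma approaches zero, the soft minimum approaches the hard minimum and the recursion reduces to (open-end) DTW.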

Graphing the Future: Activity and Next Active Object Prediction using Graph-based Activity Representations
Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros
ISVC - International Symposium on Visual Computing 2022.

We present a novel approach for the visual prediction of human-object interactions in videos. Rather than forecasting the human and object motion or the future hand-object contact points, we aim at predicting (a) the class of the on-going human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that will be involved in the interaction in the near future, as well as the time the interaction will occur. Activities are represented as graphs, and graph matching relies on the efficient Graph Edit Distance (GED) method. The experimental evaluation of the proposed approach was conducted using two well-established video datasets that contain human-object interactions, namely the MSR Daily Activities and the CAD-120. High prediction accuracy was obtained for both action prediction and NAO forecasting.
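The matching step can be sketched with NetworkX's built-in graph edit distance. The graph construction below (object-labelled nodes, fully connected) is a toy simplification of the paper's activity representation:

```python
import networkx as nx

def activity_graph(objects):
    """Toy activity graph: one node per object involved in the interaction,
    labelled with its class and fully connected (a simplification)."""
    g = nx.Graph()
    for i, obj in enumerate(objects):
        g.add_node(i, label=obj)
    g.add_edges_from(
        (i, j) for i in range(len(objects)) for j in range(i + 1, len(objects))
    )
    return g

def predict_activity(ongoing, prototypes):
    """Return the label of the prototype graph closest in graph edit distance."""
    node_match = lambda a, b: a["label"] == b["label"]
    return min(
        prototypes,
        key=lambda name: nx.graph_edit_distance(
            ongoing, prototypes[name], node_match=node_match
        ),
    )
```

Exact GED is expensive on large graphs; activity graphs with a handful of objects keep it tractable.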

Segregational Soft Dynamic Time Warping and its Application to Action Prediction
Victoria Manousaki, Antonis Argyros
VISAPP - International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 2022.

Aligning the execution of complete actions captured in segmented videos has been a problem explored by Dynamic Time Warping (DTW) and Soft Dynamic Time Warping (S-DTW) algorithms. The limitation of these algorithms is that they cannot align unsegmented actions, i.e., actions that appear between other actions. This limitation is mitigated by the use of two existing DTW variants, namely the Open-End DTW (OE-DTW) and the Open-Begin-End DTW (OBE-DTW). OE-DTW is designed for aligning actions of known begin point but unknown end point, while OBE-DTW handles continuous, completely unsegmented actions with unknown begin and end points. In this paper, we combine the merits of S-DTW with those of OE-DTW and OBE-DTW. In that direction, we propose two new DTW variants, the Open-End Soft DTW (OE-S-DTW) and the Open-Begin-End Soft DTW (OBE-S-DTW). The superiority of the proposed algorithms lies in the combination of the soft-minimum operator and the relaxation of the boundary constraints of S-DTW, with the segregational capabilities of OE-DTW and OBE-DTW, resulting in better and differentiable action alignment in the case of continuous, unsegmented videos. We evaluate the proposed algorithms on the task of action prediction on standard datasets such as MHAD, MHAD101-v/-s, MSR Daily Activities and CAD-120. Our experimental results show the superiority of the proposed algorithms to existing video alignment methods.

Action Prediction During Human-Object Interaction Based on DTW and Early Fusion of Human and Object Representations
Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros
ICVS - International Conference on Computer Vision Systems 2021.

Action prediction is defined as the inference of an action label while the action is still ongoing. Such a capability is extremely useful for early response and further action planning. In this paper, we consider the problem of action prediction in scenarios involving humans interacting with objects. We formulate an approach that builds time series representations of the performance of the humans and the objects. Such a representation of an ongoing action is then compared to prototype actions. This is achieved by a Dynamic Time Warping (DTW)-based time series alignment framework which identifies the best match between the ongoing action and the prototype ones. Our approach is evaluated quantitatively on three standard benchmark datasets. Our experimental results reveal the importance of the fusion of human- and object-centered action representations in the accuracy of action prediction. Moreover, we demonstrate that the proposed approach achieves significantly higher action prediction accuracy compared to competitive methods.
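Under strong simplifications (low-dimensional features, classic DTW instead of the paper's exact alignment framework), the early-fusion and prototype-matching steps look roughly as follows; the action labels and feature layout are hypothetical:

```python
import numpy as np

def dtw(a, b):
    """Classic DTW cost between two multi-dimensional time series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((a[i - 1] - b[j - 1]) ** 2)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def early_fuse(human_feats, object_feats):
    """Early fusion: per-frame concatenation of human- and object-centered features."""
    return np.concatenate([human_feats, object_feats], axis=1)

def predict_action(ongoing, prototypes):
    """Label of the prototype whose DTW cost to the ongoing sequence is lowest."""
    return min(prototypes, key=lambda label: dtw(ongoing, prototypes[label]))
```

The ongoing, incomplete sequence is matched against each complete prototype, and the best-aligning prototype supplies the predicted action label.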

Towards a robust and accurate screening tool for dyslexia with data augmentation using GANs
Thomais Asvestopoulou, Victoria Manousaki, Antonis Psistakis, Erjona Nikolli, Vassilios Andreadakis, Ioannis M Aslanides, Yannis Pantazis, Ioannis Smyrnakis, Maria Papadopouli
BIBE - IEEE 19th International Conference on Bioinformatics and Bioengineering 2019.

Eye movements during text reading can provide insights about reading disorders. We developed DysLexML, a screening tool for developmental dyslexia, based on various ML algorithms that analyze gaze points recorded via eye-tracking during silent reading of children. We comparatively evaluated its performance using measurements collected from two systematic field studies with 221 participants in total. This work presents DysLexML and its performance. It identifies the features with prominent predictive power and performs dimensionality reduction. Specifically, it achieves its best performance using linear SVM, with accuracies of 97% and 84% in the two studies, respectively, using a small feature set. We show that DysLexML is also robust in the presence of noise. These encouraging results set the basis for developing screening tools in less controlled, larger-scale environments, with inexpensive eye-trackers, potentially reaching a larger population for early intervention. Unlike other related studies, DysLexML achieves the aforementioned performance by employing only a small number of selected features that have prominent predictive power. Finally, we developed a new data augmentation/substitution technique based on GANs for generating synthetic data similar to the original distributions.

DysLexML: Screening tool for dyslexia using machine learning
Thomais Asvestopoulou, Victoria Manousaki, Antonis Psistakis, Ioannis Smyrnakis, Vassilios Andreadakis, Ioannis M Aslanides, Maria Papadopouli
ArXiv 2019.

Eye movements during text reading can provide insights about reading disorders. Via eye-trackers, we can measure when, where and how eyes move in relation to the words they read. Machine Learning (ML) algorithms can decode this information and provide differential analysis. This work developed DysLexML, a screening tool for developmental dyslexia that applies various ML algorithms to analyze fixation points recorded via eye-tracking during silent reading of children. It comparatively evaluated its performance using measurements collected in a systematic field study with 69 children, native Greek speakers, 32 of whom were diagnosed as dyslexic by the official governmental agency for diagnosing learning and reading difficulties in Greece. We examined a large set of features based on statistical properties of fixations and saccadic movements and identified the ones with prominent predictive power, performing dimensionality reduction. Specifically, DysLexML achieves its best performance using linear SVM, with an accuracy of 97%, with a small feature set, namely saccade length, number of short forward movements, and number of multiply fixated words. Furthermore, we analyzed the impact of noise on the fixation positions and showed that DysLexML is accurate and robust in the presence of noise. These encouraging results set the basis for developing screening tools in less controlled, larger-scale environments, with inexpensive eye-trackers, potentially reaching a larger population for early intervention.

Evaluating Method Design Options for Action Classification based on Bags of Visual Words
Victoria Manousaki, Konstantinos Papoutsakis, Antonis Argyros
VISAPP - International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 2018.

The Bags of Visual Words (BoVWs) framework has been applied successfully to several computer vision tasks. In this work we are particularly interested in its application to the problem of action recognition/classification. The key design decisions for a method that follows the BoVWs framework are (a) the visual features to be employed, (b) the size of the codebook to be used for representing a certain action and (c) the classifier applied to the developed representation to solve the classification task. We perform several experiments to investigate a variety of options regarding all the aforementioned design parameters. We also propose a new feature type and we suggest a method that determines automatically the size of the codebook. The experimental results show that our proposals produce results that are competitive to the outcomes of state of the art methods.
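The design decisions (b) and (c) map naturally onto a short pipeline sketch. The use of k-means for codebook learning and the feature dimensionality below are illustrative choices, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, k):
    """(b) Codebook: cluster local feature descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

def bovw_histogram(codebook, descriptors):
    """Represent one video as a normalized histogram over the visual words;
    (c) a classifier (e.g. an SVM) is then trained on these histograms."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / hist.sum()
```

Choosing the codebook size k trades off discriminative power against overfitting, which is why determining it automatically, as the paper suggests, is attractive.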


Education

Ph.D. - University of Crete

Computer Science Department
Topic: Action prediction & forecasting in human-object interaction scenarios
Supervisor: Prof. Antonis Argyros

2017 - Current

M.Sc. - University of Crete

Computer Science Department
M.Sc. thesis: Evaluating design options of Bag-of-Visual-Words based methods for action classification
Supervisor: Prof. Antonis Argyros

2017

Bachelor of Science - University of Crete

Computer Science Department
Diploma Thesis: Gesture Recognition
Supervisor: Prof. Antonis Argyros
2014

Teaching Assistant - Computer Science Department

During my studies I have been a teaching assistant to the following courses:
CS118 - Discrete Mathematics
CS119 - Linear Algebra
CS280 - Theory of Computation
CS380 - Algorithms and Complexity
2014 - Current

Scholarships

State Scholarships Foundation (IKY)
06/2022 - 09/2023
Hellenic Foundation for Research and Innovation (H.F.R.I.)
10/2019 - 02/2022
Institute of Computer Science - FORTH
2016 - 2019

Contact Info

Email: vmanous@ics.forth.gr
vmanous@csd.uoc.gr

Address: Institute of Computer Science,
Foundation for Research and Technology, Hellas
N. Plastira 100, Vassilika Vouton, GR-700 13 Heraklion, Crete, Greece

Office: 2.16

Phone: +30 2811-39-2532