Diploma Thesis Projects 2020-2021

  1. Java Garbage Collection
    Description: Extend one of the OpenJDK JVM garbage collectors with support for placing objects on memory pages in a programmable way, in order to achieve better temporal and spatial locality. Then benchmark your GC with different kinds of applications.
  2. Artificial Intelligence on Social Network Graphs
    Description: Extend an existing algorithm for graph embeddings with structural features for shapes other than triangles. Explore applications where such features improve existing Machine Learning or Deep Learning models, and measure the improvement.
  3. Graph Partitioner Selector
    Description: Implement a machine learning model to dynamically select among graph partitioning algorithms in the Spark GraphX graph-analytics framework.
  4. LLVM and OpenMP tasks
    Description: Extend LLVM with OpenMP tasks, and link it with the PARTEE task-parallel runtime system. LLVM is a production compiler written in C++. PARTEE is an API and runtime system written in low-level C, targeting distributed DMA-based and shared-memory architectures.
  5. Social Network Graph Analytics: Find blocking and ghosting
    Description: Design and implement an analysis to infer blocking between social network users, using a large dataset from Twitter. Investigate how blocks partition the graph, and discover how information diffusion is limited by users blocking or ghosting other users.
    Further information: Implement an information diffusion analysis on large multilayer graphs, using a distributed analytics runtime system (Spark-GraphX or Flink-Gelly). Use special features of the Twitter API to infer when users have blocked other users. Based on the inferred block relations, study two existing alternative algorithms for information diffusion in social networks, and discover how much, on average, a single block edge limits information diffusion in the full graph.
    Related topics: Graph theory, information diffusion (algorithms used in epidemiology), graph analytics, multilayer graphs, statistics.
  6. Social Network Graph Analytics: Community Detection
    Description: Analyze Twitter traffic on a given topic, such as Type 2 Diabetes, and map the community involved. Use community detection analysis to identify users and groups that repeatedly interact with the topic. Calculate usage patterns and their correlation with demographic and geolocation data. Implement your analysis in a scalable analytics framework (Spark or Flink) and measure its performance and scalability.
  7. Social Network Graph Analytics: Information Diffusion
    Description: Design and implement a plugin for a Twitter crawler that will discover and report in real time the diffusion graphs produced for specific keywords, memes, or phrases.
    Further information: Information diffusion is the analysis of the propagation of information in networks. Retweets are an example of information diffusion: an original piece of information may be propagated to users who were not immediately exposed to the original content in the graph, but were eventually able to see it because of a path of retweets from the original poster to the user.
    The aim of this project is to develop a streaming analysis on the data streams produced by a Twitter crawler, and to produce from them a stream of evolving diffusion graphs for selected memes, hashtags, phrases, URLs, topics, etc.
    Related topics: Graph theory, information diffusion (algorithms used in epidemiology), graph analytics, multilayer graphs, statistics.
  8. Transprecise Analytics in Flink
    Description: Apache Flink is a framework for streaming big-data analytics. Implement a transprecise operator for Flink streams. Candidate operators include logistic/linear regression, graph triangle counting, shortest paths, PageRank, k-means, etc. Study the sensitivity of the results to the precision of the arithmetic representation.
  9. Adaptive Flink Scheduling
    Description: Extend the Flink scheduler to dynamically adapt the code execution of transprecise operators depending on application-specific targets.
  10. Static Analysis in Infer
    Description: Extend the Infer static analysis system with contextual effects.
    Related information: Infer is an open-source tool mainly developed by Facebook that can analyze programs written in C, C++, Objective-C, and Java. It models heaps using propositions in separation logic.
    Related reading:
    • Find the Infer repository on GitHub, read its documentation, install and compile it, and use it to analyze example programs.
    • Read up on separation logic, starting from the Wikipedia page and the seminal paper by Reynolds.
  11. Java static analysis
    Description: Develop a static analysis for Java that computes NUMA locality and object lifetime.
  12. Parallelization of static analyses
    Modularize and parallelize an existing static analysis engine. Factorize the Locksmith pointer analysis engine.
  13. Learn Rust, Parse Rust, Analyze Rust
    Description: Familiarize yourself with existing open-source Rust front-ends, compare and benchmark them, and develop a simple static checker for Rust programs in one of the existing front-ends.
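
As a rough illustration of the retweet-path idea described in project 7 (Information Diffusion), the sketch below groups retweet events into one diffusion graph per original post and then finds every user the post reached. It is a minimal, non-streaming example in plain Python. The event layout `(original_id, from_user, to_user)` and the function names are assumptions made for illustration, not the format of any particular crawler or of the Twitter API.

```python
from collections import defaultdict


def build_diffusion_graphs(retweets):
    """Group retweet events into one edge list per original post.

    Each event is a hypothetical (original_id, from_user, to_user) tuple:
    to_user retweeted the post original_id after seeing it from from_user.
    """
    graphs = defaultdict(list)
    for original_id, from_user, to_user in retweets:
        graphs[original_id].append((from_user, to_user))
    return dict(graphs)


def reached_users(edges, source):
    """Return every user reachable from source along retweet edges."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    seen = {source}
    stack = [source]
    while stack:
        user = stack.pop()
        for nxt in children[user]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


events = [
    ("t1", "alice", "bob"),    # bob retweets alice's post t1
    ("t1", "bob", "carol"),    # carol sees t1 only through bob
    ("t2", "dave", "erin"),
]
graphs = build_diffusion_graphs(events)
print(sorted(reached_users(graphs["t1"], "alice")))
```

Here carol never follows alice directly, yet she is part of the diffusion graph of `t1` because of the retweet path through bob. A streaming version would update the per-post edge lists incrementally as events arrive, instead of building them from a complete list.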

Diploma Thesis Projects 2019-2020

  1. OpenMP distributed memory allocation
    Description: Extend OpenMP with directives for memory allocation patterns tailored to NUMA or distributed memories.
    Related reading: Region-based memory management, the Legion parallel programming language.
  2. Fault-tolerant PARTEE tasks
    Description: Extend the PARTEE runtime system with support for local and global checkpointing and recovery from both transient and permanent errors. Measure the overhead of fault tolerance on task-parallel programs. This project will use experimental hardware and requires physical presence at FORTH.
  3. Social Network Graph Analytics: Community Detection
    Description: Design and implement an analysis that discovers users speaking about a specific topic in social media, and mines the corresponding graph. Analyze the detected community with respect to placement in the general audience.
  4. Streaming Graph Analytics in Flink
    Description: Augment the Graph-Streaming library of Flink with additional streaming algorithm implementations.
  5. Graph Analytics in Flink
    Description: Reimplement an existing graph analytics computation in the Flink distributed analytics runtime system.
    Further information: Read the TwitterMancer paper and implement feature extraction by modifying the Triangle Count algorithm accordingly.
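
As a starting point for the Triangle Count modification mentioned in project 5, the sketch below shows a plain per-vertex triangle count on an undirected graph. It is a single-machine baseline in Python under simple assumptions. It does not use the Flink or Gelly APIs and does not reproduce the features of the TwitterMancer paper. The function name and edge-list input format are chosen for illustration only.

```python
from itertools import combinations


def triangle_counts(edges):
    """Count, for every vertex, the triangles it participates in.

    edges is an iterable of (u, v) pairs of an undirected graph;
    self-loops are ignored and duplicate edges are collapsed.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    counts = {vertex: 0 for vertex in adjacency}
    for vertex, neighbours in adjacency.items():
        # A triangle through `vertex` is a pair of its neighbours
        # that are themselves connected by an edge.
        for a, b in combinations(neighbours, 2):
            if b in adjacency[a]:
                counts[vertex] += 1
    return counts


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
counts = triangle_counts(edges)
print(counts)
print(sum(counts.values()) // 3)  # each triangle is counted once per corner
```

Feature extraction along the lines of the project would replace the simple counter with whatever per-triangle information the chosen features need, such as edge types or directions.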

Diploma Thesis Projects 2018-2019

  1. MPI benchmarking
    Description: Write tests and test automation, port existing benchmarks, and evaluate MPI applications on experimental hardware platforms with DMA-accelerated MPI.
  2. MPI collectives
    Description: Understand the existing implementation of MPI collectives and investigate alternative implementations better tailored to DMA-accelerated communication and very low-latency custom networks in experimental HPC hardware.
  3. Linux Kernel
    Description: Add support for kernel self-extraction during boot for the RISC-V architecture.
  4. Linux Kernel
    Description: Extend the perf tool (implementing all required in-kernel support) with additional counters for experimental HPC architectures.
  5. OpenMP tasks in Eclipse
    Description: Learn the Eclipse C internal representation, learn the workings of an existing static analysis engine, and interface the two.
  6. Twitter Flu Trends
    Calculate the phases of flu epidemics using Twitter data.

Diploma Thesis Projects 2017-2018

  1. Benchmarking Kernel Memory Allocators
    Compare kernel allocation across an off-the-shelf platform based on x86 machines and an experimental ARM-based blade. Develop benchmarks and draw conclusions.
  2. Twitter Graph Analytics for Targeted Marketing
    Parallelize and optimize existing algorithms for distributed analytics. Develop a marketing application for the Twitter graph for the Spark and Spark/GraphX analytics engines.
  3. Graph Analytics for Flink Gelly
    Benchmark Flink Gelly with large social network graphs. Replicate existing analytics pipelines for Twitter in Flink.

Diploma Thesis Projects 2016-2017

  1. Alternative RDD implementations for Spark
    Description: Learn to code in Scala and Spark and write one or more extensions of Spark RDDs optimized for specific algorithms.

Diploma Thesis Projects 2013-2014

  1. LLVM and tasks
    Description: Extend LLVM with a "spawn" keyword that calls a function in parallel, as in Cilk, and link it with the PARTEE task-parallel runtime system.

Diploma Thesis Projects 2012-2013

  1. Task-parallel Fault Tolerance
    Extend the BDDT runtime system with fault tolerance. Define a realistic fault model for permanent and transient faults on existing multicore computers. Add support for local and global checkpointing and recovery from both transient and permanent errors. Measure the overhead of fault tolerance on task-parallel programs.
  2. Runtime Dependencies in Recursively Parallel Programs
    Implement a runtime analysis for dependencies among recursively-parallel tasks, and extend an existing runtime system (e.g., Cilk) with a dependency-aware scheduler.
  3. Static analysis in Eclipse
    Learn the architecture of the Eclipse IDE (for either Java or C programming), including the AST and analysis frameworks, and write an Eclipse interface for an existing static analysis engine.

Internship Topics 2012

  1. A fault-tolerant task parallel runtime
    Positions: 1
    Lab: CARV
    Description: Understand the BDDT runtime and add support for checkpointing computations and for restoring to an earlier checkpoint after a fault.