Adaptation policies in large-scale distributed data stores

Operators of successful Internet-scale applications invariably see the growing popularity of their services translate into a need for more data capacity, higher I/O performance, and continuous operation. Data demands have been growing exponentially in recent years, and distributed data stores have been gaining popularity as a means of addressing these needs.

Rather than relying on centralized storage arrays, distributed data stores consolidate large numbers of commodity servers into a single storage pool, providing large capacity and high performance at low cost over an unreliable and dynamically changing (often cloud-based) infrastructure.

A special class of systems goes further, offering guarantees on application performance goals expressed as Service Level Objectives (SLOs). Such systems focus on identifying the performance capabilities of the available resources and then predicting and provisioning the optimal set of resources necessary to achieve the requested performance goals. Since application requirements (and thus SLOs) change over time, quality of service (QoS)-aware data stores should be able to adapt to new requirements in order to comply with application expectations.
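As a rough illustration of the provisioning step, the sketch below estimates how many nodes are needed to meet a throughput SLO. It assumes that throughput scales linearly with the number of nodes and that a fixed fraction of each node's capacity is held back as headroom; the function name, its parameters, and the linear model are illustrative assumptions, not a description of any particular system.

```python
import math


def nodes_needed(target_ops_s, per_node_ops_s, headroom=0.2):
    """Estimate the node count for a throughput SLO (hypothetical linear model).

    target_ops_s   -- throughput the SLO asks for (operations per second)
    per_node_ops_s -- measured peak throughput of a single node
    headroom       -- fraction of each node's capacity kept in reserve
    """
    if per_node_ops_s <= 0:
        raise ValueError("per-node throughput must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    # Assumption: only the non-reserved share of a node serves the workload.
    usable_ops_s = per_node_ops_s * (1.0 - headroom)
    return math.ceil(target_ops_s / usable_ops_s)
```

A real system would replace the linear model with measured scaling behavior, since replication, coordination, and data skew usually make throughput grow more slowly than the node count.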

A key question is how to determine the appropriate adaptation actions, namely those that preserve the application semantics (regarding durability, availability, integrity, and consistency of data) while minimizing, or ideally hiding, the window of performance degradation. There is usually a trade-off between the duration of an adaptation action and its performance impact on the system while it runs. Ideally, the system should adapt to data growth or workload changes as quickly as possible in order to satisfy application requirements. However, the more aggressive the adaptation action, the deeper its impact on application performance. Important research questions therefore remain open on agile provisioning and adaptation policies for large-scale distributed data stores that aim to achieve SLOs, or to offer the best possible performance, while ensuring the durability, availability, and consistency of data.
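The trade-off between adaptation speed and performance impact can be made concrete with a simple model. The sketch below assumes that an adaptation action (for example, migrating data to a newly added node) takes a share of the system's I/O capacity and that application throughput drops in proportion to that share. Under this assumption it picks the most aggressive share that still leaves the SLO throughput to the application. All names and the linear interference model are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class MigrationPlan:
    bandwidth_share: float   # fraction of I/O capacity given to the migration
    duration_s: float        # time needed to move all the data
    foreground_ops_s: float  # throughput left for the application


def plan_migration(data_gb, capacity_gb_s, peak_ops_s, slo_ops_s):
    """Pick the fastest migration that still meets the throughput SLO.

    Assumption: giving the migration a share f of the I/O capacity reduces
    application throughput to (1 - f) * peak_ops_s.
    """
    if slo_ops_s >= peak_ops_s:
        raise ValueError("no capacity is left for migration under this SLO")
    # Largest share that keeps foreground throughput at or above the SLO.
    share = 1.0 - slo_ops_s / peak_ops_s
    duration_s = data_gb / (share * capacity_gb_s)
    return MigrationPlan(share, duration_s, peak_ops_s * (1.0 - share))


# Moving 100 GB over 1 GB/s of capacity: a looser SLO allows a faster move.
strict = plan_migration(100, 1.0, peak_ops_s=10000, slo_ops_s=7500)
loose = plan_migration(100, 1.0, peak_ops_s=10000, slo_ops_s=5000)
print(strict.duration_s, loose.duration_s)
```

In this model, tightening the SLO from 5000 to 7500 operations per second halves the share available to the migration and doubles its duration. Real systems interfere less predictably, which is part of what makes the choice of adaptation policy a research question.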

The goal of the project is to address the important research questions on agile provisioning and adaptation policies for large-scale distributed data stores that aim to achieve SLOs or to offer the best possible performance. This requires a platform that can provide resources and support the deployment of a distributed data store. During an experiment, multiple nodes should be able to join (or leave) the system at runtime, giving the data store its elastic and adaptive behavior. Throughout the experiment, we should be able to monitor both application performance and the utilization of the available resources as the application evolves over its lifetime.