Master 2013-2014
Internships of the SAR specialty
Large-scale geo-replicated hybrid consistency transactional storage


Site: Master’s internship: Large-scale geo-replicated hybrid consistency transactional storage
Location: LIP6, Jussieu, Paris
Supervisors: Marek Zawirski, Marc Shapiro, Masoud Saeida Ardekani
Dates: from 1/4/2014 to 31/8/2014
Remuneration: yes
Keywords: SAR track other than ATIAM, research

Description

Modern data storage engines such as key-value stores (KVSes) are deployed across machines within a data centre (sharding and replication), and across several data centres in different geographical locations (geo-replication). To improve latency and availability, these systems exploit topology-aware designs, in which fast and relatively reliable communication over the LAN is preferred to slow and more failure-prone communication over the WAN. Moreover, recent scalable designs achieve parallelism by minimizing cross-shard communication and by using parallel streams of communication across data centres.

Fundamentally, every such system faces a trade-off between low latency of updates (and availability) and consistency of reads. Low-latency systems can offer highly available transactions [1], but these are restricted to variants of causal+ consistency [4, 5]. Strongly consistent general-purpose transactions are achievable only at the expense of higher latency for some transactions [2, 6]. Recently, an interesting class of hybrid consistency storage systems has been emerging, in which transactions can access both weakly and strongly consistent objects [7], making them fast when possible and consistent when necessary [3]. However, existing hybrid consistency systems do not support processing and communication parallelism, which makes them inapplicable to large-scale deployments.

The goal of this project is to design and implement a large-scale geo-replicated hybrid consistency transactional system. Our existing experience with large-scale strong and weak transactional system designs makes us believe that it may involve a complete redesign of both algorithms and metadata compared to existing small-scale designs. Challenges include maintaining consistency for transactions that access both weakly consistent and strongly consistent objects, which typically involves complex and not necessarily compatible causal dependency and multi-versioning techniques, such as dependency tracking and version vectors.
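As background for candidates unfamiliar with version vectors, the following is a minimal Python sketch of the standard technique: one counter per replica, with an entry-wise comparison to decide whether two versions are causally ordered or concurrent. It is an illustration only and not the design targeted by the project; the function names and the replica identifiers "paris" and "tokyo" are hypothetical.

```python
from typing import Dict

# A version vector maps a replica identifier to the number of updates
# from that replica that are reflected in a given object version.
VersionVector = Dict[str, int]


def increment(vv: VersionVector, replica: str) -> VersionVector:
    """Return a copy of vv recording one more local update at `replica`."""
    new = dict(vv)
    new[replica] = new.get(replica, 0) + 1
    return new


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Entry-wise maximum: the smallest vector that covers both a and b."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def leq(a: VersionVector, b: VersionVector) -> bool:
    """True if every entry of a is <= the matching entry of b (missing = 0)."""
    return all(a.get(r, 0) <= b.get(r, 0) for r in a.keys() | b.keys())


def happened_before(a: VersionVector, b: VersionVector) -> bool:
    """True if a causally precedes b: a <= b entry-wise and they differ."""
    return leq(a, b) and not leq(b, a)


def concurrent(a: VersionVector, b: VersionVector) -> bool:
    """True if neither version causally precedes or equals the other."""
    return not leq(a, b) and not leq(b, a)


# Hypothetical two-replica example.
v1 = increment({}, "paris")   # first update, made at "paris"
v2 = increment(v1, "tokyo")   # "tokyo" updates after seeing v1
v3 = increment(v1, "paris")   # "paris" updates again without seeing v2
```

In this example, v1 causally precedes both v2 and v3, while v2 and v3 are concurrent and a replica that receives both would need to keep or reconcile the two versions. Metadata of this kind grows with the number of replicas, which is one reason large-scale designs may need different techniques.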

The research will consist of three phases:
- State-of-the-art survey (approx. one month).
- Designing a large-scale geo-replicated KVS with weak and strong transactions (approx. one month, interleaved with the other phases).
- Implementing and evaluating the solution, revisiting the design (4 months).

Requirements:
- Enrolled in a Master’s in Computer Science / Informatics or a related field.
- Excellent academic record.
- Motivated by experimental research.
- Skills in algorithms, data structures, concurrency, distributed computing, high-performance computing, operating systems, distributed systems, databases, or a related area.

Please provide a CV, a list of courses and your marks, an essay relevant to the topic (1 to 4 pages), and at least two references (whom we will contact ourselves for a recommendation).

Contact: Marek Zawirski or Marc Shapiro.

Bibliography

[1] Peter Bailis, Aaron Davidson, et al. Highly Available Transactions: virtues and limitations. In Int. Conf. on Very Large Data Bases (VLDB), 2013.

[2] James C. Corbett, Jeffrey Dean, et al. Spanner: Google’s globally-distributed database. In Symp. on Op. Sys. Design and Implementation (OSDI), October 2012.

[3] Cheng Li, Daniel Porto, et al. Making geo-replicated systems fast as possible, consistent when necessary. In Symp. on Op. Sys. Design and Implementation (OSDI), October 2012.

[4] Wyatt Lloyd, Michael J. Freedman, et al. Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS. In Symp. on Op. Sys. Principles (SOSP), Cascais, Portugal, October 2011.

[5] Wyatt Lloyd, Michael J. Freedman, et al. Stronger semantics for low-latency geo-replicated storage. In Networked Sys. Design and Implem. (NSDI), Lombard, IL, USA, April 2013.

[6] Masoud Saeida Ardekani, Pierre Sutra, and Marc Shapiro. Non-Monotonic Snapshot Isolation: scalable and strong consistency for geo-replicated transactional systems. In Symp. on Reliable Dist. Sys. (SRDS), Braga, Portugal, October 2013.

[7] Yair Sovran, Russell Power, et al. Transactional storage for geo-replicated systems. In Symp. on Op. Sys. Principles (SOSP), pages 385-400, Cascais, Portugal, October 2011.