Master 2015 2016
Stages de la spécialité SAR
Efficient implementation of high-level objects in a geo-replicated database


Site : Efficient implementation of high-level objects in a geo-replicated database
Lieu : LIP6
Encadrant : Marc Shapiro, Tyler Crain
Dates :1/2/2016 au 31/7/2016
Rémunération :Oui
Mots-clés : Master SAR, autre qu’ATIAM

Description

Traditional databases ensure that updates appear one at a time in some serial order. In contrast, modern geo-replicated databases accept updates at different replicas in parallel, in order to ensure that the data is always quickly available. The principles of Conflict-Free Replicated Data Types (CRDTs) [Shapiro] ensure that data will converge despite the concurrent updates.

When concurrent updates are allowed, there is no more a linear order of updates and of their corresponding object versions, since two concurrent updates do not depend on one another. Tracking this requires more complex structures, such as multiple operation logs, snapshots, and vector clocks. In order to serve a particular read or update, the system must compute an appropriate version. This may require searching through multiple previously-generated snapshots, as well as through buffers of recent updates. Once the version is generated, it may be stored as a snapshot (or discarded, depending on the system’s policy) and later updates buffered. On top of this inherent complexity, the application often requires data types with a rich structure, such as sets, graphs, or maps, for which generating a version requires significant computation and memory management. Breaking down a large object into smaller pieces that can be updated independently again adds more complexity. Finally, implementing these objects is further complicated when data is spread across both main memory and disk.

The subject of this internship is to study the above issues and to propose efficient solutions. After studying the state of the art of database-oriented memory/disk data structures (e.g., B-Trees or LSM-Trees) and concurrent data structures, the intern will focus on two main tasks. The first is to design a memory/disk organisation that provide efficient reading and updating of objects in these systems. The second how to parallelise reads and updates to large objects where an operation concerns only small section of the object.

Requirements : - Enrolled in a Masters’ in Computer Science / Informatics or a related field. - An excellent academic record. - Be strongly interested in, and have good knowledge of, distributed systems, distributed algorithms, or distributed databases, concurrent algorithms, or memory/disk data structures. - Motivated by experimental research.

Applying

The internship is fully funded. It will be co-advised by Dr. Marc Shapiro and Dr Tyler Crain, Inria and Laboratoire d’Informatique de Paris-6 (LIP6), Université Pierre et Marie Curie. A successful intern will be invited to apply for a PhD.

To apply, contact Tyler Crain with the following information : - A resume or Curriculum Vitæ - A list of courses and grades of the last two years of study (an informal transcript is OK). - Names and contact details of two references (people who can recommend you). We will contact these people directly

Bibliographie

[Shapiro] Conflict-free Replicated Data Types. M. Shapiro, N. Preguiça, C. Baquero, M. Zawirski. Int. Symp. on Stabilization, Safety, and Security of Distributed Systems (SSS), Grenoble, France, October 2011.