Master 2014-2015
Internships of the SAR specialization
A distributed log for scalable big-data storage


Site: Regal team, LIP6-Inria
Location: LIP6 / 4, place Jussieu / 75005 Paris
Supervisors: Alejandro Tomsic, Marc Shapiro
Dates: 1/3/2015 to 31/8/2015 (negotiable)
Paid: yes
Keywords: Master SAR, other than ATIAM

Description

A well-known technique from databases and file systems for supporting fast writes and recovering from failures is to record all update activity in a log. A log is an append-only, write-mostly file, organised as an array of update records; since every write is an append, writes are fast. If all activity is logged to disk, the state can be recovered and rebuilt after a failure by replaying the log.
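
As an illustration of the idea, here is a minimal in-memory sketch of an append-only log of update records, with recovery done by replaying the records in order. The names `UpdateRecord`, `AppendOnlyLog` and `recover` are hypothetical and chosen for this example only; a real log would also persist the records to stable storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class UpdateRecord:
    """One logged update: set `key` to `value` (hypothetical record format)."""
    key: str
    value: Any


@dataclass
class AppendOnlyLog:
    """An array of update records that only ever grows at the end."""
    records: List[UpdateRecord] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        # Writes never modify earlier records; they only add a new one.
        self.records.append(UpdateRecord(key, value))
        return len(self.records) - 1  # position of the new record


def recover(log: AppendOnlyLog) -> Dict[str, Any]:
    """Rebuild the current state by replaying every record in log order."""
    state: Dict[str, Any] = {}
    for record in log.records:
        state[record.key] = record.value
    return state


log = AppendOnlyLog()
log.append("x", 1)
log.append("y", 2)
log.append("x", 3)
state = recover(log)  # later records for the same key win
```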

Another well-known database technique is to partition the storage, i.e., to logically divide it into disjoint parts (also known as shards). This improves scalability, manageability, performance, availability and distribution. Shards are kept independent. For fault tolerance, a shard is replicated over multiple nodes.
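
One common way to partition data, shown here only as a sketch, is to hash each key to a shard and to place each shard on several nodes. Hash partitioning and the round-robin placement below are assumptions made for this example, not a scheme prescribed by the internship; `shard_of` and `replicas_of` are hypothetical names.

```python
import hashlib
from typing import List


def shard_of(key: str, num_shards: int) -> int:
    """Map a key to exactly one shard, using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_of(shard: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Place a shard on `replication_factor` consecutive nodes (assumed policy)."""
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["n0", "n1", "n2", "n3"]
shard = shard_of("user:42", num_shards=4)
owners = replicas_of(shard, nodes, replication_factor=3)
```

Because every key maps to exactly one shard, the shards are disjoint and can be handled independently, while the placement function gives each shard several replicas for fault tolerance.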

The subject of this internship is to explore the design of a distributed, partitioned and replicated log. To support both performance and fault tolerance, log records are replicated among different servers (instead of being written to disk). Each replica of a shard must remain totally ordered, in order to provide consistent reads.
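
To make the ordering requirement concrete, the sketch below assumes one possible approach: a single primary per shard assigns consecutive sequence numbers, and each replica appends records strictly in sequence order, buffering any that arrive early. This is an assumption for illustration, not the design the internship will produce; `ShardPrimary` and `ShardReplica` are hypothetical names, and the network is replaced by direct method calls.

```python
from typing import Any, Dict, List


class ShardReplica:
    """Holds one copy of a shard's log, kept in sequence-number order."""

    def __init__(self) -> None:
        self.log: List[Any] = []
        self.pending: Dict[int, Any] = {}  # records that arrived too early

    def deliver(self, seq: int, record: Any) -> None:
        if seq < len(self.log):
            return  # duplicate of a record that is already in the log
        self.pending[seq] = record
        # Append every record whose predecessors are all present.
        while len(self.log) in self.pending:
            self.log.append(self.pending.pop(len(self.log)))


class ShardPrimary:
    """Assigns a total order to the writes of one shard (assumed design)."""

    def __init__(self, replicas: List[ShardReplica]) -> None:
        self.replicas = replicas
        self.next_seq = 0

    def write(self, record: Any) -> int:
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            replica.deliver(seq, record)
        return seq


r1, r2 = ShardReplica(), ShardReplica()
primary = ShardPrimary([r1, r2])
primary.write("put a 1")
primary.write("put b 2")
# Both replicas now hold the same records in the same order.
```

Since all replicas hold the same sequence, a read served by any of them sees a prefix of the same history.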

Big-data applications manipulate data structures such as maps, trees or general graphs. The aim of the log is to provide fast, persistent writes to such structures by deferring the actual application of each operation to later, asynchronous processing. This makes it possible to maintain different views and versions of the database efficiently, which is important for transaction processing. Upper layers replay the log and build their state on demand, off the write path.
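
The following sketch illustrates how an upper layer might build a map from logged operations, either at a chosen version or lazily when a read arrives. The operation format `(kind, key, value)` and the names `apply_op`, `view_at` and `LazyView` are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple

Op = Tuple[str, str, Any]  # (kind, key, value); kind is "put" or "delete"


def apply_op(state: Dict[str, Any], op: Op) -> None:
    """Apply one logged operation to a map."""
    kind, key, value = op
    if kind == "put":
        state[key] = value
    elif kind == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown operation kind: {kind}")


def view_at(log: List[Op], version: int) -> Dict[str, Any]:
    """Materialise the map as it was after the first `version` operations."""
    state: Dict[str, Any] = {}
    for op in log[:version]:
        apply_op(state, op)
    return state


class LazyView:
    """Applies logged operations only when a read needs them."""

    def __init__(self, log: List[Op]) -> None:
        self.log = log
        self.state: Dict[str, Any] = {}
        self.applied = 0  # number of log entries already applied

    def read(self, key: str) -> Any:
        # Catch up with the log on the read path, not on the write path.
        while self.applied < len(self.log):
            apply_op(self.state, self.log[self.applied])
            self.applied += 1
        return self.state.get(key)


log_ops: List[Op] = [("put", "a", 1), ("put", "b", 2), ("delete", "a", None)]
old_version = view_at(log_ops, 2)
new_version = view_at(log_ops, 3)
view = LazyView(log_ops)
```

Here a write is just an append to `log_ops`; the cost of applying it is paid later, by whichever view needs it.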

The work of the intern will be to design and implement this log, studying:
- A replication scheme supporting high performance.
- Mapping read and write operations to replicas.
- Comparing to existing solutions, such as disk- or SSD-based logs.

The intern’s findings shall be written up for publication in the scientific literature.

Requirements.
- Enrolled in a Master's in Computer Science / Informatics or a related field.
- An excellent academic record.
- A strong interest in, and good knowledge of, big-data applications, distributed systems, distributed algorithms, or distributed databases.
- Motivated by experimental research.
Knowledge of the Erlang programming language is a plus.

Applying

The internship is fully funded and will take place in the Regal group, at Laboratoire d’Informatique de Paris-6 (LIP6), in Paris. A successful intern will be invited to apply for a PhD.

To apply, contact Alejandro Tomsic, with the following information:
- A resume or Curriculum Vitæ.
- A list of courses and grades of the last two years of study (an informal transcript is OK).
- Names and contact details of two references, people who can recommend you. We will contact these people directly.