Master 2017-2018
Internships of the SAR specialisation
Shared, persistent and mutable big-data structures

Site: Regal team
Location: Regal team, UPMC-LIP6, Jussieu
Supervisors: Gaël Thomas, Marc Shapiro
Dates: open
Paid: yes
Keywords: Master SAR, other than ATIAM



In current computer systems, sharing or persisting data is the job of the file system. However, the file system interface is narrow and slow. This penalises applications with a large memory footprint, such as big-data analytics. Consider, for instance, a map-reduce computation. The map processes produce some large data structure, e.g., a social-network graph, which the reduce processes will consume. Currently, the only practical approach is for the mappers to serialise (marshal) the graph and write it to a file, and for the reducers to read the file and deserialise (unmarshal) the graph. This repeated serialise-output-input-deserialise sequence is extremely costly.

However, a recent and exciting development in computer architecture is the advent of very large (many-gigabyte) main memories, including persistent memories. This has the potential to make traditional file systems obsolete, since data can now be shared and made durable directly in main memory. Returning to the example, the graph could instead be shared and made persistent directly in main memory.

A related problem is that of lazily mutating or copying a large pointer-based data structure in memory. The basic techniques (e.g., mmap or copy-on-write) are well known, but they are too difficult or impractical for application programmers to use: possible but difficult in low-level languages such as C or C++, and practically impossible in managed languages such as Java or Scala.

Therefore, the aim of the internship is to explore how to enable direct sharing between processes, and/or lazy copying/mutation, of a rich pointer-based data structure. This involves two related sub-problems: how to implement these techniques efficiently inside the execution environment, and how to expose them to the application programmer in a safe and simple fashion. The intern will study the state of the art and perform experiments.
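As a small illustration of the OS-level copy-on-write mechanism mentioned above, the following Python sketch maps a file with `mmap.ACCESS_COPY` (a private mapping on Unix). Writes through the mapping change only the process's own in-memory pages, which are copied lazily on first write, while the file itself stays untouched. The helper name `cow_demo` is hypothetical and chosen for illustration only. Note that the mechanism operates on raw bytes and pages, not on objects: giving those bytes a rich pointer-based structure is left entirely to the program, which hints at why such techniques are hard to use directly from a managed language.

```python
import mmap
import os
import tempfile
from typing import Tuple


def cow_demo(initial: bytes, patch: bytes) -> Tuple[bytes, bytes]:
    """Map a file copy-on-write, overwrite its first bytes in memory,
    and return (contents seen through the mapping, contents on disk).

    Hypothetical helper for illustration; `initial` must be non-empty
    and at least as long as `patch`.
    """
    fd, path = tempfile.mkstemp()
    try:
        # Write the original data to a temporary file.
        with os.fdopen(fd, "wb") as f:
            f.write(initial)
        with open(path, "r+b") as f:
            # ACCESS_COPY: pages are shared with the file until written;
            # a write triggers a private copy of the touched page only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[: len(patch)] = patch  # modifies the private copy only
                view = m[:]              # snapshot of the mapped bytes
        # Re-read the file: the copy-on-write mapping never wrote back.
        with open(path, "rb") as f:
            on_disk = f.read()
        return view, on_disk
    finally:
        os.remove(path)
```

For example, `cow_demo(b"hello world", b"HELLO")` returns `(b"HELLO world", b"hello world")`: the mapping sees the modification, the file does not.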
The intern will build a proof-of-concept prototype, initially under simplifying assumptions (e.g., no concurrent writes, no garbage collection, no JIT compilation), which will be relaxed step by step. The findings are intended for publication in the scientific literature.

Requirements:

Enrolled in a Master's programme in Computer Science/Informatics or a related field. An excellent academic record. A strong interest in, and good knowledge of, memory-management techniques (for operating systems or managed runtime systems) and concurrent algorithms. Motivated by experimental research.

Applying:

The internship is fully funded. It will be co-advised by Dr. Marc Shapiro (Inria and Laboratoire d'Informatique de Paris-6 (LIP6), Université Pierre et Marie Curie) and by Prof. Gaël Thomas (Télécom SudParis). A successful intern will be invited to apply for a PhD. To apply, contact Marc Shapiro and Gaël Thomas with the following information: (1) a résumé or Curriculum Vitæ; (2) a list of courses and grades for the last two years of study (an informal transcript is OK); (3) the names and contact details of two references (people who can recommend you), whom we will contact directly.