Master 2015 2016
Internships of the SAR specialisation
Shared, persistent and mutable big-data structures

Site: Shared, persistent and mutable big-data structures
Location: LIP6
Supervisors: Marc Shapiro, Gaël Thomas
Dates: 1 February 2016 – 31 July 2016
Paid: yes
Keywords: Master SAR, other than ATIAM


In current computer systems, sharing or persisting data is the job of the file system. However, the file system interface is narrow and slow. This penalises applications with a large memory footprint, such as big-data analytics. Consider for instance a map-reduce computation. The map processes produce some large data structure, e.g., a social-network graph, which the reduce processes will consume. Currently, the only practical approach is for the mappers to serialise (marshal) the graph and write it into a file, and for the reducers to read the file and deserialise (unmarshal) the graph. This repeated serialise-output-input-deserialise sequence is extremely costly.
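As an illustration of the sequence described above, here is a minimal Python sketch. It is not taken from any particular map-reduce framework: the function names `mapper_write`, `reducer_read` and `round_trip`, the use of `pickle`, and the toy graph are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def mapper_write(graph, path):
    """Mapper side: serialise the whole graph and write it to a file."""
    with open(path, "wb") as f:
        pickle.dump(graph, f)


def reducer_read(path):
    """Reducer side: read the file and rebuild the graph object by object."""
    with open(path, "rb") as f:
        return pickle.load(f)


def round_trip(graph):
    """Run the full serialise-output-input-deserialise sequence once."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)
    try:
        mapper_write(graph, path)
        return reducer_read(path)
    finally:
        os.remove(path)


# Toy social-network graph as an adjacency list (hypothetical data).
graph = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
received = round_trip(graph)
```

The reducer ends up with an equal but entirely separate copy of the graph: every node is traversed once to serialise it and allocated again to deserialise it, which is where the cost comes from on large structures.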

However, a recent and exciting development in computer architecture is the advent of very large (gigabytes) main memories, including persistent memories. This has the potential to make traditional file systems obsolete, since sharing data and making it durable can now be done directly in main memory. Returning to the example, the graph can be shared and made persistent directly in main memory instead. A related problem is that of lazily mutating or copying a large pointer-based data structure in memory.

The basic techniques are well known (e.g., mmap or copy-on-write), but they are too difficult or impractical for application programmers: possible but difficult in low-level languages such as C or C++, and practically impossible in managed languages such as Java or Scala.
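To make the two techniques concrete, the following sketch uses Python's standard `mmap` module on a flat array of integers. The helper names `create_backing_file` and `demo` and the sample values are assumptions for illustration. `ACCESS_WRITE` gives a shared mapping whose writes reach the file, while `ACCESS_COPY` gives a copy-on-write mapping whose writes stay private.

```python
import mmap
import os
import struct
import tempfile


def create_backing_file(values):
    """Write 64-bit integers to a file that serves as the shared region."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(struct.pack(f"<{len(values)}q", *values))
    return path


def demo(path):
    """Map the same file twice: once shared, once copy-on-write."""
    with open(path, "r+b") as f:
        shared = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
        private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        try:
            # Write through the copy-on-write mapping: stays private.
            struct.pack_into("<q", private, 0, 99)
            # Write through the shared mapping: reaches the file.
            struct.pack_into("<q", shared, 8, 42)
            shared.flush()
            private_first = struct.unpack_from("<q", private, 0)[0]
            shared_first = struct.unpack_from("<q", shared, 0)[0]
            shared_second = struct.unpack_from("<q", shared, 8)[0]
        finally:
            private.close()
            shared.close()
    with open(path, "rb") as f:
        on_disk = struct.unpack("<3q", f.read())
    return private_first, shared_first, shared_second, on_disk


path = create_backing_file([1, 2, 3])
try:
    result = demo(path)
finally:
    os.remove(path)
```

This works easily only because the data is a flat array of fixed-size integers. A graph of ordinary language-level objects cannot be placed in such a mapping directly, which is the difficulty the paragraph above points to.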

Therefore, the aim of the internship is to explore how to enable direct sharing between processes, and/or lazy copying/mutation, of a rich pointer-based data structure. This consists of two related sub-problems: how to implement these techniques efficiently inside the execution environment, and how to expose them to the application programmer in a safe and simple fashion.
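One commonly used way to make a pointer-based structure shareable is to store links as offsets relative to the start of the region rather than as absolute addresses. The sketch below illustrates that idea only; it is not the approach prescribed by the internship. The node layout, the names `NODE`, `NIL`, `build_list` and `iterate`, and the use of a `bytearray` as a stand-in for a shared memory region are all assumptions.

```python
import struct

# Each node holds (value, offset of the next node); NIL ends the list.
NODE = struct.Struct("<qq")
NIL = -1


def build_list(buf, values):
    """Lay out a singly linked list in buf; return the offset of the head."""
    head = NIL
    # Build back to front so each node can record its successor's offset.
    for i in reversed(range(len(values))):
        offset = i * NODE.size
        NODE.pack_into(buf, offset, values[i], head)
        head = offset
    return head


def iterate(buf, head):
    """Follow the 'next' offsets and yield the values stored in buf."""
    offset = head
    while offset != NIL:
        value, offset = NODE.unpack_from(buf, offset)
        yield value


values = [10, 20, 30]
region = bytearray(NODE.size * len(values))
head = build_list(region, values)

# The same bytes placed at a different address remain a valid list,
# because the links are offsets and not absolute pointers.
relocated = bytes(region)
```

Here `list(iterate(relocated, head))` yields the same values as `list(iterate(region, head))`, so a second process mapping the region at another address could traverse the list without any deserialisation step. The explicit packing and unpacking also hints at the second sub-problem: such code is neither safe nor simple for an application programmer to write by hand.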

The intern shall study the state of the art and perform experiments. The intern will build a proof-of-concept prototype, initially making simplifying assumptions, e.g., no concurrent writes, no garbage collection, no JIT, which shall be relaxed little by little. The findings are to be published in the scientific literature.

To apply, see the full advertisement: