Master 2017-2018
Internships for the SAR specialization
A conflict free JSON data type, replicated to the EDGE


Location: LIP6, 4 place Jussieu, Paris 5e
Supervisors: Ilyas Toumlilt, Dimitris Vasilas, Marc Shapiro
Dates: 6 months starting 19/02/2018
Remuneration: standard (approximately 550 euros per month)
Keywords: Master SAR, other than ATIAM



Description

Background: The document-based data model is a popular way of storing data, particularly for applications that handle semi-structured data, or unstructured data accompanied by metadata, such as social media posts and multimedia. Document stores, including MongoDB and Apache CouchDB, store data as JSON-like objects whose sets of fields can vary from one object to the next, with a different type for each field. This model is attractive because it makes application code easier to write and lets applications change their database schema as demands evolve. Beyond the simple key-to-document lookup interface, a core feature of document stores is a query language that allows users to retrieve documents by their semi-structured content. To make this kind of search efficient and scalable, document stores maintain secondary indexes on document attributes.

AntidoteDB is a cloud database developed by the Delys team at LIP6. It provides geo-replication and guarantees both high availability and a high level of consistency. AntidoteDB supports replicated data types, such as counters, sets and maps, that are designed to work correctly in the presence of concurrent updates and network failures. Implementing a document store API backed by AntidoteDB would allow a geo-distributed deployment in which users update their documents concurrently in multiple data centres with low latency and high availability. More importantly, Antidote's replicated data types give users a flexible mechanism to explicitly control how conflicts caused by concurrent updates are resolved, by choosing the data types that match their application semantics.

Research objectives and methods: The objective of this internship is to implement, on top of AntidoteDB, a document store interface and a query language that supports document retrieval by semi-structured content. The query language should support point queries on text attributes and interval queries on numerical attributes, and allow complex queries that combine them with logical operators (AND, OR, NOT). To achieve efficient and scalable search, the system should maintain secondary indexes on document attributes. The intern shall study different indexing techniques (e.g. inverted indexes, B-trees) [1], different strategies for organising distributed indexes (global or local indexes) [2], and different strategies for updating the index structures [3, 4], then select the most appropriate ones for the system, implement the described interface and perform benchmarks.
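To make the intended query interface more concrete, the sketch below shows one possible shape for it in Python. It is an illustration under stated assumptions, not the design the internship prescribes. It is a single-node, in-memory toy that does not use AntidoteDB or its replicated data types. The class `DocumentStore`, its methods and the tuple-based query syntax are all hypothetical names. An inverted index serves point queries on text attributes, and a sorted list stands in for a B-tree to serve interval queries on numerical attributes.

```python
import bisect
from collections import defaultdict


def _is_number(value):
    # bool is a subclass of int in Python; do not index it as a number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


class DocumentStore:
    """Hypothetical in-memory document store with secondary indexes."""

    def __init__(self):
        self.docs = {}  # doc_id -> document (dict)
        # Inverted index: attribute -> text value -> set of doc ids.
        self.text_index = defaultdict(lambda: defaultdict(set))
        # Sorted list of (value, doc_id) per attribute (B-tree stand-in).
        self.num_index = defaultdict(list)

    def put(self, doc_id, doc):
        """Insert or overwrite a document and update the indexes eagerly."""
        if doc_id in self.docs:
            self._unindex(doc_id, self.docs[doc_id])
        self.docs[doc_id] = doc
        for attr, value in doc.items():
            if isinstance(value, str):
                self.text_index[attr][value].add(doc_id)
            elif _is_number(value):
                bisect.insort(self.num_index[attr], (value, doc_id))

    def _unindex(self, doc_id, doc):
        """Remove the index entries created for a previous version."""
        for attr, value in doc.items():
            if isinstance(value, str):
                self.text_index[attr][value].discard(doc_id)
            elif _is_number(value):
                self.num_index[attr].remove((value, doc_id))

    def query(self, q):
        """Evaluate a nested-tuple query and return the matching doc ids."""
        op = q[0]
        if op == "eq":  # point query on a text attribute
            _, attr, value = q
            return set(self.text_index.get(attr, {}).get(value, set()))
        if op == "range":  # inclusive interval query on a numeric attribute
            _, attr, low, high = q
            entries = self.num_index.get(attr, [])
            # (low,) sorts before every (low, doc_id), so this finds the
            # first entry whose value is >= low.
            start = bisect.bisect_left(entries, (low,))
            result = set()
            for value, doc_id in entries[start:]:
                if value > high:
                    break
                result.add(doc_id)
            return result
        if op == "and":
            return set.intersection(*(self.query(sub) for sub in q[1:]))
        if op == "or":
            return set().union(*(self.query(sub) for sub in q[1:]))
        if op == "not":
            return set(self.docs) - self.query(q[1])
        raise ValueError(f"unknown operator: {op!r}")


store = DocumentStore()
store.put("p1", {"author": "alice", "likes": 10})
store.put("p2", {"author": "bob", "likes": 25})
store.put("p3", {"author": "alice", "likes": 40})

# Posts by alice with between 20 and 50 likes.
print(store.query(("and", ("eq", "author", "alice"), ("range", "likes", 20, 50))))
# -> {'p3'}
```

The sketch updates its indexes synchronously inside `put`, which is only one of the update strategies the internship is meant to compare. In a geo-distributed setting the indexes would also have to tolerate concurrent updates, which this example does not attempt to model.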

Bibliography

[1] Qader, Mohiuddin Abdul, Shiwen Cheng, Abhinand Menon, and Vagelis Hristidis. “Efficient Secondary Attribute Lookup in Key-Value Stores.” (2015).
[2] Kejriwal, Ankita, Arjun Gopalan, Ashish Gupta, Zhihao Jia, Stephen Yang, and John K. Ousterhout. “SLIK: Scalable Low-Latency Indexes for a Key-Value Store.” In USENIX Annual Technical Conference, pp. 57-70. 2016.
[3] Tan, Wei, Sandeep Tata, Yuzhe Tang, and Liana L. Fong. “Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores.” In EDBT, pp. 700-711. 2014.
[4] Tang, Yuzhe, Arun Iyengar, Wei Tan, Liana Fong, Ling Liu, and Balaji Palanisamy. “Deferred Lightweight Indexing for Log-Structured Key-Value Stores.” In Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, pp. 11-20. IEEE, 2015.