Master 2014-2015
Internships of the SAR specialization
Failure detectors in WAN environments

Location: LIP6 - REGAL team
Supervisors: Alejandro Tomsic, Luciana Arantes, Pierre Sens, Julien Sopena
Dates: from 1 March 2014 to 31 August 2014
Remuneration: 2616 euros
Keywords: Master SAR, other than ATIAM


Distributed systems should provide reliable and continuous services despite the failures of some of their components. As a consequence, failure detection plays a central role in the engineering of such systems.

Failure detectors (FD) provide suspicion information on which processes have crashed. They are used by a wide variety of applications, such as network communication and group membership protocols, computer cluster management and distributed storage systems.
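To make the notion of "suspicion information" concrete, here is a minimal sketch of a heartbeat-based failure detector with a fixed timeout. The class and method names are hypothetical, time is passed in explicitly so the behaviour is deterministic, and the fixed timeout is a simplifying assumption: detectors designed for WANs typically adapt their timeout to observed delays, or output a suspicion level instead of a binary verdict, as accrual detectors do [3]. The example also shows why such detectors are unreliable in the sense of [1]: a slow heartbeat can cause a wrong suspicion that is later revoked.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatFailureDetector:
    """Suspects a process if no heartbeat arrived within `timeout` seconds.

    Hypothetical sketch: `now` is supplied by the caller instead of being
    read from a clock, which keeps the logic deterministic.
    """

    timeout: float
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, process: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `process`.
        self.last_heartbeat[process] = now

    def suspects(self, now: float) -> set[str]:
        # A process is suspected when its last heartbeat is older than the timeout.
        return {
            p for p, t in self.last_heartbeat.items() if now - t > self.timeout
        }


fd = HeartbeatFailureDetector(timeout=2.0)
fd.on_heartbeat("p1", now=0.0)
fd.on_heartbeat("p2", now=0.0)
fd.on_heartbeat("p2", now=1.5)
print(sorted(fd.suspects(now=3.0)))  # p1 has been silent for 3.0 s > 2.0 s

# A late heartbeat from p1 revokes the suspicion: if p1 never crashed,
# the earlier suspicion was a mistake.
fd.on_heartbeat("p1", now=3.2)
print(sorted(fd.suspects(now=3.4)))
```

The timeout directly trades detection speed against accuracy: a shorter timeout detects crashes sooner but produces more wrong suspicions on a high-latency, jittery WAN link.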

Many applications have timing constraints and therefore require an FD that provides quality of service (QoS) with quantitative timeliness guarantees, because the QoS of the FD strongly influences the QoS that the upper layers of an application can provide.
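The QoS of a failure detector is usually quantified with metrics such as those defined in [2]: detection time, average mistake rate, mistake duration and query accuracy probability. The sketch below computes simple empirical estimates of these from an observed trace. The function names and the trace format (a list of intervals during which a correct process was wrongly suspected) are assumptions made for illustration, not the formal definitions of [2], which are stated in terms of random variables over the detector's output.

```python
def qos_estimates(mistakes, observation_time):
    """Empirical QoS estimates for a detector monitoring a correct process.

    Hypothetical helper: `mistakes` is a list of (start, end) intervals
    during which the correct process was wrongly suspected, and
    `observation_time` is the length of the observed trace.
    """
    if observation_time <= 0:
        raise ValueError("observation_time must be positive")
    wrong_time = sum(end - start for start, end in mistakes)
    return {
        # Average mistake rate: wrong suspicions per unit of time.
        "mistake_rate": len(mistakes) / observation_time,
        # Mean mistake duration: average time to correct a wrong suspicion.
        "mean_mistake_duration": wrong_time / len(mistakes) if mistakes else 0.0,
        # Query accuracy probability: fraction of time the output is correct.
        "query_accuracy": 1.0 - wrong_time / observation_time,
    }


def detection_time(crash_time, permanent_suspicion_time):
    """Delay between a crash and the start of permanent suspicion."""
    return permanent_suspicion_time - crash_time


trace = [(10.0, 10.5), (40.0, 41.0)]
print(qos_estimates(trace, observation_time=100.0))
print(detection_time(crash_time=100.0, permanent_suspicion_time=102.5))
```

Comparing candidate algorithms on such metrics, measured on the same traces, is one way to carry out the evaluation against existing detectors mentioned below.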

The goal of this internship is to study existing failure detectors in WAN environments and to contribute to the development of new failure detection algorithms. The student will also study the aggregation of failure detection messages, so that they can be disseminated efficiently in large-scale deployments.
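As an illustration of why message aggregation matters at scale, the sketch below assumes a hypothetical scheme, loosely in the spirit of the hierarchical detector of [4]: processes are grouped by site, and a representative on each site forwards a single digest of its local heartbeats to the other sites instead of relaying each heartbeat individually. The function names, the `(site, process, seq)` heartbeat format and the all-to-all message count are illustrative assumptions, not a description of any particular algorithm.

```python
from collections import defaultdict


def aggregate_heartbeats(heartbeats):
    """Group (site, process, seq) heartbeats into one digest per site.

    Hypothetical scheme: each digest maps a process to the highest
    heartbeat sequence number seen for it during the current round.
    """
    digests = defaultdict(dict)
    for site, process, seq in heartbeats:
        # Keep only the most recent sequence number for each process.
        previous = digests[site].get(process, -1)
        digests[site][process] = max(previous, seq)
    return dict(digests)


def wan_messages_per_round(site_sizes, aggregated):
    """Count inter-site messages in one all-to-all heartbeat round."""
    n = sum(site_sizes)
    if aggregated:
        # One digest from each site representative to every other site.
        s = len(site_sizes)
        return s * (s - 1)
    # Every process sends to every process located on another site.
    return sum(size * (n - size) for size in site_sizes)


hbs = [("paris", "p1", 3), ("paris", "p2", 5), ("paris", "p1", 4), ("lyon", "p3", 1)]
print(aggregate_heartbeats(hbs))
print(wan_messages_per_round([10, 10, 10], aggregated=False))
print(wan_messages_per_round([10, 10, 10], aggregated=True))
```

With three sites of ten processes each, this model counts 600 inter-site messages per round without aggregation and 6 with it. The price is an extra hop through the representative, which adds latency and makes the representative itself a process whose failure must be detected.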

Finally, the student will be encouraged to deploy and evaluate the developed algorithm(s) on PlanetLab and to compare them with the most relevant existing ones.


[1] Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (Mar 1996).

[2] Chen, W., Toueg, S., Aguilera, M.K.: On the quality of service of failure detectors. IEEE Trans. Comput. 51(5), 561–580 (May 2002).

[3] Défago, X., Urbán, P., Hayashibara, N., Katayama, T.: Definition and specification of accrual failure detectors. In: DSN. pp. 206–215. IEEE Computer Society (2005), http://dblp.

[4] Bertier, M., Marin, O., Sens, P.: Performance analysis of a hierarchical failure detector. In: Proceedings of the International Conference on Dependable Systems and Networks. pp. 635–644 (2003).