Melhorando a Robustez de Detectores Distribuídos de Falhas em Condições Adversas (Improving the Robustness of Distributed Failure Detectors in Adverse Conditions)

Fernando Tarlá Cardoso Lemos (fernandotcl@usp.br), Liria Matsumoto Sato (liria.sato@poli.usp.br)



This paper appears in: Revista IEEE América Latina

Publication Date: Jan. 2012
Volume: 10, Issue: 1
ISSN: 1548-0992


Abstract:
Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new failure detector algorithms suitable as components of a fault tolerance system deployed under adverse network conditions (such as loosely connected and administered computing grids). The algorithms pack redundancy into heartbeat messages, thereby improving on the robustness of traditional protocols. Experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
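
The abstract does not specify the message format, so the following is only an illustrative sketch of what redundancy in heartbeats could look like, not the paper's algorithms. It assumes a gossip-style scheme in which each heartbeat also carries the highest heartbeat counter the sender has seen for every other node, so that liveness information can reach a monitor even when the direct link is lossy. The names Heartbeat, Detector, make_heartbeat, receive, and suspects are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    sender: str
    # Assumed redundancy: node id -> highest heartbeat counter known to the sender.
    counters: dict


@dataclass
class Detector:
    node_id: str
    timeout: float  # seconds without fresh news before a node is suspected
    counter: int = 0
    known: dict = field(default_factory=dict)        # node id -> highest counter seen
    last_update: dict = field(default_factory=dict)  # node id -> local time of last increase

    def make_heartbeat(self) -> Heartbeat:
        """Build a heartbeat that piggybacks everything this node knows."""
        self.counter += 1
        counters = dict(self.known)
        counters[self.node_id] = self.counter
        return Heartbeat(self.node_id, counters)

    def receive(self, hb: Heartbeat, now: float) -> None:
        """Merge a heartbeat; any increased counter counts as fresh evidence of life."""
        for node, count in hb.counters.items():
            if node == self.node_id:
                continue
            if count > self.known.get(node, 0):
                self.known[node] = count
                self.last_update[node] = now

    def suspects(self, now: float) -> set:
        """Nodes whose counter has not increased within the timeout."""
        return {n for n, t in self.last_update.items() if now - t > self.timeout}


# Hypothetical scenario: the direct link from "a" to "c" drops every message,
# but "c" still learns that "a" is alive through the heartbeat relayed by "b".
a = Detector("a", timeout=5.0)
b = Detector("b", timeout=5.0)
c = Detector("c", timeout=5.0)
b.receive(a.make_heartbeat(), now=0.0)
c.receive(b.make_heartbeat(), now=1.0)
print(c.suspects(now=2.0))
```

In this sketch a node is suspected only when no path has delivered a newer counter within the timeout, which is the kind of tolerance to message loss the abstract describes; the actual redundancy encoding used by the authors may differ.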

Index Terms:
Fault Tolerance, Failure Detection, Distributed Failure Detectors   



