Atomic Broadcast in Asynchronous Crash-Recovery Distributed Systems and its use in Quorum-Based Replication .

L. Rodrigues and M. Raynal

This report will be published in IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No 4, July/August 2003 (to appear).

Abstract

Atomic Broadcast is a fundamental problem of distributed systems: it states that messages must be delivered in the same order to their destination processes. This paper describes a solution to this problem in asynchronous distributed systems in which processes can crash and recover.

A Consensus-based solution to Atomic Broadcast problem has been designed by Chandra and Toueg for asynchronous distributed systems where crashed processes do not recover. Our extends this approach: it transforms any Consensus protocol suited to the crash-recovery model into an Atomic Broadcast protocol suited to the same model. We show that Atomic Broadcast can be implemented requiring few additional log operations in excess of those required by the Consensus. The paper also discusses how additional log operations can improve the protocol in terms of faster recovery and better throughput.

To illustrate the use of the protocol, the paper also describes a solution to the replica management problem in asynchronous distributed systems in which processes can crash and recover. The proposed technique makes a bridge between established results on Weighted Voting and recent results on the Consensus problem.

Not available online


Luís Rodrigues