A Survey of Rollback-Recovery Protocols in Message-Passing Systems

E. N. (Mootaz) Elnozahy, IBM Research
Lorenzo Alvisi, The University of Texas at Austin
Yi-Min Wang, Microsoft Research
and David B. Johnson, Rice University

This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback-recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.

Categories and Subject Descriptors: D.4.5 [Operating Systems]: Reliability (checkpoint/restart; fault-tolerance); D.4.7 [Operating Systems]: Organization and Design (distributed systems); D.2.8 [Software]: Metrics (performance measures)

General Terms: Design, Reliability, Performance

Additional Key Words and Phrases: message logging, rollback-recovery

Mootaz Elnozahy started this work while at Carnegie Mellon University, where he was supported in part by the National Science Foundation through a Research Initiation Award under contract CCR 9410116 and a CAREER Award under contract CCR 9502933. Lorenzo Alvisi was supported in part by an NSF CAREER award (CCR-9734185), an Alfred P. Sloan Fellowship, an IBM Faculty Partnership award, DARPA/SPAWAR grant N66001-98-8911, and a grant from the Texas Advanced Research Program.

Authors' addresses: E. N. (Mootaz) Elnozahy, IBM Austin Research Lab., M/S 904-6C-020, 11501 Burnet Rd., Austin, TX 78578; email: email@example.com; Lorenzo Alvisi, Department of Computer Sciences, Taylor Hall 2.124, The University of Texas at Austin, Austin, TX 78712-1188; email: firstname.lastname@example.org; Yi-Min Wang, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052; email: email@example.com; David B. Johnson, Rice University, Department of Computer Science, 6100 Main St., MS 132, Houston, TX 77005-1892; email: firstname.lastname@example.org.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.