A survey of rollback-recovery protocols in message-passing systems

E Elnozahy; L Alvisi; Y Wang; D Johnson

Journal article

Computing reviews, Vol.45(2), pp.103-103

01/02/2004

Abstract

Computer applications now span the globe and incorporate devices ranging in size and power from watches to clustered supercomputers. The further a system reaches and the more its heterogeneity increases, the more fragile (susceptible to exceptions and errors) it becomes. Every system we design and build is more likely than ever to encounter, and to have to recover from, unreliable communication. It is time for rollback-recovery techniques to become mainstream software design topics. This paper surveys the daunting volume of research literature that explores such techniques, concentrating on those approaches that can be implemented in any application environment (for example, those with no language dependencies). It splits these techniques into checkpoint-based and log-based families, and then subdivides each family. While this taxonomy alone is helpful, the authors go deeper and analyze the key ideas underlying each technique, along with the problems that accompany its implementation.

Details

Title: A survey of rollback-recovery protocols in message-passing systems
Creators - without role: E Elnozahy; L Alvisi; Y Wang; D Johnson
Publication Details: Computing reviews, Vol.45(2), pp.103-103
Identifiers: 9942546308331
Academic Unit: King Abdullah University of Science & Technology
Language: English
Resource Type: Journal article