Abstract
Crash recovery techniques allow real-time distributed editing systems to keep making progress when failures occur. In this study, we propose a recovery scheme in which each node periodically saves a checkpoint of its local document state. If a transient failure occurs in a distributed editing system, the failed node can rejoin the system by loading its local checkpoint rather than retrieving the state from remote nodes. Our recovery scheme maintains consistency between the local state and the remote state during the crash recovery procedure. We prove the correctness of the recovery algorithm theoretically. We evaluate the performance of our recovery scheme by varying the elapsed time between a failed node leaving and rejoining the system. The experimental results show that our solution outperforms the traditional recovery approach, which retrieves document states from other peer nodes.