Abstract
In a multiprocessor under normal load, idle processors naturally offer spare capacity. Previous work has attempted to use this redundancy to overcome the limitations of classic diagnosability and modular redundancy techniques while still providing significant fault tolerance. A popular approach is task duplexing. Unfortunately, the usefulness of duplexing for critical applications is seriously undermined by its susceptibility to agreement on faulty outcomes (malicious agreement). To assess the dependability of duplexing under malicious agreement, we propose a stochastic model that dynamically profiles behavior in the presence of malicious faults. The model uses a fairly typical policy we call NMR on demand (NMROD): each task in the multiprocessor is duplicated, and additional processors are allocated for recovery as needed. NMROD relies on a fault model that favors response correctness over actual fault status, and it integrates online repair to provide nonstop operation over an extended period.
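As a rough illustration of the duplex-then-recover idea and of malicious agreement, the following Python sketch may help. It is not the model or policy definition from this work: the function name `nmr_on_demand`, the `max_copies` bound, and the recovery rule (add one copy at a time until a strict majority agrees) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def nmr_on_demand(copies: Iterable[Callable[[], int]],
                  max_copies: int = 5) -> Optional[int]:
    """Hypothetical sketch: run a task in duplex, add copies only on disagreement.

    `copies` yields callables, each standing in for one processor's execution
    of the same task; at least two are required. The recovery rule used here
    (strict majority over all results so far) is an assumption.
    """
    it = iter(copies)
    results = [next(it)(), next(it)()]

    # Duplex phase: matching results are accepted without further checking.
    # This is where malicious agreement slips through: two faulty copies that
    # return the same wrong value are indistinguishable from two correct ones.
    if results[0] == results[1]:
        return results[0]

    # Recovery phase: allocate one extra copy at a time, as needed.
    while len(results) < max_copies:
        try:
            results.append(next(it)())
        except StopIteration:
            break  # no spare processors left
        value, count = Counter(results).most_common(1)[0]
        if count > len(results) // 2:
            return value

    return None  # no majority reached with the available copies


# Stand-ins for a correct processor and a faulty one (illustrative values).
correct = lambda: 42
faulty = lambda: 7
```

With these stand-ins, `nmr_on_demand([correct, faulty, correct])` recovers the correct value through a third copy, whereas `nmr_on_demand([faulty, faulty, correct])` accepts the wrong value in the duplex phase and never consults the third copy, which is the malicious-agreement case the abstract is concerned with.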