# Lec17 - Today Fault Tolerance Agreement in presence of...

Computer Science, CS677: Distributed OS, Lecture 17

## Today: Fault Tolerance

- Agreement in presence of faults
  - Two army problem
  - Byzantine generals problem
- Reliable communication
- Distributed commit
  - Two phase commit
  - Three phase commit
- Failure recovery
  - Checkpointing
  - Message logging

## Failure Masking by Redundancy

- Triple modular redundancy.
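Triple modular redundancy masks a single faulty component by running three copies and letting a voter pick the majority output. The following is a minimal sketch of such a voter, assuming replica outputs are hashable values; `majority_vote` is a hypothetical helper name, not something from the slides.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the votes.
    return value if count > len(outputs) // 2 else None

# Three replicas, one of them faulty: the wrong output is outvoted.
print(majority_vote([42, 42, 7]))
```

If all three replicas disagree, no value has a majority and the voter returns `None`, so a triplicated stage can mask one faulty copy but not two.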

## Agreement in Faulty Systems

- How should processes agree on results of a computation?
- K-fault tolerant: system can survive k faults and yet function
- Assume processes fail silently
  - Need (k+1) redundancy to tolerate k faults
- Byzantine failures: processes run even if sick
  - Produce erroneous, random or malicious replies
  - Byzantine failures are most difficult to deal with
  - Need ? redundancy to handle Byzantine faults

## Byzantine Faults

- Simplified scenario: two perfect processes with unreliable channel
  - Need to reach agreement on a 1 bit message
- Two army problem: two armies waiting to attack
  - Each army coordinates with a messenger
  - Messenger can be captured by the hostile army
  - Can generals reach agreement?
  - Property: two perfect processes can never reach agreement in presence of an unreliable channel
- Byzantine generals problem: can N generals reach agreement with a perfect channel?
  - M generals out of N may be traitors
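
The fail-silent case from the "Agreement in Faulty Systems" slide can be illustrated with a short sketch: if a failed process simply stops answering, k+1 replicas are enough, because any one survivor can serve the request. The names `query_replicas`, `crashed`, and `healthy` are hypothetical, and a silent failure is modeled here as a reply of `None`.

```python
def query_replicas(replicas, request):
    """Return the first reply from a live replica, or None if all are down."""
    for replica in replicas:
        reply = replica(request)  # None models a silent (crashed) replica
        if reply is not None:
            return reply
    return None

def crashed(request):
    # A fail-silent replica never produces a reply.
    return None

def healthy(request):
    # Hypothetical computation performed by a working replica.
    return request * 2

k = 2
replicas = [crashed] * k + [healthy]  # k faults, k + 1 replicas in total
print(query_replicas(replicas, 21))
```

This only works because a silent failure is detectable as a missing reply. A Byzantine replica that returns a wrong answer would be accepted by this loop, which is why Byzantine faults call for more redundancy.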
## Byzantine Generals Problem

- Recursive algorithm by Lamport

Figure: The Byzantine generals problem for 3 loyal generals and 1 traitor. (a) The generals announce their troop strengths (in units of 1 kilosoldier). (b) The vectors that each general assembles based on (a). (c) The vectors that each general receives in step 3.

## Byzantine Generals Problem Example

Figure: The same as in previous slide, except now with 2 loyal generals and one traitor.
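
The vector exchange described in the figure captions can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the lecture's own code: `run_exchange` and `majority` are hypothetical names, a traitor is assumed to send a different made-up value in every message, and each loyal general decides element by element using a majority over the vectors it received from the others. It is consistent with the known result for this problem that agreement requires N >= 3M + 1 generals when M are traitors.

```python
from collections import Counter
from itertools import count

UNKNOWN = None  # marks an element with no majority value

def majority(values):
    value, n = Counter(values).most_common(1)[0]
    return value if n > len(values) // 2 else UNKNOWN

def run_exchange(strengths, traitors):
    """strengths: general id -> true troop strength; traitors: set of ids.
    Returns the decided vector of every loyal general."""
    ids = sorted(strengths)
    fresh = count(1)
    lie = lambda: f"lie{next(fresh)}"  # assumed traitor behaviour: never repeats

    # Steps 1-2: everyone announces a strength; each general assembles a vector.
    vectors = {
        receiver: {
            sender: strengths[sender]
            if sender == receiver or sender not in traitors else lie()
            for sender in ids
        }
        for receiver in ids
    }

    decided = {}
    for receiver in ids:
        if receiver in traitors:
            continue
        # Step 3: receive the vector of every other general (traitors send garbage).
        received = [
            {g: lie() for g in ids} if sender in traitors else vectors[sender]
            for sender in ids if sender != receiver
        ]
        # Step 4: take the majority value for each element, else UNKNOWN.
        decided[receiver] = [majority([v[g] for v in received]) for g in ids]
    return decided

# 3 loyal generals and 1 traitor (general 3).
print(run_exchange({1: 1, 2: 2, 3: 3, 4: 4}, {3}))
# 2 loyal generals and 1 traitor (general 3).
print(run_exchange({1: 1, 2: 2, 3: 3}, {3}))
```

With four generals, every loyal general ends up with the same vector: the loyal strengths plus `UNKNOWN` for the traitor. With three generals, no element reaches a majority, so the two loyal generals cannot agree on anything, matching the second figure.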

