Lec17 - Computer Science Lecture 17 page CS677 Distributed...


Computer Science Lecture 17, CS677: Distributed OS

Today: Fault Tolerance
• Agreement in presence of faults
  – Two army problem
  – Byzantine generals problem
• Reliable communication
• Distributed commit
  – Two phase commit
  – Three phase commit
• Failure recovery
  – Checkpointing
  – Message logging

Failure Masking by Redundancy
• Triple modular redundancy

Agreement in Faulty Systems
• How should processes agree on results of a computation?
• K-fault tolerant: system can survive k faults and yet function
• Assume processes fail silently
  – Need (k+1) redundancy to tolerate k faults
• Byzantine failures: processes run even if sick
  – Produce erroneous, random or malicious replies
• Byzantine failures are most difficult to deal with
  – Need ? redundancy to handle Byzantine faults

Byzantine Faults
• Simplified scenario: two perfect processes with unreliable channel
  – Need to reach agreement on a 1 bit message
• Two army problem: Two armies waiting to attack
  – Each army coordinates with a messenger
  – Messenger can be captured by the hostile army
  – Can generals reach agreement?
  – Property: Two perfect processes can never reach agreement in presence of unreliable channel
• Byzantine generals problem: Can N generals reach agreement with a perfect channel?
  – M generals out of N may be traitors
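
To make the failure-masking idea concrete, here is a minimal sketch of the majority voting behind triple modular redundancy, together with the (k+1) replica count the slides give for fail-silent processes. The function names `majority_vote` and `replicas_for_fail_silent` are hypothetical, chosen for illustration, and the sketch assumes replica outputs are hashable values that can be compared for equality.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas.

    Returns None when no value has a strict majority, in which case
    the voter cannot mask the fault.
    """
    if not outputs:
        return None
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def replicas_for_fail_silent(k):
    """Replicas needed to tolerate k fail-silent faults: k + 1.

    This follows the slide's rule for processes that stop without
    producing wrong answers; it does not cover Byzantine faults.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return k + 1


if __name__ == "__main__":
    # One of three replicas returns a wrong value; the voter masks it.
    print(majority_vote([42, 42, 7]))
    # All three replicas disagree, so there is no majority to trust.
    print(majority_vote([1, 2, 3]))
    print(replicas_for_fail_silent(2))
```

With three replicas, a single faulty output is outvoted by the two correct ones. If two replicas fail in different ways, no value reaches a majority and the voter reports that no result can be trusted.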

This note was uploaded on 11/22/2011 for the course COMPSCI 677 taught by Professor Shenoy during the Spring '08 term at UMass (Amherst).

