fault_tolerance - Today CSCI 5105 Foundations of Modern...

CSCI 5105 Foundations of Modern Operating Systems
Instructor: Abhishek Chandra

Today
  Fault Tolerance in Distributed Systems
    Types of Faults
    Fault Tolerance Techniques

Faults
  What is a fault?
    Cause of an error or a failure
  Faults are common
    Machines crash, disks fail, bugs occur, packets lost
  How is the effect of faults different in single machines vs. distributed systems?

Fault Tolerance
  Fault Tolerance
    Ability of a system to continue functioning normally in the presence of faults
  Questions:
    How can we detect faults?
    How can we hide the effects of faults?
    How can we recover from failures?
Fault Tolerance Properties
  Availability: What percentage of time is a system available for use?
  Reliability: How long can a system stay up continuously?
  Safety: Small failures should not have catastrophic effects
  Maintainability: How easy is it to repair faults?

Types of Faults
  Transient faults: Happen once and disappear
    E.g.: Temporary network outage
  Intermittent faults: Happen occasionally but unpredictably
    E.g.: System deadlocks, race conditions
  Permanent faults: Faulty component must be repaired/replaced
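
The difference between transient and permanent faults can be illustrated with a small retry sketch. This is not from the slides: the names TransientFault, PermanentFault, call_with_retry, and make_flaky are hypothetical, and retrying is only one possible way to mask a transient fault. The idea is that a transient fault (such as a temporary network outage) may disappear if the operation is simply tried again, while a permanent fault is reported immediately because retrying cannot help.

```python
import time


class TransientFault(Exception):
    """Hypothetical marker for a fault that may disappear on its own."""


class PermanentFault(Exception):
    """Hypothetical marker for a fault that needs repair or replacement."""


def call_with_retry(operation, max_attempts=3, delay_seconds=0.0):
    """Run operation(), retrying only when it raises TransientFault.

    PermanentFault is not caught, so it reaches the caller on the
    first attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientFault:
            if attempt == max_attempts:
                # Out of attempts: the fault did not go away in time.
                raise
            time.sleep(delay_seconds)


def make_flaky(failures_before_success):
    """Build a hypothetical operation that fails a fixed number of times."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientFault("temporary network outage")
        return "ok"

    return operation, state
```

For example, make_flaky(2) fails twice and then succeeds, so call_with_retry with max_attempts=3 returns its result; with five failures and the same limit, the TransientFault reaches the caller. An intermittent fault would look similar from the caller's side, but it can recur later, so a retry only hides one occurrence.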

This note was uploaded on 10/21/2011 for the course CSCI 5105 taught by Professor Staff during the Spring '08 term at Minnesota.

