L2.sp11 - Lecture 4-1 Lecture 4-1 Distributed Systems CS...

Lecture 4-1: Distributed Systems
CS 425 / CSE 424 / ECE 428, Spring 2011
Reading: Sections 12.1 and part of 2.3.2
2010, I. Gupta (slides slightly revised by N. Vaidya)

Lecture 4-2: Your new datacenter
You've been put in charge of a datacenter, and your manager has told you, "Oh no! We don't have any failures in our datacenter!"
Do you believe him/her?
What would be your first responsibility? Build a failure detector.
What are some things that could go wrong if you didn't do this?

Lecture 4-3: To build a failure detector
You have a few options:
1. Hire 1000 people, each to monitor one machine in the datacenter and report to you when it fails.
2. Write a (distributed) failure detector program that automatically detects failures and reports to your workstation.
Which is preferable, and why?

Lecture 4-4: Two Different System Models
Whenever someone gives you a distributed computing problem, the first question you want to ask is, "What is the model under which I need to solve the problem?"
Synchronous Distributed System
Each message is received within bounded time.
Each step in a process takes lb < time < ub.
(Each local clock's drift has a known bound.)
Examples: multiprocessor systems

Asynchronous Distributed System
No bounds on message transmission delays.
No bounds on process execution.
(The drift of a clock is arbitrary.)
Examples: Internet, wireless networks, datacenters, most real systems

Lecture 4-5: Failure Model
Process omission failure:
Crash-stop (fail-stop): a process halts and does not execute any further operations.
Crash-recovery: a process halts, but then recovers (reboots) after a while.
We will focus on crash-stop failures:
They are easy to detect in synchronous systems.
Not so easy in asynchronous systems.

Lecture 4-6: What's a failure detector?
[Diagram: two processes, pi and pj]

Lecture 4-7: What's a failure detector?
[Diagram: pi and pj, with pj marked X]
Crash-stop failure (pj is a failed process)

Lecture 4-8: What's a failure detector?
[Diagram: pi and pj, with pj marked X]
Crash-stop failure (pj is a failed process)
pi needs to know about pj's failure (pi is a non-faulty, or alive, process)
There are two main flavors of failure detectors.

Lecture 4-9: I. Ping-Ack Protocol
[Diagram: pi and pj]
pi needs to know about pj's failure...
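
The slides say crash-stop failures are easy to detect in a synchronous system. One way to see why is sketched below, assuming pi sends a ping to pj and waits for an ack. The Ping-Ack details are cut off in this preview, so the function names, the millisecond units, and the timeout formula are illustrative assumptions rather than the lecture's own definition.

```python
from typing import Optional


def ping_ack_timeout(max_one_way_delay_ms: int, max_processing_ms: int) -> int:
    """Worst-case round trip under the synchronous model (assumed formula):
    ping travels to pj, pj processes it, ack travels back to pi."""
    return 2 * max_one_way_delay_ms + max_processing_ms


def is_suspected(ping_sent_at_ms: int,
                 ack_received_at_ms: Optional[int],
                 now_ms: int,
                 timeout_ms: int) -> bool:
    """Return True if pi should declare pj failed at time now_ms."""
    # An ack that arrived within the timeout proves pj was alive.
    if ack_received_at_ms is not None and ack_received_at_ms - ping_sent_at_ms <= timeout_ms:
        return False
    # No timely ack: suspect pj only once the full timeout has elapsed.
    return now_ms - ping_sent_at_ms > timeout_ms


# Hypothetical bounds: 50 ms one-way delay, 10 ms processing time.
timeout = ping_ack_timeout(50, 10)
print(timeout)
print(is_suspected(0, 80, 200, timeout))     # ack arrived in time
print(is_suspected(0, None, 100, timeout))   # still within the timeout
print(is_suspected(0, None, 111, timeout))   # timeout expired, no ack
```

Because the delay and processing bounds are known in a synchronous system, a missing ack after the timeout can only mean pj has crashed. In an asynchronous system no such bounds exist, so the same check could wrongly suspect a process that is merely slow.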