4- failure_detectors

CSE 486/586, Spring 2012
CSE 486/586 Distributed Systems
Failure Detectors
Steve Ko
Computer Sciences and Engineering
University at Buffalo

Last Time
  Socket programming
    socket(), bind(), listen(), accept(), connect(), read(), write()…
  Android
    Activities, Services, Broadcast receivers, Content providers, Intents, AndroidManifest.xml
  Overview of the projects
    Project 0: simple messenger
    Project 1 ~ project 3: distributed key-value store

Today’s Question
  You’ll learn new terminology, definitions, etc.
  (Slide cartoon: "zzz…" and "I have a feeling that something went wrong…")

Two Different System Models
  Synchronous Distributed System
    Each message is received within bounded time
    Each step in a process takes lb < time < ub
    (Each local clock’s drift has a known bound)
    Examples: Multiprocessor systems
  Asynchronous Distributed System
    No bounds on message transmission delays
    No bounds on process execution
    (The drift of a clock is arbitrary)
    Examples: Internet, wireless networks, datacenters, most real systems
  These are used to reason about how protocols would behave, e.g., in formal proofs.

Failure Model
  What is a failure?
  We’ll consider: process omission failure
    A process disappears.
    Permanently: crash-stop (fail-stop) – a process halts and does not execute any further operations
    Temporarily: crash-recovery – a process halts, but then recovers (reboots) after a while
  We will focus on crash-stop failures
    They are easy to detect in synchronous systems
    Not so easy in asynchronous systems
  The first step to handle failures?

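Why crash-stop failures are easy to detect in a synchronous system can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names safe_timeout, is_crashed, max_msg_delay, and max_step_time are hypothetical and do not come from the slides. The idea is that when message delay and processing time are both bounded, a round trip has a known upper bound, so a missing reply past that bound can only mean a crash. In an asynchronous system no such bound exists, so the same check could wrongly flag a slow process.

```python
def safe_timeout(max_msg_delay: float, max_step_time: float) -> float:
    """Upper bound on a request/reply round trip in a synchronous system.

    Assumption (hypothetical parameters): every message arrives within
    max_msg_delay, and the remote process needs at most max_step_time
    (the slide's "ub") to handle the request.
    """
    # request in flight + remote processing step + reply in flight
    return 2 * max_msg_delay + max_step_time


def is_crashed(elapsed_since_request: float, reply_received: bool,
               timeout: float) -> bool:
    """Declare a crash only if no reply arrived within the round-trip bound."""
    return (not reply_received) and elapsed_since_request > timeout


if __name__ == "__main__":
    timeout = safe_timeout(max_msg_delay=5, max_step_time=2)
    print("timeout:", timeout)
    print("crashed after 13 units of silence?", is_crashed(13, False, timeout))
    print("crashed after 11 units of silence?", is_crashed(11, False, timeout))
```

With these bounds, a process that is merely slow always replies before the timeout, so the check never reports a live process as failed.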
What is a Failure Detector?
  pi needs to know about pj’s failure
    pi is a non-faulty process or alive process
    Crash-stop failure (pj is a failed process)
  There are two styles of failure detectors

I. Ping-Ack Protocol
  pi queries pj once every T time units (ping); pj replies (ack)
  If pj does not respond within another T time units of being sent the ping, pi detects/declares pj as failed
  If pj fails, then within T time units, pi will send it a ping message. pi will time out within another T time units.
  Worst-case detection time = 2T

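The 2T bound can be checked with a small Python model of the ping-ack schedule. This is a minimal sketch under assumptions the slides do not state: pi sends pings at times 0, T, 2T, ..., messages are delivered instantly, time is measured in integer units, and pj answers a ping only if it has not yet crashed when the ping arrives. The function name declare_time is hypothetical.

```python
def declare_time(period: int, crash_time: int) -> int:
    """Time at which pi declares pj failed, if pj crashes at crash_time.

    Assumptions (illustrative only): pings go out at 0, T, 2T, ...;
    message delay is zero; pj replies iff it is still alive at ping time.
    """
    ping_time = 0
    # Every ping sent strictly before the crash is still acknowledged.
    while ping_time < crash_time:
        ping_time += period
    # The first ping at or after the crash gets no ack;
    # pi times out one period after sending it.
    return ping_time + period


if __name__ == "__main__":
    T = 10
    for crash in (11, 15, 20):
        delay = declare_time(T, crash) - crash
        print(f"crash at {crash}: declared failed after {delay} time units")
```

A crash just after a ping has been acknowledged is the worst case: pi waits almost T for the next ping and then another T for the timeout, so the delay approaches but never exceeds 2T.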

This document was uploaded on 02/27/2012.
