synch-slides - D ISTRIBUTED S YSTEMS [COMP9243] S...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: D ISTRIBUTED S YSTEMS [COMP9243] S YNCHRONOUS Lecture 4: Synchronisation and Coordination A SYNCHRONOUS D ISTRIBUTED S YSTEMS Timing model of a distributed system (Part 1) Slide 1 VS Slide 3 Affected by: § Execution speed/time of processes Œ Distributed Algorithms § Communication delay  Time and Clocks § Clocks & clock drift Ž Global State § (Partial) failure  Concurrency Control D ISTRIBUTED A LGORITHMS Synchronous Distributed System: Time variance is bounded Algorithms that are intended to work in a distributed environment Execution : bounded execution speed and time Used to accomplish tasks such as: Communication : bounded transmission delay § Communication Clocks : bounded clock drift (and differences in clocks) § Accessing resources Slide 2 Slide 4 § Allocating resources § Consensus Effect: § Can rely on timeouts to detect failure § etc. V Easier to design distributed algorithms X Very restrictive requirements Synchronisation and coordination inextricably linked to distributed algorithms • Limit concurrent processes per processor § Achieved using distributed algorithms • Limit concurrent use of network § Required by distributed algorithms • Require precise clocks and synchronisation S YNCHRONOUS VS A SYNCHRONOUS D ISTRIBUTED S YSTEMS 1 S YNCHRONOUS VS A SYNCHRONOUS D ISTRIBUTED S YSTEMS 2 Asynchronous Distributed System: Time variance is not bounded S YNCHRONISATION Execution : different steps can have varying duration Communication : transmission delays vary widely Slide 5 AND C OORDINATION Important: Clocks : arbitrary clock drift Slide 7 Effect: Doing the right thing at the right time. Two fundamental issues: § Allows no assumption about time intervals § Coordination (the right thing) X Cannot rely on timeouts to detect failure § Synchronisation (the right time) X Most asynch DS problems hard to solve V Solution for asynch DS is also a solution for synch DS § Most real distributed systems are hybrid synch and asynch E VALUATING D ISTRIBUTED A LGORITHMS General Properties: § Performance S YNCHRONISATION • number of messages exchanged Ordering of all actions • response/wait time • delay Slide 6 § Total ordering of events Slide 8 • throughput: 1/(delay + executiontime) • complexity: O () § Total ordering of instructions § Total ordering of communication § Efficiency § Ordering of access to resources • resource usage: memory, CPU, etc. § Requires some concept of time § Scalability § Reliability • number of points of failure (low is good) S YNCHRONISATION AND C OORDINATION 3 C OORDINATION 4 C OORDINATION Coordinate actions and agree on values. Coordinate Actions: § What actions will occur Slide 9 T IME Slide 11 § Who will perform actions AND C LOCKS Agree on Values: § Agree on global value § Agree on environment § Agree on state T IME M AIN I SSUES Global Time: § ’Absolute’ time Time and Clocks: synchronising clocks and using time in distributed algorithms Slide 10 • Einstein says no absolute time • Absolute enough for our purposes Global State: how to acquire knowledge of the system’s global state § Astronomical time Slide 12 • Based on earth’s rotation • Not stable Concurrency Control: coordinating concurrent access to resources § International Atomic Time (IAT) Coordination: when do processes need to coordinate and how do they do it § Coordinated Universal Time (UTC) • Based on oscillations of Cesium-133 • leap seconds • Signals broadcast over the world T IME AND C LOCKS 5 T IME 6 P HYSICAL C LOCKS Synchronisation Using Physical Clocks: Examples: Local Time: Slide 13 § Not synchronised to Global source Slide 15 § Relative not ’absolute’ § Performing events at an exact time (turn lights on/off, lock/unlock gates) § Logging of events (for security, for profiling, for debugging) § Tracking (tracking a moving object with separate cameras) § Make (edit on one computer build on another) C LOCKS Computer Clocks: § Crystal oscillates at known frequency § Oscillations cause timer interrupts Based on actual time § Timer interrupts update clock § Cp (t): current time (at UTC time t) on machine p Slide 14 Clock Skew: Slide 16 § Crystals in different computers run at slightly different rates § Ideally Cp (t) = t X Clock skew causes clocks to drift § Clocks get out of sync § Must regularly synchronise with UTC § Maximum drift rate Timestamps: § Used to denote at which time an event occurred § Logical vs Physical P HYSICAL C LOCKS 7 S YNCHRONISING P HYSICAL C LOCKS 8 S YNCHRONISING P HYSICAL C LOCKS C RISTIAN ’ S A LGORITHM Internal Synchronisation: Time Server: § Clocks synchronise locally Slide 17 § Has UTC receiver § Only synchronised with each other § Passive External Synchronisation: Slide 19 Algorithm: § Clocks synchronise to an external time source § Clients periodically request the time § Synchronise with UTC every δ seconds § Don’t set time backward § Take propagation and interrupt handling delay into account Time Server: § Server that has the correct time • (T 1 − T 0)/2 § Server that calculates the correct time • Or take a series of measurements and average the delay N ETWORK T IME P ROTOCOL (NTP) B ERKELEY A LGORITHM Hierarchy of Servers: § Primary Server: has UTC clock Time daemon 3:00 3:00 3:00 3:00 3:05 0 -10 Slide 18 § Secondary Server: connected to primary +5 § etc. +15 3:00 +25 Slide 20 -20 Network Synchronisation Modes: Multicast: for LAN, low accuracy 2:50 3:25 (a) 2:50 3:25 (b) 3:05 3:05 Procedure Call: clients poll, reasonable accuracy (c) Symmetric: Between peer servers. highest accuracy C RISTIAN ’ S A LGORITHM 9 N ETWORK T IME P ROTOCOL (NTP) 10 The relation → is a partial order: § If a → b, then a causally affects b § We consider unordered events to be concurrent: Example:a → b and b → a implies a b Synchronisation: E11 § Estimate clock offsets and transmission delays between two nodes Slide 21 E12 E13 E14 P1 § Keep estimates for past communication Slide 23 § Choose offset estimate for lowest transmission delay P2 § Also determine unreliable servers E21 E22 E23 E24 Real Time § Accuracy 1 - 50 msec § Causally related: E11 → E12 , E13 , E14 , E23 , E24 , . . . E21 → E22 , E23 , E24 , E13 , E14 , . . . § Concurrent: E11 E21 , E12 E22 , E13 E23 , E11 E22 , E13 E24 , E14 E23 , . . . Lamport’s logical clocks: L OGICAL C LOCKS § Software counter to locally compute the happened-before relation → § Each process pi maintains a logical clock Li § Lamport timestamp: Event ordering is more important than physical time: § Events (e.g., state changes) in a single process are ordered § Processes need to agree on ordering of causally related events (e.g., message send and receive) • Li (e): timestamp of event e at pi • L(e): timestamp of event e at process it occurred at Local ordering: § System consists of N processes pi , i ∈ {1, . . . , N } Slide 22 Slide 24 § Local event ordering →i : If pi observes e before e , we have e →i e Global ordering: Œ Before timestamping a local event pi executes Li := Li + 1  Whenever a message m is sent from pi to pj : • pi executes Li := Li + 1 and sends Li with m • pj receives Li with m and executes Lj := max(Lj , Li ) + 1 (receive (m) is annotated with the new Lj ) § Leslie Lamport’s happened before relation → § Smallest relation, such that 1. e →i e implies e → e 2. For every message m, send (m) → receive (m) 3. Transitivity: e → e and e → e implies e → e L OGICAL C LOCKS Implementation: Properties: § a → b implies L(a) < L(b) § L(a) < L(b) does not necessarily imply a → b 11 L OGICAL C LOCKS 12 Example: Vector clocks: E12 E13 E14 E15 E11 E16 E17 6 7 § At each process, maintain a clock for every other process P1 1 2 3 4 5 § I.e., each clock Vi is a vector of size N § Vi [j ] contains i’s knowledge about j ’s clock 1 2 3 4 7 E23 E24 E25 Implementation: P2 E21 E22 Œ Initially, Vi [j ] := 0 for i, j ∈ {1, . . . , N } Real Time Slide 25 Slide 27 Total event ordering: • pj receives Vi with m and merges the vector clocks Vi and Vj : max(Vj [k], Vi [k]) + 1 , if j = k Vj [k] := max(Vj [k], Vi [k]) , otherwise § Given local time stamps Li (e) and Lj (e ), we define global time stamps Li (e), i and Lj (e ), j § Lexicographical ordering: Li (e), i < Lj (e ), j iff • Li (e) < Lj (e ) or • Li (e) = Lj (e ) and i < j Properties: V ECTOR C LOCKS § For all i, j , Vi [i] ≥ Vj [i] § a → b iff V (a) < V (b) where Main shortcoming of Lamport’s clocks: § L(a) < L(b) does not imply a → b • • • • § We cannot deduce causal dependencies from time stamps: E11 E12 1 2 E21 1 Ž Whenever a message m is sent from pi to pj : • pi executes Vi [i] := Vi [i] + 1 and sends Vi with m § Complete partial to total order by including process identifiers Slide 26  Before pi timestamps an event, Vi [i] := Vi [i] + 1 P1 E22 E32 E33 1 2 = V iff V [i] = V [i] for i ∈ {1, . . . , N } ≥ V iff V [i] ≥ V [i] for i ∈ {1, . . . , N } > V iff V ≥ V ∧ V = V V iff V > V ∧ V > V Example: P2 Slide 28 3 E31 V V V V 3 P3 E11 1 Real Time (1,0,0) (2,0,0) E21 1 § We have L1 (E11 ) < L3 (E33 ), but E11 → E33 (0,1,0) § Why? E13 6 (3,4,1) 3 E22 (0,0,1) • There is no history as to where advances come from E23 4 (2,2,0) (2,3,1) 1 E31 • Clocks advances independently or via messages V ECTOR C LOCKS 2 E12 E32 2 (0,0,2) E24 5 (2,4,1) P1 P2 P3 Real Time § For L1 (E12 ) and L3 (E32 ), 2 = 2 versus (2, 0, 0) = (0, 0, 2) 13 G LOBAL S TATE 14 C ONSISTENT C UTS Determining global properties: § We need to combine information from multiple nodes G LOBAL S TATE Slide 29 Slide 31 § Without global time, how do we know whether collected local information is consistent? § Local state sampled at arbitrary points in time surely is not consistent § We need a criterion for what constitutes a globally consistent collection of local information Local history: § N processes pi , i ∈ {1, . . . , N } G LOBAL S TATE § For each pi , • event series e0 , e1 , e2 , . . . i i i Determining global properties: • is called pi ’s history denoted by hi . § Distributed garbage collection: Do any references exist to a given object? Slide 30 • May be finite or infinite Slide 32 § Distributed deadlock detection: Do processes wait in a cycle for each other? § We denote by hk a k-prefix of hi . i § Each event ej is either a local event or a communication event i § Distributed termination detection: Did a set of processes cease all activity? (Consider messages in transit!) Process state: § State of process pi immediately before event ek denoted sk i i § State sk records all events included in the history hk−1 i i § Hence, s0 refers to pi ’s initial state i C ONSISTENT C UTS 15 C ONSISTENT C UTS 16 Global history and state: cut 1 § Using a total event ordering, we can merge all local histories into a global history: P3 s cut 2 0 3 r 0 3 s 1 3 r 1 3 N hi H= i=1 Slide 33 Slide 35 r P2 0 2 r 1 2 s 0 2 s 1 2 s 2 2 r 1 1 § Similarly, we can combine a set of local states s1 , . . . , sN into a global state: P1 S = (s 1 , . . . , s N ) s 0 1 r 0 1 § Which combination of local state is consistent? Consistent cut: Cuts: § We call a cut consistent iff, § Similar to the global history, we can define cuts based on k-prefixes: for all events e ∈ C, e → e implies e ∈ C § A global state is consistent if it corresponds to a consistent cut N hci i C= § Note: we can characterise the execution of a system as a sequence of consistent global states i=1 § Slide 34 hci i is history of pi up to and including event eci i Slide 36 § The cut C corresponds to the state S = (sc1 +1 , . . . , scn +1 ) 1 N S0 → S1 → S2 → · · · Linearisation: § A global history that is consistent with the happened-before relation → is also called a linearisation or consistent run § The final events in a cut are its frontier: {eci | i ∈ {1, . . . , N }} i § A linearisation only passes through consistent global states § A state S is reachable from state S if there is a linearisation that passes thorough S and then S C ONSISTENT C UTS 17 C HANDY & L AMPORT ’ S S NAPSHOTS 18 C HANDY & L AMPOR T ’ S S NAPSHOTS § Determines a consistent global state § Takes care of messages that are in transit P1 * § Useful for evaluating stable global properties Slide 37 m3 m1 Properties: Slide 39 § Reliable communication and failure-free processes * P2 § Point-to-point message delivery is ordered m2 § Process/channel graph must be strongly connected P3 * § On termination, • processes hold only their local state components and • a set of messages that were in transit during the snapshot. Outline of the algorithm: Œ One process initiates the algorithm by • recording its local state and • sending a marker message over each outgoing channel  On receipt of a marker message over incoming channel c, Slide 38 • if local state not yet saved, save local state and send marker messages, or • if local state already saved, channel snapshot for c is complete Slide 40 C ONCURRENCY Ž Local contribution complete after markers received on all incoming channels Result for each process: § One local state snapshot § For each incoming channel, a set of messages received after performing the local snapshot and before the marker came down that channel C HANDY & L AMPORT ’ S S NAPSHOTS 19 C ONCURRENCY 20 C ONCURRENCY Concurrency in a Non-Distributed System: D ISTRIBUTED M UTUAL E XCLUSION Typical OS and multithreaded programming problems § Concurrent access to distributed resources § Prevent race conditions § Must prevent race conditions during critical regions § Critical sections Slide 41 Requirements: § Mutual exclusion Slide 43 • Locks • Semaphores Œ Safety: At most one process may execute the critical section at a time  Liveness: Requests to enter and exit the critical section eventually succeed • Monitors § Must apply mechanisms correctly Ž Ordering: Requests are processed in happened-before ordering • Deadlock • Starvation M ETHOD 1: C ENTRAL S ERVER Simplest approach: § Requests to enter and exit a critical section are sent to a lock server Concurrency in a Distributed System: § Permission to enter is granted by receiving a token Distributed System introduces more challenges § When critical section left, token is returned to the server § No directly shared resources (e.g., memory) Slide 42 Slide 44 § No global state § No global clock 0 § No centralised algorithms 1 2 0 2 0 1 3 Coordinator Queue is empty (a) 21 M ETHOD 1: C ENTRAL S ERVER 3 (b) 2 Release OK No reply § More concurrency D ISTRIBUTED M UTUAL E XCLUSION 1 Request OK Request 2 3 (c) 22 Properties: Properties: Slide 45 § Ring imposes an average delay of N/2 hops (limits scalability) § Easy to implement Slide 47 § Does not scale well § Token messages consume bandwidth § Failing nodes or channels can break the ring (token might be lost) § Central server may fail M ETHOD 2: T OKEN R ING Implementation: § All processes are organised in a logical ring structure § A token message is forwarded along the ring § Before entering the critical section, a process has to wait until the token comes by § Must retain the token until the critical section is left M ETHOD 3: U SING M ULTICASTS Slide 48 2 4 9 7 1 6 5 8 3 § Processes pi maintain a Lamport clock and can communicate pairwise § Processes are in one of three states: 2 0 L OGICAL C LOCKS Algorithm by Ricart & Agrawala: Slide 46 3 1 AND 0 1. Released: Outside of critical section 2. Wanted: Waiting to enter critical section 4 3. Held: Inside critical section 5 7 6 (a) M ETHOD 2: TOKEN R ING (b) 23 M ETHOD 3: U SING M ULTICASTS AND L OGICAL C LOCKS 24 E VALUATING D ISTRIBUTED A LGORITHMS General Properties: Process behaviour: § Performance Œ If a process wants to enter, it • multicasts a message Li , pi and Slide 49 • number of messages exchanged • waits until it has received a reply from every process • response/wait time • delay  If a process is in Released, it immediately replies to any request to enter the critical section Slide 51 • throughput: 1/(delay + executiontime) • complexity: O () Ž If a process is in Held, it delays replying until it is finished with the critical section § Efficiency  If a process is in Wanted, it replies to a request immediately only if the requesting timestamp is smaller than the one in its own request • resource usage: memory, CPU, etc. § Scalability § Reliability • number of points of failure (low is good) M UTUAL E XCLUSION : A C OMPARISON Messages Exchanged: Enters critical region 8 § Messages per entry/exit of critical section 12 8 OK 2 1 OK OK 8 1 12 (a) Slide 50 12 • Centralised: 3 • Ring: 1 → ∞ • Multicast: 2(n − 1) 0 0 0 OK (b) 2 2 1 Enters critical region Delay: (c) Slide 52 • Centralised: 2 • Ring: 0 → n − 1 • Multicast: 2(n − 1) Properties: § Multicast leads to increasing overhead (more sophisticated algorithm using only subsets of peer processes exists) Reliability: § Problems that may occur • Centralised: coordinator crashes • Ring: lost token, process crashes • Multicast: any process crashes § Susceptible to faults E VALUATING D ISTRIBUTED A LGORITHMS § Delay before entering critical section 25 M UTUAL E XCLUSION : A C OMPARISON 26 ...
View Full Document

Ask a homework question - tutors are online