COS 226 Chapter 7 - Spin Locks and Contention

Acknowledgement
Some of the slides are taken from the companion slides for "The Art of Multiprocessor Programming" by Maurice Herlihy and Nir Shavit.

Focus so far: correctness and progress. The models were accurate but idealized; the protocols were elegant and important, but naive.

New focus: performance. The models become more complicated, but we still focus on principles; the protocols remain elegant and important, and become realistic.

Kinds of architectures
- SISD (uniprocessor): single instruction stream, single data stream.
- SIMD (vector): single instruction, multiple data.
- MIMD (multiprocessors): multiple instructions, multiple data. This is our space.

MIMD architectures (shared-bus or distributed) raise several concurrency issues:
- Memory contention: not all processors can access the same memory location at the same time; if they try, some of them have to queue.
- Contention for the communication medium: if everyone wants to communicate at the same time, some of them will have to wait.
- Communication latency: it takes time for a processor to communicate with memory or with another processor.

New goals: think about performance, not just correctness and progress; understand the underlying architecture; and understand how the architecture affects performance.

Start with mutual exclusion. What should you do if you can't get a lock?
- Keep trying ("spin" or "busy-wait"). Good if delays are short. This is our focus.
- Give up the processor: suspend yourself and ask the scheduler to run another thread on your processor. Good if delays are long, and always the right choice on a uniprocessor.

Basic spin-lock: spin on the lock, enter the critical section, and reset the lock upon exit. The lock itself suffers from contention.

Contention: when multiple threads try to acquire a lock at the same time. High contention means there are many such threads; low contention is the opposite.

Welcome to the real world: the Java Lock interface, from the java.util.concurrent.locks package.

    Lock mutex = new LockImpl(…);
    …
    mutex.lock();
    try {
      …
    } finally {
      mutex.unlock();
    }

Why don't we just use the Filter or Bakery locks? Their principal drawback is the need to read and write n distinct locations, where n is the maximum number of concurrent threads, so these locks require space linear in n.

What about the Peterson lock?

    class Peterson implements Lock {
      private boolean[] flag = new boolean[2];
      private int victim;

      public void lock() {
        int i = ThreadID.get();            // either 0 or 1
        int j = 1 - i;
        flag[i] = true;                    // I'm interested
        victim = i;                        // you go first
        while (flag[j] && victim == i) {}  // wait
      }
    }

Why does the Peterson lock fail in practice? It is not our logic that fails, but our assumptions about the real world. We assumed that read and write operations are atomic, and our proof relied on the assumption that any two memory accesses by the same thread, even to different variables, take place in program order. Modern multiprocessors do not guarantee program order: compilers reorder instructions to enhance performance, and the hardware buffers writes to shared memory, so a write does not necessarily take effect when it is issued, only when the buffer is written back. How can one fix this?
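One concrete fix in Java (my sketch, not shown in the slides) is to route the shared variables through java.util.concurrent.atomic and volatile, so the JVM inserts the required barriers for us. Declaring the array itself volatile would not help, because array elements are not volatile; one AtomicBoolean per flag avoids that trap. ThreadID.get() is assumed to be the helper from the slides, returning 0 or 1.

    import java.util.concurrent.atomic.AtomicBoolean;

    class PetersonFixed {
      private final AtomicBoolean[] flag =
          { new AtomicBoolean(false), new AtomicBoolean(false) };
      private volatile int victim;

      public void lock() {
        int i = ThreadID.get();
        int j = 1 - i;
        flag[i].set(true);                         // I'm interested
        victim = i;                                // volatile write acts as a barrier
        while (flag[j].get() && victim == i) {}    // spin until it is my turn
      }

      public void unlock() {
        flag[ThreadID.get()].set(false);           // no longer interested
      }
    }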
Memory barriers (or memory fences) can be used to force outstanding memory operations to take effect. It is the programmer's responsibility to know where to insert them, and barriers are expensive, so we want to use them only where necessary.

Review: Test-and-Set
- A test-and-set (TAS) register holds a Boolean value.
- Test-and-set swaps true with the current value and returns the prior value, which tells you whether the register was already true.
- You can reset it just by writing false.
- TAS is also known as getAndSet; in Java it is provided by AtomicBoolean in the java.util.concurrent.atomic package. A sequential specification:

    public class AtomicBoolean {
      boolean value;

      public synchronized boolean getAndSet(boolean newValue) {
        boolean prior = value;
        value = newValue;
        return prior;
      }
    }

    AtomicBoolean lock = new AtomicBoolean(false);
    …
    boolean prior = lock.getAndSet(true);   // swap old and new values

Test-and-Set locks
- The lock is free when the value is false and taken when it is true.
- Acquire the lock by calling getAndSet(true) ("test-and-set"): if the result is false, you win; if it is true, you lose and must try again.
- Release the lock by writing false.

    class TASLock {
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        while (state.getAndSet(true)) {}   // keep trying until the lock is acquired
      }

      void unlock() {
        state.set(false);                  // release by resetting state to false
      }
    }

Performance experiment: n threads together increment a shared counter one million times, each increment protected by the lock. Since the increments are a sequential bottleneck we expect no speedup, but ideally the total time should stay roughly flat as threads are added. (A sketch of such a harness appears below.)

Mystery #1: with the TAS lock, the measured time instead grows sharply with the number of threads, far above the ideal curve. What is going on?
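The slides show only the resulting graph. A minimal harness along these lines (the class name, argument handling, and crude timing are my own; it assumes the TASLock class above) might look like:

    // Shared-counter experiment: n threads share 1,000,000 increments.
    public class CounterBench {
      static long counter = 0;
      static final TASLock lock = new TASLock();   // swap in TTASLock, BackoffLock, ...

      public static void main(String[] args) throws InterruptedException {
        final int n = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        final int perThread = 1_000_000 / n;
        Thread[] workers = new Thread[n];
        long start = System.nanoTime();
        for (int t = 0; t < n; t++) {
          workers[t] = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
              lock.lock();
              try { counter++; } finally { lock.unlock(); }   // critical section
            }
          });
          workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(n + " threads: "
            + (System.nanoTime() - start) / 1_000_000 + " ms");
      }
    }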
Test-and-Test-and-Set locks
- Lurking stage: wait until the lock "looks" free, spinning while an ordinary read returns true (lock taken).
- Pouncing stage: as soon as the lock looks available (the read returns false), call test-and-set to acquire it. If the TAS loses, go back to lurking.

    class TTASLock {
      AtomicBoolean state = new AtomicBoolean(false);

      void lock() {
        while (true) {
          while (state.get()) {}         // wait until the lock looks free
          if (!state.getAndSet(true))    // then try to acquire it
            return;
        }
      }

      void unlock() {
        state.set(false);
      }
    }

Mystery #2: in the same experiment, the TTAS lock performs much better than the TAS lock, yet both remain far above the ideal curve.

The mystery: in our model, TAS and TTAS do exactly the same thing; they are provably equivalent. Except that they aren't in field tests: TTAS performs much better than TAS, and neither approaches the ideal. Why is the TTAS lock so good (so much better than TAS), and why is it so bad (so much worse than ideal)?

Opinion: our memory abstraction is broken. We need a more detailed model.

Bus-based architectures
- Random access memory: tens of cycles away.
- Shared bus: a broadcast medium with one broadcaster at a time; processors and memory all "snoop" on it.
- Per-processor caches: small and fast (one or two cycles), holding both address and state information.

Jargon watch
- Cache hit: "I found what I wanted in my cache." Good Thing™.
- Cache miss: "I had to shlep all the way to memory for that data." Bad Thing™.

Cave canem: this model is still a simplification, but not in any essential way; it illustrates the basic principles. We will discuss the complexities later.

What happens on a load
- A processor that needs data issues a load request on the bus ("gimme data").
- Memory responds with the data, unless another processor already has it cached, in which case that processor responds instead ("I got data").
- The data now lives in more than one cache. If a processor then modifies its own cached copy, what's up with the other copies?

Cache coherence: we have lots of copies of the data — the original copy in memory and cached copies at the processors. When some processor modifies its own copy, what do we do with the others? How do we avoid confusion?

Write-back caches accumulate changes in the cache and write them back only when needed: when the cache entry is needed for something else, or when another processor wants the data.
The write-back coherence protocol answers the question by invalidation: to modify shared data, a processor invalidates the other cached entries. This requires a non-trivial protocol. Each cache entry is in one of three states:
- Invalid: contains raw seething bits (meaningless).
- Valid: may be read but not written, because the data may be cached elsewhere.
- Dirty: the data has been modified; this cache must intercept other processors' load requests, and must write the data back to memory before reusing the entry.

Invalidation in action: a processor that wants to write announces it on the bus ("mine, all mine!"); the other caches lose read permission (their entries become invalid) and this cache acquires write permission (its entry becomes dirty). If another processor later asks for the data, memory provides it only if it is not present in any cache; otherwise the owner responds ("here it is!"), so memory need not be updated immediately (which would be expensive). At the end of the day the copies agree again: reading is OK, writing is not, until the protocol is run once more.

Back to TAS locks: how does a TASLock perform on a write-back, shared-bus architecture?
- Every getAndSet() call uses the bus, so each spinner delays all the other threads, even those not waiting for the lock.
- Each getAndSet() call forces the other processors to discard their cached copies of the lock, so every spinner takes a cache miss every time and must use the bus to fetch the updated value.
- When the thread holding the lock wants to release it, it may be delayed because the spinners are monopolizing the bus.

What about the TTASLock? Suppose thread A holds the lock.
- The first time thread B reads the lock it takes a cache miss and uses the bus to fetch the value.
- As long as A holds the lock, B simply rereads its cached copy — a cache hit every time — so B produces no extra bus traffic.
- However, when A releases the lock it writes false to the lock variable, which invalidates the spinners' cached copies. Each spinner takes a cache miss and uses the bus to read the new value; they all call getAndSet() to try to acquire the lock; the first one to succeed invalidates the others, who must then reread the value again. The result is a storm of bus traffic.

Local spinning — threads repeatedly rereading cached values instead of repeatedly using the bus — is what makes TTAS beat TAS.

Exponential backoff. In the TTASLock, a thread first reads the lock and, if it appears free, attempts to acquire it. If I see the lock free but another thread acquires it before I can, there must be high contention for that lock, so it is better to back off and try again later.

For how long should a thread back off? Rule of thumb: the larger the number of unsuccessful tries, the higher the contention, and the longer the thread should back off.

What about lock-step? If all the threads back off for the same amount of time, they will simply collide again, so each thread should back off for a random amount of time. Each time a thread tries and fails to get the lock, it doubles its expected back-off time, up to a fixed maximum.
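This rule can be packaged as a small helper class (a sketch of my own, along the lines of the textbook's Backoff helper; the slides instead inline the logic in the lock shown next). The minimum and maximum delays are supplied by the caller:

    import java.util.Random;

    public class Backoff {
      private final int maxDelay;
      private int limit;                              // current ceiling, starts at the minimum
      private final Random random = new Random();

      public Backoff(int minDelay, int maxDelay) {
        this.limit = minDelay;
        this.maxDelay = maxDelay;
      }

      public void backoff() throws InterruptedException {
        int delay = random.nextInt(limit);            // randomize to avoid lock-step retries
        limit = Math.min(maxDelay, 2 * limit);        // double the ceiling, within reason
        Thread.sleep(delay);                          // pause for the chosen duration
      }
    }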
Approach: whenever a thread sees that the lock has become free but then fails to acquire it, it backs off before retrying.

Exponential Backoff Lock

    public class BackoffLock implements Lock {
      private AtomicBoolean state = new AtomicBoolean(false);
      private static final int MIN_DELAY = …, MAX_DELAY = …;

      public void lock() {
        int delay = MIN_DELAY;               // fix the minimum delay
        while (true) {
          while (state.get()) {}             // wait until the lock looks free
          if (!state.getAndSet(true))        // if we win, return
            return;
          sleep(random() % delay);           // otherwise back off for a random duration
          if (delay < MAX_DELAY)             // double the maximum delay, within reason
            delay = 2 * delay;
        }
      }

      public void unlock() {
        state.set(false);
      }
    }

Spin-waiting overhead: in measurements, the backoff lock clearly beats the TTAS lock.

Backoff: other issues
- Good: easy to implement, and it beats the TTAS lock.
- Bad: the parameters must be chosen carefully. Performance is sensitive to the choice of minimum and maximum delays and to the number of processors and their speed, so there is no single setting that works for all platforms and machines.

BackoffLock drawbacks
- Cache-coherence traffic: all threads still spin on the same location.
- Critical-section underutilization: threads may delay longer than necessary after the lock becomes free.

Idea: avoid useless invalidations by keeping a queue of threads. Each thread notifies the next in line, without bothering the others.

Queue locks
- Cache-coherence traffic is reduced, since each thread spins on a different location.
- There is no need to guess when to attempt to acquire the lock, which increases critical-section utilization.
- First-come-first-served fairness.
Anderson queue lock
- The lock keeps an array of flags, initially {true, false, …, false}, and an index tail, initially 0.
- To acquire the lock, a thread performs getAndIncrement on tail to claim the next slot, then spins until the flag in its slot becomes true ("mine!").
- While one thread holds the lock, later arrivals each claim their own slot and wait on their own flag.
- To release the lock, the holder sets its own flag back to false (so the slot can be reused) and sets its successor's flag to true, which releases exactly one waiter.

    class ALock implements Lock {
      boolean[] flags = {true, false, …, false};   // one flag per thread
      AtomicInteger next = new AtomicInteger(0);   // next slot to hand out
      ThreadLocal<Integer> mySlot;                 // this thread's slot

      public void lock() {
        mySlot = next.getAndIncrement() % n;       // take the next slot
        while (!flags[mySlot]) {}                  // spin until told to go
      }

      public void unlock() {
        flags[mySlot] = false;                     // prepare my slot for re-use
        flags[(mySlot + 1) % n] = true;            // tell the next thread to go
      }
    }

(Slide-style pseudocode: n is the capacity of the flags array, and mySlot is really kept in a thread-local variable. A filled-out version follows below.)

Thread-local variables are not stored in shared memory, so they require no synchronization and generate no coherence traffic. Although the flags array is shared, contention on its locations is minimized, since each thread spins on its own locally cached copy of a single array element.

Performance: shorter handover time than backoff; the curve is practically flat; scalable performance; FIFO fairness.

Anderson queue lock summary
- Good: the first truly scalable lock; simple and easy to implement.
- Bad: not space efficient — it needs one flag per thread per lock. What if the number of threads is unknown, or the number of actual contenders is small?
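A filled-out version of the ALock (my sketch: the slides omit the constructor, the thread-local initialization, and visibility details). Each flag is an AtomicBoolean so that a spinning thread is guaranteed to see its neighbour's update; cache-line padding is ignored here. The capacity must be at least the maximum number of threads that may contend for the lock.

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;

    public class ALock {
      private final AtomicBoolean[] flags;
      private final AtomicInteger next = new AtomicInteger(0);
      private final int size;
      private final ThreadLocal<Integer> mySlot = ThreadLocal.withInitial(() -> 0);

      public ALock(int capacity) {
        size = capacity;
        flags = new AtomicBoolean[capacity];
        for (int i = 0; i < capacity; i++)
          flags[i] = new AtomicBoolean(i == 0);    // slot 0 starts out enabled
      }

      public void lock() {
        int slot = next.getAndIncrement() % size;  // claim the next slot
        mySlot.set(slot);
        while (!flags[slot].get()) {}              // spin on my own slot only
      }

      public void unlock() {
        int slot = mySlot.get();
        flags[slot].set(false);                    // make my slot reusable
        flags[(slot + 1) % size].set(true);        // hand the lock to the next slot
      }
    }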
CLH queue lock
- A virtual (implicit) linked list keeps track of the queue of waiting threads.
- Each thread's status lives in its own node: true means it has acquired, or wants to acquire, the lock; false means it is finished with the lock and has released it.
- Each thread watches its predecessor's status. Initially the queue is idle: the tail points to a node whose locked field is false, meaning the lock is free.

Walkthrough: a thread that wants the lock creates a node (locked = true) and swaps it into the tail, remembering the previous tail as myPred. If myPred's locked field is false, the thread has the lock. A second arrival enqueues itself the same way and spins on its predecessor's locked field — actually on its locally cached copy of it. The nodes linked through these myPred references form an implicit linked list. When the holder releases the lock, it sets its own node's locked field to false; its successor's spin sees the change (bingo!) and the successor acquires the lock.

Space usage: with L locks and N threads, the ALock needs O(LN) space, while the CLH lock needs only O(L + N).

    class CLHLock implements Lock {
      AtomicReference<Qnode> tail;              // queue tail; initially holds a node with locked == false
      ThreadLocal<Qnode> myNode = new Qnode();  // thread-local Qnode (slide-style shorthand)

      public void lock() {
        Qnode pred = tail.getAndSet(myNode);    // add my node to the tail of the queue
        while (pred.locked) {}                  // spin until my predecessor releases the lock
      }

      public void unlock() {
        myNode.locked = false;                  // notify my successor
        myNode = pred;                          // recycle my predecessor's node
      }
    }

(We don't actually reuse myNode as written here; the code in the book shows how the predecessor's node is properly recycled — in particular, pred has to be remembered across the call, for example in another thread-local variable.)
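The slides never show the CLH Qnode itself. A minimal version consistent with the code above (my sketch; the flag is volatile so the spinning successor is guaranteed to see the release) is just one field:

    class Qnode {
      // true:  the owner has acquired, or wants to acquire, the lock
      // false: the owner is finished with the lock and has released it
      volatile boolean locked = false;
    }

Each thread spins on its predecessor's node, a location that nobody else writes until the hand-off, which is what keeps the coherence traffic low.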
CLH lock summary
- Good: releasing the lock touches only the releaser's own node, which only its immediate neighbour in the queue is watching, and the lock needs only a small, constant amount of space per thread.

MCS lock
- FIFO order.
- Threads spin on local memory only.
- Small, constant-size overhead.

The MCS queue lock is similar to the CLH lock, but the linked list is explicit instead of implicit: each node in the queue has a next field.

Walkthrough: an acquiring thread creates a node and swaps it into the tail. If there was a predecessor, the thread marks its own node locked, links itself in through the predecessor's next field, and spins on its own node's locked field until the predecessor hands the lock over ("yes!"). If there was no predecessor, it has the lock immediately.

    class Qnode {
      boolean locked = false;
      Qnode next = null;
    }

    class MCSLock implements Lock {
      AtomicReference<Qnode> tail;

      public void lock() {
        Qnode qnode = new Qnode();              // make a QNode
        Qnode pred = tail.getAndSet(qnode);     // add my node to the tail of the queue
        if (pred != null) {                     // fix things up if the queue was non-empty
          qnode.locked = true;
          pred.next = qnode;
          while (qnode.locked) {}               // wait until unlocked
        }
      }
      public void unlock() {
        if (qnode.next == null) {               // missing successor?
          if (tail.compareAndSet(qnode, null))  // if really no successor, return
            return;
          while (qnode.next == null) {}         // otherwise wait for the successor to catch up
        }
        qnode.next.locked = false;              // pass the lock to the successor
      }
    }

(Here qnode is the node this thread created in lock(); in a complete implementation it is kept in a thread-local variable so that unlock() can find it.)

Release walkthrough: the releasing thread first looks at its next field. If it is null, it tries to compareAndSet the tail from its own node back to null. If the CAS fails, then by looking at the queue it can see that another thread is active: that thread has already swapped itself into the tail but has not yet set the next pointer, so the releaser must wait for it to finish, spinning until next becomes non-null. It then hands over by setting the successor's locked field to false, and the successor acquires the lock.

Abortable locks: what if you want to give up waiting for a lock? For example, a timeout expires, or the database transaction you are running is aborted by the user.
- Back-off lock: aborting is trivial — just return from the lock() call. Extra benefit: no cleaning up; the abort is wait-free, an immediate return.
- Queue locks: a thread in the middle of the queue can't just quit ("I'm out"), because the thread in line behind it would starve. We need a graceful way out.

Abortable CLH lock: when a thread gives up, removing its node from the queue in a wait-free way is hard. Idea for a lazy approach: let the successor deal with it. When a thread times out, it marks its node as abandoned; the successor notices and starts spinning on the abandoned node's predecessor instead.

Time-out lock setup: each node holds a pointer to its predecessor (or null). A distinguished AVAILABLE node means the lock is free; a null predecessor pointer means the lock has not yet been released or aborted; a pointer to AVAILABLE means the predecessor has released the lock.
Time-out Lock (TOLock)

    public class TOLock implements Lock {
      static Qnode AVAILABLE = new Qnode();        // distinguished node signifying a free lock
      AtomicReference<Qnode> tail;                 // tail of the queue
      ThreadLocal<Qnode> myNode = new ThreadLocal<>();  // remember my node

      public boolean lock(long timeout) {
        Qnode qnode = new Qnode();                 // create and initialize my node
        myNode.set(qnode);
        qnode.prev = null;
        Qnode myPred = tail.getAndSet(qnode);      // swap my node with the tail
        if (myPred == null || myPred.prev == AVAILABLE) {
          return true;                             // predecessor absent or released: we are done
        }
        …                                          // continued below
        long start = now();                        // keep trying for a while
        while (now() - start < timeout) {
          Qnode predPred = myPred.prev;            // spin on my predecessor's prev field
          if (predPred == AVAILABLE) {
            return true;                           // predecessor released the lock
          } else if (predPred != null) {
            myPred = predPred;                     // predecessor aborted: advance past it
          }
        }
        // Timed out. Do I have a successor?
        if (!tail.compareAndSet(qnode, myPred))    // CAS fails: I do have a successor,
          qnode.prev = myPred;                     //   so tell it about myPred
        return false;                              // CAS succeeds: no successor, nothing to clean up
      }

      public void unlock() {
        Qnode qnode = myNode.get();
        if (!tail.compareAndSet(qnode, null))      // CAS fails: a successor exists; tell it it can enter
          qnode.prev = AVAILABLE;
        // CAS succeeds: tail is set to null; no successor is waiting, so there is no clean-up to do
      }
    }

(The remaining slides are additional MCS queue-lock animation frames with no accompanying text.)
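As with the CLH lock, the slides never define the Qnode used by TOLock. A minimal version consistent with the code (my assumption; the field name prev is taken from the slides) is:

    class Qnode {
      // prev == null      : owner is still waiting or holds the lock
      // prev == AVAILABLE : owner has released the lock
      // prev == some node : owner aborted; successors should skip ahead to that node
      volatile Qnode prev = null;
    }

A waiting thread spins on myPred.prev, so all three cases can be distinguished with a single volatile read.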