11_directory_slides.pdf - Lecture 11: Directory-Based Coherence + Implementing Synchronization, Parallel Computing, Stanford CS149, Fall 2019


This preview shows page 1 - 10 out of 91 pages.

Parallel Computing Stanford CS149, Fall 2019 Lecture 11: Directory-Based Coherence + Implementing Synchronization

What’s Due
- Nov 1: Assignment 3: A Simple Renderer in CUDA
- Nov 4: Written Assignment 3
- Nov 5: Midterm
  - Open book, open notes
  - Review session on Nov 3
Today’s topics
- A discussion of directory-based cache coherence
- Efficiently implementing synchronization primitives
  - Primitives for ensuring mutual exclusion
    - Locks
    - Atomic primitives (e.g., atomic_add)
    - Transactions (later in the course)
  - Primitives for event signaling
    - Barriers
- OpenMP
  - Parallelizing loops

Review: MSI state transition diagram *

Notation: "A / B" means that if action A is observed by the cache controller, action B is taken; "--" means no action. PrRd and PrWr are local processor initiated transactions. BusRd and BusRdX are remote processor (coherence) initiated transactions. BusWB = write back the dirty line to memory.

M (Modified)
- PrRd / --: stay in M
- PrWr / --: stay in M
- BusRd / BusWB: go to S
- BusRdX / BusWB: go to I

S (Shared)
- PrRd / --: stay in S
- BusRd / --: stay in S
- PrWr / BusRdX: go to M
- BusRdX / --: go to I

I (Invalid)
- PrRd / BusRd: go to S
- PrWr / BusRdX: go to M

* Remember, all caches are carrying out this logic independently to maintain coherence.
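
The MSI transitions can also be written as a lookup table. The Python sketch below is only an illustration of the diagram for a single cache line in a single cache; the names MSI, step, and run are hypothetical and do not come from the lecture. It assumes that bus events seen for a line in state I are simply ignored.

```python
# MSI transitions for one cache line, keyed by (current state, observed event).
# Each value is (next state, action taken by this cache controller).
# An action of None corresponds to "--" in the diagram.
MSI = {
    ("I", "PrRd"):   ("S", "BusRd"),
    ("I", "PrWr"):   ("M", "BusRdX"),
    ("S", "PrRd"):   ("S", None),
    ("S", "PrWr"):   ("M", "BusRdX"),
    ("S", "BusRd"):  ("S", None),
    ("S", "BusRdX"): ("I", None),
    ("M", "PrRd"):   ("M", None),
    ("M", "PrWr"):   ("M", None),
    ("M", "BusRd"):  ("S", "BusWB"),
    ("M", "BusRdX"): ("I", "BusWB"),
}


def step(state, event):
    """Return (next_state, action) for one event.

    Assumption: events with no entry in the table (bus traffic seen
    while the line is Invalid) leave the state unchanged.
    """
    return MSI.get((state, event), (state, None))


def run(events, state="I"):
    """Apply a list of events in order and record each step."""
    trace = []
    for event in events:
        state, action = step(state, event)
        trace.append((event, state, action))
    return trace


if __name__ == "__main__":
    # Local read, local write, then a remote read and a remote write.
    for row in run(["PrRd", "PrWr", "BusRd", "BusRdX"]):
        print(row)
```

Keeping the protocol as data rather than nested conditionals makes it easy to compare each entry against the diagram.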
Example

Consider this sequence of loads and stores to addresses X and Y by processors P0 and P1. Assume that X and Y contain value 0 at the start of execution.

Table columns (one row per operation): Operation, Hit/Miss, Bus, P0 state, P1 state

- P0: LD X
- P0: LD X
- P0: ST X 1
- P0: ST X 2
- P1: ST X 3
- P1: LD X
- P0: LD X
- P0: ST X 4
- P1: LD X
- P0: LD Y
- P0: ST Y 1
- P1: ST Y 2
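
One way to check a hand-worked answer to this exercise is to replay the sequence through a small MSI simulation. The sketch below is a rough illustration under stated assumptions, not the lecture's reference solution: it models two caches with unlimited capacity (no evictions), ignores the stored values because they do not affect the coherence state, and reports the bus column rather than hit/miss ("--" means the access needed no bus transaction). The names TRACE, access, and simulate are hypothetical.

```python
# Operations from the example as (processor, operation, address).
# Stored values are omitted: they do not influence the MSI state.
TRACE = [
    ("P0", "LD", "X"), ("P0", "LD", "X"), ("P0", "ST", "X"), ("P0", "ST", "X"),
    ("P1", "ST", "X"), ("P1", "LD", "X"), ("P0", "LD", "X"), ("P0", "ST", "X"),
    ("P1", "LD", "X"), ("P0", "LD", "Y"), ("P0", "ST", "Y"), ("P1", "ST", "Y"),
]


def access(caches, proc, op, addr):
    """Apply one load or store and return the list of bus actions it caused."""
    state = caches[proc].get(addr, "I")
    if op == "LD":
        # A load is satisfied locally in S or M; from I it issues BusRd.
        bus = [] if state in ("S", "M") else ["BusRd"]
        new_state = "S" if bus else state
    else:
        # A store needs exclusive ownership; only M already has it.
        bus = [] if state == "M" else ["BusRdX"]
        new_state = "M"
    if bus:
        # Every other cache snoops the transaction and reacts on its own.
        for other, lines in caches.items():
            if other == proc:
                continue
            remote = lines.get(addr, "I")
            if remote == "M":
                bus.append("BusWB")   # dirty copy is written back
            if bus[0] == "BusRdX":
                lines[addr] = "I"     # remote copies are invalidated
            elif remote == "M":
                lines[addr] = "S"     # BusRd downgrades M to S
    caches[proc][addr] = new_state
    return bus


def simulate(trace):
    """Return rows of (proc, op, addr, bus, P0 state, P1 state)."""
    caches = {"P0": {}, "P1": {}}
    rows = []
    for proc, op, addr in trace:
        bus = access(caches, proc, op, addr)
        rows.append((proc, op, addr, "+".join(bus) or "--",
                     caches["P0"].get(addr, "I"), caches["P1"].get(addr, "I")))
    return rows


if __name__ == "__main__":
    for row in simulate(TRACE):
        print(*row)
```

The state columns printed for each row refer to the address touched by that operation, so rows for X and rows for Y track separate cache lines.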

Directory-based cache coherence
What you should know
- What limits the scalability of snooping-based approaches to cache coherence?
- How does a directory-based scheme avoid these problems?
- How can the storage overhead of the directory structure be reduced? (and at what cost?)

Implementing cache coherence

[Figure: four processors, each with a local cache, connected by an interconnect to memory and I/O]

The snooping cache coherence protocols discussed last week relied on broadcasting coherence information to all processors over the chip interconnect. Every time a cache miss occurred, the triggering cache communicated with all other caches!

We discussed what information was communicated and what actions were taken to implement the coherence protocol.

We discussed briefly how to implement broadcasts on an interconnect. One example is to use a shared bus for the interconnect:
- Efficient broadcast
- Scalability of buses is limited by bus bandwidth
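
A back-of-the-envelope calculation shows why broadcast on a shared bus stops scaling. The sketch below is illustrative only: the function names are hypothetical, and every number in the example run (miss rate, bytes moved per miss, bus bandwidth) is an assumption chosen for illustration, not a figure from the lecture.

```python
def snoop_lookups_per_sec(num_procs, misses_per_sec_per_proc):
    """Tag lookups caused by broadcasts.

    Each miss is broadcast, so every one of the other (num_procs - 1)
    caches has to check its tags. The total grows roughly with the
    square of the processor count.
    """
    return num_procs * misses_per_sec_per_proc * (num_procs - 1)


def bus_utilization(num_procs, misses_per_sec_per_proc,
                    bytes_per_miss, bus_bytes_per_sec):
    """Fraction of bus bandwidth demanded; above 1.0 the bus is saturated."""
    demand = num_procs * misses_per_sec_per_proc * bytes_per_miss
    return demand / bus_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    MISSES_PER_SEC = 10_000_000        # per processor (assumed)
    BYTES_PER_MISS = 64                # one cache line per miss (assumed)
    BUS_BYTES_PER_SEC = 10_000_000_000 # shared bus bandwidth (assumed)

    for p in (2, 4, 8, 16, 32):
        lookups = snoop_lookups_per_sec(p, MISSES_PER_SEC)
        util = bus_utilization(p, MISSES_PER_SEC, BYTES_PER_MISS,
                               BUS_BYTES_PER_SEC)
        print(f"P={p:2d}  snoop lookups/s={lookups:.2e}  bus utilization={util:.2f}")
```

The bus demand grows linearly with the number of processors while the bus bandwidth stays fixed, so some processor count always exists beyond which the bus is the bottleneck.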
Problem: scaling cache coherence to large machines

[Figure: four processors, each with a local cache and its own memory, connected by an interconnect]

Recall the idea of non-uniform memory access (NUMA) shared memory systems: locating regions of memory near the processors increases scalability. It yields higher aggregate bandwidth and reduced latency (especially when there is locality in the application).

But the efficiency of a NUMA system does little good if the coherence protocol can’t also be scaled!