

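MESI: events and local coherence controller responses and actions (s' refers to the next state)

Current state s | Local Read (LR) | Local Write (LW) | Local Eviction (EV) | Bus Read (BR) | Bus Write (BW) | Bus Upgrade (BU)
--------------- | --------------- | ---------------- | ------------------- | ------------- | -------------- | ----------------
Shared (S) | Do nothing | Issue bus upgrade; s' = M | s' = I | Respond shared | s' = I | s' = I
Exclusive (E) | Do nothing | s' = M | s' = I | Respond shared; s' = S | s' = I | Error
Modified (M) | Do nothing | Do nothing | Write data back; s' = I | Respond dirty; write data back; s' = S | Respond dirty; write data back; s' = I | Error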
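The S, E and M rows of the table can be encoded directly as a lookup table. This is a minimal Python sketch; the `State`/`Event` enums, `TRANSITIONS` and `next_state` are hypothetical names. The Invalid row is not part of this excerpt, so it is deliberately left out (looking it up raises `KeyError`). The two Error cells become exceptions: a bus upgrade is only issued by a cache that holds the line in S, so no other cache can legally hold that line in E or M.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"
    E = "Exclusive"
    M = "Modified"


class Event(Enum):
    LR = "Local Read"
    LW = "Local Write"
    EV = "Local Eviction"
    BR = "Bus Read"
    BW = "Bus Write"
    BU = "Bus Upgrade"


# (current state, event) -> (actions, next state) for the S, E and M rows.
# "Do nothing" is an empty action tuple; a missing s' means the state is unchanged.
TRANSITIONS = {
    (State.S, Event.LR): ((), State.S),
    (State.S, Event.LW): (("issue bus upgrade",), State.M),
    (State.S, Event.EV): ((), State.I),
    (State.S, Event.BR): (("respond shared",), State.S),
    (State.S, Event.BW): ((), State.I),
    (State.S, Event.BU): ((), State.I),
    (State.E, Event.LR): ((), State.E),
    (State.E, Event.LW): ((), State.M),
    (State.E, Event.EV): ((), State.I),
    (State.E, Event.BR): (("respond shared",), State.S),
    (State.E, Event.BW): ((), State.I),
    (State.M, Event.LR): ((), State.M),
    (State.M, Event.LW): ((), State.M),
    (State.M, Event.EV): (("write data back",), State.I),
    (State.M, Event.BR): (("respond dirty", "write data back"), State.S),
    (State.M, Event.BW): (("respond dirty", "write data back"), State.I),
}

# The "Error" cells: a snooped upgrade while we hold the only (E or M) copy.
ERRORS = {(State.E, Event.BU), (State.M, Event.BU)}


def next_state(state, event):
    """Return (actions, next_state) for one table cell."""
    if (state, event) in ERRORS:
        raise RuntimeError(f"protocol error: {event.name} snooped in state {state.name}")
    # KeyError for the Invalid row, which this sketch omits.
    return TRANSITIONS[(state, event)]
```

For example, `next_state(State.M, Event.BR)` yields the dirty-response-plus-writeback actions and a transition to S.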

Implementing Cache Coherence
- Snooping implementation
  - Origins in shared-memory-bus systems
  - All CPUs could observe all other CPUs' requests on the bus; hence "snooping"
    - Bus Read, Bus Write, Bus Upgrade
  - React appropriately to snooped commands
    - Invalidate shared copies
    - Provide up-to-date copies of dirty lines
      - Flush (writeback) to memory, or
      - Direct intervention (modified intervention or dirty miss)
- Snooping suffers from:
  - Scalability: shared buses are not practical
  - Ordering of requests without a shared bus
  - Lots of recent and ongoing work on scaling snoop-based systems
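The snooping reactions listed above can be sketched as a tiny Python simulation that tracks one line across several caches. Every bus command is broadcast to all other caches, each of which reacts as in the MESI table. Assumptions: dirty data is supplied by flushing it to memory (the first option above, not direct intervention); a read miss installs the line in E when no other cache responds, which is standard MESI behavior for the Invalid row that is not shown in this excerpt; `Cache`, `SnoopingBus` and the command strings are hypothetical names.

```python
class Cache:
    """One cache's copy of a single tracked line."""

    def __init__(self):
        self.state = "I"   # MESI state
        self.data = 0

    def snoop(self, cmd, memory):
        """React to a snooped bus command; return 'dirty', 'shared' or None."""
        if self.state == "I":
            return None                     # no copy: nothing to do
        if cmd == "BusRd":
            response = "shared"
            if self.state == "M":
                memory["line"] = self.data  # flush dirty data to memory
                response = "dirty"
            self.state = "S"                # E and M downgrade to S
            return response
        if cmd == "BusWr":
            response = None
            if self.state == "M":
                memory["line"] = self.data  # write data back before invalidating
                response = "dirty"
            self.state = "I"                # invalidate our copy
            return response
        if cmd == "BusUpgr":
            if self.state != "S":
                raise RuntimeError("bus upgrade snooped in E or M")
            self.state = "I"
            return None
        raise ValueError(f"unknown bus command {cmd}")


class SnoopingBus:
    def __init__(self, n_cpus):
        self.caches = [Cache() for _ in range(n_cpus)]
        self.memory = {"line": 0}

    def broadcast(self, src, cmd):
        # Every other cache observes the command: this is the "snoop".
        return [c.snoop(cmd, self.memory)
                for i, c in enumerate(self.caches) if i != src]

    def read(self, cpu):
        c = self.caches[cpu]
        if c.state == "I":                  # read miss
            responses = self.broadcast(cpu, "BusRd")
            c.data = self.memory["line"]    # any dirty owner has already flushed
            c.state = "S" if any(responses) else "E"
        return c.data

    def write(self, cpu, value):
        c = self.caches[cpu]
        if c.state == "I":                  # write miss
            self.broadcast(cpu, "BusWr")
        elif c.state == "S":                # write hit on a shared copy
            self.broadcast(cpu, "BusUpgr")
        c.state = "M"                       # E -> M silently; M stays M
        c.data = value
```

After `bus = SnoopingBus(3)`, `bus.write(0, 42)` followed by `bus.read(1)` returns 42: CPU 0's modified copy is flushed and both caches end in S. Replacing the flush with a direct cache-to-cache transfer would model modified intervention instead.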
Snoop Latency
- Snoop latency:
  - Must reach all nodes, return and combine responses
- Parallelism: fundamental advantage of snooping
  - Broadcast exposes parallelism, enables speculative latency reduction

[Figure: snoop transaction timeline; message labels LDir, RDir, XSnp, XRsp, CRsp, XRd, XDat, RDat, UDat]
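A back-of-the-envelope Python model of the two points above: the combined response waits for the slowest node, and broadcasting lets the memory read start speculatively in parallel with the snoop. This covers only a clean miss (no dirty copy anywhere), and the function name and cycle counts are hypothetical.

```python
def snoop_read_miss_latency(snoop_round_trips, memory_latency, speculative=True):
    """Latency of a clean read miss (no dirty copy anywhere) on a broadcast system.

    snoop_round_trips: per-node time to deliver the snoop and return its response.
    memory_latency: time to read the line from memory.
    """
    # The requester must hear from every node, so the combined response
    # is ready only when the slowest node has answered.
    combined_response = max(snoop_round_trips)
    if speculative:
        # Memory is read in parallel with the snoop; the data can be used
        # once the combined response confirms no cache holds it dirty.
        return max(combined_response, memory_latency)
    # Otherwise memory is accessed only after the snoop completes.
    return combined_response + memory_latency
```

With hypothetical round trips of 40, 55 and 70 cycles and an 80-cycle memory, speculation hides the snoop entirely (80 cycles instead of 150). When the slowest snoop is longer than memory, the snoop sets the latency.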

Alternative to Snooping
- Directory implementation
  - Extra bits stored in memory (the directory) record the MSI state of each line
  - The memory controller maintains coherence based on the current state
  - Other CPUs' commands are not snooped; instead:
    - The directory forwards relevant commands
  - Ideal filtering: each CPU only observes the commands it needs to observe
  - Meanwhile, bandwidth at the directory scales by adding memory controllers as you increase the size of the system
    - Leads to very scalable designs (100s to 1000s of CPUs)
- Directory shortcomings
  - Indirection through the directory has a latency penalty
  - Directory overhead for all memory, not just what is cached
  - If a shared line is dirty in another CPU's cache, the directory must forward the request, adding latency
  - This can severely impact performance of applications with heavy sharing (e.g. relational databases)
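A minimal Python sketch of a home-node directory: each entry records an MSI-style state plus the set of CPUs that may hold a copy, and requests produce messages only for those CPUs (the "ideal filtering" above). Assumptions: `DirEntry`, `Directory` and the message names are hypothetical, a previous dirty owner is simply kept as a sharer after a forwarded read, and the data replies themselves are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    state: str = "I"                           # I (uncached), S (shared) or M (one dirty owner)
    sharers: set = field(default_factory=set)  # CPUs that may hold a copy


class Directory:
    """Home-node directory for a range of lines (MSI-style sketch)."""

    def __init__(self):
        self.entries = {}
        self.messages = []                     # commands the directory forwarded

    def _entry(self, addr):
        return self.entries.setdefault(addr, DirEntry())

    def read(self, cpu, addr):
        e = self._entry(addr)
        if e.state == "M":
            owner = next(iter(e.sharers))
            # Dirty in another cache: forward the request to the owner (extra latency).
            self.messages.append(("fwd_read", owner, addr))
        e.state = "S"
        e.sharers.add(cpu)

    def write(self, cpu, addr):
        e = self._entry(addr)
        # Only CPUs recorded in the directory are contacted: no broadcast.
        for other in sorted(e.sharers - {cpu}):
            kind = "fwd_write" if e.state == "M" else "invalidate"
            self.messages.append((kind, other, addr))
        e.state = "M"
        e.sharers = {cpu}
```

If CPUs 0 and 1 read a line and CPU 2 then writes it, the directory sends exactly two invalidations; a later read by CPU 3 is forwarded to CPU 2, the dirty owner. CPUs that never touched the line receive nothing.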
Directory Protocol Latency
- Access to non-shared data
  - Overlap directory read with data read
  - Best possible latency given NUMA arrangement
- Access to shared data
  - Dirty miss, modified intervention
  - No inherent parallelism
  - Indirection adds latency
  - Minimum 3 hops, often 4 hops

[Figure: directory transaction timeline; message labels LDir, RDir, XSnp, XRd, RDat, UDat, XDat]
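The hop counts above can be made concrete by listing the nodes a read miss visits. This Python sketch assumes the requester, home (directory) node and owner are three distinct nodes, and treats the 4-hop case as the owner returning data through the home node; the function names and the per-hop latency are hypothetical.

```python
def read_miss_path(dirty_elsewhere, reply_via_home=False):
    """Nodes visited by a read miss, assuming requester, home and owner are distinct."""
    if not dirty_elsewhere:
        # Directory lookup overlaps the memory read at the home node.
        return ["requester", "home", "requester"]
    if reply_via_home:
        # Owner sends the data back through the home node.
        return ["requester", "home", "owner", "home", "requester"]
    # Owner replies directly to the requester (minimum for a dirty miss).
    return ["requester", "home", "owner", "requester"]


def hops(path):
    """Number of network traversals along a path of nodes."""
    return len(path) - 1


def miss_latency(path, hop_latency):
    """Serialized network latency: the hops cannot overlap, so they simply add."""
    return hops(path) * hop_latency
```

With a hypothetical 50-cycle hop, a non-shared miss costs 100 cycles of network time while a dirty miss costs 150 or 200, because each hop depends on the previous one.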

Memory Consistency
- How are memory references from different processors interleaved?
- If this is not well specified, synchronization becomes difficult or even impossible
  - The ISA must specify a consistency model
- Common example: Dekker's algorithm for synchronization
  - If each load is reordered ahead of the preceding store (as we assume for a baseline out-of-order CPU), both Proc0 and Proc1 enter the critical section, since each observes that the other's lock variable (A/B) is not set
- If the consistency model allows loads to execute ahead of stores, Dekker's algorithm no longer works
  - Common ISAs allow this: IA-32, PowerPC, SPARC, Alpha
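The Dekker failure can be reproduced with a small Python model that enumerates every global ordering of the two processors' operations. Proc0 sets flag A and then checks B; Proc1 sets B and then checks A. Letting a processor swap its store and load is a simplification standing in for a load executing ahead of an older store to a different address; it is not a model of any particular ISA's full consistency rules. All names are hypothetical.

```python
# Each operation is (cpu, kind, variable). Flags start at 0.
PROC0 = [(0, "st", "A"), (0, "ld", "B")]
PROC1 = [(1, "st", "B"), (1, "ld", "A")]


def interleavings(a, b):
    """All merges of a and b that keep each sequence's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(order):
    """Execute one global order; return (proc0_enters, proc1_enters)."""
    mem = {"A": 0, "B": 0}
    seen = {}
    for cpu, kind, var in order:
        if kind == "st":
            mem[var] = 1              # set my own flag
        else:
            seen[cpu] = mem[var]      # read the other processor's flag
    # A processor enters the critical section if the other's flag was clear.
    return (seen[0] == 0, seen[1] == 0)


def outcomes(loads_bypass_stores):
    """All (proc0_enters, proc1_enters) results the model allows."""
    results = set()
    for p0 in ([PROC0, PROC0[::-1]] if loads_bypass_stores else [PROC0]):
        for p1 in ([PROC1, PROC1[::-1]] if loads_bypass_stores else [PROC1]):
            for order in interleavings(p0, p1):
                results.add(run(order))
    return results
```

With program order enforced, `outcomes(False)` never contains `(True, True)`, so at most one processor enters. With loads allowed ahead of stores, the order "load B, load A, store A, store B" lets both processors see a clear flag and both enter.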
Fall '09, Prof. Guri Sohi
