
[Figure: operations LD A, ST B, LD B linked by program order and RAW edges. Cycle indicates that execution is incorrect.]

Anatomy of a cycle
[Figure: Proc 1 and Proc 2 executing ST A, LD A, ST B, LD B; edges labeled program order, WAR (incoming invalidate), and RAW (cache miss).]

High Performance Sequential Consistency
[Figure: out-of-order processor core with load queue and in-order commit; bus writes and bus upgrades from other processors (P) arrive over the system address bus.]
  • Load queue records all speculative loads
  • Bus writes/upgrades are checked against LQ
  • Any matching load gets marked for replay
  • At commit, loads are checked and replayed if necessary
  • Results in machine flush, since load-dependent ops must also replay
  • Practically, conflicts are rare, so expensive flush is OK

Relaxed Consistency Models
  • Key insight: only synchronizing references need ordering
  • Hence, relax memory for all references
  • Enable high performance OOO implementation
  • Require programmer to label synchronization references
  • Hardware must carefully order these labeled references
  • All other references can be performed out of order
  • Labeling schemes:
      - Explicit synchronization ops (acquire/release)
      - Memory fence or memory barrier ops: all preceding ops must finish before following ones begin
  • Often: fence ops cause pipeline drain in modern OOO machine
  • More: ECE/CS 757

Coherent Memory Interface
[Figure: out-of-order processor core with load Q, store Q, storethrough Q, and critical word bypass; level 1 tag array and data array; WB buffer, MSHR, snoop queue, WB buffer, fill buffer; level 2 tag array and data array; system address and response bus; system data bus.]
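The load-queue check listed under High Performance Sequential Consistency can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `LoadEntry`, `LoadQueue`, `record`, `snoop`, and `commit_oldest` are hypothetical, addresses are treated as whole-block addresses, and the "machine flush" is modeled as squashing every load still in the queue rather than every in-flight instruction.

```python
from dataclasses import dataclass, field


@dataclass
class LoadEntry:
    """One speculative load tracked by the load queue (hypothetical layout)."""
    seq: int              # program-order sequence number
    addr: int             # block address the load read from
    replay: bool = False  # set when a remote write/upgrade hits this address


@dataclass
class LoadQueue:
    entries: list = field(default_factory=list)

    def record(self, seq, addr):
        # Every speculatively executed load stays here until it commits.
        self.entries.append(LoadEntry(seq, addr))

    def snoop(self, addr):
        # A bus write or upgrade from another processor is checked against
        # the queue; any load to the same block is marked for replay.
        for entry in self.entries:
            if entry.addr == addr:
                entry.replay = True

    def commit_oldest(self):
        # Loads commit in order. A marked load at the head forces a flush:
        # it and all younger loads are squashed and must re-execute.
        # Returns (committed_seq or None, list of squashed seqs).
        head = self.entries[0]
        if head.replay:
            squashed = [entry.seq for entry in self.entries]
            self.entries.clear()
            return None, squashed
        self.entries.pop(0)
        return head.seq, []


lq = LoadQueue()
lq.record(seq=1, addr=0xA0)
lq.record(seq=2, addr=0xB0)
lq.record(seq=3, addr=0xC0)
lq.snoop(0xB0)               # remote store to the block that load 2 read
first = lq.commit_oldest()   # load 1 was not hit, so it commits normally
second = lq.commit_oldest()  # load 2 was hit: loads 2 and 3 are squashed
```

In this model the check costs nothing unless a remote write actually matches a queued load, which mirrors the slide's point that the expensive flush is acceptable because conflicts are rare.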

Executing Multiple Threads, ECE 752, Mikko H. Lipasti

Split Transaction Bus
[Figure: (a) Simple bus with atomic transactions (Req, Read from DRAM, Rsp, and Xmit for blocks A and B). (b) Split-transaction bus with separate requests and responses (Req, Read from DRAM, Rsp, and Xmit for blocks A through D).]
  • “Packet switched” vs. “circuit switched”
  • Release bus after request issued
  • Allow multiple concurrent requests to overlap memory latency
  • Complicates control, arbitration, and coherence protocol
  • Transient states for pending blocks (e.g. “req. issued but not completed”)

Example: MSI (SGI Origin like, directory, invalidate)
[Figures: High Level; High Level, Busy States; High Level, Busy States, Races]

Multithreading
  • Basic idea: CPU resources are expensive and should not be left idle
  • 1960’s: Virtual memory and multiprogramming
      - VM/MP invented to tolerate latency to secondary storage (disk/tape/etc.)
      - Processor:secondary storage cycle time ratio: microseconds to tens of milliseconds (1:10000 or more)
      - OS context switch used to bring in other useful work while waiting for page fault or explicit read/write
      - Cost of context switch must be much less than I/O latency (easy)
  • 1990’s: Memory wall and multithreading
      - Processor:non-cache storage cycle time ratio: nanosecond to fractions of a microsecond (1:500 or worse)
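The requirement that the cost of a context switch be much less than the latency being hidden can be made concrete with a simple utilization model. This is a rough sketch, not taken from the slides: the function `utilization` is hypothetical, it assumes every thread alternates a fixed amount of useful work with one fixed-latency stall, and all numeric values below are illustrative assumptions.

```python
def utilization(work, latency, switch_cost, threads):
    """Fraction of time the processor does useful work (simplified model).

    work:        useful time a thread runs before it stalls
    latency:     time the stall takes to resolve
    switch_cost: time spent switching to another thread
    All three must use the same time unit.
    """
    if threads == 1:
        # A single thread just waits out every stall.
        return work / (work + latency)
    if (threads - 1) * (work + switch_cost) >= latency:
        # Enough other threads to cover the whole stall: only the
        # switch overhead is lost.
        return work / (work + switch_cost)
    # Not enough threads: part of each stall is still exposed.
    return threads * work / (work + switch_cost + latency)


# 1960's case, in microseconds (assumed values): a 10 ms disk access and
# a 100 us OS context switch. Switching is cheap relative to the stall.
disk_one = utilization(work=1000, latency=10_000, switch_cost=100, threads=1)
disk_two = utilization(work=1000, latency=10_000, switch_cost=100, threads=2)

# 1990's case, in nanoseconds (assumed values): a 200 ns memory access.
# The same 100 us OS switch now costs far more than the stall it hides.
mem_one = utilization(work=100, latency=200, switch_cost=100_000, threads=1)
mem_os = utilization(work=100, latency=200, switch_cost=100_000, threads=2)

# With an assumed 1 ns hardware thread switch, three threads hide the stall.
mem_hw = utilization(work=100, latency=200, switch_cost=1, threads=3)
```

Under these assumed numbers, an OS-level switch helps for disk stalls but makes memory stalls worse than simply waiting, while a near-free switch recovers almost all of the lost time.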
  • Fall '09
  • Prof. Guri Sohi
  • CPU cache, Prof. Mikko H. Lipasti
