Final Review Quiz


...executing the following sequence of instructions and adhering to the cache protocol in the diagram. Assume the following:

• The data from addresses 110 and 120 are initially stored in both processors’ caches and marked as clean/shared.
• The cache block size is 1 word.

“Before” and “After” refer to the block’s state in cache before the instruction is executed and after it is executed, respectively.

Processor:       Hit or    Block’s state   Block’s state   Block’s state   Block’s state
Instruction      miss?     in P0 cache     in P0 cache     in P1 cache     in P1 cache
                           (before)        (after)         (before)        (after)
P0: read 120
P0: write 120
P1: write 120
P1: read 110
P0: write 110
P1: read 120

(b) The MESI protocol is similar to the protocol in the diagram, except that it has an additional state called “Exclusive.” If a block is marked “Exclusive,” that means that it is present in only one processor’s cache and not shared in any other processor’s cache, and that it is clean (has not been written). What is the advantage of this additional state?

Question 5: Parallel decomposition [10 points].

(a) What does it mean for a parallel computation to have a reduction?

(b) To find the maximum element of an array of size N using 4 processors in parallel, where processor P0 initially contains the array elements, you can follow this algorithm:

(1) Send N/4 elements to each of the other processors.
(2) Have all processors find the maximum of their N/4 elements.
(3) Have processors P0 and P1 communicate to find the larger of their individual maximum elements; in parallel, have P2 and P3 do the same thing.
(4) Compare the results from step (3) to find the overall maximum element.

Alternatively, you could just do the entire computation on P0. We call this the sequential algorithm.

If it takes X cycles to send/receive one element between processors, and Y cycles to compare two elements, how much time does the sequential algorithm take? How much time does each step of the parallel algorithm take?
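One way to reason about the timing question is to write the cost model down as code. The sketch below is not an official solution; it rests on assumptions the question does not state: N is divisible by 4, P0 sends the three chunks one after another (so step 1 costs 3 · N/4 · X), finding the maximum of k elements takes k − 1 comparisons, and steps 3 and 4 each cost one transfer plus one comparison. The function names are made up for illustration.

```python
def sequential_time(n, y):
    """Cycles for P0 alone: n - 1 comparisons (assumed cost model)."""
    return (n - 1) * y


def parallel_step_times(n, x, y):
    """Cycles per step of the 4-processor algorithm, under the stated assumptions."""
    chunk = n // 4  # assumes n is divisible by 4
    return {
        "step 1 (distribute)": 3 * chunk * x,   # P0 sends 3 chunks back to back
        "step 2 (local max)": (chunk - 1) * y,  # all 4 processors work in parallel
        "step 3 (pairwise)": x + y,             # one transfer + one comparison
        "step 4 (final)": x + y,                # one transfer + one comparison
    }


def parallel_time(n, x, y):
    """Total cycles for the parallel algorithm."""
    return sum(parallel_step_times(n, x, y).values())


def parallel_is_preferable(n, x, y):
    """True when the parallel algorithm finishes in fewer cycles."""
    return parallel_time(n, x, y) < sequential_time(n, y)


for x, y in [(1, 2), (1, 1), (2, 1)]:
    print(f"N=1000 X={x} Y={y}: sequential={sequential_time(1000, y)}, "
          f"parallel={parallel_time(1000, x, y)}, "
          f"parallel preferable={parallel_is_preferable(1000, x, y)}")
```

Under this model, the distribution step dominates the parallel cost, so the comparison between the two algorithms comes down to how X compares with Y. If P0 could send to several processors at once, step 1 would be cheaper and the conclusion would shift accordingly.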
Using the above results, what relationship between X and Y must be true for the parallel algorithm to be preferable to the sequential algorithm? (e.g. X < Y)

9.3 Multiprocessors Connected by a Single Bus

[Figure 9.3.4: two state diagrams over the states Invalid (not valid cache block), Shared (clean), and Modified (dirty). The arrow layout is not recoverable from this text; the transition labels are listed below.]

a. Cache state transitions using signals from the processor. Labels: Processor read miss; Processor write miss; Processor miss; Processor miss (write dirty block to memory); Processor write hit; (Send invalidate); Processor read hit; Processor read hit or write hit.

b. Cache state transitions using signals from the bus. Labels: Another processor has a read miss or a write miss for this block (seen on bus); write back old block. Invalidate or another processor has a write miss for this block (seen on bus).

FIGURE 9.3.4 A write-invalidate cache coherence protocol. Part a of the diagram shows state transitions based on actions of the processor associated with this cache; part b shows transitions based on actions of other processors seen as operations on the bus. There is really only one state machine in a cache block, although there are two represented here to clarify when a transition occurs. The black arrows and actions specified in black text would be found in caches without coherency; the colored arrows and actions are added to achieve cache coherency.
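The three-state write-invalidate protocol described in the figure caption can be sketched as a small simulation and run against the instruction sequence from the cache-state table. This is an illustrative sketch, not the exam's answer key. It assumes that addresses 110 and 120 never evict each other (no conflict or capacity misses), that a write to a Shared block counts as a hit that sends an invalidate, and that a Modified block snooping another processor's miss writes back and becomes Invalid, as the single bus-side label suggests. All names (`access`, `run_trace`, `TRACE`) are hypothetical.

```python
INVALID, SHARED, MODIFIED = "Invalid", "Shared", "Modified"


def access(caches, proc, op, addr):
    """Apply one read or write; return (hit, states_before, states_after) for addr."""
    before = [cache.get(addr, INVALID) for cache in caches]
    hit = before[proc] != INVALID

    # Bus side: how the other caches react when they snoop this access.
    for other, cache in enumerate(caches):
        if other == proc:
            continue
        state = cache.get(addr, INVALID)
        if op == "write" and state != INVALID:
            cache[addr] = INVALID  # invalidate (a Modified copy is written back first)
        elif op == "read" and not hit and state == MODIFIED:
            cache[addr] = INVALID  # assumed: write back old block, then drop it

    # Processor side: the requesting cache's own transition.
    if op == "write":
        caches[proc][addr] = MODIFIED  # write hit or write miss
    elif not hit:
        caches[proc][addr] = SHARED    # read miss fetches a clean copy

    after = [cache.get(addr, INVALID) for cache in caches]
    return hit, before, after


def run_trace(trace):
    """Start with 110 and 120 Shared in both caches, then replay the trace."""
    caches = [{110: SHARED, 120: SHARED}, {110: SHARED, 120: SHARED}]
    rows = []
    for proc, op, addr in trace:
        hit, before, after = access(caches, proc, op, addr)
        rows.append((f"P{proc}: {op} {addr}", "hit" if hit else "miss",
                     before[0], after[0], before[1], after[1]))
    return rows


TRACE = [(0, "read", 120), (0, "write", 120), (1, "write", 120),
         (1, "read", 110), (0, "write", 110), (1, "read", 120)]

# Columns: instruction, hit/miss, P0 before, P0 after, P1 before, P1 after
for row in run_trace(TRACE):
    print(" | ".join(row))
```

Each printed row lines up with one row of the table: the instruction, whether it hits, and the accessed block's state in each processor's cache before and after. A different reading of the figure, or a cache in which the two addresses conflict, would change the results.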