We can see that profiling is a useful tool to have in

these are unequal, the load proceeds to read the data from the cache. Even though the store operation has not been completed, the processor can detect that it will affect a different memory location than the load is trying to read. This process is repeated on the second iteration as well. Here we can see that the store data operation must wait until the result from the previous iteration has been loaded and incremented. Long before this, the store addr operation and the load operation can match up their addresses, determine that they are different, and allow the load to proceed. In our computation graph, we show the load for the second iteration beginning just one cycle after the load from the first. If continued for more iterations, we would find the

5.13. UNDERSTANDING MEMORY PERFORMANCE

[Figure 5.36: Timing of write read for Example B. The store and load operations have the same address, and hence the load must wait until it can get the... (The diagram shows iterations 1 and 2 over cycles 1 to 12, with the load, incl, store addr, store data, decl, and jnc operations acting on %eax, %edx, and the condition codes cc.)]