... these are unequal, the load proceeds to read the data from the cache. Even though the store operation has not been completed, the processor can detect that it will affect a different memory location than the one the load is trying to read. This process is repeated on the second iteration as well. Here we can see that the storedata operation must wait until the result from the previous iteration has been loaded and incremented. Long before this, the storeaddr operation and the load operation can match up their addresses, determine that they are different, and allow the load to proceed. In our computation graph, we show the load for the second iteration beginning just one cycle after the load from the first. If continued for more iterations, we would find the ...
5.13. UNDERSTANDING MEMORY PERFORMANCE
[Figure 5.36: Timing of write_read for Example B. The store and load operations have the same address, and hence the load must wait until it can get the... Diagram not reproduced: it plots the store addr, store data, load, incl, decl, and jnc operations of Iteration 1 and Iteration 2 against cycles 1 to 12, with register values %eax.0 to %eax.2, %edx.0 to %edx.2b, and condition codes cc.1 and cc.2.]
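The two cases discussed above can be made concrete with a small loop. The following is a minimal sketch, not necessarily the exact code the text analyzes: the function is named write_read after the figure caption, but its signature, the int element type, and the variable names cnt and val are assumptions chosen to match the operations named in the figure (store, load, incl, decl, jnc).

```c
/*
 * Hypothetical reconstruction of the loop under discussion.
 * Each iteration stores val to *dest, then loads *src and
 * increments the loaded value to form the next val.
 */
void write_read(int *src, int *dest, int n)
{
    int cnt = n;
    int val = 0;

    while (cnt--) {           /* decl / jnc: count down and branch */
        *dest = val;          /* store: address part and data part */
        val = (*src) + 1;     /* load, then incl */
    }
}
```

When src and dest point to different locations, as in write_read(&a[0], &a[1], 3), the load address never matches the pending store address, so each load can proceed without waiting for the store data. When they point to the same location, as in write_read(&a[0], &a[0], 3), every load must wait for the data of the store issued just before it, which is the dependency that Figure 5.36 depicts.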