address may be predicted from previous instructions (called predictor instructions). Based on this, prefetch predictors are produced, of which timely, accurate, and non-redundant predictors are selected. To assess the timeliness of a predictor, their technique checks whether a load miss is too close (in terms of processor cycles) to a predictor instruction, since in such a case prefetching will not be useful; when this distance is smaller than a threshold, the predictor is not used. Also, if a large fraction of the data prefetched by a predictor is redundant, the predictor is eliminated. Based on the selected predictors, SW prefetching instructions are placed directly in the application's assembly code. Working at the instruction level gives their technique a broad view of memory access patterns spanning the boundaries of functions, modules, and libraries. Their technique integrates and generalizes multiple prefetching approaches, such as self-stride, next-line and intra-iteration stride, same-object, and other approaches for pointer-intensive and function-call-intensive programs.

Wang et al. [2003] present a HW-SW cooperative prefetching technique. Their technique uses compiler analysis to generate load hints, such as the spatial region and its size (number of lines) to prefetch, the pointer in the load's cache line to follow for prefetching, and the pointer data structure to prefetch recursively. Specifying the size allows prefetching a variable-size region based on loop bounds. Based on these hints, and triggered by L2 cache misses, prefetches are generated in HW at runtime. Unlike other SW techniques, in their technique individual prefetch addresses are generated by HW and not by SW, which allows timely prefetching of data. Also, the use of compiler hints reduces storage overhead and enables accurate prefetching even for complex access patterns, which is challenging for a HW-only approach. They show that, while generating significantly less prefetch traffic, their technique still provides a performance improvement similar to that of a HW-only prefetching technique.

Rabbah et al. [2004] present a compiler-directed prefetching technique that uses speculative execution. The portion of the program dependence graph relevant to the address computation of a long-latency memory operation is termed the load dependence chain (LDC). LDCs are identified by the compiler, and precomputations are statically embedded in the program instruction stream along with prefetch instructions. Using speculative execution of LDCs, future memory addresses are precomputed and used for prefetching. For pointer-based applications, LDCs contain instructions that may themselves miss in the D-cache, and hence generating prefetch addresses for such applications incurs a large instruction overhead. To address this, their technique provides that if a load in the LDC misses, the subsequent precomputation instructions are bypassed.
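To make the LDC idea concrete, the sketch below shows, in C, the kind of precomputation a compiler could embed for a pointer-chasing loop. The node_t type, the prefetch distance of two links, and the use of GCC's __builtin_prefetch are illustrative assumptions for this sketch, not details taken from Rabbah et al. [2004].

```c
#include <stddef.h>

/* Hypothetical linked-structure node, used only for illustration. */
typedef struct node {
    struct node *next;
    long payload[14];
} node_t;

long sum_payload(node_t *head) {
    long sum = 0;
    for (node_t *p = head; p != NULL; p = p->next) {
        /* LDC-style precomputation: the load dependence chain for a future
         * node is simply the chain of ->next loads. The address of the node
         * two links ahead is speculatively computed and prefetched so that
         * the later demand access is more likely to hit in the cache. In the
         * scheme described above, the hardware would bypass the remaining
         * precomputation if a load in the chain itself misses; here a NULL
         * check merely stands in for that guard. */
        node_t *ahead = p->next;
        if (ahead != NULL && ahead->next != NULL)
            __builtin_prefetch(ahead->next, 0 /* read */, 1 /* low locality */);

        for (int i = 0; i < 14; i++)
            sum += p->payload[i];
    }
    return sum;
}
```

Note that the dereference of ahead->next in the sketch is itself a potential cache miss, which is exactly the instruction overhead the bypass mechanism above is meant to limit.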
