address may be predicted from previous instructions (called predictor instructions). Based on this, prefetch predictors are produced, from which timely, accurate, and non-redundant predictors are selected. To assess the timeliness of a predictor, their technique checks whether a load miss is too close (in terms of processor cycles) to a predictor instruction, since, in such a case, prefetching will not be useful. When this distance is smaller than a threshold, the predictor is not used. Also, if a large fraction of the data prefetched by a predictor is redundant, the predictor is eliminated. Based on the selected predictors, SW prefetching instructions are placed directly in an application’s assembly code. Working at the instruction level gives their technique a broad view of memory access patterns spanning the boundaries of functions, modules, and libraries. Their technique integrates and generalizes multiple prefetching approaches, such as self-stride, next-line and intra-iteration stride, same-object, and other approaches for pointer-intensive and function call-intensive programs.

Wang et al. present a HW-SW co-operative prefetching technique. Their technique uses compiler analysis to generate load hints, such as the spatial region and its size (number of lines) to prefetch, the pointer in the load’s cache line to follow for prefetching, and the pointer data structure to recursively prefetch. Specifying the size allows prefetching a variable-size region based on loop bounds. Based on these hints and triggered by L2 cache misses, prefetches are generated in HW at runtime. Unlike other SW techniques, in their technique individual prefetch addresses are generated by HW and not by SW, which allows timely prefetching of data. Also, the use of compiler hints allows reduced storage overhead and accurate prefetching even for complex access patterns, which is challenging for a HW-only approach.
They show that, while generating significantly less prefetch traffic, their technique still provides a performance improvement similar to that of a HW-only prefetching technique.

Rabbah et al. present a compiler-directed prefetching technique that uses speculative execution. The portion of the program dependence graph relevant to the address computation of a long-latency memory operation is termed the load dependence chain (LDC). The LDCs are identified by the compiler, and precomputations are statically embedded in the program instruction stream along with prefetch instructions. Through speculative execution of LDCs, future memory addresses are precomputed and then used for prefetching. For pointer-based applications, LDCs contain instructions that may miss in D-caches, and, hence, generating prefetch addresses for such applications causes large instruction overhead. To address this, their technique provisions that if a load in the LDC sees a miss, successive precomputation instructions are bypassed.
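
The bypass rule for LDC precomputation can be illustrated with a small software model. This is only a sketch under stated assumptions, not the compiler transformation of Rabbah et al.: the names `ToyCache` and `precompute_ldc`, the set-based cache model, and the dictionary standing in for pointer-linked memory are all hypothetical. The model walks a pointer chain ahead of the main computation, and as soon as a load in the chain would miss, it issues a prefetch for that address and skips the remaining precomputation steps.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Hypothetical cache model: a set of resident addresses plus a prefetch log."""
    resident: set = field(default_factory=set)
    prefetches: list = field(default_factory=list)

    def hit(self, addr):
        return addr in self.resident

    def prefetch(self, addr):
        # Record the prefetch and assume the line is resident afterwards.
        self.prefetches.append(addr)
        self.resident.add(addr)


def precompute_ldc(memory, cache, start_addr, depth):
    """Speculatively follow up to `depth` links of a pointer chain.

    `memory` maps an address to the pointer stored there (the next node),
    or to None at the end of the chain. Returns the number of
    precomputation loads that were executed before stopping.
    """
    addr = start_addr
    executed = 0
    for _ in range(depth):
        if addr is None:
            break
        if not cache.hit(addr):
            # A load in the chain would miss: prefetch its address and
            # bypass the successive precomputation instructions.
            cache.prefetch(addr)
            break
        executed += 1
        addr = memory.get(addr)  # the load hits, so follow the pointer
    return executed


# Three-node list at assumed addresses; only the first node is cached.
memory = {0x100: 0x200, 0x200: 0x300, 0x300: None}
cache = ToyCache(resident={0x100})
first = precompute_ldc(memory, cache, 0x100, depth=3)
second = precompute_ldc(memory, cache, 0x100, depth=3)
```

In this toy run, the first call stops at the second node because its load would miss, so only one precomputation load is executed; the second call gets one link further before it stops again. The point of the design is that the instruction overhead of precomputation is bounded by the first miss rather than by the full chain length.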