

tures, which provides higher prefetch lookahead. Their technique reduces the storage and energy overhead of accurate hardware instruction prefetching (incurred in Ferdman et al. [2011]) while still maintaining high prefetcher coverage and program performance.

4.5. Use of Branch Prediction or History Information

We now discuss some techniques that use information from branch prediction to speculate on program control flow and thereby determine the data to be prefetched.

Srinivasan et al. [2001] present an instruction prefetching scheme that correlates branch instruction execution with I-cache misses, based on the fact that control-flow changes due to branches cause I-cache misses. Branch instructions are used to trigger prefetching of the instructions that appear in the execution after a fixed number (say, K = 4) of branches. For instance, a candidate basic block (BB1) is associated with a branch instruction (R1) if an I-cache miss to BB1 occurs exactly K branches after R1 executes. On a future execution of R1, BB1 is prefetched. Thus, their technique avoids the need for a branch predictor to estimate the outcome of K + 1 branch operations. Also, since the basic block starting at the branch target address may span several cache lines, their technique stores the length of each prefetch candidate block along with its address so that entire blocks can be prefetched in a timely manner.
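As an illustration of this correlation mechanism, the following is a minimal Python sketch, not the paper's hardware design: the table keyed by branch PC, the K-entry branch history, and the prefetch hook are all illustrative assumptions.

```python
from collections import deque

K = 4  # lookahead in branches; K = 4 is the example value used in the text


def prefetch(addr, length):
    # Stub for the hardware action: fetch `length` bytes of instructions
    # starting at `addr` into the I-cache ahead of demand.
    pass


class BranchCorrelatedPrefetcher:
    """Correlates an I-cache miss with the branch executed exactly K
    branches earlier, in the style of Srinivasan et al. [2001]."""

    def __init__(self):
        self.recent_branches = deque(maxlen=K)  # PCs of the last K branches
        self.table = {}  # branch PC -> (candidate block address, block length)

    def on_branch(self, branch_pc):
        # A branch correlated with an earlier miss triggers a prefetch of the
        # whole candidate block; storing the length covers multi-line blocks.
        if branch_pc in self.table:
            block_addr, block_len = self.table[branch_pc]
            prefetch(block_addr, block_len)
        self.recent_branches.append(branch_pc)

    def on_icache_miss(self, block_addr, block_len):
        # Associate this miss with the branch seen exactly K branches ago.
        if len(self.recent_branches) == K:
            trigger_pc = self.recent_branches[0]
            self.table[trigger_pc] = (block_addr, block_len)
```

Once a branch is recorded as the trigger for a block, every later execution of that branch issues the prefetch before the K intervening branches resolve, which is how the scheme sidesteps predicting K + 1 branch outcomes.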
Zilles and Sohi [2001] note that a few frequently executed static instructions (called problem instructions) cause a majority of branch mispredictions and cache misses. Their technique creates a code portion, called a speculative slice, that mimics the computation leading up to a problem instruction and includes only those operations necessary to compute the problem instruction's outcome. By forking such slices well before the problem instruction executes, data prefetching can be performed to avoid the penalty of misses.

4.6. Memory Side Prefetching

Solihin et al. [2003] present a technique where correlation (CoR) prefetching is performed by a user thread running in main memory, and the prefetched data are sent to the L2 cache. L2 cache misses are tracked and recorded in a CoR table. Afterwards, for each miss, the CoR table is looked up and a prefetch of several lines is triggered for the L2 cache. The CoR table is stored in main memory and, thus, changes to the L2 cache are minimal. They show that combining their technique with a core-side sequential prefetcher increases the performance improvement further. Also, the prefetch algorithm used by the thread can be adapted on a per-application basis.

Yedlapalli et al. [2013] present a memory-side prefetcher (MSP) that fetches data on-chip from memory but, unlike the design of Solihin et al. [2003], does not push the data into the caches and thus avoids resource contention. They use a next-line prefetching scheme and prefetch on a row-buffer hit, serving the demand request first and the prefetch request afterwards. Successive requests to lines in that row then turn into prefetch hits. Data are maintained in a separate buffer at each memory
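To make the CoR-table mechanism concrete, here is a minimal Python sketch of correlation prefetching under stated assumptions: the pairwise (previous miss to next miss) correlation, the per-entry successor limit, and the push_to_l2 hook are illustrative choices, not the exact organization used by Solihin et al. [2003].

```python
def push_to_l2(addr):
    # Stub for the memory-side action: push the cache line at `addr`
    # from main memory toward the L2 cache.
    pass


class CorrelationPrefetcher:
    """Sketch of memory-side correlation (CoR) prefetching: the table maps a
    miss address to addresses of misses that followed it in the past, and a
    lookup on each new miss pushes the recorded lines toward the L2 cache."""

    def __init__(self, successors_per_entry=4):
        self.table = {}  # miss address -> list of successor miss addresses
        self.max_succ = successors_per_entry
        self.last_miss = None

    def on_l2_miss(self, addr):
        # Record the observed (previous miss -> current miss) correlation.
        if self.last_miss is not None:
            succ = self.table.setdefault(self.last_miss, [])
            if addr not in succ:
                succ.insert(0, addr)      # most recent successor first
                del succ[self.max_succ:]  # bound the size of each entry
        # Look up the current miss and prefetch its recorded successors.
        for line in self.table.get(addr, []):
            push_to_l2(line)
        self.last_miss = addr
```

Because the table and the lookup logic live in a memory-side thread, growing the table or swapping in a different correlation heuristic per application requires no change to the cache hardware, which is the adaptability the paragraph above points to.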
