1 on page 248 to which an 8 kbyte mixed instruction


correct way to design a cache for this sort of application, the designer can be guided by the following observations on the examples presented in these ARM chips.

Cache speed

High-associativity caches give the best hit rate, but they require sequential CAM then RAM accesses, which limits how fast the cycle time can become. Caches with a lower associativity can perform parallel tag and data accesses to give faster cycle times. Although a direct-mapped cache has a significantly lower hit rate than a fully associative one, most of the benefit of associativity is gained in going from direct-mapped to 2- or 4-way associative; beyond 4-way the benefits of increased associativity are small. However, a fully associative CAM-RAM cache is much simpler than a 4-way associative RAM-RAM cache.

CAM is somewhat power-hungry, requiring a parallel comparison with every entry on each cycle. Segmenting the cache by reducing the associativity a little and activating only a subsection of the CAM reduces the power cost significantly for a small increase in complexity.

In a static RAM the main users of power are the analogue sense-amplifiers. A 4-way cache must activate four times as many sense-amplifiers in the tag store as a direct-mapped cache; if it exploits the speed advantage offered by parallel tag and data accesses, it will also use four times as many sense-amplifiers in the data store. (RAM-RAM caches can, alternatively, perform serial tag and data accesses to save power, only activating a particular data RAM when a hit is detected in the corresponding tag store.) Wasted power can be minimized by using self-timed power-down circuits to turn off the sense-amplifiers as soon as the data is valid, but the power used in the sense-amplifiers is still significant.

Where the processor is accessing memory locations which fall within the same cache line, it should be possible to bypass the tag look-up for all but the first access.
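The associativity and power trade-offs above can be explored with a toy simulator. The sketch below is an assumption-laden model, not the design of any ARM cache: it assumes LRU replacement, 16-byte lines and an 8 Kbyte capacity, and it counts how many tag entries and data ways are read per access as a crude proxy for sense-amplifier activity. The names `SetAssociativeCache` and `simulate` are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache: `num_lines` lines of `line_bytes` bytes, grouped into sets
    of `ways` lines, with LRU replacement inside each set (an assumption).
    ways=1 is direct-mapped; ways=num_lines is fully associative."""

    def __init__(self, num_lines, line_bytes, ways):
        if num_lines % ways:
            raise ValueError("num_lines must be a multiple of ways")
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = num_lines // ways
        # One OrderedDict per set mapping tag -> None, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0
        # Crude activity counters (assumed proxies for sense-amplifier use):
        self.tag_reads = 0            # tag entries compared per access
        self.data_reads_parallel = 0  # data ways read if tags and data are read together
        self.data_reads_serial = 0    # data ways read if data waits for a tag hit

    def access(self, addr):
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        entries = self.sets[index]
        self.tag_reads += self.ways           # every way's tag is checked
        self.data_reads_parallel += self.ways  # every way's data is read too
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            self.data_reads_serial += 1  # serial: only the hitting way is read
            return True
        self.misses += 1
        if len(entries) == self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = None
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(ways, trace, num_lines=512, line_bytes=16):
    """Run an address trace through an 8 KB cache (512 x 16-byte lines)."""
    cache = SetAssociativeCache(num_lines, line_bytes, ways)
    for addr in trace:
        cache.access(addr)
    return cache


# Contrived trace: two addresses 8 KB apart share a set when direct-mapped.
trace = [0, 8192] * 100
results = {ways: simulate(ways, trace) for ways in (1, 2, 4, 512)}
for ways, cache in results.items():
    print(f"{ways:>3}-way: hit rate {cache.hit_rate():.2f}, "
          f"tag reads {cache.tag_reads}, data reads parallel "
          f"{cache.data_reads_parallel} / serial {cache.data_reads_serial}")
```

The trace is deliberately pathological: it thrashes the direct-mapped cache while any 2-way or wider organization holds both lines, so it shows conflict misses and the fourfold growth in data-way reads for a parallel 4-way access, not typical hit rates. Realistic hit-rate comparisons would need real program traces.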
The ARM generates a signal which indicates when the next memory access will be sequential to the current one, and this can be used, with the current address, to deduce that the ac...
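The same-line tag bypass can be sketched as a counter of tag-store lookups. This is a minimal model under stated assumptions: 32-bit (4-byte) word accesses, a 16-byte line, and a trace of `(address, sequential)` pairs in which the boolean stands in for the processor's sequential-access signal; `count_tag_lookups` is a hypothetical name. A sequential access lands one word after the previous one, so it stays in the same line unless the previous word was the last in its line, and that can be decided from the previous address alone.

```python
def count_tag_lookups(trace, line_bytes=16, word_bytes=4):
    """Count tag-store lookups for a trace of (addr, sequential) pairs,
    skipping the lookup when a sequential access must fall in the same
    line as the previous one. The trace is assumed consistent: a
    sequential access is always at the previous address + word_bytes."""
    lookups = 0
    prev_addr = None
    for addr, sequential in trace:
        # Decided from the previous address alone: the next word stays in
        # the same line unless the previous word was the last in its line.
        same_line = (
            sequential
            and prev_addr is not None
            and prev_addr % line_bytes != line_bytes - word_bytes
        )
        if not same_line:
            lookups += 1
        prev_addr = addr
    return lookups


# Straight-line fetch of 8 words from 0x100 spans two 16-byte lines.
straight = [(0x100, False)] + [(0x100 + 4 * i, True) for i in range(1, 8)]
# Non-sequential accesses need a lookup every time.
scattered = [(0x000, False), (0x040, False), (0x004, False)]
print(count_tag_lookups(straight), "lookups for", len(straight), "accesses")
print(count_tag_lookups(scattered), "lookups for", len(scattered), "accesses")
```

For straight-line code with 16-byte lines, this model needs one tag look-up per four sequential word accesses instead of one per access.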

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
