ARM.SoC.Architecture

The results are shown in table 102 on page 280

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: extreme, a fully associative cache performs significantly better at the smaller size, delivering the 'perfect' performance on the benchmark program used for the tests. Here the replacement algorithm is random; LRU (least recently used) gives very similar results. The cache model was then modified to use a quad-word line which is necessary to reduce the area cost of the tag store. This change had minimal effect on the performance. The fully associative cache requires a large CAM (Content Addressable Memory) tag store which is likely to consume significant power, even with a quad-word line. The power can be reduced a lot by segmenting the CAM into smaller components, but this reduces the associativity. An analysis of the sensitivity of the system performance on the degree of associativity, using a 4 Kbyte cache, is shown in Figure 10.7. This shows the performance of the system for all associativities from fully (256-way) associative down to direct-mapped (1-way). Although the biggest performance increase is in going from direct-mapped to dual-set associative, there are noticeable improvements all the way up to 64-way associativity. It would therefore appear that a 64-way associative CAM-RAM cache provides the same performance as the fully associative cache while allowing the 256 CAM entries to be split into four sections to save power. The external memory bandwidth requirement of each level of associativity is also shown in Figure 10.7 (relative to an Figure 10.7 The effect of associativity on performance and bandwidth requirement. 282 Memory Hierarchy Figure 10.8 ARMS cache organization. uncached processor), and note how the highest performance corresponds to the lowest external bandwidth requirement. Since each external access costs a lot of energy compared with internal activity, the cache is simultaneously increasing performance and reducing system power requirements. The organization of the cache is therefore that shown in Figure 10.8. The bottom two bits of the virtual address select...
View Full Document

Ask a homework question - tutors are online