ARM.SoC.Architecture

Cache design an example 279 table 101 summary of cache

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: sumptions about the cache and external memory speeds (which were 20 MHz and 8 MHz respectively): caches which hold either just instructions, mixed instructions and data, or just data. The results are shown in Table 10.2 on page 280, normalized to the performance of a system with no cache. They show that instructions are the most important values to hold in the cache, but including data values as well can give a further 25% performance increase. 280 Memory Hierarchy Table 10.2 Cache form 'Perfect' cache performance. Performance 1 1.95 2.5 1.13 No cache Instruction-only cache Instruction and data cache Data-only cache Although a decision was taken early on that the cache write strategy would be write-through (principally on the grounds of simplicity), it is still possible for the cache to detect a write miss and load a line of data from the write address. This 'allocate on write miss' strategy was investigated briefly, but proved to offer a negligible benefit in exchange for a significant increase in complexity, so it was rapidly abandoned. The problem was reduced to rinding the best organization, consistent with chip area and power constraints, for a unified instruction and data cache with allocation on a read miss. Various different cache organizations and sizes were investigated, with the results show in Figure 10.6. The simplest cache organization is the direct-mapped cache, but even with a size of 16 Kbytes, the cache is significantly worse than the 'perfect' case. The next step up in complexity is the dual set associative cache; now the performance Figure 10.6 Unified cache performance as a function of size and organization. Cache design - an example 281 of the 16 Kbyte cache is within a per cent or so of the perfect cache. But at the time of the design of the ARMS (1989) a 16 Kbyte cache required a large chip area, and the 4 Kbyte cache does not perform so well. (The results depend strongly on the program used to generate the address traces used, but these are typical.) Going to the other...
View Full Document

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.

Ask a homework question - tutors are online