architecture of a system. I discuss a number of frequently used
patterns in Section 6.3. Garlan and Shaw's notion of an architectural
style (style and pattern have come to mean the same thing) covers
questions 4 to 6 in the previous list. You have to choose
Summary of Miss Penalty Reduction Techniques
The processor-memory performance gap of Figure 5.2 on page 375 determines
the miss penalty, and as the gap grows, so does the importance of techniques that try to close it. We
present five in this section. The first technique follows t
shows absolute miss rates; the bottom graph plots percentage of all the misses by
type of miss as a function of cache size.
To show the benefit of associativity, conflict misses are divided into misses
caused by each decrease in associativity. Here are the fo
a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2. This held for cache sizes less than 128 KB.
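The division into compulsory, capacity, and conflict misses can be illustrated with a toy classifier. The sketch below is illustrative only (the function name and trace format are assumptions, not the methodology behind the figures): compulsory misses are first touches of a block, capacity misses also miss in a fully associative LRU cache of the same total size, and the remaining misses are charged to conflicts.

```python
# Toy 3C miss classifier (illustrative sketch, not the figures' methodology).
from collections import OrderedDict

def classify_misses(trace, cache_bytes, block_bytes, ways):
    sets = cache_bytes // (block_bytes * ways)
    seen = set()                                  # blocks ever touched
    full = OrderedDict()                          # fully associative LRU, same total size
    full_blocks = cache_bytes // block_bytes
    cache = [OrderedDict() for _ in range(sets)]  # per-set LRU state
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in trace:
        blk = addr // block_bytes
        s = cache[blk % sets]
        hit = blk in s
        if hit:
            s.move_to_end(blk)
        fa_hit = blk in full                      # reference fully associative cache
        if fa_hit:
            full.move_to_end(blk)
        else:
            full[blk] = True
            if len(full) > full_blocks:
                full.popitem(last=False)
        if not hit:
            if blk not in seen:
                counts["compulsory"] += 1         # never referenced before
            elif not fa_hit:
                counts["capacity"] += 1           # would miss even fully associative
            else:
                counts["conflict"] += 1           # placement restriction caused it
            s[blk] = True
            if len(s) > ways:
                s.popitem(last=False)
        seen.add(blk)
    return counts
```

Raising associativity from 1 to 2 on a trace that ping-pongs between two blocks mapping to the same direct-mapped set turns all conflict misses into hits, which is the effect the graphs quantify.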
Like many of these examples, improving one aspect of the average memory access time comes at the expense of another
use a simple buffer when a block is replaced. If the write buffer is empty, the data
and the full address are written in the buffer, and the write is finished from the
CPU's perspective; the CPU continues working while the write buffer prepares to
than or equal to 8 KB for up to 4-way associativity. Starting with 16 KB, the
greater hit time of larger associativity outweighs the time saved due to the
reduction in misses.
reducing conflict misses. McFarling looked at using profiling information to
determine likely conflicts between groups of instructions. Reordering the instructions reduced misses by 50% for a 2-KB direct-mapped instruction cache with 4-byte blocks, and by 7
With a write-through cache the most important improvement is a write buffer
(page 386) of the proper size. Write buffers, however, do complicate memory accesses in that they might hold the updated value of a location needed on a read
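This read-check requirement can be sketched with a small FIFO model (the class and method names are illustrative assumptions): reads must consult the buffer before going to memory, and a full buffer stalls the CPU-side write.

```python
# Sketch of a write buffer that reads must check (illustrative names/sizes).
from collections import deque

class WriteBuffer:
    def __init__(self, entries=4):
        self.entries = entries
        self.buf = deque()                 # (address, data) pairs awaiting memory

    def write(self, addr, data):
        """CPU-side write: True if accepted, False if the CPU must stall."""
        if len(self.buf) >= self.entries:
            return False                   # buffer full
        self.buf.append((addr, data))
        return True

    def read_check(self, addr):
        """On a read miss, look for a pending write to the same address."""
        for a, d in reversed(self.buf):    # newest buffered value wins
            if a == addr:
                return d                   # forward buffered data to the read
        return None                        # not buffered: read goes to memory

    def drain_one(self):
        """Memory side: retire the oldest buffered write."""
        return self.buf.popleft() if self.buf else None
```

The alternative to forwarding, mentioned in the text, is simply to stall the read until the buffer drains.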
Now we can reduce the miss penalty by reducing the miss rate of the second-level caches.
Another consideration concerns whether data in the first-level cache is in the
second-level cache. Multilevel inclusion is the natural policy for memory hierarchies: L1
proposed change is assessed using traceability information and
general knowledge of the system requirements. The cost of making
the change is estimated both in terms of modifications to the
requirements document and, if appropriate, to the system design a
should be checked to ensure that they can actually be implemented.
These checks should also take account of the budget and schedule for
the system development.
5. Verifiability To reduce the potential for
dispute between customer and contractor, system re
similar symbols, such as stick and block arrowheads, that have
different meanings.
5.1 Context models
At an early stage in the
specification of a system, you should decide on the system boundaries.
This involves working with system stakeholders to decide
pages.
4.7.2 Requirements change management
Requirements change management (Figure 4.18) should be applied to all proposed
changes to a system's requirements after the requirements document
has been approved. Change management is essential because you
prototyping for requirements analysis
Requirements reviews
A requirements review is a process
where a group of people from the system customer and the system
developer read the requirements document in detail and check
Requirements management needs automated support and the
software tools for this should be chosen during the planning phase.
You need tool support for: 1. Requirements storage The requirements
should be maintained in a secure, managed data store that is
Critical word first: Request the missed word first from memory and send it to
the CPU as soon as it arrives; let the CPU continue execution while filling the
rest of the words in the block. Critical-word-first fetch is also called wrapped
fetch and requ
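The wrapped-fetch delivery order can be sketched in a few lines (the function name is an illustrative assumption):

```python
def critical_word_first(block_words, miss_index):
    """Order in which a block's words are delivered to the cache:
    the missed word first, then wrapping around the rest of the block."""
    n = len(block_words)
    return [block_words[(miss_index + i) % n] for i in range(n)]
```

The CPU can restart as soon as the first element arrives; the remaining words fill the block in wrapped order behind it.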
Note that input/output device registers are often mapped into the physical address space, as is the case of the 21264. These I/O addresses cannot allow write
merging, as separate I/O registers may not act like an array of words in memory.
For example, the
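For ordinary cacheable addresses, the write merging mentioned above can be sketched as follows (the entry layout and function name are illustrative assumptions): a write whose block already has a buffer entry is combined into that entry instead of occupying a new slot.

```python
# Sketch of write merging in a write buffer (illustrative layout/names).
BLOCK_WORDS = 4   # assumed buffer width: four words per entry

def merge_write(buffer, word_addr, data):
    """buffer is a list of [block_number, {word_offset: data}] entries."""
    blk = word_addr // BLOCK_WORDS
    off = word_addr % BLOCK_WORDS
    for entry in buffer:
        if entry[0] == blk:
            entry[1][off] = data           # merge into the existing entry
            return buffer
    buffer.append([blk, {off: data}])      # otherwise allocate a new entry
    return buffer
```

For the memory-mapped I/O registers discussed above, this merging step must be suppressed, since each register write has to reach the device individually.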
snapshot of the accesses to the three arrays. A dark shade indicates a recent access, a light shade indicates an older access, and white means not yet accessed.
The number of capacity misses clearly depends on N and the size of the cache.
If it can hold a
FIGURE 5.22 The age of accesses to the arrays x, y, and z. Note in contrast to
Figure 5.21 the smaller number of elements accessed.
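The access pattern in these figures comes from loop blocking. A minimal sketch of a blocked matrix multiply x = y × z with blocking factor B (the variable names follow the figures; the function itself is illustrative):

```python
def blocked_matmul(y, z, B):
    """Blocked (tiled) n x n matrix multiply: x = y * z, blocking factor B."""
    n = len(y)
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, B):                      # tile over columns of z
        for kk in range(0, n, B):                  # tile over rows of z
            for i in range(n):
                for j in range(jj, min(jj + B, n)):
                    r = 0
                    for k in range(kk, min(kk + B, n)):
                        r += y[i][k] * z[k][j]     # only a B x B tile of z is live
                    x[i][j] += r
    return x
```

Blocking restricts the elements of z touched in the inner loops to a B × B tile, so fewer elements are live at once and capacity misses drop when the tiles fit in the cache.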
The increasing processor-memory g
FIGURE 5.24 Access times as size and associativity vary in a CMOS cache. These data are based on Spice runs
used to validate
tion is to copy the protection information from the TLB on a miss, add a field to
hold it, and check it on every access to the virtually addressed cache.
Another reason is that every time a process is switched, the virtual addresses
refer to different physi
the cache are for this program. Figure 5.25 shows the improvement in miss rates
from using PIDs to avoid cache flushes.
A third reason why virtual caches are not more popular is that operating systems and user programs may use two different virtual addresses for the same physical address
Associativity can keep the index in the physical part of the address and yet still
support a large cache. Recall that the size of the index is controlled by this formula:

2^index = Cache size / (Block size × Set associativity)
For example, doubling associativity and d
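The formula above can be checked numerically with a small helper (the function name is an illustrative assumption):

```python
import math

def index_bits(cache_bytes, block_bytes, ways):
    """Index width from 2^index = cache size / (block size * set associativity)."""
    sets = cache_bytes // (block_bytes * ways)
    return int(math.log2(sets))
```

Doubling both the cache size and the associativity leaves the index width unchanged, which is what lets a larger cache keep its index within the physical page-offset bits.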
Clearly, trace caches have much more complicated address mapping mechanisms, as the addresses are no longer aligned to power-of-2 multiples of the word
size. However, they have other benefits for utilization of the data portion of the
instruction cache. Very