7-Inter_Memory-2

Associative Mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as tag and word
- The tag uniquely identifies a block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive: every line's tag must be examined simultaneously for a match

Fully Associative Cache Organization

Associative Mapping Address Structure Example
- 24-bit address: Tag (22 bits) | Word (2 bits)
- The 22-bit tag is stored with each 32-bit block of data
- Compare the tag field with the tag entry in the cache to check for a hit
- The least significant 2 bits of the address identify which byte is required from the 32-bit data block

Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = undetermined (not derived from the address)
- Size of tag = s bits

Associative Mapping Example
Consider how an access to memory location (A035F014)_16 is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots:
- If the addressed word is in the cache, it will be found in word (14)_16 of a slot that has tag (501AF80)_16, which is made up of the 27 most significant bits of the address.
- If the addressed word is not in the cache, then the block corresponding to tag field (501AF80)_16 is brought into an available slot in the cache from main memory, and the memory reference is then satisfied from the cache.
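The tag/word split in the example above can be checked with a short sketch (not from the slides; the function name is my own). With a 5-bit word field, the tag is simply the address shifted right by 5:

```python
WORD_BITS = 5  # 2^5 = 32 words per block

def split_associative(addr):
    """Split a 32-bit word address into (tag, word) for a fully associative cache."""
    tag = addr >> WORD_BITS                  # 27 most significant bits
    word = addr & ((1 << WORD_BITS) - 1)     # 5 least significant bits
    return tag, word

tag, word = split_associative(0xA035F014)
print(hex(tag), hex(word))  # 0x501af80 0x14, matching the example
```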
Associative Mapping Pros & Cons
- Advantage: flexible
- Disadvantages: cost; complex circuitry for simultaneous comparison

Set-Associative Mapping
- A compromise between the previous two
- The cache is divided into v sets of k lines each: m = v x k, where m is the number of lines
- i = j mod v, where i is the cache set number and j is the memory block number
- A given block maps to any line in a given set
- Called a k-way set-associative cache; 2-way and 4-way are common

Set-Associative Mapping Example
m = 16 lines, v = 8 sets, k = 2 lines/set (2-way set-associative mapping). Assume 32 blocks in memory, with i = j mod v:

  set  blocks
  0    0, 8, 16, 24
  1    1, 9, 17, 25
  :    :
  7    7, 15, 23, 31

A given block can be in one of the 2 lines of only one set; e.g., block 17 can be assigned to either line 0 or line 1 in set 1.

Set-Associative Mapping Address Structure
- Tag ((s - d) bits) | Set (d bits) | Word (w bits)
- d bits: v = 2^d, specify one of v sets
- s bits: specify one of 2^s blocks
- Use the set field to determine which cache set to look in
- Compare the tag field simultaneously against the tags in that set to see if we have a hit

K-Way Set-Associative Cache Organization

Set-Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^s
- Number of lines per set = k
- Number of sets = v = 2^d
- Number of lines in cache = k x v = k x 2^d
- Size of tag = (s - d) bits

Remarks
Why is the simultaneous comparison cheaper here, compared to associative mapping?
- The tag is much smaller
- Only the k tags within one set are compared

Relationship between set-associative mapping and the first two: both are extreme cases of set-associative mapping.
- k = 1, v = m: direct mapping (1 line/set)
- k = m, v = 1: associative mapping (one big set)

A Set-Associative Mapping Scheme for a Cache Memory
Consider how an access to memory location (A035F014)_16 is mapped to the cache for a 2^32-word memory.
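The set-mapping rule i = j mod v from the example above (m = 16 lines, v = 8 sets, k = 2 lines/set) can be sketched directly:

```python
V_SETS = 8  # v = 8 sets in the slide's example

def cache_set(block):
    """Set number i for memory block j, using i = j mod v."""
    return block % V_SETS

print(cache_set(17))  # 1: block 17 goes to one of the 2 lines of set 1
# All blocks (out of 32) that compete for set 0:
print([j for j in range(32) if cache_set(j) == 0])  # [0, 8, 16, 24]
```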
The memory is divided into 2^27 blocks of 2^5 = 32 words per block, there are two blocks per set, and the cache consists of 2^14 slots:

Set-Associative Mapping Example
The leftmost 14 bits form the tag field, followed by 13 bits for the set field, followed by five bits for the word field.

Replacement Algorithms (1): Direct Mapping
- When a new block is brought into the cache, one of the existing blocks must be replaced
- Direct mapping leaves no choice: each block maps to only one line, so that line is replaced

Replacement Policies
When there are no available slots in which to place a block, a replacement policy is implemented. The replacement policy governs the choice of which slot is freed up for the new block:
- Least recently used (LRU)
- First-in/first-out (FIFO)
- Least frequently used (LFU)
- Random
- Optimal (used for analysis only: look backward in time and reverse-engineer the best possible strategy for a particular sequence of memory references)

Replacement Algorithms (2): Associative & Set-Associative
Implemented in hardware (for speed):
- Least recently used (LRU): e.g., in a 2-way set-associative cache, which of the 2 blocks is LRU?
- First-in/first-out (FIFO): replace the block that has been in the cache longest
- Least frequently used (LFU): replace the block that has had the fewest hits
- Random

Write Policy
- A cache block must not be overwritten unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly

Write through
- All writes go to main memory as well as to the cache
- Both copies always agree
- Multiple CPUs can monitor main memory traffic to keep their local (per-CPU) caches up to date
- Disadvantage: lots of traffic; a potential bottleneck

Write back
- Updates are initially made in the cache only
- The update bit for the cache slot is set when an update occurs
- If a block is to be replaced, it is written to main memory only if the update bit is set, i.e., only if at least one word in the cache line has been updated
- Disadvantages: other caches get out of sync; I/O must access main memory through the cache
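The write-back rule can be sketched as follows (a minimal illustration, not from the slides; the class and function names are my own). The point is that a store only sets the update (dirty) bit, and main memory is touched only on eviction of a dirty line:

```python
class CacheLine:
    def __init__(self, tag, data):
        self.tag = tag
        self.data = data
        self.dirty = False   # the slide's "update bit"

main_memory = {}             # tag -> block data, stands in for main memory

def cpu_write(line, value):
    line.data = value
    line.dirty = True        # update made in the cache only

def evict(line):
    if line.dirty:           # write back only if at least one update occurred
        main_memory[line.tag] = line.data
    line.dirty = False

line = CacheLine(tag=0x501AF80, data=0)
cpu_write(line, 42)
evict(line)                  # dirty line: written back to main memory

clean = CacheLine(tag=0x1, data=7)
evict(clean)                 # clean line: evicted with no memory traffic
print(main_memory)
```

A write-through cache would instead update `main_memory` inside `cpu_write` on every store, which is the traffic cost the slides warn about.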
N.B. about 15% of memory references are writes.

Block Size
- Block size = line size
- As the block size increases from very small, the hit ratio increases because of the principle of locality
- As the block size becomes very large, the hit ratio decreases:
  - the number of blocks decreases
  - the probability of referencing all words in a block decreases
- 4 to 8 addressable units per block is reasonable

Cache Read and Write Policies

Hit Ratios and Effective Access Times
- Hit ratio and effective access time for a single-level cache
- Hit ratios and effective access times for a multi-level cache

Performance Example (1)
Assume a 2-level memory system:
- Level 1: access time T1
- Level 2: access time T2
- Hit ratio H: fraction of time a reference can be found in level 1

Average access time:
  Tave = prob(found in level 1) x T(found in level 1)
       + prob(not found in level 1) x T(not found in level 1)
       = H x T1 + (1 - H) x (T1 + T2)
       = T1 + (1 - H) x T2

Performance Example (2)
Assume a 2-level memory system:
- Level 1: access time T1 = 1 µs
- Level 2: access time T2 = 10 µs
- Hit ratio H = 95%

Average access time:
  Tave = H x T1 + (1 - H) x (T1 + T2)
       = 0.95 x 1 + (1 - 0.95) x (1 + 10)
       = 0.95 + 0.05 x 11
       = 1.5 µs

Performance Example (3)
A higher hit ratio gives better performance.

Exercise
A CPU has a level 1 cache and a level 2 cache, with access times of 5 nsec and 10 nsec respectively. The main memory access time is 50 nsec. If 20% of the accesses are level 1 cache hits and 60% are level 2 cache hits, what is the average access time?

Exercise
Assuming a cache of 4K blocks, a four-word block size, and a 32-bit address, find the total number of sets and the total number of tag bits for caches that are direct mapped, two-way and four-way set associative, and fully associative.

Solution
Since there are 16 (= 2^4) bytes per block, a 32-bit address leaves 32 - 4 = 28 bits to be used for index and tag.
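The average-access-time formula can be checked numerically. Note that the three-level exercise figure below assumes the same additive model (a miss at one level pays that level's time plus the next level's), which the slides do not state explicitly, so treat it as one plausible reading:

```python
def t_ave(t1, t2, h):
    """Average access time of a 2-level memory: T1 + (1 - H) * T2."""
    return t1 + (1 - h) * t2

print(t_ave(1, 10, 0.95))  # ~1.5, as in Performance Example (2)

# Exercise sketch under the additive assumption:
# 20% L1 hits (5 ns), 60% L2 hits (5 + 10 ns), 20% memory (5 + 10 + 50 ns)
t_exercise = 0.2 * 5 + 0.6 * (5 + 10) + 0.2 * (5 + 10 + 50)
print(t_exercise)  # ~23 nsec under this model
```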
1) Direct mapped: same number of sets as blocks, so 12 bits of index (since 4K = 2^12); thus the total number of tag bits is (28 - 12) x 4K = 64 Kbits.

2) 2-way set associative: there are 2K sets, and the total number of tag bits is (28 - 11) x 2 x 2K = 34 x 2K = 68 Kbits.

3) 4-way set associative: there are 1K sets, and the total number of tag bits is (28 - 10) x 4 x 1K = 72 Kbits.

4) Fully associative: only one set with 4K blocks, and the tag is 28 bits, so the total is 28 x 4K x 1 = 112K tag bits.

Exercise
Consider a 3-way set-associative write-through cache with an 8-byte block size, 128 sets, and random replacement. Assume a 32-bit address. How big is the cache (in bytes)? How many bits in total are there in the cache (i.e., for data, tags, etc.)?

Solution
The cache size is 8 x 3 x 128 = 3072 bytes. Each line has 8 bytes x 8 bits/byte = 64 bits of data. A tag is 22 bits (2 address bits ignored for the byte within a word, 1 for the position of the word within the block, 7 for the index). Each line has a valid bit (no dirty bit, because the cache is write-through), so each line has 64 + 22 + 1 = 87 bits. Each set has 3 x 87 = 261 bits, and random replacement needs no extra bits. The total number of bits is 128 x 261 = 33408 bits.

Exercise
Assume there are three small caches, each consisting of four one-word blocks. One cache is direct mapped, a second is two-way set associative, and the third is fully associative. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8.

Solution: direct mapped
Remember: i = j modulo m (here m = 4 blocks):

  block address  cache block
  0              (0 modulo 4) = 0
  6              (6 modulo 4) = 2
  8              (8 modulo 4) = 0

  address  hit or miss  cache contents after reference
                        block 0    block 1  block 2    block 3
  0        miss         Memory[0]
  8        miss         Memory[8]
  0        miss         Memory[0]
  6        miss         Memory[0]           Memory[6]
  8        miss         Memory[8]           Memory[6]

The direct-mapped cache generates five misses for the five accesses.
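The tag-bit arithmetic in the solution above can be verified with a short sketch (the function name is my own): 4K blocks leave 28 bits for index plus tag, and each extra way halves the sets, shifting one bit from index to tag:

```python
def tag_bits(ways, blocks=4096, addr_bits=28):
    """Total tag bits for a cache of `blocks` blocks with `ways`-way sets."""
    sets = blocks // ways
    index_bits = sets.bit_length() - 1   # log2(sets); sets is a power of two
    return (addr_bits - index_bits) * blocks

for ways, label in [(1, "direct mapped"), (2, "2-way"),
                    (4, "4-way"), (4096, "fully associative")]:
    print(label, tag_bits(ways) // 1024, "Kbits")  # 64, 68, 72, 112

# The 3-way write-through exercise's totals, checked the same way:
line_bits = 8 * 8 + 22 + 1                # data + tag + valid bit = 87
print(8 * 3 * 128, 128 * 3 * line_bits)   # 3072 bytes, 33408 bits
```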
Solution: 2-way set associative
Remember: i = j modulo v (here v = 2 sets), with LRU replacement:

  block address  cache set
  0              (0 modulo 2) = 0
  6              (6 modulo 2) = 0
  8              (8 modulo 2) = 0

  address  hit or miss  cache contents after reference
                        set 0                   set 1
  0        miss         Memory[0]
  8        miss         Memory[0]  Memory[8]
  0        hit          Memory[0]  Memory[8]
  6        miss         Memory[0]  Memory[6]
  8        miss         Memory[8]  Memory[6]

The 2-way set-associative cache has 4 misses, one less than the direct-mapped cache.

Solution: fully associative
Remember: one single set of 4 blocks:

  address  hit or miss  cache contents after reference
                        block 0    block 1    block 2    block 3
  0        miss         Memory[0]
  8        miss         Memory[0]  Memory[8]
  0        hit          Memory[0]  Memory[8]
  6        miss         Memory[0]  Memory[8]  Memory[6]
  8        hit          Memory[0]  Memory[8]  Memory[6]

The fully associative cache has the best performance, with only 3 misses.
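All three miss counts can be reproduced with one small LRU simulator (my own sketch, not from the slides): direct mapping is the special case of 1 line per set, and fully associative is the special case of a single set, exactly as the Remarks slide states.

```python
def set_assoc_misses(trace, num_sets, ways):
    """Count misses for a set-associative cache with LRU replacement."""
    sets = [[] for _ in range(num_sets)]  # each set: LRU order, MRU last
    misses = 0
    for j in trace:
        s = sets[j % num_sets]            # i = j mod v
        if j in s:
            s.remove(j)                   # hit: refresh its LRU position
        else:
            misses += 1
            if len(s) == ways:            # set full: evict least recently used
                s.pop(0)
        s.append(j)
    return misses

trace = [0, 8, 0, 6, 8]
print(set_assoc_misses(trace, num_sets=4, ways=1))  # 5: direct mapped
print(set_assoc_misses(trace, num_sets=2, ways=2))  # 4: 2-way set associative
print(set_assoc_misses(trace, num_sets=1, ways=4))  # 3: fully associative
```

With 1 line per set, LRU degenerates to "replace the only line", so the same code covers direct mapping without a separate policy.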