Chapter 7: Memory System Design

Chapter Outline
7.1 Introduction: The Components of the Memory System
7.2 RAM Structure: The Logic Designer's Perspective
7.3 Memory Boards and Modules
7.4 Two-Level Memory Hierarchy
7.5 The Cache
7.6 Virtual Memory
7.7 The Memory Subsystem in the Computer

Introduction

So far, we have treated memory as an array of words limited in size only by the number of address bits. In the real world, other issues arise: cost, speed, size, power consumption, volatility, and so on.

Topics Covered

Memory components:
- RAM memory cells and cell arrays
- Chip organization and timing:
  - Static RAM: more expensive, but less complex
  - Dynamic RAM: less expensive, but needs "refreshing"
  - Tree and matrix decoders: needed for large RAM chips
  - ROM: read-only memory
- Memory boards and modules:
  - Arrays of chips give more addresses and/or wider words
  - 2-D and 3-D chip arrays
  - Large systems can benefit by partitioning memory for separate access by system components and for fast access to multiple words

The memory hierarchy: from fast and expensive to slow and cheap.
- Example: registers, cache, main memory, disk
- At first, consider just two adjacent levels in the hierarchy
- The cache: high speed and expensive. Kinds: direct mapped, associative, set associative
- Virtual memory makes the hierarchy transparent:
  - Translate the address from the CPU's logical address to the physical address where the information is actually stored
  - Memory management: how to move information back and forth
  - Multiprogramming: what to do while we wait
  - The Translation Lookaside Buffer (TLB) helps in speeding up the address translation process
- Overall consideration of the memory as a subsystem

The CPU-Memory Interface

The interface consists of the m-bit memory address register (MAR) driving the m-bit address bus, the w-bit memory data register (MDR) connected to the b-bit data bus (lines D0 to Db-1), and the control signals R/W, REQUEST, and COMPLETE. Main memory holds 2^m words of s bits each, at addresses A0 to A(2^m - 1).

Sequence of events for a read:
1. CPU loads MAR, issues Read, and asserts REQUEST.
2. Main memory transmits the word to the MDR.
3. Main memory asserts COMPLETE.

Sequence of events for a write:
1. CPU loads MAR and MDR, issues Write, and asserts REQUEST.
2. The value in MDR is written into the address in MAR.
3. Main memory asserts COMPLETE.

(A software model of this handshake follows below.)
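The read and write sequences above amount to a simple handshake protocol. The C sketch below is a software model of that handshake, not hardware from the slides; the type and function names (CpuMemInterface, mem_read, mem_write) and the m = 20 memory size are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software model of the CPU-memory interface described
 * above: MAR selects the address, MDR carries the data, and COMPLETE
 * signals that main memory has finished the transaction. */
typedef struct {
    uint32_t mar;       /* memory address register (m bits) */
    uint32_t mdr;       /* memory data register (w bits)    */
    bool     request;   /* asserted by the CPU              */
    bool     complete;  /* asserted by main memory          */
} CpuMemInterface;

static uint32_t main_memory[1u << 20];  /* 2^m words, here m = 20 */

/* Read: CPU loads MAR and asserts REQUEST; memory places the word
 * in MDR and asserts COMPLETE. */
uint32_t mem_read(CpuMemInterface *bus, uint32_t addr) {
    bus->mar = addr;
    bus->request = true;               /* step 1 */
    bus->mdr = main_memory[bus->mar];  /* step 2 */
    bus->complete = true;              /* step 3 */
    bus->request = false;
    return bus->mdr;
}

/* Write: CPU loads MAR and MDR and asserts REQUEST; memory stores
 * MDR at the address in MAR and asserts COMPLETE. */
void mem_write(CpuMemInterface *bus, uint32_t addr, uint32_t value) {
    bus->mar = addr;
    bus->mdr = value;
    bus->request = true;              /* step 1 */
    main_memory[bus->mar] = bus->mdr; /* step 2 */
    bus->complete = true;             /* step 3 */
    bus->request = false;
}
```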
The CPU-Memory Interface (cont'd.)

Additional points:
- If b < w, main memory must make w/b b-bit transfers per word.
- Some CPUs allow reading and writing of word sizes smaller than w. Example: on the Intel 8088, m = 20, w = 16, and s = b = 8, so both 8- and 16-bit values can be read and written.
- If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted.
- Some systems use separate R and W lines, and omit REQUEST.

Some Memory Properties

Symbol  | Definition                          | Intel 8088    | Intel 8086    | PowerPC 601
w       | CPU word size                       | 16 bits       | 16 bits       | 64 bits
m       | Bits in a logical memory address    | 20 bits       | 20 bits       | 32 bits
s       | Bits in smallest addressable unit   | 8 bits        | 8 bits        | 8 bits
b       | Data bus size                       | 8 bits        | 16 bits       | 64 bits
2^m     | Memory word capacity, s-sized words | 2^20 words    | 2^20 words    | 2^32 words
2^m x s | Memory bit capacity                 | 2^20 x 8 bits | 2^20 x 8 bits | 2^32 x 8 bits

Big-Endian and Little-Endian Storage

When data types having a word size larger than the smallest addressable unit are stored in memory, the question arises: is the least significant part of the word stored at the lowest address (little-endian, "little end first"), or is the most significant part of the word stored at the lowest address (big-endian, "big end first")?

Example: the hexadecimal 16-bit number ABCDH, stored at address 0 (AB is the msb, CD the lsb):
- Little-endian: address 0 holds CD, address 1 holds AB.
- Big-endian: address 0 holds AB, address 1 holds CD.

Big-Endian and Little-Endian Storage of a PowerPC 32-Bit Word

CPU word bit locations:       b31...b24 | b23...b16 | b15...b8 | b7...b0
Big-endian byte addresses:        0     |     1     |    2     |    3
Little-endian byte addresses:     3     |     2     |    1     |    0

Big-endian storage: memory addresses 0, 1, 2, 3 hold b31...b24, b23...b16, b15...b8, b7...b0, respectively.
Little-endian storage: memory addresses 0, 1, 2, 3 hold b7...b0, b15...b8, b23...b16, b31...b24, respectively.
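The byte-ordering rules above are easy to verify on a real machine. Here is a minimal C program (mine, not from the slides) that stores the 16-bit value ABCDH from the example and inspects which byte ends up at the lowest address:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint16_t word = 0xABCD;   /* the slide's example value */
    uint8_t bytes[2];

    memcpy(bytes, &word, sizeof word);  /* view the word as raw bytes */

    printf("address 0: %02X, address 1: %02X\n", bytes[0], bytes[1]);
    if (bytes[0] == 0xCD)
        printf("little-endian: least significant byte at lowest address\n");
    else
        printf("big-endian: most significant byte at lowest address\n");
    return 0;
}
```

On a little-endian host (e.g., x86) this prints CD at address 0; on a big-endian host it prints AB.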
Memory Performance Parameters

Symbol         | Definition        | Units   | Meaning
ta             | Access time       | Time    | Time to access a memory word, up to the assertion of the memory completion signal
tc             | Cycle time        | Time    | Time from the start of a read to the start of the next memory operation
k              | Block size        | Words   | Number of words per block
β              | Bandwidth         | Words/s | Word transmission rate
tl             | Latency           | Time    | Time to access the first of a sequence of words
tbl = tl + k/β | Block access time | Time    | Time to access an entire block from the start of a read

The Memory Hierarchy, Cost, and Performance

Some typical values:

                | CPU (registers)   | Cache         | Main Memory   | Disk Memory   | Tape Memory
Access type     | Random access     | Random access | Random access | Direct access | Sequential access
Capacity, bytes | 64-1024           | 8-512 KB      | 8-64 MB       | 1-10 GB       | 1 TB
Latency         | 1-10 ns           | 20 ns         | 50 ns         | 10 ms         | 10 ms-10 s
Block size      | 1 word            | 16 words      | 16 words      | 4 KB          | 4 KB
Bandwidth       | System clock rate | 8 MB/s        | 1 MB/s        | 1 MB/s        | 1 MB/s
Cost/MB         | High              | $500          | $30           | $0.25         | $0.02

Memory Hierarchy

[Figure: the hierarchy runs from the CPU through the cache (primary level), main memory (secondary level), and the hard disk (ternary level). Words move between the CPU and cache; blocks (cache lines) move between cache and main memory; larger blocks move between main memory and disk under virtual memory. Going from the fast, small end to the slow, large end, block size and latency increase while bandwidth and the frequency of transactions decrease; the principle of locality keeps the hit rate at the upper levels high.]

Considering Any Two Adjacent Levels of the Memory Hierarchy

Some definitions:
- Temporal locality: the property of most programs that if a given memory location is referenced, it is likely to be referenced again, "soon."
- Spatial locality: if a given memory location is referenced, those locations near it are likely to be referenced "soon."
- Working set: the set of memory locations referenced over a fixed period of time, or within a time window.

Notice that temporal and spatial locality both work to assure that the contents of the working set change only slowly over execution time.

Defining the primary and secondary levels: of any two adjacent levels in the hierarchy, the faster, smaller level (nearer the CPU) is the primary level and the slower, larger one is the secondary level.

Primary and Secondary Levels of the Memory Hierarchy

Speed between levels is defined by latency, the time to access the first word, and bandwidth, the number of words per second transmitted between levels. Typical latencies: a few clocks for the cache, about 100,000 clocks for the disk.

The item of commerce between any two levels is the block. Blocks may (and will) differ in size at different levels in the hierarchy. Example: cache block size ~16-64 bytes; disk block size ~1-4 KB. As the working set changes, blocks are moved back and forth through the hierarchy to satisfy memory access requests. The primary-level block size is often determined more by the properties of information transfer than by the spatial locality principle.

Primary and Secondary Addresses

A complication: addresses will differ depending on the level.
- Primary address: the address of a value in the primary level.
- Secondary address: the address of a value in the secondary level.

Examples: a main memory address is an unsigned integer; a disk address is a track number, sector number, and offset of the word in the sector.

Addressing and Accessing a Two-Level Hierarchy

The computer system, in hardware or software, must perform any address translation that is required. The memory management unit (MMU) applies a translation function (mapping tables, permissions, etc.) to each system address. On a hit, the result is an address in primary memory, and the word is accessed there; on a miss, the result is an address in secondary memory, and the containing block is brought into the primary level.

There are two ways of forming the address: segmentation and paging. Paging is more common. Sometimes the two are used together, one "on top of" the other. More about address translation and paging later.

Primary Address Formation

(a) Paging: the system address is split into a block field and a word field. The block field indexes a lookup table that yields the primary block number, which is concatenated with the word field to form the primary address.
(b) Segmentation: the block field indexes a lookup table that yields a base address, which is added to the word offset to form the primary address.

Hits and Misses; Paging; Block Placement

- Hit: the word was found at the level from which it was requested.
- Miss: the word was not found at the level from which it was requested. A miss results in a request for the block containing the word from the next higher level in the hierarchy.
- Hit ratio (or hit rate): h = number of hits / total number of references. Miss ratio: 1 - h.
- With tp = primary memory access time and ts = secondary memory access time, the average access time is ta = h*tp + (1 - h)*ts (a numeric sketch follows after this list).
- Page: commonly, a disk block.
- Page fault: synonymous with a miss.
- Demand paging: pages are moved from disk to main memory only when a word in the page is requested by the processor.
- Block placement and replacement decisions must be made each time a block is moved.
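As promised in the list above, here is a small numeric sketch of the access-time formula. The 20 ns and 70 ns figures anticipate the speedup example later in the chapter; the function name avg_access_time is my own.

```c
#include <stdio.h>

/* Average access time for a two-level hierarchy:
 * ta = h*tp + (1 - h)*ts  (hit ratio h, primary time tp, secondary ts) */
double avg_access_time(double h, double tp, double ts) {
    return h * tp + (1.0 - h) * ts;
}

int main(void) {
    double tp = 20.0, ts = 70.0;  /* illustrative latencies in ns */
    for (double h = 0.90; h <= 1.0001; h += 0.02)
        printf("h = %.2f  ->  ta = %.1f ns\n", h, avg_access_time(h, tp, ts));
    return 0;
}
```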
The Cache Regime

In the cache regime, secondary latency is only about 10 times as long as the primary-level access time, so the CPU can afford to wait as long as the miss ratio is small. A small, fast memory (sometimes part of the CPU), called the cache, is connected to a slower memory, typically of a different semiconductor technology. The cache's access time matches the processor speed.

Two-level caches consist of a primary cache (faster level) and a secondary cache (slower level).

Since the CPU waits on a cache miss, software cannot be used in the miss response; everything has to be done in hardware. This means that placement and replacement decisions and secondary address formation must be kept simple so they can be done quickly with limited hardware. There is less flexibility of placement in a cache than in the primary level of a virtual memory system.

The Cache Miss versus the Main Memory Miss

- The cache hit time is typically 1-2 clock cycles.
- The cache miss penalty is only a few tens of clock cycles to bring the desired block into the cache from main memory.
- The main memory miss penalty typically runs to 10,000-100,000 clock cycles to bring the block from the disk.
- Cache miss rates of no greater than 2% are tolerable; main memory miss rates must be no greater than about 0.001% to avoid significant degradation in system performance.

Virtual Memory

A virtual memory is a memory hierarchy, usually consisting of at least main memory and disk, in which the processor issues all memory references as effective addresses in a flat address space. All translations to primary and secondary addresses are handled transparently to the process making the address reference, thus providing the illusion of a flat address space.

Recall that disk accesses may require 100,000 clock cycles to complete, due to the slow access time of the disk subsystem. Once the processor has, through the mediation of the operating system, made the proper request to the disk subsystem, it is available for other tasks. Multiprogramming shares the processor among independent programs that are resident in main memory and thus available for execution.

Decisions in Designing a Two-Level Hierarchy

- Translation procedure: how to translate from the system address to the primary address.
- Block size: block transfer efficiency and miss ratio will be affected.
- Processor dispatch on miss: does the processor wait, or is it multiprogrammed?
- Primary-level placement: direct, associative, or a combination. Discussed later.
- Replacement policy: which block is to be replaced upon a miss.
- Direct access to secondary level: in the cache regime, can the processor directly access main memory upon a cache miss?
- Write-through: can the processor write directly to main memory upon a cache miss?
- Read-through: can the processor read directly from main memory upon a cache miss as the cache is being updated?
- Read or write bypass: can certain infrequent read or write misses be satisfied by a direct access of main memory without any block movement?

The Cache Mapping Function

Example: a 256 KB cache with 16-word blocks, in front of a 32 MB main memory. The cache mapping function is responsible for all cache operations:
- Placement strategy: where to place an incoming block in the cache
- Replacement strategy: which block to replace upon a miss
- Read and write policy: how to handle reads and writes upon cache misses

The mapping function must be implemented in hardware. There are three different types of mapping functions: associative, direct mapped, and block-set associative.

Memory Fields and Address Translation

Consider a processor-issued 32-bit virtual address. That same 32-bit address can be partitioned into two fields, a block field and a word field. The word field represents the offset into the block specified by the block field: with a 26-bit block field and a 6-bit word field, there are 2^26 blocks of 64 words each.

Example of a specific memory reference, block 9, word 11:

Block field (26 bits): 00...00 001001    Word field (6 bits): 001011
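The block/word partition just shown is plain shifting and masking. Here is a hedged C sketch (illustrative names, not the slides' hardware) that reproduces the block 9, word 11 example:

```c
#include <stdio.h>
#include <stdint.h>

#define WORD_BITS 6u                      /* 2^6 = 64 words per block */
#define WORD_MASK ((1u << WORD_BITS) - 1u)

int main(void) {
    /* Block 9, word 11: address = 9*64 + 11 */
    uint32_t addr = (9u << WORD_BITS) | 11u;

    uint32_t block = addr >> WORD_BITS;   /* upper 26 bits */
    uint32_t word  = addr & WORD_MASK;    /* lower 6 bits  */

    printf("address = 0x%08X -> block %u, word %u\n",
           (unsigned)addr, (unsigned)block, (unsigned)word);
    return 0;
}
```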
Associative Cache

Associative-mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory address; 16 bits, while unrealistically small, simplifies the examples. One cache line is 8 bytes, the cache holds 256 lines (cache blocks 0 through 255), and main memory holds 8192 blocks (MM blocks 0 through 8191). The tag memory holds one 13-bit tag and one valid bit per cache line, so a main memory address is a 13-bit tag plus a 3-bit byte offset. In the example, cache block 0 holds MM block 421 and cache block 2 holds MM block 119; their tag entries contain 421 and 119 with valid bits set.

Associative Cache in General

- B: number of bytes per block
- T: number of bits in the main memory block number, which is also the number of tag bits in the tag memory
- L: number of blocks in the cache
- S_MM = B x 2^T: size of main memory in bytes
- S_CM = L x B: size of cache memory in bytes
- Main memory address length in bits: log2(S_MM) = T + log2(B)

Associative Cache Mechanism

Because any block can reside anywhere in the cache, an associative (content-addressable) memory is used: all locations are searched simultaneously. The tag field of the incoming main memory address (13 bits in the example) is placed in the argument register and compared against every tag in parallel. If some tag matches and its valid bit is set, the match bit selects the corresponding cache line, and the 3-bit byte field drives a selector that delivers the desired byte to the CPU.

Advantages and Disadvantages of the Associative-Mapped Cache

Advantage:
- Most flexible of all: any MM block can go anywhere in the cache.

Disadvantages:
- Large tag memory.
- The need to search the entire tag memory simultaneously means lots of hardware.
- Replacement policy is an issue when the cache is full.

Q: How is an associative search conducted at the logic gate level?
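In hardware, the associative search compares the argument register against every tag simultaneously. The C sketch below models that search with a sequential loop (software has no parallel compare), using the example's parameters: 256 lines of 8 bytes and 13-bit tags. All identifiers are assumptions of mine.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINES      256   /* cache blocks 0..255     */
#define BLOCK_SIZE 8     /* one cache line, 8 bytes */

typedef struct {
    uint16_t tag;        /* 13-bit MM block number  */
    bool     valid;
    uint8_t  data[BLOCK_SIZE];
} CacheLine;

static CacheLine cache[LINES];

/* Associative lookup: every tag is examined; in hardware this happens
 * simultaneously, in this model it is a loop. Returns true on a hit
 * and stores the requested byte in *out. */
bool assoc_lookup(uint16_t addr, uint8_t *out) {
    uint16_t tag  = addr >> 3;           /* upper 13 bits: block number */
    uint16_t byte = addr & 0x7;          /* lower 3 bits: byte in line  */

    for (int i = 0; i < LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data[byte];  /* selector picks the byte */
            return true;                 /* cache hit */
        }
    }
    return false;  /* miss: fetch the block and choose a victim line */
}
```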
Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache.

Direct-Mapped Cache

In the example, main memory's 8192 blocks are arranged in 32 columns (tag numbers 0 through 31) of 256 groups (group numbers 0 through 255): blocks 0, 256, 512, ..., 7936 belong to group 0; blocks 1, 257, 513, ..., 7937 to group 1; and so on (e.g., block 2305 has tag 9 and group 1). Each tag memory entry holds a 5-bit tag field and a valid bit, and each cache line is 8 bytes. The cache address is 8 group bits plus 3 byte bits; the main memory address is 5 tag bits, 8 group bits, and 3 byte bits.

Key idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number. The cache therefore needs to examine only the single group that its reference specifies.

Direct-Mapped Cache in General

- B: number of bytes per block
- T: number of tag bits in the tag memory (main memory has 2^T columns)
- G: number of groups, equal to L, the number of blocks in the cache
- S_MM = G x B x 2^T: size of main memory in bytes
- S_CM = L x B: size of cache memory in bytes
- Main memory address length in bits: log2(S_MM) = T + log2(G) + log2(B)

Direct-Mapped Cache Operation

1. Decode the group number of the incoming main memory address to select the group.
2. Gate out the tag field of the selected entry.
3. Compare the stored cache tag with the incoming tag (a 5-bit comparator in the example), together with the valid bit.
4. If they match AND the entry is valid, the access is a cache hit; otherwise it is a cache miss.
5. On a hit, gate out the cache line,
6. and use the word (byte) field to select the desired word within it.

Direct-Mapped Caches: Discussion

The direct-mapped cache uses less hardware, but is much more restrictive in block placement. If two blocks from the same group are frequently referenced, then the cache will "thrash": it will repeatedly bring the two competing blocks into and out of the cache, causing a performance degradation. Block replacement strategy, however, is trivial.

Compromise: allow several cache blocks in each group. This is the block-set-associative cache.

2-Way Set-Associative Cache

The example shows 256 groups with a set of two lines per group, sometimes referred to as a 2-way set-associative cache. The tag memory now holds two 5-bit tags and two valid bits per group, and the cache memory holds two 8-byte lines per group, so two blocks from the same group (for example, MM blocks 512 and 7680, which both belong to group 0) can reside in the cache simultaneously. The cache group address is still 8 set bits plus 3 byte bits; the main memory address is 5 tag bits, 8 set bits, and 3 byte bits.

K-Way Set-Associative Cache

In general, a K-way set-associative cache has S sets of K lines each; the tag memory holds K tags and K valid bits per set, and a main memory address is T + log2(S) + log2(B) bits. The limiting cases:
- K = 1: a 1-way block-set-associative cache is a direct-mapped cache.
- S = 1: a block-set-associative cache with a single set is a (fully) associative cache.

Read/Write Policies

Cache write policies upon a cache hit:
- Write-through: update both cache and main memory upon each write to a block that is present in the cache.
- Write-back: postpone the main memory update until the block is replaced; uses a "dirty bit," which is set upon the first write to a block in the cache.

Read and write miss policies:
- Miss upon a read: with read-through, the word is immediately forwarded from main memory to the CPU and then the cache is updated; otherwise, the block is loaded into the cache first.
- Miss upon a write: with write-allocate, the block is first brought into the cache and then updated; with write-no-allocate, the write goes directly to main memory, without bringing the block into the cache.

Block Replacement Policy

- Not needed with a direct-mapped cache.
- Least Recently Used (LRU): track usage with a counter per line. Each time a block is accessed: clear the counter of the accessed block; increment the counters with values less than the one just cleared; leave all others unchanged. When the set is full, remove the line with the highest count. (A sketch combining this scheme with a set-associative lookup follows below.)
- Random replacement: replace a block at random. Even random replacement is a fairly effective strategy.
- FIFO: the longest-residing block is replaced first.
- For smaller cache memories, random is worst and LRU is best; for large cache memories, random and LRU are comparable.
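The following C sketch ties the last several slides together: a K-way set-associative lookup (here K = 2, as in the example) that indexes the set field, compares the K tags, and applies the LRU counter scheme described above on hits and fills. With K = 1 it degenerates to a direct-mapped cache. The code and its names are my illustration, not the slides' design.

```c
#include <stdint.h>
#include <stdbool.h>

#define K          2u    /* lines per set (2-way, as in the example) */
#define SETS       256u  /* number of sets S                         */
#define BLOCK_BITS 3u    /* 8-byte lines                             */
#define SET_BITS   8u    /* log2(SETS)                               */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t age;        /* LRU counter: 0 = most recently used      */
} Line;

static Line cache[SETS][K];

/* LRU counter update for one set after an access to line `used`:
 * clear the used line's counter, increment the counters smaller than
 * its old value, and leave the rest unchanged. */
static void lru_touch(Line *set, uint32_t used) {
    uint32_t old = set[used].age;
    for (uint32_t i = 0; i < K; i++)
        if (set[i].age < old)
            set[i].age++;
    set[used].age = 0;
}

/* K-way set-associative access: index the set, compare the K tags,
 * and on a miss fill a free line or evict the highest-count (LRU)
 * line. Returns true on a hit, false on a miss. */
bool cache_access(uint32_t addr) {
    uint32_t set_idx = (addr >> BLOCK_BITS) & (SETS - 1u);
    uint32_t tag     = addr >> (BLOCK_BITS + SET_BITS);
    Line *set = cache[set_idx];

    for (uint32_t i = 0; i < K; i++) {
        if (set[i].valid && set[i].tag == tag) {
            lru_touch(set, i);                 /* hit */
            return true;
        }
    }

    /* Miss: prefer an invalid line, else evict the LRU line. */
    uint32_t victim = 0;
    for (uint32_t i = 0; i < K; i++) {
        if (!set[i].valid) { victim = i; break; }
        if (set[i].age > set[victim].age)
            victim = i;
    }
    set[victim] = (Line){ true, tag, K - 1u };  /* fill as oldest...  */
    lru_touch(set, victim);                     /* ...then mark MRU   */
    return false;
}
```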
Performance and Cache Design

Direct-mapped and block-set-associative caches may suffer from minor time penalties due to the need to decode the group (set) field, but for an equivalent amount of hardware (e.g., number of gates), direct-mapped and set-associative caches will have more capacity than an associative cache. Quantitative analysis of cache performance is quite complex and is the subject of ongoing research; a simplified mathematical analysis, however, provides sufficient insight into the issues involved.

Definitions:
- h: hit ratio; (1 - h): miss ratio.
- TC: cache access time upon a cache hit.
- TM: access time upon a cache miss.
- Ta: average memory access time during a particular machine execution interval.

Recall the access time equation ta = h*tp + (1 - h)*ts. With Tp = TC for the cache and Ts = TM for main memory as the primary (p) and secondary (s) levels:

    Ta = h*TC + (1 - h)*TM

This equation is valid for a read-through cache, where the value is forwarded from main memory to the CPU directly as it is received, without waiting for the entire cache line to be loaded before initiating another cache read. If this is not the case, TM must be replaced by the cache line fill time.

Speedup:

    S = Twithout / Twith

where Twithout is the time taken without the improvement, e.g., without the cache, and Twith is the time the process takes with the improvement, e.g., with the cache. Given a model for cache and main memory access times and the cache line fill time, the speedup can be calculated once the hit ratio is known.

Example: changing a certain cache from direct-mapped to two-way set-associative caused the hit ratio measured for a certain set of benchmarks to improve from 0.91 to 0.96, with TC = 20 ns and TM = 70 ns. What is the speedup, assuming no change in TC from the set-associative multiplexer?

Answer:
    Twithout = 0.91 x 20 + (1 - 0.91) x 70 = 24.5 ns
    Twith    = 0.96 x 20 + (1 - 0.96) x 70 = 22.0 ns
    S = 24.5 / 22.0 = 1.11, an 11% improvement in execution time.
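A few lines of C (mine) reproduce the worked example's numbers and the speedup formula:

```c
#include <stdio.h>

/* Ta = h*TC + (1 - h)*TM for a read-through cache */
static double ta(double h, double tc, double tm) {
    return h * tc + (1.0 - h) * tm;
}

int main(void) {
    double tc = 20.0, tm = 70.0;           /* ns, from the example */
    double t_without = ta(0.91, tc, tm);   /* direct-mapped cache  */
    double t_with    = ta(0.96, tc, tm);   /* 2-way set-assoc.     */

    printf("Twithout = %.1f ns\n", t_without);        /* 24.5 ns */
    printf("Twith    = %.1f ns\n", t_with);           /* 22.0 ns */
    printf("Speedup S = %.2f\n", t_without / t_with); /* 1.11    */
    return 0;
}
```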
Virtual Memory and the MMU

The memory management unit (MMU) is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory. The CPU chip contains both the CPU and the MMU: the MMU's mapping tables turn the logical address into a virtual address and then into the physical address that goes to the cache, main memory, and ultimately the disk.

A word about addresses:
- Effective address: an address computed by the processor while executing a program. Synonymous with logical address. The term effective address is often used when referring to activity inside the CPU; logical address is most often used when referring to addresses as viewed from outside the CPU.
- Virtual address: the address generated from the logical address by the MMU, prior to making the address translation.
- Physical address: the address presented to the memory unit.

(Note: every address reference must be translated.)

Virtual Addresses: Why?

The logical address provided by the CPU is translated to a virtual address by the MMU. Often the virtual address space is larger than the logical address space, allowing program units to be mapped to a much larger virtual address space.

Virtual Addressing: Advantages

- Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system. No address relocation is required at load time, and there is no need to fragment the program to accommodate memory limitations.
- Cost-effective use of physical memory. Less expensive secondary (disk) storage can replace primary storage: the MMU brings portions of the program into physical memory as required, so not all portions of the program need to occupy physical memory at one time.
- Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges. This allows access/security control at the most fundamental levels, and can be used to prevent buggy programs and intruders from causing damage to other users or the system. This is the origin of those "bus error" and "segmentation fault" messages.

Memory Management

As before, there are two primary-address-formation schemes: (a) paging, in which a lookup table maps the block field of the system address to a primary block number that is concatenated with the word field; and (b) segmentation, in which a lookup table supplies a base address that is added to the word offset.

Memory Management by Segmentation

Notice that each segment's virtual address starts at 0, different from its physical address. Repeated movement of segments into and out of physical memory results in gaps between segments; this is called external fragmentation. Compaction (defragmentation) routines must occasionally be run to remove these fragments.

Segmentation Mechanism

The computation of the physical address from the virtual address requires an integer addition (segment base register plus offset in segment) for each memory reference, and a comparison against the segment limit register if segment limits are checked; an out-of-range offset raises a bounds error. Q: How does the MMU switch references from one segment to another?

Memory Management by Paging

Demand paging: pages are brought into physical memory only as they are needed. The virtual address space of a program unit is divided into pages (page 0, page 1, ..., page n-1), each of which resides either in physical memory or in secondary memory; in the figure, page n-1 is present only in secondary memory. The MMU manages this mapping from logical to physical addresses.

Virtual Address Translation in a Paged MMU

There is one page table per user per program unit, and one translation per memory access. The virtual address from the CPU is split into a page number and an offset in the page. The page number, added to the page table base register, indexes the page table (after a comparison with the page table limit register; an out-of-range page number raises a bounds error). Each page table entry holds access-control bits (presence bit, dirty bit, usage bits) and either a physical page number or a pointer to secondary storage:
- Hit: the page is in primary memory; the physical page number is concatenated with the offset in the page to form the physical address of the desired word in main memory.
- Miss (page fault): the page is in secondary memory; the entry is translated to a disk address. A page fault results in 100,000 or more cycles passing before the page has been brought from secondary storage to main memory.

A potential drawback is the large page table this scheme requires.

Page Placement and Replacement

Page tables are direct mapped, since the physical page is computed directly from the virtual page number; but physical pages can reside anywhere in physical memory. Page tables such as the one above result in large tables, since there must be a page table entry for every page in the program unit. Some implementations resort to hash tables instead, which need entries only for those pages actually present in physical memory. Replacement strategies are generally LRU, or at least employ a "use bit" to guide replacement.
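Below is a hedged C sketch of the paged translation just described. The 4 KB page size, the table length, and all names are assumptions for illustration; the structure (presence bit, bounds check, hit/fault split) follows the slide.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 12u                      /* assume 4 KB pages         */
#define PAGE_MASK ((1u << PAGE_BITS) - 1u)
#define NUM_PAGES 1024u                    /* assumed page-table length */

typedef struct {
    bool     present;    /* presence bit: page is in main memory */
    bool     dirty;      /* dirty bit: page modified since load  */
    uint32_t phys_page;  /* physical page number (if present)    */
    uint32_t disk_addr;  /* where to find the page otherwise     */
} PageTableEntry;

static PageTableEntry page_table[NUM_PAGES];

/* Translate a virtual address; returns true on a hit, false on a
 * bounds error or page fault (the caller must then fetch the page
 * from disk_addr and retry). */
bool translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr >> PAGE_BITS;  /* page number field */
    uint32_t offset = vaddr & PAGE_MASK;   /* offset in page    */

    if (page >= NUM_PAGES) {
        fprintf(stderr, "bounds error: page %u\n", (unsigned)page);
        return false;
    }
    PageTableEntry *pte = &page_table[page];
    if (!pte->present)
        return false;                      /* page fault */

    *paddr = (pte->phys_page << PAGE_BITS) | offset;
    return true;                           /* hit */
}
```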
Fast Address Translation: Regaining Lost Ground

The concept of virtual memory is very attractive, but it leads to considerable overhead:
- There must be a translation for every memory reference.
- There must be two memory references for every program reference: one to retrieve the page table entry, and one to retrieve the value.
- Most caches are addressed by physical address, so there must be a virtual-to-physical translation before the cache can be accessed.

The answer: a small cache in the processor that retains the last few virtual-to-physical translations, the Translation Lookaside Buffer (TLB). The TLB contains not only the virtual-to-physical translations, but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly. The TLB is usually implemented as a fully associative cache.

Translation Lookaside Buffer Structure and Operation

The page number of the virtual address from the CPU is looked up associatively in the TLB, whose entries hold the virtual page number, access-control bits (presence bit, dirty bit, valid bit, usage bits), and the physical page number:
- TLB hit: the page is in primary memory. The physical page number plus the offset in the page form the physical address sent to main memory or the cache, which returns the desired word.
- TLB miss: look for the physical page in the page table.

Operation of the Memory Hierarchy

For each virtual address issued by the CPU:
1. Search the TLB. On a TLB hit, generate the physical address.
2. On a TLB miss, search the page table. On a page table hit, generate the physical address and update the TLB.
3. On a page table miss (page fault), get the page from secondary memory; update main memory, the cache, and the page table.
4. With the physical address in hand, search the cache. On a cache hit, return the value from the cache; on a cache miss, update the cache from main memory first.

I/O Connection to a Memory with a Cache

The memory system is quite complex, and affords many possible trade-offs. The only realistic way to choose among these alternatives is to study a typical workload, using either simulations or prototype systems. Instruction and data accesses usually have different patterns. It is possible to employ a cache at the disk level, using the disk hardware. Traffic between main memory and disk is I/O, and direct memory access (DMA) can be used to speed the transfers: the cache sits between the CPU and main memory, while paging DMA and I/O DMA connect main memory to the disk and the I/O system.

Summary

- Most memory systems are multileveled: cache, main memory, and disk.
- Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
- Chips are organized into boards and modules.
- Larger, slower memory is attached to faster memory in a hierarchical structure.
- The cache-to-main-memory interface requires hardware address translation.
- Virtual memory, the main memory-disk interface, can employ software for address translation because of the slower speeds involved.
- The hierarchy must be carefully designed to ensure optimum price-performance.
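As a closing illustration of the TLB-first flow summarized above, here is a sketch that puts a small TLB in front of the page-table walk from the earlier sketch. The TLB size, the trivial replacement rule, and all names are assumptions; the hardware searches all TLB entries in parallel, which the loop merely stands in for.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS   12u
#define PAGE_MASK   ((1u << PAGE_BITS) - 1u)
#define TLB_ENTRIES 64u   /* assumed TLB size */

typedef struct {
    bool     valid;
    uint32_t virt_page;   /* virtual page number */
    uint32_t phys_page;   /* cached translation  */
} TlbEntry;

static TlbEntry tlb[TLB_ENTRIES];

/* Page-table walk from the earlier sketch (assumed linked in). */
bool translate(uint32_t vaddr, uint32_t *paddr);

/* TLB-first translation: on a TLB hit the physical address is formed
 * directly; on a miss the page table is walked and the translation is
 * cached in the TLB for next time. */
bool translate_with_tlb(uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & PAGE_MASK;

    for (uint32_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].virt_page == page) {
            *paddr = (tlb[i].phys_page << PAGE_BITS) | offset; /* TLB hit */
            return true;
        }
    }

    /* TLB miss: walk the page table, then cache the translation. */
    if (!translate(vaddr, paddr))
        return false;                    /* page fault */

    uint32_t slot = page % TLB_ENTRIES;  /* trivial replacement rule */
    tlb[slot] = (TlbEntry){ true, page, *paddr >> PAGE_BITS };
    return true;
}
```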