Memory
Kalyan Basu
basu@cse.uta.edu
CSE 5350

Quote on Memory
"Ideally one would desire an indefinitely large memory capacity such that any particular...word would be immediately available... We are...forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible."
A.W. Burks, H.H. Goldstine, and J. von Neumann, 1946

Memory
Memory is an active area of computer architecture because of the expanding gap between advances in processor speed and memory speed.
The idea is to build a memory subsystem that consists of:
Very small, very fast, very expensive memory "close" to the processor.
Larger, slower, but more affordable memory "further away" from the processor.
The goal is to provide the appearance (from the CPU's perspective) of virtually unlimited memory while minimizing delays to the processor.

Memory Subsystem
[Figure: memory subsystem block diagram. Components: processor, control unit with micro-program control memory, register file, TLB (CAM) and TCAM for address mapping, cache (SRAM), ROM (table look-ups for some arithmetic functions, code translations, etc.), main memory (DRAM 1 ... DRAM n), and secondary storage (magnetic disk). Each block carries an order-of-magnitude size label such as O(1K) x O(10); the labels range from O(100) x O(10) up to O(10-1000M) x O(10).]
Memory Hierarchy
CPU Registers, Cache, Main Memory, Back-up Memory

Level  Name            Typical size  Technology                    Access time (ns)  BW (MB/sec)  Managed by        Backed by
1      Registers       < 1KB         custom memory, CMOS           0.25-0.5          20K-100K     compiler          cache
2      Cache           < 16MB        on-chip/off-chip CMOS, SRAM   0.5-25            5K-10K       hardware          main memory
3      Main memory     < 16GB        CMOS, DRAM                    1000-5000         1K-5K        operating system  back-up memory
4      Back-up memory  > 100GB       magnetic disk                 5,000,000         0.02K-0.15K  OS/operator       tape or CD

Typical Memory Unit
[Figure: typical memory unit. Labels: Memory Address Register (from processor), address mechanism, drivers, storage medium, sensors, Memory Data Register (to and from processor), READ and WRITE control lines.]

Address Decoding
Linear, tree, and balanced decoders are covered in CSE 3322.
[Figure: 2-to-4 decoder with inputs A and B and decoder outputs A'B' (00), A'B (01), AB' (10), AB (11).]

Address Decoding
For example: n = 16, 2^16 = 65,536
Linear: number of transistors = 16 x 2^16 = 1,048,576; delay = 16 t
Tree: number of transistors = 4(2^16 - 2) = 262,136; delay = 2(16 - 1) t = 30 t
Balanced: number of transistors = 2(4(2^8 - 2)) + 65,536 x 2 = 133,104; delay = (2(8 - 1) + 2) t = 16 t

Cache Mechanism
[Figure: CPU, cache, and memory connected by a memory address path and data paths, each with its own bandwidth for data transfer; data moves between cache and memory in blocks.]
Block frame address: identifies the block in the level of the hierarchy.
Block offset address: identifies an item within a block.

Cache Mechanism
Hit rate: the probability p of finding the data item in the cache.
Hit time: access time to get the first word of the data item + transfer time of all words of the data item.

Cache Mechanism
CPU Execution Time = (IC x CPI + Memory Stall Cycles) x Clock Cycle Time
Memory Stall Cycles = Number of Misses x Miss Penalty
= IC x (Number of Misses/IC) x Miss Penalty
= IC x (Memory Accesses/Instruction) x Miss Rate x Miss Penalty
Example:
CPI = 1
Memory access probability for loads and stores = 50%
Miss penalty = 25 cycles
Miss rate = 2%
Memory stall cycles = IC x (1 + 0.5) x 0.02 x 25 = 0.75 x IC (each instruction makes 1 instruction fetch plus, on average, 0.5 data accesses)
CPU Execution Time = (IC x 1 + IC x 0.75) x Clock Cycle Time = 1.75 x IC x Clock Cycle Time

Q1: Where can a block be placed in a cache?
1. If a block can appear in only one place in the cache, the cache is said to be direct mapped. The mapping is Mod[block-frame address, Mblock], where Mblock is the number of blocks in the cache.
2. If the block can be placed anywhere in the cache, the cache is said to be fully associative.
3. If a block can be placed in a restricted set of places in the cache, the cache is said to be set associative. A set is a group of two or more blocks in the cache. A block is first mapped onto a set and can then be placed anywhere within that set (fully associative within the set). The set is selected by bit selection, Mod[block-frame address, Kset], where Kset is the number of sets in the cache.

Cache Mapping
Example: memory block 12 and a cache of 8 blocks (numbered 0 to 7).
Fully associative: block 12 can be placed anywhere.
Direct mapped: block 12 can be placed only in block Mod[12, 8] = 4.
Set associative (4 sets, Set 0 to Set 3): block 12 can be placed anywhere in set Mod[12, 4] = 0.
[Figure: the three caches side by side above main memory, with block-frame number 12 highlighted.]

Q2: How is a block found in a cache?
1. The cache has an address tag on each block frame that gives the block address.
2. The tag of every block that might contain the desired information is compared to see if there is a match.
3. All possible tags are checked in parallel.
4. Valid information in the cache is indicated by a bit called the valid bit.
5. The offset is not used in the comparison, since the entire block is either present or not.
6. The index is not used in the comparison. It is used to select the set.
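The placement and lookup rules of Q1 and Q2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names (`direct_mapped_frame`, `set_index`, `split_address`, `lookup`) and the representation of a set as a list of (valid, tag) pairs are assumptions made for the example.

```python
def direct_mapped_frame(block_addr: int, num_blocks: int) -> int:
    """Direct mapped: Mod[block-frame address, Mblock]."""
    return block_addr % num_blocks


def set_index(block_addr: int, num_sets: int) -> int:
    """Set associative: Mod[block-frame address, Kset]."""
    return block_addr % num_sets


def split_address(addr: int, n_index: int, n_offset: int):
    """Split an address into (tag, index, offset) by bit selection."""
    offset = addr & ((1 << n_offset) - 1)
    index = (addr >> n_offset) & ((1 << n_index) - 1)
    tag = addr >> (n_offset + n_index)
    return tag, index, offset


def lookup(cache_set, tag: int) -> bool:
    """Hit if any valid entry in the selected set carries the same tag.

    cache_set is a hypothetical list of (valid, tag) pairs; hardware checks
    all entries in parallel, this loop checks them one after another.
    """
    return any(valid and stored_tag == tag for valid, stored_tag in cache_set)


# Block 12 in an 8-block cache, as on the Cache Mapping slide.
print(direct_mapped_frame(12, 8))   # direct mapped: block 4
print(set_index(12, 4))             # set associative with 4 sets: set 0

# Hypothetical 9-bit address with a 3-bit index and a 2-bit offset.
print(split_address(0b1011_010_11, n_index=3, n_offset=2))
```

The offset and index never take part in the tag comparison: the index only picks which set to search, and the valid bit guards against matching a stale tag.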
CPU bus address: n bits, laid out as [ Tag | Index | Offset ]
nTag: number of bits in the tag field
nIndex: number of bits in the index, used to select the set
nOffset: number of bits in the offset, used to identify a word in a block
n = nTag + nIndex + nOffset

Example
Cache size: 64K bytes (65,536 bytes)
Block size: 64 bytes
Cache is 2-way set associative
Byte addressing, n = 44 bits
Number of sets = 65,536/(2 x 64) = 512 = 2^9, so nIndex = 9 bits
Block size = 64 = 2^6, so nOffset = 6 bits
nTag = 44 - 9 - 6 = 29 bits

Associative Memory
A memory is said to provide associative access, or content addressability, if words can be fetched on the basis of their contents rather than on the basis of their addresses or locations.
Associative memories are very useful in applications in which "search of values" is used often, for example:
Find the addresses of all "Smiths" in Dallas.
Address translation, where you search for an address.

Associative Memory
[Figure: associative memory organization. Comparand register (value to search for), mask register (masks off portions of the comparand so that only the field of interest is compared), memory cell array, response register (results of the search: 0, 1, or more matches), multiple match resolver (allows reading of the matched words one at a time), and input/output buffer.]
Associative Memory
The memory array provides the storage for data.
The comparand register contains the data (value) to be compared against the contents of the memory array.
The mask register is used to mask off portions of the comparand register (data value).
The response register indicates the success or failure of a search or comparison operation. If bit i of the response register is set to 1 after a search, the word at location i matched the comparand value.
The response register can also be used as a word select register, identifying the words that will participate in the next search.

Associative Memory
The multiple match resolver (MMR) is used to narrow down a memory read (access) to a specific location in the array in case more than one word satisfies the search criteria.
Typical associative memory operations:
Search on equality (inequality) with respect to mask and word select.
Multi-write with respect to mask and word select.
Read out with respect to the first match (MMR).
Setting and resetting the word select and response registers.
Setting the mask and comparand registers.

Alpha 21264 Data Cache
[Figure: Alpha 21264 data cache. Labels: CPU address (tag, index, offset), data in, data out, two banks of valid bit / tag / data, tag comparators (=?), mux, and numbered steps 1 to 4.]
Read Time Sequence: Hit Case
[Figure: timing diagram with numbered steps: 1 address buffer, 2 tag fields, 3 comparators, 4 cache blocks, 5 MUX to CPU. The cache blocks are read in parallel with the tag access and comparison.]
Read time = t1 + Max[(t2 + t3), t4] + t5, where t1 to t5 are the delays of the numbered steps.

Write Time Sequence: Hit Case
[Figure: timing diagram with numbered steps: 1 address buffer, 2 tag fields, 3 comparators, 4 cache blocks.]
Write time = t1 + t2 + t3 + t4

Q3: Which block should be replaced in a cache?
When a miss occurs, the cache controller must select a block to be replaced with the desired data from main memory.
1. Direct mapped: no choice, only the affected block is replaced.
2. Fully associative and set associative caches have several candidate blocks to replace. The usual policies are:
   1. Random
   2. Least recently used (LRU)
   3. First-in-first-out (FIFO)

Data cache misses per 1000 instructions

Associativity  Policy  16KB   64KB   256KB
2-way          LRU     114.1  103.4  92.2
2-way          Random  117.3  104.3  92.1
2-way          FIFO    115.5  103.9  92.5
4-way          LRU     111.7  102.4  92.1
4-way          Random  115.1  102.3  92.1
4-way          FIFO    113.3  103.1  92.5
8-way          LRU     109.0   99.7  92.1
8-way          Random  111.8  100.5  92.1
8-way          FIFO    110.4  100.3  92.5

Alpha architecture, 64-byte block size, 10 SPEC2000 benchmarks:
5 SPECint2000 {gap, gcc, gzip, mcf, perl}
5 SPECfp2000 {applu, art, equake, lucas, swim}

An Example
Assume a direct mapped cache with 4-word blocks and a total size of 16 words. Consider the following string of address references, given as word addresses:
1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17
Show the hits and misses and the final cache contents.
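The exercise can be checked with a short simulation. The sketch below is an illustration rather than part of the original slides; the function name `simulate_direct_mapped` and its parameters are assumptions. It tracks which memory block each of the 4 cache blocks holds and applies the direct-mapped rule (memory block number mod number of cache blocks).

```python
def simulate_direct_mapped(addresses, num_blocks=4, words_per_block=4):
    """Replay word addresses through a direct mapped cache.

    Returns the per-reference results ("Hit"/"Miss") and the final list of
    memory block numbers held by each cache block (None = never loaded).
    """
    frames = [None] * num_blocks
    results = []
    for addr in addresses:
        block = addr // words_per_block   # memory block holding this word
        frame = block % num_blocks        # the only cache block it may use
        if frames[frame] == block:
            results.append("Hit")
        else:
            results.append("Miss")
            frames[frame] = block         # replace whatever was there
    return results, frames


refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
results, frames = simulate_direct_mapped(refs)
hits = results.count("Hit")
print("hits:", hits, "misses:", len(refs) - hits)   # hits: 6 misses: 10
print("final memory block per cache block:", frames)  # [4, 1, 2, None]
```

Cache block 3 is never referenced by this address string, so it stays empty; the other three cache blocks end up holding words 16-19, 4-7, and 8-11.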
The cache has 4 blocks (cache blocks 0 to 3) of 4 words each. Word address a belongs to memory block floor(a/4), which maps to cache block (memory block) mod 4.
[Figure: repeated on every step in the slides; shows the 4 cache blocks, the main memory block number held in each, and memory words 0 to 21 grouped into blocks.]

Address  Memory block  Cache block  Result  Cache contents after the access (words in cache blocks 0 / 1 / 2 / 3)
1        0             0            Miss    0-3 / empty / empty / empty
4        1             1            Miss    0-3 / 4-7 / empty / empty
8        2             2            Miss    0-3 / 4-7 / 8-11 / empty
5        1             1            Hit     0-3 / 4-7 / 8-11 / empty
20       5             1            Miss    0-3 / 20-23 / 8-11 / empty
17       4             0            Miss    16-19 / 20-23 / 8-11 / empty
19       4             0            Hit     16-19 / 20-23 / 8-11 / empty
56       14            2            Miss    16-19 / 20-23 / 56-59 / empty
9        2             2            Miss    16-19 / 20-23 / 8-11 / empty
11       2             2            Hit     16-19 / 20-23 / 8-11 / empty
4        1             1            Miss    16-19 / 4-7 / 8-11 / empty
43       10            2            Miss    16-19 / 4-7 / 40-43 / empty
5        1             1            Hit     16-19 / 4-7 / 40-43 / empty
6        1             1            Hit     16-19 / 4-7 / 40-43 / empty
9        2             2            Miss    16-19 / 4-7 / 8-11 / empty
17       4             0            Hit     16-19 / 4-7 / 8-11 / empty

Final cache contents: cache block 0 holds memory block 4 (words 16-19), cache block 1 holds memory block 1 (words 4-7), cache block 2 holds memory block 2 (words 8-11), and cache block 3 was never loaded.

Summary
Number of hits = 6
Number of misses = 10
Hit ratio: 6/16 = 37.5%, which is unacceptable. Typical hit ratio: > 90%

Address:  1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
Miss/Hit: Miss Miss Miss Hit Miss Miss Hit Miss Miss Hit Miss Miss Hit Hit Miss Hit

Q4: What happens on a write?
Reads dominate processor cache accesses.
All instruction fetches are reads.
Most instructions do not write to memory.
Typical mix: 10% stores, 37% loads
Writes as a share of all cache accesses = 10/(100 + 10 + 37) = 7%
Writes as a share of data cache accesses = 10/(10 + 37) = 21%

Write Strategies
Write through: the information is written to both the block in the cache and the block in the lower-level memory.
Write back: the information is written only to the block in the cache. It is written to the lower-level memory only when the block is replaced.
Dirty bit: to reduce the frequency of writing blocks back to main memory, a dirty bit is kept as status for each block. It indicates whether the block has been changed during its stay in the cache. If the dirty bit is not set, the block is not written back to main memory on a miss, because the information in main memory is still current.

Comparison
Speed. Write through: write stalls occur as the CPU waits for the write to the lower-level memory; a write buffer can reduce the effect of this stall. Write back: writes occur at the speed of the cache memory.
Complexity. Write through: easier to implement. Write back: implementation is more complex.
Data coherency. Write through: maintained. Write back: not maintained.
Memory BW. Write through: more memory BW. Write back: less memory BW.
Power. Write through: no power savings, since every write is performed on main memory. Write back: potential power savings, since many writes to main memory are not performed.
Multiprocessors. Write through: possible to operate in a multiprocessor architecture. Write back: data coherence needs to be established before it can be used in a multiprocessor architecture.

Write Miss Policy
Write allocate: the block is allocated on a write miss, followed by the write hit actions above. In this natural option, write misses act like read misses.
No-write allocate: a write miss does not affect the cache; instead the block is modified only in the lower-level memory.
So the block remains out of the cache. Only when the program tries to read the block does a read miss occur, and the block is then placed in the cache.
Write policy versus write miss policy: write through is normally used with no-write allocate, and write back is normally used with write allocate.

Cache Performance
Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty
Out-of-order execution processors:
Memory stall cycles/instruction = (Misses/instruction) x (Total miss latency - Overlapped miss latency)
Length of memory latency: what to count as the start and the end of a memory operation in an out-of-order processor.
Length of latency overlap: what to count as the start of overlap with the processor, that is, when a memory operation is stalling the processor.

Cache Performance
Cache index size: 2^index = Cache size/(Block size x Set associativity)
Two-level cache (L1 and L2):
Average Memory Access Time = Hit Time_L1 + Miss Rate_L1 x (Hit Time_L2 + Miss Rate_L2 x Miss Penalty_L2)
Memory stall cycles per instruction = (Misses_L1/Instruction) x Hit Time_L2 + (Misses_L2/Instruction) x Miss Penalty_L2
CPU Execution Time = IC x (CPI_execution + Memory stall clock cycles/instruction) x Clock Cycle Time

Cache Performance Improvement
1. Reduce miss penalty
2. Reduce miss rate
3. Use parallelism
4. Reduce the time to hit in the cache

Reducing Cache Miss Penalty
1. Multi-level cache
The speed difference between main memory and the CPU is very high, on the order of 1,000 or more. One answer is to add another level of cache between the CPU and main memory. The first-level cache can be small enough to match the speed of the CPU. The second-level cache can be large.
Local miss rate: the number of misses in a cache divided by the total number of memory accesses to this cache.

Reducing Cache Miss Penalty
2. Critical word first and early restart
The CPU needs only one word of the block at a time, so the requested word is sent to the CPU as soon as it arrives. The CPU starts execution while the rest of the block is still being filled.
Critical word first: request the missed word first from memory and send it to the CPU as soon as it arrives. This is also called wrapped fetch and requested word first.
Early restart: fetch the words in normal order, but as soon as the requested word is received, send it to the CPU.
This works well with large block sizes. Due to spatial locality, there is a better than random chance that the next miss is to the remainder of the block. In that case the effective miss penalty is the time from the miss until the second word arrives.

Reducing Cache Miss Penalty
3. Giving priority to read misses over writes
    SW R3, 512(R0)     ; M[512] <- R3
    LW R1, 1024(R0)    ; R1 <- M[1024]
    LW R2, 512(R0)     ; R2 <- M[512]
Assume a write-through policy and 4-word blocks, with addresses 512 and 1024 both mapping to cache index 0.
SW: hit. 1st LW: miss. 2nd LW: miss.
[Figure: cache, write buffer, and the replacement of the block at cache index 0 across the three instructions.]
Read after write hazard: the read miss either waits until the write buffer is empty, or checks the contents of the write buffer on a read miss and, if there is no conflict, is allowed to continue. All desktop processors use the second technique.

Reducing Cache Miss Penalty
3. Giving priority to read misses over writes (write-back cache)
    SW R3, 512(R0)     ; M[512] <- R3
    LW R1, 1024(R0)    ; R1 <- M[1024]
Assume a write-back cache policy and 4-word blocks.
SW: hit, which makes the block at cache index 0 dirty. LW: miss, which replaces that dirty block.
[Figure: cache, buffer, and memory timelines showing the CPU time saved by deferring the write of the dirty block to memory.]
Read after write hazard: if a read miss will replace a dirty cache block, then instead of writing the dirty block to memory and only then reading the new block, copy the dirty block to a buffer, read the new block from memory, and then write the dirty block from the buffer to memory. On a subsequent read miss, the processor stalls until the buffer is empty.

Reducing Cache Miss Penalty
4. Merging write buffers
Write-through caches use write buffers. Write-back caches also use write buffers. If the write buffer is empty, the data and the full address are written into the buffer, and the write is finished from the CPU's perspective. If the write buffer is full, the CPU stalls until an entry becomes free.
In write merging, the new address is checked against the addresses of the valid buffer entries. If it matches, the new data is combined with that entry; this is called write merging. If the buffer is full and no merge is possible, the cache must wait until the buffer has a free entry.
This optimizes memory writes, since multi-word writes are more efficient than multiple single-word writes.
Input/output device registers are often mapped into the physical address space. These I/O addresses cannot allow write merging, because separate I/O registers do not behave like an array of memory words: each register requires its own separately defined address.

Reducing Cache Miss Penalty
5. Victim caches
The idea is to reuse data discarded by the cache. Discarded data has already been fetched from main memory and can therefore be supplied again without significant penalty.
This recycling is done by a small fully associative cache placed between the cache and its memory refill path.
The victim cache contains blocks that were discarded on a miss. On a cache miss, the victim cache is checked for the data. If the data is found there, the victim block and the cache block are swapped.
A 4-entry victim cache might remove 25% of the misses in a 4 KB cache.

Victim Cache
[Figure: CPU, cache (tag and data with tag comparison), victim cache (with its own tag comparison), write buffer, and memory; a cache miss is checked against the victim cache before going to memory.]
Reducing Miss Rate
Compulsory: the very first access to a block cannot be in the cache. These are also called cold start misses or first reference misses. They occur even in an infinite cache.
Capacity: if the cache cannot hold all the blocks needed by a program, capacity misses occur. They occur even in a fully associative cache.
Conflict: if the block placement strategy is set associative or direct mapped, conflict misses occur when too many blocks map to the same set. These are also called collision misses or interference misses. They appear when moving from fully associative to set associative to direct mapped placement.

Miss Rate
[Figure: miss rate (0 to 0.1) for a 4 KB cache with direct mapped, 2-way, 4-way, and 8-way placement, broken down into compulsory, capacity, conflict, and total.]

Miss Rate
[Figure: total miss rate (0 to 0.05) for direct mapped, 2-way, 4-way, and 8-way placement at cache sizes 16 KB, 64 KB, 128 KB, and 256 KB.]

Reducing Miss Rate
1. Larger block size
2. Larger caches
3. Higher associativity
4. Way prediction and pseudoassociative caches
5. Compiler optimization

Reducing Miss Rate
1. Larger block size: this reduces the miss rate but increases the miss penalty. A larger block size reduces compulsory misses because it takes advantage of spatial locality. A larger block size will increase conflict misses, and even capacity misses if the cache is small.
[Figure: miss rate (%) versus block size (16 to 256 bytes) for cache sizes 4KB, 16KB, 64KB, and 256KB.]
[Figure: miss penalty (0 to 120) versus block size (16 to 256 bytes).]

Reducing Miss Rate
2. Larger caches
[Figure: average access time (0 to 12) versus block size (16 to 256 bytes) for cache sizes 4K, 16K, 64K, and 256K.]

Reducing Miss Rate
3. Higher associativity
2:1 rule of thumb: a direct mapped cache of size N has about the same miss rate as a two-way set associative cache of size N/2, for cache sizes less than 128 KB.
Greater associativity comes at the cost of a higher hit time, as the tag comparator has to compare more entries to determine a hit.

Reducing Miss Rate
4. Way prediction and pseudoassociative caches
Way prediction: extra bits are kept in the cache to predict the way within the set of the next cache access. This reduces the number of tag comparisons needed to select the desired block.
Pseudoassociative or column associative: the access proceeds just as in a direct mapped cache; on a miss, before going to the next level, the cache checks the other half of a pseudo-set. The simple way to find it is to invert the most significant bit of the index.
Example timing: fast hit 1 cycle, slow hit 3 cycles, miss 20 cycles.

Reducing Miss Rate
5. Compiler optimization
Loop interchange: exchanging the nesting of the loops can make the code access data in the order in which they are stored. Assuming that the arrays do not fit in the cache, this technique reduces misses by improving spatial locality.
Blocking: reduces misses by improving temporal locality.
If multiple arrays are used in the loop, and some arrays are accessed by rows and others by columns, loop interchange will not solve the problem, because the access patterns are orthogonal. Instead of operating on entire rows and columns, this scheme operates on submatrices or blocks. The goal is to maximize the use of data in the cache before they are replaced.

Help from Parallelism
In a pipelined computer that allows out-of-order execution, such as dynamic multiple-instruction execution using Tomasulo's algorithm, the CPU can continue fetching instructions from the instruction cache while waiting for data on a data cache miss. The CPU is not stalled by the data cache miss.
A nonblocking cache, or lockup-free cache, escalates the potential benefits of such a scheme by allowing the data cache to continue to supply cache hits during a miss.
The potential benefits are:
hit under miss
hit under multiple miss
miss under miss
This requires the memory system to be able to service multiple misses, and it significantly increases the complexity of the cache controller.

Hit under Miss
[Figure: Tomasulo-style datapath with instruction FIFO, address function unit, load buffers, store address and store data paths, memory function unit, reorder buffer, registers, reservation stations, adder and multiplier function units, operation bus, FP operand bus, and common data bus (CDB). A timeline shows two hits (hit time) being served while a miss (miss penalty) is outstanding.]

Hit Under Miss Performance
[Figure: ratio of average memory stall time (%, 0 to 120) for hit under 1 miss and hit under 2 misses across 18 SPEC92 benchmark programs. Integer: eqntott, compress, espresso, xlisp. FP: ora, spice2g6, alvinn, mdljdp2, ear, wave5, doduc, nasa7, mdljsp2, hydro2d, su2cor, fpppp, tomcatv, swm256.]

Hardware Prefetching of Instructions
[Figure: CPU instruction fetch, instruction cache, prefetcher, instruction stream buffer, and instruction store. On an instruction cache miss, 2 consecutive blocks are prefetched: the missed block goes to the cache and the 2nd block goes into the instruction stream buffer.]

Hardware Prefetching of Instructions and Data
[Figure: the same arrangement for both instructions and data. On a data cache miss, 2 consecutive data blocks are prefetched from the data store: the missed block goes to the data cache and the next block goes into the data stream buffer.]

Compiler Controlled Prefetching
The compiler inserts prefetch instructions to request data before they are needed.
Register prefetch loads the value into a register.
Cache prefetch loads the data into the cache.
Exception handling in case of virtual address faults:
Faulting: an address fault causes an exception.
Nonfaulting: an address fault simply turns the prefetch into a no-op.
A normal register load instruction is in effect a faulting register prefetch instruction.
Prefetch instructions incur instruction overhead.

Reduce Hit Time
1. Small and simple cache
For speed, the cache needs to be on the same chip as the CPU. This forces a small and simple cache design, such as direct mapped. A direct mapped cache can overlap the tag check with data transmission to reduce hit time.
[Figure: access time (ns, 0 to 14) versus cache size (4KB, 16KB, 64KB, 256KB) for 1-way, 2-way, 4-way, and fully associative caches. Results by Reinman and Jouppi (1999).]
The L1 cache in the Pentium 4 (8 KB) is smaller than in the Pentium III (16 KB). The fast clock speed is achieved with dynamic execution and a large L2 cache.

Reduce Hit Time
2. Avoiding address translation during indexing of the cache
[Figure: the n-bit virtual address in the CPU (page number, page offset) passes through address translation to produce the physical address used by the cache (tag, index, offset).]
nTag: number of bits in the tag field
nIndex: number of bits in the index, used to select the set
nOffset: number of bits in the offset, used to identify a word in a block

Reduce Hit Time
2. Avoiding address translation during indexing of the cache
Virtual cache versus physical cache, by task:
1. Indexing and tag comparator: overlap indexing and tag comparison.
2. Page protection. Virtual cache: page protection is needed; copy the TLB information on a cache miss, add a field, and check it on every access to the cache. Physical cache: page protection is checked during translation.
3. Process switching. Virtual cache: the cache has to be flushed, because the same virtual address now refers to a different physical address, and the flush raises the miss rate. Alternatively, add a PID whose value is set by the operating system; at a context switch, cache entries are flushed based on the PID. Physical cache: not required.
4. Synonyms and aliases. Virtual cache: the same physical address can have two different virtual addresses, giving two copies of the same data in the virtual cache; hardware antialiasing avoids this problem. Physical cache: does not happen, since there is one physical location for the data.
5. I/O: I/O always requires physical addresses.

Reduce Hit Time
3. Pipelined cache access
Use a pipeline technique to access the cache.
4-phase cache pipeline for a cache access request from the CPU:
Phase 1: virtual address translation
Phase 2: tag comparison
Phase 3: cache read/write of data
Phase 4: completion
Pipelining increases latency, but it also increases access bandwidth.
Cache access time:
Pentium: 1 cycle
Pentium Pro through Pentium III: 2 cycles
Pentium 4: 4 cycles

Reduce Hit Time
4. Trace cache
This is used to obtain instruction level parallelism beyond 4 instructions.
Instead of limiting the instructions in a static cache block to spatial locality, the trace cache finds the dynamic sequence of instructions, including taken branches, to load into a cache block.
The branch prediction is folded into the cache and must be validated along with the address to have a valid fetch.
The trace cache stores instructions only from branch entry to exit points.
Drawbacks: the address translation is complex, and the same instruction can appear multiple times in the cache.

Trace Cache
[Figure: an ordinary instruction cache holding Block-1 to Block-4 of the program store in main memory, of which only 3, 2, 3, and 1 instructions respectively are used; the rest of the cache space is not used for the program.]

Main Memory Organization
1. Wider main memory
2. Simple interleaved memory
3. Virtual memory

Wide Main Memory
Assume:
Cache block: 4 words
Word size: 8 bytes
Main memory speed: 4 cycles to send the address, 56 cycles of access time per word, 4 cycles to send the data; total 64 cycles per word
Memory BW = (4 x 8)/(4 x 64) = 1/8 bytes/cycle
Memory bus speed = (4 x 8)/4 = 8 bytes/cycle
If the clock rate is 10^6 cycles/sec, the cycle time is 10^-6 sec:
Memory BW = (1/8) x 10^6 bytes/sec = 1 Mbps
Memory bus speed = 8 x 10^6 bytes/sec = 64 Mbps

Wide Main Memory
[Figure: two organizations. One-word-wide memory: CPU, cache, one-word bus, main memory. Wide memory: CPU, MUX, cache, two-word bus, main memory.]
One-word-wide memory: Memory BW = 1/8 byte/cycle.
Wide memory: the width of main memory is doubled, so half as many main memory accesses are needed to get the block. Memory BW = (4 x 8)/(2 x 64) = 1/4 byte/cycle.
Drawbacks:
1. Expansion is in modules of 2.
2. Error correction checking is difficult.
3. The multiplexer is on the critical timing path, and the difference in speed between cache and main memory is very high.

Interleaved Memory
Disadvantages of a large single-level memory:
The larger the memory, the longer the delay (due to decoder and propagation delays).
Expansion of the memory is difficult.
It creates a bandwidth bottleneck.
Solution: divide the memory into several smaller banks (modules):
Faster access to each module,
Ease of expansion, and
Multiple memory accesses.
Figure: a memory system made of modules M0, M1, ..., Mn-1.

Interleaved Memory
Figure: CPU, cache, bus and multiple one-word-wide main memory banks, with the steps of a miss labelled 1, 2 and 3 (step 3 is repeated once per bank).
Memory access time on a miss = (1) + (2) + n × (3), where (1), (2) and (3) are the times of the steps labelled 1, 2 and 3 in the figure.
n: number of interleaved banks

Interleaved Memory
A memory is n-way interleaved if it is composed of n modules numbered 0, 1, ..., n-1, and the word at address i is in module (i mod n).
Consecutive instructions (words) are placed in consecutive modules.
Some data structures (arrays) can also be stored similarly.
Figure: Module 0 holds I0, In, ...; Module 1 holds I1, In+1, ...; Module n-1 holds In-1, I2n-1, ...

Interleaved Memory
This organization is called low-order interleaving, since each word address is divided as follows:
Memory address = Displacement | Module number
Displacement: relative address within a module.
Module number: n bits, addressing one of $2^n$ modules. This is nothing but the (K mod m) operation, where $m = 2^n$.
It allows fast and parallel access (with one or more processors).

Interleaved Memory - Example
Assume we have 2 modules and the address is 4 bits long:

Address  Item no.  Module #  Rel. address
0000     0         0         000
0001     1         1         000
0010     2         0         001
0011     3         1         001
0100     4         0         010
0101     5         1         010
0110     6         0         011

Module 0: relative address 0 holds item 0, 1 holds item 2, 2 holds item 4.
Module 1: relative address 0 holds item 1, 1 holds item 3, 2 holds item 5.

Low Order Interleaving
An example of low-order interleaving:
Memory size = $2^{17}$ = 128K words (32 banks of 4K words each).
Address format (17 bits): 12-bit displacement, 5-bit module number.

Virtual Memory
Advantages of virtual memory:
Address space (program/data size) is not limited to the portion of memory allocated to a user.
Better memory utilization by keeping only the active program parts in faster memories.
Allows multi-programming without the requirement of contiguous partitions for each user.

Virtual Memory
Why does virtual memory work? Program locality!
Temporal locality: the tendency to reference in the future those instructions/data referenced in the recent past.
Loops, constants, variables, etc.
Spatial locality: the tendency to reference neighboring instructions/data.
Sequential portions of code.
Manipulation of arrays and data structures.

Virtual Memory
Disadvantages of virtual memory:
Requires address mapping, and incurs overhead from sending information back and forth between memory levels.

Virtual Memory
Multiple processes running at the same time on a computer require the memory address space to be partitioned between the processes.
Virtual memory automatically manages overlays across the two levels of the hierarchy represented by main memory and secondary storage.
Relocation: allows the same program to run at any location in physical memory.
Keep the program/data in the largest available memory unit.
Bring the portions of the program/data that are currently being used into the faster, but smaller, memory unit.
Create an environment which has the storage capacity of the largest memory unit with the speed of the fastest memory unit.

Virtual Memory
A program has 4 pages A, B, C and D. The logical program in the contiguous virtual memory space is mapped to main and back-up physical memory.
Figure: pages A, B, C and D occupy virtual addresses 0, 4K, 8K and 12K. Physical memory (addresses 0 to 32K in 4K steps) holds C, A and B; D is on disk.
Terminology listed for the cache: block, miss, frame address. Terminology listed for virtual memory: page or segment, page fault or address fault, virtual address.

Address Mapping
Figure: address space, CPU, cache, main memory, secondary storage.
We first focus on the interface (address mapping) between the address space and the main memory.
Let V be the address space, that is, the set of addresses a process (or program) can address.

Address Mapping
For example, if we have a 16-bit machine (i.e., the MAR is 16 bits long), then the maximum address space of this machine (V) is 65,536 bytes.
Let M be the physical locations, or memory space, allotted to a process in the main memory.
For example, in the previous machine, even though you can have up to 64KB of memory, only 8KB of physical memory may actually be installed; thus M = 8KB.

Address Mapping
The virtual memory is then a function from the address space to the memory space:
$f: V \to M$ such that, for $x \in V$,
$f(x) = y$ if $x$ is in $M$ at location $y$, and $f(x)$ is undefined otherwise.

Address Mapping
Figure: the address space (logical space) has locations 0, 1, ..., n-1; the main memory (physical space) has locations 0, 1, ..., m-1, with m << n.

Address Mapping
If f(x) is undefined, then we have a missing-item fault and the item must be brought from secondary storage into main memory.
At the main memory/address space interface, the items are blocks of information which must be sent back and forth between main memory and secondary storage.

Address Mapping
Different techniques based on block sizes:
Paging: blocks are of equal length; a page in the main memory is called a frame.
Segmentation: blocks are of unequal length.
Segmentation with paging: blocks (segments) are multiples of pages.
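
The mapping f and the missing-item fault described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the names `f`, `access`, `resident` and `free`, and the choice to hand out free locations in ascending order, are assumptions made for the example. The sizes come from the 16-bit machine example (V = 65,536, M = 8KB).

```python
from typing import Dict, List, Optional

V_SIZE = 2 ** 16     # address space of the 16-bit example machine
M_SIZE = 8 * 1024    # installed main memory in that example (8KB)


def f(x: int, resident: Dict[int, int]) -> Optional[int]:
    """Return location y if item x is in M at y; None stands for 'undefined'."""
    if not 0 <= x < V_SIZE:
        raise ValueError("x is outside the address space V")
    return resident.get(x)


def access(x: int, resident: Dict[int, int], free: List[int]) -> int:
    """Reference item x, handling a missing-item fault if f(x) is undefined."""
    y = f(x, resident)
    if y is None:
        # Missing-item fault: the item would be brought in from secondary
        # storage. Here we only record where it lands (hypothetical policy:
        # take the lowest free location).
        if not free:
            raise MemoryError("no free location; a replacement policy is needed")
        y = free.pop(0)
        resident[x] = y
    return y


resident: Dict[int, int] = {}        # items currently in main memory
free: List[int] = list(range(M_SIZE))
first = access(0x1234, resident, free)   # fault, item is loaded
again = access(0x1234, resident, free)   # now f(x) is defined
```

The second reference to the same item finds it resident and returns the same location without touching the free list, which is the behaviour the function definition of f implies.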
Paging
A memory reference consists of two parts: a page number and a displacement.
For example, let memory size = 16 bytes and page size = 4 bytes. The high-order 2 bits of each 4-bit virtual (logical) address are the page number and the low-order 2 bits are the displacement:

Address  Page number  Displacement
0000     00           00
0001     00           01
0010     00           10
0011     00           11
0100     01           00
0101     01           01
0110     01           10
0111     01           11
1000     10           00

Paging
There are 3 general schemes for address mapping:
Direct mapping
Associative mapping
Combination of associative and direct mapping

Direct Mapping
Let n = number of virtual pages in the address space.
m of those n pages are in the main memory.
Keep a table of n items (the page table) such that item i in the page table contains the frame (page) number of the ith virtual page if that page is in the main memory.
Otherwise, the ith item in the table contains a flag value indicating that the page is not in the main memory (page fault).

Direct Mapping
Disadvantages of direct mapping:
n is usually large, so the page table is large and hard to implement as a set of registers for fast access.
If the table is in main memory, then it becomes very slow: one additional memory access per reference.
Table fragmentation: most table entries are empty most of the time.

Direct Mapping
Figure: the page number of the virtual address (from the processor) indexes the page table (entries 0 to n-1). Entry i supplies the frame number, which is combined with the unchanged line number (displacement) to form the physical address sent to memory.

Associative Mapping
Keep a table of m entries (m = number of frames in main memory) in an associative memory.
Each entry has two fields X and Y, where X is the virtual page number and Y is the actual page (frame) number.
Search for X. If found, then get Y and you are done.
If X is not found, then the page is not in the main memory (page fault).
Simple and memory-efficient implementation, but large associative memories are slow and expensive.

Associative Mapping
Figure: the page number of the virtual address (from the processor) is compared with the page number field of every table entry (i, j, ...). The matching entry supplies the frame number (k, l, ...), which is combined with the line number (displacement) to form the physical address sent to memory.

Combination: Associative and Direct Mapping
Keep the page table (as in direct mapping) in the main memory, with the PP register pointing to its beginning address.
Also use a small associative memory (Translation Lookaside Buffer [TLB]) to keep the entries for the most recently used blocks and their neighbors.
For a virtual page number, check the associative map and the table in parallel.
If the frame number (in main memory) is found in the associative memory, then we are done.
Otherwise, wait for the result of the page table search; if found, we are done, else we have a page fault.
Figure: the page number of the virtual address is sent to the associative (partial) map and, added to PP, to the direct mapping table (entries 0 to n-1). The frame number from the associative match or from the table is combined with the line number to form the real address.

Pre-translation
If the virtual page number does not change, there is no need to go through the TLB/page table.
Use the same old frame number. Very fast.
For example, say we have just brought in a page for a main memory access, and the program counter holds: virtual page # | word #.
When the PC is incremented, look for a carry from "word #" into "page #".
Statistics show:
70% of accesses do not need a TLB or page table access.
90% of accesses can be handled by the TLB.

Demand and Pre-Paging
Demand paging: pages are loaded into main memory only when they are accessed, at page fault time.
Pre-paging: uses locality of reference to bring several neighboring pages into main memory at each page fault.

Paging Example
Assume we have a 40-bit virtual address, 16KB pages, and a 36-bit physical address. What is the total size of the page table for each program on this machine, assuming there are 4 flag bits per page table entry? Show the address mapping process.
No. of pages = address space / page size = $2^{40} / 16\text{K} = 2^{40} / (2^4 \times 2^{10}) = 2^{26}$
Word # = 40 - 26 = 14 bits
Frame # = 36 - 14 = 22 bits
Each table entry = 4 + 22 = 26 bits, rounded up to 4 bytes
Total = $2^{26} \times 2^2 = 2^{28}$ bytes = 256MB! VERY LARGE!
Page tables are often in secondary storage.

Paging Example
Figure: the virtual address has a 26-bit page number and a 14-bit line number. The page number indexes a page table with entries 0 to $2^{26}-1$, each holding 4 flag bits and a 22-bit frame number. The physical address is the 22-bit frame number followed by the 14-bit line number.

Segmentation
Blocks are of variable size.
Break up the memory or user program into logical segments: user vs. system memory areas, procedures, loops, local data, arrays, etc.
Figure: memory divided into used and free areas, with a pointer FIRST and areas of lengths l1, l2, l3.

Segmentation
Addressing translation:
Figure (address translation in a segmented system): the virtual address consists of a segment number s and a displacement d. The segment table pointer (STP) plus s selects entry i of the segment table, which holds a flag, protection bits, a length and a base. The base is added to (not concatenated with) the displacement to form the real address.
Problems of segmentation:
Overhead due to manipulation of linked lists and compaction.
External fragmentation: free spaces which are too small to be used by any segment.
What if a segment is too large for the allocated space?

Segmentation with Paging
We have segments of variable size, but each segment is a multiple of a page.
Not all the pages of a segment need to be in the main memory at the same time.
Each segment has its own page table.
Figure (segmented and paged address translation): the virtual address has fields Si, Pj and Dk. The segment table pointer (STP) plus Si selects a segment table entry holding a limit and a page table pointer. The page table pointer plus Pj selects an entry in the page table for segment Si, which holds a flag, protection bits and a real address. That real address is combined with Dk to form the real address of the item.

Example: DEC Alpha AXP 21064
Uses a combination of segmentation with paging.
64-bit address space.
Segments are:
seg0 (bit 63 = 0) and seg1 (bits 63 and 62 = 11): user process segments
kseg (bits 63 and 62 = 10): OS kernel segment
Figure: the seg0 address space (addresses of the form 000...0xxxxx) grows up; the seg1 address space (addresses of the form 111...1xxxxx) grows down.

Example: DEC Alpha AXP 21064
Since the address space is 64 bits and we only have 3 segments, the page table for each segment is still very large.
Alpha AXP uses a 3-level hierarchical page table:
A page table entry (PTE) at level 1 gives the physical address of the level 2 page table,
A PTE at level 2 gives the physical address of the level 3 page table, and
Finally, a PTE at level 3 gives the physical frame number for the item to be accessed.
In Alpha AXP, each page is 8KB long. Each page table is also 8KB long, so it fits in a page.

Example: DEC Alpha AXP 21264
Figure: the virtual address consists of a seg0/seg1 selector (000...0 or 111...1), Level1, Level2 and Level3 fields, and a page offset. The page table base register plus Level1 selects an entry in the L1 page table; that entry plus Level2 selects an entry in the L2 page table; that entry plus Level3 selects an entry in the L3 page table, which supplies the physical page-frame number. The frame number and the page offset together form the physical address sent to main memory.
Each page table is 8KB, or one page, long.
Each level specifier is 10 bits long (1K entries).
The page offset is 13 bits long (8KB).
The frame number is 28 bits long.
The physical address is 41 bits long.

Example: DEC Alpha AXP 21264
Effective physical address size: $2^{40}$ bytes of memory and $2^{40}$ bytes of I/O address space, 41 bits in total.
Page size: currently 8KB; can grow to 16KB, 32KB and 64KB in the future.
Each page table is exactly one page long, so each level field has n bits, where $2^n$ = page size / 8.
The current value of n is 10; in the future it will grow to 11, 12 and 13 bits as the page size grows.
Virtual address size = 3 × 10 + 13 = 43 bits now, and it grows to 55 bits for a 64KB page size.
For seg0 the remaining 64 - (3 × 10 + 13) = 21 bits are set to zero, and for seg1 they are all set to one.

Example: DEC Alpha AXP 21264
Alpha AXP 21264 uses one TLB for instruction accesses and another TLB for data accesses.
Alpha allows the OS to tell the TLB that contiguous sequences of pages can act as one (pre-paging), with the options being 8, 64 and 512 times the page size.
TLB characteristics:

Parameter        Description
Block size       1 PTE (8 bytes)
Hit time         1 clock cycle
Miss penalty     20 clock cycles (average)
TLB size         Same for instruction and data TLBs: 128 PTEs per TLB, each of which can map 1, 8, 64 or 512 pages
Block selection  Round-robin
Block placement  Fully associative

Replacement Policies
Which block in main memory should be replaced next?
It depends on whether we have a fixed or a variable memory partition for a program.
Fixed: on a page fault we must replace a page.
Variable: on a page fault we have a choice of replacing a page or adding the new page to the memory.

Fixed Replacement Policies
Random replacement.
First-in First-out (FIFO): keep a queue in RAM.
First-in-NOT-used-First-out (CLOCK):
Keep a queue of pages in main memory. Each page has a "use" bit.
At replacement time, set all use bits to "unused."
Mark referenced pages as "used" until the next replacement.
At replacement time, check the queue for the first unused page to be replaced.

Fixed Replacement Policies
Least-Recently-Used (LRU): use a pseudo queue.
If page Pi is referenced and Pi is in main memory, then remove it from the queue (wherever it is, hence pseudo queue) and place it at the rear of the queue.
If Pi is not in main memory, replace the element at the front of the queue.
Other variations are possible; for example, random, but not last used.

Variable Replacement Policies
Working-set replacement:
Window size (T): an interval of T units of time.
The working set at time t for a window T is the set of pages which have been referenced in the interval [t - T + 1, t].
A page is deleted from main memory at time t if it does not belong to the working set (t, T).
The number of frames may vary.

Variable Replacement Policies
Page-fault frequency replacement:
Let PFF be the number of page faults during the interval (window) T.
At page fault time, if PFF is greater than some predefined value P, no replacement occurs and a page is loaded.
If PFF < P, then all pages not referenced since the last page fault are discarded.
Simulation shows performance the same as working-set replacement.

Protection Process
1. Base and bound:
A valid address satisfies 'base' < address < 'bound'.
In some systems the address is added to the base, in which case (base + address) < 'bound'.
If users in a multi-process environment could change the values of their 'base' and 'bound' registers, conflicts would result, so the protection is provided by the operating system.
Provide two modes of process: user process and operating system process. An operating system process is called a kernel process or executive process.
Provide a portion of the CPU state that a user process can read but cannot write. This state includes the 'base'/'bound' registers, the user/operating system mode bits, and the exception enable/disable bits.
A system call is used to move from a user process to an executive process.
2. Protection flags: page-level protection
3. Protection rings: multi-level concentric rings
4. Key capabilities
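
The base-and-bound scheme in item 1 above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the mechanism of any particular processor: the class `BaseBoundCPU` and its method names are hypothetical, the strict inequalities follow the slide as written, and the system call is modelled simply as a method that raises the mode bit while the registers are updated.

```python
class BaseBoundCPU:
    """Toy model of the protected CPU state (hypothetical names throughout)."""

    def __init__(self, base: int, bound: int) -> None:
        self.kernel_mode = False   # mode bit: False means user process
        self._base = base
        self._bound = bound

    def set_limits(self, base: int, bound: int) -> None:
        """Write the base/bound registers; refused for a user process."""
        if not self.kernel_mode:
            raise PermissionError("user process cannot write base/bound")
        self._base, self._bound = base, bound

    def check_absolute(self, address: int) -> bool:
        """Valid if base < address < bound."""
        return self._base < address < self._bound

    def check_relative(self, address: int) -> bool:
        """Variant where the address is added to base: base + address < bound."""
        return (self._base + address) < self._bound

    def system_call(self, base: int, bound: int) -> None:
        """Enter executive mode, update the registers, return to user mode."""
        self.kernel_mode = True
        try:
            self.set_limits(base, bound)
        finally:
            self.kernel_mode = False


cpu = BaseBoundCPU(base=1000, bound=2000)
inside = cpu.check_absolute(1500)     # True
at_bound = cpu.check_absolute(2000)   # False, the comparison is strict
relative_ok = cpu.check_relative(999) # True, 1000 + 999 < 2000

try:
    cpu.set_limits(0, 10 ** 6)        # attempted from user mode
    user_write_refused = False
except PermissionError:
    user_write_refused = True

cpu.system_call(3000, 4000)           # the operating system changes the limits
moved = cpu.check_absolute(3500)      # True under the new limits
```

The point of the model is the split the slide describes: any process can have its addresses checked against the registers, but only the executive mode reached through a system call can change them.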
Find millions of documents here - Study Guides, Homework Solutions, Papers, Exam Answer Keys and more. Course Hero has millions of course related materials that will enable you to learn better, faster and get an A in all your courses.
Below is a small sample set of documents:

Texas Tech >> CS >> 5375 (Fall, 2008)
Access-Mode Predictions for Low-Power Cache Design Zhichun Zhu, and Xiadong Zhang presented by: Gregory Gelfond, Brian Jurries, & Yongjun Rong Overview Background Direct Mapping Set Associative Mapping Low Power Set Associative Mapping Phased ...
Texas Tech >> CS >> 5375 (Fall, 2008)
Architectural Approaches to Low Power Cache System - CET520 Survey Paper By Lihua Chen April 30, 2003 Agenda Background Overview of the conventional cache architecture Introduce architectural approaches to low power cache system Evaluation of t...
Texas Tech >> CS >> 5375 (Fall, 2008)
ACCESS-MODE PREDICTIONS FOR LOW-POWER CACHE DESIGN AN ACCESS-MODE PREDICTION TECHNIQUE BASED ON CACHE HIT AND MISS SPECULATION FOR CACHE DESIGN ACHIEVES MINIMAL ENERGY CONSUMPTION. USING THIS METHOD, CACHE ACCESSES CAN BE ADAPTIVELY SWITCHED BETWEEN...
Texas Tech >> CS >> 5375 (Fall, 2008)
7.0 STANDARDS IN BROADCASTING The Commission undertook significant work in 2003 on the development of the codes and rules envisaged under Section 19 of the Broadcasting Act 2001, specifically the Children\'s Advertising Code and the Access Rules. Th...
Texas Tech >> CS >> 5375 (Fall, 2008)
Integrated I-cache Way Predictor and Branch Target Bu er to Reduce Energy Consumption Weiyu Tang Alexander V. Veidenbaum Alexandru Nicolau Rajesh Gupta Department of Information and Computer Science University of California, Irvine Irvine, CA 92697-3...
Texas Tech >> CS >> 5375 (Fall, 2008)
Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption Koji Inoue, Tohru Ishihara, and Kazuaki Murakami Department of Computer Science and Communication Engineering Kyushu University 6-1 Kasuga-Koen, Kasuga, Fukuoka 816-...
Texas Tech >> CS >> 5375 (Fall, 2008)
Appears in the Proceedings of the 34th International Symposium on Microarchitecture (MICRO), 2001. Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping Michael D. Powell, Amit Agarwal, T. N. Vijaykumar, Babak Falsafi...
Texas Tech >> CS >> 5375 (Fall, 2008)
...
Texas Tech >> CS >> 5375 (Fall, 2008)
CS5331 P2P Computing Final Projects Demo Access SORCER via EJB in JBoss Rong,Yongjun(rong@cs.ttu.edu) Outline - Introduction Overview of J2EE J2EE vs. JINI What\'s my demo? Problems and Solutions Introduction Ancestry of the JavaTM 2 Platform,Enter...
Texas Tech >> CS >> 5375 (Fall, 2008)
Location Cache: A Low-Power L2 Cache System Rui Min, Wen-Ben Jone and Yiming Hu Department of Electrical & Computer Engineering and Computer Science University of Cincinnati Cincinnati, OH 45221-0030 e-mail: {minri,wjone,yhu}@ececs.uc.edu ABSTRACT Wh...
Texas Tech >> CS >> 5375 (Fall, 2008)
Partial Tag Comparison: A New Technology for Power-Efficient Set-Associative Cache Designs Rui Min, Zhiyong Xu, Yiming Hu, Wen-ben Jone Department of Electrical & Computer Engineering and Computer Science University of Cincinnati Cincinnati, OH 45221...
Texas Tech >> CS >> 5375 (Fall, 2008)
To appear in the Proceedings of the 34th International Symposium on Microarchitecture (MICRO 34), 2001. Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping Michael D. Powell, Amit Agarwal, T. N. Vijaykumar, Babak Fa...
Texas Tech >> CS >> 5331 (Fall, 2008)
CS5331 P2P Computing Final Projects Demo Access SORCER via EJB in JBoss Rong,Yongjun(rong@cs.ttu.edu) Outline - Introduction Overview of J2EE J2EE vs. JINI What\'s my demo? Problems and Solutions Motivation for B-Trees Data is too large to fi...
Texas Tech >> CS >> 5331 (Fall, 2008)
CS5331 P2P Computing Final Projects Demo Access SORCER via EJB in JBoss Rong,Yongjun(rong@cs.ttu.edu) Outline - Introduction Overview of J2EE J2EE vs. JINI What\'s my demo? Problems and Solutions Introduction Ancestry of the JavaTM 2 Platform,Enter...
Texas Tech >> CS >> 5331 (Fall, 2008)
3 JXTA Protocols By Daniel Brookshier n this chapter, you are going to read about the Java implementation of the JXTA protocols. We will highlight the important classes, interfaces, and functionality. The JXTA API is fairly large, not simple, and not...
Texas Tech >> CS >> 5331 (Fall, 2008)
8 JXTA and Security By Navaneeth Krishnan IN THIS CHAPTER Importance of Security Security in Multifaceted Security in P2P Networks JXTA Platform Security JXTA Security Requirements The Cryptographic Toolkit Security Issues and Solutions Trus...
Texas Tech >> CS >> 5353 (Fall, 2008)
CS 5353 COMPILER CONSTRUCTION Spring, 2003 COOKE SYLLABUS TEXT: Compilers: Principles, techniques, and tools, BY Aho, Sehti, and Ullman TOPICS COVERED: Topics related to the construction of a compiler will be introduced. The definition of tokens fo...
Texas Tech >> CS >> 3375 (Fall, 2008)
Register Transfer Language (RTL) RTL (Register Transfer Language) Provides a language for describing the behavior of computers in terms of step-wise register contents Provides a formal means of describing machine structure and function Is at the ...
Texas Tech >> CS >> 3375 (Fall, 2008)
Final Exam Review Combinational Logic Building Blocks: Decoders, Encoders, Multiplexers, Demultiplexers Implementing functions using decoders, multiplexers. Combinational Arithmetic Circuits: Adders, Subtractors, Multipliers, Comparators, shift...
Texas Tech >> CS >> 3375 (Fall, 2008)
FUNCTIONS OF CONBINATIONAL LOGIC 1. BASIC ADDERS 1.1 THE HALF - ADDER 6.1 Logic Symbol for a half add 6.1 Half adder truth table 6.2 Half - adder logic diagram 1.2 THE FULL ADDER 6.3 Logic symbol for a full adder Sarawut S. Janpong 2-2...
Texas Tech >> CS >> 3375 (Fall, 2008)
Chapter 5 Debugging VHDL Programs 1999, TU Wien, Dr. Franz Wotawa 1 VHDLDIAG+ Debugging Processes VHDLDIAG Diagnosis Components = Concurrent Statements Only abstract values considered (ok, undef) Only Dependency Model VHDLDIAG+ Diagnosi...
Texas Tech >> CS >> 3375 (Fall, 2008)
EE232.3: Digital Electronics Dr. Anh Dinh Electrical Engineering University of Saskatchewan MSI Logic Circuits Overview * Decoders * Encoders * Multiplexers (MUX) - Logic Implementation using Decoder and MUX * Demultiplexers (DEMUX) * Magnitude Comp...
Texas Tech >> CS >> 3375 (Fall, 2008)
Decoder Definition A decoder is a logic device that selects which of a set of devices attached to its outputs is to be activated The devices to be driven by the outputs of the decoder may be lamps or LEDs, memory locations, other logic devices, et...
Texas Tech >> CS >> 3375 (Fall, 2008)
Decoder Definition A decoder is a logic device that selects which of a set of devices attached to its outputs is to be activated The devices to be driven by the outputs of the decoder may be lamps or LEDs, memory locations, other logic devices, et...
Texas Tech >> CS >> 3375 (Fall, 2008)
CPRE 210 Partial Solutions to Practice questions for exam #1 9/25/01 1. 2. 3. 4. (a) 9 (b) 9 (c) 3 (a) 6 (b) 6 (c) 2 (a) 0 to 1023 (b) -511 to 511 (c) -511 to 511 (d) 512 to 511 (a) (i) 10 (ii) 13 (b) (i) 01010 (ii) 11011 (c) (i) 10110 (ii) 01101 (d...
Texas Tech >> CS >> 3375 (Fall, 2008)
CPRE 210 - Practice questions for exam #2 10/24/01 1. Construct a 3-to-8 decoder using AND gates, OR gates, and NOT gates only. 2. In this question, we will compare two different ways to construct a 4-to-1 multiplexer. (a) Design a 4-to-1 multiplex...
Texas Tech >> CS >> 3375 (Fall, 2008)
CPRE 210 - Practice questions for exam #3 12/4/01 1. In the HW question #1 for Lecture 28 (Assignment #9), you are asked to design a 4bit shift register. The block diagram is given below. (a) A rotate operation is different from a shift operation ...
Texas Tech >> CS >> 5381 (Fall, 2008)
= MICROSOFT FOUNDATION CLASS LIBRARY : cs5381 = AppWizard has created this cs5381 application for you. This application not only demonstrates the basics of using the Microsoft Foundation classes but is also a starting point for writing your...
Texas Tech >> CS >> 5381 (Fall, 2008)
3242 10.5 0 18 10.3944 0 18.2658 10.353 0 18.4725 10.3667 0 18.6202 10.4265 0 18.7087 10.5234 0 18.7383 10.6485 0 18.7087 10.7927 0 18.6202 10.947 0 18.4725 11.1024 0 18.2658 11.25 0 18 10.3648 -1.72284 18 10.2606 -1.70552 18.2658 10.2197 -1.69872 18...
Bates >> PHYS >> 106 (Fall, 2008)
Ristinen and Kraushaar - Chapter 10 Solutions Questions and Problems 1. Total carbon dioxide added to the atmosphere: (7.64 1012 tonne) (106 gm/tonne) 0.5 The approximate \"molecular weight\" of air: 0.78 (28 gm/mole) + 0.21 (32 gm/mole) + 0.01 ...
Texas Tech >> CS >> 5331 (Fall, 2008)
FIPER: The Federated S2S Environment Michael Sobolewski Senior Computer Scientist GE CR&D sobol@crd.ge.com Session 2420 Overall Presentation Goal Learn how to architect and build Service-to-Service (S2S) environments using RMI, JiniTM, Rio, JavaTM ...
Texas Tech >> CS >> 5331 (Fall, 2008)
A Cost Estimation Tool Integrated into FIPER David Koonce, Robert Judd, Thomas Keyser and Mike Bailey Ohio University, Athens OH, 45701 USA, General Electric Corp. Key words: Abstract: FIPER, Cost Estimation, The estimation of a part\'s manufacturin...
Texas Tech >> CS >> 5331 (Fall, 2008)
AIAA-2000-4902 A FEDERATED INTELLIGENT PRODUCT ENVIRONMENT Peter J. Rhl , Raymond M. Kolonay, Rohinton K. Irani, Michael Sobolewski, Kevin Kao General Electric Corporate Research and Development Schenectady, NY 12301 Michael W. Bailey GE Aircraft Eng...
Texas Tech >> CS >> 5331 (Fall, 2008)
AIAA-2000-4868 A WORKFLOW PARADIGM FOR FLEXIBLE DESIGN PROCESS CONFIGURATION IN FIPER Brett Wujek*, Patrick N. Koch*, and Wei-Shan Chiang Engineous Software, Inc. 1800 Perimeter Park West, Suite 275 Morrisville, North Carolina 27560 ABSTRACT The sol...
Texas Tech >> CS >> 5331 (Fall, 2008)
COMBUSTION SUB-SYSTEM MULTIDISCIPLINARY ANALYSIS AND OPTIMIZATION USING A NETWORK-CENTRIC APPROACH 2001-1217 XV ISOABE Conference Bangalore, India September 2-7, 2001 Charles E. Seeley, Venkat Tangirala, Ray Kolonay General Electric Corporate Researc...
Texas Tech >> CS >> 5331 (Fall, 2008)
A DISTRIBUTED, COMPONENT-BASED INTEGRATION ENVIRONMENT FOR MULTIDISCIPLINARY OPTIMAL AND QUALITY DESIGN Brett A. Wujek, Patrick N. Koch, Mark McMillan, Wei-Shan Chiang Engineous Software, Inc. 2000 CentreGreen Way, Suite 100 Cary, NC 27513 ABSTRACT T...
Texas Tech >> CS >> 5331 (Fall, 2008)
DEVELOPING INNOVATIVE DESIGN TECHNOLOGIES FOR BROAD ECONOMIC BENEFIT The Federated Intelligent Product EnviRonment (FIPER) is a four year joint venture, supported by the National Institute of Standards and Technology (NIST), Advanced Technology Prog...
Texas Tech >> CS >> 5331 (Fall, 2008)
Contact Author: Dr. Peter J. Rhl, K1-3C18A, GE Corporate R&D, P.O. Box 8, Schenectady, NY 12301 ph: (518) 387-4004, email: rohl@crd.ge.com INTELLIGENT COMPRESSOR DESIGN IN A NETWORK-CENTRIC ENVIRONMENT Peter J. Rhl, Raymond M. Kolonay General Elect...
Texas Tech >> CS >> 5331 (Fall, 2008)
2001-1613 TURBINE BLADE PROBABILISTIC ANALYSIS USING SEMI-ANALYTICAL SENSITIVITIES Scott A. Burton* General Electric Aircraft Engines Evendale, Ohio Raymond M. Kolonay and Mustafa Dindar General Electric Corporate Research and Development Niskayuna, ...
Texas Tech >> CS >> 5331 (Fall, 2008)
Federated P2P Services in CE Environments Michael Sobolewski GE Global Research Center, Niskayuna, New York, USA ABSTRACT: The goal of the Federated Intelligent Product Environment (FIPER) environment is to form a federation of distributed services ...
Texas Tech >> CS >> 5331 (Fall, 2008)
Service-ORiented Computing EnviRonment (SORCER) Michael Sobolewski sobol@cs.ttu.com Distributed Megaprogramming I do not believe traditional tools, technologies, and methodologies support Distributed Megaprogramming, Service-Oriented Programming, ...
Bates >> PHYS >> 106 (Fall, 2008)
Physics 106 Energy and Environment In-Class Problem #1 Names: In 2002, the United States consumed a total of 97.4 Quadrillion Btu of energy, from all sources. a. How many Snickers bars does this correspond to, given that each (not the Big One) conta...
Texas Tech >> CS >> 5381 (Fall, 2008)
1 Data Structures and Algorithm Analysis in C+ 1 Linear Abstract Data Types Remember that an Abstract Data Type is a way of storing data sometimes called a \"container for the data\", together with the algorithms that one can perform on the data. Mos...
Bates >> PHYS >> 106 (Fall, 2008)
Physics 106 Energy and Environment In-Class Problem #3 Names: A heat engine extracts energy at a rate of 100 W from a hot reservoir, and delivers 20 W to a cold reservoir. a) Calculate the rate at which the heat pump does work. b) If the hot reserv...
Bates >> PHYS >> 106 (Fall, 2008)
Physics 106 Energy and Environment In-Class Problem #4 Names: A windmill has five-meter long blades. Calculate the power it produces in a 7 m.p.h. wind if it operates at 70% of the theoretical maximum extraction efficiency and generates electricity ...
Bates >> PHYS >> 106 (Fall, 2008)
Physics 106 Energy and Environment In-Class Problem #6 Names: 1. In a proposed fusion energy reactor, tritium fuel is created by the neutron capture on 6Li. Another product of the reaction is an alpha particle. Lithium-6 has atomic number 3. Write a...
Bates >> PHYS >> 106 (Fall, 2008)
Physics 106 - Exam 2 Equations 1 E = mc2 KE = mv 2 P E = w h = mgh 2 E = P t Qhot - Qcold Tcold = (1 - ) Qhot Thot Ef f iciency = Qcold Tcold = Qhot Thot C.O.P. = Qh Th Qh = = W Qh - Q c Th - T c g = 9.81 P = T4 A m s2 2898 T (K) Area = r 2 max ...
Texas Tech >> CS >> 5331 (Fall, 2008)
JiniTM Technology logo JiniTM Technology Core Platform Specification A system of JiniTM technology-enabled services and/or devices is a JavaTM technology-centered, distributed system designed for simplicity, flexibility, and federation. The Jini arc...
Texas Tech >> CS >> 5331 (Fall, 2008)
JiniTM Technology logo JiniTM Architecture Specification The JiniTM Architecture Specification defines the top-level view of the Jini architecture, its components, and the systems on which the Jini architecture is layered. This will give you a high-...
Texas Tech >> CS >> 5331 (Fall, 2008)
JiniTM Technology logo JavaSpacesTM Service Specification The JavaSpacesTM service specification provides a distributed persistence and object exchange mechanism for code written in the JavaTM programming language. Objects are written in entries tha...
Texas Tech >> CS >> 5331 (Fall, 2008)
Jini(TM) Technology Starter Kit New Specifications v2.0 version created May 28, 2003 Jini(TM) Technology Team Sun Microsystems, Inc. Copyright 2003 Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 USA. All rights reserved. Sun Mic...
Texas Tech >> CS >> 5331 (Fall, 2008)
vti_encoding:SR|utf8-nl vti_timelastmodified:TR|20 Feb 2003 05:14:13 -0000 vti_extenderversion:SR|5.0.2.2623 vti_author:SR|WOODENEAR\\Administrator vti_modifiedby:SR|WOODENEAR\\Administrator vti_timecreated:TR|20 Feb 2003 05:14:13 -0000 vti_cacheddtm:T...
Texas Tech >> CS >> 5331 (Fall, 2008)
vti_encoding:SR|utf8-nl vti_timelastmodified:TR|05 Mar 2003 03:41:24 -0000 vti_extenderversion:SR|5.0.2.2623 vti_lineageid:SR|{70BB8CFD-42A1-440A-AE05-02977CAC3ED5} vti_cacheddtm:TX|05 Mar 2003 03:41:24 -0000 vti_filesize:IR|14002 vti_backlinkinfo:VX...
Bates >> PHYS >> 106 (Fall, 2008)
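Physics 106 - Final Exam Equations: $E = mc^2$; $KE = \tfrac{1}{2}mv^2$; $PE = wh = mgh$; $E = Pt$; Efficiency $= \frac{Q_{hot} - Q_{cold}}{Q_{hot}} = 1 - \frac{T_{cold}}{T_{hot}}$; $\frac{Q_{cold}}{Q_{hot}} = \frac{T_{cold}}{T_{hot}}$; C.O.P. $= \frac{Q_h}{W} = \frac{Q_h}{Q_h - Q_c} = \frac{T_h}{T_h - T_c}$; $g = 9.81\ \mathrm{m/s^2}$; $P = \sigma T^4 A$; $\lambda_{max} = \frac{2898}{T(\mathrm{K})}$; Area $= \pi r^2$ ...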
Stanford >> PUBS >> 1500 (Fall, 2009)
SINGLE BUNCH BEAM LOADING EXPERIMENTS ON THE SLAC TWO-MILE ACCELERATOR* R. F. Koontz, G. A. Loew, R. H. Miller. SLAC-PUB-1564, March 1975. Stanford Linear Accelerator Center, Stanford University, Stanford, California 94305. Introduction: In Septembe...
Stanford >> PUBS >> 1500 (Fall, 2009)
SLAC-PUB-1591 (Rev.) WE. -461 (Rev.) July 1975 PHOTOPRODUCTION OF THE PSI PARTICLES* U. Camerini, J. G. Learned, R. Prepost, C. M. Spencer, D. E. Wiser, University of Wisconsin, Madison, Wisconsin 53706; W. W. Ash, R. L. Anderson, D. M. Ritson, D. ...
Texas Tech >> CS >> 5375 (Fall, 2008)
ARTES project 0005-4, 2000-05-22. Techniques for Module-Level Speculative Parallelization on Shared-Memory Multiprocessors. Research Proposal to the PAMP program. Research Plan Sept 1, 2000 - Dec 31, 2001 (Note: only 1.25 years). Per Stenström, Depart...
Texas Tech >> CS >> 5375 (Fall, 2008)
Bundling: Reducing the Overhead of Multiprocessor Prefetchers. Dan Wallin and Erik Hagersten, Uppsala University, Department of Information Technology, P.O. Box 337, SE-751 05 Uppsala, Sweden. {dan.wallin, erik.hagersten}@it.uu.se Abstract: Prefetching has...
Texas Tech >> CS >> 5375 (Fall, 2008)
Reducing the Communication Overhead of Dynamic Applications on Shared Memory Multiprocessors Anand Sivasubramaniam Department of Computer Science and Engineering The Pennsylvania State University University Park, PA 16802 anand@cse.psu.edu Abstract ...
Texas Tech >> CS >> 5375 (Fall, 2008)
1. Introduction. The purpose of this paper is twofold. The first part gives an overview of caches, while the second part explains how the Pentium processor implements its cache. A simplified model of a cache system will be examined first. The simplified m...
Texas Tech >> CS >> 5375 (Fall, 2008)
Advanced Cache Topics. Based on Yaron Sheffer slides, Micro Computers Class. Cache Coherency memory: The process in w...
Texas Tech >> CS >> 5375 (Fall, 2008)
Bus-based SMP Systems. Symmetric Multiprocessors (SMPs): Shared memory systems provide a shared address space to the programmer. Symmetric multiprocessors (SMPs) are shared memory systems with the following characteristics: global physical address sp...
Texas Tech >> CS >> 5375 (Fall, 2008)
The Declining Effectiveness of Dynamic Caching for General-Purpose Microprocessors. Douglas C. Burger, James R. Goodman, Alain Kägi, Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, Wisconsin 53706 USA. galil...
Texas Tech >> CS >> 5375 (Fall, 2008)
A version of this paper appears in the 23rd International Symposium on Computer Architecture, May 1996. Reprinted by permission of ACM. Memory Bandwidth Limitations of Future Microprocessors. Doug Burger, James R. Goodman, and Alain Kägi, Computer Sci...
Texas Tech >> CS >> 5375 (Fall, 2008)
Cache Coherence in Large-Scale Shared Memory Multiprocessors: Issues and Comparisons David J. Lilja Department of Electrical Engineering University of Minnesota 200 Union Street S.E. Minneapolis, MN 55455 Phone: (612) 625-5007 FAX: (612) 625-4583 E-m...
Texas Tech >> CS >> 5375 (Fall, 2008)
Supporting Cache Coherence in Heterogeneous Multiprocessor Systems Taeweon Suh, Douglas M. Blough, and Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332 {suhtw, dblough, leehs}@ece.gatec...
Texas Tech >> CS >> 5375 (Fall, 2008)
IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 10, October 1993, pp. 1130-1146. Improving Memory Utilization in Cache Coherence Directories David J. Lilja Department of Electrical Engineering University of Minnesota 200 Union Str...