l20_handouts_4up - Advanced Memory Topics


Advanced Memory Topics

Online Course Evals (DO IT, DO IT, DO IT): Wed 23 Apr - Wed 7 May (FLEXGRADE!)
http://www.engineering.cornell.edu/courseeval
We listen to your feedback, so you can help next year's students.
Homework 6 available now, due Tuesday, April 15 (tax day), 10 p.m.
Project 4b due Thursday, April 17, 10 p.m.
Prelim 2: April 22, 7:30, Uris Hall Auditorium.

Copyright Sally A. McKee 2006

The Unnecessary but Well Meant Slide (AGAIN)
Plan your time carefully for the final project.
Work out any team conflicts or miscommunications ASAP.
Organize with your team so that you can work on things TOGETHER and AT THE SAME TIME.
Otherwise, if you divide the work into portions, you miss out on the learning of doing the other person's portion.
Think about the industrial perspective: it is just as important to VALIDATE as to DESIGN, so switch roles among teammates.

Hennessy & Patterson
Read 7.4 (it's long, alas).
Read 8.1-8.7 for next week.

Our Canonical Vector Example

vec_add(int x[16], int y[16], int s[16])
{
    int i;
    for (i = 0; i < 16; i++) {
        s[i] = x[i] + y[i];
    }
    for (i = 0; i < 16; i++) {
        s[i] = s[i] + 1;
    }
}
// simple, direct-mapped cache:
// 128 byte cache, 16 byte (4 word) blocks/lines

Array Addresses
An address divides into byte offset, word offset, index, and tag.
The tag array is usually kept separate from the data array so that it is smaller and enjoys faster tag lookup.

Our Direct Mapped Cache: Step by Step
Addresses: x (tag = 1, index = 4), y (tag = 4, index = 0), s (tag = 5, index = 2)
Loading x[0] fills index 4 with x[0]-x[3] (tag 1); loading y[0] fills index 0 with y[0]-y[3] (tag 4).
What if all vars had the same index?
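As a minimal sketch (not from the slides), the index and tag for this direct-mapped cache can be computed with shifts and masks: 16-byte blocks give 4 offset bits, and 128/16 = 8 lines give 3 index bits. The helper names `dm_index` and `dm_tag` are illustrative; the addresses used below (x at 0x0C0, y at 0x200, s at 0x2A0) are the ones given later in the handout.

```c
#include <assert.h>
#include <stdint.h>

/* Direct-mapped cache from the example:
 * 128-byte cache, 16-byte blocks -> 8 lines,
 * so 4 byte/word-offset bits and 3 index bits. */
#define BLOCK_BITS 4   /* log2 of the 16-byte block size */
#define INDEX_BITS 3   /* log2 of 128 / 16 = 8 lines */

uint32_t dm_index(uint32_t addr) {
    /* drop the offset bits, keep the low INDEX_BITS bits */
    return (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
}

uint32_t dm_tag(uint32_t addr) {
    /* everything above offset + index is the tag */
    return addr >> (BLOCK_BITS + INDEX_BITS);
}
```

Plugging in the handout's addresses reproduces the (tag, index) pairs on the slide: 0x0C0 gives (1, 4), 0x200 gives (4, 0), and 0x2A0 gives (5, 2).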
Step by Step (continued)
Storing s[0] fills index 2 with s[0]-s[3] (tag 5).
At i = 4 the next blocks map to new indices: y[4]-y[7] to index 1, s[4]-s[7] to index 3, and x[4]-x[7] to index 5.
After iterations i = 0 through 7, the cache holds:

index  tag  data
0      4    y[0] y[1] y[2] y[3]
1      4    y[4] y[5] y[6] y[7]
2      5    s[0] s[1] s[2] s[3]
3      5    s[4] s[5] s[6] s[7]
4      1    x[0] x[1] x[2] x[3]
5      1    x[4] x[5] x[6] x[7]

The next iterations start overwriting data (y[8]-y[11] map to index 2, evicting s[0]-s[3]).

Same Example, Different Cache

vec_add(int x[16], int y[16], int s[16])
{
    int i;
    for (i = 0; i < 16; i++) {
        s[i] = x[i] + y[i];
    }
    for (i = 0; i < 16; i++) {
        s[i] = s[i] + 1;
    }
}
// 4-way associative cache:
// 128 byte cache, 16 byte (4 word) blocks/lines

The Way I Draw It
Different from H&P; works for me -- pick what works for you.
Index indicates the SET, LRU info indicates the WAY, word offset indicates which word within the block, then byte offset, etc.
A cache is drawn as Set 0, Set 1, Set 2, ... across Way 0 through Way N-1; each (set, way) entry is one block or line, and a word is 4 bytes for MIPS.

Associative Caches (keep this for exams -- trust this)
Four-way associative means:
  FOUR WAYS (numbered 0-3)
  NUMBER OF SETS undetermined: you need to know the block size (16B here) and the total cache size (128B here) ...
so in this case 128B / (16B blocks) / (4 ways) gives the number of sets (which is also the number of entries per way).

Two-way associative means:
  TWO WAYS
  NUMBER OF SETS undetermined: again you need the block size and the total cache size.

Direct-mapped means 1-WAY ASSOCIATIVE:
  N is the number of sets, which here is the same as the number of blocks in the cache (like indexing an array).
  Only one index is possible for any given block of memory (based on its address).

Fully associative means a single set with N ways, where N is the number of blocks.

Calculating Cache Parameters
You have to know the block size.
You have to know the associativity. If the problem doesn't say how many ways:
  assume direct-mapped if the cache is large;
  assume fully associative if the cache is (very) small.
Total cache size in bytes / block size in bytes gives you a place to start; then divide by #ways to get #sets.

Addresses (Correct!)
With 2 sets there is 1 index bit, after 2 word-offset bits and 2 byte-offset bits:

Address of x is 0x0C0: tag 000000000000000000000000110, index 0, word offset 00, byte offset 00
Address of y is 0x200: tag 000000000000000000000010000, index 0, word offset 00, byte offset 00
Address of s is 0x2A0: tag 000000000000000000000010101, index 0, word offset 00, byte offset 00

Addresses: x (tag = 6, index = 0), y (tag = 16, index = 0), s (tag = 21, index = 0)

Four-Way Associative Cache
Each set holds four ways (way0-way3), each with a valid bit, a tag, and a 4-word block, plus LRU info for the set.
What happens here for the first three iterations?
Anybody see a performance problem coming here?
Request: load x[0], y[0]; store s[0] -- same for iterations i=0 through i=3.
What's the allocate-on-write policy?
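The parameter recipe above can be sketched in a few lines of C. The helper names (`num_sets`, `set_index`, `tag_of`) are illustrative, not from the slides; sizes are in bytes.

```c
#include <assert.h>

/* #sets = (cache size / block size) / #ways */
unsigned num_sets(unsigned cache_bytes, unsigned block_bytes, unsigned ways) {
    return cache_bytes / block_bytes / ways;
}

/* set index: block number modulo the number of sets */
unsigned set_index(unsigned addr, unsigned block_bytes, unsigned sets) {
    return (addr / block_bytes) % sets;
}

/* tag: what remains of the block number above the index */
unsigned tag_of(unsigned addr, unsigned block_bytes, unsigned sets) {
    return addr / block_bytes / sets;
}
```

For the 128-byte, 16-byte-block, 4-way cache this gives 2 sets, and the addresses 0x0C0, 0x200, and 0x2A0 yield tags 6, 16, and 21, all in set 0, matching the slide.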
Four-Way Associative Cache
Addresses: x (tag = 6, index = 0), y (tag = 16, index = 0), s (tag = 21, index = 0)
Request: load x[0], then load y[0], then store s[0].

Four-Way Associative Cache
Addresses: x (tag = 6, index = 1), y (tag = 16, index = 1), s (tag = 21, index = 1)
We're changing sets now.
Request: load x[4], then load y[4], then store s[4] (allocate on write, so read in s[4]-s[7], then write).

Four-Way Associative Cache
Set 0 now holds x[0]-x[3] (tag 6), y[0]-y[3] (tag 16), and s[0]-s[3] (tag 21); set 1 holds x[4]-x[7], y[4]-y[7], and s[4]-s[7], with LRU info tracked per set.
Request: load x[4], y[4]; store s[4] -- same for iterations i=4 through i=7.

What Happens for x[8]?
Addresses: x (tag = 7, index = 0), y (tag = 17, index = 0), s (tag = 22, index = 0). Why?
Request: load x[8], y[8]; store s[8].
After load x[8] fills the last way of set 0, now it gets interesting: where does y[8] go? Where does x[12] go?

Virtual Memory
How do we fit several programs in memory at once?
How does the assembler/compiler know what addresses are OK to use?
Multiprogramming allows >1 program to run "at the same time" (we've been over this one).
Who keeps track of what's where?
If x, y, and s had been placed elsewhere in memory, they would have had different addresses -- and different cache behavior.
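The eviction question for x[8] comes down to the set's LRU info. Here is a toy model (an illustrative sketch, not the slides' exact bookkeeping) of one 4-way set with true-LRU replacement:

```c
#include <assert.h>

#define WAYS 4

typedef struct {
    int tag[WAYS];
    int valid[WAYS];
    int lru[WAYS];   /* way numbers ordered LRU -> MRU; lru[0] is evicted next */
} set_t;

/* Move `way` to the most-recently-used end of the order. */
static void touch(set_t *s, int way) {
    int i, j;
    for (i = 0; i < WAYS; i++)
        if (s->lru[i] == way) break;
    for (j = i; j < WAYS - 1; j++)
        s->lru[j] = s->lru[j + 1];
    s->lru[WAYS - 1] = way;
}

/* Access a tag: returns 1 on hit, 0 on miss (filling the LRU way). */
int set_access(set_t *s, int tag) {
    int w;
    for (w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            touch(s, w);
            return 1;
        }
    }
    w = s->lru[0];   /* miss: evict the least-recently-used way */
    s->tag[w] = tag;
    s->valid[w] = 1;
    touch(s, w);
    return 0;
}
```

Replaying set 0's accesses (tags 6, 16, 21 for x, y, s, then 7 for x[8]) fills all four ways, so the next new tag (17, for y[8]) must evict whichever way is least recently used.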
Pages
Divide memory into chunks: for caches, lines or blocks; for main memory, pages.
Assume virtual page size == physical page size, 4 KByte pages for now.
0x00000000 is invalid (remember why?); pages then begin at 0x00001000, 0x00002000, . . .

Virtual to Physical Mapping
virtual address 0x00001000:  page number 00000000000000000001, page offset 000000000000
                                 | translation
physical address 0x0000C000: page number 00000000000000001100, page offset 000000000000
The page offset doesn't change!

Virtual to Physical Mapping
Virtual memory runs from 0x00000000 up to 0x40000000 and maps, page by page, onto physical memory (DRAM).

Page Table
The page table is indexed by virtual page number and holds:
  physical page numbers
  bookkeeping info: valid (V), dirty (D), read-only/writeable (R), executable (X), OS kernel access only (K)
The translated address is the physical page number followed by the unchanged page offset.

Page Table
What if the page table entry is invalid? Page misses (Fig. 7.22 greatly oversimplifies).
A process's address space holds code, static variables, stack, heap -- and the page table itself?
Swapping needs a replacement policy, probably LRU.

Translation Lookaside Buffer
Why "TLB"? It's a cache for PTEs (Page Table Entries).
Looks a lot like a little page table; usually fully associative; may have two levels (like regular caches).
Each entry: tag (the virtual page number), physical page, and v/d/r/x/k bits.

Page Table
Each process's page table is in memory -- but memory is SLOW! What can we do? TLBs.

TLBs
Size: 16-512 entries.
Tag is the virtual page number.
Need to track LRU information (this is the ref bit in the book).
TLB lookup takes place in parallel with cache access.
Typically high hit rates (miss < 1% of the time).
L1 cache indexed by virtual addresses; L2 can be virtual or physical (on a TLB hit, we'll have the physical address in time for the lookup).
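The mapping above can be sketched directly: with 4 KByte pages there are 12 offset bits, only the page number is translated, and the offset passes through. The helper names are illustrative; the page-table lookup that produces the PPN is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12   /* 4 KByte pages */

uint32_t vpn(uint32_t va)         { return va >> PAGE_BITS; }
uint32_t page_offset(uint32_t va) { return va & ((1u << PAGE_BITS) - 1); }

/* Build a physical address from a looked-up physical page number
 * and the unchanged page offset. */
uint32_t translate(uint32_t va, uint32_t ppn) {
    return (ppn << PAGE_BITS) | page_offset(va);
}
```

With the slide's example, virtual 0x00001000 is VPN 0x1; if the page table maps it to PPN 0xC, the physical address is 0x0000C000, offset untouched.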
TLBs
A TLB is just a cache for our current working set of Page Table Entries.
Replacement can be hardware or software; simple TLB hardware replacement is easy.
Can have other associativities and multiple levels.
When an entry is evicted, its bookkeeping bits must be updated in the page table.

TLB
What if an entry is invalid, or not present? TLB misses:
If the page is present, get the PTE from memory.
If it is not present, the OS handles a page fault.
Page faults can be very expensive: many memory accesses, a transfer of control to the OS, and possibly disk or network accesses.
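A fully associative TLB lookup can be sketched as below. This is a toy model under stated assumptions: real TLBs hold 16-512 entries plus the d/r/x/k bits, and hardware compares all tags in parallel rather than in a loop; the names `tlb_entry_t` and `tlb_lookup` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TLB_ENTRIES 4   /* toy size; real TLBs have 16-512 entries */

typedef struct {
    int      valid;
    uint32_t vpn;   /* tag: virtual page number */
    uint32_t ppn;   /* physical page number */
} tlb_entry_t;

/* Returns 1 and stores the PPN on a hit; 0 on a miss, after which
 * the PTE must be fetched from the in-memory page table (or the OS
 * handles a page fault if the page is not present). */
int tlb_lookup(const tlb_entry_t tlb[], uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {  /* parallel in hardware */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *ppn = tlb[i].ppn;
            return 1;
        }
    }
    return 0;
}
```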