09LecSp12Componentsx6

CS 61C: Great Ideas in Computer Architecture
Compilers, Components
Instructor: David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/sp12
2/13/12 Spring 2012 -- Lecture #9

New-School Machine Structures (It's a bit more complicated!)
Harness Parallelism & Achieve High Performance:
•  Parallel Requests: assigned to computer, e.g., Search "Katz"
•  Parallel Threads: assigned to core, e.g., Lookup, Ads
•  Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
•  Parallel Data: >1 data item @ one time, e.g., Add of 4 pairs of words
•  Hardware descriptions: all gates @ one time
•  Programming Languages
(Software/hardware diagram: Warehouse Scale Computer > Computer > Core > Memory (Cache), Input/Output > Instruction Unit(s) + Functional Unit(s) (A0+B0, A1+B1, A2+B2, A3+B3) > Cache Memory > Logic Gates; Smart Phone; today's lecture covers Compilers and Components)

Levels of Representation/Interpretation
High Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
Compiler
Assembly Language Program (e.g., MIPS):
    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)
Assembler
Machine Language Program (MIPS):
    0000 1010 1100 0101 1001 1111 0110 1000
    1100 0101 1010 0000 0110 1000 1111 1001
    1010 0000 0101 1100 1111 1001 1000 0110
    0101 1100 0000 1010 1000 0110 1001 1111
•  Everything is a (binary) number in a computer: anything can be represented as a number, i.e., data or instructions
Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

Review
•  Everything is a number: instructions and data; stored program concept
•  Assemblers can enhance the machine instruction set to help the assembly-language programmer
•  Translate from text that is easy for programmers to understand into code that the machine executes efficiently: Compilers, Assemblers
•  Linkers allow separate translation of modules
•  Interpreters for debugging, but slow execution
•  Hybrid (Java): Compiler + Interpreter to try to get the best of both
•  Compiler Optimization to relieve the programmer

Agenda
•  Compilers, Optimization, Interpreters, Just-In-Time Compiler
•  Administrivia
•  Dynamic Linking
•  Technology Trends Revisited
•  Technology Break
•  Components of a Computer

What is the Typical Benefit of Compiler Optimization?
•  What is a typical program?
•  For now, try a toy program: BubbleSort.c

    #define ARRAY_SIZE 20000
    int main() {
        int iarray[ARRAY_SIZE], x, y, holder;
        for (x = 0; x < ARRAY_SIZE; x++)
            for (y = 0; y < ARRAY_SIZE - 1; y++)
                if (iarray[y] > iarray[y+1]) {
                    holder = iarray[y+1];
                    iarray[y+1] = iarray[y];
                    iarray[y] = holder;
                }
    }

Unoptimized MIPS Code / -O2 Optimized MIPS Code
(Two slides of gcc-generated MIPS assembly, interleaved beyond recovery in this extraction; representative instructions: lw, sw, addu, slt, sll, bne, beq, move, j, li, ori.)
Compiler vs. Interpreter Advantages
Compilation:
•  Faster execution
•  Single file to execute
•  Compiler can do better diagnosis of syntax and semantic errors, since it has more info than an interpreter (an interpreter only sees one line at a time)
•  Can find syntax errors before running the program
•  Compiler can optimize code
Interpreter:
•  Easier to debug program
•  Faster development time

Compiler vs. Interpreter Disadvantages
Compilation:
•  Harder to debug program
•  Takes longer to change source code, recompile, and relink
Interpreter:
•  Slower execution times
•  No optimization
•  Need all of source code available
•  Source code larger than executable for large systems
•  Interpreter must remain installed while the program is interpreted

Gcc compiler output: Bubble sort unoptimized: 66 MIPS instructions; -O2 optimized: 30 MIPS instructions. ("Static" comparison => size of MIPS program, vs. "dynamic" comparison => number of MIPS instructions executed to bubble sort some data set.)

Java's Hybrid Approach: Compiler + Interpreter
•  A Java compiler converts Java source code into instructions for the Java Virtual Machine (JVM)
•  These instructions, called bytecodes, are the same for any computer / OS
•  A CPU-specific Java interpreter interprets bytecodes on a particular computer

Why Bytecodes?
•  Platform-independent
•  Load from the Internet faster than source code
•  Interpreter is faster and smaller than it would be for Java source
•  Source code is not revealed to end users
•  Interpreter performs additional security checks, screens out malicious code

Java Bytecodes (Stack) vs. MIPS (Reg.): JVM uses Stack vs.
Registers. Example:
    a = b + c;  =>
    iload b    ; push b onto Top Of Stack (TOS)
    iload c    ; push c onto Top Of Stack (TOS)
    iadd       ; Next-to-top Of Stack (NOS) = Top Of Stack (TOS) + NOS
    istore a   ; store TOS into a and pop stack

Starting Java Applications
•  Simple portable instruction set for the JVM
•  Interprets bytecodes
•  Just In Time (JIT) compiler translates bytecode into machine language just before execution
•  Compiles bytecodes of "hot" methods into native code for the host machine

Dynamic Linking
•  Only link/load a library procedure after it is called
   – Avoids image bloat caused by static linking of all (transitively) referenced libraries
   – Automatically picks up new library versions
   – Requires procedure code to be relocatable
•  Dynamic linking is the default on UNIX and Windows systems

Dynamic Linking Idea
•  1st time, pay extra overhead of the DLL (Dynamically Linked Library); subsequent times, almost no cost
•  Compiler sets up code and data structures to find the desired library the first time
•  Linker fixes up the address at runtime so the call is fast subsequent times
•  Note that return from the library is fast every time

Dynamic Linkage
•  Call to DLL Library goes through an indirection table that initially points to stub code
•  Stub: loads a routine ID so it can find the desired library; jumps to linker/loader
•  Linker/loader code finds the desired library, edits the jump address in the indirection table, and jumps to the desired routine
•  Indirection table now points to the DLL
•  Dynamically mapped code executes and returns

Administrivia
•  Labs 5 and 6 posted, Project 2 posted
•  Homework, Proj 2 Part 1 due Sunday @ 11:59:59
•  Midterm is now on the horizon:
   – No discussion during exam week
   – TA Review: Su, Mar 4, starting 2 PM, 2050 VLSB
   – Exam: Tu, Mar 6, 6:40-9:40 PM, 2050 VLSB (room change)
   – Small number of special consideration cases, due to class conflicts, etc.: contact me
CSUA Github Help Session
•  Wednesday 2/15, 6-8pm, 380 Soda.
•  Learn about source control, git, setting up your Github account, and using GitHub for your CSUA Hackathon submission.
•  Bring laptops.
•  The presentation will be from 6:10-7. Individual troubleshooting help will be from 7-8.
•  This help session will be especially useful for those attending CSUA's Hackathon @436 on Friday. http://tinyurl.com/csuaHackathon

Projects
•  Project 2: MIPS ISA simulator in C
   – Add ~200 (repetitive) lines of C code to the framework
   – Lots of cut & paste
   – Appendix B describes all MIPS instructions in detail
   – Make your own unit test!

61C in the News
•  "Erasing the Boundaries," NY Times, 2/12/12
•  The new strategy is to build a device, sell it to consumers and then sell them the content to play on it. … Google is preparing its first Google-branded home entertainment device — a system for streaming music in the house — … fits solidly into an industry-wide goal in which each tech company would like to be all things to all people all day long.
•  Their job boards … are brimming with positions for people with degrees in electrical engineering and hardware design.
•  On Amazon's Web site, for example, the boards have dozens of listings for jobs with titles you might expect at a hardware company. Among them: Senior Hardware Engineering Manager, Director, Hardware Platforms and Systems, and Hardware EE Reliability Engineer. (EE is short for electrical engineer.)

Technology Cost over Time: What does Improving Technology Look Like?
(Chart: Cost $ vs. Time, curves labeled A, B, C, D. Student Roulette?)

Technology Cost over Time: Successive Generations
(Chart: Cost $ vs. Time for Technology Generations 1, 2, 3)
How Can Tech Gen 2 Replace Tech Gen 1?
Moore's Law
•  Predicts: 2X transistors / chip every 2 years
•  "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. … That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000." (from 50 in 1965)
•  "Integrated circuits will lead to such wonders as home computers--or at least terminals connected to a central computer--automatic controls for automobiles, and personal portable communications equipment. The electronic wristwatch needs only a display to be feasible today."
•  Gordon Moore, "Cramming more components onto integrated circuits," Electronics, Volume 38, Number 8, April 19, 1965
•  Gordon Moore, Intel Cofounder, B.S. Cal 1950!
(Chart: # of transistors on an integrated circuit (IC) vs. Year)

Memory Chip Size
•  4x in 3 years, then 2x in 3 years: growth in memory capacity slowing

End of Moore's Law?
•  It's also a law of investment in equipment, as well as increasing volume of integrated circuits that need more transistors per chip
•  Exponential growth cannot last forever
•  More transistors/chip will end during your careers
   – 2020? 2025?
   – (When) will something replace it?

Technology Trends: Uniprocessor Performance (SPECint)
•  Improvements in processor performance have slowed. Why?

Limits to Performance: Faster Means More Power
•  P = CV^2 f

Doing Nothing Well—NOT!
•  Traditional processors consume about two-thirds as much power at idle (doing nothing) as they do at peak
•  Higher performance (server class) processors are approaching 300 W at peak
•  Implications for battery life?

P = C V^2 f
•  Power is proportional to Capacitance * Voltage^2 * Frequency of switching
•  What is the effect on power consumption of:
   – "Simpler" implementation (fewer transistors)?
   – Smaller implementation (shrunk-down design)?
   – Reduced voltage?
   – Increased clock frequency?

Computer Technology: Growing, But More Slowly
•  Processor
   – Speed 2x / 1.5 years (since '85-'05) [slowing!]
   – Now +2 cores / 2 years
   – When you graduate: 3-4 GHz, 6-8 cores in client, 10-14 in server
•  Memory (DRAM)
   – Capacity: 2x / 2 years (since '96) [slowing!]
   – Now 2X / 3-4 years
   – When you graduate: 8-16 GigaBytes
•  Disk
   – Capacity: 2x / 1 year (since '97)
   – 250X size last decade
   – When you graduate: 6-12 TeraBytes
•  Network
   – Core: 2x every 2 years
   – Access: 100-1000 mbps from home, 1-10 mbps cellular

Internet Connection Bandwidth Over Time
•  50% annualized growth rate per year
(Three chart slides)

Five Components of a Computer
•  Control
•  Datapath
•  Memory
•  Input
•  Output

Reality Check: Typical MIPS Chip Die Photograph
•  Protection-oriented Virtual Memory Support
•  Performance-Enhancing On-Chip Memory (iCache + dCache)
•  Floating Pt Control and Datapath
•  Integer Control and Datapath

Example MIPS Block Diagram

Computer Eras: Mainframe 1950s-60s
•  Processor (CPU), Memory, I/O
•  "Big Iron": IBM, UNIVAC, … build $1M computers for businesses => COBOL, Fortran, timesharing OS
A MIPS Family (Toshiba)

The Processor
•  Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
   – Datapath: portion of the processor which contains the hardware necessary to perform operations required by the processor ("the brawn")
   – Control: portion of the processor (also in hardware) which tells the datapath what needs to be done ("the brain")

Stages of the Datapath: Overview
•  Problem: a single, atomic block which "executes an instruction" (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient
•  Solution: break up the process of "executing an instruction" into stages or phases, and then connect the phases to create the whole datapath
   – Smaller phases are easier to design
   – Easy to optimize (change) one phase without touching the others

Instruction Level Parallelism
(Pipeline diagram: Instr 1 through Instr 8 each pass through IF, ID, ALU, MEM, WR, overlapped across time periods P1-P12)

Project 2 Warning

Phases of the Datapath (1/5)
•  There is a wide variety of MIPS instructions: so what general steps do they have in common?
•  You are going to write a simulator in C for MIPS, implementing these 5 phases of execution

•  Phase 1: Instruction Fetch
   – No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
   – Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction: byte addressing, so +4)
•  Simulator: Instruction = Memory[PC]; PC += 4;

Phases of the Datapath (2/5)
•  Phase 2: Instruction Decode
   – Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
   – First, read the opcode to determine instruction type and field lengths
   – Second, read in data from all necessary registers
      •  For add, read two registers
      •  For addi, read one register
      •  For jal, no reads necessary

Simulator for Decode Phase
    Register1 = Register[rsfield];
    Register2 = Register[rtfield];
    if (opcode == 0) …
    else if (opcode > 5 && opcode < 10) …
    else if (opcode …) …
    else if (opcode …) …
•  Better C statement for chained if statements? (Student Roulette?)

Phases of the Datapath (3/5)
•  Phase 3: ALU (Arithmetic-Logic Unit)
   – Real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
   – What about loads and stores?
      •  lw $t0, 40($t1)
      •  Address we are accessing in memory = the value in $t1 PLUS the value 40
      •  So we do this addition in this stage
•  Simulator: Result = Register1 op Register2; Address = Register1 + Addressfield;

Phases of the Datapath (4/5)
•  Phase 4: Memory Access
   – Actually, only the load and store instructions do anything during this phase; the others remain idle during this phase or skip it altogether
   – Since these instructions have a unique step, we need this extra phase to account for them
   – (As a result of the cache system, this phase is expected to be fast: talk about next week)
•  Simulator: Register[rtfield] = Memory[Address]; or Memory[Address] = Register[rtfield];

Phases of the Datapath (5/5)
•  Phase 5: Register Write
   – Most instructions write the result of some computation into a register
   – E.g.: arithmetic, logical, shifts, loads, slt
   – What about stores, branches, jumps?
      •  Don't write anything into a register at the end
      •  These remain idle during this fifth phase or skip it altogether
•  Simulator: Register[rdfield] = Result;

Laptop Innards / Server Internals / Google Server
(Photo slides)

The ARM Inside the iPhone / ARM Architecture
•  http://en.wikipedia.org/wiki/ARM_architecture
•  iPhone Innards: Processor: 1 GHz ARM Cortex A8; I/O; Memory

Review
•  Key Technology Trends and Limitations
   – Transistor doubling, BUT power constraints and latency considerations limit performance improvement
   – (Single processor) computers are about as fast as they are likely to get; exploit parallelism to go faster
•  You will learn about multiple processors, data-level parallelism, caches in 61C
•  Five Components of a Computer
   – Processor/Control + Datapath
   – Memory
   – Input/Output: Human interface/KB + Mouse, Display, Storage … evolving to speech, audio, video
•  Architectural Family: One Instruction Set, Many Implementations