Performance and the Design Process

Computer Organization and Design: The Hardware/Software Interface


CS152 Computer Architecture and Engineering
Lecture 4
Performance and The Design Process

February 4, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz


Review: Combinational Elements + DeMorgan Equivalence

Wire: Out = In                Inverter: Out = NOT In

  In | Out                      In | Out
  0  | 0                        0  | 1
  1  | 1                        1  | 0

NAND Gate and NOR Gate:

  A B | NAND Out | NOR Out
  0 0 |    1     |    1
  0 1 |    1     |    0
  1 0 |    1     |    0
  1 1 |    0     |    0

DeMorgan's Theorem:

  NAND: Out = NOT(A • B) = (NOT A) + (NOT B)
  NOR:  Out = NOT(A + B) = (NOT A) • (NOT B)

  A B | NOT A  NOT B | (NOT A) + (NOT B) | (NOT A) • (NOT B)
  0 0 |   1      1   |         1         |         1
  0 1 |   1      0   |         1         |         0
  1 0 |   0      1   |         1         |         0
  1 1 |   0      0   |         0         |         0


Review: recall General C/L Cell Delay Model

[Figure: combinational logic cell with inputs A, B, ... driving Vout into load Cout; plot of delay (Va -> Vout) against Cout showing the internal delay, the delay per unit load, and Ccritical]

° Combinational Cell (symbol) is fully specified by:
  • functional (input -> output) behavior
    - truth-table, logic equation, VHDL
  • load factor of each input
  • critical propagation delay from each input to each output for each transition
    - THL(A, o) = Fixed Internal Delay + Load-dependent-delay x load
° Linear model composes


Storage Element's Timing Model

[Figure: timing diagram for Clk, D, and Q: D is "don't care" outside the setup and hold windows around the clock edge; Q is unknown until Clock-to-Q has elapsed]

° Setup Time: Input must be stable BEFORE trigger clock edge
° Hold Time: Input must REMAIN stable after trigger clock edge
° Clock-to-Q time:
  • Output cannot change instantaneously at the trigger clock edge
  • Similar to delay in logic gates, two components:
    - Internal Clock-to-Q
    - Load dependent Clock-to-Q


Critical Path & Cycle Time

[Figure: storage elements clocked by Clk with combinational logic between them]

° Critical path: the slowest path between any two storage devices
° Cycle time is a function of the critical path
° must be greater than: Clock-to-Q + Longest Path through Combinational Logic + Setup


The Design Process

"To Design Is To Represent"
Design activity yields description/representation of an object
-- Traditional craftsman does not distinguish between the conceptualization and the artifact
-- Separation comes about because of complexity
-- The concept is captured in one or more representation languages
-- This process IS design

Design Begins With Requirements
-- Functional Capabilities: what it will do
-- Performance Characteristics: Speed, Power, Area, Cost, . . .


Design Process (cont.)

Design Finishes As Assembly
-- Design understood in terms of components and how they have been assembled
-- Top Down decomposition of complex functions (behaviors) into more primitive functions
-- bottom-up composition of primitive building blocks into more complex assemblies

[Figure: design hierarchy with CPU at the top, Datapath and Control below it, then ALU, Regs, and Shifter, down to Nand Gate]

Design is a "creative process," not a simple method


Design Refinement

Informal System Requirement
  -> Initial Specification
  -> Intermediate Specification
  -> Final Architectural Description
  -> Intermediate Specification of Implementation
  -> Final Internal Specification
  -> Physical Implementation
(refinement: increasing level of detail)


Measurement and Evaluation

Architecture is an iterative process
-- searching the space of possible designs
-- at all levels of computer systems

[Figure: loop between Design and Analysis, labeled Creativity and Cost / Performance Analysis]


Design as Search

[Figure: Problem A splits into Strategy 1 and Strategy 2, then into SubProb 1, SubProb2, SubProb3, which map onto building blocks BB1, BB2, BB3, ..., BBn]

Design involves educated guesses and verification
-- Given the goals, how should these be prioritized?
-- Given alternative design pieces, which should be selected?
-- Given design space of components & assemblies, which part will yield the best solution?
Feasible (good) choices vs. Optimal choices
[Figure labels from "Measurement and Evaluation": Good Ideas, Mediocre Ideas, Bad Ideas]


Performance: Two notions of "performance"

  Plane            | DC to Paris | Speed    | Passengers | Throughput (pmph)
  Boeing 747       | 6.5 hours   | 610 mph  | 470        | 286,700
  BAC/Sud Concorde | 3 hours     | 1350 mph | 132        | 178,200

  (pmph = passengers x mph)

Which has higher performance?
° Time to do the task (Execution Time)
  - execution time, response time, latency
° Tasks per day, hour, week, sec, ns, ... (Performance)
  - throughput, bandwidth

Response time and throughput often are in opposition


What is Time?

° Straightforward definition of time:
  • Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
  • "real time", "response time" or "elapsed time"
° Alternative: just the time the processor (CPU) is working only on your program (since multiple processes are running at the same time)
  • "CPU execution time" or "CPU time"
  • Often divided into system CPU time (in OS) and user CPU time (in user program)


How to Measure Time?

° User Time => seconds
° CPU Time: Computers are constructed using a clock that runs at a constant rate
  • These discrete time intervals are called clock cycles (or informally clocks or cycles)
  • Length of clock period: clock cycle time (e.g., 250 picoseconds or 250 ps) and clock rate (e.g., 4 gigahertz, or 4 GHz), which is the inverse of the clock period; use these!
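
The numbers on these slides can be checked with a short Python sketch. The plane figures and the 250 ps / 4 GHz pair come from the slides; the function names and the three register delays in the last example are hypothetical, chosen only to illustrate the "Clock-to-Q + longest logic path + setup" bound from the Critical Path slide.

```python
def throughput_pmph(passengers, speed_mph):
    """Throughput in passenger-miles per hour (passengers x mph)."""
    return passengers * speed_mph


def clock_rate_ghz(cycle_time_ps):
    """Clock rate is the inverse of the clock period (1000 ps = 1 ns)."""
    return 1000.0 / cycle_time_ps


def min_cycle_time_ps(clk_to_q_ps, longest_logic_ps, setup_ps):
    """Lower bound on the cycle time set by the critical path."""
    return clk_to_q_ps + longest_logic_ps + setup_ps


# (hours DC to Paris, speed in mph, passengers), as listed on the slide
PLANES = {
    "Boeing 747": (6.5, 610, 470),
    "BAC/Sud Concorde": (3.0, 1350, 132),
}

if __name__ == "__main__":
    for name, (hours, mph, seats) in PLANES.items():
        print(f"{name}: latency {hours} h, "
              f"throughput {throughput_pmph(seats, mph):,} pmph")
    # Hypothetical delays: 30 ps clock-to-Q, 180 ps of logic, 40 ps setup.
    period = min_cycle_time_ps(30, 180, 40)
    print(f"cycle time >= {period} ps, so clock rate <= "
          f"{clock_rate_ghz(period)} GHz")
```

The Concorde wins on response time while the 747 wins on throughput, which is the opposition the slide points out.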


Measuring Time using Clock Cycles

° CPU execution time for program = Clock Cycles for a program x Clock Cycle Time
° One way to define clock cycles:
  Clock Cycles for program = Instructions for a program (called "Instruction Count") x Average Clock cycles Per Instruction (called "CPI")


Performance Calculation

° CPU execution time for program = Clock Cycles for program x Clock Cycle Time
° Substituting for clock cycles:
  CPU execution time for program = (Instruction Count x CPI) x Clock Cycle Time
                                 = Instruction Count x CPI x Clock Cycle Time

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$


How to Calculate the 3 Components?

° Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
° Instruction Count:
  • Count instructions in loop of small program
  • Use simulator to count instructions
  • Hardware counter in special register (most CPUs)
° CPI:
  • Calculate: (Execution Time / Clock cycle time) / Instruction Count
  • Hardware counter in special register (most CPUs)


Example: Calculating CPI by averaging separately

  Op     | Freq_i | CPI_i | Prod | (% Time)
  ALU    |  50%   |   1   |  .5  |  (33%)
  Load   |  20%   |   2   |  .4  |  (27%)
  Store  |  10%   |   2   |  .2  |  (13%)
  Branch |  20%   |   2   |  .4  |  (27%)
  Total  |        |       | 1.5  |

  (Freq_i is the instruction mix; % Time shows where time is spent)

• What if Branch instructions twice as fast?


What Programs Measure for Comparison?

° Ideally run typical programs with typical input before purchase, or before even building the machine
  • Called a "workload"; for example:
  • Engineer uses compiler, spreadsheet
  • Author uses word processor, drawing program, compression software
° In some situations it's hard to do
  • Don't have access to machine to "benchmark" before purchase
  • Don't know workload in future
° Standardized benchmarks
  • Obviously, apparent speed of processor depends on code used to test it
  • Need industry standards so that different processors can be fairly compared
  • Companies exist that create these benchmarks: "typical" code used to evaluate systems
  • Need to be changed every 2 or 3 years since designers could target these standard benchmarks


Administrivia

° Prereq Quizzes not graded yet
  • Hopefully very soon
° Lab 1 due tonight by midnight.
  • You will be graded in your lab report on methodology:
  • Not just the fact that you found the bugs!
  • How did you approach the debugging process?
  • Why do you think that you have things completely covered?
° Lab #2/Homework #2 starting today (or tomorrow)
° Form 4 or 5 person teams by Thursday 2/12
  Who has a full team? Who needs teammates?


Amdahl's Law: The Law of Diminishing Returns

° Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

° Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
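
The CPU-time equation, the CPI table, and the Amdahl setup can be tied together in a short Python sketch. The instruction mix is the one in the table; the function names, the 10^9 instruction count, and the reading of "Branch twice as fast" as "Branch CPI drops from 2 to 1" are assumptions made for illustration.

```python
def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """CPU time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


def average_cpi(mix):
    """Sum of Freq_i x CPI_i over all instruction classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_fraction(mix, op):
    """Fraction of execution time spent in one instruction class."""
    freq, cpi = mix[op]
    return freq * cpi / average_cpi(mix)


def amdahl_speedup(fraction, factor):
    """Speedup when `fraction` of the time is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# op -> (Freq_i, CPI_i), copied from the table above
MIX = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

# Assumed reading of the slide's question: Branch CPI goes from 2 to 1.
FAST_BRANCH = dict(MIX, Branch=(0.20, 1))

if __name__ == "__main__":
    base, fast = average_cpi(MIX), average_cpi(FAST_BRANCH)
    print(f"CPI {base:.2f} -> {fast:.2f}, speedup {base / fast:.3f}")
    f_branch = time_fraction(MIX, "Branch")
    print(f"Amdahl with F={f_branch:.3f}, S=2: {amdahl_speedup(f_branch, 2):.3f}")
    # Hypothetical program: 1e9 instructions on a 250 ps clock.
    print(f"CPU time: {cpu_time_s(1e9, base, 250e-12):.3f} s")
```

Both routes give the same speedup (about 1.15), because halving the Branch CPI is exactly an enhancement with F equal to the Branch share of execution time and S = 2.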
° Then ExTime w/ E = ExTime w/o E x ((1 - F) + F / S), so the maximum benefit (S arbitrarily large) is:

$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{time affected}}}$$


Administrivia (cont.)

° Office hours in Lab
  • Mon 5 - 6:30 Jack, Tue 3:30 - 5 Kurt, Wed 3 - 4:30 John
° Kubi's office hours Monday 3:30 - 5


Problem: Design a "fast" ALU for the MIPS ISA

° Requirements?
° Must support the Arithmetic / Logic operations
° Tradeoffs of cost and speed based on frequency of occurrence, hardware budget
° We will optimize time to do ALU ops for today


Computers in the Real World

° Problem: Prof. Dawson monitors redwoods by climbing trees, stringing miles of wire, placing a printer-sized data logger in the tree, and collecting data by climbing trees (300' high)
  http://www.berkeley.edu/news/media/releases/2003/07/28_redwood.shtml
° Solution: CS Prof. Culler proposes wireless "micromotes" in trees. Automatically network together (without wire). Size of film canister, lasts for months on C battery, much less expensive. Read data by walking to base of tree with wireless laptop.
"Will revolutionize environmental monitoring"


MIPS ALU requirements

° Add, AddU, Sub, SubU, AddI, AddIU
  • => 2's complement adder/sub with overflow detection
° And, Or, AndI, OrI, Xor, Xori, Nor
  • => Logical AND, logical OR, XOR, nor
° SLTI, SLTIU (set less than)
  • => 2's complement adder with inverter, check sign bit of result
° ALU from CS 150 / P&H book chapter 4 supports these ops


MIPS arithmetic instruction format

             31     25     20     15            5      0
  R-type:    |  op  |  Rs  |  Rt  |  Rd  |      | funct |
  I-Type:    |  op  |  Rs  |  Rt  |      Immed 16       |

  Type  | op | funct      Type | op | funct      Type | op | funct
  ADDI  | 10 | xx         ADD  | 00 | 40              | 00 | 50
  ADDIU | 11 | xx         ADDU | 00 | 41              | 00 | 51
  SLTI  | 12 | xx         SUB  | 00 | 42         SLT  | 00 | 52
  SLTIU | 13 | xx         SUBU | 00 | 43         SLTU | 00 | 53
  ANDI  | 14 | xx         AND  | 00 | 44
  ORI   | 15 | xx         OR   | 00 | 45
  XORI  | 16 | xx         XOR  | 00 | 46
  LUI   | 17 | xx         NOR  | 00 | 47

° Signed arithmetic generates overflow, no carry


Design Trick: divide & conquer

° Trick #1: Break the problem into simpler problems, solve them and glue together the solution
° Example: assume the immediates have been taken care of before the ALU
  • 10 operations (4 bits)

  00 add     04 and     12 slt
  01 addU    05 or      13 sltU
  02 sub     06 xor
  03 subU    07 nor


Refined Requirements

(1) Functional Specification
    inputs:     2 x 32-bit operands A, B, 4-bit mode
    outputs:    32-bit result S, 1-bit carry, 1 bit overflow
    operations: add, addu, sub, subu, and, or, xor, nor, slt, sltU
(2) Block Diagram (schematic symbol, Verilog description)

[Figure: ALU symbol with 32-bit inputs A and B, 4-bit mode m, 32-bit output S, and 1-bit outputs c and ovf]


Behavioral Representation: verilog

  module ALU(A, B, m, S, c, ovf);
    input  [0:31] A, B;    // 32-bit operands
    input  [0:3]  m;       // 4-bit mode
    output [0:31] S;       // 32-bit result
    output c, ovf;         // carry and overflow
    reg [0:31] S;
    reg c, ovf;
    always @(A, B, m) begin
      case (m)
        0: S = A + B;
        . . .
    end
  endmodule


Design Decisions

[Figure: decision tree: ALU -> bit slice -> either one 7-to-2 C/L block (PLD or Gates) or seven 3-to-2 C/L blocks (CL0 ... CL6) plus a mux]

° Simple bit-slice
  • big combinational problem
  • many little combinational problems
  • partition into 2-step problem
° Bit slice with carry look-ahead
° ...


Refined Diagram: bit-slice ALU

[Figure: 32 one-bit slices ALU0 ... ALU31; slice i takes a_i, b_i, mode m, and carry-in cin and produces s_i and carry-out co; inputs A, B (32 bits) and M (4 bits); outputs S (32 bits) and Ovflw]


7-to-2 Combinational Logic

° start turning the crank . . .

  Function | Inputs: M0 M1 M2 M3 A B Cin | Outputs: S Cout | K-Map
  0   add  |          0  0  0  0 0 0  0  |          0  0   |
  . . .
  127      |


Seven plus a MUX ?

° Design trick 2: take pieces you know (or can imagine) and try to put them together
° Design trick 3: solve part of the problem and extend

[Figure: 1-bit slice: the and, or, and 1-bit Full Adder (add) outputs for A and B feed a Mux controlled by S-select to produce Result; CarryIn enters and CarryOut leaves the full adder]

Full Adder (3->2 element)

  A B C | Co O
  0 0 0 | 0  0
  0 0 1 | 0  1
  0 1 0 | 0  1
  0 1 1 | 1  0
  1 0 0 | 0  1
  1 0 1 | 1  0
  1 1 0 | 1  0
  1 1 1 | 1  1


Additional operations

° A - B = A + (- B) = A + NOT(B) + 1
  • form two's complement by invert and add one

[Figure: the same 1-bit slice with an added invert control and a Mux that selects B or NOT(B)]

Set-less-than? - left as an exercise


Revised Diagram

° LSB and MSB need to do a little extra

[Figure: 32-slice ALU with C/L to produce select, comp, c-in from the 4-bit M; a "?" marks the extra logic at the MSB]


Overflow

  Decimal | Binary        Decimal | 2's Complement
     0    | 0000             0    | 0000
     1    | 0001            -1    | 1111
     2    | 0010            -2    | 1110
     3    | 0011            -3    | 1101
     4    | 0100            -4    | 1100
     5    | 0101            -5    | 1011
     6    | 0110            -6    | 1010
     7    | 0111            -7    | 1001
                            -8    | 1000

° Examples: 7 + 3 = 10 but ...          ° -4 - 5 = -9 but ...

      0 1 1 1     7                         1 1 0 0    -4
    + 0 0 1 1     3                       + 1 0 1 1    -5
      1 0 1 0    -6                         0 1 1 1     7

  (carry into MSB = 1, carry out = 0)     (carry into MSB = 0, carry out = 1)


Overflow Detection

° Overflow: the result is too large (or too small) to represent properly
  • Example: -8 ≤ 4-bit binary number ≤ 7
° When adding operands with different signs, overflow cannot occur!
° Overflow occurs when adding:
  • 2 positive numbers and the sum is negative
  • 2 negative numbers and the sum is positive
° On your own: Prove you can detect overflow by:
  • Carry into MSB ≠ Carry out of MSB


Overflow Detection Logic

° Carry into MSB ≠ Carry out of MSB
  • For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]

[Figure: four 1-bit ALUs in a carry chain (CarryIn0, A0/B0 -> Result0, CarryOut0 = CarryIn1, ..., A3/B3 -> Result3, CarryOut3); an XOR of CarryIn3 and CarryOut3 produces Overflow]

  X Y | X XOR Y
  0 0 |    0
  0 1 |    1
  1 0 |    1
  1 1 |    0


More Revised Diagram

° LSB and MSB need to do a little extra

[Figure: 32-slice ALU (ALU0 ... ALU31) with C/L to produce select, comp, c-in from the 4-bit M; at the MSB, Ovflw = signed-arith and (cin xor co)]


But What about Performance?

° Critical Path of n-bit ripple-carry adder is n*CP (CP = critical path of one 1-bit ALU)

[Figure: four 1-bit ALUs with the carry rippling from CarryIn0 through CarryOut3]

Design Trick: Throw hardware at it


Carry Look Ahead (Design trick: peek)

  A B | C-out |
  0 0 |  0    | "kill"
  0 1 | C-in  | "propagate"
  1 0 | C-in  | "propagate"
  1 1 |  1    | "generate"

  G = A and B
  P = A xor B

  C0 = Cin
  C1 = G0 + C0 • P0
  C2 = G1 + G0 • P1 + C0 • P0 • P1
  C3 = G2 + G1 • P2 + G0 • P1 • P2 + C0 • P0 • P1 • P2
  C4 = . . .


Plumbing as Carry Lookahead Analogy

[Figure: pipes and valves illustrating c1 through c4 in terms of c0, the generate signals g0 ... g3, and the propagate signals p0 ... p3]


Cascaded Carry Look-ahead (16-bit): Abstraction

[Figure: carry look-ahead (CLA) block with carry-in C0]
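
The full-adder table, the invert-and-add-one trick, the overflow rule, and the look-ahead equations from the preceding slides can be cross-checked in a few lines of Python. This is a bit-level sketch, not the course's ALU: all function names are hypothetical, bit lists are least-significant-bit first, and C4 is left out because the slide elides it.

```python
def full_adder(a, b, cin):
    """3->2 element: returns (sum bit, carry out)."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def to_bits(value, n=4):
    """n-bit two's-complement encoding, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def to_signed(bits):
    """Decode LSB-first two's-complement bits back to an int."""
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def ripple_add(a_bits, b_bits, cin=0):
    """Chain 1-bit slices. carries[i] is the carry INTO bit i and
    carries[n] is the carry out of the MSB."""
    carries, result = [cin], []
    for a, b in zip(a_bits, b_bits):
        s, cout = full_adder(a, b, carries[-1])
        result.append(s)
        carries.append(cout)
    # Overflow = carry into MSB XOR carry out of MSB
    overflow = carries[-2] ^ carries[-1]
    return result, carries, overflow


def subtract(a_bits, b_bits):
    """A - B = A + NOT(B) + 1: invert B and set the carry-in to 1."""
    return ripple_add(a_bits, [bit ^ 1 for bit in b_bits], cin=1)


def lookahead_carries(a_bits, b_bits, c0=0):
    """C0..C3 computed directly from G = A and B, P = A xor B."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = (g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2])
          | (c0 & p[0] & p[1] & p[2]))
    return [c0, c1, c2, c3]


if __name__ == "__main__":
    for x, y in ((7, 3), (-4, -5), (3, -4)):
        bits, carries, ovf = ripple_add(to_bits(x), to_bits(y))
        print(f"{x} + {y}: result {to_signed(bits)}, overflow {ovf}")
```

The look-ahead function produces the same carries as the ripple chain; the difference in hardware is that each look-ahead carry is a two-level function of the inputs rather than the end of a chain of n slices.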
[Figure: 4-bit Adder blocks, each producing group generate and propagate signals (G0, P0, ...), feed the look-ahead unit, which computes:]

  C1 = G0 + C0 • P0
  C2 = G1 + G0 • P1 + C0 • P0 • P1
  C3 = G2 + G1 • P2 + G0 • P1 • P2 + C0 • P0 • P1 • P2
  C4 = . . .


2nd level Carry, Propagate as Plumbing

[Figure: plumbing analogy showing how the bit-level signals p0 ... p3 and g0 ... g3 combine into the group-level P0 and G0]


Design Trick: Guess (or "Precompute")

[Figure: carry-select adder: the upper n-bit adder is duplicated, one copy with carry-in 0 and one with carry-in 1, and a mux driven by the lower adder's Cout picks the result]

° Two n-bit adders in series: CP(2n) = 2*CP(n)
° Carry-select adder: CP(2n) = CP(n) + CP(mux)


Carry Skip Adder: reduce worst case delay

[Figure: 4-bit Ripple Adder blocks (inputs A, B; output S) whose propagate signals P0 ... P3 let the carry skip over a block]

Just speed up the slowest case for each block
Exercise: optimal design uses variable block sizes


Additional MIPS ALU requirements

° Mult, MultU, Div, DivU (next lecture)
  => Need 32-bit multiply and divide, signed and unsigned
° Sll, Srl, Sra (next lecture)
  => Need left shift, right shift, right shift arithmetic by 0 to 31 bits
° Nor (leave as exercise to reader)
  => logical NOR or use 2 steps: (A OR B) XOR 1111....1111


Elements of the Design Process

° Divide and Conquer (e.g., ALU)
  • Formulate a solution in terms of simpler components.
  • Design each of the components (subproblems)
° Generate and Test (e.g., ALU)
  • Given a collection of building blocks, look for ways of putting them together that meet the requirement
° Successive Refinement (e.g., carry lookahead)
  • Solve "most" of the problem (i.e., ignore some constraints or special cases), examine and correct shortcomings.
° Formulate High-Level Alternatives (e.g., carry select) • Articulate many strategies to "keep in mind" while pursuing any one approach. ° Work on the Things you Know How to Do • The unknown will become “obvious” as you make progress. 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.45 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.46 Summary of the Design Process Hierarchical Design to manage complexity Top Down vs. Bottom Up vs. Successive Refinement Importance of Design Representations: Block Diagrams Decomposition into Bit Slices Truth Tables, K-Maps Circuit Diagrams top down bottom up mux design meets at TT Why should you keep a design notebook? ° Keep track of the design decisions and the reasons behind them • Otherwise, it will be hard to debug and/or refine the design • Write it down so that can remember in long project: 2 weeks ->2 yrs • Others can review notebook to see what happened ° Record insights you have on certain aspect of the design as they come up ° Record of the different design & debug experiments • Memory can fail when very tired Other Descriptions: state diagrams, timing diagrams, reg xfer, . . . Optimization Criteria: Gate Count [Package Count] Pin Out 2/4/04 Area Logic Levels Delay Fan-in/Fan-out Cost ©UCB Spring 2004 ° Industry practice: learn from others mistakes Power Design time CS152 / Kubiatowicz Lec4.47 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.48 Why do we keep it on-line? ° You need to force yourself to take notes • Open a window and leave an editor running while you work 1) Acts as reminder to take notes 2) Makes it easy to take notes • 1) + 2) => will actually do it How should you do it? ° Keep it simple • DON’T make it so elaborate that you won’t use (fonts, layout, ...) 
° Separate the entries by dates • type “date” command in another window and cut&paste ° Take advantage of the window system’s “cut and paste” features ° It is much easier to read your typing than your writing ° Also, paper log books have problems • Limited capacity => end up with many books • May not have right book with you at time vs. networked screens • Can use computer to search files/index files to find what looking for ° Start day with problems going to work on today ° Record output of simulation into log with cut&paste; add date • May help sort out which version of simulation did what ° Record key email with cut&paste ° Record of what works & doesn’t helps team decide what went wrong after you left ° Index: write a one-line summary of what you did at end of each day 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.49 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.50 On-line Notebook Example ° Refer to the handout “Example of On-Line Log Book” on cs 152 handouts page 1st page of On-line notebook (Index + Wed. 9/6/95) * Index ============================================================== Wed Sep 6 00:47:28 PDT 1995 - Created the 32-bit comparator component Thu Sep 7 14:02:21 PDT 1995 - Tested the comparator Mon Sep 11 12:01:45 PDT 1995 - Investigated bug found by Bart in comp32 and fixed it + ==================================================================== Wed Sep 6 00:47:28 PDT 1995 Goal: Layout the schematic for a 32-bit comparator I've layed out the schemtatics and made a symbol for the comparator. I named it comp32. 
The files are ~/wv/proj1/sch/comp32.sch ~/wv/proj1/sch/comp32.sym Wed Sep 6 02:29:22 PDT 1995 - ==================================================================== • Add 1 line index at front of log file at end of each session: date+summary • Start with date, time of day + goal • Make comments during day, summary of work • End with date, time of day (and add 1 line summary at front of file) 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.51 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.52 2nd page of On-line notebook (Thursday 9/7/95) + ==================================================================== Thu Sep 7 14:02:21 PDT 1995 Goal: Test the comparator component I've written a command file to test comp32. in ~/wv/proj1/diagnostics/comp32.cmd. I've placed it 3rd page of On-line notebook (Monday 9/11/95) + =================================================================== = Mon Sep 11 12:01:45 PDT 1995 Goal: Investigate bug discovered in comp32 and hopefully fix it Bart found a bug in my comparator component. He left the following e-mail. ------------------From bart@simpsons.residence Sun Sep 10 01:47:02 1995 Received: by wayne.manor (NX5.67e/NX3.0S) id AA00334; Sun, 10 Sep 95 01:47:01 -0800 Date: Wed, 10 Sep 95 01:47:01 -0800 From: Bart Simpson <bart@simpsons.residence> To: bruce@wanye.manor, old_man@gokuraku, hojo@sanctuary Subject: [cs152] bug in comp32 Status: R Hey Bruce, I think there's a bug in your comparator. The comparator seems to think that ffffffff and fffffff7 are equal. Can you take a look at this? Bart ---------------- I ran the command file in viewsim and it looks like the comparator is working fine. I saved the output into a log file called ~/wv/proj1/diagnostics/comp32.log Notified the rest of the group that the comparator is done. 
Thu Sep 7 16:15:32 PDT 1995 - ==================================================================== 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.53 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.54 4th page of On-line notebook (9/11/95 contd) I verified the bug. here's a viewsim of the bug as it appeared.. (equal should be 0 instead of 1) -----------------SIM>stepsize 10ns SIM>v a_in A[31:0] SIM>v b_in B[31:0] SIM>w a_in b_in equal SIM>a a_in ffffffff\h SIM>a b_in fffffff7\h SIM>sim time = 10.0ns A_IN=FFFFFFFF\H B_IN=FFFFFFF7\H EQUAL=1 Simulation stopped at 10.0ns. ------------------Ah. I've discovered the bug. I mislabeled the 4th net in the comp32 schematic. I corrected the mistake and re-checked all the other labels, just in case. I re-ran the old diagnostic test file and tested it against the bug Bart found. It seems to be working fine. hopefully there aren’t any more bugs:) 5th page of On-line notebook (9/11/95 contd) On second inspectation of the whole layout, I think I can remove one level of gates in the design and make it go faster. But who cares! the comparator is not in the critical path right now. the delay through the ALU is dominating the critical path. so unless the ALU gets a lot faster, we can live with a less than optimal comparator. I e-mailed the group that the bug has been fixed Mon Sep 11 14:03:41 PDT 1995 - ==== ================================================================ • Perhaps later critical path changes; what was idea to make compartor faster? Check log book! 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.55 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.56 Added benefit: cool post-design statistics Sample graph from the Alewife project: Lecture Summary ° An Overview of the Design Process • Know your metrics of success! 
(Performance, etc) • Design is an iterative process, multiple approaches to get started • Do NOT wait until you know everything before you start ° Example: Instruction Set drives the ALU design • Divide an Conquer • Take pieces you know and put them together • Start with a partial solution and extend ° Optimization: Start simple and analyze critical path • For the Communications and Memory Management Unit (CMMU) • These statistics came from on-line record of bugs • For adder: the carry is the slowest element • Logarithmic trees to flatten linear computation • Precompute: Double hardware and postpone slow decision ° On-line Design Notebook • Open a window and keep an editor running while you work;cut&paste • Refer to the handout as an example • Former CS 152 students (and TAs) say they use on-line notebook for programming as well as hardware design; one of most valuable skills 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.57 2/4/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec4.58 ...
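The carry equations C1 to C3 and the carry-select trick from the adder slides can be checked in software. The following is a minimal sketch, not the lecture's own code. It assumes the common definitions G_i = A_i AND B_i and P_i = A_i XOR B_i, which the slides do not spell out. The slide elides the expanded form of C4, so the sketch computes the carry out with the one-step recurrence instead. All function names are hypothetical.

```python
def to_bits(x, n):
    """Little-endian list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def cla4_add(a_bits, b_bits, c0):
    """4-bit add using the lookahead carry equations from the slide.

    Assumption: G_i = A_i AND B_i, P_i = A_i XOR B_i.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    # The slide leaves C4 elided; this uses the recurrence, not an expansion.
    c4 = g[3] | (c3 & p[3])
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


def carry_select_add(a_bits, b_bits, c0, block_add=cla4_add):
    """2n-bit add: compute the upper half for carry-in 0 and 1, then select.

    This mirrors CP(2n) = CP(n) + CP(mux): both guesses for the upper half
    are independent of the lower half, and its carry out only picks one.
    """
    half = len(a_bits) // 2
    low_sum, low_cout = block_add(a_bits[:half], b_bits[:half], c0)
    hi0_sum, hi0_cout = block_add(a_bits[half:], b_bits[half:], 0)
    hi1_sum, hi1_cout = block_add(a_bits[half:], b_bits[half:], 1)
    if low_cout:  # plays the role of the mux
        return low_sum + hi1_sum, hi1_cout
    return low_sum + hi0_sum, hi0_cout


if __name__ == "__main__":
    s, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8), 0)
    print(from_bits(s), cout)
```

In hardware the three block additions run in parallel, which is where the delay saving comes from; the Python version only demonstrates that the result is the same as ordinary addition.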
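The ALU requirements slide suggests building Nor in two steps: (A OR B) XOR 1111....1111. A small sketch of that identity on 32-bit values follows; the name `nor32` and the mask constant are illustrative, not from the lecture.

```python
MASK32 = 0xFFFFFFFF  # thirty-two 1 bits, the "1111....1111" operand


def nor32(a, b):
    """32-bit NOR computed as (A OR B) XOR all-ones."""
    return ((a | b) ^ MASK32) & MASK32
```

XOR with all ones flips every bit, so the second step is simply a bitwise NOT of the OR result.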
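The notebook conventions described above (start each session with date, time and goal; end with date and time; add a one-line summary to the index at the front) are simple enough to automate. The sketch below is one hypothetical way to do it in memory. The class name, method names and timestamp format are assumptions, and the timestamp only approximates the output of the `date` command shown in the sample log (no time zone, zero-padded day).

```python
from dataclasses import dataclass, field
from datetime import datetime

RULE = "=" * 68


def stamp(when):
    """Timestamp roughly in the style of the sample log entries."""
    return when.strftime("%a %b %d %H:%M:%S %Y")


@dataclass
class Notebook:
    index: list = field(default_factory=list)    # one line per session
    entries: list = field(default_factory=list)  # full session text

    def add_session(self, start, end, goal, notes, summary):
        """Record one work session and its one-line index summary."""
        self.index.append(f"{stamp(start)} - {summary}")
        body = [f"+ {RULE}", stamp(start), f"Goal: {goal}",
                *notes, stamp(end), f"- {RULE}"]
        self.entries.append("\n".join(body))

    def render(self):
        """Index first, then the dated entries in order."""
        lines = ["* Index", RULE, *self.index, "", *self.entries]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nb = Notebook()
    nb.add_session(
        datetime(1995, 9, 6, 0, 47, 28),
        datetime(1995, 9, 6, 2, 29, 22),
        "Lay out the schematic for a 32-bit comparator",
        ["I named it comp32."],
        "Created the 32-bit comparator component",
    )
    print(nb.render())
```

Keeping the index separate from the entries means the summary line always lands at the front of the file, which is the part of the convention that is easiest to forget when editing by hand.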

This note was uploaded on 01/29/2008 for the course CS 152 taught by Professor Kubiatowicz during the Spring '04 term at University of California, Berkeley.
