L8Dpath1 - CS324: Computer Architecture Lecture 8:...


CS324: Computer Architecture
Lecture 8: Datapath Design

The Big Picture: The Performance Perspective
• The performance of a machine is determined by:
  – Instruction count (per program)
  – Clock cycle time (or period)
  – Clock cycles per instruction (average CPI)
[Figure: CPI, Inst. Count, Cycle Time]
• ISA and compiler determine the instruction count
• Processor design (datapath and control) will determine:
  – Clock cycle time
  – Clock cycles per instruction
• Today, single cycle processor:
  – Advantage: one clock cycle per instruction
  – Disadvantage: long cycle time

Simplified View of Implementation Details
[Figure: PC, instruction memory (Address, Instruction), Registers (Register #, Data), ALU, data memory (Address, Data)]
• Two types of functional units:
  – elements that operate on data values (combinational: output depends only on input)
  – elements that contain state, i.e. memory elements (sequential: output depends on input AND internal state)

Sequential Circuits (memory elements)
• Clocks
  – needed to decide when state should be updated
  – by convention, edge-triggered (state changes on a clock edge)
  – free-running signal with a fixed cycle time (period)
  – two parts: clock is high or clock is low
  – frequency is the inverse of cycle time
[Figure: clock waveform with the rising edge marked]

Sequential Circuits (memory elements), continued
• Edge-triggered (synchronous system)
  – signals written into state elements must be valid on the active edge
  – valid => value won’t change until inputs change
  – clock cycle must be long enough for signals in the combinational block to stabilize
  – some state elements are written on every edge, others only under certain conditions

Latches and Flip-flops
• Output is equal to the stored value inside the element
• Change of state (value) is based on the clock
• Latches: state changes whenever the inputs change AND the clock is asserted
• Flip-flop: state changes only on a clock edge
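The difference between a latch and a flip-flop described above can be made concrete with a small simulation. This is only a sketch: the class names `DLatch` and `DFlipFlop`, the `simulate` helper, and the choice of the rising edge as the active edge are assumptions for illustration, not part of the lecture.

```python
class DLatch:
    """Level-sensitive: while clk is 1, the stored value follows d."""

    def __init__(self):
        self.q = 0

    def update(self, clk, d):
        if clk == 1:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: the stored value changes only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, clk, d):
        if self._prev_clk == 0 and clk == 1:  # rising edge (assumed active edge)
            self.q = d
        self._prev_clk = clk
        return self.q


def simulate(element, samples):
    """Feed a list of (clk, d) samples to the element and collect its output."""
    return [element.update(clk, d) for clk, d in samples]


# d changes twice while clk stays high, then clk falls and rises again.
samples = [(0, 1), (1, 1), (1, 0), (1, 1), (0, 0), (1, 0)]
latch_trace = simulate(DLatch(), samples)
ff_trace = simulate(DFlipFlop(), samples)
print("latch:    ", latch_trace)
print("flip-flop:", ff_trace)
```

With these samples the latch output follows every change of `d` while the clock is high, whereas the flip-flop output changes only at the two rising edges.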
Edge-triggered methodology
  – clocking methodology defines when signals can be read and written
  – wouldn’t want to read a signal at the same time it was being written. Why? The value read could correspond to the old value, the new value, or some combination of the two.

Our Implementation
• An edge-triggered methodology
• Typical execution:
  – read contents of some state elements
  – send values through some combinational logic
  – write results to one or more state elements
[Figure: State element 1, Combinational logic, State element 2, within one clock cycle]

The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic implementation:
  – use the program counter (PC) to supply the instruction address
  – get the instruction from memory
  – read registers
  – use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers. Why? (memory-reference, arithmetic, or control flow)

Simple Implementation: Include the functional units we need for each instruction. What’s needed?
[Figure: a. Instruction memory (Instruction address, Instruction); b. Program counter (PC); c. Adder (Add, Sum)]
[Figure: a. Registers (5-bit register numbers Read register 1, Read register 2, Write register; Write data; Read data 1, Read data 2; RegWrite); b. ALU (3-bit ALU control; Zero, ALU result)]

Simple Implementation: Include the functional units we need for each instruction. Why do we need all of this stuff?
[Figure: a. Data memory unit (Address, Write data, Read data; MemWrite, MemRead); b. Sign-extension unit (16 bits in, 32 bits out)]
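The sign-extension unit in the figure takes a 16-bit value and produces a 32-bit one. A minimal sketch of that operation follows; the function names `sign_extend16` and `to_signed32` are hypothetical helpers chosen for this example.

```python
def sign_extend16(imm16):
    """Extend a 16-bit value to 32 bits by copying bit 15 into bits 31..16."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:  # sign bit set: fill the upper half with ones
        return imm16 | 0xFFFF0000
    return imm16


def to_signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


for value in (0x0004, 0xFFFC, 0x8000):
    extended = sign_extend16(value)
    print(f"{value:#06x} -> {extended:#010x} ({to_signed32(extended)})")
```

Positive immediates pass through unchanged, while an immediate such as 0xFFFC becomes 0xFFFFFFFC, which still reads as -4 in two's complement.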
Building the Datapath: Use multiplexors to stitch them together
[Figure: single-cycle datapath with PC, instruction memory (Read address, Instruction), an adder that adds 4 to the PC, register file, sign extend (16 to 32 bits), shift left 2, a second adder, ALU (Zero, ALU result), data memory, and three multiplexors. Control signals: PCSrc, RegWrite, ALUSrc, ALU operation (3 bits), MemWrite, MemRead, MemtoReg]

How to Design a Processor: step-by-step
1. Analyze instruction set => datapath requirements
  – the meaning of each instruction is given by the register transfers
  – datapath must include storage elements for the ISA registers (possibly more storage)
  – datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
5. Assemble the control logic

• Before beginning design:
  – find out what instructions look like!
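As a rough illustration of how multiplexors stitch the functional units together, the sketch below models the three muxes named by the control signals ALUSrc, MemtoReg, and PCSrc. The function names and the polarity of each select input (which source is chosen for 0 and which for 1) are assumptions made for this example; the slide figure does not state them.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexor: pass in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def alu_second_operand(alu_src, read_data2, sign_extended_imm):
    # Assumed polarity: ALUSrc = 0 picks the register value, 1 picks the immediate.
    return mux2(alu_src, read_data2, sign_extended_imm)


def register_write_data(mem_to_reg, alu_result, memory_read_data):
    # Assumed polarity: MemtoReg = 0 picks the ALU result, 1 picks the loaded word.
    return mux2(mem_to_reg, alu_result, memory_read_data)


def next_pc(pc_src, pc_plus_4, branch_target):
    # Assumed polarity: PCSrc = 0 picks PC + 4, 1 picks the branch target.
    return mux2(pc_src, pc_plus_4, branch_target)


print(alu_second_operand(1, 7, 100))      # immediate selected
print(register_write_data(0, 42, 99))     # ALU result selected
print(hex(next_pc(1, 0x1004, 0x1010)))    # branch target selected
```

Each mux lets one hardware path serve several instruction classes; the control logic only has to drive the select lines.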
The MIPS Instruction Formats
• All MIPS instructions are 32 bits long (three instruction formats):

  Format   Fields (bit positions) and widths
  R-type   op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits
  I-type   op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits
  J-type   op [31:26] 6 bits | target address [25:0] 26 bits

• The different fields are:
  – op: operation of the instruction
  – rs, rt, rd: the source and destination register specifiers
  – shamt: shift amount
  – funct: selects the variant of the operation in the “op” field
  – address / immediate: address offset or immediate value
  – target address: target address of the jump instruction

Step 1a: The MIPS-lite Subset for today
• ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  – addU rd, rs, rt
  – subU rd, rs, rt
• OR Immediate (I-type: op | rs | rt | immediate)
  – ori rt, rs, imm16
• LOAD/STORE Word (I-type)
  – lw rt, rs, imm16
  – sw rt, rs, imm16
• BRANCH (I-type)
  – beq rs, rt, imm16
• JUMP (J-type: op | target address)
  – j address

Logical Register Transfers
• RTL gives the meaning of the instructions
• All execution starts by fetching the instruction:
    op | rs | rt | rd | shamt | funct = MEM[ PC ]
    op | rs | rt | Imm16 = MEM[ PC ]
  The instruction is “fetched” from the memory location whose address is in the PC.

  inst    Register Transfers
  ADDU    R[rd] <- R[rs] + R[rt];                     PC <- PC + 4
  SUBU    R[rd] <- R[rs] - R[rt];                     PC <- PC + 4
  ORi     R[rt] <- R[rs] or zero_ext(Imm16);          PC <- PC + 4
  LOAD    R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ];    PC <- PC + 4
  STORE   MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt];    PC <- PC + 4
  BEQ     if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 )
          else PC <- PC + 4

Step 1: Requirements of the Instruction Set
• Memory
  – for instructions & data
• Registers (32 x 32)
  – read RS
  – read RT
  – write RT or RD
• PC
• Sign extender (e.g. for immediate values)
• Add and Sub register or extended immediate
• Add 4 or extended immediate to PC

Step 2: Select Components of the Datapath
• Combinational elements (what are these?)
• Storage elements
  – choose clocking methodology

Combinational Logic Elements (Basic Building Blocks)
[Figure: Adder (32-bit inputs A and B, CarryIn; outputs Sum and Carry); MUX (32-bit inputs A and B, Select; 32-bit output Y); ALU (32-bit inputs A and B, OP; 32-bit Result)]

Storage Element: Register (Basic Building Block)
• D flip-flop
  – 1-bit input plus clock input
  – state changes on the clock edge
• Register
  – similar to the D flip-flop except:
      N-bit input and output
      Write Enable input
  – Write Enable:
      negated (0): Data Out will not change
      asserted (1): Data Out will become Data In
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]

Storage Element: Register File
• Register file consists of 32 registers:
  – two 32-bit output busses: busA and busB
  – one 32-bit input bus: busW
• Register is selected by:
  – RA (number) selects the register to put on busA (data)
  – RB (number) selects the register to put on busB (data)
  – RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
  – the CLK input is a factor ONLY during a write operation
  – during a read operation, behaves as a combinational logic block: RA or RB valid => busA or busB valid after “access time”
[Figure: 32 32-bit registers with 5-bit RW, RA, and RB inputs, Write Enable, Clk, 32-bit busW in, 32-bit busA and busB out]

• Register file: group of registers that can be read and written
  – for read, 1 input: reg. number; output: data in reg.
  – for write, 3 inputs: reg. #, data to write, clock signal that controls the write
[Figure: Register 0, Register 1, ..., Register n – 1, Register n feeding two multiplexors selected by Read register number 1 and Read register number 2, producing Read data 1 and Read data 2; block symbol of the register file with Read register number 1, Read register number 2, Write register, Write data, Write, Read data 1, Read data 2]

Storage Element: Idealized Memory
• Memory (idealized)
  – one input bus: Data In
  – one output bus: Data Out
• Memory word is selected by:
  – Write Enable = 0: Address selects the word to put on Data Out
  – Write Enable = 1: Address selects the memory word to be written via the Data In bus at the next clock tick
• Clock input (CLK)
  – the CLK input is a factor ONLY during a write operation
  – during a read operation, behaves as a combinational logic block: Address valid => Data Out valid after “access time”
[Figure: memory with 32-bit Data In, 32-bit DataOut, Address, Write Enable, Clk]

Step 3
• Register transfer requirements –> datapath assembly
  – Instruction fetch
  – Read operands and execute operation

3a: Overview of the Instruction Fetch Unit
• The common RTL operations
  – fetch the instruction: mem[PC]
  – update the program counter:
      sequential code: PC <- PC + 4
      branch and jump: PC <- “something else”
[Figure: PC (Clk) with Next Address Logic; Address into Instruction Memory; 32-bit Instruction Word out]

3b: Add & Subtract
• R[rd] <- R[rs] op R[rt]    Example: addu rd, rs, rt
  – Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
  – ALUctr and RegWr: control logic after decoding the instruction
[Figure: R-type format (op, rs, rt, rd, shamt, funct); register file with Rw = Rd, Ra = Rs, Rb = Rt, RegWr, Clk, busW; busA and busB feed the ALU (ALUctr), which produces the 32-bit Result]

Register-Register Timing
[Figure: timing diagram for Clk, PC, Rs/Rt/Rd/Op/Func, and ALUctr, each changing from old value to new value. Delays shown: Clk-to-Q, instruction memory access time, delay through control logic, register file access time, ALU delay]
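The three instruction formats share the same bit positions for op, rs, and rt, so a single routine can extract every field and leave it to the opcode to decide which ones are meaningful. The sketch below illustrates that; the function name `decode` is hypothetical, and the numeric encodings used in the example (opcode 0 with funct 0x21 for addu, opcode 0x23 for lw) are the standard MIPS values, which the slides themselves do not list.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-type)
        "shamt": (word >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct": word & 0x3F,           # bits 5..0   (R-type)
        "imm16": word & 0xFFFF,         # bits 15..0  (I-type)
        "target": word & 0x03FFFFFF,    # bits 25..0  (J-type)
    }


# addu $3, $1, $2: op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x21
addu_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
# lw $2, 8($1): op = 0x23, rs = 1, rt = 2, imm16 = 8
lw_word = (0x23 << 26) | (1 << 21) | (2 << 16) | 8

addu_fields = decode(addu_word)
lw_fields = decode(lw_word)
print(addu_fields)
print(lw_fields)
```

Extracting all fields unconditionally mirrors the hardware, where the wires for rs, rt, rd, and imm16 are always driven and the control signals decide which are used.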
[Figure: timing diagram signals RegWr, busA/busB, and busW (decode opcode and func, set control signals); register-register datapath (Rd, Rs, Rt, RegWr, busW, busA, busB, ALUctr, Result) annotated "Register Write Occurs Here"]

3c: Logical Operations with Immediate
• R[rt] <- R[rs] op ZeroExt[imm16]
[Figure: I-type format (op, rs, rt, immediate); the 16-bit immediate is zero-extended with 16 leading zero bits; datapath adds a RegDst mux choosing between Rd and Rt for Rw, a ZeroExt unit for imm16, and an ALUSrc mux in front of the ALU]

3d: Load Operations
• R[rt] <- Mem[R[rs] + SignExt[imm16]]    Example: lw rt, rs, imm16
[Figure: I-type format; datapath adds an Extender (ExtOp) for imm16, Data Memory (WrEn, Adr, Data In, Clk; MemWr), and a W_Src mux feeding busW]

3e: Store Operations
• Mem[ R[rs] + SignExt[imm16] ] <- R[rt]    Example: sw rt, rs, imm16
[Figure: I-type format; same datapath elements as for load (RegDst, RegWr, ALUSrc, ALUctr, ExtOp, MemWr, W_Src), with Data In of Data Memory supplied for the write]

3f: The Branch Instruction
• beq rs, rt, imm16
  – mem[PC]: fetch the instruction from memory
  – Equal <- R[rs] == R[rt]: calculate the branch condition
  – if (COND eq 0): calculate the next instruction’s address, PC <- PC + 4 + ( SignExt(imm16) x 4 )
  – else: PC <- PC + 4
[Figure: I-type format (op, rs, rt, immediate)]

Datapath for Branch Operations
• beq rs, rt, imm16: the datapath generates the condition (Equal)
[Figure: I-type format; register file outputs busA and busB feed an "Equal?" comparison that produces Cond]
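The next-address rule for beq can be checked with a few lines of arithmetic. This is a minimal sketch assuming 32-bit wraparound and register values passed in directly; the names `sign_extend16` and `beq_next_pc` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Extend a 16-bit value to 32 bits by copying bit 15 into bits 31..16."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Return the next PC for beq: PC + 4 + SignExt(imm16) * 4 if equal, else PC + 4."""
    equal = (rs_value & MASK32) == (rt_value & MASK32)
    pc_plus_4 = (pc + 4) & MASK32
    if equal:
        offset = (sign_extend16(imm16) << 2) & MASK32  # word offset to byte offset
        return (pc_plus_4 + offset) & MASK32
    return pc_plus_4


print(hex(beq_next_pc(0x1000, 5, 5, 3)))       # taken, forward by 3 instructions
print(hex(beq_next_pc(0x1000, 5, 6, 3)))       # not taken
print(hex(beq_next_pc(0x1000, 0, 0, 0xFFFF)))  # taken, offset of -1 instruction
```

Shifting the sign-extended immediate left by two bits is the same as appending 00, because instructions are word aligned and the offset counts instructions rather than bytes.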
[Figure: next-address logic with PC (Clk), Inst Address, an Adder adding 4, PC Ext of imm16 with 00 appended, a second Adder, and a Mux controlled by nPC_sel; register file (Rw, Ra, Rb, RegWr, busW, Clk)]

Putting it All Together: A Single Cycle Datapath
[Figure: complete single-cycle datapath. Inst Memory (Adr) supplies Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Blocks: next-address logic (nPC_sel, Adders, PC Ext, Mux, PC), register file (RegDst mux, RegWr, busW, busA, busB), Extender (ExtOp), ALUSrc mux, ALU (ALUctr, Equal), Data Memory (MemWr, WrEn, Adr, Data In), MemtoReg mux]

Single-cycle Datapath
• MIPS makes design easier
  – instructions same size
  – source registers always in same place
  – immediates same size, location
  – operations always on registers/immediates
• Single cycle datapath
  – CPI = 1
  – Clock Cycle Time (CCT) is long
  – CCT determined by longest instruction
  – complicated operations (FP) really slow things down

An Abstract View of the Critical Path
• Register file and ideal memory:
  – the CLK input is a factor ONLY during a write operation
  – during a read operation, behave as combinational logic: Address valid => Output valid after “access time”
• Critical Path (Load Operation) =
    PC’s Clk-to-Q
    + Instruction Memory’s Access Time
    + Register File’s Access Time
    + ALU to Perform a 32-bit Add
    + Data Memory Access Time
    + Setup Time for Register File Write
    + Clock Skew
[Figure: Ideal Instruction Memory (Instruction Address, Instruction), PC with Next Address logic, register file (Rd, Rs, Rt, Imm; Rw, Ra, Rb), ALU, and Ideal Data Memory (Address, Data In, Clk)]
...
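The critical-path sum for the load operation, together with CPI = 1, fixes the execution time of a single-cycle machine. The sketch below works through the arithmetic. Every delay value is a made-up illustrative number in picoseconds, not a figure from the lecture, and the names `LOAD_PATH_PS`, `critical_path_ps`, and `cpu_time_ps` are hypothetical.

```python
# Hypothetical component delays in picoseconds (illustrative only).
LOAD_PATH_PS = {
    "pc_clk_to_q": 50,
    "instruction_memory_access": 200,
    "register_file_access": 100,
    "alu_32bit_add": 150,
    "data_memory_access": 200,
    "register_file_setup": 50,
    "clock_skew": 25,
}


def critical_path_ps(delays):
    """Sum the delays along the path; this is the minimum clock cycle time."""
    return sum(delays.values())


def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


cycle_time = critical_path_ps(LOAD_PATH_PS)
total = cpu_time_ps(1_000_000, 1, cycle_time)
print(f"cycle time: {cycle_time} ps")
print(f"time for 1,000,000 instructions at CPI = 1: {total} ps")
```

Because the clock must accommodate the slowest instruction, every instruction pays the load's cycle time even when its own path through the datapath is shorter.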