CA5.6 - Chapter 5 (cont.) The Processor: Multicycle DataPath and Control


What’s wrong with CPI=1 processor?
[Figure: delay path of each instruction class through the single-cycle datapath]
  Arithmetic & Logical (e.g. add): PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
  Load (critical path): PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
  Store: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
  Branch: PC -> Inst Memory -> Reg File -> cmp -> mux

An Abstract View of the Critical Path
[Figure: PC with next-address logic, ideal instruction memory, register file of 32 32-bit registers (ports Rw, Ra, Rb; fields Rd, Rs, Rt, Imm16), ALU, ideal data memory]
Critical Path = PC’s Clk-to-Q
  + Instruction Memory’s Access Time
  + Reg File’s Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Reg File Write
  + Clock Skew

What’s wrong with CPI=1 processor?
• Long Cycle Time
  – All instructions take as much time as the slowest
  – Real memory slower than idealized memory
• Duplicate Resources

What’s wrong with CPI=1 processor?
[Figure: delay paths for Arithmetic & Logical (e.g. add), Arithmetic & Logical (e.g. div), Load (L1 hit), Load (e.g. L1 miss), Store, and Branch, with the critical path marked]

Memory Access Time
• Physics => fast memories are small (large memories are slow)
[Figure: storage array with address decoder, selected word line, bit lines, storage cells, and sense amps]
• => Use a hierarchy of memories
[Figure: Processor -> (proc. bus) -> Cache, 1-2 clock-periods -> L2 Cache, 6-10 clock-periods -> (mem. bus) -> memory, 200-500 clock-periods]

Reducing Cycle Time
• Cut combinational dependency graph and insert register / latch
• Do same work in two fast cycles, rather than one slow one
• May be able to short-circuit path and remove some components for some instructions!
[Figure: storage element -> Acyclic Combinational Logic -> storage element  ⇒  storage element -> Acyclic Combinational Logic (A) -> storage element -> Acyclic Combinational Logic (B) -> storage element]

Multicycle DataPath
• Minimizes Hardware: 1 memory, 1 adder
[Figure: multicycle datapath. Components: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), MDR, A, B, ALU, ALU Out, Target, Extend, << 2, ALU Control. Control signals: PCWr, PCWrCond, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg, PCSrc, BrWr]

Note the differences:
• One memory, one ALU
• New regs added
• New mux added
[Figure: the same multicycle datapath, with the added registers and mux marked]

Breaking Inst Execution into Clock Cycles
1) Instruction Fetch
2) Instruction Decode and Reg Read
3) Execution, Mem addr, or Branch
4) Mem access, Operation complete
5) Load complete

Summary of Steps taken on Inst Execution
(columns: action for R-type, action for L/S, action for branch, action for jump)
inst fetch (all instructions):
  IR <= memory[pc], pc <= pc+4
inst decode, reg read (all instructions):
  A <= reg[IR[25:21]], B <= reg[IR[20:16]], ALUout <= pc + (signext(IR[15:0]) << 2)
execution:
  R-type: ALUout <= A op B
  L/S: ALUout <= A + offset
  branch: if (A = B) PC <= target
  jump: (not shown)
memory access:
  R-type: Reg <= ALUout
  L/S: read or write memory
load complete:
  load: Reg <= MDR

CPI in a Multicycle Datapath
Same step table, with the number of cycles per instruction class:
  R-type = 4, L = 5, S = 4, branch = 3, jump = 3

CPI in Multicycle Datapath
Using Specint2000 instruction mix, what is the CPI?
Inst mix:
  Load 25%
  Store 10%
  Branch 11%
  Jump 2%
  ALU 52%
CPI = 0.25*5 + 0.10*4 + 0.11*3 + 0.02*3 + 0.52*4 = 4.12

Announcement
• Lab#2 is posted.
• You are required to expand the VeSPA architecture with 4 new instructions.
• Modify the VeSPA behavior model (coded in Verilog) to include the four new instructions.
• Use the extended instruction set to write VeSPA assembly code, and simulate it on your new VeSPA simulator.
• Due before Thanksgiving holiday, that is 11/22.

Extra Credit
• To execute the Jump and Link instruction, what changes should we make to the existing control signals?
  JAL L:  $r31 <= PC+4; goto L
  – RegDst only allows Rd and Rt
[Figure: single-cycle datapath with Main Control (op = Instr<31:26>) and ALU Control (func = Instr<5:0>, ALUop, ALUctr); control signals RegDst, ALUSrc, nPC_sel, RegWr, ExtOp, MemWr, MemtoReg; register number 31 and PC+4 appear as mux inputs]

Defining the control
• Controls in multicycle datapath are more complex because the instruction is executed in a series of steps.
• Two techniques to be discussed are
  – FSM (Finite State Machine)
  – Microprogramming
• HDL (e.g. Verilog) can be used to implement the control.
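The cycle counts and the SPECint2000 mix from the CPI slides above can be recomputed with a few lines of Python. This is only a sketch: the step names and the `STEPS` / `MIX` tables are labels chosen here for illustration, taken from the step summary rather than from any course code.

```python
# Steps each instruction class passes through in the multicycle datapath.
# The step names are illustrative labels based on the step summary table.
STEPS = {
    "load":   ["fetch", "decode", "execute", "memory", "load_complete"],
    "store":  ["fetch", "decode", "execute", "memory"],
    "alu":    ["fetch", "decode", "execute", "complete"],
    "branch": ["fetch", "decode", "execute"],
    "jump":   ["fetch", "decode", "execute"],
}

# Instruction mix quoted on the CPI slide (fractions of executed instructions).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}


def cycles(instr_class):
    """One clock cycle per step, so the cycle count is the number of steps."""
    return len(STEPS[instr_class])


def average_cpi(mix):
    """Weighted average of cycles per instruction over the given mix."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction mix must sum to 1")
    return sum(fraction * cycles(name) for name, fraction in mix.items())


if __name__ == "__main__":
    print(f"CPI = {average_cpi(MIX):.2f}")  # prints "CPI = 4.12"
```

Changing a fraction in `MIX` shows how strongly the load share, at 5 cycles each, pulls the average up.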
High-level view of FSM
Start -> one of: Memory Ops | R-type Ops | Branch inst | Jump inst

High-level view of FSM
Inst Fetch, decode & Reg read -> one of: Memory Ops | R-type Ops | Branch inst | Jump inst

Control Spec for multicycle proc (“Finite State Diagram”)
“instruction fetch”: IR <= MEM[PC], PC <= PC+4
“decode / operand fetch”: A <= R[rs], B <= R[rt]
Execute:
  R-type: S <= A fun B
  ORi: S <= A | ZX
  LW: S <= A + SX
  SW: S <= A + SX
  BEQ: PC <= Next(PC,Equal)
Memory:
  LW: M <= MEM[S]
  SW: MEM[S] <= B
Write-back:
  R-type: R[rd] <= S
  ORi: R[rt] <= S
  LW: R[rt] <= M

Control Spec for each state (“Finite State Diagram”)
[Figure: the same state diagram (Execute, Memory, Write-back rows for BEQ, R-type, ORi, LW, SW) annotated with control-signal settings. Settings legible in the extracted text: ALUselA=0, ALUselB=1, ALUop=00, IorD=0, IRwr=1, PCsrc=0, PCwr=1; ALUselA=0, ALUselB=0, ALUop=00; ALUselA=1, ALUselB=1; IorD=1, Memwr=1; RegDst=0, RegWr=1, MemtoReg=0; RegDst=1, RegWr=1, MemtoReg=1]

Assigning State Number (“Finite State Diagram”)
0: “instruction fetch”: IR <= MEM[PC], PC <= PC+4
1: “decode / operand fetch”: A <= R[rs], B <= R[rt]
Execute:
  2: R-type: S <= A fun B
  3: ORi: S <= A | ZX
  4: LW: S <= A + SX
  5: SW: S <= A + SX
  6: BEQ: PC <= Next(PC,Equal)
Memory:
  7: LW: M <= MEM[S]
  8: SW: MEM[S] <= B
Write-back:
  9: R-type: R[rd] <= S
  10: ORi: R[rt] <= S
  11: LW: R[rt] <= M

State Reduction (“Finite State Diagram”)
“instruction fetch”: IR <= MEM[PC], PC <= PC+4
“decode / operand fetch”: A <= R[rs], B <= R[rt]
Execute:
  R-type: S <= A fun B
  ORi: S <= A | ZX
  LW/SW (one shared state): S <= A + SX
  BEQ: PC <= Next(PC,Equal)
Memory:
  LW: M <= MEM[S]
  SW: MEM[S] <= B
Write-back:
  R-type: R[rd] <= S
  ORi: R[rt] <= S
  LW: R[rt] <= M

FSM Controller
[Figure: combinational control logic. Inputs: the state register and the input from OP codes. Outputs: the datapath control outputs and the next state]

FSM Combinational Logic Truth Table
Current State | Inputs from OP codes | Next State | Outputs
….            | ….                   | ….         | ….

Instruction Set and Control Options
• 7-instruction subset of MIPS: easy to implement FSM by hand
• Full MIPS instruction set: > 100 instructions, including shift, logical, *, /, floating point +, -, *, /, memory ops, …
• Full IA-32 instruction set?
  – Need to use Verilog, CAD tools
  – > 500 instructions, including copy and edit, save/restore state, setup for memory protection, repeat, …

Exceptions
• Usually the hardest part of control
• Events other than branches/jumps that change the flow of instruction execution
• Different from software-controlled exceptions, which are implemented by branch/jump instructions
• Exception: internal event, such as divide by 0, bad memory address (e.g. bus error)
• Interrupt: external event, such as I/O

Examples of Exceptions
Type of event          | From where? | MIPS term
I/O device requests    | External    | Interrupt
System calls           | Internal    | Exception
Arithmetic overflow    | Internal    | Exception
Using undefined inst   | Internal    | Exception
Hardware malfunction   | Either      | Either
More examples: TLB miss, page fault, memory fault, address out-of-range, protection violation, etc.

How Exceptions Are Handled?
• OS is called to handle exceptions
• Hardware must provide
  – the exception program counter (EPC), so that the program may restart
  – the cause of the exception (Cause), so that the appropriate actions can be taken. This is usually called the program status register.
  – two added control signals that cause the EPC and Cause registers to be written
  – the error condition to write into the Cause register
  – the exception address, the entry point to the OS (8000 0180h)
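The register updates described above (save the restart address, record the cause, jump to the OS entry point) can be sketched in Python. This is a hedged illustration, not the course's design: the `CpuState` class, the `raise_exception` helper, and the numeric cause codes are all hypothetical, and only the entry address 8000 0180h comes from the slide.

```python
from dataclasses import dataclass

# OS entry point quoted on the slide (8000 0180h).
EXCEPTION_VECTOR = 0x80000180

# Hypothetical cause codes, chosen only for this sketch.
CAUSE_UNDEFINED_INSTRUCTION = 0
CAUSE_ARITHMETIC_OVERFLOW = 1


@dataclass
class CpuState:
    """Only the three registers touched when an exception is taken."""
    pc: int = 0
    epc: int = 0
    cause: int = 0


def raise_exception(state, faulting_pc, cause_code):
    """Model the register transfers performed when an exception is detected."""
    state.epc = faulting_pc        # first added control signal: write EPC
    state.cause = cause_code       # second added control signal: write Cause
    state.pc = EXCEPTION_VECTOR    # next fetch comes from the OS handler
    return state


if __name__ == "__main__":
    cpu = CpuState(pc=0x00400024)
    raise_exception(cpu, faulting_pc=0x00400020,
                    cause_code=CAUSE_ARITHMETIC_OVERFLOW)
    print(hex(cpu.pc), hex(cpu.epc), cpu.cause)
```

The sketch takes the faulting address as an argument because the slides do not say how EPC is derived from the already-incremented PC.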
• Handling exceptions means additional work in control design

Controller Design
• The state diagrams that define the controller for an instruction set processor are highly structured
• Use this structure to construct a simple “microsequencer”
• Control reduces to programming this very simple device ⇒ microprogramming
[Figure: microsequencer: op-code -> Map ROM -> Micro-PC -> datapath control, with a “taken” input to the sequencer]

Microprogramming
• Microprogramming is a convenient method for implementing structured control state diagrams:
  – Random logic replaced by microPC sequencer and ROM
  – Each line of ROM called a microinstruction: contains sequencer control + values for control points
  – To reduce confusion, normal instruction (e.g., MIPS addu) called “macroinstruction”
  – Limited state transitions: branch to zero, next sequential, branch to µinstruction address from dispatch ROM
• Control design reduces to microprogramming
  – Part of the design process is to develop a “language” that describes control and is easy for humans to understand

“Macroinstruction” Interpretation
[Figure: Main Memory holds the user program plus data (ADD, SUB, AND, …, DATA; “this can change!”). The CPU contains the execution unit and the control memory. One of the macroinstructions (e.g. AND) is mapped into one of the microsequences in control memory, e.g. Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s)]

Designing a “Microinstruction Set”
1) Start with list of control signals
2) Group signals together that make sense (vs. random): “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) To minimize the width, encode operations that will never be used at the same time
  • “Horizontal” µCode: one control bit in µInstruction for every control line in datapath
  • “Vertical” µCode: groups of control-lines coded together in µInstruction (e.g. possible ALU dest)
5) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
  – Use computers to design computers

Microinstruction
Fields: Mem Control | ALU Source | PC Control | Reg Control | ALU function | Micro sequence control
Micro sequence control:
  00 next micro inst (upc+=1)
  01 use dispatch table 1
  10 use dispatch table 2
  11 go to state 0

Microprogrammed Control Unit
[Figure: a 4-input mux (inputs 0 to 3: constant 0, Dispatch Table 1, Dispatch Table 2, MicroPC + Incr) selects the next MicroPC; the MicroPC addresses the Microprogram Memory (or Control Store); the micro-instruction read out supplies the control signals to the data path]

Overview of Control
• Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.
Initial Representation   | Finite State Diagram         | Microprogram
Sequencing Control       | Explicit Next State Function | Microprogram counter + Dispatch ROMs
Logic Representation     | Logic Equations              | Truth Tables
Implementation Technique | PLA (“hardwired control”)    | ROM (“microprogrammed control”)

Microprogramming Pros and Cons
Pros
☺ Ease of design
☺ Flexibility
  – Easy to adapt to changes in organization, timing, technology
  – Can make changes late in design cycle, or even in the field
  – Can implement very powerful instruction sets (just more control memory)
☺ Generality
  – Can implement multiple instruction sets on same machine.
  – Can tailor instruction set to application.
☺ Compatibility
  – Many organizations, same instruction set

Microprogramming Pros and Cons
Cons
– ROM not faster than RAM (so not much faster than 1st level instruction cache); PLA smaller and therefore faster than ROM
– Simpler instruction sets popular now
– CAD tools + fast computers allow simulation => correct control
– Control unit in same chip as data path, so can’t replace the ROM

Legacy Software and Microprogramming
• IBM bet company on 360 Instruction Set Architecture (ISA): single instruction set for many classes of machines (8-bit to 64-bit)
• What to do about software compatibility?
  – If microprogramming could easily do same instruction set on many different microarchitectures, then why couldn’t multiple microprograms do multiple instruction sets on the same microarchitecture?
  – Coined term “emulation”: instruction set interpreter in microcode for non-native instruction set
  – Very successful: in early years of IBM 360 it was hard to know whether old instruction set or new instruction set was more frequently used

Microprogramming: inspiration for RISC
• If simple instruction could execute at very high clock rate…
• If you could even write compilers to produce microinstructions…
• If most programs use simple instructions and addressing modes…
• If microcode is kept in RAM instead of ROM so as to fix bugs…
• If same memory used for control memory could be used instead as cache for “macroinstructions”…
• Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine?
  (microprogramming is overkill when ISA matches datapath 1-1)

Summary (1 of 3)
• Disadvantages of the Single Cycle Processor
  – Long cycle time
  – Cycle time is too long for all instructions except the Load
• Multiple Cycle Processor:
  – Divide the instructions into smaller steps
  – Execute each step (instead of the entire instruction) in one cycle
• Partition datapath into equal size chunks to minimize cycle time
  – ~10 levels of logic between latches

Summary (cont’d) (2 of 3)
• Control is specified by finite state diagram
• Specialized state diagrams are easily captured by a microsequencer
  – simple increment & “branch” fields
  – datapath control fields
• Control design reduces to microprogramming
• Control is more complicated with:
  – complex instruction sets
  – restricted datapaths
• Simple instruction set and powerful datapath ⇒ simple control
  – could try to reduce hardware
  – rather go for speed => many instructions at once!

Summary (3 of 3)
• Microprogramming is a fundamental concept
  – implement an instruction set by building a very simple processor and interpreting the instructions
  – essential for very complex instructions and when few register transfers are possible
  – Control design reduces to microprogramming
• Design of a microprogramming language
  – Start with list of control signals
  – Group signals together that make sense (vs. random): called “fields”
  – Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
  – To minimize the width, encode operations that will never be used at the same time

Where to get more information?
• Multiple Cycle Controller: Appendix C of your text book.
• Microprogramming: Section 5.7 of your text book.
• D. Patterson, “Microprogramming,” Scientific American, March 1983.
• D. Patterson and D. Ditzel, “The Case for the Reduced Instruction Set Computer,” Computer Architecture News 8, 6 (October 15, 1980)
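The microsequencer summarized above (a micro-PC with increment, two dispatch tables, and a return to state 0) can be sketched in Python. The four sequencing codes are the ones listed on the microinstruction slide; the control-store layout, the microinstruction labels, and both dispatch tables are assumptions made for this sketch, and the datapath control fields are left out.

```python
# Sequencing-field encodings from the microinstruction slide.
SEQ_NEXT = 0b00       # next micro inst (upc += 1)
SEQ_DISPATCH1 = 0b01  # use dispatch table 1
SEQ_DISPATCH2 = 0b10  # use dispatch table 2
SEQ_FETCH = 0b11      # go to state 0

# Hypothetical control store: (label, sequencing field) per microinstruction.
# Datapath control fields are omitted to keep the sketch short.
CONTROL_STORE = [
    ("Fetch",   SEQ_NEXT),       # 0
    ("Decode",  SEQ_DISPATCH1),  # 1
    ("MemAddr", SEQ_DISPATCH2),  # 2
    ("LwRead",  SEQ_NEXT),       # 3
    ("LwWrite", SEQ_FETCH),      # 4
    ("SwWrite", SEQ_FETCH),      # 5
    ("RExec",   SEQ_NEXT),       # 6
    ("RWrite",  SEQ_FETCH),      # 7
    ("Beq",     SEQ_FETCH),      # 8
    ("Jump",    SEQ_FETCH),      # 9
]

# Hypothetical dispatch tables: opcode class -> control-store address.
DISPATCH1 = {"rtype": 6, "lw": 2, "sw": 2, "beq": 8, "j": 9}
DISPATCH2 = {"lw": 3, "sw": 5}


def run_macroinstruction(opcode):
    """Return the labels of the microinstructions executed for one opcode."""
    upc = 0
    trace = []
    while True:
        label, seq = CONTROL_STORE[upc]
        trace.append(label)
        if seq == SEQ_NEXT:
            upc += 1
        elif seq == SEQ_DISPATCH1:
            upc = DISPATCH1[opcode]
        elif seq == SEQ_DISPATCH2:
            upc = DISPATCH2[opcode]
        else:  # SEQ_FETCH: the macroinstruction is finished
            return trace


if __name__ == "__main__":
    for op in ("lw", "sw", "rtype", "beq", "j"):
        print(op, run_macroinstruction(op))
```

With this layout each trace has as many microinstructions as the cycle count given for that class on the CPI slide: 5 for a load, 4 for a store or R-type, 3 for a branch or jump.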
This note was uploaded on 01/26/2011 for the course CSCI 4203 taught by Professor Weichunghsu during the Fall '05 term at Minnesota.
