L10_multicycle

CS324: Computer Architecture
Lecture 10: Multicycle Processors

Basic Limits on Cycle Time
- Next address logic: PC <= branch ? PC + offset : PC + 4
- Instruction fetch: InstructionReg <= Mem[PC]
- Register access: A <= R[rs]
- ALU operation: R <= A + B
[Figure: single-cycle datapath divided into Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, and Result Store, with control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr]
We are limited to performing 1 ALU operation, 1 memory access, or 1 register file access per cycle.

Partitioning the CPI=1 Datapath
- Add registers between the smallest steps.
- Balance the amount of work done in each step to minimize cycle time.

Example Multicycle Datapath
- Registers hold values "calculated" in the previous cycle.
[Figure: the same datapath with registers IR, A, B, R, and M inserted between the stages]

Multi-cycle Datapath
[Figure: the same datapath labeled by stage: Instruction Fetch, Operand Fetch, Execute, Memory Access, Result Store]

Example Multicycle Datapath
[Figure: multicycle datapath with a single memory (address, write data, MemData), the instruction register and memory data register, the register file, registers A and B, sign-extend and shift-left-2 units, the ALU with its ALUOut register, and multiplexers on the memory address, the register write address and data, and both ALU inputs]
- Break up the instructions into steps; each step takes one cycle.
- Registers IR, A, B, ALUOut, and MDR have enable signals, i.e., Aen, IRen, …
- At the end of a cycle, store values for use in later cycles (the easiest thing to do).

Five Execution Steps
1. Instruction Fetch
2. Instruction Decode and Register (Operand) Fetch
3. Execution, Memory Address Computation, or Branch Completion
4. Memory Access or R-type Instruction Completion
5. Write-back Step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!

Stage 1: Instruction Fetch
- Use the PC to get the instruction and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- This can be described succinctly using RTL ("Register-Transfer Language"):
    IR = Memory[PC];
    PC = PC + 4;
- What is the advantage of updating the PC now? Incrementing the PC and accessing instruction memory can happen in parallel.

Stage 2: Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them.
- Compute the branch address in case the instruction is a branch. It is written to the PC in the next cycle iff the instruction is a branch (and A == B).
- RTL:
    A = Reg[rs];   (rs = IR[25-21])
    B = Reg[rt];   (rt = IR[20-16])
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
- We aren't setting any control lines based on the instruction type yet (we are busy "decoding" it in our control logic).

Stage 3: Execution (instruction dependent)
The ALU performs one of three functions, based on the instruction type:
- Memory reference, calculate the address:
    ALUOut = A + sign-extend(IR[15-0]);
- R-type, the ALU performs the operation specified by the function code:
    ALUOut = A op B;
- Branch, the ALU computes the subtraction of A and B:
    if (A == B) PC = ALUOut;
- Jump: write the jump address to the PC.

Stage 4: Memory Access or R-type Completion
- Loads and stores access memory:
    MDR = Memory[ALUOut];   (load)
    Memory[ALUOut] = B;     (store)
- R-type instructions finish:
    Reg[IR[15-11]] = ALUOut;
  The write actually takes place at the end of the cycle, on the clock edge.

Stage 5: Write-back
- Load: Reg[IR[20-16]] = MDR;
- What happens at this stage for all the other instructions? They have already completed.
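
The five stages can be traced with a small interpreter. The sketch below is a hypothetical Python model, not the course's datapath: it assumes standard MIPS encodings (opcode 0x23 for lw, 0x2B for sw, 0x04 for beq, 0x02 for j, funct 0x20/0x22 for add/sub), a dict standing in for byte-addressed word memory, and invented helper names (`run_one`, `r_type`, `i_type`). Each call performs the RTL of Stages 1 to 5 in order and returns the number of cycles the instruction used.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(field):
    """sign-extend(IR[15-0]): read the low 16 bits as two's complement."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def r_type(rs, rt, rd, funct):
    """Hypothetical encoder for an R-type word (opcode 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def i_type(op, rs, rt, imm):
    """Hypothetical encoder for an I-type word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

ALU_FUNCT = {0x20: lambda x, y: x + y, 0x22: lambda x, y: x - y}  # add, sub

def run_one(st):
    """Run one instruction through the multicycle steps; return its cycle count."""
    reg, mem = st["reg"], st["mem"]
    # Stage 1: IR = Memory[PC]; PC = PC + 4
    ir = mem[st["pc"]]
    st["pc"] = (st["pc"] + 4) & MASK32
    # Stage 2: read rs and rt and compute the branch target for every instruction
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = (st["pc"] + (sign_extend16(ir) << 2)) & MASK32
    op = ir >> 26
    # Stage 3: branches and jumps complete here
    if op == 0x04:                                   # beq: if (A == B) PC = ALUOut
        if a == b:
            st["pc"] = alu_out
        return 3
    if op == 0x02:                                   # j: PC[31-28] || (IR[25-0] << 2)
        st["pc"] = (st["pc"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return 3
    if op == 0x00:                                   # R-type: ALUOut = A op B
        alu_out = ALU_FUNCT[ir & 0x3F](a, b) & MASK32
        rd = (ir >> 11) & 0x1F
        if rd:                                       # Stage 4: Reg[IR[15-11]] = ALUOut ($0 never written)
            reg[rd] = alu_out
        return 4
    if op not in (0x23, 0x2B):
        raise ValueError(f"opcode {op:#x} not modeled")
    alu_out = (a + sign_extend16(ir)) & MASK32       # Stage 3: address computation
    if op == 0x2B:                                   # Stage 4: Memory[ALUOut] = B
        mem[alu_out] = b
        return 4
    mdr = mem.get(alu_out, 0)                        # Stage 4: MDR = Memory[ALUOut] (unwritten words read as 0)
    rt = (ir >> 16) & 0x1F
    if rt:                                           # Stage 5: Reg[IR[20-16]] = MDR
        reg[rt] = mdr
    return 5

# Usage: lw, add, sw, then a taken beq back to address 0.
st = {"pc": 0, "reg": [0] * 32, "mem": {0x104: 7}}
st["reg"][1] = 0x100
st["mem"][0] = i_type(0x23, 1, 2, 4)      # lw  $2, 4($1)
st["mem"][4] = r_type(2, 2, 3, 0x20)      # add $3, $2, $2
st["mem"][8] = i_type(0x2B, 1, 3, 8)      # sw  $3, 8($1)
st["mem"][12] = i_type(0x04, 3, 3, -4)    # beq $3, $3, -4  (target = 16 - 16 = 0)
cycles = [run_one(st) for _ in range(4)]
print(cycles)  # [5, 4, 4, 3]
```

The counts (load 5, R-type and store 4, branch and jump 3) fall in the 3-to-5-cycle range stated above. Because Stage 2 always computes the branch target, a branch needs no extra cycle once A and B are compared.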
Multi-cycle implementations begin a new instruction as soon as the current instruction completes.

Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control

Step 4: R-type (add, sub, ...)
Example instruction: ADDU
Logical Register Transfers:
    R[rd] <- R[rs] + R[rt]; PC <- PC + 4
Physical Register Transfers:
    IR <- MEM[PC]
    A <- R[rs]; B <- R[rt]
    S <- A + B
    R[rd] <- S; PC <- PC + 4
[Figure: multicycle datapath with Inst. Mem, Next PC, PC, IR, Reg File, A, B, Exec, S, Mem Access, Data Mem, M, and the Equal flag]

Step 4: Logical Immediate
Example instruction: ORi
Logical Register Transfers:
    R[rt] <- R[rs] OR zx(Im16); PC <- PC + 4
Physical Register Transfers:
    IR <- MEM[PC]
    A <- R[rs]
    S <- A OR ZeroExt(Im16)
    R[rt] <- S; PC <- PC + 4
[Figure: same datapath]

Step 4: Load
Example instruction: LW
Logical Register Transfers:
    R[rt] <- MEM[R[rs] + sx(Im16)]; PC <- PC + 4
Physical Register Transfers:
    IR <- MEM[PC]
    A <- R[rs]
    S <- A + SignEx(Im16)
    M <- MEM[S]
    R[rt] <- M; PC <- PC + 4
[Figure: same datapath]

Step 4: Store
Example instruction: SW
Logical Register Transfers:
    MEM[R[rs] + sx(Im16)] <- R[rt]; PC <- PC + 4
Physical Register Transfers:
    IR <- MEM[PC]
    A <- R[rs]; B <- R[rt]
    S <- A + SignEx(Im16)
    MEM[S] <- B; PC <- PC + 4
[Figure: same datapath]
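
The ORi and LW transfers differ only in how Im16 is widened to 32 bits: ZeroExt for the logical immediate, SignEx for the load/store address. A minimal Python sketch of the two extenders, assuming 32-bit wrap-around; the helper names `zx`, `sx`, `ori_result`, and `lw_address` are hypothetical:

```python
MASK32 = 0xFFFFFFFF

def zx(im16):
    """ZeroExt(Im16): the upper 16 bits become 0."""
    return im16 & 0xFFFF

def sx(im16):
    """SignEx(Im16): the upper 16 bits copy bit 15 (returned as a 32-bit pattern)."""
    im16 &= 0xFFFF
    return (im16 | 0xFFFF0000) if (im16 & 0x8000) else im16

def ori_result(a, im16):
    """S <- A OR ZeroExt(Im16), later written to R[rt]."""
    return a | zx(im16)

def lw_address(a, im16):
    """S <- A + SignEx(Im16), the address used by M <- MEM[S]."""
    return (a + sx(im16)) & MASK32

print(hex(ori_result(0x12340000, 0xFFFF)))   # 0x1234ffff
print(hex(lw_address(0x1000, 0xFFFC)))       # 0xffc (offset of -4)
print(hex(0x12340000 | sx(0xFFFF)))          # 0xffffffff
```

The last line shows why ORi zero-extends: sign-extending 0xFFFF would overwrite the upper half of the register, while a load offset of 0xFFFC must mean -4 bytes.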
Step 4: Branch
Example instruction: BEQ
Logical Register Transfers:
    if R[rs] == R[rt] then PC <- PC + 4 + (sx(Im16) || 00) else PC <- PC + 4
Physical Register Transfers:
    BEQ & ~Equal: IR <- MEM[PC]; PC <- PC + 4
    BEQ & Equal:  IR <- MEM[PC]; PC <- PC + 4 + (sx(Im16) || 00)
[Figure: same datapath; the Equal flag selects between the two cases]

Summary:
Instruction fetch (all instructions):
    IR = Memory[PC]; PC = PC + 4
Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]; B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Execution, address computation, branch/jump completion:
    R-type: ALUOut = A op B
    Memory reference: ALUOut = A + sign-extend(IR[15-0])
    Branch: if (A == B) then PC = ALUOut
    Jump: PC = PC[31-28] || (IR[25-0] << 2)
Memory access or R-type completion:
    R-type: Reg[IR[15-11]] = ALUOut
    Memory reference: Load: MDR = Memory[ALUOut], or Store: Memory[ALUOut] = B
Memory read completion:
    Load: Reg[IR[20-16]] = MDR
An instruction type not listed under a step has completed execution by the time that step is reached. Control can be determined from this sequence.

Alternative Datapath: Multiple Cycle Datapath
Minimizes hardware: 1 memory, 1 adder.
[Figure: datapath with a single ideal memory (RAdr/WrAdr, Din, Dout), the instruction register, the register file (Ra, Rb, Rw, busA, busB, busW), one ALU with ALU control, a 16-to-32-bit extender, a << 2 shifter, ALU Out and target registers, and PC, address, and ALU-input multiplexers; control signals PCWr, PCWrCond, Zero, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg, PCSrc, BrWr]

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Input, Output, Memory, and the Processor (Control and Datapath).
To continue:
- FSMs and hardwired control
- Microprogrammed control
- Exceptions

Multicycle Approach
- We will be reusing functional units:
  - the ALU is used to compute addresses and to increment the PC
  - memory is used for both instructions and data
- Our control signals will not be determined solely by the instruction:
  - which stage is being performed will also play a role
- We'll use a finite state machine for control:
  - control can't be purely combinational, since the signals can't be determined by the opcode alone

Implementing the Control
- The value of the control signals depends upon:
  - what instruction is being executed
  - which step is being performed
- Use the information we've accumulated to specify a finite state machine:
  - specify the finite state machine graphically, or
  - use microprogramming
- The implementation can be derived from the specification.

Finite State Machines
- A set of states
- A next-state function (determined by the current state and the input)
- An output function (determined by the current state and possibly the input)
  - controls are asserted when in that state
[Figure: the clocked current-state register feeds the next-state function and the output function; the inputs feed both functions; the next state loops back to the register and the output function drives the outputs]

Our Control Model
- The state specifies the control points for a register transfer.
- The transfer occurs upon exiting the state (same falling edge).
[Figure: next-state logic takes the inputs (conditions) and the current state X and produces the next state; output logic produces the control points for the register transfer, which depend on the input]

FSM Control Specification
- Corresponds to the 5 execution stages; each state takes 1 clock cycle.
- The first 2 stages are identical for all instructions, so the first 2 states are common to all instructions.
- Stages 3-5 differ depending upon the opcode, so the remaining states differ by instruction.
- The FSM returns to the initial state (instruction fetch) after the execution of every instruction.

Step 4 => Control Specification w/ RTL for multicycle processor
"instruction fetch" (initial state): IR <= MEM[PC]; PC <= PC + 4
"decode / operand fetch": A <= R[rs]; B <= R[rt]
Execute:
    R-type: S <= A fun B
    ORi: S <= A or ZX
    LW: S <= A + SX
    SW: S <= A + SX
    BEQ & Equal: PC <= PC + (SX || 00)
    BEQ & ~Equal: no transfer (the PC already holds PC + 4)
Memory:
    LW: M <= MEM[S]
    SW: MEM[S] <= B
Write-back:
    R-type: R[rd] <= S
    ORi: R[rt] <= S
    LW: R[rt] <= M

Step 5: datapath + state diagram => control
- Translate RTs into control points.
- Assign states (a number to distinguish each one).
- Then go build the controller.

Mapping RTs to Control Points
"instruction fetch": IR <= MEM[PC]; PC <= PC + 4  -> imem_rd, IRen
"decode": A <= R[rs]; B <= R[rt]  -> Aen, Ben
R-type execute: S <= A fun B  -> ALUfun, Sen
R-type write-back: R[rd] <= S  -> RegDst, RegWr, PCen

Assigning States: How many bits do we need? (13 states, so 4 bits.)
0000  instruction fetch: IR <= MEM[PC]; PC <= PC + 4
0001  decode: A <= R[rs]; B <= R[rt]
0100  R-type execute: S <= A fun B
0101  R-type write-back: R[rd] <= S
0110  ORi execute: S <= A or ZX
0111  ORi write-back: R[rt] <= S
1000  LW execute: S <= A + SX
1001  LW memory: M <= MEM[S]
1010  LW write-back: R[rt] <= M
1011  SW execute: S <= A + SX
1100  SW memory: MEM[S] <= B
0011  BEQ & Equal: PC <= PC + (SX || 00)
0010  BEQ & ~Equal

FSM Implementation of Controller
- A state register stores the current state.
- A combinational logic block:
  - inputs: current state and opcode field
  - outputs: datapath control signals and next state
  - like a big truth table specifying outputs in terms of inputs
- Control functions are expressed as one logic equation for each output.
- Control is computed via logic, as a ROM (read-only memory) or a PLA (programmable logic array).
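
The state assignment and next-state function can be modeled in a few lines. The following Python sketch encodes the states from the "Assigning States" slide; the opcode values are assumed MIPS encodings (they are not on the slides), the `equal` argument stands for the Equal flag that chooses between the two BEQ states, and the names `next_state` and `trace` are hypothetical. It also computes how many state bits are needed.

```python
# State encodings from the "Assigning States" slide.
FETCH, DECODE = 0b0000, 0b0001
BEQ_NE, BEQ_EQ = 0b0010, 0b0011
RTYPE_EX, RTYPE_WB = 0b0100, 0b0101
ORI_EX, ORI_WB = 0b0110, 0b0111
LW_EX, LW_MEM, LW_WB = 0b1000, 0b1001, 0b1010
SW_EX, SW_MEM = 0b1011, 0b1100
ALL_STATES = [FETCH, DECODE, BEQ_NE, BEQ_EQ, RTYPE_EX, RTYPE_WB, ORI_EX,
              ORI_WB, LW_EX, LW_MEM, LW_WB, SW_EX, SW_MEM]
STATE_BITS = (len(ALL_STATES) - 1).bit_length()   # 13 states -> 4 bits

# Assumed MIPS opcodes (not given on the slides).
OP_RTYPE, OP_ORI, OP_LW, OP_SW, OP_BEQ = 0x00, 0x0D, 0x23, 0x2B, 0x04

DISPATCH = {OP_RTYPE: RTYPE_EX, OP_ORI: ORI_EX, OP_LW: LW_EX, OP_SW: SW_EX}
SEQUENTIAL = {RTYPE_EX: RTYPE_WB, ORI_EX: ORI_WB,
              LW_EX: LW_MEM, LW_MEM: LW_WB, SW_EX: SW_MEM}

def next_state(state, opcode, equal):
    """Next-state function: depends on the current state and the inputs."""
    if state == FETCH:
        return DECODE                       # same successor for every opcode
    if state == DECODE:
        if opcode == OP_BEQ:
            return BEQ_EQ if equal else BEQ_NE
        return DISPATCH[opcode]
    return SEQUENTIAL.get(state, FETCH)     # a state that ends its path returns to fetch

def trace(opcode, equal=False):
    """States visited by one instruction, starting from instruction fetch."""
    states = [FETCH]
    while True:
        nxt = next_state(states[-1], opcode, equal)
        if nxt == FETCH:
            return states
        states.append(nxt)

for name, op in [("R-type", OP_RTYPE), ("ORi", OP_ORI), ("LW", OP_LW),
                 ("SW", OP_SW), ("BEQ", OP_BEQ)]:
    print(name, [format(s, "04b") for s in trace(op)])
```

Each path length is that instruction class's cycle count, and every path returns to 0000 as the FSM Control Specification requires.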
ROM Implementation
- ROM = "Read-Only Memory": the values of the memory locations are fixed ahead of time.
- The easiest control implementation.
- A ROM can be used to implement a truth table:
  - if the address is m bits, we can address 2^m entries in the ROM
  - our outputs are the bits of data that the address points to
- Example with m = 3 address bits and n = 4 output bits:
    address  data
    000      0011
    001      1100
    010      1100
    011      1000
    100      0000
    101      0001
    110      0110
    111      0111
  The ROM is 2^m entries "high" and n bits "wide".

ROM Implementation (continued)
- How many inputs are there? 6 bits for the opcode + 4 bits for the state = 10 address lines (i.e., 2^10 = 1024 different addresses or locations).
- How many outputs are there? 16 datapath-control outputs + 4 state bits = 20 outputs (the datapath-control outputs depend only on the current state).
- The ROM is 2^10 x 20 = 20K bits (an unusual size).
- This is wasteful, since for many entries the outputs are the same:
  - e.g., if the current state is 0, the next-state output will always be 0001 regardless of the opcode.

ROM Implementation (continued)
- Inputs to control become address lines to the ROM.
- The ROM size will be 2^10 x 20:
  - each row corresponds to 1 of the 2^10 input combinations
  - each entry value indicates which outputs are active for that input
    address      control values
    0000000000   ...
    0000000001   ...
    ...
    1111111110   ...
    1111111111   ...

PLA Implementation
- Reduce the amount of storage needed:
  - simplify the logic equations to sum-of-minterms form
  - a minterm corresponds to a single truth-table entry
  - each output consists of an OR of these terms
  - only the input combinations that produce an active output are needed
- Size = (inputs x minterms) + (outputs x minterms) = (10 x 17) + (20 x 17) = 170 + 340 = 510
- Much smaller than a ROM (20 Kbits), and therefore more efficient.

Performance Evaluation
What is the average CPI?
- The state diagram gives the CPI for each instruction type.
- The workload gives the frequency of each type.
    Type          CPI_i   Frequency   CPI_i x freq_i
    Arith/Logic   4       40%         1.6
    Load          5       30%         1.5
    Store         4       10%         0.4
    Branch        3       20%         0.6
    Average CPI = sum of CPI_i x freq_i = 4.1

Graphical Representations of Control
- As the instruction set becomes more complex:
  - the number of states per instruction can vary widely (anywhere from 1 to 20 cycles per instruction)
  - the FSM can contain hundreds of states, so it is easier to make mistakes!

Alternative: Microprogramming
- Think of the control signals that must be asserted as a μinstruction to be executed by the datapath:
  - a μinstruction defines the control signals that must be asserted in a given state
  - executing a μinstruction has the effect of asserting the control signals it specifies
- We need a method to specify μinstruction sequencing:
  - sequential
  - branch

Controller Design
- Construct a simple "microsequencer" with a counter.
- Control reduces to programming this very simple device (microprogramming):
  - μinstructions ≈ sets of control signals
  - a symbolic representation of control that will be translated by a program into control logic
[Figure: microsequencer; the micro-PC addresses the microinstruction store, and each microinstruction supplies datapath control plus sequencer control back to the sequencer]

Microcode
- Combine states, registers, additional control logic, and ROM to get an FSM called a μcode engine:
  - the opcode from the IR becomes a jump address into the ROM
  - the μPC can be used to step through a series of fetches from the ROM, beginning at this address
  - each fetch results in control signals being sent out
- Alternative to a μPC: each μinstruction explicitly specifies the address of its successor:
  - one of the fields of the μinstruction may be the address of the next μinstruction

"Macroinstruction" Interpretation
[Figure: main memory holds the user program (ADD, SUB, AND, ...) plus data, which can change; the CPU contains the execution unit and a control memory. Each macroinstruction (e.g., AND) is mapped into one microsequence in control memory, e.g., Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s).]

Microcode Instruction Format
- Simplest form: each bit represents a control signal sent to the datapath:
  - bits in the μinstruction are connected to control wires, thus causing actions to occur in the datapath
[Figure: 32-bit μinstruction; bits 0-23 are control signal outputs and bits 24-31 hold the next instruction address]
- Format designs are often classified by the width of the word employed:
  - horizontal: a very wide word that contains all of the control signals necessary to drive the system
  - vertical: a narrower word, with a sequence of μinstructions driving all of the control signals
- A vertical μinstruction seems inherently slower: it takes a sequence of operations to accomplish what the horizontal μinstruction can do in a single cycle. But...

Vertical Instruction Format
- In many cases, individual control lines are asserted only in certain minor states.
- If control is grouped by minor state, we can reuse some bits of the micro-word by first feeding their outputs to a "demultiplexer" that steers them to the proper signal lines according to the current minor state.
- In effect, the microinstruction format uses multiple instruction formats to reduce redundancy.
[Figure: micro-word with a state field, a state-dependent field, a fixed signal field, and a next-instruction field; a demultiplexor routes the state-dependent field to the state 0, state 1, or state 2 signals]

Horizontal vs. Vertical Microprogramming
- Most microprogramming-based controllers vary between:
  - horizontal organization (1 control bit per control point)
  - vertical organization (fields encoded in the control memory that must be decoded to control something)
- Horizontal
  + more control over the potential parallelism of operations in the datapath
  - uses up lots of control store, which can be considerable
- Vertical
  + easier to program, not very different from programming a RISC machine in assembly language
  - the extra level of decoding may slow the machine down
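
The horizontal/vertical trade-off can be made concrete with a toy control word. The Python sketch below is illustrative only: the signal names are borrowed from the datapath slides, but the grouping of signals into minor states, and every helper name, is a hypothetical assumption. Horizontal words spend one bit per control point; vertical words carry a minor-state selector plus one shared field that a demultiplexer steers to that state's lines, as on the Vertical Instruction Format slide.

```python
import math

# Hypothetical grouping of control lines by the minor state that uses them.
STATE_SIGNALS = {
    0: ["imem_rd", "IRen", "PCen"],
    1: ["Aen", "Ben"],
    2: ["ALUfun", "Sen"],
    3: ["RegDst", "RegWr", "MemWr"],
}
FIXED_SIGNALS = ["ExtOp"]          # driven directly in every minor state
ALL_SIGNALS = FIXED_SIGNALS + [s for grp in STATE_SIGNALS.values() for s in grp]

# Horizontal: one bit per control point.
HORIZONTAL_WIDTH = len(ALL_SIGNALS)
# Vertical: fixed bits + minor-state selector + one shared, demultiplexed field.
STATE_BITS = math.ceil(math.log2(len(STATE_SIGNALS)))
FIELD_BITS = max(len(grp) for grp in STATE_SIGNALS.values())
VERTICAL_WIDTH = len(FIXED_SIGNALS) + STATE_BITS + FIELD_BITS

def encode_horizontal(asserted):
    return sum(1 << i for i, name in enumerate(ALL_SIGNALS) if name in asserted)

def decode_horizontal(word):
    return {name for i, name in enumerate(ALL_SIGNALS) if (word >> i) & 1}

def encode_vertical(asserted):
    """Return (fixed, state, field); all non-fixed signals must share one minor state."""
    fixed = sum(1 << i for i, name in enumerate(FIXED_SIGNALS) if name in asserted)
    rest = set(asserted) - set(FIXED_SIGNALS)
    for state, grp in STATE_SIGNALS.items():
        if rest <= set(grp):
            field = sum(1 << i for i, name in enumerate(grp) if name in rest)
            return fixed, state, field
    raise ValueError("signals from different minor states need separate words")

def decode_vertical(fixed, state, field):
    """The demultiplexer: steer the shared field to the selected state's lines."""
    out = {name for i, name in enumerate(FIXED_SIGNALS) if (fixed >> i) & 1}
    out |= {name for i, name in enumerate(STATE_SIGNALS[state]) if (field >> i) & 1}
    return out

decode_step = {"Aen", "Ben", "ExtOp"}
print(HORIZONTAL_WIDTH, VERTICAL_WIDTH)                                # 11 6
print(decode_vertical(*encode_vertical(decode_step)) == decode_step)  # True
```

The vertical word is narrower (6 bits instead of 11), but asking for Aen and RegWr in the same word raises ValueError: that combination needs two vertical words, which is the lost parallelism credited to the horizontal format, and `decode_vertical` is the extra level of decoding.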

This note was uploaded on 02/15/2010 for the course CS 324 taught by Professor Lballesteros during the Fall '08 term at Mt. Holyoke.
