L13-pipelining2


CS324: Computer Architecture
Pipeline Hazards

Review: Processor Pipelining
"Pipeline registers" are added to the datapath/controller to neatly divide the single-cycle processor into "pipeline stages".
Optimal pipeline:
– Each stage executes part of an instruction each clock cycle.
– One instruction finishes during each clock cycle.
– On average, instructions execute far more quickly.
What makes this work well?
– Similarities between instructions allow us to use the same stages for all instructions (generally).
– Each stage takes about the same amount of time as all the others: little wasted time.

Review: Pipeline
Pipelining is a BIG IDEA: a widely used concept.
What makes it less than perfect?
– Structural hazards: conflicts for resources. Suppose we had only one cache? ⇒ Need more HW resources.
– Control hazards: branch instructions affect which instructions come next. ⇒ Delayed branch.
– Data hazards: data flow between instructions.

Control Hazard: Branching
[Figure: pipeline diagram, time in clock cycles across and instruction order down; beq followed by Instr 1 to Instr 4, each passing through I$, Reg, ALU, D$, Reg.]
Where do we do the compare for the branch?

Control Hazard: Branching
We have the branch decision-making hardware in the ALU stage:
– therefore two more instructions after the branch will always be fetched, whether or not the branch is taken.
Desired functionality of a branch:
– if we do not take the branch, don't waste any time and continue executing normally
– if we take the branch, don't execute any instructions after the branch, just go to the desired label

Control Hazard: Branching
Initial solution: stall until the decision is made.
– Insert "no-op" instructions (those that accomplish nothing, just take time) or hold up the fetch of the next instruction (for 2 cycles).
– Drawback: branches take 3 clock cycles each (assuming the comparator is put in the ALU stage).

Control Hazard: Branching
Optimization #1:
– Insert a special branch comparator in Stage 2.
– As soon as the instruction is decoded (the opcode identifies it as a branch), immediately make a decision and set the new value of the PC.
– Benefit: since the branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed.
– Side note: this means that branches are idle in Stages 3, 4 and 5.

Control Hazard: Branching
[Figure: the same pipeline diagram (beq, Instr 1 to Instr 4) with the branch comparator moved to the Decode stage.]

Control Hazard: Branching
User inserting a no-op instruction:
[Figure: pipeline diagram for add, beq, nop, lw; the nop passes through every stage as a bubble.]
Impact: 2 clock cycles per branch instruction ⇒ slow.

Control Hazard: Branching
Controller inserting a single bubble:
[Figure: pipeline diagram for add, beq, lw; one bubble is inserted between beq and lw.]
Impact: 2 clock cycles per branch instruction ⇒ slow.

Control Hazard: Branching
Optimization #2: redefine branches.
– Old definition: if we take the branch, none of the instructions after the branch get executed by accident.
– New definition: whether or not we take the branch, the single instruction immediately following the branch gets executed (called the branch-delay slot).
The term "delayed branch" means we always execute the instruction after the branch.
This optimization is used on the MIPS.

Control Hazard: Branching
Notes on the branch-delay slot:
– Worst-case scenario: can always put a no-op in the branch-delay slot.
– Better case: find an instruction preceding the branch which can be placed in the branch-delay slot without affecting the flow of the program.
  – Re-ordering instructions is a common method of speeding up programs.
  – The compiler must be very smart in order to find instructions to do this.
  – The compiler can usually find such an instruction at least 50% of the time.
– As machines launch more instructions per clock cycle, the delay slot becomes less useful.
– Jumps also have a delay slot.

Example: Nondelayed vs. Delayed Branch
Nondelayed Branch        Delayed Branch
or  $8, $9, $10          add $1, $2, $3
add $1, $2, $3           sub $4, $5, $6
sub $4, $5, $6           beq $1, $4, Exit
beq $1, $4, Exit         or  $8, $9, $10
xor $10, $1, $11         xor $10, $1, $11
Exit:                    Exit:

Control Hazard Solutions
Predict: guess one direction, then back up if wrong.
– Predict not taken.
[Figure: pipeline diagram for Add, Beq, Load.]
Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right about 50% of the time).
More dynamic scheme: history of 1 branch (right about 90% of the time).

Data Hazard on r1
add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or  r8, r1, r9
xor r10, r1, r11
sub needs r1's value before add has completed.
The compiler could resolve the problem by never generating these types of code sequences.

Data Hazard on r1:
• Dependencies backwards in time are hazards.
[Figure: pipeline diagram with stages IF, ID/RF, EX, MEM, WB for add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11.]

Data Hazard Solution
add r1, r2, r3
nop
nop
sub r4, r1, r3
and r6, r1, r7
or  r8, r1, r9
xor r10, r1, r11
nops: instructions that do nothing. They force sub to wait.
Problem: data dependencies are too frequent for the compiler to avoid reliably.

Data Hazard Solution:
• "Forward" the result from one stage to another.
[Figure: the same five-instruction pipeline diagram, with forwarding paths from add to the instructions that follow it.]
• The "or" hazard is solved by the register hardware if read/write is defined properly.

Resolving Hazards (hardware)
1. Detect by looking at pipeline register fields:
   sub $2, $1, $3
   and $12, $2, $5
   or  $13, $6, $2
   add $14, $2, $2
   sw  $15, 100($2)
   EX/MEM.RegisterRd = ID/EX.RegisterRs = $2
   (EX/MEM and ID/EX are pipeline register names; RegisterRd and RegisterRs are fields in those pipeline registers.)
2. Forward the proper value to resolve:
– Use temporary results; don't wait for them to be written.
– Register file forwarding handles a read and a write to the same register (write during the 1st half of the cycle, read during the 2nd half).
– ALU forwarding (forward the result to the ALU as soon as possible).

Data Hazards and Stalls
The forwarding unit controls the ALU multiplexors to replace the value from a general-purpose register with the value from the proper pipeline register.
The hazard detection unit controls the writing of the PC and IF/ID registers, plus the multiplexor that chooses between the real control values and all 0s.
– It stalls and deasserts the control fields if the load-use hazard test on the previous slide is true.

Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards.
[Figure: pipeline diagram for lw r1,0(r2) followed by sub r4,r1,r3.]
• Can't solve with forwarding.
• Must delay/stall the instruction dependent on the load.

Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards.
[Figure: the same two instructions with one Stall cycle inserted before sub.]
• Can't solve with forwarding alone.
• Must delay/stall the instruction dependent on the load, then forward.

Summary: Pipelining
Reduce CPI by overlapping many instructions.
– Average throughput of approximately 1 CPI with a fast clock.
Utilize capabilities of the datapath.
– start the next instruction while working on the current one
– limited by the length of the longest stage (plus fill/flush)
– detect and resolve hazards

Summary: Pipelining
What makes it easy?
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
What makes it hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction

Designing a Pipelined Processor
– Go back and examine your datapath and control diagram.
– Associate resources with states.
– Ensure that flows do not conflict, or figure out how to resolve conflicts.
– Assert control in the appropriate stage.

Pipelined Datapath
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; PC, instruction memory, adders, register file, sign extend (16 to 32), shift left 2, ALU, data memory, and multiplexors.]

Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back
We can use the same control lines as before, but now they must be grouped by pipeline stage.
Control information is created during instruction decode and is then passed via pipeline registers to the appropriate stage.

Pipeline Control
Pass control signals along just like the data.

             Execution/Address Calculation    Memory access              Write-back
             stage control lines              stage control lines        stage control lines
Instruction  RegDst ALUOp1 ALUOp0 ALUSrc      Branch MemRead MemWrite    RegWrite MemtoReg
R-format     1      1      0      0           0      0       0           1        0
lw           0      0      0      1           0      1       0           1        1
sw           X      0      0      1           0      0       1           0        X
beq          X      0      1      0           1      0       0           0        X

[Figure: the Control unit produces the EX, M and WB fields; ID/EX carries WB, M and EX; EX/MEM carries WB and M; MEM/WB carries WB.]

Pipelined Processor (almost) for slides
What happens if we start a new instruction every cycle?
[Figure: datapath with Inst. Mem, Next PC, PC, Reg File, Exec, Mem Access, Data Mem and a per-stage control chain (Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl) fed by IR, IRex, IRmem, IRwb.]
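The Pipeline Control table can be made concrete with a small sketch. This is an illustration only, not the course's implementation: the names CONTROL, decode and advance are hypothetical, and None stands in for the X (don't care) entries. It shows the control bundle being created at decode and shrinking as each stage consumes its own group.

```python
# Control values transcribed from the Pipeline Control table.
# None stands for a don't-care entry (X in the table).
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "EX": {"RegDst": None, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
}


def decode(instr_class):
    """Return the control bundle written into ID/EX during instruction decode."""
    return {group: dict(lines) for group, lines in CONTROL[instr_class].items()}


def advance(bundle, used_group):
    """Drop the group the current stage consumed; the rest moves to the next pipeline register."""
    return {group: lines for group, lines in bundle.items() if group != used_group}


id_ex = decode("lw")            # holds the EX, M and WB groups
ex_mem = advance(id_ex, "EX")   # holds the M and WB groups
mem_wb = advance(ex_mem, "M")   # holds only the WB group
```

Grouping the lines by the stage that uses them is what lets each pipeline register carry only the control bits still needed downstream.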
Control and Datapath
(Column labels are inferred from the operations.)

Stage     R-type         ori             lw             sw             beq
Ifetch    IR <- Mem[PC]; PC <- PC+4      (all instructions)
Reg/Dec   A <- R[rs]; B <- R[rt]         (all instructions)
Exec      S <- A + B     S <- A or ZX    S <- A + SX    S <- A + SX    if Cond: PC <- PC+SX
Mem                                      M <- Mem[S]    Mem[S] <- B
Wr        R[rd] <- S     R[rt] <- S      R[rd] <- M

[Figure: datapath with Inst. Mem, Next PC, PC, IR, Reg File, A, B, Exec, S, Mem Access, Data Mem, M, D, and the Equal comparator.]

Pipelining the Load Instruction

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw    Ifetch   Reg/Dec  Exec     Mem      Wr
2nd lw             Ifetch   Reg/Dec  Exec     Mem      Wr
3rd lw                      Ifetch   Reg/Dec  Exec     Mem      Wr

The 5 independent functional units in the pipeline datapath:
– Instruction Memory for the Ifetch stage
– Register File's read ports (busA and busB) for the Reg/Dec stage
– ALU for the Exec stage
– Data Memory for the Mem stage
– Register File's write port (busW) for the Wr stage

The Four Stages of R-type

          Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type    Ifetch   Reg/Dec  Exec     Wr

– Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory.
– Reg/Dec: Register Fetch and Instruction Decode.
– Exec: the ALU operates on the two register operands; update the PC.
– Wr: write the ALU output back to the register file.

Pipelining the R-type and Load Instruction

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
R-type    Ifetch   Reg/Dec  Exec     Wr
R-type             Ifetch   Reg/Dec  Exec     Wr
Load                        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Wr
R-type                                        Ifetch   Reg/Dec  Exec     Wr

Oops! We have a problem!
We have a pipeline conflict, or structural hazard:
– Two instructions try to write to the register file at the same time!
– Only one write port.

Important Observation
Each functional unit can only be used once per instruction.
Each functional unit must be used at the same stage for all instructions:
– Load uses the Register File's write port during its 5th stage:
  Load:   1 Ifetch, 2 Reg/Dec, 3 Exec, 4 Mem, 5 Wr
– R-type uses the Register File's write port during its 4th stage:
  R-type: 1 Ifetch, 2 Reg/Dec, 3 Exec, 4 Wr
There are 2 ways to solve this pipeline hazard.

Solution 1: Insert "Bubble" into the Pipeline
[Figure: pipeline diagram over Cycles 1 to 9 with a Load followed by R-type instructions and a pipeline bubble that delays the later R-type instructions.]
Insert a "bubble" into the pipeline to prevent 2 writes in the same cycle.
– The control logic can be complex.
– We lose an instruction fetch and issue opportunity.
No instruction is started in Cycle 6!

Solution 2: Delay R-type's Write by One Cycle
Delay R-type's register write by one cycle:
– Now R-type instructions also use the Reg File's write port at Stage 5.
– The Mem stage is a NOOP stage: nothing is being done.
[Figure: pipeline diagram over Cycles 1 to 9 for R-type, R-type, Load, R-type, R-type, each using Ifetch, Reg/Dec, Exec, Mem, Wr.]

Modified Control & Datapath
(Column labels are inferred from the operations.)

Stage     R-type         ori             lw             sw             beq
Ifetch    IR <- Mem[PC]; PC <- PC+4      (all instructions)
Reg/Dec   A <- R[rs]; B <- R[rt]         (all instructions)
Exec      S <- A + B     S <- A or ZX    S <- A + SX    S <- A + SX    if Cond: PC <- PC+SX
Mem       M <- S         M <- S          M <- Mem[S]    Mem[S] <- B
Wr        R[rd] <- M     R[rt] <- M      R[rd] <- M

[Figure: datapath with Inst. Mem, Next PC, PC, IR, Reg File, A, B, Exec, S, Mem Access, Data Mem, M, D, and the Equal comparator.]
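The write-port conflict and Solution 2 can be checked with a short sketch. It assumes one instruction issues per cycle starting in Cycle 1 and uses the stage lists from the slides; the function names write_port_usage and conflicts are hypothetical.

```python
def write_port_usage(program, stages):
    """Map each clock cycle to the instructions that use the register write port in it.

    program: instruction classes in issue order, one issued per cycle from Cycle 1.
    stages:  dict mapping an instruction class to its ordered list of stage names.
    """
    usage = {}
    for issue_index, kind in enumerate(program):
        sequence = stages[kind]
        if "Wr" in sequence:
            cycle = issue_index + 1 + sequence.index("Wr")
            usage.setdefault(cycle, []).append((issue_index, kind))
    return usage


def conflicts(program, stages):
    """Return only the cycles in which more than one instruction needs the write port."""
    usage = write_port_usage(program, stages)
    return {cycle: users for cycle, users in usage.items() if len(users) > 1}


PROGRAM = ["R-type", "R-type", "Load", "R-type", "R-type"]

# Original stage assignment: R-type writes in its 4th stage, Load in its 5th.
ORIGINAL = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}

# Solution 2: R-type gets an idle Mem stage, so every write happens in Stage 5.
UNIFORM = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}

before = conflicts(PROGRAM, ORIGINAL)  # {7: [(2, 'Load'), (3, 'R-type')]}
after = conflicts(PROGRAM, UNIFORM)    # {}
```

With the original stage lists, the Load and the R-type issued right after it both reach Wr in Cycle 7. Once every instruction writes in its 5th stage, consecutive instructions can never share a write cycle.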
The Four Stages of Store

          Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store     Ifetch   Reg/Dec  Exec     Mem      (Wr unused)

– Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory.
– Reg/Dec: Register Fetch and Instruction Decode.
– Exec: calculate the memory address.
– Mem: write the data into the Data Memory.

The Three Stages of Beq

          Cycle 1  Cycle 2  Cycle 3
Beq       Ifetch   Reg/Dec  Exec     (Mem and Wr unused)

– Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory.
– Reg/Dec: Register Fetch and Instruction Decode.
– Exec: compares the two register operands, selects the correct branch target address, and latches it into the PC.

Control Diagram
(Column labels are inferred from the operations.)

Stage     R-type         ori             lw             sw             beq
Ifetch    IR <- Mem[PC]; PC <- PC+4      (all instructions)
Reg/Dec   A <- R[rs]; B <- R[rt]         (all instructions)
Exec      S <- A + B     S <- A or ZX    S <- A + SX    S <- A + SX    if Cond: PC <- PC+SX
Mem       M <- S         M <- S          M <- Mem[S]    Mem[S] <- B
Wr        R[rd] <- S     R[rt] <- S      R[rd] <- M

[Figure: datapath with Inst. Mem, Next PC, PC, IR, Reg File, A, B, Exec, S, Mem Access, Data Mem, M, D, and the Equal comparator.]

Data Stationary Control
The Main Control generates the control signals during Reg/Dec:
– Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later.
– Control signals for Mem (MemWr, Branch) are used 2 cycles later.
– Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later.
[Figure: Main Control in Reg/Dec produces ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr. The ID/Ex register carries all of them into Exec; the Ex/Mem register carries MemWr, Branch, MemtoReg, RegWr into Mem; the Mem/Wr register carries MemtoReg, RegWr into Wr.]

Datapath + Data Stationary Control
[Figure: datapath with Inst. Mem, Decode, Reg File, Exec, Mem Access, Data Mem; the IR fields (op, rs, rt, fun, im) and the control fields (v, rw, wb, me, ex) move through the pipeline registers to Mem Ctrl and WB Ctrl.]

Let's Try it Out
(These addresses are octal.)

 10  lw   r1, r2(35)
 14  addI r2, r2, 3
 20  sub  r3, r4, r5
 24  beq  r6, r7, 100
 30  ori  r8, r9, 17
 34  add  r10, r11, r12
100  and  r13, r14, 15

Each datapath snapshot from the slides reduces to the address of the instruction in each stage:

Step  Fetch  Decode  Exec  Mem  WB   Notes
1     10                             Start
2     14     10
3     20     14      10
4     24     20      14    10        address r2+35 computed for the lw
5     30     24      20    14   10   delayed branch: always execute ori after beq
6     100    30      24    20   14   the lw has written r1 = M[r2+35]
7     104    100     30    24   20   Fill it in yourself!
8     110    104     100   30   24   Fill it in yourself!
9     114    110     104   100  30   Fill it in yourself!

Pipeline Hazards Again
[Figure: overlapping instruction timelines (I-Fetch, DCD, OpFetch, Exec, Mem, WB), one example for each of the hazards below.]
– Structural hazard
– Control hazard (jump)
– RAW (read after write) data hazard
– WAW (write after write) data hazard
– WAR (write after read) data hazard

Data Hazards
Avoid some "by design":
– eliminate WAR by always fetching operands early (DCD) in the pipe
– eliminate WAW by doing all WBs in order (last stage, static)
Detect and resolve the remaining ones:
– stall or forward (if possible)
[Figure: instruction timelines illustrating the RAW, WAW and WAR data hazards.]

Hazard Detection
Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline.
A RAW hazard exists on register ρ if ρ ∈ Rregs(i) ∩ Wregs(j).
– Keep a record of pending writes (for instructions in the pipe) and compare with the operand registers of the current instruction.
– When an instruction issues, reserve its result register.
– When an operation completes, remove its write reservation.
A WAW hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Wregs(j).
A WAR hazard exists on register ρ if ρ ∈ Wregs(i) ∩ Rregs(j).

Record of Pending Writes
[Figure: pipeline (IAU, npc, I mem, Regs, alu, D mem, Regs) in which each stage carries op and rw fields; the current operand registers rs and rt are compared against the pending writes.]
hazard <= ((rs == rwex)  & regWex) OR
          ((rs == rwmem) & regWme) OR
          ((rs == rwwb)  & regWwb) OR
          ((rt == rwex)  & regWex) OR
          ((rt == rwmem) & regWme) OR
          ((rt == rwwb)  & regWwb)

What about Interrupts, Faults?
External interrupts:
– Allow the pipeline to drain.
– Load the PC with the interrupt address.
Faults (within instruction, restartable):
– Force a trap instruction into IF.
– Disable writes till the trap hits WB.
– Must save multiple PCs or PC + state.
Refer to the MIPS solution.

Exception Handling
[Figure: the same pipeline executing lw $2,20($5), annotated from front to back: detect bad instruction address, detect bad instruction, detect overflow, detect bad data address, allow exception to take effect.]

Summary: Pipelining
What makes it easy?
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
What makes it hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction

Summary
What really makes it hard:
– exception handling
– trying to improve performance with out-of-order execution, etc.
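Returning to the Hazard Detection slides, the set definitions and the pending-writes check translate almost directly into code. The sketch below is illustrative: Instr, hazards and must_stall are hypothetical names, and the pending list stands in for the (rw, regW) pairs held in the EX, MEM and WB stages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: frozenset   # Rregs: registers the instruction reads
    writes: frozenset  # Wregs: registers the instruction writes


def hazards(i, j):
    """Registers with a hazard when i is about to issue and predecessor j is still in the pipe."""
    return {
        "RAW": i.reads & j.writes,   # Rregs(i) ∩ Wregs(j)
        "WAW": i.writes & j.writes,  # Wregs(i) ∩ Wregs(j)
        "WAR": i.writes & j.reads,   # Wregs(i) ∩ Rregs(j)
    }


def must_stall(rs, rt, pending):
    """Pending-writes check: pending is a list of (rw, reg_write) pairs, one per later stage."""
    return any(reg_write and rw in (rs, rt) for rw, reg_write in pending)


add = Instr("add r1,r2,r3", frozenset({"r2", "r3"}), frozenset({"r1"}))
sub = Instr("sub r4,r1,r3", frozenset({"r1", "r3"}), frozenset({"r4"}))

found = hazards(sub, add)  # RAW on r1, no WAW, no WAR

# sub reads r1 and r3 while add (in EX) still has a pending write to r1.
stall = must_stall("r1", "r3", [("r1", True), ("r9", True), ("r0", False)])
```

The must_stall expression is the six-term OR from the Record of Pending Writes slide, written as a loop over the three stages.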
Pipelining is a fundamental concept:
– multiple steps using distinct resources
Utilize capabilities of the datapath by pipelined instruction processing:
– start the next instruction while working on the current one
– limited by the length of the longest stage (plus fill/flush)
– detect and resolve hazards
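The "fill/flush" cost and the claim of roughly 1 CPI can be illustrated with a back-of-the-envelope sketch. It assumes an ideal 5-stage pipeline that issues one instruction per cycle, with any hazard bubbles passed in as a plain stall count; the function names are hypothetical.

```python
def pipeline_cycles(n_instr, n_stages=5, stall_cycles=0):
    """Cycles to run n_instr instructions: fill the pipe once, then one finishes per cycle."""
    return n_stages + (n_instr - 1) + stall_cycles


def effective_cpi(n_instr, n_stages=5, stall_cycles=0):
    """Average cycles per instruction, including fill time and stalls."""
    return pipeline_cycles(n_instr, n_stages, stall_cycles) / n_instr


three_loads = pipeline_cycles(3)                    # 7, as in the three-lw diagram
ideal = effective_cpi(1000)                         # 1.004
with_bubbles = effective_cpi(1000, stall_cycles=200)  # 1.204
```

The fill time matters little for long instruction streams, so the CPI approaches 1 unless hazards add stall cycles. The last line assumes, purely for illustration, 200 one-cycle bubbles in 1000 instructions.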