EE 357 Unit 18: Basic Pipelining Techniques
Mark Redekopp, All rights reserved (11/22/10)

Single & Multi-Cycle Performance
- Single-Cycle CPU
  - CPI = 1, long clock cycle
  - Each piece of the datapath is active for only a small fraction of the overall instruction execution (clock cycle) time, yielding low utilization of the HW's actual capabilities
- Multi-Cycle CPU
  - CPI = n, short clock cycle
  - Sharing resources allows for a compact logic design, but in modern designs we can afford replicated structures if needed
  - Each instruction still requires several cycles to complete

Pipelining
- Combines elements of both designs
  - Datapath of the single-cycle CPU with separate resources
  - Datapath broken into stages with temporary registers between stages
  - Short clock cycle
- A single instruction still takes n clock cycles to complete
- The system can achieve CPI = 1 by overlapping multiple instructions (a separate instruction in each stage at once)

Overlapping Multiple Instructions
          F        D        Ex       Mem      WB
Clock 1   Inst. 1
Clock 2   Inst. 2  Inst. 1
Clock 3   Inst. 3  Inst. 2  Inst. 1
Clock 4   Inst. 4  Inst. 3  Inst. 2  Inst. 1
Clock 5   Inst. 5  Inst. 4  Inst. 3  Inst. 2  Inst. 1

Basic 5 Stage Pipeline
- Same structure as the single-cycle CPU, but now broken into 5 stages
- Pipeline stage registers act as temporary registers that store intermediate results, allowing the previous stage to be reused for another instruction
- They also act as a barrier that keeps signals from different stages from intermixing

[Figure: 5-stage pipelined datapath (Fetch, Decode, Exec., Mem, WB) with PC and I-Cache, instruction register, register file, sign extend, shift left 2, adders, ALU, and D-Cache, separated by pipeline stage registers]
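As a rough illustration of the overlap described above, the following Python sketch rebuilds the stage-occupancy table and computes the effective CPI of an ideal pipeline. It is not from the slides: the function names are hypothetical, and it assumes no stalls or hazards, so instruction i simply occupies stage s during clock i + s.

```python
# Minimal sketch of an ideal (hazard-free) 5-stage pipeline.
# Assumption: one instruction is fetched every clock and nothing ever stalls.
STAGES = ["F", "D", "Ex", "Mem", "WB"]


def occupancy(num_instructions, num_clocks, stages=STAGES):
    """Return table[clock-1][stage_index] = instruction number, or None if idle."""
    table = []
    for clock in range(1, num_clocks + 1):
        row = []
        for stage_index in range(len(stages)):
            # Instruction i (1-based) is in stage s (0-based) during clock i + s.
            inst = clock - stage_index
            row.append(inst if 1 <= inst <= num_instructions else None)
        table.append(row)
    return table


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish all instructions: fill the pipeline, then one per clock."""
    return num_stages + num_instructions - 1


def effective_cpi(num_instructions, num_stages=len(STAGES)):
    """Average cycles per instruction over the whole run."""
    return total_cycles(num_instructions, num_stages) / num_instructions


if __name__ == "__main__":
    print("Clock  " + "  ".join(f"{s:>4}" for s in STAGES))
    for clock, row in enumerate(occupancy(5, 5), start=1):
        cells = "  ".join(f"{('I' + str(i)) if i else '':>4}" for i in row)
        print(f"{clock:>5}  {cells}")
    for n in (5, 100, 1000):
        print(n, "instructions ->", total_cycles(n), "cycles, CPI =", effective_cpi(n))
```

Under these assumptions each instruction still spends five clocks in the pipeline, but the average CPI approaches 1 as the number of instructions grows, because the fill time is paid only once.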
Issues with Pipelining
- No sharing of HW/logic resources between stages because of full utilization
  - Can't have a single cache (for both I & D) because it would be needed to fetch one instruction while another instruction accesses data
- Prevent signals in one stage (instruction) from flowing into another stage (instruction) and becoming convoluted
- Balancing stage delay
  - Clock period = longest stage delay
  - In the example below, a clock period of 50 ns means a 150 ns total delay for only 70 ns of logic delay

Sample Stage Delay
  Fetch Logic     10 ns
  Decode Logic    10 ns
  Execute Logic   50 ns

Resolution of Pipelining Issues
- No sharing of HW/logic resources between stages
  - For full performance, no feedback (stage i feeding back to stage i-k)
  - If two stages need a HW resource, replicate the resource in both stages (e.g. an I- AND D-cache)
- Prevent signals from one stage (instruction) from flowing into another stage (instruction) and becoming convoluted
  - Stage registers act as a barrier wall to signals until the next clock edge
- Balancing stage delay [Important!!!]
  - Balance or divide long stages (see next slides)...
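The stage-delay arithmetic from the "Issues with Pipelining" slide can be checked with a short Python sketch. The 10/10/50 ns delays come from the slide; the helper names are hypothetical, and the split of the 50 ns Execute stage into two 25 ns stages is purely an illustrative assumption (the slides' own balancing method is not shown in this preview). Stage-register overhead is ignored.

```python
# Sketch: the clock period is set by the slowest stage, so unbalanced stages waste time.
def clock_period(stage_delays_ns):
    """Clock period must cover the longest stage delay."""
    return max(stage_delays_ns)


def instruction_latency(stage_delays_ns):
    """Time for one instruction to pass through every stage at that clock period."""
    return len(stage_delays_ns) * clock_period(stage_delays_ns)


def logic_delay(stage_delays_ns):
    """Sum of the actual logic delays, i.e. the useful work per instruction."""
    return sum(stage_delays_ns)


# Delays taken from the slide: Fetch, Decode, Execute.
slide_stages = [10, 10, 50]

# Hypothetical rebalancing (NOT from the slides): Execute divided into two 25 ns stages.
split_stages = [10, 10, 25, 25]

if __name__ == "__main__":
    for name, stages in (("slide example", slide_stages), ("hypothetical split", split_stages)):
        print(
            f"{name}: clock = {clock_period(stages)} ns, "
            f"latency = {instruction_latency(stages)} ns, "
            f"logic = {logic_delay(stages)} ns"
        )
```

With the slide's numbers this reproduces the 50 ns clock, 150 ns total delay, and 70 ns of logic delay. Under the assumed split, the clock period drops to 25 ns while the logic delay stays at 70 ns, which is the motivation for balancing or dividing long stages.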