11/22/10
© Mark Redekopp, All rights reserved

EE 357 Unit 18
Basic Pipelining Techniques

Single & Multi-Cycle Performance

Single-Cycle CPU
• CPI = 1, Long clock cycle
• Each piece of the datapath requires only a small period of the overall instruction execution (clock cycle) time, yielding low utilization of the HW’s actual capabilities

Multi-Cycle CPU
• CPI = n, Short clock cycle
• Sharing resources allows for compact logic design, but in modern design we can afford replicated structures if needed
• Each instruction still requires several cycles to complete

Pipelining
• Combines elements of both designs
  – Datapath of single-cycle CPU w/ separate resources
  – Datapath broken into stages with temporary registers between stages
• Short clock cycle
• A single instruction still requires n clock cycles from start to finish
• System can achieve CPI = 1
  – Overlapping multiple instructions (separate instruction in each stage at once)

           F         D         Ex        Mem       WB
Clock 1    Inst. 1
Clock 2    Inst. 2   Inst. 1
Clock 3    Inst. 3   Inst. 2   Inst. 1
Clock 4    Inst. 4   Inst. 3   Inst. 2   Inst. 1
Clock 5    Inst. 5   Inst. 4   Inst. 3   Inst. 2   Inst. 1

Basic 5 Stage Pipeline
• Same structure as single cycle but now broken into 5 stages
• Pipeline stage registers act as temp. registers storing intermediate results, thus allowing the previous stage to be reused for another instruction
  – They also act as a barrier that keeps signals from different stages from intermixing

[Figure: 5-stage pipelined datapath (Fetch, Decode, Exec., Mem, WB) showing the PC and I-Cache, instruction register, register file, sign extend, shift left 2, adders, ALU, and D-Cache, with a pipeline stage register between each pair of stages]

Issues with Pipelining
• No sharing of HW/logic resources between stages because of full utilization
  – Can’t have a single cache (both I & D) because each is needed to fetch one instruction while another accesses data
• Prevent signals in one stage (instruc.) from flowing into another stage (instruc.) and becoming convoluted
• Balancing stage delay
  – Clock period = longest stage
  – In the example below, a clock period of 50ns means 150ns of delay through the three stages for only 70ns of logic delay

Sample Stage Delay
Fetch Logic     Decode Logic    Execute Logic
10ns            10ns            50ns

Resolution of Pipelining Issues
• No sharing of HW/logic resources between stages
  – For full performance, no feedback (stage i feeding back to stage i-k)
  – If two stages need a HW resource, replicate the resource in both stages (e.g. an I- AND D-cache)
• Prevent signals from one stage (instruc.) from flowing into another stage (instruc.) and becoming convoluted
  – Stage registers act as a barrier wall to signals until the next clock edge
• Balancing stage delay [Important!!!]
  – Balance or divide long stages (See next slides)...
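
The two numeric ideas in these slides, the overlap of instructions across stages and the "clock period = longest stage" rule, can be checked with a few lines of Python. This is only an illustrative sketch: the function names (`clock_period`, `pipeline_latency`, `occupancy`) are made up for this example and do not come from the course material, and the model assumes an ideal pipeline with no stalls or hazards.

```python
def clock_period(stage_delays_ns):
    """Clock period of the pipeline: the delay of its slowest stage."""
    return max(stage_delays_ns)


def pipeline_latency(stage_delays_ns):
    """Time for one instruction to pass through every stage.

    Every stage is given a full clock period, even the fast ones.
    """
    return len(stage_delays_ns) * clock_period(stage_delays_ns)


def occupancy(num_clocks, stage_names):
    """Which instruction sits in each stage on each clock (ideal, no stalls).

    Returns one dict per clock, mapping stage name -> instruction number.
    Instruction i enters the first stage on clock i.
    """
    rows = []
    for clock in range(1, num_clocks + 1):
        row = {}
        for position, stage in enumerate(stage_names):
            inst = clock - position  # each later stage holds an older instruction
            if inst >= 1:
                row[stage] = inst
        rows.append(row)
    return rows


if __name__ == "__main__":
    delays = [10, 10, 50]  # Fetch, Decode, Execute logic delays from the slide (ns)
    print("clock period (ns):", clock_period(delays))
    print("latency through pipeline (ns):", pipeline_latency(delays))
    print("actual logic delay (ns):", sum(delays))

    for clock, row in enumerate(occupancy(5, ["F", "D", "Ex", "Mem", "WB"]), start=1):
        print("Clock", clock, row)
```

With the slide's sample delays, the unbalanced 50ns stage sets the clock, so an instruction spends 150ns in the pipeline to do 70ns of work. Splitting or rebalancing the long stage shrinks `max(...)` and therefore the clock period, which is the motivation for the "balance or divide long stages" bullet.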
- Spring '08
- Mark Redekopp, Pipeline Stage Register