L6_OOO_2010

L6_OOO_2010 - Out-Of-Order Execution, Lihu Rappoport and Adi Yoaz

Computer Architecture 2010 – Out-Of-Order Execution
Lihu Rappoport and Adi Yoaz

What's Next
- Remember our goal: minimize CPU Time
    CPU Time = clock cycle × CPI × IC
- So far we have learned
    - Minimize clock cycle ⇒ add more pipe stages
    - Minimize CPI ⇒ use pipeline
    - Minimize IC ⇒ architecture
- In a pipelined CPU:
    - CPI w/o hazards is 1
    - CPI with hazards is > 1
- Adding more pipe stages reduces clock cycle but increases CPI
    - Higher penalty due to control hazards
    - More data hazards
- Beyond some point adding more pipe stages does not help
- What can we do? Further reduce the CPI!
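
The CPU Time relation above can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function name `cpu_time_ns` and all three design points (clock cycles and CPI values) are made-up assumptions, chosen only to show how a deeper pipeline can first help and then hurt.

```python
def cpu_time_ns(clock_cycle_ns, cpi, instruction_count):
    """CPU Time = clock cycle x CPI x IC, in nanoseconds."""
    return clock_cycle_ns * cpi * instruction_count


IC = 1_000_000  # same program on every design, so IC is fixed

# Hypothetical design points (assumed numbers): each deeper pipeline
# shortens the clock cycle but raises CPI because hazards cost more.
designs = {
    "baseline pipeline": cpu_time_ns(1.0, 1.2, IC),
    "deeper pipeline": cpu_time_ns(0.7, 1.5, IC),
    "much deeper pipeline": cpu_time_ns(0.6, 2.1, IC),
}

for name, t in designs.items():
    print(f"{name}: {t / 1e6:.2f} ms")
```

With these assumed numbers the moderately deeper pipeline wins, while the much deeper one is slower than the baseline because the CPI growth outweighs the shorter clock cycle.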

A Superscalar CPU
- Duplicating HW in one pipe stage won't help
    - e.g., have 2 ALUs
    - the bottleneck moves to other stages
- Getting IPC > 1 requires fetching, decoding, executing, and retiring more than one instruction per clock:
    IF ID EXE MEM WB
    IF ID EXE MEM WB
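
The bottleneck argument above can be sketched with a very simple throughput model. This is an illustration under stated assumptions, not something from the slides: the helper `peak_ipc` and the per-stage width dictionaries are hypothetical, and the model ignores all hazards, so it only gives an upper bound on sustained IPC.

```python
def peak_ipc(stage_widths):
    """Upper bound on sustained IPC: the narrowest stage limits throughput."""
    return min(stage_widths.values())


# Instructions each stage can handle per clock (assumed values).
scalar = {"IF": 1, "ID": 1, "EXE": 1, "MEM": 1, "WB": 1}

# Duplicate hardware in one stage only, e.g. 2 ALUs in EXE.
two_alus = dict(scalar, EXE=2)

# Widen every stage to 2 instructions per clock.
superscalar = {stage: 2 for stage in scalar}

for name, widths in [("scalar", scalar),
                     ("two ALUs only", two_alus),
                     ("2-wide superscalar", superscalar)]:
    print(f"{name}: peak IPC = {peak_ipc(widths)}")
```

In this model, adding a second ALU leaves the bound at 1 because fetch, decode, memory, and write-back still pass one instruction per clock; only widening every stage raises the bound to 2.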

The Pentium Processor
- Fetches and decodes 2 instructions per cycle
- Before register file read, decide on pairing: can the two instructions be executed in parallel?
- Pairing decision is based on
    - Data dependencies: instructions must be independent
    - Resources:
        - Some instructions use resources from the 2 pipes
        - The second pipe can only execute a subset of the instructions
[Figure labels: IF, ID, pairing, U-pipe, V-pipe]
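
The pairing decision described above can be sketched as a small check. This is a simplified model under assumptions, not the actual Pentium pairing rules: the `Instr` class, the `v_pipe_ok` and `needs_both_pipes` flags, and the example instructions with their flag values are all hypothetical and only mirror the two criteria on the slide (data dependencies and resources).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]             # register written, if any
    srcs: Tuple[str, ...]           # registers read
    v_pipe_ok: bool = True          # assumed flag: can run in the second pipe
    needs_both_pipes: bool = False  # assumed flag: uses resources of both pipes


def can_pair(first, second):
    """Return True if `first` (U-pipe) and `second` (V-pipe) may issue together."""
    # Resource checks.
    if first.needs_both_pipes or second.needs_both_pipes:
        return False
    if not second.v_pipe_ok:
        return False
    # Data-dependency checks: the pair must be independent.
    if first.dest is not None:
        if first.dest in second.srcs:   # read-after-write
            return False
        if first.dest == second.dest:   # write-after-write
            return False
    # Write-after-read is treated as harmless in this sketch (an assumption):
    # both instructions read their sources before either one writes.
    return True


add = Instr("add", "eax", ("eax", "ebx"))
sub = Instr("sub", "ecx", ("ecx", "edx"))
mov = Instr("mov", "edx", ("eax",))               # reads eax written by add
shl = Instr("shl", "esi", ("esi",), v_pipe_ok=False)  # hypothetical U-pipe-only op

print(can_pair(add, sub))  # independent instructions
print(can_pair(add, mov))  # mov depends on the result of add
print(can_pair(add, shl))  # shl cannot go to the second pipe in this model
```

When a pair is rejected, the first instruction issues alone and the second is considered again in the next cycle, which is why independent, simple instructions placed next to each other get the most out of the two pipes.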

Misprediction Penalty in a Superscalar CPU
- MPI: misses per instruction:
    MPI = (# incorrectly predicted branches) / (total # of instructions)
        = MPR × (# predicted branches) / (total # of instructions)
- MPI correlates well with performance. E.g., assume:
    - MPR = 5%, %branches = 20% ⇒ MPI = 1%
    - Without hazards IPC = 2 (2 instructions per cycle)
    - Flush penalty of 5 cycles
- We get:
    - MPI = 1% ⇒ a flush every 100 instructions
    - IPC = 2 ⇒ a flush every 100/2 = 50 cycles
    - 5 cycles flush penalty every 50 cycles ⇒ 10% loss in performance
    - For IPC = 1 we would get 5 cycles flush penalty per 100 cycles ⇒ 5% loss in performance
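
The arithmetic in the example above can be reproduced with a few lines of code. This is a minimal sketch; the function names `mpi` and `flush_overhead` are assumptions introduced here, and the overhead is expressed the same way as on the slide, as flush cycles relative to hazard-free execution cycles.

```python
def mpi(mpr, branch_fraction):
    """Misses per instruction = misprediction rate x fraction of branches."""
    return mpr * branch_fraction


def flush_overhead(mpi_value, ipc, flush_penalty_cycles):
    """Flush cycles paid per hazard-free execution cycle."""
    instructions_between_flushes = 1 / mpi_value
    cycles_between_flushes = instructions_between_flushes / ipc
    return flush_penalty_cycles / cycles_between_flushes


m = mpi(mpr=0.05, branch_fraction=0.20)
print(f"MPI = {m:.1%}")

for ipc in (2, 1):
    overhead = flush_overhead(m, ipc, flush_penalty_cycles=5)
    print(f"IPC = {ipc}: flush overhead = {overhead:.0%}")
```

The same flush penalty costs twice as much at IPC = 2 as at IPC = 1, since the wider machine reaches the next mispredicted branch in half the cycles.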

This note was uploaded on 04/14/2011 for the course CS 234267 taught by Professor Rappoport during the Spring '07 term at Technion.
