lecture15 - CIS 450 Computer Architecture and Organization...

CIS 450 Computer Architecture and Organization
Lecture 15: Code Optimization II
Mitch Neilsen
[email protected]
219D Nichols Hall

– 2 – Topics

Machine-Independent Optimizations
  - Optimization Blockers

Machine-Dependent Optimizations
  - Understanding Processor Operations
  - Branches and Branch Prediction

– 3 – Getting High Performance

Don't Do Anything Stupid
  - Watch out for hidden algorithmic inefficiencies
  - Write compiler-friendly code
      - Help compiler past optimization blockers: function calls & memory references

Tune Code For Machine
  - Exploit instruction-level parallelism
  - Avoid unpredictable branches
  - Make code cache friendly
      - Covered later in course

– 4 – Modern CPU Design

[Figure: block diagram of a modern CPU. Labels: Instruction Control (Fetch Control, Instruction Decode, Instruction Cache, Retirement Unit, Register File); Execution (Functional Units: Integer/Branch, General Integer, FP Add, FP Mult/Div, Load, Store; Data Cache). Signals: Address, Instrs., Operations, Prediction OK?, Operation Results, Register Updates, Addr., Data.]

– 5 – CPU Capabilities of Pentium IV

Multiple Instructions Can Execute in Parallel
  - 1 load, with address computation
  - 1 store, with address computation
  - 2 simple integer (one may be branch)
  - 1 complex integer (multiply/divide)
  - 1 FP/SSE3 unit
  - 1 FP move (does all conversions)

Some Instructions Take > 1 Cycle, but Can be Pipelined

  Instruction                  Latency   Cycles/Issue
  Load / Store                 5         1
  Integer Multiply             10        1
  Integer/Long Divide          36/106    36/106
  Single/Double FP Multiply    7         2
  Single/Double FP Add         5         2
  Single/Double FP Divide      32/46     32/46

– 6 – Instruction Control

Grabs Instruction Bytes From Memory
  - Based on current PC + predicted targets for predicted branches
  - Hardware dynamically guesses whether branches taken/not taken and (possibly) branch target

Translates Instructions Into Operations (for CISC style CPUs)
  - Primitive steps required to perform instruction
  - Typical instruction requires 1–3 operations

Converts Register References Into Tags
  - Abstract identifier linking destination of one operation with sources of later operations

[Figure: Instruction Control block. Labels: Instruction Cache, Fetch Control, Instruction Decode, Address, Instrs. ...]
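
The slides name memory references as an optimization blocker and give a latency of 7 cycles but an issue time of 2 cycles for FP multiply. The following C code is a minimal sketch of how those two points might show up in code; it is not taken from the lecture, and the function names (product_mem, product_local, product_2acc) are hypothetical. It computes the product of an array three ways: writing through a pointer on every iteration, accumulating in a local variable, and using two independent accumulators so that successive multiplies do not all wait on one another. Any speedup depends on the compiler and the machine, so treat the comments as expectations rather than measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: *dest is read and written on every iteration.
   Because dest might alias an element of a[], the compiler
   generally cannot keep the running product in a register. */
void product_mem(const double *a, size_t n, double *dest)
{
    assert(dest != NULL);
    *dest = 1.0;
    for (size_t i = 0; i < n; i++)
        *dest = *dest * a[i];
}

/* Accumulate in a local variable and store once at the end.
   This removes the per-iteration memory reference, but every
   multiply still depends on the previous one. */
void product_local(const double *a, size_t n, double *dest)
{
    assert(dest != NULL);
    double acc = 1.0;
    for (size_t i = 0; i < n; i++)
        acc *= a[i];
    *dest = acc;
}

/* Two independent accumulators (2x unrolling). The two chains
   of multiplies do not depend on each other, so a pipelined FP
   multiplier may overlap them. Note: this reassociates the
   floating-point product, which can change rounding. */
void product_2acc(const double *a, size_t n, double *dest)
{
    assert(dest != NULL);
    double acc0 = 1.0;
    double acc1 = 1.0;
    size_t i = 0;

    /* Handle elements in pairs. */
    for (; i + 1 < n; i += 2) {
        acc0 *= a[i];
        acc1 *= a[i + 1];
    }
    /* Handle a leftover element when n is odd. */
    for (; i < n; i++)
        acc0 *= a[i];

    *dest = acc0 * acc1;
}
```

All three functions agree when dest does not point into a[] and the values multiply exactly; for example, on the array {1.0, 2.0, 3.0, 4.0, 5.0} each stores 120.0. If dest did alias an element of a[], product_mem would behave differently from the other two, which is the reason a compiler cannot safely make this transformation on its own.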