L24: Parallel Processing    6.004 Fall 2010    12/7/10

Parallel Processing
The 9-member “Baby in a Month” team
Final deadline: Thu 12/9
HKN Survey!

Taking a step back
Static Code vs. Dynamic Execution Path
Path Length = number of instructions along the dynamic execution path, aka the “Thread of Execution”

    loop: LD(n, r1)
          CMPLT(r31, r1, r2)
          BF(r2, done)
          LD(r, r3)
          LD(n, r1)
          MUL(r1, r3, r3)
          ST(r3, r)
          LD(n, r1)
          SUBC(r1, 1, r1)
          ST(r1, n)
          BR(loop)
    done:

We have been building machines to execute one thread (quickly)
[Figure: Beta processor and memory executing a single thread]

$$\text{Time} = \frac{\text{Path Length} \times \text{Clocks-per-Instruction}}{\text{Clocks-per-second}}$$

Can we make CPI < 1?
…Implies we can complete more than one instruction each clock cycle!
Two places to find parallelism:
Instruction Level (ILP): fetch and issue groups of independent instructions within a thread of execution
Thread Level (TLP): simultaneously execute multiple execution streams

Instruction-Level Parallelism
Sequential code. This is okay, but smarter coding does better in this example!

    loop: LD(n, r1)
          CMPLT(r31, r1, r2)
          BF(r2, done)
          LD(r, r3)
          LD(n, r1)
          MUL(r1, r3, r3)
          ST(r3, r)
          LD(n, r4)
          SUBC(r4, 1, r4)
          ST(r4, n)
          BR(loop)
    done:

What if I tried to do multiple iterations at once?
“Safe” Parallel Code:

    loop: LD(n, r1)
          CMPLT(r31, r1, r2)
          BF(r2, done)
          LD(r, r3)
          LD(n, r1)
          LD(n, r4)
          MUL(r1, r3, r3)
          SUBC(r4, 1, r4)
          ST(r3, r)
          ST(r4, n)
          BR(loop)
    done:

Superscalar Parallelism
- Popular now, but the limits are near (8-issue)
- Multiple instruction dispatch
- Speculative execution

SIMD Processing (Single Instruction Multiple Data)
Each datapath has its own local data (Register File)
All datapaths execute the same instruction
Conditional branching is difficult… (What if only one CPU has R1 = 0?)
Conditional operations are common in SIMD machines:
    if (flag1) Rc = Ra <op> Rb
Global ANDing or ORing of flag registers is used for high-level control
[Figure: SIMD machine with one PC (+1 or branch) and instruction memory, four datapaths (each a Reg File and ALU), an addressing unit, and data memory]
Control Model: “hide” parallelism in primitives (e.g., vector operations)
This sort of construct is also becoming popular on modern uniprocessors

SIMD Coprocessing Units
SIMD data path added to a traditional CPU core...
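To make the SIMD control model concrete, here is a toy sketch in Python. It is not the hardware on the slides. It assumes N lanes, each with its own register dictionary and flag bit, driven by one shared instruction stream. Each lane runs the factorial loop from the earlier slides on its own value of n. Conditional operations of the form `if (flag) Rc = Ra <op> Rb` mask off lanes that have finished. A global OR of the flags decides when the single control unit stops looping. All names (`SIMDMachine`, `cond_op`, `any_flag`, `simd_factorial`) are hypothetical, chosen for illustration.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]


class SIMDMachine:
    """Toy SIMD model (hypothetical): one shared instruction stream, N lanes,
    each lane with its own register file and its own flag bit."""

    def __init__(self, lanes: int) -> None:
        # R31 holds 0 in every lane (the Beta's R31 always reads as zero).
        self.regs: List[Regs] = [{"R31": 0} for _ in range(lanes)]
        self.flag: List[bool] = [True] * lanes

    def load(self, reg: str, values: List[int]) -> None:
        """Give each lane its own local value of `reg`."""
        for lane, v in zip(self.regs, values):
            lane[reg] = v

    def set_flag(self, pred: Callable[[Regs], bool]) -> None:
        """Every lane evaluates the same comparison on its own registers."""
        self.flag = [pred(lane) for lane in self.regs]

    def cond_op(self, rc: str, ra: str, rb: str,
                op: Callable[[int, int], int]) -> None:
        """if (flag) Rc = Ra <op> Rb: issued once, applied in every flagged lane."""
        for lane, f in zip(self.regs, self.flag):
            if f:
                lane[rc] = op(lane[ra], lane[rb])

    def any_flag(self) -> bool:
        """Global OR of the flag bits, seen by the single control unit."""
        return any(self.flag)


def simd_factorial(ns: List[int]) -> List[int]:
    """Run the factorial loop in all lanes at once, one n per lane."""
    m = SIMDMachine(len(ns))
    m.load("R1", ns)               # n (differs per lane)
    m.load("R2", [1] * len(ns))    # r = 1
    m.load("R3", [1] * len(ns))    # constant 1 used by the decrement
    while True:
        m.set_flag(lambda r: r["R31"] < r["R1"])  # flag = (0 < n), like CMPLT(r31, r1, ...)
        if not m.any_flag():                       # one branch for the whole machine
            break
        m.cond_op("R2", "R2", "R1", lambda a, b: a * b)  # r = r * n where flag is set
        m.cond_op("R1", "R1", "R3", lambda a, b: a - b)  # n = n - 1 where flag is set
    return [lane["R2"] for lane in m.regs]


if __name__ == "__main__":
    print(simd_factorial([0, 1, 3, 5]))
```

No lane ever branches on its own. A lane whose n reaches 0 simply stops updating while the others continue, so the machine runs for as many iterations as the largest n. This sidesteps the “what if only one CPU has R1 = 0?” problem at the cost of idle lanes.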
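Go back to the sequential loop on the Instruction-Level Parallelism slide. It multiplies r by n, n-1, …, 1 (a factorial when r starts at 1). A small dependence check shows why its straight-line part (the instructions between BF and BR) can issue in groups. The Python sketch below rests on several assumptions of its own:

- unit latency and unlimited issue width;
- no register renaming beyond what the code already does;
- memory words n and r modeled as named resources;
- the branch ignored.

Any RAW, WAR or WAW hazard forces a later issue cycle. The names `Instr`, `depends` and `schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Instr:
    text: str
    reads: FrozenSet[str]   # registers / memory words the instruction reads
    writes: FrozenSet[str]  # registers / memory words the instruction writes


def instr(text: str, reads: set, writes: set) -> Instr:
    return Instr(text, frozenset(reads), frozenset(writes))


# Straight-line part of the sequential loop body (between BF and BR).
# Memory words n and r are modeled as the resources "M[n]" and "M[r]".
BODY = [
    instr("LD(r, r3)",       {"M[r]"},     {"r3"}),
    instr("LD(n, r1)",       {"M[n]"},     {"r1"}),
    instr("MUL(r1, r3, r3)", {"r1", "r3"}, {"r3"}),
    instr("ST(r3, r)",       {"r3"},       {"M[r]"}),
    instr("LD(n, r4)",       {"M[n]"},     {"r4"}),
    instr("SUBC(r4, 1, r4)", {"r4"},       {"r4"}),
    instr("ST(r4, n)",       {"r4"},       {"M[n]"}),
]


def depends(earlier: Instr, later: Instr) -> bool:
    """True if `later` must issue after `earlier` (RAW, WAR or WAW hazard)."""
    raw = earlier.writes & later.reads
    war = earlier.reads & later.writes
    waw = earlier.writes & later.writes
    return bool(raw or war or waw)


def schedule(body: List[Instr]) -> List[int]:
    """Earliest issue cycle (1-based) for each instruction."""
    cycles: List[int] = []
    for i, ins in enumerate(body):
        cycle = 1
        for j in range(i):
            if depends(body[j], ins):
                cycle = max(cycle, cycles[j] + 1)
        cycles.append(cycle)
    return cycles


if __name__ == "__main__":
    cycles = schedule(BODY)
    for c in range(1, max(cycles) + 1):
        group = [ins.text for ins, k in zip(BODY, cycles) if k == c]
        print(f"cycle {c}: " + "   ".join(group))
    print(f"{len(BODY)} instructions in {max(cycles)} cycles")
```

Under these idealized assumptions the seven instructions fit in three cycles: the three loads, then MUL and SUBC, then the two stores. That is about 3/7 ≈ 0.43 clocks per instruction for this fragment, i.e. CPI < 1. The “Safe” parallel code on the slide places these same independent instructions next to each other. The model is more aggressive than the slide in issuing LD(r, r3) together with both loads of n.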