parallel_circuits - Exploiting Parallelism Greg Stitt ECE...

Exploiting Parallelism
Greg Stitt
ECE Department, University of Florida

Why are Custom Circuits Fast?
- Circuits can exploit massive amounts of parallelism
  - Microprocessor/VLIW/superscalar parallelism: ~5 instructions/cycle in the best case (which rarely occurs)
  - Digital circuits: potentially thousands of operations per cycle, as many operations as will fit in the device
- Circuits also support different types of parallelism
Types of Parallelism
Bit-level parallelism

C code for bit reversal (x is a 32-bit unsigned value):
x = (x >> 16) | (x << 16);
x = ((x >> 8) & 0x00ff00ff) | ((x << 8) & 0xff00ff00);
x = ((x >> 4) & 0x0f0f0f0f) | ((x << 4) & 0xf0f0f0f0);
x = ((x >> 2) & 0x33333333) | ((x << 2) & 0xcccccccc);
x = ((x >> 1) & 0x55555555) | ((x << 1) & 0xaaaaaaaa);

Binary compilation (MIPS assembly):
sll $v1[3],$v0[2],0x10
srl $v0[2],$v0[2],0x10
or $v0[2],$v1[3],$v0[2]
srl $v1[3],$v0[2],0x8
and $v1[3],$v1[3],$t5[13]
sll $v0[2],$v0[2],0x8
and $v0[2],$v0[2],$t4[12]
or $v0[2],$v1[3],$v0[2]
srl $v1[3],$v0[2],0x4
and $v1[3],$v1[3],$t3[11]
sll $v0[2],$v0[2],0x4
and $v0[2],$v0[2],$t2[10]
...

Processor: requires between 32 and 128 cycles.

[Figure: Circuit for Bit Reversal; each bit of the Original X Value is wired directly to its position in the Bit Reversed X Value]

FPGA: requires only 1 cycle (a speedup of 32x to 128x for the same clock), because the reversal is implemented entirely by wiring.

Arithmetic-level Parallelism (i.e. “wide” parallelism)

C code:
for (i = 0; i < 128; i++)
    y[i] += c[i] * x[i];

for (i = 0; i < 128; i++)
    y += c[i] * x[i];

Processor: 1000s of instructions, several thousand cycles.

[Figure: Circuit with 128 multipliers (*) feeding a network of adders (+)]

FPGA: ~7 cycles (assuming 1 op per cycle), a speedup of more than 100x for the same clock.