23Thread1

23Thread1 - CS108, Stanford Winter 2012 Handout #23 Young...

CS108, Stanford, Winter 2012
Handout #23
Young

Threading 1
Handout written by Nick Parlante

Concurrency Trends

Faster Computers
How is it that computers are faster now than 10 years ago?
- Process improvements -- chips are smaller and run faster
- Superscalar pipelining parallelism techniques -- doing more than one thing at a time from the one instruction stream. This is Instruction Level Parallelism (ILP).
   - There is a limit to the amount of parallelism that can be extracted from a single, serial stream of instructions.
   - The limit is around 3x or 4x
   - We are well into the diminishing-returns region of ILP technology.

Hardware Trends
Moore's law: the density of transistors that we can fit per square mm seems to double about every 18 months -- due to figuring out how to make the transistors and other elements smaller and smaller.
Here are some hardware factoids to illustrate the increasing transistor budget.
- The cost of a chip is related to its size in mm^2. It's a super-linear function -- doubling the size of a chip more than doubles its cost.
- Notice that the chip size has varied around 100-200 mm^2 while the number of transistors has gone up by a factor of more than 300.
- Each chip has a "feature size" -- the size of its smallest part. As Moore's law progresses, feature size gets smaller. "um" is micrometer -- a millionth of a meter, "nm" is nanometer -- a billionth of a meter.
   - 1989: 486 -- 1.0 um -- 1.2M transistors -- 79 mm^2
   - 1995: Pentium MMX -- 0.35 um -- 5.5M transistors -- 128 mm^2
   - 1999: AMD Athlon -- 0.25 um -- 22M transistors -- 184 mm^2
   - 2001: Pentium 4 -- 0.18 um -- 42M transistors -- 217 mm^2
   - 2004: Prescott Pentium 4 -- 90 nm -- 125M transistors -- 112 mm^2
   - 2006: Core 2 Duo -- 65 nm -- 291M transistors -- 143 mm^2
   - 2008: Core 2 Penryn -- 45 nm -- 410M transistors -- 107 mm^2

Q: what do we do with all these transistors?
A: more cache
A: more functional units (ILP)
A: multiple cores, multiple threads on each core (SMT)

1 Billion Transistors
How do you design a chip with 1 billion transistors? What will you do with them all?
- Extract more ILP? -- not really
- More and bigger cache -- ok, but there are limits
- Explicit concurrency -- YES
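
The chip figures listed under Hardware Trends can be checked against the 18-month rule of thumb with a little arithmetic. The Java sketch below is only a rough illustration: it takes the first and last entries of that list (the 486 and the Core 2 Penryn), computes transistors per mm^2 for each, and derives the doubling period that growth would imply. The class and method names (MooreEstimate, density, doublingMonths) are made up for this example and are not part of the handout.

```java
// Rough arithmetic on the handout's chip figures: transistor density and the
// doubling period it implies. All names here are illustrative assumptions.
class MooreEstimate {
    // Transistors per square millimeter.
    static double density(double transistors, double areaMm2) {
        return transistors / areaMm2;
    }

    // Months per doubling implied by growing from start to end over the given years.
    static double doublingMonths(double start, double end, double years) {
        double doublings = Math.log(end / start) / Math.log(2.0);
        return years * 12.0 / doublings;
    }

    public static void main(String[] args) {
        // Figures copied from the Hardware Trends list.
        double d1989 = density(1.2e6, 79);    // 486
        double d2008 = density(410e6, 107);   // Core 2 Penryn
        System.out.printf("1989 density: %.0f transistors/mm^2%n", d1989);
        System.out.printf("2008 density: %.0f transistors/mm^2%n", d2008);
        System.out.printf("density ratio: %.0fx%n", d2008 / d1989);
        System.out.printf("implied doubling period: %.1f months%n",
                doublingMonths(d1989, d2008, 2008 - 1989));
    }
}
```

With these two endpoints the density ratio is about 250x, which implies a doubling period somewhat over two years rather than 18 months. The chips differ in design and market, so treat this as a rough check on the rule of thumb, not a measurement.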
Hardware vs. Software -- Hard Tradeoff
- Writing serial, single-thread software is much easier -- key advice to remember!
- Therefore, the hardware budget thus far has largely been spent on extracting more ILP from a serial stream of instructions. That is, we put the burden on the hardware, and keep the software simple. But we are hitting a limit there.
- For better performance, we can now move the problem to the programmers -- they must write explicitly parallel code. The code is much harder to write, but it can extract much more work from a given amount of hardware.

Hardware Concurrency Trends
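
A minimal Java sketch of the tradeoff described under Hardware vs. Software: the same sum written once as plain serial code and once as explicitly parallel code that splits the work across threads. It is one possible illustration, not code from the handout. The names SumDemo, serialSum, and parallelSum are invented for the example, and it assumes Java 8 or later for the lambda syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one sum, computed serially and with explicit worker threads.
// SumDemo, serialSum, and parallelSum are illustrative names only.
class SumDemo {
    // Serial version: one loop, nothing to coordinate.
    static long serialSum(int[] data) {
        long total = 0;
        for (int value : data) total += value;
        return total;
    }

    // Explicitly parallel version: split the array into slices, sum each slice
    // on its own thread, then combine the partial sums after joining.
    static long parallelSum(int[] data, int numThreads) {
        long[] partial = new long[numThreads];   // one slot per worker
        List<Thread> workers = new ArrayList<>();
        int chunk = (data.length + numThreads - 1) / numThreads;  // ceiling division

        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final int start = Math.min(data.length, id * chunk);
            final int end = Math.min(data.length, start + chunk);
            Thread worker = new Thread(() -> {
                long sum = 0;
                for (int i = start; i < end; i++) sum += data[i];
                partial[id] = sum;   // each worker writes only its own slot
            });
            workers.add(worker);
            worker.start();
        }

        long total = 0;
        for (int t = 0; t < numThreads; t++) {
            try {
                workers.get(t).join();   // wait for the worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for worker", e);
            }
            total += partial[t];   // safe to read after join
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;

        // Ask the JVM how many hardware threads are available to it.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores reported: " + cores);
        System.out.println("serial:   " + serialSum(data));
        System.out.println("parallel: " + parallelSum(data, cores));
    }
}
```

The serial version is three lines. The parallel version has to decide how to divide the work, keep the workers from writing to the same place, and wait for all of them before combining results. That extra bookkeeping is the cost of using more of the hardware.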