chapter5-m1-ziavras



… In terms of efficiency,
• IBM Power5 is the most effective user of energy on SPECFP and essentially tied on SPECINT

Limits to ILP
• Doubling issue rates above today's 3-6 instructions per clock, say to 6 to 12 instructions, probably requires a processor to
  – issue 3 or 4 data memory accesses per cycle,
  – resolve 2 or 3 branches per cycle,
  – rename and access more than 20 registers per cycle, and
  – fetch 12 to 24 instructions per cycle.
• The complexities of implementing these capabilities are likely to mean sacrifices in the maximum clock rate
  – E.g., the widest-issue processor is the Itanium 2, but it also has the slowest clock rate, despite the fact that it consumes the most power!

Limits to ILP
• Most techniques for increasing performance increase power consumption
• Key question: is a technique energy efficient? Does it increase power consumption faster than it increases performance?
• Multiple-issue processor techniques are energy inefficient:
  1. Issuing multiple instructions incurs some overhead in logic that grows faster than the issue rate grows
  2. Growing gap between peak issue rates and sustained performance
     – Number of transistors switching = f(peak issue rate)
     – Performance = f(sustained rate) = f(IPC)
     – Growing gap between peak and sustained performance ⇒ increasing energy per unit of performance

Commentary
• Itanium does not represent a significant breakthrough in scaling ILP or in avoiding the problems of complexity and power consumption
• Instead of pursuing more ILP, architects are increasingly focusing on TLP implemented with single-chip multiprocessors
• In 2000, IBM announced the first commercial single-chip, general-purpose multiprocessor, the Power4, which contains 2 Power3 processors and an integrated L2 cache
  – Since then, Sun Microsystems, AMD, and Intel have switched focus to single-chip multiprocessors rather than more aggressive uniprocessors
• The right balance of ILP and TLP is unclear today
  – Perhaps the right choice for the server market, which can exploit more TLP, may differ from that for the desktop market, where single-thread performance may continue to be a primary requirement

And in conclusion …
• Limits to ILP (power efficiency, compilers, dependencies …) seem to limit practical designs to 3-6 simultaneously issued instructions
• Explicit parallelism (DLP or TLP) is the next step to performance
• Coarse-grained vs. fine-grained multithreading
  – Switch threads only on a big stall vs. on every clock cycle
• Simultaneous multithreading is fine-grained multithreading based on a superscalar microarchitecture
  – Instead of replicating registers, reuse rename registers
• Itanium/EPIC/VLIW is not a breakthrough in ILP
• The balance of ILP and TLP will be decided in the marketplace
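Item 1 on the energy-efficiency slide says the overhead logic of multiple issue grows faster than the issue rate. One commonly cited contributor is the in-packet dependence check: each instruction in an issue packet compares its source registers against the destination of every earlier instruction in the same packet. The minimal sketch below counts those comparators under a simplified model that is an assumption here (two source operands and one destination per instruction, nothing else modeled); the function name `dependency_comparators` is hypothetical.

```python
def dependency_comparators(issue_width: int, sources_per_instr: int = 2) -> int:
    """Count register-name comparators needed for in-packet dependence checks.

    Simplified model (assumption): instruction i in the packet compares each
    of its source registers with the destination register of each of the i
    earlier instructions in the same packet.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    # Sum over i = 0..n-1 of (sources * i) = sources * n(n-1)/2
    return sources_per_instr * issue_width * (issue_width - 1) // 2


if __name__ == "__main__":
    for width in (3, 6, 12):
        comps = dependency_comparators(width)
        print(f"{width:2d}-wide issue: {comps:4d} comparators, "
              f"{comps / width:.1f} per issue slot")
```

Under this model, going from 6-wide to 12-wide issue doubles the issue rate but multiplies the comparator count by 4.4 (30 to 132), so the per-slot overhead keeps rising. Real issue logic adds further costs (rename ports, wakeup and select) that this count ignores.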
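Item 2 on the second "Limits to ILP" slide ties energy to the peak issue rate and performance to the sustained rate (IPC). A minimal sketch of that argument, assuming switching energy per cycle is simply proportional to peak issue width (which understates the superlinear overhead of item 1), is the ratio of energy per cycle to IPC. The `Core` type, the function name, and the two example cores are hypothetical values chosen only to show the trend, not measurements of real processors.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    peak_issue_width: int   # instructions the hardware can issue per clock
    sustained_ipc: float    # instructions actually completed per clock


def energy_per_instruction(core: Core, energy_per_slot: float = 1.0) -> float:
    """Relative energy per completed instruction.

    Assumed model: every issue slot's logic switches each cycle whether or
    not it is filled, so energy per cycle scales with peak issue width,
    while useful work per cycle is the sustained IPC.
    """
    if core.sustained_ipc <= 0:
        raise ValueError("sustained_ipc must be positive")
    energy_per_cycle = energy_per_slot * core.peak_issue_width
    return energy_per_cycle / core.sustained_ipc


if __name__ == "__main__":
    # Hypothetical cores: doubling peak width raises sustained IPC by only 25%.
    for c in (Core("4-wide", 4, 1.6), Core("8-wide", 8, 2.0)):
        print(f"{c.name}: IPC {c.sustained_ipc:.1f}, "
              f"relative energy/instr {energy_per_instruction(c):.2f}")
```

With these illustrative numbers the wider core is 25% faster but spends 4.0 instead of 2.5 energy units per completed instruction, which is the "increasing energy per unit of performance" on the slide.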

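The conclusion slide contrasts coarse-grained multithreading (switch threads only on a big stall) with fine-grained multithreading (switch every clock cycle). The toy simulator below sketches both policies for a single-issue pipeline. The stall encoding, the `big_stall` threshold, the `switch_penalty` cost, and all function names are hypothetical choices for illustration. It does not model simultaneous multithreading, which would also issue from several threads in the same cycle.

```python
from typing import List, Optional

# Each thread is a list of stall lengths: after instruction j issues, that
# thread cannot issue again for thread[j] extra cycles. Both simulators
# return the number of cycles until every instruction has issued.


def simulate_fine_grained(threads: List[List[int]]) -> int:
    """Round-robin to the next ready thread on every clock, with no switch cost."""
    n = len(threads)
    pc = [0] * n          # next instruction index per thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    cycle, last = 0, n - 1
    while any(pc[t] < len(threads[t]) for t in range(n)):
        for step in range(1, n + 1):
            t = (last + step) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + 1 + threads[t][pc[t]]
                pc[t] += 1
                last = t
                break
        cycle += 1        # idle cycle if no thread was ready
    return cycle


def _next_thread_with_work(pc: List[int], threads: List[List[int]],
                           cur: int) -> Optional[int]:
    """Next thread after `cur` (wrapping around to `cur` itself) with work left."""
    n = len(threads)
    for k in range(1, n + 1):
        t = (cur + k) % n
        if pc[t] < len(threads[t]):
            return t
    return None


def simulate_coarse_grained(threads: List[List[int]], big_stall: int = 10,
                            switch_penalty: int = 3) -> int:
    """Stay on one thread, switching only on a big stall or when it finishes."""
    n = len(threads)
    pc = [0] * n
    ready_at = [0] * n
    cycle, cur = 0, 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        nxt = None
        if pc[cur] >= len(threads[cur]):
            nxt = _next_thread_with_work(pc, threads, cur)  # current thread done
        elif ready_at[cur] <= cycle:
            stall = threads[cur][pc[cur]]
            ready_at[cur] = cycle + 1 + stall
            pc[cur] += 1
            cycle += 1
            if stall >= big_stall:
                nxt = _next_thread_with_work(pc, threads, cur)  # big stall: switch
        else:
            cycle += 1    # short stall: the pipeline idles, no switch
        if nxt is not None and nxt != cur:
            cur = nxt
            cycle += switch_penalty  # assumed cost to drain/refill the pipeline
    return cycle


if __name__ == "__main__":
    # Hypothetical workload: two threads, each instruction followed by a 1-cycle stall.
    workload = [[1, 1, 1], [1, 1, 1]]
    print("fine-grained:", simulate_fine_grained(workload))      # 6
    print("coarse-grained:", simulate_coarse_grained(workload))  # 13
```

With only short stalls, fine-grained interleaving hides every bubble (6 cycles for 6 instructions), while the coarse-grained core waits out each 1-cycle stall and pays the switch penalty once (13 cycles). Coarse-grained switching only pays off when stalls are long compared with `switch_penalty`.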