F.1  Why Vector Processors?
F.2  Basic Vector Architecture
F.3  Two Real-World Issues: Vector Length and Stride
F.4  Enhancing Vector Performance
F.5  Effectiveness of Compiler Vectorization
F.6  Putting It All Together: Performance of Vector Processors
F.7  A Modern Vector Supercomputer: The Cray X1
F.8  Fallacies and Pitfalls
F.9  Concluding Remarks
F.10 Historical Perspective and References
     Exercises
F  Vector Processors

Revised by Krste Asanovic
Massachusetts Institute of Technology

I’m certainly not inventing vector processors. There are three kinds that I know of existing today. They are represented by the Illiac-IV, the (CDC) Star processor, and the TI (ASC) processor. Those three were all pioneering processors. . . . One of the problems of being a pioneer is you always make mistakes and I never, never want to be a pioneer. It’s always best to come second when you can look at the mistakes the pioneers made.

Seymour Cray
Public lecture at Lawrence Livermore Laboratories on the introduction of the Cray-1 (1976)

F.1  Why Vector Processors?

In Chapters 2 and 3 we saw how we could significantly increase the performance of a processor by issuing multiple instructions per clock cycle and by more deeply pipelining the execution units to allow greater exploitation of instruction-level parallelism. (This appendix assumes that you have read Chapters 2 and 3 and Appendix G completely; in addition, the discussion on vector memory systems assumes that you have read Appendix C and Chapter 5.) Unfortunately, we also saw that there are serious difficulties in exploiting ever larger degrees of ILP. As we increase both the width of instruction issue and the depth of the machine pipelines, we also increase the number of independent instructions required to keep the processor busy with useful work. This means an increase in the number of partially executed instructions that can be in flight at one time. For a dynamically scheduled machine, hardware structures, such as instruction windows, reorder buffers, and rename register files, must grow to have sufficient capacity to hold all in-flight instructions, and worse, the number of ports on each element of these structures must grow with the issue width. The logic to track dependencies between all in-flight instructions grows quadratically in the number of instructions. Even a statically scheduled VLIW machine, which shifts more of the scheduling burden to the compiler, requires more registers, more ports per register, and more hazard interlock logic (assuming a design where hardware manages interlocks after issue time) to support more in-flight instructions, which similarly cause quadratic increases in circuit size and complexity. This rapid increase in circuit complexity makes it difficult to build machines that can control large numbers of in-flight instructions, and hence limits practical issue widths and pipeline depths.
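As a concrete illustration of the alternative (this sketch is added here and is not part of the original appendix text; the function name saxpy and its C signature are assumptions), consider a simple loop whose iterations are all independent. A wide-issue superscalar processor must keep many scalar instructions in flight to exploit that independence, whereas a vector processor can express the entire loop as a short sequence of vector instructions:

    #include <stddef.h>

    /* Each iteration is independent of the others. A superscalar pipeline
       must track many in-flight scalar loads, multiplies, adds, and stores
       to exploit this parallelism; a vector processor can encode the same
       work as a few vector load, vector multiply-add, and vector store
       instructions. */
    void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

In this style, the compiler or programmer conveys the parallelism of the whole loop in a handful of instructions, rather than relying on the hardware to rediscover it one scalar instruction at a time.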