Alpha microprocessors have been performance leaders since their introduction in 1992. The first-generation 21064 and the later 21164 [1,2] raised expectations for the newest generation; performance leadership was again a goal of the 21264 design team. Benchmark scores of 30+ SPECint95 and 58+ SPECfp95 offer convincing evidence thus far that the 21264 achieves this goal and will continue to set a high performance standard.

A unique combination of high clock speeds and advanced microarchitectural techniques, including many forms of out-of-order and speculative execution, provides exceptional core computational performance in the 21264. The processor also features a high-bandwidth memory system that can quickly deliver data values to the execution core, providing robust performance for a wide range of applications, including those without cache locality. The advanced performance levels are attained while maintaining an installed application base. All Alpha generations are upward-compatible. Database, real-time visual computing, data mining, medical imaging, scientific/technical, and many other applications can utilize the outstanding performance available with the 21264.

Architecture highlights

The 21264 is a superscalar microprocessor that can fetch and execute up to four instructions per cycle. It also features out-of-order execution [3,4]. With out-of-order execution, instructions execute as soon as possible and in parallel with other nondependent work, which results in faster execution because critical-path computations start and complete quickly.

The processor also employs speculative execution to maximize performance. It speculatively fetches and executes instructions even though it may not know immediately whether the instructions will be on the final execution path. This is particularly useful, for instance, when the 21264 predicts branch directions and speculatively executes down the predicted path.
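The benefit of letting instructions execute as soon as their operands are available can be illustrated with a toy issue model. The sketch below is not a model of the 21264 pipeline: the `Instr` type, the `simulate` function, the latencies, and the four-instruction program are all hypothetical, and every register is assumed to be written only once so that no renaming is needed. It only contrasts in-order issue, where a stalled instruction blocks everything behind it, with out-of-order issue, where any ready instruction may go.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str       # register written (each register is written once)
    srcs: tuple     # registers read
    latency: int    # cycles from issue until the result is available


def simulate(program, width=4, in_order=True):
    """Return the cycle in which the last result becomes available."""
    # Results produced by the program are unavailable until computed;
    # any other register is treated as available from cycle 0.
    ready_at = {ins.dest: math.inf for ins in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        issued = 0
        waiting = []
        stalled = False
        for ins in pending:  # oldest instruction first
            ready = all(ready_at.get(src, 0) <= cycle for src in ins.srcs)
            if ready and issued < width and not stalled:
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, cycle + ins.latency)
                issued += 1
            else:
                waiting.append(ins)
                if in_order:
                    # In-order issue: nothing younger may pass a stalled instruction.
                    stalled = True
        pending = waiting
        cycle += 1
    return finish


# Hypothetical program: the multiply does not depend on the load.
PROGRAM = [
    Instr("load", "x", ("p",), 3),
    Instr("add1", "y", ("x",), 1),
    Instr("mul", "z", ("q",), 4),
    Instr("add2", "w", ("z", "y"), 1),
]

if __name__ == "__main__":
    print("in-order cycles:    ", simulate(PROGRAM, in_order=True))
    print("out-of-order cycles:", simulate(PROGRAM, in_order=False))
```

In this example the independent multiply starts alongside the load under out-of-order issue instead of waiting behind the stalled add, so the dependent chain completes in fewer cycles even though the issue width is the same in both runs.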
Sophisticated branch prediction, coupled with speculative and dynamic execution, extracts instruction parallelism from applications. With more functional units and these dynamic execution techniques, the processor is 50% to 200% faster than its 21164 predecessor for many applications, even though both generations can fetch at most four instructions per cycle [5].

The 21264's memory system also enables high performance levels. On-chip and off-chip caches provide for very low latency data access. Additionally, the 21264 can service many parallel memory references to all caches in the hierarchy, as well as to the off-chip memory system. This permits very high bandwidth data access [6]. For example, the processor can sustain more than 1.3 GBytes/sec on the Stream benchmark....
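As a rough guide to what a sustained-bandwidth figure of this kind means, the sketch below shows the arithmetic behind a Stream-style triad pass, `a[i] = b[i] + s * c[i]`, which reads two arrays and writes one. The function name `triad_bandwidth` and the element count and timing are hypothetical values chosen only to illustrate the calculation; they are not measurements of the 21264.

```python
def triad_bandwidth(n_elements, seconds, bytes_per_element=8):
    """Sustained bytes/second for one triad pass a[i] = b[i] + s * c[i]."""
    arrays_touched = 3  # two arrays read (b, c), one array written (a)
    bytes_moved = arrays_touched * n_elements * bytes_per_element
    return bytes_moved / seconds


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    rate = triad_bandwidth(n_elements=2_000_000, seconds=0.032)
    print(f"{rate / 1e9:.2f} GB/s")
```

Because the arrays are sized to exceed the caches, a figure computed this way reflects how fast the memory system can stream data rather than how fast the core can compute.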
This note was uploaded on 11/12/2011 for the course CSEE 4824 taught by Professor Carloni during the Fall '11 term at Columbia.