nd floating-point multiplication and addition were considered important operations in the design of the Pentium III, even though a significant amount of hardware is required to achieve the low latencies and high degree of pipelining shown. Division, on the other hand, is relatively infrequent and difficult to implement with a short latency or issue time, so these operations are relatively slow.

5.7.3 A Closer Look at Processor Operation
As a tool for analyzing the performance of a machine-level program executing on a modern processor, we have developed a more detailed textual notation to describe the operations generated by the instruction decoder, as well as a graphical notation to show the processing of operations by the functional units. Neither notation exactly represents the implementation of a specific, real-life processor. They are simply methods to help understand how a processor can take advantage of parallelism and branch prediction in executing a program.

Translating Instructions into Operations
We present our notation by working with combine4 (Figure 5.10), our fastest code up to this point, as an example. We focus just on the computation performed by the loop, since this is the dominating factor in performance for large vectors. We consider the ca...
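
For orientation, the kind of loop under discussion can be sketched roughly as follows. This is not a reproduction of Figure 5.10: the `vec_rec` struct, the name `combine4_sketch`, and the choice of integer data combined by multiplication are assumptions made here for illustration only. The point it shows is that the result is accumulated in a local variable, so each iteration of the loop performs one load, one combining operation, and the loop-control update.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration: integer data combined by multiplication. */
typedef int data_t;
#define IDENT 1
#define OPER *

/* Hypothetical minimal vector type, standing in for the vector
   abstraction that the text's combine4 operates on. */
typedef struct {
    size_t len;
    data_t *data;
} vec_rec;

/* Accumulate the result in a local variable and store it to dest once,
   after the loop has finished. */
void combine4_sketch(const vec_rec *v, data_t *dest)
{
    assert(v != NULL && dest != NULL);

    size_t length = v->len;
    const data_t *data = v->data;
    data_t x = IDENT;

    /* The loop whose per-iteration operations are the subject of analysis. */
    for (size_t i = 0; i < length; i++) {
        x = x OPER data[i];
    }
    *dest = x;
}
```

Calling `combine4_sketch` on a three-element vector holding 2, 3, and 4 stores their product in `*dest`; an empty vector yields `IDENT`.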