This preview shows page 1. Sign up to view the full content.
Unformatted text preview: calar implementations and can interact badly with branch prediction mechanisms. 38 The ARM Architecture On the original ARM delayed branches were not used because they made exception handling more complex; in the long run this has turned out to be a good decision since it simplifies re-implementing the architecture with a different pipeline. Single-cycle execution of all instructions. Although the ARM executes most data processing instructions in a single clock cycle, many other instructions take multiple clock cycles. The rationale here was based on the observation that with a single memory for both data and instructions, even a simple load or store instruction requires at least two memory accesses (one for the instruction and one for the data). Therefore single cycle operation of all instructions is only possible with separate data and instruction memories, which were considered too expensive for the intended ARM application areas. Instead of single-cycle execution of all instructions, the ARM was designed to use the minimum number of cycles required for memory accesses. Where this was greater than one, the extra cycles were used, where possible, to do something useful, such as support auto-indexing addressing modes. This reduces the total number of ARM instructions required to perform any sequence of operations, improving performance and code density. Simplicity An overriding concern of the original ARM design team was the need to keep the design simple. Before the first ARM chips, Acorn designers had experience only of gate arrays with complexities up to around 2,000 gates, so the full-custom CMOS design medium was approached with some respect. When venturing into unknown territory it is advisable to minimize those risks which are under your control, since this still leaves significant risks from those factors which are not well understood or are fundamentally not controllable. The simplicity of the ARM may be more apparent in the hardware organization and implementation (described in Chapter 4) than it is in the instruction set architecture. From the programmer'...
View Full Document
This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
- Spring '09