It then supplies up to one instruction per clock



instruction and data buses to the left of the figure; the interface to the data memory is to the right of the figure and comprises an address bus, a write data bus and a read data bus. Note that the read memory data bus supports double-bandwidth transfers on load multiple instructions, and both register write ports are used in load multiple instructions to store the double-bandwidth data stream. Since load multiple instructions transfer word quantities only, it does not matter that half of the data stream bypasses the byte alignment and sign extension logic.

Figure 9.6 ARM8 integer unit organization.

ARM8 applications

ARM8 was designed as a general-purpose processor core that can readily be manufactured by ARM Limited's many licensees, so it is not highly optimized for a particular process technology. It offers significantly (two to three times) higher performance than the simpler ARM7 cores for a similar increase in silicon area, and requires the support of double-bandwidth on-chip memory if it is to realize its full potential.

One application of the ARM8 core is to build a high-performance CPU such as the ARM810, described in Section 12.2 on page 323. There the double-bandwidth memory is in the form of a cache, and the chip also incorporates a memory management unit and system control coprocessor CP15.

ARM8 silicon

The ARM8 core uses 124,554 transistors and operates at speeds up to 72 MHz on a 0.5 µm CMOS process with three metal layers. The core layout can be seen in the upper left-hand area of the ARM810 die photograph in Figure 12.6 on page 326.

9.3 ARM9TDMI

The ARM9TDMI core takes the functionality of the ARM7TDMI up to a significantly higher performance level. Like the ARM7TDMI (and unlike the ARM8) it includes support for the Thumb instruction set and an EmbeddedICE module for on-chip debug support.
The performance improvement is achieved by adopting a 5-stage pipeline to increase the maximum clock rate and by using separate instruction and data memory ports to allow an improved CPI...
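
The two levers named above, a higher clock rate and a lower CPI (cycles per instruction), combine multiplicatively: instruction throughput is the clock rate divided by the CPI. The following is a minimal sketch of that arithmetic. The function name `mips` and every number in it are hypothetical, chosen only for illustration; they are not measured ARM7TDMI or ARM9TDMI figures.

```python
def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second = clock rate (MHz) / CPI."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return clock_mhz / cpi


# Hypothetical baseline core: slower clock, higher CPI.
baseline = mips(clock_mhz=50.0, cpi=2.0)

# Hypothetical improved core: a deeper pipeline raises the clock rate,
# and separate instruction and data ports lower the CPI.
improved = mips(clock_mhz=100.0, cpi=1.5)

# The speedup is the product of the clock-rate gain and the CPI gain.
speedup = improved / baseline
print(f"baseline {baseline:.1f} MIPS, improved {improved:.1f} MIPS, "
      f"speedup {speedup:.2f}x")
```

With these assumed values the clock rate doubles and the CPI improves by a factor of 2.0/1.5, so the overall speedup is their product, about 2.67.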

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
