

on and data memory ports) which provides the higher memory bandwidth required to support higher performance. Only those instructions that require data memory access (loads and stores) pass through the data memory, so the pipeline depth is greater for these instructions than, for example, for simple data processing instructions. Further details on each of the pipeline stages are given below.

AMULET3 core organization

Figure 14.10 AMULET3 core organization.

Instruction prefetch unit

The prefetch unit operates autonomously, fetching 32-bit (one ARM or two Thumb) instruction packets from memory whenever it has room to store them. It incorporates a branch prediction unit that is similar to the jump trace buffer used in AMULET2, but has extensions to support branch prediction in Thumb code. A two-instruction Thumb packet may contain zero, one or two branch instructions, and the branch predictor must handle all of these cases. When executing Thumb code the 16-entry branch prediction unit works in two 8-entry halves, with branches at even half-word addresses being stored in one half and branches at odd half-word addresses in the other. A packet with a branch in each half word may 'hit' in both halves, whereupon the target of the first instruction (at the even address) takes priority.

AMULET3 includes a zero-power 'Halt' instruction, as did AMULET2. Here the halt takes effect in the prefetch unit rather than the execution unit, giving a faster resumption of full throughput when an interrupt occurs. Interrupts themselves are also handled here, again giving reduced latency.

Decode and register read

The decode stage includes Thumb and ARM decode logic, and the register read and forwarding mechanisms. The ARM7TDMI implemented the Thumb decode function by first converting Thumb instructions into their ARM equivalents, and then performing ARM decode. ARM9TDMI reduced decode latency by decoding Thumb instructions directly. AMULET3 adopts a middle way, decoding cert...
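
The Thumb-mode branch prediction lookup described under 'Instruction prefetch unit' can be sketched as a small software model. This is only an illustration, not the AMULET3 hardware design: the class name ThumbBranchPredictor, the methods record and predict, the dictionary-based lookup and the oldest-first replacement policy are all assumptions made for the example. The text specifies only the two 8-entry halves, the even/odd half-word split, and the priority of the even-address branch when both halves hit.

```python
from collections import OrderedDict


class ThumbBranchPredictor:
    """Toy model of a 16-entry predictor split into two 8-entry halves.

    Hypothetical sketch: the lookup structure and replacement policy are
    assumptions, not taken from the AMULET3 design.
    """

    HALF_ENTRIES = 8

    def __init__(self):
        # halves[0] holds branches at even half-word addresses (address bit 1 == 0),
        # halves[1] holds branches at odd half-word addresses (address bit 1 == 1).
        self.halves = (OrderedDict(), OrderedDict())

    @staticmethod
    def _half_index(addr):
        # Bit 1 of the byte address selects the half word within a 32-bit packet.
        return (addr >> 1) & 1

    def record(self, branch_addr, target):
        """Remember a taken branch and its target in the matching half."""
        half = self.halves[self._half_index(branch_addr)]
        if branch_addr in half:
            half.pop(branch_addr)
        elif len(half) >= self.HALF_ENTRIES:
            half.popitem(last=False)  # assumed policy: evict the oldest entry
        half[branch_addr] = target

    def predict(self, packet_addr):
        """Return the predicted target for a two-instruction packet, or None."""
        base = packet_addr & ~0x3  # align to the 32-bit packet boundary
        even_target = self.halves[0].get(base)
        odd_target = self.halves[1].get(base + 2)
        # If both halves hit, the first instruction (even address) takes priority.
        if even_target is not None:
            return even_target
        return odd_target


predictor = ThumbBranchPredictor()
predictor.record(0x1000, 0x2000)  # branch in the first half word of the packet
predictor.record(0x1002, 0x3000)  # branch in the second half word
print(hex(predictor.predict(0x1000)))
```

With a branch recorded in each half word of the packet at 0x1000, the lookup hits in both halves and returns the target of the even-address branch.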

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
