3 on page 378 the design is based upon a set of

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: position corresponding to the destination register. In this figure the FIFO controls 16 registers (in AMULET 1 the FIFO has a column for each physical register, including the ARM banked registers) and is shown in a state where the first result to arrive will be written into rO, the second into r2, the third into r!2 and the fourth FIFO stage is empty. Register locking Figure 14.4 AMULET register lock FIFO organization. 380 The AMULET Asynchronous ARM Processors If a subsequent instruction requests r!2 as a source operand, an inspection of the FIFO column corresponding to r!2 (outlined in the figure) reveals whether or not r!2 is valid. AT anywhere in the column signifies a pending write to r!2, so its current value is obsolete. The read waits until the ' 1' is cleared, then it can proceed. The 'inspection' is implemented in hardware by a logical 'OR' function across the column for each register. This may appear hazardous since the data in the FIFO may move down the FIFO while the 'OR' output is being used. However, data moves in an asynchronous FIFO by being duplicated from one stage into the next, only then is it removed from the first stage, so a propagating ' 1' will appear alternately in one or two positions and it will never disappear completely. The 'OR' output will therefore be stable even though the data is moving. AMULET 1 depends entirely on the register locking mechanism to maintain register coherency, and as a result the execution pipeline is stalled quite frequently in typical code. (Register dependencies between consecutive instructions are common in typical code since the compiler makes no attempt to avoid them because standard ARM processors are insensitive to such dependencies.) AMULET! performance AMULET 1 was developed to demonstrate the feasibility of designing a fully asynchronous implementation of a commercial microprocessor architecture. The prototype chips were functional and ran test programs generated using standard ARM development tools. The performance of the prototype...
View Full Document

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.

Ask a homework question - tutors are online