ARM.SoC.Architecture

As with a clocked processor the basic approach is


...ain time-critical control signals directly and others via the ARM instruction decoder. The asynchronous pipeline allows some elasticity in its operation, and there is a separate Thumb decode stage that effectively collapses when ARM code is running.

The register read and forwarding stage collects operands from the register file and searches the reorder buffer for newer values, supplying the correct up-to-date value to the execution unit. If the correct value is not yet available the forwarding process will stall until it becomes available. Following the insights of the StrongARM designers, the AMULET3 register file has three read ports. This allows nearly all instructions to be issued in a single cycle.

Execute

The execution stage includes the adder, shifter and multiplier, and the PSRs also fit logically into this stage. The multiplier computes 8 bits of product in a cycle, but cycles much faster than the typical processor cycle time, so it computes a 32 x 32 product in around 20 ns. The adder uses the carry arbitration scheme also used in the ARM9TDMI (see 'Carry arbitration adder' on page 91).

The data memory interface operates as a separate pipeline stage in load and store instructions but is bypassed by other types of instruction. A load that addresses a slow memory location will therefore not hold up other instructions in the pipeline, provided that they do not use the loaded value.

The reorder buffer accepts results from the execution unit and the data memory interface in the order that they become available and stores them until they can be written back to the register file in program order. Once a result is in the reorder buffer it is available for forwarding to any instruction that needs it. The structure used to achieve this functionality is illustrated in Figure 14.11 on page 390. The buffer has four 'slots', and a slot is allocated to each result produced by an instruction when that instruction is issued. (Slots are also allocated to 'store' instructions so that memory faults can be handled correctly.) As soon as the result is available it is m...
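
The reorder buffer behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the AMULET3 circuit: the class name ReorderBuffer, its method names, and the dictionary used as a register file are all hypothetical, and the asynchronous handshaking of the real design is not modelled. It shows slots being allocated at issue, results arriving out of order, forwarding that searches for the newest matching value, and write-back in program order.

```python
from collections import deque


class ReorderBuffer:
    """Toy model (hypothetical names): slots are allocated at issue,
    filled in any order, and retired in program order."""

    def __init__(self, num_slots=4):
        self.num_slots = num_slots
        self.slots = deque()  # oldest instruction sits at the left

    def allocate(self, dest_reg):
        """Reserve a slot at issue time; return None if the buffer is full."""
        if len(self.slots) >= self.num_slots:
            return None  # the issuing instruction would have to wait
        slot = {"reg": dest_reg, "value": None, "ready": False}
        self.slots.append(slot)
        return slot

    def complete(self, slot, value):
        """Record a result as soon as the producing unit delivers it."""
        slot["value"] = value
        slot["ready"] = True

    def forward(self, reg):
        """Search newest-first for a value destined for `reg`.

        Returns ("hit", value), ("stall", None) when the newest producer
        has not finished, or ("miss", None) when the register file holds
        the current value.
        """
        for slot in reversed(self.slots):
            if slot["reg"] == reg:
                if slot["ready"]:
                    return ("hit", slot["value"])
                return ("stall", None)
        return ("miss", None)

    def retire(self, register_file):
        """Write back ready results from the oldest end, in program order."""
        retired = 0
        while self.slots and self.slots[0]["ready"]:
            slot = self.slots.popleft()
            if slot["reg"] is not None:  # a store slot has no destination
                register_file[slot["reg"]] = slot["value"]
            retired += 1
        return retired


def demo():
    """A slow load issued before a fast add: the add finishes first."""
    regs = {"r0": 0, "r1": 0}
    rob = ReorderBuffer()
    load = rob.allocate("r0")  # issued first, completes late
    add = rob.allocate("r1")   # issued second, completes early
    rob.complete(add, 7)
    events = [rob.forward("r1"), rob.forward("r0"), rob.retire(regs)]
    rob.complete(load, 42)
    events.append(rob.retire(regs))
    return events, regs
```

In demo(), the add result can be forwarded while the load is still outstanding, a reader of r0 has to stall, and nothing is written back until the older load completes, at which point both results retire in program order.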