Multiplier design all arm processors apart from the

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: the 2-bit multiplier to be implemented by a simple shift and add or subtract, possibly carrying the x 4 over to the next cycle. The control settings for the Nth cycle of the multiplication are shown in Table 4.3 on page 94. (Note that the x 2 case is also implemented with a subtract and carry; it could equally well use an add with no carry, but the control logic is slightly simplified with this choice.) Since this multiplication uses the existing shifter and ALU, the additional hardware it requires is limited to a dedicated two-bits-per-cycle shift register for the multiplier and a few gates for the Booth's algorithm control logic. In total this amounts to an overhead of a few per cent on the area of the ARM core. 94 ARM Organization and Implementation Table 4.3 The 2-bit multiplication algorithm, Nth cycle. High-speed multiplier Where multiplication performance is very important, more hardware resource must be dedicated to it. In some embedded systems the ARM core is used to perform real-time digital signal processing (DSP) in addition to general control functions. DSP programs are typically multiplication intensive and the performance of the multiplication hardware can be critical to meeting the real-time constraints. The high-performance multiplication used in some ARM cores employs a widely-used redundant binary representation to avoid the carry-propagate delays associated with adding partial products together. Intermediate results are held as partial sums and partial carries where the true binary result is obtained by adding these two together in a carry-propagate adder such as the adder in the main ALU, but this is only done once at the end of the multiplication. During the multiplication the partial sums and carries are combined in carry-save adders where the carries only propagate across one bit per addition stage. This gives the carry-save adder a much shorter logic path than the carry-propagate adder, which may have to propagate a carry across all 32 bits. Therefore several carry-save operations may be performed in a single clock cycle which can only accommodate one ca...
View Full Document

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.

Ask a homework question - tutors are online