CA3 - Chapter 3 Arithmetic for Computers (multiplication)...


Chapter 3
Arithmetic for Computers (multiplication)
csci4203/ece4363

Two's Complement Operations
• Negating a two's complement number: invert all bits and add 1
  – "negate" and "invert" are quite different!
• Converting n-bit numbers into numbers with more than n bits:
  – MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  – copy the most significant bit (the sign bit) into the other bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  – "sign extension" (lbu vs. lb)

Overflow Detection
• Overflow: the result is too large (or too small) to represent properly
  – Example: –8 ≤ 4-bit binary number ≤ 7
• When adding operands with different signs, overflow cannot occur!
• Overflow occurs when adding:
  – 2 positive numbers and the sum is negative
  – 2 negative numbers and the sum is positive
• Detect overflow by: Carry into MSB ≠ Carry out of MSB

Examples of overflow using 4-bit 2's complement addition
      0111    7          1100   –4
    + 0011    3        + 1011   –5
      ----               ----
      1010   –6          0111    7

Overflow Detection Logic
• Carry into MSB ≠ Carry out of MSB
  – For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
[Figure: four chained 1-bit ALUs with inputs A0..A3, B0..B3, CarryIn0..CarryIn3, outputs Result0..Result3, CarryOut0..CarryOut3; CarryIn3 and CarryOut3 feed an XOR gate that produces Overflow]

    X   Y   X XOR Y
    0   0   0
    0   1   1
    1   0   1
    1   1   0

MIPS arithmetic instructions
    Instruction          Example          Meaning                          Comments
    add                  add $1,$2,$3     $1 = $2 + $3                     3 operands; exception possible
    subtract             sub $1,$2,$3     $1 = $2 – $3                     3 operands; exception possible
    add immediate        addi $1,$2,10    $1 = $2 + 10                     + constant; exception possible
    add unsigned         addu $1,$2,$3    $1 = $2 + $3                     3 operands; no exceptions
    subtract unsigned    subu $1,$2,$3    $1 = $2 – $3                     3 operands; no exceptions
    add imm. unsign.     addiu $1,$2,10   $1 = $2 + 10                     + constant; no exceptions
    multiply             mult $2,$3       Hi, Lo = $2 x $3                 64-bit signed product
    multiply unsigned    multu $2,$3      Hi, Lo = $2 x $3                 64-bit unsigned product
    divide               div $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3     Lo = quotient, Hi = remainder
    divide unsigned      divu $2,$3       Lo = $2 ÷ $3, Hi = $2 mod $3     Unsigned quotient & remainder
    Move from Hi         mfhi $1          $1 = Hi                          Used to get copy of Hi
    Move from Lo         mflo $1          $1 = Lo                          Used to get copy of Lo

MULTIPLY (unsigned)
• Paper and pencil example (unsigned):
    Multiplicand        1000
    Multiplier        x 1001
                        1000
                       0000
                      0000
                     1000
    Product         01001000
• m bits x n bits = m+n bit product
• Binary makes it easy:
  – 0 => place 0 (0 x multiplicand)
  – 1 => place a copy (1 x multiplicand)

Unsigned Combinational Multiplier
[Figure: 4 x 4 array multiplier; rows of A3 A2 A1 A0 gated by B0, B1, B2, B3, with 0s fed into the first row, summed to form P7..P0]
• Q: How much hardware for 32 bit multiplier? What is the critical path?

How does it work?
• Stage i accumulates A * 2^i if Bi == 1
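The per-stage accumulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `array_multiply` and the width parameter `n` are made up for this sketch, and plain integers stand in for the rows of adders.

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n-bit x n-bit multiply, one stage per multiplier bit.

    Hypothetical helper for illustration; not part of the slides.
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    product = 0
    for i in range(n):
        if (b >> i) & 1:           # Bi == 1
            product += a << i      # stage i contributes A * 2^i
    return product                 # fits in 2n bits


# The paper-and-pencil example: 1000 x 1001
print(format(array_multiply(0b1000, 0b1001), "08b"))  # 01001000
```

Each loop iteration corresponds to one row of the array: the multiplicand is shifted left by the row index and added only when the matching multiplier bit is 1.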
[Figure: the same adder array; each row adds A3 A2 A1 A0, shifted one more position left and gated by B0..B3, into the running sum that forms P7..P0]
• at each stage shift A left (x 2)
• use next bit of B to determine whether to add in shifted multiplicand
• accumulate 2n-bit partial product at each stage

Unsigned shift-add multiplier (version 1)
• 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure: Multiplicand register (64 bits, Shift Left) and Product register (64 bits, Write) feed a 64-bit ALU; Multiplier register (32 bits, Shift Right); Control]

Multiply Algorithm (Version 1)
Start
1. Test Multiplier0
   – Multiplier0 = 1: 1a. Add multiplicand to product & place the result in Product register
   – Multiplier0 = 0: skip the add
2. Shift the Multiplicand reg left 1 bit
3. Shift the Multiplier reg right 1 bit
32nd repetition?
   – No: < 32 repetitions, go back to step 1
   – Yes: 32 repetitions, Done

                       Product      Multiplier   Multiplicand
    Initial            0000 0000    0011         0000 0010
    After repetition 1 0000 0010    0001         0000 0100
    After repetition 2 0000 0110    0000         0000 1000
    After repetition 3 0000 0110    0000         0001 0000
    After repetition 4 0000 0110    0000         0010 0000
    Result             0000 0110

Observations on Multiply Version 1
• 1 clock per step (about 3 steps x 32 repetitions) => ≈ 100 clocks per multiply
  – Ratio of multiply to add 5:1 to 100:1
• 1/2 bits in multiplicand always 0 => 64-bit adder is wasted
• 0's inserted in right of multiplicand as shifted => least significant bits of product never changed once formed
• Instead of shifting multiplicand to left, shift product to right?

MULTIPLY HARDWARE (Version 2)
• 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure: Multiplicand register (32 bits) feeds a 32-bit ALU; Product register (64 bits, HI and LO halves, Shift Right, Write); Multiplier register (32 bits, Shift Right); Control]

Original combinational multiplier:
[Figure: 4 x 4 array multiplier; rows of A3 A2 A1 A0 gated by B0..B3, each row offset one position left, summed to P7..P0]

Simply warp to let product move right...
[Figure: the same array with the rows of A3 A2 A1 A0 stacked in one column, gated by B0..B3; the product bits P7..P0 shift out to the right]
• Multiplicand stays still and product moves right

Multiply Algorithm (Version 2)
Start
1. Test Multiplier0
   – Multiplier0 = 1: 1a. Add multiplicand to the left half of product & place the result in the left half of Product reg
   – Multiplier0 = 0: skip the add
2. Shift the Product reg right 1 bit
3. Shift the Multiplier reg right 1 bit
32nd repetition?
   – No: < 32 repetitions, go back to step 1
   – Yes: 32 repetitions, Done

    Step     Product      Multiplier   Multiplicand
    Initial  0000 0000    0011         0010
    1:       0010 0000    0011         0010
    2:       0001 0000    0011         0010
    3:       0001 0000    0001         0010
    1:       0011 0000    0001         0010
    2:       0001 1000    0001         0010
    3:       0001 1000    0000         0010
    1:       0001 1000    0000         0010
    2:       0000 1100    0000         0010
    3:       0000 1100    0000         0010
    1:       0000 1100    0000         0010
    2:       0000 0110    0000         0010
    3:       0000 0110    0000         0010
    Result   0000 0110    0000         0010

Still wasted spaces
[Same Version 2 flowchart and 2 x 3 trace as on the previous slide]
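A minimal Python sketch of the Version 2 loop, assuming the operands are unsigned n-bit values held in plain integers. The name `multiply_v2` and the parameter `n` are illustrative and do not come from the slides. Python integers are unbounded, so the carry out of the adder is kept automatically until the right shift brings the product back into 2n bits.

```python
def multiply_v2(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Shift-add multiply where the product register shifts right.

    Hypothetical helper for illustration; models the Version 2 flowchart.
    """
    assert 0 <= multiplicand < (1 << n) and 0 <= multiplier < (1 << n)
    product = 0                              # 2n-bit Product register
    for _ in range(n):
        if multiplier & 1:                   # 1.  Test Multiplier0
            product += multiplicand << n     # 1a. add into the left half
        product >>= 1                        # 2.  shift Product right 1 bit
        multiplier >>= 1                     # 3.  shift Multiplier right 1 bit
    return product


# The 2 x 3 trace from the table: multiplicand 0010, multiplier 0011
print(format(multiply_v2(0b0010, 0b0011), "08b"))  # 00000110
```

The multiplicand never moves; only the product and multiplier shift, which is why a 32-bit adder is enough in the hardware version.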
Observations on Multiply Version 2
• Product register wastes space that exactly matches size of multiplier => combine Multiplier register and Product register

MULTIPLY HARDWARE (Version 3)
• 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, (0-bit Multiplier reg)
[Figure: Multiplicand register (32 bits) feeds a 32-bit ALU; combined Product (Multiplier) register (64 bits, Shift Right, Write); Control]

Observations on Multiply Version 3
• 2 steps per bit because Multiplier & Product combined
• MIPS registers Hi and Lo are left and right half of Product
• Gives us MIPS instruction MultU
• How can you make it faster?
  – One 32-bit adder for each bit of multiplier?
  – More adders to support handling multiple bits per step? e.g. 2 adders to handle 2 bits per step.

Fast Multiplication Hardware
[Figure: adders combining the partial products Mplier0*Mcand, Mplier1*Mcand, Mplier2*Mcand, Mplier3*Mcand, ..., Mplier31*Mcand into Product63..32 ... Product0]

What about signed multiplication?
• Easiest solution is to make both positive & remember whether to complement product when done (leave out the sign bit, run for 31 steps)
• Apply definition of 2's complement
  – need to sign-extend partial products and subtract at the end
• Booth's Algorithm is an elegant way to multiply signed numbers
  – using same hardware as before and save cycles
  – can handle multiple bits at a time

Motivation for Booth's Algorithm
• Example 2 x 6 = 0010 x 0110:
                0010
      x         0110
      +         0000     shift (0 in multiplier)
      +        0010      add (1 in multiplier)
      +       0010       add (1 in multiplier)
      +      0000        shift (0 in multiplier)
            00001100
• ALU with add or subtract gets same result in more than one way:
      6 = 2 + 4 = –2 + 8
      0110 = –00010 + 01000 = 11110 + 01000
• For example
                0010
      x         0110
                0000     shift (0 in multiplier)
      –        0010      sub (first 1 in multpl.)
              0000       shift (mid string of 1s)
      +      0010        add (prior step had last 1)
            00001100

Motivation for Booth's Algorithm
• A * 01111 = A*8 + A*4 + A*2 + A*1 = A*16 – A*1
• A * 011110 = A*16 + A*8 + A*4 + A*2 = A*32 – A*2
• Based on the observation: –1 + 10000 = 01111
• Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one

Booth's Algorithm
[Figure: multiplier bits 0 1 1 1 1 0 with the end of run, middle of run, and beginning of run marked]

    Current Bit   Bit to the Right   Explanation           Example      Op
    1             0                  Begins run of 1s      0001111000   sub
    1             1                  Middle of run of 1s   0001111000   none
    0             1                  End of run of 1s      0001111000   add
    0             0                  Middle of run of 0s   0001111000   none

• Originally for speed (when shift was faster than add)
• Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one

Booth's Example (2 x 7)
    Operation          Multiplicand   Product        next?
    0. initial value   0010           0000 0111 0    10 -> sub
    1a. P = P - m      1110           + 1110
                                      1110 0111 0    shift P (sign ext)
    1b.                0010           1111 0011 1    11 -> nop, shift
    2.                 0010           1111 1001 1    11 -> nop, shift
    3.                 0010           1111 1100 1    01 -> add
    4a.                0010           + 0010
                                      0001 1100 1    shift
    4b.                0010           0000 1110 0    done

Booth's Example (2 x –3)
    Operation          Multiplicand   Product        next?
    0. initial value   0010           0000 1101 0    10 -> sub
    1a. P = P - m      1110           + 1110
                                      1110 1101 0    shift P (sign ext)
    1b.                0010           1111 0110 1    01 -> add
    2a.                0010           + 0010
                                      0001 0110 1    shift P
    2b.                0010           0000 1011 0    10 -> sub
    3a.                0010           + 1110
                                      1110 1011 0    shift
    3b.                0010           1111 0101 1    11 -> nop
    4a.                0010           1111 0101 1    shift
    4b.                0010           1111 1010 1    done

Radix-4 Modified Booth's ⇒ Multiple representations
    Current Bits   Bit to the Right   Explanation        Example          Recode
    00             0                  Middle of zeros    00 00 00 00 00   00 (0)
    01             0                  Single one         00 00 00 01 00   01 (1)
    10             0                  Begins run of 1s   00 01 11 10 00   10 (–2)
    11             0                  Begins run of 1s   00 01 11 11 00   01 (–1)
    00             1                  Ends run of 1s     00 00 11 11 00   01 (1)
    01             1                  Ends run of 1s     00 01 11 11 00   10 (2)
    10             1                  Isolated 0         00 11 10 11 00   01 (–1)
    11             1                  Middle of run      00 11 11 11 00   00 (0)

Compiler Code Gen and Optimization
• Variable multiply by constants can be replaced by a sequence of shift and add
  – e.g. A*17 => A*16 + A
  – what is the longest sequence a compiler could generate?
• Table lookup may be used for byte multiplication.
• For architectures with no fixed-point multiply instructions,
  – fp multiply may be used
  – a library routine can be used (called millicode on HP-PA)

What is the critical path?
• Critical path of an n-bit ripple-carry adder is n*CP
[Figure: four chained 1-bit ALUs with inputs A0..A3, B0..B3, CarryIn0..CarryIn3, outputs Result0..Result3, CarryOut0..CarryOut3]

Carry Look Ahead
• Design Trick: Throw hardware at it

    A   B   C-out
    0   0   0       "kill"
    0   1   C-in    "propagate"
    1   0   C-in    "propagate"
    1   1   1       "generate"

    G = A and B
    P = A xor B

    C0 = Cin
    C1 = G0 + C0 • P0
    C2 = G1 + G0 • P1 + C0 • P0 • P1
    C3 = G2 + G1 • P2 + G0 • P1 • P2 + C0 • P0 • P1 • P2
    C4 = . . .
[Figure: four 1-bit stages with inputs A0 B0 .. A3 B3, each producing G, P, and S, plus overall G and P outputs]

Cascaded Carry Look-ahead
[Figure: a CLA block fed by C0 and by the G and P outputs (G0, P0, ...) of chained 4-bit adders, returning the carries C1..C4 to them]
    C1 = G0 + C0 • P0
    C2 = G1 + G0 • P1 + C0 • P0 • P1
    C3 = G2 + G1 • P2 + G0 • P1 • P2 + C0 • P0 • P1 • P2
    C4 = . . .
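The lookahead equations can be checked with a short Python sketch for a single 4-bit block. The names `cla_carries` and `cla_add4` are hypothetical, and the `c4` line simply continues the pattern of the C1..C3 equations; the slides leave C4 elided, so treat that line as an assumption of this sketch.

```python
def cla_carries(a: int, b: int, c0: int = 0):
    """Compute carries for a 4-bit add from generate/propagate signals.

    Hypothetical helper for illustration; returns ([C0..C4], [P0..P3]).
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # G = A and B
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # P = A xor B
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    # Assumption: C4 follows the same pattern (not spelled out in the slides).
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c0, c1, c2, c3, c4], p


def cla_add4(a: int, b: int, c0: int = 0):
    """Return (4-bit sum, carry out) using the lookahead carries."""
    carries, p = cla_carries(a, b, c0)
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i    # S_i = P_i xor C_i
    return total, carries[4]


print(cla_add4(0b0111, 0b0011))  # (10, 0): sum bits 1010, no carry out
```

Every carry depends only on the G and P signals and on C0, never on another carry, which is what removes the ripple from the critical path at the cost of wider gates.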
Shifter
Two kinds:
  – logical: value shifted in is always "0"
  – arithmetic: on right shifts, sign extend
[Figure: logical shifter, "0" -> msb ... lsb <- "0"; arithmetic shifter, msb ... lsb <- "0"]
Note: these are single-bit shifts. A given instruction might request 0 to 32 bits to be shifted!
...
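The difference between the two kinds of shift can be sketched in Python on 32-bit values. The function names borrow the MIPS mnemonics `sll`, `srl`, and `sra`; the helpers themselves and the `MASK32` constant are assumptions of this sketch, with masking standing in for a fixed 32-bit register.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def sll(x: int, shamt: int) -> int:
    """Shift left logical: zeros enter at the lsb."""
    return (x << shamt) & MASK32


def srl(x: int, shamt: int) -> int:
    """Shift right logical: zeros enter at the msb."""
    return (x & MASK32) >> shamt


def sra(x: int, shamt: int) -> int:
    """Shift right arithmetic: copies of the sign bit enter at the msb."""
    x &= MASK32
    if x & 0x80000000:      # msb set: reinterpret as a negative number
        x -= 1 << 32
    return (x >> shamt) & MASK32


print(hex(srl(0x80000000, 4)))  # 0x8000000
print(hex(sra(0x80000000, 4)))  # 0xf8000000
```

For a value with the msb clear, `srl` and `sra` give the same result; they differ only when the sign bit is 1.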

This note was uploaded on 01/26/2011 for the course CSCI 4203 taught by Professor Weichunghsu during the Fall '05 term at Minnesota.
