L5_comp_arith_FP



CS 324 Computer Architecture
Lecture 5: Instruction Set Architecture / Complex Arithmetic

Sequential Circuits (memory elements)
Clocks
- needed to decide when state should be updated
- by convention, edge-triggered (state changes on a clock edge)
- free-running signal with a fixed cycle time (period)
- two parts: clock is high or clock is low
- frequency is the inverse of cycle time
[Figure: clock waveform with the rising edge labeled]

Sequential Circuits (memory elements)
Edge-triggered (synchronous system)
- signals written into state elements must be valid on the active edge
- valid => value won't change until inputs change
- the clock cycle must be long enough for signals in the combinational block to stabilize
- some state elements are written on every edge, others only under certain conditions

Building a 32-bit ALU
[Figure: 32 one-bit ALUs (ALU0 to ALU31) chained together. ALUi takes ai, bi, and CarryIn and produces Resulti and CarryOut; each CarryOut feeds the CarryIn of the next stage, and a shared Operation signal drives every stage.]
- ripple-carry adder: the carry out of the LSB can affect the carry out of the MSB

Ripple-carry Adders
- The MSB waits for sequential evaluation of all adders
  - too slow for time-critical hardware
- Time to circuit output is proportional to the maximum number of logic levels through which a signal passes
  - it takes 2 levels to compute c1 from a0, b0, and c0, two more to compute c2 from a1, b1, and c1, and so on
  - with n adders, 2n levels
- Slowest adder, but cheapest
  - built with n simple cells connected in a simple, regular way

Speeding Up Addition
- Why is this important?
  - FP operations eventually reduce to integer ops
  - Even with no explicit arithmetic, the processor must increment the PC and manipulate addresses
- Problem: the ripple-carry adder is slow
- Is there more than one way to do addition?
  - two extremes: ripple carry and sum-of-products
- Can you see the ripple? How could you get rid of it?
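One way to see the ripple is to write the adder chain as a loop. The sketch below is a minimal Python model and not part of the lecture; the names `full_adder` and `ripple_carry_add` are hypothetical. Each iteration needs the carry produced by the previous one, which is the serial dependence that makes the hardware slow.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    # Carry out is the majority function of the three inputs.
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=32, carry_in=0):
    """Add two unsigned integers bit by bit; returns (result, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        # Bit i cannot be computed until the carry from bit i-1 is known.
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_carry_add(5, 7, width=8))
    print(ripple_carry_add(0xFFFFFFFF, 1))
```

The second call is the worst case for this design: a carry created at bit 0 has to travel through all 32 stages before the carry out settles.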
$$
\begin{aligned}
c_1 &= b_0 c_0 + a_0 c_0 + a_0 b_0 \\
c_2 &= b_1 c_1 + a_1 c_1 + a_1 b_1 \\
c_3 &= b_2 c_2 + a_2 c_2 + a_2 b_2 \\
c_4 &= b_3 c_3 + a_3 c_3 + a_3 b_3
\end{aligned}
$$

Not feasible! Why?
- rewrite $c_2$ to get rid of $c_1$:

$$
\begin{aligned}
c_1 &= b_0 c_0 + a_0 c_0 + a_0 b_0 \\
c_2 &= b_1 c_1 + a_1 c_1 + a_1 b_1 \\
    &= (b_1 + a_1)(b_0 c_0 + a_0 c_0 + a_0 b_0) + a_1 b_1 \\
    &= b_1 b_0 c_0 + b_1 a_0 c_0 + b_1 a_0 b_0 + a_1 b_0 c_0 + a_1 a_0 c_0 + a_1 a_0 b_0 + a_1 b_1
\end{aligned}
$$

- Imagine how this equation expands as we get to higher bits!
- The equation expands exponentially with the number of bits. It requires too much hardware.

Carry-lookahead adder (CLA): an approach in between our two extremes

$$
\mathit{sum}_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i
$$

- If we don't know the value of the carry-in ($c_i$), what could we do?
  - we want some way to write $c_i$ in terms of $a_i$ and $b_i$

$$
\begin{aligned}
c_{i+1} &= a_i b_i + a_i c_i + b_i c_i \\
\text{factored:}\quad c_{i+1} &= a_i b_i + (a_i + b_i)\, c_i \\
g_i &= a_i b_i \qquad p_i = a_i + b_i
\end{aligned}
$$

- When would we always generate a carry?
- When would we propagate the carry?

$$
\begin{aligned}
c_1 &= g_0 + p_0 c_0 \\
c_2 &= g_1 + p_1 c_1 \\
c_3 &= g_2 + p_2 c_2 \\
c_4 &= g_3 + p_3 c_3
\end{aligned}
$$

- Consider what happens when $g_i = 1$. What about when $g_i = 0$ and $p_i = 1$?
- When $g_i = 1$, the adder generates a carry $c_{i+1}$ independent of the value of $c_i$. When $g_i = 0$ and $p_i = 1$, $c_{i+1} = c_i$: the incoming carry is propagated.

Carry-lookahead adder (CLA)
- With $g_i = a_i b_i$ and $p_i = a_i + b_i$, begin replacing each $c_i$ with its expression in terms of $c_{i-1}$ in $c_1 = g_0 + p_0 c_0$, $c_2 = g_1 + p_1 c_1$, $c_3 = g_2 + p_2 c_2$, $c_4 = g_3 + p_3 c_3$.
- Feasible!
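The substitution described above can be checked with a small Python sketch. It is an illustration under assumptions, not the lecture's circuit: the function names `cla_carries` and `ripple_carries` are hypothetical, and the nested loop stands in for gates that would all evaluate in parallel in hardware. Every carry is built only from the generate and propagate signals and $c_0$, never from another carry.

```python
def cla_carries(x, y, c0=0, width=4):
    """Return [c0, c1, ..., c_width], each computed from g, p, and c0 only."""
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(width)]  # g_i = a_i b_i
    p = [((x >> i) & 1) | ((y >> i) & 1) for i in range(width)]  # p_i = a_i + b_i
    carries = [c0]
    for i in range(1, width + 1):
        c = 0
        prefix = 1  # running product p_{i-1} p_{i-2} ... p_{j+1}
        for j in range(i - 1, -1, -1):
            c |= prefix & g[j]   # term p_{i-1} ... p_{j+1} g_j
            prefix &= p[j]
        c |= prefix & c0         # term p_{i-1} ... p_0 c_0
        carries.append(c)
    return carries


def ripple_carries(x, y, c0=0, width=4):
    """Reference: the same carries computed one after another."""
    carries = [c0]
    for i in range(width):
        a, b, c = (x >> i) & 1, (y >> i) & 1, carries[-1]
        carries.append((a & b) | (a & c) | (b & c))
    return carries


if __name__ == "__main__":
    print(cla_carries(0b1011, 0b0110))
    print(ripple_carries(0b1011, 0b0110))
```

Both functions return the same list for every input; the difference that matters in hardware is that the lookahead form has no carry-to-carry dependence.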
CLA

$$
c_i = g_{i-1} + p_{i-1} g_{i-2} + p_{i-1} p_{i-2} g_{i-3} + \dots + p_{i-1} p_{i-2} \cdots p_1 g_0 + p_{i-1} p_{i-2} \cdots p_1 p_0 c_0
$$

- Consider the circuitry for a CLA for 3 bits
  - recall $g_i = a_i b_i$, $p_i = a_i + b_i$

$$
c_2 = g_1 + p_1 c_1 = g_1 + p_1 (g_0 + p_0 c_0) = g_1 + p_1 g_0 + p_1 p_0 c_0
$$

Use principle to build bigger adders
- Could use ripple carry of 4-bit CLA adders
  - each 4-bit adder block sends generate/propagate signals for the next 4-bit adder block:

$$
P_0 = p_3 p_2 p_1 p_0 \qquad G_0 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0
$$

- $P_i$ is true iff every bit in the group propagates a carry
- $G_i$ is true iff some bit in the group generates a carry and all the intermediate (higher) bits in the group propagate it

Use principle to build bigger adders
[Figure: 16-bit adder built from four 4-bit blocks. ALU0 takes a0-a3, b0-b3, and CarryIn and produces Result0-3; ALU1 handles bits 4-7, ALU2 bits 8-11, and ALU3 bits 12-15. Each block sends its Pi and Gi to a carry-lookahead unit, which returns C1, C2, and C3 as the carry-ins of the next blocks and C4 as CarryOut.]

Hierarchical scheme is used
- usually consists of 4- or 8-bit CLAs
- these feed another level of lookahead circuitry that produces the carry-in to each of the blocks at the same time
- All logic begins evaluating at the clock tick
- The result does not change once the output of each gate stops changing
- Fewer gates are traversed to send the carry signal
  - the output stops changing sooner
  - the time for the adder is less

Multiplication
- More complicated than addition
  - accomplished via shifting and addition
- More time and more area
- Let's look at 3 versions based on the grade-school algorithm:
  - an n-bit x m-bit result is m+n bits
  - must cope with overflow

        0010    (multiplicand)
      x 1011    (multiplier)
      ------
        0010    multiplicand shifted by 0 bits
       0010     multiplicand shifted by 1 bit
      0000      (multiplier bit is 0)
     0010       multiplicand shifted by 3 bits
     -------
       10110

- Negative numbers: convert and multiply
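The grade-school layout above can be reproduced with a short Python sketch. The helper name `partial_products` is hypothetical and the code handles unsigned operands only; it lists one row per multiplier bit and sums the rows.

```python
def partial_products(multiplicand, multiplier, width=4):
    """One row per multiplier bit: the shifted multiplicand, or 0 if the bit is 0."""
    rows = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        rows.append((multiplicand << i) if bit else 0)
    return rows


if __name__ == "__main__":
    rows = partial_products(0b0010, 0b1011)
    for row in rows:
        print(format(row, "08b"))        # each partial product, 8 bits wide
    print(format(sum(rows), "08b"))      # the final product
```

With 4-bit operands the product needs up to 8 bits, which is why the rows are printed 8 bits wide.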
Multiplication
- For simple multipliers
  - we shift and add the multiplicand to a result register repeatedly
  - at each shift, test the corresponding digit of the multiplier
    - if 0, we skip the add
    - if 1, we carry out the add
- clever schemes take advantage of the space vacated by shifts

Multiplication: Implementation
[Figure: datapath with a 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write), and a control test.]
Start
1. Test Multiplier0.
   1a. If Multiplier0 = 1, add the multiplicand to the product and place the result in the Product register. If Multiplier0 = 0, skip the add.
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition? No (< 32 repetitions): repeat from step 1. Yes (32 repetitions): Done.

Register contents for 0010 x 1011, as shown on the slides (values after step 1 of each repetition):

  Repetition   Multiplicand    Multiplier    Product
  1            0000...0010     0000...1011   0000...0010
  2            0000...00100    0000...0101   0000...0110
  3            0000...001000   0000...0010   0000...0110   (no add)
  4            0000...010000   0000...0001   0000...10110

but ...
- at 1 cycle per step, it takes about 100 cycles to complete (3 steps in each of 32 repetitions)
- as we shifted the multiplicand left, it filled with zeros on the right, so half of the 64 multiplicand bits are wasted. Better to shift the product right; then only 32 bits are needed for the multiplicand, and the adder need only be 32 bits.
Note: See nice example p. 253

Second Version
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 32-bit Multiplier register (shift right), a 64-bit Product register (shift right, write), and a control test.]
Start
1. Test Multiplier0.
   1a. If Multiplier0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the Product register. If Multiplier0 = 0, skip the add.
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition? No (< 32 repetitions): repeat from step 1. Yes (32 repetitions): Done.

but ...
- the product register has wasted space the size of the multiplier
- as the wasted space in the product disappears, so do the bits of the multiplier!

Final Version
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), and a control test; there is no separate Multiplier register.]
Start
1. Test Product0.
   1a. If Product0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the Product register. If Product0 = 0, skip the add.
2. Shift the Product register right 1 bit.
32nd repetition? No (< 32 repetitions): repeat from step 1. Yes (32 repetitions): Done.
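The final version can be modeled in a few lines of Python. This is a sketch under assumptions rather than the lecture's hardware: the name `multiply_final` is hypothetical, operands are unsigned, and the multiplier is loaded into the right half of the product register before the loop starts. Python integers are unbounded, so the carry out of the add is kept automatically and shifted back in, as the hardware's carry bit would be.

```python
def multiply_final(multiplicand, multiplier, width=32):
    """Unsigned shift-and-add multiply using one 2*width-bit product register."""
    low_mask = (1 << width) - 1
    multiplicand &= low_mask
    product = multiplier & low_mask   # right half starts out holding the multiplier
    for _ in range(width):
        if product & 1:               # step 1: test Product0
            # step 1a: add the multiplicand to the left half of the product
            product += multiplicand << width
        product >>= 1                 # step 2: shift the Product register right
    return product


if __name__ == "__main__":
    print(format(multiply_final(0b0010, 0b1011, width=4), "08b"))
    print(multiply_final(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF ** 2)
```

Each right shift discards a multiplier bit that has already been tested and frees one more bit for the growing product, which is how a single register serves both purposes.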
- the multiplier is assigned to the right half of the Product register
- add to the left half of the Product register, then
- shift right

Multiplication
- Multiplication via shift-and-add hardware
- Compilers often use shifts for multiplication by $2^n$
- Signed multiplication requires sign-bit replication (sign extension) with each shift
- Other algorithms take advantage of factorization
  - Booth's algorithm

Floating-Point
- What can be represented in N bits?
  - Unsigned: $0$ to $2^N - 1$
  - 2s complement: $-2^{N-1}$ to $2^{N-1} - 1$
  - 1s complement: $-2^{N-1} + 1$ to $2^{N-1} - 1$
- But what about
  - very large numbers? 9,349,398,989,787,762,244,859,087,678
  - very small numbers? 0.0000000000000000000000045691
  - rationals? 2/3
  - irrationals? $\sqrt{2}$

Recall Scientific Notation
- Examples: $6.02 \times 10^{23}$ and $1.673 \times 10^{-24}$
  - mantissa (sign, magnitude) with a decimal point
  - radix (base)
  - exponent (sign, magnitude)
- IEEE F.P.: $\pm 1.M \times 2^{E - 127}$
- Issues:
  - Arithmetic (+, -, *, /)
  - Representation, normal form (no leading 0s left of the decimal point)
  - Range and precision
  - Rounding
  - Exceptions (e.g., divide by zero, overflow, underflow)
  - Errors
  - Properties (negation, inversion, if $A \neq B$ then $A - B \neq 0$)

Floating Point (a brief look)
- We need a way to represent
  - numbers with fractions, e.g., 3.1416
  - very small numbers, e.g., .000000001
  - very large numbers, e.g., $3.15576 \times 10^9$
- Representation:
  - sign, exponent, significand (mantissa): $(-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$
  - more bits for the significand gives more accuracy
  - more bits for the exponent increases range

Floating Point (a brief look)
- IEEE 754 floating-point standard:
  - single precision: 8-bit exponent, 23-bit significand
  - double precision: 11-bit exponent, 52-bit significand
- The leading "1" bit of the significand is implicit
- The exponent is "biased" to make sorting easier
  - all 0s is the smallest exponent, all 1s is the largest
  - bias of 127 for single precision and 1023 for double precision
  - summary: $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$

IEEE 754 floating-point standard
- Example:
  - decimal: $-.75 = -3/4 = -3/2^2$
  - binary: $-.11_2 = -1.1_2 \times 2^{-1}$ (equivalently $-11_2 \times 2^{-2}$)
  - floating point: exponent = 126 = 01111110
  - IEEE single precision: 1 01111110 10000000000000000000000 (sign, exponent, significand), that is, $-1.1_2 \times 2^{126 - 127}$

Convert Binary FP to Decimal
1 10000001 01000000000000000000000
- sign = negative
- exp = 129
- fraction = $.01_2 = (0 \times 2^{-1}) + (1 \times 2^{-2}) = 1/4 = .25$
- value: $-1.25 \times 2^{129 - 127} = -1.25 \times 2^2 = -5.0$

Floating Point Complexities
- Operations are somewhat more complicated
- In addition to overflow (exponent too large) we can have "underflow" (negative exponent too large)
- Accuracy can be a big problem
  - IEEE 754 keeps two extra bits, guard and round
  - four rounding modes
  - positive divided by zero yields "infinity"
  - zero divided by zero yields "not a number"
  - other complexities

Decimal FP Addition: $9.999 \times 10^1 + 1.610 \times 10^{-1}$
Assume 4 digits for exponent and significand.
1) Shift the number with the smaller exponent until its exponent matches the larger exponent:
   $1.610 \times 10^{-1} = 0.1610 \times 10^0 = 0.01610 \times 10^1$ (too many digits), so use $0.016 \times 10^1$
2) Add the significands:
   $9.999 \times 10^1 + 0.016 \times 10^1 = 10.015 \times 10^1$ (too many digits), so use $10.02 \times 10^1$
3) Normalize and check for over/underflow (does the exponent still fit in its field?):
   $1.002 \times 10^2$
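Returning to the single-precision encoding used in the two conversion examples, the summary formula can be checked with a small Python sketch. The function names `decode_single` and `reference_single` are hypothetical; `decode_single` applies the formula by hand and covers normalized numbers only (biased exponent 1 to 254), while `reference_single` lets the standard `struct` module reinterpret the same 32 bits as a float.

```python
import struct


def decode_single(bits):
    """Decode a normalized IEEE 754 single given as a string of 0s and 1s."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)            # biased exponent field
    fraction = int(bits[9:], 2) / 2 ** 23   # 23-bit fraction as a value in [0, 1)
    if not 1 <= exponent <= 254:
        raise ValueError("zero, denormal, infinity, and NaN are not handled here")
    # (-1)^sign x (1 + significand) x 2^(exponent - bias), with bias = 127
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)


def reference_single(bits):
    """Reinterpret the same 32 bits as a big-endian float via struct."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


if __name__ == "__main__":
    for pattern in ("1 01111110 10000000000000000000000",
                    "1 10000001 01000000000000000000000"):
        print(decode_single(pattern), reference_single(pattern))
```

The two patterns are the ones worked through above, so the printed pairs should agree with the hand results of -0.75 and -5.0.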
Binary FP Addition: 0.5 + (-0.4375)
Assume 4 digits for exponent and significand.
- $0.5 = 1/2 = 1.000_2 \times 2^{-1}$
- $-0.4375 = -7/16 = -7 \times 2^{-4} = -0.0111_2 = -1.110_2 \times 2^{-2}$
1) Shift the number with the smaller exponent until its exponent matches the larger exponent:
   $-1.110_2 \times 2^{-2} = -0.111_2 \times 2^{-1}$
2) Add the significands:
   $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
3) Normalize and check for over/underflow (1 <= biased exp <= 254):
   $0.001_2 \times 2^{-1} = 1.000_2 \times 2^{-4} = 1/16 = 0.0625$
   in this case, -4 + 127 = 123, which is in range
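The three steps can be followed in a short Python sketch. It is a simplified model under stated assumptions: the name `fp_add` is hypothetical, significands are exact `Fraction` values, and rounding to a fixed number of significand bits is left out, so it shows only alignment, addition, and normalization with the single-precision exponent check.

```python
from fractions import Fraction


def fp_add(sig_a, exp_a, sig_b, exp_b, bias=127, max_biased=254):
    """Add sig_a * 2**exp_a and sig_b * 2**exp_b; returns (significand, exponent)."""
    # Step 1: make operand a the one with the larger exponent, then shift
    # the other significand right until the exponents match.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b / (2 ** (exp_a - exp_b))
    # Step 2: add the significands.
    total = sig_a + sig_b
    exponent = exp_a
    if total == 0:
        return Fraction(0), 0
    # Step 3: normalize so that 1 <= |significand| < 2.
    while abs(total) >= 2:
        total /= 2
        exponent += 1
    while abs(total) < 1:
        total *= 2
        exponent -= 1
    # Check that the biased exponent still fits in its field.
    if not 1 <= exponent + bias <= max_biased:
        raise OverflowError("exponent out of range (overflow or underflow)")
    return total, exponent


if __name__ == "__main__":
    # 0.5 = 1.000 x 2^-1 and -0.4375 = -1.110 x 2^-2
    sig, exp = fp_add(Fraction(1), -1, Fraction(-7, 4), -2)
    print(sig, exp, float(sig * Fraction(2) ** exp))
```

For the operands of the worked example the function returns a significand of 1 and an exponent of -4, which is 0.0625.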