CS 324 Computer Architecture
Lecture 5: Instruction Set Architecture / Complex Arithmetic

Sequential Circuits (memory elements)

Clocks
– needed to decide when state should be updated
    by convention, edge-triggered (state changes on a clock edge)
– free-running signal with a fixed cycle time (period)
    2 parts: clock is high or clock is low
– frequency is the inverse of the cycle time
[Figure label: rising edge]

Sequential Circuits (memory elements)

Edge-triggered (synchronous system)
– signals written into state elements must be valid on the active edge
    valid => value won’t change until inputs change
    clock cycle must be long enough for signals in the combinational block to stabilize
    some state elements are written on every edge, others only under certain conditions

Building a 32-bit ALU
[Figure: 32-bit ALU built from 32 one-bit ALUs, ALU0 to ALU31. ALUi has inputs ai, bi and CarryIn and outputs Resulti and CarryOut; Operation selects the function. Inset: one-bit cell with inputs a, b, cin and output cout.]
ripple carry adder: cout of the LSB can affect cout of the MSB

Ripple-carry Adders
MSB waits for sequential evaluation of all adders
– too slow for time-critical hardware
Time to circuit output is proportional to the max number of logic levels through which the signal passes
– takes 2 levels to compute $c_1$ from $a_0$ and $b_0$, and two more to compute $c_2$ from $a_1$, $b_1$, and $c_1$, and so on
– with n adders, 2n levels
Slowest adder, but cheapest
– built with n simple cells connected in a simple, regular way

Speeding Up Addition
Why is this important?
– FP operations eventually reduce to integer ops
– Even if no explicit arithmetic
    Must increment PC
    Manipulate addresses
Problem: ripple carry adder is slow
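The ripple-carry adder referred to above can be sketched in a few lines. This is only an illustrative software model, not the hardware: the names full_adder, ripple_carry_add, to_bits and from_bits are made up for this sketch, and bits are kept in Python lists with index 0 as the LSB. The loop makes the ripple visible, since each cell needs the carry produced by the cell before it.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    # carry out = b*cin + a*cin + a*b (the two-level form used on the slides)
    cout = (b & cin) | (a & cin) | (a & b)
    return s, cout


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists (index 0 = LSB); returns (sum_bits, carry_out)."""
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each cell waits for the carry from the previous cell: this is the ripple.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x, n):
    """Hypothetical helper: the low n bits of x, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, ripple_carry_add(to_bits(11, 4), to_bits(2, 4)) gives the bits of 13 with a carry out of 0.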
Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
    $c_1 = b_0 c_0 + a_0 c_0 + a_0 b_0$
    $c_2 = b_1 c_1 + a_1 c_1 + a_1 b_1$
    $c_3 = b_2 c_2 + a_2 c_2 + a_2 b_2$
    $c_4 = b_3 c_3 + a_3 c_3 + a_3 b_3$
Not feasible! Why?

Rewrite $c_2$ to get rid of $c_1$:
    $c_1 = b_0 c_0 + a_0 c_0 + a_0 b_0$
    $c_2 = b_1 c_1 + a_1 c_1 + a_1 b_1$
    $c_2 = (b_1 + a_1)(b_0 c_0 + a_0 c_0 + a_0 b_0) + a_1 b_1$
    $c_2 = b_1 b_0 c_0 + b_1 a_0 c_0 + b_1 a_0 b_0 + a_1 b_0 c_0 + a_1 a_0 c_0 + a_1 a_0 b_0 + a_1 b_1$
Imagine how this equation expands as we get to higher bits!
The equation expands exponentially with the number of bits. Requires too much hardware.

Carry-lookahead adder (CLA): An approach in between our two extremes
$sum_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$
– If we don't know the value of carry-in ($c_i$), what could we do?
    want some way to write $c_i$ in terms of $a_i$ and $b_i$
    $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
    factored: $c_{i+1} = a_i b_i + (a_i + b_i) c_i$
    $g_i = a_i b_i$
    $p_i = a_i + b_i$
– When would we always generate a carry?
– When would we propagate the carry?
    $c_1 = g_0 + p_0 c_0$
    $c_2 = g_1 + p_1 c_1$
    $c_3 = g_2 + p_2 c_2$
    $c_4 = g_3 + p_3 c_3$
Consider what happens when $g_i = 1$. What about when $g_i = 0$ and $p_i = 1$?
The adder generates a carry $c_{i+1}$, independent of the value of $c_i$, when $g_i = 1$.

Carry-lookahead adder (CLA)

$sum_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$; $g_i = a_i b_i$; $p_i = a_i + b_i$
begin replacing $c_i$ with $c_{i-1}$:
    $c_1 = g_0 + p_0 c_0$
    $c_2 = g_1 + p_1 c_1$
    $c_3 = g_2 + p_2 c_2$
    $c_4 = g_3 + p_3 c_3$
Feasible!
    $c_i = g_{i-1} + p_{i-1} g_{i-2} + p_{i-1} p_{i-2} g_{i-3} + \dots + p_{i-1} p_{i-2} \dots p_1 g_0 + p_{i-1} p_{i-2} \dots p_1 p_0 c_0$

CLA

$c_i = g_{i-1} + p_{i-1} g_{i-2} + p_{i-1} p_{i-2} g_{i-3} + \dots + p_{i-1} p_{i-2} \dots p_1 g_0 + p_{i-1} p_{i-2} \dots p_1 p_0 c_0$
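The expanded carry formula can be checked with a small sketch. The function name cla_carries and the helper to_bits are assumptions made for this illustration. The Python loops evaluate the terms one after another, whereas the point of the hardware is that every $c_i$ is a two-level AND-OR function of the $g$, $p$ and $c_0$ inputs, so all carries are available at about the same time.

```python
def to_bits(x, n):
    """Hypothetical helper: the low n bits of x, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def cla_carries(a_bits, b_bits, c0=0):
    """Return [c0, c1, ..., cn] computed only from g, p and c0 (no carry chain)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate:  g_i = a_i b_i
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate: p_i = a_i + b_i
    carries = [c0]
    for i in range(1, len(g) + 1):
        # c_i = g_{i-1} + p_{i-1} g_{i-2} + ... + p_{i-1}...p_1 g_0 + p_{i-1}...p_0 c_0
        ci = 0
        prefix = 1  # product of the propagate terms seen so far
        for j in range(i - 1, -1, -1):
            ci |= prefix & g[j]
            prefix &= p[j]
        ci |= prefix & c0
        carries.append(ci)
    return carries
```

For a = 1011 and b = 0110 with c0 = 0, cla_carries(to_bits(0b1011, 4), to_bits(0b0110, 4)) returns [0, 0, 1, 1, 1], the same carries a ripple-carry adder would produce.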
Consider the circuitry for a CLA for 3 bits:
– recall $g_i = a_i b_i$, $p_i = a_i + b_i$
    $c_2 = g_1 + p_1 c_1 = g_1 + p_1 (g_0 + p_0 c_0) = g_1 + p_1 g_0 + p_1 p_0 c_0$

Use principle to build bigger adders

Could use ripple carry of 4-bit CLA adders
– 4-bit adder block sends generate/propagate signals for the next 4-bit adder block:
    $P_0 = p_3 p_2 p_1 p_0$
    $G_0 = g_3 + (p_3 g_2) + (p_3 p_2 g_1) + (p_3 p_2 p_1 g_0)$
$P_i$ will be true iff each of the bits in the group propagates a carry
$G_i$ will be true iff an earlier generate is true and all intermediate propagates are true

Use principle to build bigger adders
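A minimal sketch of the group signals for a 16-bit adder made of four 4-bit blocks follows. The names group_pg and block_carries are hypothetical, and operands are passed as plain Python integers. The block carries are written here as the recurrence $C_{k+1} = G_k + P_k C_k$ in a loop; a lookahead unit would expand that recurrence into two-level logic in the same way as the bit-level carries.

```python
def to_bits(x, n):
    """Hypothetical helper: the low n bits of x, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def group_pg(a_bits, b_bits):
    """Group propagate P and generate G for one 4-bit block (bit lists, LSB first)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a | b for a, b in zip(a_bits, b_bits)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def block_carries(a, b, c0=0, blocks=4):
    """Return [C0, C1, ..., C_blocks]: the carry into each 4-bit block, then the carry out."""
    carries = [c0]
    for k in range(blocks):
        a_bits = to_bits(a >> (4 * k), 4)
        b_bits = to_bits(b >> (4 * k), 4)
        P, G = group_pg(a_bits, b_bits)
        # C_{k+1} = G_k + P_k C_k, using only the block's P and G signals
        carries.append(G | (P & carries[-1]))
    return carries
```

As a quick check, block_carries(0x00FF, 0x0001) returns [0, 0, 1, 0, 0]: the low block propagates nothing out, the second block (F + 0 with a carry in of 1, which comes from the low block's generate) carries into the third.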
[Figure: 16-bit adder built from four 4-bit ALUs and a carry-lookahead unit. ALU0 takes a0..a3, b0..b3 and CarryIn and produces Result0-3, P0 and G0; ALU1 takes a4..a7, b4..b7 and C1 and produces Result4-7, P1 and G1; ALU2 takes a8..a11, b8..b11 and C2 and produces Result8-11, P2 and G2; ALU3 takes a12..a15, b12..b15 and C3 and produces Result12-15, P3 and G3. The carry-lookahead unit takes the p and g signals and produces the carries $c_{i+1}$ to $c_{i+4}$; C4 is CarryOut.]

Hierarchical scheme is used
– usually consists of 4- or 8-bit CLAs
– feed another level of lookahead circuitry that produces the carry-in to each of the blocks at the same time
– All logic begins evaluating at tick
– Result does not change once output of each gate stops changing
Fewer gates are traversed to send the carry signal
    output stops changing sooner
    time for adder is less

Multiplication
More complicated than addition
– accomplished via shifting and addition
More time and more area
Let's look at 3 versions based on the grade school algorithm:
– n-bit x m-bit result is m+n bits
– must cope with overflow

        0010    (multiplicand)
    x   1011    (multiplier)
    --------
        0010    multiplicand shifted by 0 bits
       0010     multiplicand shifted by 1 bit
      0000
     0010       multiplicand shifted by 3 bits
    --------
       10110

Negative numbers: convert and multiply

Multiplication

For simple multipliers
– we shift and add the multiplicand to a result register repeatedly
– at each shift, test the corresponding digit of the multiplier
    if 0 we skip the add
    if 1 we carry out the add
Clever schemes take advantage of the space vacated by shifts

Multiplication: Implementation
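The shift-and-add loop of the first version can be sketched as follows, assuming unsigned operands. The function name multiply_v1 and the width parameter n are made up for this illustration; the registers are modelled as Python integers masked to the widths named on the slides (2n-bit Multiplicand and Product, n-bit Multiplier).

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """First version: Multiplicand shifts left, Multiplier shifts right, 2n-bit add."""
    mask_n = (1 << n) - 1
    mask_2n = (1 << (2 * n)) - 1
    mcand = multiplicand & mask_n      # held in the right half of a 2n-bit register
    mplier = multiplier & mask_n
    product = 0
    for _ in range(n):                 # n repetitions (32 on the slides)
        if mplier & 1:                 # 1.  test Multiplier0
            product = (product + mcand) & mask_2n   # 1a. add multiplicand to product
        mcand = (mcand << 1) & mask_2n  # 2.  shift Multiplicand left 1 bit
        mplier >>= 1                    # 3.  shift Multiplier right 1 bit
    return product
```

With 4-bit operands, multiply_v1(0b0010, 0b1011, n=4) returns 22 (binary 10110), matching the grade school example.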
[Figure: first-version multiplier hardware: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right), 64-bit ALU, 64-bit Product register (write), and Control test.]

Start
1. Test Multiplier0
    Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
    Multiplier0 = 0: skip to step 2
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
    No (< 32 repetitions): go back to step 1
    Yes (32 repetitions): Done

Register contents for 0010 x 1011, as shown on the successive slides:

    Repetition   Multiplicand    Multiplier   Product
    1            0000…0010       0000…1011    0000…0010
    2            0000…00100      0000…0101    0000…0110
    3            0000…001000     0000…0010    0000…0110
    4            0000…010000     0000…0001    0000…10110

but …
• if 1 cycle per step, then about 100 cycles to complete (3 steps x 32 repetitions)
• as we shifted, we filled with zeros, so the right multiplicand bits are wasted. Better to shift the product right; then we only need 32 bits for the multiplicand, and the adder need only be 32 bits
Note: See nice example p.253

Second Version
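A minimal sketch of the second version, again assuming unsigned operands, is given below. The name multiply_v2 and the parameter n are hypothetical. The multiplicand stays n bits wide and is added into the left half of the product, which is then shifted right. One assumption worth flagging: the add into the left half can produce a carry out of the top bit, and this sketch relies on Python's unbounded integers to hold that carry until the shift brings it back into range.

```python
def multiply_v2(multiplicand, multiplier, n=32):
    """Second version: n-bit add into the left half of Product, Product shifts right."""
    mask_n = (1 << n) - 1
    mcand = multiplicand & mask_n
    mplier = multiplier & mask_n
    product = 0                        # 2n-bit Product register
    for _ in range(n):
        if mplier & 1:                 # 1.  test Multiplier0
            product += mcand << n      # 1a. add multiplicand to the left half
        product >>= 1                  # 2.  shift Product right 1 bit
        mplier >>= 1                   # 3.  shift Multiplier right 1 bit
    return product
```

For the running example, multiply_v2(0b0010, 0b1011, n=4) returns 22.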
[Figure: second-version multiplier hardware: 32-bit Multiplicand register, 32-bit ALU, 32-bit Multiplier register (shift right), 64-bit Product register (shift right, write), and Control test.]

Start
1. Test Multiplier0
    Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
    Multiplier0 = 0: skip to step 2
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
    No (< 32 repetitions): go back to step 1
    Yes (32 repetitions): Done

but …
• the product register has wasted space the size of the multiplier
• as the wasted space in the product disappears, so do bits of the multiplier!

Final Version
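A sketch of the final version under the same assumptions (unsigned operands, hypothetical name multiply_final, Python integers standing in for registers): the multiplier starts in the right half of the product register, so a single right shift per repetition both exposes the next multiplier bit and makes room for the growing product.

```python
def multiply_final(multiplicand, multiplier, n=32):
    """Final version: the multiplier lives in the right half of the Product register."""
    mask_n = (1 << n) - 1
    mcand = multiplicand & mask_n
    product = multiplier & mask_n      # right half = multiplier, left half = 0
    for _ in range(n):
        if product & 1:                # 1.  test Product0
            product += mcand << n      # 1a. add multiplicand to the left half
        product >>= 1                  # 2.  shift Product right 1 bit
    return product
```

For the running example, multiply_final(0b0010, 0b1011, n=4) returns 22. As in the second-version hardware, the carry out of the left-half add has to be kept and shifted in; here Python's unbounded integers take care of that.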
[Figure: final-version multiplier hardware: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), and Control test.]

Start
1. Test Product0
    Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
    Product0 = 0: skip to step 2
2. Shift the Product register right 1 bit
32nd repetition?
    No (< 32 repetitions): go back to step 1
    Yes (32 repetitions): Done

• multiplier is assigned to the right half of the product register
• add to the left half of the product register, then
• shift right

Multiplication
Multiplication via shift and add hardware
Compilers often use shifts for multiplication by $2^n$
Signed multiplication requires sign-bit replication with each shift
Other algorithms take advantage of factorization
– Booth’s algorithm

Floating-Point

What can be represented in N bits?
– Unsigned: $0$ to $2^N - 1$
– 2s Complement: $-2^{N-1}$ to $2^{N-1} - 1$
– 1s Complement: $-2^{N-1} + 1$ to $2^{N-1} - 1$
– But, what about?
    very large numbers? 9,349,398,989,787,762,244,859,087,678
    very small numbers? 0.0000000000000000000000045691
    rationals: 2/3
    irrationals: $\sqrt{2}$

Recall Scientific Notation
    $6.02 \times 10^{23}$        $1.673 \times 10^{-24}$
[Figure labels: decimal point; mantissa (sign, magnitude); radix (base); exponent (sign, magnitude)]
IEEE F.P.: $\pm 1.M \times 2^{E - 127}$

Issues:
– Arithmetic (+, −, *, /)
– Representation, Normal form (no leading 0s left of the decimal point)
– Range and Precision
– Rounding
– Exceptions (e.g., divide by zero, overflow, underflow)
– Errors
– Properties (negation, inversion, if $A \ne B$ then $A - B \ne 0$)

Floating Point (a brief look)
We need a way to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., $3.15576 \times 10^9$
Representation:
– sign, exponent, significand (mantissa):
    $(-1)^{sign} \times significand \times 2^{exponent}$
– more bits for significand gives more accuracy
– more bits for exponent increases range

Floating Point (a brief look)
IEEE 754 floating point standard:
– single precision: 8 bit exponent, 23 bit significand
– double precision: 11 bit exponent, 52 bit significand
Leading “1” bit of significand is implicit
Exponent is “biased” to make sorting easier
– all 0s is smallest exponent, all 1s is largest
– bias of 127 for single precision and 1023 for double precision
– summary: $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$

IEEE 754 floating-point standard

Example:
– decimal: $-.75 = -3/4 = -3/2^2$
– binary: $-.11_2 = -1.1_2 \times 2^{-1} = -11_2 \times 2^{-2}$
– floating point: exponent = 126 = 01111110
– IEEE single precision:
    1       01111110    10000000000000000000000
    sign    exponent    significand
    $-1.1_2 \times 2^{126-127}$

Convert Binary FP to Decimal
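The conversion from a single-precision bit pattern to decimal can be sketched as below. The function names decode_single and decode_with_struct are made up for this illustration, and the hand decoding covers normalized numbers only (biased exponent 1 to 254). The second function cross-checks the result against Python's standard struct module.

```python
import struct


def decode_single(bits):
    """Decode a 32-bit pattern given as a string of 0s and 1s (spaces allowed).

    Returns (sign, biased_exponent, fraction, value). Normalized numbers only.
    """
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = int(bits[0])
    exp = int(bits[1:9], 2)                 # 8-bit biased exponent
    frac = int(bits[9:], 2) / 2 ** 23       # 23-bit fraction, as a value in [0, 1)
    if not 1 <= exp <= 254:
        raise ValueError("zero, denormal, infinity and NaN are not handled here")
    value = (-1) ** sign * (1 + frac) * 2.0 ** (exp - 127)
    return sign, exp, frac, value


def decode_with_struct(bits):
    """Cross-check: reinterpret the same 32 bits as a big-endian float."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]
```

For the pattern 1 01111110 10000000000000000000000 this gives sign 1, exponent 126, fraction 0.5 and value -0.75.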
1 10000001 01000000000000000000000
sign = negative
exp = 129
fraction = $.01_2 = 1 \times 2^{-2}$ (i.e. $(0 \times 2^{-1}) + (1 \times 2^{-2})$) = ¼ = .25
$-1.25 \times 2^{129-127} = -1.25 \times 2^2 = -5.0$

Floating Point Complexities
Operations are somewhat more complicated
In addition to overflow (exp too large) we can have “underflow” (negative exp too large)
Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round
– four rounding modes
– positive divided by zero yields “infinity”
– zero divided by zero yields “not a number”
– other complexities

Decimal FP Addition:
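The usual three steps of decimal floating-point addition (align, add, normalize) can be sketched as follows. Everything here is an assumption made for illustration: the name decimal_fp_add, the representation of a 4-digit significand d.ddd as an integer scaled by 1000, the restriction to non-negative operands, truncation of the digits shifted out during alignment, and round-half-up when normalizing.

```python
def decimal_fp_add(sig_a, exp_a, sig_b, exp_b, digits=4):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b (non-negative operands only).

    A significand d.ddd is stored as the integer dddd, e.g. 9.999 -> 9999.
    Returns (significand, exponent) in the same representation.
    """
    # Step 1: shift the number with the smaller exponent until the exponents match
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b //= 10 ** (exp_a - exp_b)     # digits shifted out are dropped
    # Step 2: add the significands
    total = sig_a + sig_b
    exp = exp_a
    # Step 3: normalize, rounding half up whenever a digit is dropped
    limit = 10 ** digits
    while total >= limit:
        total = (total + 5) // 10
        exp += 1
    return total, exp
```

Calling decimal_fp_add(9999, 1, 1610, -1), that is 9.999 x 10^1 plus 1.610 x 10^-1, returns (1002, 2), which stands for 1.002 x 10^2.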
$9.999 \times 10^1 + 1.610 \times 10^{-1}$

1) $1.610 \times 10^{-1} = .1610 \times 10^0 = 0.01610 \times 10^1$ (too many digits) $= 0.016 \times 10^1$
2)   $9.999 \times 10^1$
   + $0.016 \times 10^1$
   = $10.015 \times 10^1$ (too many digits) $= 10.02 \times 10^1$
3) $1.002 \times 10^2$

Assume 4 digits for exponent and significand
Shift the number with the smaller exp until its exp matches the larger exp
Add significands
Normalize and check for over/underflow (does exp still fit in field?)

Binary FP Addition:
$0.5 + (-0.4375)$

$0.5 = 1/2 = 1 \times 2^{-1}$
$-0.4375 = -7/16 = -7/2^4 = -.0111_2 = -1.11_2 \times 2^{-2}$

1) $-1.11 \times 2^{-2} = -0.111 \times 2^{-1}$
2)   $1.000 \times 2^{-1}$
   + $(-0.111 \times 2^{-1})$
   = $0.001 \times 2^{-1}$
3) $1.0 \times 2^{-4} = 1.0 \times 1/16 = 0.0625$

Assume 4 digits for exponent and significand
Shift the number with the smaller exp until its exp matches the larger exp
Add significands
Normalize and check for over/underflow ($1 \le$ biased exp $\le 254$)
– in this case, $-4 + 127 = 123$ ...
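The binary addition steps can be sketched in the same spirit. All names here (binary_fp_add, fp_value) are hypothetical, and the model is a simplification: a 4-bit significand x.xxx stored as a signed integer scaled by 8, sign-magnitude alignment with truncation of the bits shifted out, no guard or round bits, and the single-precision exponent range check from the slide.

```python
def binary_fp_add(sig_a, exp_a, sig_b, exp_b, bits=4):
    """Add sig_a x 2^exp_a and sig_b x 2^exp_b.

    A significand x.xxx is stored as a signed integer scaled by 2^(bits-1),
    e.g. 1.000 -> 8 and -1.110 -> -14. Returns (significand, exponent).
    """
    top = 1 << (bits - 1)                  # value of the leading 1 bit
    # Step 1: shift the number with the smaller exponent until the exponents match
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    mag_b = abs(sig_b) >> (exp_a - exp_b)  # bits shifted out are dropped
    sig_b = mag_b if sig_b >= 0 else -mag_b
    # Step 2: add the significands
    total = sig_a + sig_b
    exp = exp_a
    if total == 0:
        return 0, 0                        # convention chosen for this sketch
    sign = -1 if total < 0 else 1
    mag = abs(total)
    # Step 3: normalize so that the leading 1 sits just left of the binary point
    while mag >= 2 * top:
        mag >>= 1
        exp += 1
    while mag < top:
        mag <<= 1
        exp -= 1
    # Check for over/underflow: 1 <= biased exp <= 254
    if not 1 <= exp + 127 <= 254:
        raise OverflowError("exponent does not fit in the 8-bit field")
    return sign * mag, exp


def fp_value(sig, exp, bits=4):
    """Hypothetical helper: the real value of a (significand, exponent) pair."""
    return sig / (1 << (bits - 1)) * 2.0 ** exp
```

For 0.5 + (-0.4375), binary_fp_add(8, -1, -14, -2) returns (8, -4), that is 1.000 x 2^-4 = 0.0625, with biased exponent 123.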
 Fall '08
 LBallesteros
 Computer Architecture
