Chapter 3: Arithmetic for Computers (division, floating point)
csci4203/ece4363

Divide: Paper & Pencil

                  1001     Quotient
Divisor 1000 | 1001010     Dividend
              -1000
                  10
                  101
                  1010
                 -1000
                    10     Remainder (or Modulo result)

• See how big a number can be subtracted, creating a quotient bit on each step
  – Binary => 1 * divisor or 0 * divisor
• Dividend = Quotient x Divisor + Remainder

DIVIDE HARDWARE (Version 1)
• 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg

[Figure: Divisor register (64 bits, shift right) and Remainder register (64 bits, write) connected through the 64-bit ALU; Quotient register (32 bits, shift left); Control]

• Takes n+1 steps for n-bit Quotient & Rem.

Divide Algorithm (v1)

Initial values (4-bit example): Remainder = 0000 0111, Quotient = 0000, Divisor = 0010 0000

Start: Place Dividend in Remainder.
1.  Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
    Test Remainder:
2a. Remainder ≥ 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.
2b. Remainder < 0: Restore the original value by adding the Divisor register to the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3.  Shift the Divisor register right 1 bit.
    (n+1)th repetition? No (< n+1 repetitions): go back to step 1. Yes (n+1 repetitions): Done.

Divide Algorithm I example (7 / 2)

Iteration  Step                            Quotient  Divisor    Remainder
0          Initial values                  0000      0010 0000  0000 0111
1          1:  Rem = Rem - Div             0000      0010 0000  1110 0111
           2b: Rem < 0 => restore, Q0 = 0  0000      0010 0000  0000 0111
           3:  shift Div right             0000      0001 0000  0000 0111
2          1:  Rem = Rem - Div             0000      0001 0000  1111 0111
           2b: Rem < 0 => restore, Q0 = 0  0000      0001 0000  0000 0111
           3:  shift Div right             0000      0000 1000  0000 0111
3          1:  Rem = Rem - Div             0000      0000 1000  1111 1111
           2b: Rem < 0 => restore, Q0 = 0  0000      0000 1000  0000 0111
           3:  shift Div right             0000      0000 0100  0000 0111
4          1:  Rem = Rem - Div             0000      0000 0100  0000 0011
           2a: Rem ≥ 0 => Q0 = 1           0001      0000 0100  0000 0011
           3:  shift Div right             0001      0000 0010  0000 0011
5          1:  Rem = Rem - Div             0001      0000 0010  0000 0001
           2a: Rem ≥ 0 => Q0 = 1           0011      0000 0010  0000 0001
           3:  shift Div right             0011      0000 0001  0000 0001

Answer: Quotient = 3, Remainder = 1

Observations on Divide Version 1
• 1/2 of the bits in the divisor are always 0
  => 1/2 of the 64-bit adder is wasted
  => 1/2 of the divisor is wasted
• Instead of shifting the divisor to the right, shift the remainder to the left?
• The 1st step cannot produce a 1 in the quotient bit (otherwise the quotient would be too big)
  => switch the order to shift first and then subtract; this can save 1 iteration

Divide Algorithm I example: (wasted space)
[Trace of the same 7 / 2 example. The quotient bits produced after iterations 1 to 5 are 0, 00, 000, 0001, 00011: the first quotient bit is always 0, and at every step at least half of the 8-bit Divisor register holds only zeros.]

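The Version 1 loop above can be sketched in C. This is only a sketch under stated assumptions: it uses 16-bit operands (n = 16) so that the double-width registers fit easily in int64_t, whereas the slides describe 32/64-bit hardware. The names div16_t and divide_v1 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint16_t quotient;
    uint16_t remainder;
} div16_t;

/* Restoring division, Version 1: the Divisor starts in the left half of a
   double-width register and is shifted right once per step (n = 16 here). */
div16_t divide_v1(uint16_t dividend, uint16_t divisor)
{
    assert(divisor != 0);                        /* divide by zero is not modelled */

    int64_t remainder = dividend;                /* Start: place Dividend in Remainder */
    int64_t div_reg = (int64_t)divisor << 16;    /* Divisor sits in the left half */
    uint16_t quotient = 0;

    for (int step = 0; step < 16 + 1; step++) {  /* n + 1 repetitions */
        remainder -= div_reg;                    /* 1. Remainder = Remainder - Divisor */
        if (remainder >= 0) {
            /* 2a. shift Quotient left, new rightmost bit = 1 */
            quotient = (uint16_t)((quotient << 1) | 1u);
        } else {
            remainder += div_reg;                /* 2b. restore the old Remainder */
            quotient = (uint16_t)(quotient << 1);/*     and shift in a 0 */
        }
        div_reg >>= 1;                           /* 3. shift Divisor right 1 bit */
    }

    div16_t result = { quotient, (uint16_t)remainder };
    return result;
}
```

Calling divide_v1(7, 2) follows the same pattern as the 7 / 2 trace, only with 17 steps instead of 5. The first step can never produce a 1, which is the wasted iteration noted in the observations.
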
Divide: Paper & Pencil

                  01010   Quotient
Divisor 0001 | 00001010   Dividend
               -0001
                0000
                  0001
                 -0001
                     0
                     00   Remainder (or Modulo result)

– Notice that there is no way to get a 1 in the leading digit! (this would be an overflow, since the quotient would have n+1 bits)

DIVIDE HARDWARE (Version 2)
• 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg

[Figure: Divisor register (32 bits) and the left half of the Remainder register (64 bits, shift left, write) connected through the 32-bit ALU; Quotient register (32 bits, shift left); Control]

Observations on Divide (Version 2)
• Eliminate Quotient register by combining with Remainder as shifted left
  – Start by shifting the Remainder left as before.
  – Thereafter the loop contains only two steps, because the shifting of the Remainder register shifts both the remainder in the left half and the quotient in the right half.
  – The consequence of combining the two registers together and the new order of the operations in the loop is that the remainder will be shifted left one time too many.
  – Thus the final correction step must shift back only the remainder in the left half of the register.

Divide Algorithm (v2)

Initial values (4-bit example): Remainder = 0000 0111, Quotient = 0000, Divisor = 0010

Start: Place Dividend in Remainder.
1.  Shift the Remainder register left 1 bit.
2.  Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
    Test Remainder:
3a. Remainder ≥ 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.
3b. Remainder < 0: Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
    nth repetition? No (< n repetitions): go back to step 1. Yes (n repetitions; n = 4 here): Done.

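As a rough illustration of the Version 2 steps above, the sketch below keeps the Divisor fixed and shifts the Remainder left before each subtraction. It assumes 16-bit operands (n = 16) held in an int64_t so that no intermediate value overflows. The names div16_v2_t and divide_v2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint16_t quotient;
    uint16_t remainder;
} div16_v2_t;

/* Restoring division, Version 2: shift first, then subtract the Divisor
   from the left half of the Remainder register; n = 16 repetitions. */
div16_v2_t divide_v2(uint16_t dividend, uint16_t divisor)
{
    assert(divisor != 0);                         /* divide by zero is not modelled */

    int64_t remainder = dividend;                 /* Dividend starts in the right half */
    const int64_t aligned = (int64_t)divisor << 16; /* Divisor lined up with the left half */
    uint16_t quotient = 0;

    for (int step = 0; step < 16; step++) {       /* n repetitions */
        remainder <<= 1;                          /* 1. shift Remainder left 1 bit */
        remainder -= aligned;                     /* 2. subtract from the left half */
        if (remainder >= 0) {
            quotient = (uint16_t)((quotient << 1) | 1u);  /* 3a. shift in a 1 */
        } else {
            remainder += aligned;                 /* 3b. restore the left half */
            quotient = (uint16_t)(quotient << 1); /*     and shift in a 0 */
        }
    }

    /* After n steps the left half holds the remainder. */
    div16_v2_t result = { quotient, (uint16_t)(remainder >> 16) };
    return result;
}
```

Compared with Version 1, the loop runs n times instead of n+1, and the subtraction only ever touches the left half of the register.
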
DIVIDE HARDWARE (Version 3)
• 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg (0-bit Quotient reg)

[Figure: Divisor register (32 bits) and the "HI" half of the Remainder register connected through the 32-bit ALU; the 64-bit Remainder register ("HI" and "LO" halves, shift left, write) also holds the Quotient; Control]

Divide Algorithm (v3)

Initial values (4-bit example): Remainder = 0000 0111, Divisor = 0010

Start: Place Dividend in Remainder.
1.  Shift the Remainder register left 1 bit.
2.  Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
    Test Remainder:
3a. Remainder ≥ 0: Shift the Remainder register to the left, setting the new rightmost bit to 1.
3b. Remainder < 0: Restore the original value by adding the Divisor register to the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0.
    nth repetition? No (< n repetitions): go back to step 2. Yes (n repetitions; n = 4 here): Done. Shift left half of Remainder right 1 bit.

Observations on Divide Version 3
• Same hardware as Multiply: just need ALU to add or subtract, and 64-bit register to shift left or right
• Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide
• Signed Divides: Simplest is to remember signs, make positive, and complement quotient and remainder if necessary
  – Note: Dividend and Remainder must have same sign
  – Note: Quotient negated if Divisor sign & Dividend sign disagree, e.g., -7 ÷ 2 = -3, remainder = -1
  – What about -7 ÷ 2 = -4, remainder = +1? (It also satisfies Dividend = Quotient x Divisor + Remainder, but the remainder's sign differs from the dividend's.)

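A possible C rendering of Version 3 and of the signed-divide rule above, offered as a sketch rather than a definitive implementation. It assumes 16-bit operands, and it models the combined Remainder/Quotient register with an int64_t so the left half never overflows. All type and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result types for this sketch. */
typedef struct { uint16_t quotient; uint16_t remainder; } div16_v3_t;
typedef struct { int32_t quotient; int32_t remainder; } sdiv16_t;

/* Version 3: one combined register; remainder in the left half,
   quotient bits shifted into the right half (n = 16). */
div16_v3_t divide_v3(uint16_t dividend, uint16_t divisor)
{
    assert(divisor != 0);                          /* divide by zero is not modelled */

    int64_t reg = dividend;                        /* Start: Dividend in Remainder */
    const int64_t aligned = (int64_t)divisor << 16;

    reg <<= 1;                                     /* 1. initial shift left */
    for (int step = 0; step < 16; step++) {        /* n repetitions */
        reg -= aligned;                            /* 2. subtract from left half */
        if (reg >= 0) {
            reg = (reg << 1) | 1;                  /* 3a. shift left, insert 1 */
        } else {
            reg += aligned;                        /* 3b. restore, then ... */
            reg <<= 1;                             /*     shift left, insert 0 */
        }
    }

    div16_v3_t result;
    result.quotient = (uint16_t)(reg & 0xFFFF);    /* right half */
    result.remainder = (uint16_t)((reg >> 16) >> 1); /* final correction shift */
    return result;
}

/* Signed divide: remember signs, divide magnitudes, then fix up signs.
   The remainder takes the sign of the dividend. */
sdiv16_t divide_signed(int16_t dividend, int16_t divisor)
{
    assert(divisor != 0);

    uint16_t a = (uint16_t)(dividend < 0 ? -(int32_t)dividend : (int32_t)dividend);
    uint16_t b = (uint16_t)(divisor < 0 ? -(int32_t)divisor : (int32_t)divisor);
    div16_v3_t u = divide_v3(a, b);

    sdiv16_t s;
    s.quotient = ((dividend < 0) != (divisor < 0)) ? -(int32_t)u.quotient
                                                   : (int32_t)u.quotient;
    s.remainder = (dividend < 0) ? -(int32_t)u.remainder : (int32_t)u.remainder;
    return s;
}
```

The results are returned as int32_t because -32768 / -1 does not fit in 16 bits. That widening is a choice made for this sketch, not something the slides prescribe.
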
Divide by Power-of-2
• u >> k gives floor(u / 2^k)
  – Uses arithmetic shift
  – Rounds in the wrong direction when u < 0
• Simple example: -7/2
  – Wants -3, but -7 >> 1 gives -4
  – Assuming 8-bit representation: -7 = 1111 1001, -7 >> 1 = 1111 1100 = -4
• Bias the dividend toward 0 if negative: (-7 + 1) >> 1 = -3
  – Bias = 2^k - 1

Correct Divide by Power-of-2
• Negative numbers need to be adjusted with a bias
• Dividend and Remainder must have same sign
• In C it can be written as: (u < 0 ? (u + (1 << k) - 1) >> k : u >> k)

Floating Point (a brief look)
• We need a way to represent
  – Numbers with fractions, for example, 0.5, 2.25
  – very large numbers, larger than 2^31, 2^63, for example, 1.2345678 x 10^300
  – very small numbers, for example, 1.2 x 10^-10
• Representation:
  – sign, exponent, significand: (-1)^sign x significand x 2^exponent
  – more bits for significand gives more accuracy
  – more bits for exponent increases range
• IEEE 754 floating point standard:
  – single precision: 8 bit exponent, 23 bit significand
  – double precision: 11 bit exponent, 52 bit significand

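Returning briefly to the biased shift from the "Correct Divide by Power-of-2" slide: a small C sketch, assuming the compiler implements >> on negative signed values as an arithmetic shift (the C standard leaves this implementation-defined, although common compilers such as GCC and Clang do so). The function name div_pow2 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Divide u by 2^k, rounding toward zero like C's '/' operator.
   Assumes '>>' on a negative int32_t is an arithmetic shift. */
int32_t div_pow2(int32_t u, unsigned k)
{
    assert(k < 31);                              /* keep (1 << k) - 1 inside int32_t */

    int32_t bias = (int32_t)((1u << k) - 1u);    /* Bias = 2^k - 1 */
    return (u < 0 ? u + bias : u) >> k;          /* bias only negative dividends */
}
```

With k = 1, div_pow2(-7, 1) adds the bias 1 before shifting, matching the (-7 + 1) >> 1 step on the slide.
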
Floating-Point Arithmetic
Representation of floating point numbers in IEEE 754 standard:

single precision:   S (1 bit) | E (8 bits) | M (23 bits)

• sign: S
• exponent: excess 127 binary integer; actual exponent is e = E - 127, with 0 < E < 255
• mantissa: sign + magnitude, normalized binary significand w/ hidden integer bit: 1.M
• N = (-1)^S x 2^(E-127) x (1.M)

   0   = 0 00000000 0 . . . 0
  -1.5 = 1 01111111 10 . . . 0

Magnitude of numbers that can be represented is in the range:
  2^-126 to 2^127 x (2 - 2^-23)
which is approximately:
  1.18 x 10^-38 to 3.40 x 10^38

Basic Addition Algorithm/Multiply issues
For addition (or subtraction) this translates into the following steps:
(1) compute Ye - Xe (getting ready to align binary point)
(2) right shift Xm that many positions to form Xm x 2^(Xe-Ye)
(3) compute Xm x 2^(Xe-Ye) + Ym
if representation demands normalization, then normalization step follows:
(4) left shift result, decrement result exponent (e.g., 0.001xx…)
    right shift result, increment result exponent (e.g., 101.1xx…)
    continue until MSB of data is 1 (NOTE: Hidden bit in IEEE Standard)
(5) for multiply, doubly biased exponent must be corrected, e.g. Xe = 7, Ye = -3 in Excess 8:
      Xe = 1111  = 15 =  7 + 8
      Ye = 0101  =  5 = -3 + 8
          10100  = 20 =  4 + 8 + 8
    extra subtraction step of the bias amount
(6) if result is 0 mantissa, may need to zero exponent by special step

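To make the single-precision layout from the representation slide concrete, the following sketch unpacks S, E and M from a float and re-evaluates N = (-1)^S x 2^(E-127) x (1.M). It assumes float is the 32-bit IEEE 754 single format on the host, and it handles normalized numbers only (0 < E < 255). The names single_fields_t, unpack_single and single_value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision number. */
typedef struct {
    unsigned sign;        /* S: 1 bit */
    unsigned biased_exp;  /* E: 8 bits, excess 127 */
    uint32_t fraction;    /* M: 23 bits, hidden leading 1 not stored */
} single_fields_t;

single_fields_t unpack_single(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value);   /* assumes a 32-bit float */
    memcpy(&bits, &value, sizeof bits);    /* reinterpret the bit pattern */

    single_fields_t f;
    f.sign = bits >> 31;
    f.biased_exp = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Evaluate (-1)^S x 2^(E-127) x (1.M) for a normalized number. */
double single_value(single_fields_t f)
{
    assert(f.biased_exp > 0 && f.biased_exp < 255);   /* normalized only */

    double significand = 1.0 + (double)f.fraction / 8388608.0;  /* 2^23 */
    int e = (int)f.biased_exp - 127;                  /* remove the bias */

    double scale = 1.0;
    for (int i = 0; i < e; i++)  scale *= 2.0;        /* positive exponents */
    for (int i = 0; i > e; i--)  scale /= 2.0;        /* negative exponents */

    return (f.sign ? -1.0 : 1.0) * significand * scale;
}
```

For -1.5 the fields come out as S = 1, E = 127 (0111 1111) and M = 100...0, the same pattern listed on the slide.
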
FP Addition
Start
1. Compare exponents. Shift the smaller number to the right until its exponent matches the larger exponent
2. Add the significands
3. Normalize the sum: shift right & increment exponent, or shift left & decrement exponent
   Overflow or underflow? yes: exception; no: continue
4. Round the significand
   Still normalized? no: back to step 3; yes: done

FP Multiply
Start
1. Add the biased exponents, adjust the sum
2. Multiply the significands
3. Normalize the product: shift right & increment exponent
   Overflow or underflow? yes: exception; no: continue
4. Round the significand
   Still normalized? no: back to step 3; yes: set the sign

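The FP Multiply steps above can be sketched on raw single-precision bit patterns. This is a simplified illustration under several assumptions: float is the 32-bit IEEE 754 single format, inputs and result are normalized (no zeros, denormals, infinities or NaNs), and step 4 truncates instead of rounding to nearest, so overflow and underflow are caught by assertions rather than exceptions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Multiply two normalized single-precision numbers step by step. */
float fp_multiply_sketch(float x, float y)
{
    uint32_t xb = float_to_bits(x), yb = float_to_bits(y);
    int32_t xe = (int32_t)((xb >> 23) & 0xFF);
    int32_t ye = (int32_t)((yb >> 23) & 0xFF);
    assert(xe > 0 && xe < 255 && ye > 0 && ye < 255);   /* normalized inputs only */

    /* 1. Add the biased exponents, adjust the sum (subtract one bias). */
    int32_t e = xe + ye - 127;

    /* 2. Multiply the 24-bit significands (hidden 1 made explicit). */
    uint64_t xm = (xb & 0x7FFFFFu) | 0x800000u;
    uint64_t ym = (yb & 0x7FFFFFu) | 0x800000u;
    uint64_t product = xm * ym;            /* value in [1, 4), scaled by 2^46 */

    /* 3. Normalize: if the product is >= 2, shift right & increment exponent. */
    if (product & (1ULL << 47)) {
        product >>= 1;
        e += 1;
    }
    assert(e > 0 && e < 255);              /* overflow/underflow not modelled */

    /* 4. "Round" the significand: this sketch simply truncates to 24 bits. */
    uint32_t significand = (uint32_t)(product >> 23);

    /* Set the sign: negative when the operand signs differ. */
    uint32_t sign = (xb ^ yb) & 0x80000000u;

    return bits_to_float(sign | ((uint32_t)e << 23) | (significand & 0x7FFFFFu));
}
```

For 0.75 x -0.4375 the intermediate values are the biased exponent 124, the product 10.101, and the normalized 1.0101 x 2^-2, giving -0.328125.
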
FP Multiply Example
Let's try 0.75 x -0.4375
In binary, this is 1.1 x 2^-1 x -1.110 x 2^-2
Step 1: add the exponents: -1 + -2 = -3
  Biased representation: (-1+127) + (-2+127) - 127 = 124

FP Multiply Example (cont.)
Step 2: multiply the significands
  1.100 x 1.110 = 10.101000
Step 3: Normalize
  10.101 x 2^-3 = 1.0101 x 2^-2 = 1.3125 x 0.25 = 0.328125
Step 4: set the sign to negative
Result = -0.328125

Denormalized Numbers (when exponent=0)

[Figure: number line starting at 0. Without denormalized numbers there is a gap between 0 and 2^-bias; the normal numbers with hidden bit start at 2^-bias and continue through 2^(1-bias), 2^(2-bias), ...]

• The gap between 0 and the next representable number is much larger than the gaps between nearby representable numbers.
• IEEE standard uses denormalized numbers to fill in the gap, making the distances between numbers near 0 more alike.

[Figure: with denormalized numbers, the interval from 0 to 2^-bias has p-1 bits of precision and the interval from 2^-bias to 2^(1-bias) has p bits of precision: same spacing, half as many values!]

NOTE: PDP-11, VAX cannot represent subnormal numbers. These machines underflow to zero instead.

Infinity and NaNs
• Infinity: result of operation overflows, i.e., is larger than the largest number that can be represented
  – Overflow is not the same as divide by zero (raises a different exception)
  – +/- infinity: S 1...1 0...0
  – It may make sense to do further computations with infinity, e.g., X/0 > Y may be a valid comparison
• NaN: not a number, but not infinity (e.g. sqrt(-4))
  – invalid operation exception (unless operation is = or ≠)
  – NaN: S 1...1 non-zero (HW decides what goes here)
  – NaNs propagate: f(NaN) = NaN

Signaling/Quiet NaNs
• Standard C adopts Quiet NaN. When NaN is generated, no exception will be reported. NaN will propagate.
• NaN does imply the computation is wrong. So why not generate a signal? Numerical C extension is pushing for Signaling NaN.
• Signaling NaN will make it difficult to deal with speculative execution, e.g.
    f1 = f2/f3;
    if (a > b) { f4 = c[i]/f7; }
  can we overlap f2/f3 with c[i]/f7?

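The quiet-NaN and infinity behaviour described above can be observed directly in C. The sketch assumes the platform's double follows IEEE 754 and that <math.h> provides the C99 macros NAN, INFINITY and isnan. The function names, including the stand-in function scale_and_offset, are made up for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* A stand-in for "f" in f(NaN) = NaN; any arithmetic function would do. */
double scale_and_offset(double x)
{
    return x * 2.0 + 1.0;
}

/* Quiet NaNs propagate through arithmetic without raising a signal. */
bool nan_propagates(void)
{
    return isnan(scale_and_offset(NAN));
}

/* NaN compares unequal to everything, including itself. */
bool nan_equals_itself(void)
{
    volatile double n = NAN;   /* volatile: keep the comparison at run time */
    return n == n;
}

/* Comparisons against +infinity are well defined for ordinary numbers. */
bool infinity_exceeds(double y)
{
    assert(!isnan(y));         /* comparisons with NaN are always false */
    return INFINITY > y;
}
```

On such a platform, dividing a positive X by 0.0 also yields +infinity, which is why a test such as X/0 > Y can still be meaningful.
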
Default initial value?
• 0 is used for integers; how about floating point? Should it be 0.0 or NaN?
• Program bugs can be hidden for a long time under an incorrect default initial value of 0.0. NaN will propagate everywhere, so the bugs will likely be detected.
• Some game programs don't like NaN since it will unnecessarily mess up the screen and results.

Test Yourself
What is the smallest nonzero, positive number for IEEE 754 single precision?
a) 1.0 * 2^-127
b) 1.0 * 2^-126
c) 1.0 * 2^-149
d) 1.0 * 2^-1022
Answer: c

Test Yourself
What is the greatest positive number for IEEE 754 double precision?
a) ~1.0 * 2^127
b) ~1.0 * 2^1023
c) ~1.0 * 2^1024
d) ~1.0 * 2^2047
Answer: c

Test Yourself
Which of the following computations does not lose precision?
a) Doing Int32*Int32 in double precision
b) Doing Int64*Int64 in double precision
c) Doing Int32*Int32 in single precision
d) Doing Int64*Int64 in single precision
Answer: a (both Int32 operands convert to double precision exactly; note that a product needing more than 53 bits is still rounded)

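The three answers above can be checked with a few lines of C. This is a sketch assuming IEEE 754 float and double on the host. The bit patterns are the standard encodings of the smallest denormal and the largest finite value. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Smallest positive single: exponent field 0, fraction 0...01 (a denormal). */
float smallest_positive_single(void)
{
    uint32_t bits = 0x00000001u;
    float f;
    assert(sizeof f == sizeof bits);       /* assumes a 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;                              /* 2^-126 x 2^-23 = 2^-149 */
}

/* Largest finite double: exponent field 2046, fraction all ones. */
double largest_finite_double(void)
{
    uint64_t bits = 0x7FEFFFFFFFFFFFFFull;
    double d;
    assert(sizeof d == sizeof bits);       /* assumes a 64-bit double */
    memcpy(&d, &bits, sizeof d);
    return d;                              /* (2 - 2^-52) x 2^1023, just under 2^1024 */
}

/* Does a * b computed in double precision equal the exact 64-bit product? */
bool product_exact_in_double(int32_t a, int32_t b)
{
    volatile double p = (double)a * (double)b;  /* volatile: force rounding to double */
    return (int64_t)p == (int64_t)a * (int64_t)b;
}
```

The last function is exact whenever the product fits in 53 bits, which covers most Int32 products met in practice but not the very largest ones.
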
Test Yourself
Should we always use double precision to ensure required precision and range?
• Double precision requires twice the memory space (it matters when arrays are large)
• Single precision operations have lower latency
• Single precision operations have higher bandwidth (e.g. SSE2 in Pentium IV)

Why FP Operations are not associative
• Associativity is X+(Y+Z) = (X+Y)+Z
• Suppose x = -1.5 x 10^38, y = 1.5 x 10^38, and z = 1.0
    x+(y+z) = -1.5 x 10^38 + (1.5 x 10^38 + 1.0)
            = -1.5 x 10^38 + 1.5 x 10^38
            = 0.0
    (x+y)+z = (-1.5 x 10^38 + 1.5 x 10^38) + 1.0
            = 1.0
• Insufficient precision to keep 1.0 in result

FP Operations Optimizations
• In general, cannot make use of associativity for fp optimization. The compiler usually relies on user options that relax precision requirements.
  e.g. (a+b+c+d) => (a+b) + (c+d) would be a very desirable transformation.
• FP divide is most expensive.

    loop {                      c1 = 1.0/c;
      a[i] = b[i]/c;     =>     loop {
    }                             a[i] = b[i]*c1;
                                }

  Need to check if the loop iterates 0 times (the hoisted 1.0/c would otherwise execute, and possibly raise an exception, even though the original loop never divides)

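The associativity example above can be reproduced in C with single-precision values. This is a minimal sketch that assumes the compiler is not run with options that relax floating-point semantics (such as a fast-math mode). The function names are hypothetical.

```c
#include <assert.h>

/* x + (y + z): the 1.0 is absorbed when it is added to the huge y first. */
float sum_right_first(float x, float y, float z)
{
    assert(x == x && y == y && z == z);   /* reject NaN inputs */
    float inner = y + z;
    return x + inner;
}

/* (x + y) + z: the huge values cancel first, so the 1.0 survives. */
float sum_left_first(float x, float y, float z)
{
    assert(x == x && y == y && z == z);   /* reject NaN inputs */
    float inner = x + y;
    return inner + z;
}
```

With x = -1.5e38f, y = 1.5e38f and z = 1.0f, the two functions return 0.0 and 1.0 respectively, which is why a compiler may not reorder such sums unless the user relaxes the precision requirements.
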
Pentium Bug
• Pentium FP Divider uses an algorithm that generates multiple quotient bits per step
  – FPU uses most significant bits of divisor & dividend/remainder to guess next 2 bits of quotient
  – Guess is taken from lookup table: -2, -1, 0, +1, +2 (if the previous guess leaves too large a remainder, the quotient is adjusted in a subsequent pass of -2)
  – Guess is multiplied by divisor and subtracted from remainder to generate a new remainder
  – Called SRT division after the 3 people who came up with the idea

Pentium Bug (cont.)
• Pentium table uses 7 bits of remainder + 4 bits of divisor = 2^11 entries
• 5 entries of divisors omitted: 1.0001, 1.0100, 1.0111, 1.1010, 1.1101 from PLA (fix is just add 5 entries back into PLA: cost $200,000)
• Self correcting nature of SRT => string of 1s must follow error
  – e.g., 1011 1111 1111 1111 1111 1011 1000 0010 0011 0111 1011 0100 (2.99999892918)
• Since indexed also by divisor/remainder bits, sometimes bug doesn't show even with dangerous divisor value

Pentium bug appearance
• First 11 bits to right of decimal point always correct: bits 12 to 52 where bug can occur (4th to 15th decimal digits)
• FP divisors near integers 3, 9, 15, 21, 27 are dangerous ones:
  – 3.0 > d ≥ 3.0 - 36 x 2^-22, 9.0 > d ≥ 9.0 - 36 x 2^-20
  – 15.0 > d ≥ 15.0 - 36 x 2^-20, 21.0 > d ≥ 21.0 - 36 x 2^-19
• In Microsoft Excel, try (4,195,835 / 3,145,727) * 3,145,727
  – = 4,195,835 => not a Pentium with bug
  – = 4,195,579 => Pentium with bug (assuming Excel doesn't already have SW bug patch)
• Rarely noticed since error in 5th significant digit
• Success of IEEE standard made discovery possible: all computers should get same answer

Pentium Bug Time
• June 1994: Intel discovers bug in Pentium: takes months to make line change, reverify, put into production: plans good chips in January 1995; 4 to 5 million Pentiums produced with bug
• Scientist suspects errors and posts on Internet in September 1994
• Nov. 22 Intel press release: “Can make errors in 9th digit ... Most engineers and financial analysts need only 4 or 5 digits. Theoretical mathematician should be concerned. ... So far only heard from one.”
• Intel claims happens once in 27,000 years for typical spreadsheet user:
  – 1000 divides/day x error rate assuming numbers random
• Dec 12: IBM claims happens once per 24 days: bans Pentium sales
  – 5000 divides/second x 15 minutes = 4,200,000 divides/day
  – IBM statement: http://www.ibm.com/Features/pentium.html
  – Intel said it regards IBM's decision to halt shipments of its Pentium processor-based systems as unwarranted.

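The spreadsheet check quoted under "Pentium bug appearance" can also be written as a small C function. This is only a sketch: the name divide_check_residual is hypothetical, and on a correctly working FPU the residual is expected to be zero or negligibly small, whereas the flawed divider gave a result off by 256 (4,195,835 versus 4,195,579).

```c
#include <assert.h>

/* Returns x - (x / y) * y, which should be (very nearly) zero when the
   hardware divide is correct. */
double divide_check_residual(double x, double y)
{
    assert(y != 0.0);

    /* volatile operands keep the divide from being folded at compile time */
    volatile double vx = x;
    volatile double vy = y;
    double quotient = vx / vy;

    return x - quotient * y;
}
```

A caller would pass 4195835.0 and 3145727.0 and compare the magnitude of the result against a small threshold such as 1.0.
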
Pentium conclusion: Dec. 21, 1994 ($500M write-off)

“To owners of Pentium processor-based computers and the PC community:
We at Intel wish to sincerely apologize for our handling of the recently publicized Pentium processor flaw.
…
We want to resolve these concerns. Intel will exchange the current version of the Pentium processor for an updated version, in which this floating-point divide flaw is corrected, …”

Sincerely,
Andrew S. Grove, President/CEO
Craig R. Barrett, Executive Vice President
Gordon E. Moore, Chairman of the Board

Summary
• Bits have no inherent meaning: operations determine whether they are really ASCII characters, integers, floating point numbers
• Divide can use same hardware as multiply: Hi & Lo registers in MIPS
• Floating point basically follows paper and pencil method of scientific notation, using integer algorithms for multiply and divide of significands

 Fall '05
 WeiChungHsu