Applied Linear Algebra and Numerical Analysis, Session 19
Prof. Ulrich Hetmaniuk, Department of Applied Mathematics
February 19, 2010

Computer arithmetic

It is important to be aware of the limitations and potential pitfalls of computer arithmetic. Careless construction of algorithms can sometimes have dramatic consequences.

Floating-point real numbers: double precision

A double-precision number has the form $(-1)^s \, 2^{e-1023} \, (1+m)$:

- 1 bit for the sign: $s$ is either 0 or 1.
- 11 bits for the exponent: $0 \le e \le 2^{11} - 1 = 2047$. Here $e = 0$ is reserved for representing 0 (when $m = 0$); $e = 0$ with $m \ne 0$ gives the denormalized numbers; $0 < e < 2047$ is the range for normalized numbers; and $e = 2047$ is reserved to represent Inf ($m = 0$) and NaN ($m \ne 0$).
- 52 bits for the mantissa: $0 \le m \le 1 - 2^{-52}$.

The range of positive normalized numbers is $[2^{-1022}, (2 - 2^{-52}) \cdot 2^{1023}] \approx [2.2 \times 10^{-308}, 1.8 \times 10^{308}]$. If an operation gives a number outside this range, we get an overflow or underflow.

Floating-point real numbers: the special symbols Inf and NaN

Inf means "too large to represent"; NaN ("not a number") covers all other errors. When a nonzero number is divided by a divisor that is exactly zero, a divide-by-zero event occurs. Floating-point hardware is generally designed to handle infinite operands in a reasonable way, for example:

  (+Inf) + (+7) = +Inf
  (+Inf) * (-2) = -Inf
  (+Inf) * 0   = NaN   (there is no meaningful result)

Floating-point real numbers: machine epsilon

Machine epsilon is half the distance between 1 and the next larger floating-point number, $1 + 2^{-52}$. It can also be characterized as follows: for every $x \in \mathbb{R}$ within range, there exists a floating-point number $x'$ with $|x - x'| \le \epsilon_{\text{machine}} \, |x|$.

- Double precision: the spacing near 1 is $2^{-52} \approx 2.22 \times 10^{-16}$, about 16 digits of relative accuracy.
- Single precision: the spacing near 1 is $2^{-23} \approx 1.19 \times 10^{-7}$, about 7 digits of relative accuracy.

Floating-point arithmetic: rounding

When arithmetic is done, the result must generally be rounded to the nearest machine number.

Example: on a decimal machine with a 3-digit mantissa and a 2-digit exponent,

  0.523e00 + 0.745e00  = 0.1268e01,    rounded to 0.127e01
  0.523e00 + 0.745e-04 = 0.5230745e00, rounded to 0.523e00
  0.523e00 * 0.745e00  = 0.389635e00,  rounded to 0.390e00

Floating-point arithmetic: loss of significance

An awareness of when loss of significance can occur is useful:

- Subtraction of nearly equal operands may cause extreme loss of accuracy.
- Conversions to integer can be dangerous.
- Testing for safe division is problematic: even a nonzero denominator may produce an overflow.
- Testing for equality is problematic: programmers often perform comparisons within some tolerance.

Example: starting from $t_1 = 1/3$, the recurrence

  $t_{i+1} = \dfrac{\sqrt{t_i^2 + 1} - 1}{t_i}$

is algebraically equivalent to

  $t_{i+1} = \dfrac{t_i}{\sqrt{t_i^2 + 1} + 1}$,

which shows that $t_{i+1} \le t_i/2$ and hence $t_i \le t_1 \, 2^{1-i}$. The first form subtracts two nearly equal numbers once $t_i$ is small; the second avoids the cancellation.

Naive use of floating-point arithmetic can lead to many problems.
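The special values and the machine-epsilon behaviour described above can be checked directly on any IEEE double-precision machine. The slides themselves use no particular language; this is a minimal Python sketch:

```python
import math
import sys

inf = float("inf")

# Infinite operands behave as on the slide.
assert inf + 7 == inf          # (+Inf) + (+7) = +Inf
assert inf * -2 == -inf        # (+Inf) * (-2) = -Inf
assert math.isnan(inf * 0)     # (+Inf) * 0 = NaN

# Spacing of doubles near 1: the next representable number is 1 + 2**-52.
eps = sys.float_info.epsilon
assert eps == 2.0 ** -52

# Adding half that spacing to 1.0 rounds back down to 1.0 (ties to even),
# while adding the full spacing is representable exactly.
assert 1.0 + 2.0 ** -53 == 1.0
assert 1.0 + eps > 1.0
```

Note that `sys.float_info.epsilon` reports the spacing $2^{-52}$ near 1, not the half-spacing rounding bound, so both conventions above can be checked from it.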
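The recurrence in the last example can be run in both algebraically equivalent forms to watch the cancellation happen. A sketch in Python (the function names are my own, not from the slides):

```python
import math

def step_unstable(t):
    # t_{i+1} = (sqrt(t_i^2 + 1) - 1) / t_i : subtracts nearly equal numbers.
    return (math.sqrt(t * t + 1.0) - 1.0) / t

def step_stable(t):
    # t_{i+1} = t_i / (sqrt(t_i^2 + 1) + 1) : same value, no cancellation.
    return t / (math.sqrt(t * t + 1.0) + 1.0)

t_bad = t_good = 1.0 / 3.0   # t_1 = 1/3
steps_to_collapse = None
for i in range(1, 41):
    t_good = step_stable(t_good)
    if t_bad != 0.0:
        t_bad = step_unstable(t_bad)
        if t_bad == 0.0:
            steps_to_collapse = i   # all information destroyed here

# The stable iterates keep shrinking, roughly halving each step.
# The unstable iterates collapse to exactly 0 once t_i is so small that
# sqrt(t_i^2 + 1) rounds to 1.0, after which the formula is meaningless.
print(steps_to_collapse, t_good)
```

Both sequences should decrease like $t_1 2^{1-i}$; instead the subtractive form loses digits steadily and eventually returns exactly 0, illustrating why the rewritten form is preferred.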
This note was uploaded on 03/31/2010 for the course AMATH 352 taught by Professor Leveque during the Winter '07 term at University of Washington.