
Floating Point Numbers

Most numbers cannot be represented exactly in a computer, so we are forced to use approximations that can be. Let the floating point approximation of $x$ be called the float of $x$, written $\mathrm{fl}(x)$. In our floating point number system, we can bound the relative error made when a number is approximated by its float.

We assume that our floating point numbers have the form
$$\bar{x} = \pm 0.b_1 b_2 \ldots b_t \times 2^e,$$
where $e_n \le e \le e_p$, each $b_k$ is 0 or 1, and $b_1 = 1$. Think of it as a (base-2) fraction times $2^e$. Numbers too large in magnitude for this representation are said to overflow, and nonzero numbers too small in magnitude are said to underflow.

Since we have allotted $t$ bits for the fractional part, the distance between $\bar{x}$ and an adjacent float is no more than $2^{e-t}$, the weight of the last bit $b_t$. Because $b_1 = 1$, we have $|\bar{x}| \ge 2^{e-1}$, so dividing by $|\bar{x}|$ gives an upper bound on the relative distance between any two adjacent floats: $2^{e-t} / 2^{e-1} = 2^{1-t}$. We define the machine precision, $\mu$, to be half of this quantity, since rounding to the nearest float moves a number by at most half the gap. For a floating point system with a $t$-digit fractional part, the machine precision is $\mu = 2^{-t}$.

The Floating Point Representation Theorem.

Suppose $x$ is a real number which is in the range of the floating point system (it doesn't underflow or overflow). Then
$$\mathrm{fl}(x) = x(1 + \delta), \quad \text{where } |\delta| \le \mu.$$
This is a statement about relative error, and can also be written as
$$\frac{|x - \mathrm{fl}(x)|}{|x|} \le \mu.$$

The set of floats is not closed under our arithmetic operations. For example, when we add two floats, the result is not necessarily a float, and it will need to be approximated by another float. Computers today almost always satisfy the following rule:

The Fundamental Axiom of Floating Point Arithmetic. Let $x \mathbin{\mathrm{op}} y$ be some arithmetic operation; that is, op is one of $+$, $-$, $\times$, or $\div$. Suppose $x$ and $y$ are floats and that $x \mathbin{\mathrm{op}} y$ doesn't underflow or overflow. Then
$$\mathrm{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1 + \delta), \quad \text{where } |\delta| \le \mu.$$
Notice that this is a statement about floats. Real numbers need to be represented by floats before we can do the arithmetic!...
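The representation $\pm 0.b_1 b_2 \ldots b_t \times 2^e$, the gap bound $2^{e-t}$, and the machine precision $\mu = 2^{-t}$ can be checked on real hardware. The Python sketch below assumes IEEE 754 double precision, where this normalized form has $t = 53$ (52 stored bits plus the implicit leading 1). It relies on the standard-library `math.frexp`, which returns a fraction $m$ with $0.5 \le |m| < 1$ and an exponent $e$ such that $x = m \cdot 2^e$, which is exactly the form above. The helper names `decompose` and `gap` are hypothetical, chosen only for illustration.

```python
import math
import sys
from typing import Tuple

# t = number of fraction bits in +/- 0.b1 b2 ... bt * 2**e.
# For an IEEE 754 double this is 53 (52 stored bits plus the implicit leading 1).
T = sys.float_info.mant_dig

# Machine precision mu = 2**-t, half the worst-case relative gap 2**(1 - t).
MU = 2.0 ** -T


def decompose(x: float) -> Tuple[int, str, int]:
    """Return (sign, "b1b2...bt", e) with x == sign * 0.b1b2...bt * 2**e and b1 == 1.

    Assumes x is finite and nonzero.
    """
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= |m| < 1
    sign = -1 if m < 0 else 1
    digits = int(abs(m) * 2**T)   # exact: scaling by a power of two only shifts the exponent
    return sign, format(digits, "b").zfill(T), e


def gap(x: float) -> float:
    """Upper bound 2**(e - t) on the distance from x to an adjacent float."""
    _, _, e = decompose(x)
    return 2.0 ** (e - T)


if __name__ == "__main__":
    print(decompose(-6.0))                   # -6 = -0.11 (base 2) * 2**3
    print(gap(1.0) / 1.0 == 2.0 ** (1 - T))  # True: 1 = 0.1 (base 2) * 2**1 is a worst case
    print(MU == sys.float_info.epsilon / 2)  # True: epsilon is the gap just above 1.0
    print(1.0 + MU == 1.0)                   # True: 1 + mu is a tie and rounds to even
```

The relative gap reaches $2^{1-t}$ only when the fraction is exactly $0.1_2$, i.e. when $|\bar{x}| = 2^{e-1}$; for every other float it is smaller, which is why $2^{1-t}$ is an upper bound rather than a constant.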
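The Representation Theorem and the Fundamental Axiom can both be checked numerically with exact rational arithmetic. The sketch below is a minimal illustration under stated assumptions: it assumes IEEE 754 doubles with round-to-nearest (so $\mu = 2^{-53}$), uses Python's `fractions.Fraction` to hold the exact value of $x$ or of $x \mathbin{\mathrm{op}} y$, and treats the built-in `float()` conversion as the rounding map $\mathrm{fl}$ (in CPython it rounds to the nearest double). The helpers `representation_error` and `operation_error` are hypothetical names; each returns the $\delta$ from the corresponding statement.

```python
import operator
from fractions import Fraction

# Machine precision mu = 2**-t, assuming IEEE 754 doubles (t = 53).
MU = Fraction(1, 2**53)


def representation_error(x: Fraction) -> Fraction:
    """Return delta with fl(x) = x * (1 + delta), taking fl to be float().

    Assumes x is nonzero and neither underflows nor overflows.
    """
    fl_x = Fraction(float(x))   # round to a double, then recover its exact value
    return (fl_x - x) / x


def operation_error(op, a: float, b: float) -> Fraction:
    """Return delta with fl(a op b) = (a op b) * (1 + delta) for floats a and b.

    Assumes the exact result a op b is nonzero and neither underflows nor overflows.
    """
    exact = op(Fraction(a), Fraction(b))   # exact rational value of a op b
    computed = Fraction(op(a, b))          # exact value of the float the machine returned
    return (computed - exact) / exact


if __name__ == "__main__":
    print(abs(representation_error(Fraction(1, 10))) <= MU)      # True
    print(abs(operation_error(operator.add, 0.1, 0.2)) <= MU)    # True
    print(Fraction(0.1) + Fraction(0.2) == Fraction(0.1 + 0.2))  # False: the exact sum is not a float
    print(0.1 + 0.2 == 0.3)                                      # False
```

Every step here obeys the bound, yet the last line prints False: 0.1, 0.2, and 0.3 are each rounded to floats on input before any addition happens, which is exactly why real numbers must be represented by floats before the axiom can be applied.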

This note was uploaded on 12/18/2010 for the course PHYS 5073 taught by Professor Mark during the Fall '10 term at Arkansas.
