fafafrt

Floating Point Numbers

Most real numbers cannot be represented exactly in a computer. We are forced to use approximations that can be represented on the computer. We call the floating point approximation of $x$ the float of $x$ and write it as $\mathrm{fl}(x)$. In our floating point number system, we can give a bound on the relative error made when approximating a number by its float.

We assume that our floating point numbers have the form
$$\bar{x} = \pm 0.b_1 b_2 \ldots b_t \times 2^e,$$
where $e_n \le e \le e_p$, each $b_k$ is 0 or 1, and $b_1 = 1$. Think of it as a (base-2) fraction times $2^e$. Numbers too large in magnitude for this representation are said to overflow, and nonzero numbers too small in magnitude are said to underflow.

Since we have allotted $t$ bits for the fractional part, the distance between $\bar{x}$ and an adjacent float is no more than $2^{e-t}$. Because $b_1 = 1$, we have $|\bar{x}| \ge 0.1_2 \times 2^e = 2^{e-1}$. Dividing the gap by $|\bar{x}|$ therefore gives an upper bound on the relative distance between adjacent floats:
$$\frac{2^{e-t}}{2^{e-1}} = 2^{1-t}.$$
We define the machine precision, $\mu$, to be half of this quantity. For a floating point system with a $t$-digit fractional part, the machine precision is $\mu = 2^{-t}$.

The Floating Point Representation Theorem.

Suppose $x$ is a real number in the range of the floating point system (it doesn't underflow or overflow). Then
$$\mathrm{fl}(x) = x(1 + \delta), \quad \text{where } |\delta| \le \mu.$$
This is a statement about relative error, and can also be written as
$$\frac{|x - \mathrm{fl}(x)|}{|x|} \le \mu.$$

The set of floats is not closed under our arithmetic operations. For example, when we add two floats, the result is not necessarily a float, and it will need to be approximated by another float. Computers today almost always satisfy the following rule:

The Fundamental Axiom of Floating Point Arithmetic.
Let $x \mathbin{\mathrm{op}} y$ be some arithmetic operation, where op is one of $+$, $-$, $\times$, or $\div$. Suppose $x$ and $y$ are floats and that $x \mathbin{\mathrm{op}} y$ doesn't underflow or overflow. Then
$$\mathrm{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1 + \delta), \quad \text{where } |\delta| \le \mu.$$
Notice that this is a statement about floats. Real numbers need to be represented by floats before we can do the arithmetic!...
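The following Python sketch gives a rough numerical check of these bounds. It assumes Python 3.9 or later, with IEEE 754 double-precision floats and round-to-nearest, so that $t = 53$ and $\mu = 2^{-53}$. The helper names `normalized_form`, `relative_gap`, `representation_error`, and `axiom_error` are hypothetical and exist only for this illustration.

- `math.frexp` writes a float as $m \times 2^e$ with $1/2 \le |m| < 1$, which is the form $\pm 0.b_1 b_2 \ldots b_t \times 2^e$ with $b_1 = 1$.
- `math.ulp` gives the gap to the next float.
- `fractions.Fraction` computes exact values, so the relative errors themselves are not rounded.

```python
import math
import operator
import sys
from fractions import Fraction

# Assumption: Python floats are IEEE 754 doubles with round-to-nearest.
# The significand holds 53 bits counting the leading 1, so in the form
#   x = +/- 0.b1 b2 ... bt * 2^e  with b1 = 1
# we have t = 53.
T = sys.float_info.mant_dig
MU = Fraction(1, 2 ** T)  # machine precision mu = 2^{-t}, kept exact


def normalized_form(x: float) -> tuple[float, int]:
    """Return (m, e) with x = m * 2**e and 0.5 <= |m| < 1 (i.e. b1 = 1)."""
    return math.frexp(x)


def relative_gap(x: float) -> Fraction:
    """Distance from |x| to the next larger float, divided by |x| (exactly)."""
    return Fraction(math.ulp(x)) / abs(Fraction(x))


def relative_error(exact: Fraction, approx: float) -> Fraction:
    """|exact - approx| / |exact| in exact arithmetic; exact must be nonzero."""
    return abs(exact - Fraction(approx)) / abs(exact)


def representation_error(num: int, den: int) -> Fraction:
    """Relative error of fl(num/den).

    For small integers both operands convert to floats exactly, and IEEE
    division then rounds num/den correctly, so num / den is fl(num/den).
    """
    return relative_error(Fraction(num, den), num / den)


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def axiom_error(a: float, b: float, op: str) -> Fraction:
    """Relative error of the float result of a op b versus the exact result."""
    f = OPS[op]
    exact = f(Fraction(a), Fraction(b))  # x op y in exact rational arithmetic
    return relative_error(exact, f(a, b))  # compared with fl(x op y)


if __name__ == "__main__":
    for x in [1.0, 0.1, 3.0, 1e300]:
        m, e = normalized_form(x)
        # Gap to the next float is 2^{e-t}; relative gap is at most 2^{1-t} = 2*mu.
        assert math.ulp(x) == math.ldexp(1.0, e - T)
        assert relative_gap(x) <= 2 * MU
        print(f"{x!r}: m = {m!r}, e = {e}, relative gap = {float(relative_gap(x)):.3e}")

    # Representation theorem: fl(1/10) = (1/10)(1 + delta) with |delta| <= mu.
    assert representation_error(1, 10) <= MU

    # Floats are not closed under +: the exact sum of the floats 0.1 and 0.2
    # is not itself a float, so the computed sum had to be rounded.
    exact_sum = Fraction(0.1) + Fraction(0.2)
    print("exact 0.1 + 0.2 is a float:", Fraction(0.1 + 0.2) == exact_sum)

    # Fundamental axiom: each basic operation has relative error <= mu.
    for a, b in [(0.1, 0.2), (1.0, 3.0), (1e16, 1.0)]:
        for op in OPS:
            assert axiom_error(a, b, op) <= MU
    print("mu =", float(MU))
```

The sample inputs are arbitrary. The assertions illustrate the bounds on a few cases; they do not prove them, and they cover only normalized (non-underflowing) values.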

This note was uploaded on 12/18/2010 for the course PHYS 5073 taught by Professor Mark during the Fall '10 term at Arkansas.
