© Mark Redekopp, All rights reserved

EE 357 Unit 3
IEEE 754 Floating Point Representation
Floating Point Arithmetic

Floating Point
• Used to represent very small numbers (fractions) and very large numbers
  – Avogadro's number: +6.0247 * 10^23
  – Planck's constant: +6.6254 * 10^-27
  – Note: 32- or 64-bit integers can't represent this range
• Floating point representation is used in HLLs like C by declaring variables as float or double

Fixed Point
• Unsigned and 2's complement fall under a category of representations called "fixed point"
• The radix point is assumed to be in a fixed location for all numbers
  – Integers: 10011101. (binary point to the right of the LSB)
    • For 32 bits, the unsigned range is 0 to ~4 billion
  – Fractions: .10011101 (binary point to the left of the MSB)
    • Range [0, 1)
• Main point: by fixing the radix point, we limit the range of numbers that can be represented
  – Floating point allows the radix point to be in a different location for each value

Floating Point Representation
• Similar to scientific notation used with decimal numbers
  – ± D.DDD * 10^(±exp)
• Floating point representation uses the following form
  – ± b.bbbb * 2^(±exp)
  – 3 fields: sign, exponent, fraction (also called mantissa or significand)
  – Layout: S | Exp. | Fraction, where S is the overall sign of the number

Normalized FP Numbers
• Decimal example
  – +0.754 * 10^15 is not correct scientific notation
  – Must have exactly one significant digit before the decimal point: +7.54 * 10^14
• In binary the only significant digit is '1'
• Thus the normalized FP format is: ± 1.bbbbbb * 2^(±exp)
• FP numbers will always be normalized before being stored in memory or a register
  – The leading "1." is not actually stored but assumed, since we always store normalized numbers
  – If the HW calculates a result of 0.001101 * 2^5, it must normalize it to 1.101000 * 2^2 before storing

© Mark Redekopp, All rights reserved

IEEE Floating Point Formats
• Single precision (32-bit format)
  – 1 sign bit (0 = positive / 1 = negative)
  – 8 exponent bits (excess-127 representation)
  – 23 fraction (significand or mantissa) bits
  – Equivalent decimal range: ~7 digits, scale 10^±38
  – Layout: S (1) | Exp. (8) | Fraction (23)
• Double precision (64-bit format)
  – 1 sign bit (0 = positive / 1 = negative)
  – 11 exponent bits (excess-1023 representation)
  – 52 fraction (significand or mantissa) bits
  – Equivalent decimal range: ~16 digits, scale 10^±308
  – Layout: S (1) | Exp. (11) | Fraction (52)

Exponent Representation
• The exponent includes its own sign (+/-)
• Rather than using a 2's complement system, single precision uses excess-127 while double precision uses excess-1023
  – This representation allows FP numbers to be easily compared
• Let E' = the stored exponent code and E = the true exponent value
• For single precision: E' = E + 127
  – Example: 2^1 => E = 1, E' = 128_10 = 10000000_2
• For double precision: E' = E + 1023 ...
This note was uploaded on 04/04/2010 for the course EE 357 taught by Professor Mayeda during the Spring '08 term at USC.