EE357Unit3_FP

© Mark Redekopp, All rights reserved

EE 357 Unit 3
IEEE 754 Floating Point Representation
Floating Point Arithmetic

Floating Point
- Used to represent very small numbers (fractions) and very large numbers
  - Avogadro’s Number: +6.0247 * 10^23
  - Planck’s Constant: +6.6254 * 10^-27
  - Note: 32-bit or 64-bit integers can’t represent this range
- Floating point representation is used in HLLs like C by declaring variables as float or double

Fixed Point
- Unsigned and 2’s complement fall under a category of representations called “Fixed Point”
- The radix point is assumed to be in a fixed location for all numbers
  - Integers: 10011101. (binary point to the right of the LSB)
    - For 32 bits, the unsigned range is 0 to ~4 billion
  - Fractions: .10011101 (binary point to the left of the MSB)
    - Range [0 to 1)
- Main point: by fixing the radix point, we limit the range of numbers that can be represented
  - Floating point allows the radix point to be in a different location for each value

Floating Point Representation
- Similar to scientific notation used with decimal numbers
  - ±D.DDD * 10^±exp
- Floating point representation uses the following form
  - ±b.bbbb * 2^±exp
  - 3 fields: sign, exponent, fraction (also called mantissa or significand)

  | S | Exp. | Fraction |     (S = overall sign of the number)
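
As a rough illustration of the range argument and of the ±b.bbbb * 2^±exp form, the following Python sketch compares the two constants quoted above with the largest unsigned integers and splits a value into sign, significand, and exponent. The helper name `split_float` is hypothetical (it is not part of the slides), and it assumes a nonzero, finite input.

```python
import math

UINT32_MAX = 2**32 - 1   # largest 32-bit unsigned integer, about 4.29 billion
UINT64_MAX = 2**64 - 1   # largest 64-bit unsigned integer, about 1.8 * 10^19

AVOGADRO = 6.0247e23     # value as quoted on the slide
PLANCK = 6.6254e-27      # value as quoted on the slide


def split_float(x):
    """Return (sign, significand, exponent) such that
    x == sign * significand * 2**exponent and 1 <= significand < 2.

    Hypothetical helper for illustration; assumes x is nonzero and finite.
    """
    sign = -1 if x < 0 else 1
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    return sign, m * 2, e - 1   # rescale so the significand lies in [1, 2)


# Too large for even a 64-bit unsigned integer:
print(AVOGADRO > UINT64_MAX)
# Too small for any integer format: truncating to an integer gives 0.
print(int(PLANCK))
# The same values written as sign * 1.xxx * 2**exp:
print(split_float(AVOGADRO))
print(split_float(PLANCK))
```

Because the exponent travels with each value, the same number of significand bits can describe both constants, which a single fixed radix point position cannot do.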

Normalized FP Numbers
- Decimal example
  - +0.754 * 10^15 is not correct scientific notation
  - Must have exactly one nonzero digit before the decimal point: +7.54 * 10^14
- In binary the only nonzero digit is ‘1’
- Thus the normalized FP format is: ±1.bbbbbb * 2^±exp
- FP numbers are always normalized before being stored in memory or a register
  - The leading “1.” is not actually stored; it is assumed, since only normalized numbers are stored
  - If HW calculates a result of 0.001101 * 2^5, it must normalize it to 1.101000 * 2^2 before storing

IEEE Floating Point Formats
- Single precision (32-bit format)
  - 1 sign bit (0 = positive, 1 = negative)
  - 8 exponent bits (Excess-127 representation)
  - 23 fraction (significand or mantissa) bits
  - Equivalent decimal range: about 7 significant digits, magnitudes up to about 10^±38
- Double precision (64-bit format)
  - 1 sign bit (0 = positive, 1 = negative)
  - 11 exponent bits (Excess-1023 representation)
  - 52 fraction (significand or mantissa) bits
  - Equivalent decimal range: about 16 significant digits, magnitudes up to about 10^±308

  Single: | S (1 bit) | Exp. (8 bits)  | Fraction (23 bits) |
  Double: | S (1 bit) | Exp. (11 bits) | Fraction (52 bits) |

Exponent Representation
- The exponent includes its own sign (+/-)
- Rather than using the 2’s complement system, single precision uses Excess-127 while double precision uses Excess-1023
  - This representation allows FP numbers to be easily compared
- Let E’ = stored exponent code and E = true exponent value
- For single precision: E’ = E + 127
  - 2^1 => E = 1, E’ = 128 (decimal) = 10000000 (binary)
- For double precision: E’ = E + 1023
  - 2^-2 => E = -2, E’ = 1021 (decimal) = 01111111101 (binary)
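
The field layouts and the excess exponent codes above can be checked on real bit patterns. The sketch below uses Python's standard `struct` module to reinterpret a float as an unsigned integer and then masks out the three fields. The function names `single_fields` and `double_fields` are hypothetical helpers for this illustration, not something defined in the slides.

```python
import struct


def single_fields(x):
    """Return (sign, stored_exponent, fraction) of x encoded as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # same 32 bits, read as an integer
    sign = bits >> 31                 # 1 bit
    stored_exp = (bits >> 23) & 0xFF  # 8 bits, E' = E + 127
    fraction = bits & 0x7FFFFF        # 23 bits, the implied leading 1 is not stored
    return sign, stored_exp, fraction


def double_fields(x):
    """Return (sign, stored_exponent, fraction) of x encoded as a 64-bit float."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # same 64 bits, read as an integer
    sign = bits >> 63                  # 1 bit
    stored_exp = (bits >> 52) & 0x7FF  # 11 bits, E' = E + 1023
    fraction = bits & ((1 << 52) - 1)  # 52 bits
    return sign, stored_exp, fraction


# 2^1 in single precision: E = 1, E' = 128
s, e, f = single_fields(2.0)
print(s, format(e, "08b"), e - 127)    # 0 10000000 1

# 2^-2 in double precision: E = -2, E' = 1021
s, e, f = double_fields(0.25)
print(s, format(e, "011b"), e - 1023)  # 0 01111111101 -2

# 0.001101 * 2^5 = 6.5, stored normalized as 1.101 * 2^2
s, e, f = single_fields(6.5)
print(s, e - 127, format(f, "023b"))   # 0 2 10100000000000000000000
```

In the last case the fraction field holds only the bits 101 followed by zeros: the leading "1." of the normalized value is implied rather than stored.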