Floating-Point Calculator

IEEE 754 floating-point number converter and analyzer

IEEE 754 Floating-Point Converter

Enter any real number (supports scientific notation like 1.23e-4)

Results

Sign Bit: 0
Exponent (8 bits): 10000011
Fraction (23 bits): 10111100000000000000000
Binary Representation: 01000001110111100000000000000000
Hexadecimal: 0x41DE0000
Actual Stored Value: 27.75

Example Conversions

27.75 (Single Precision)

Binary: 01000001110111100000000000000000

Sign: 0 (positive)

Exponent: 10000011 (131 - 127 = 4)

Fraction: 10111100000000000000000

Formula: (-1)⁰ × 2⁴ × 1.734375 = 27.75
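The field split above can be reproduced with a minimal Python sketch. It assumes the standard struct module is an acceptable way to get the single-precision bit pattern; the helper name float32_fields is hypothetical and not part of the calculator.

```python
import struct

def float32_fields(x: float) -> tuple:
    """Return the (sign, exponent, fraction) bit strings of x as a float32.

    Hypothetical helper for illustration only.
    """
    # Pack as big-endian single precision, then reread the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{bits:032b}"
    # 1 sign bit, 8 exponent bits, 23 fraction bits.
    return b[0], b[1:9], b[9:]

sign, exponent, fraction = float32_fields(27.75)
print(sign, exponent, fraction)  # 0 10000011 10111100000000000000000
print(hex(int(sign + exponent + fraction, 2)))  # 0x41de0000
```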

0.1 Precision Loss

Input: 0.1

Stored as (single precision): 0.10000000149011612

Error: 1.49 × 10⁻⁹

Reason: 0.1 is a repeating fraction in binary (0.000110011...), so it cannot be represented exactly in a finite number of bits
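A short Python sketch of how the stored value and the error might be checked. Python floats are double precision, so the sketch rounds through single precision with the struct module first; the helper name to_float32 is hypothetical.

```python
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round x to the nearest float32 and return it as a Python float (hypothetical helper)."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

stored = to_float32(0.1)
# Decimal(float) converts the binary value exactly, with no further rounding.
exact = Decimal(stored)
error = exact - Decimal("0.1")

print(stored)  # shortest decimal string that round-trips the stored value
print(exact)   # full decimal expansion of the stored value
print(error)   # difference from the true decimal 0.1
```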

Special Values

+∞: 01111111100000000000000000000000

-∞: 11111111100000000000000000000000

NaN: 01111111100000000000000000000001 (any nonzero fraction with an all-ones exponent is a NaN)

+0: 00000000000000000000000000000000
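The patterns listed above can be decoded to confirm what they represent. This is a sketch under the assumption that the struct module is used for the reinterpretation; bits_to_float32 is a hypothetical helper name.

```python
import math
import struct

def bits_to_float32(bits: str) -> float:
    """Interpret a 32-character bit string as a single-precision value (hypothetical helper)."""
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

patterns = {
    "+inf": "01111111100000000000000000000000",
    "-inf": "11111111100000000000000000000000",
    "NaN":  "01111111100000000000000000000001",
    "+0":   "00000000000000000000000000000000",
}

for name, pattern in patterns.items():
    print(name, bits_to_float32(pattern))

# NaN compares unequal to everything, including itself, so test it with math.isnan.
print(math.isnan(bits_to_float32(patterns["NaN"])))
```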

IEEE 754 Format

Single (32-bit)

• 1 bit: Sign

• 8 bits: Exponent

• 23 bits: Fraction

• Bias: 127

• Range: ±3.4 × 10³⁸

Double (64-bit)

• 1 bit: Sign

• 11 bits: Exponent

• 52 bits: Fraction

• Bias: 1023

• Range: ±1.8 × 10³⁰⁸
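The bias and range figures for both formats follow from the field widths. The sketch below derives them in Python, assuming the usual layout where the all-ones exponent is reserved for infinity and NaN; the function name format_params is hypothetical.

```python
def format_params(exponent_bits: int, fraction_bits: int):
    """Derive (bias, largest true exponent, largest finite value) from the field widths.

    Hypothetical helper for illustration only.
    """
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent field is reserved, so the largest usable field is 2^k - 2.
    max_exponent = (2 ** exponent_bits - 2) - bias
    # Largest significand is 1.111...1 in binary, i.e. 2 - 2^(-fraction_bits).
    largest = (2 - 2.0 ** -fraction_bits) * 2.0 ** max_exponent
    return bias, max_exponent, largest

for name, widths in {"single": (8, 23), "double": (11, 52)}.items():
    bias, max_exponent, largest = format_params(*widths)
    print(f"{name}: bias={bias}, max exponent={max_exponent}, largest={largest:.4e}")
# single: bias=127, max exponent=127, largest=3.4028e+38
# double: bias=1023, max exponent=1023, largest=1.7977e+308
```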

Special Cases

Zero: E=0, F=0. Positive or negative zero

Infinity: E=max, F=0. Result of overflow

NaN: E=max, F≠0. Not a Number

Subnormal: E=0, F≠0. Very small numbers
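The E and F rules above translate directly into a classifier. This is a minimal sketch for single precision, assuming the struct module supplies the bit pattern; classify_float32 is a hypothetical name.

```python
import struct

def classify_float32(x: float) -> str:
    """Classify x, rounded to float32, by its exponent (E) and fraction (F) fields.

    Hypothetical helper for illustration only.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 0xFF:             # E = max (all ones)
        return "infinity" if fraction == 0 else "nan"
    return "normal"

for value in (0.0, -0.0, float("inf"), float("nan"), 1e-40, 27.75):
    print(value, classify_float32(value))
```

The value 1e-40 is below the smallest normal single-precision number (about 1.18 × 10⁻³⁸), so it lands in the subnormal range.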

Key Concepts

Sign bit: 0 = positive, 1 = negative

Exponent is biased (subtract bias to get true exponent)

Fraction has implicit leading 1 (except subnormals)

Not all decimal numbers can be exactly represented

Double precision provides more accuracy than single (about 15 to 17 significant decimal digits versus about 6 to 9)

Understanding IEEE 754 Floating-Point

What is Floating-Point?

Floating-point is a method for approximating real numbers in a fixed number of bits. The IEEE 754 standard defines how these numbers are stored in binary format and how arithmetic on them is rounded, so that computers can perform mathematical operations on fractional numbers with consistent results.

Why Use Floating-Point?

  • Represents a wide range of numbers (very small to very large)
  • Efficient storage in a fixed number of bits
  • Hardware-optimized arithmetic operations
  • Standardized across different systems

Format Structure

S | EEEEEEEE | FFFFFFFFFFFFFFFFFFFFFFF

Sign | Exponent | Fraction (Single Precision)

Conversion Formula

(-1)^S × 2^(E-Bias) × (1.F)

S: Sign bit (0 or 1)

E: Biased exponent

F: Fraction bits (the binary digits after the point in 1.F)

Bias: 127 (single) or 1023 (double)
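As a worked check, the formula can be applied by hand to a single-precision bit string. The Python sketch below covers normal numbers only, since the implicit leading 1 does not apply to the special cases; decode_normal_float32 is a hypothetical name.

```python
def decode_normal_float32(bits: str) -> float:
    """Apply (-1)^S × 2^(E-Bias) × (1.F) to a 32-character bit string.

    Hypothetical helper; handles normal numbers only.
    """
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    s = int(bits[0], 2)       # S: sign bit
    e = int(bits[1:9], 2)     # E: biased exponent
    f = int(bits[9:], 2)      # F: 23 fraction bits as an integer
    if e in (0, 255):
        raise ValueError("zero, subnormal, infinity, and NaN need separate handling")
    significand = 1 + f / 2 ** 23   # 1.F, with the implicit leading 1
    return (-1) ** s * 2.0 ** (e - 127) * significand

print(decode_normal_float32("01000001110111100000000000000000"))  # 27.75
```

Here E = 131, so the true exponent is 4, and 1.F = 1.734375, giving 16 × 1.734375 = 27.75.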

Precision Limitations

Floating-point numbers cannot represent all real numbers exactly. This is because:

Limited Precision

Only a finite number of bits are available for the fraction, so some decimal numbers like 0.1 cannot be represented exactly in binary.

Rounding Errors

Every conversion between decimal and binary, and every arithmetic operation, rounds its result to the nearest representable value. Each error is small, but errors can accumulate over repeated calculations.
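Accumulation is easy to observe by adding 0.1 ten times. The sketch below uses Python floats, which are double precision rather than single, and shows two common mitigations from the standard math module: tolerance-based comparison and compensated summation.

```python
import math

def naive_sum(values):
    """Add values left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

tenths = [0.1] * 10
total = naive_sum(tenths)

print(total)                      # 0.9999999999999999
print(total == 1.0)               # False: rounding errors have accumulated
print(math.isclose(total, 1.0))   # True: compare with a tolerance instead
print(math.fsum(tenths))          # 1.0: fsum tracks the lost low-order bits
```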