Binary Numbers and Floating Point Representation
1. Binary Number Systems
Binary Basics:
Each digit (bit) is either 0 or 1. Binary numbers are used to represent all data in computers.
Example:
1011 in binary is 11 in decimal.
Binary to Decimal Conversion:
Each bit represents a power of 2.
Example:
$1011_2 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1 = 11$.
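A minimal C sketch of this conversion, assuming the bits arrive as a string of '0' and '1' characters; the function name bin_to_dec is hypothetical. Instead of computing each power of 2 separately, it uses Horner's rule (double the running total, then add the next bit), which produces the same sum.

```c
#include <assert.h>
#include <stddef.h>

/* Interprets a string of '0'/'1' characters as an unsigned binary number.
   Doubling the running value shifts the earlier bits up one power of 2,
   so after the loop each bit b_i has been weighted by 2^i. */
unsigned long bin_to_dec(const char *bits)
{
    unsigned long value = 0;
    for (size_t i = 0; bits[i] != '\0'; i++) {
        assert(bits[i] == '0' || bits[i] == '1');
        value = value * 2 + (unsigned long)(bits[i] - '0');
    }
    return value;
}
```

For example, bin_to_dec("1011") returns 11.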
Decimal to Binary Conversion:
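Repeatedly divide by 2 and record the remainders; read from last to first, the remainders are the bits from most to least significant.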
Example:
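11 in decimal is 1011 in binary: 11 ÷ 2 = 5 r 1, 5 ÷ 2 = 2 r 1, 2 ÷ 2 = 1 r 0, 1 ÷ 2 = 0 r 1, and reading the remainders bottom-up gives 1011.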
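The divide-by-2 procedure can be sketched in C as follows (dec_to_bin is a hypothetical name). The remainders come out least significant bit first, so the buffer is reversed at the end; the caller is assumed to supply a buffer large enough for every bit of an unsigned long plus the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the binary digits of n into out as a NUL-terminated string.
   Assumes out holds at least sizeof(unsigned long) * 8 + 1 chars. */
void dec_to_bin(unsigned long n, char *out)
{
    assert(out != NULL);
    size_t len = 0;
    do {
        out[len++] = (char)('0' + n % 2);  /* remainder = next bit, LSB first */
        n /= 2;
    } while (n != 0);
    out[len] = '\0';
    /* Reverse so the most significant bit comes first. */
    for (size_t i = 0; i < len / 2; i++) {
        char tmp = out[i];
        out[i] = out[len - 1 - i];
        out[len - 1 - i] = tmp;
    }
}
```

For example, dec_to_bin(11, buf) leaves "1011" in buf.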
2. Encoding Integers
Unsigned Integers (B2U):
Represents non-negative numbers.
Range: $0$ to $2^w - 1$ (where $w$ is the number of bits).
Example: 4 bits can represent 0 to 15.
Signed Integers (B2T – Two’s Complement):
Represents both positive and negative numbers.
Range: $-2^{w-1}$ to $2^{w-1} - 1$.
Negation: Invert bits and add 1.
Example:
1101 in 4-bit two's complement is -3 ($-2^3 + 2^2 + 2^0 = -8 + 4 + 1$).
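A C sketch of B2T and of two's-complement negation for a width w from 1 to 31 bits, with the pattern held in the low bits of an unsigned long; the names b2t and neg_w are hypothetical. The top bit contributes -2^(w-1) and every other bit contributes its usual positive weight.

```c
#include <assert.h>

/* Interprets the low w bits of `bits` as a w-bit two's-complement value. */
long b2t(unsigned long bits, int w)
{
    assert(w >= 1 && w <= 31);
    unsigned long mask = (1UL << w) - 1;
    bits &= mask;
    long value = (long)(bits & (mask >> 1));  /* low w-1 bits, positive weights */
    if ((bits >> (w - 1)) & 1UL)              /* sign bit has weight -2^(w-1) */
        value -= 1L << (w - 1);
    return value;
}

/* Negation within w bits: invert all bits, add 1, keep the low w bits. */
unsigned long neg_w(unsigned long bits, int w)
{
    assert(w >= 1 && w <= 31);
    unsigned long mask = (1UL << w) - 1;
    return (~bits + 1) & mask;
}
```

For example, b2t(0xD, 4) returns -3 and neg_w(0x3, 4) returns 0xD. Negating TMin (0x8 for w = 4) gives 0x8 back, because +8 is not representable in 4 bits.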
Overflow:
Occurs when a result exceeds the representable range.
Example: Adding 10 + 7 in 4-bit unsigned arithmetic gives 1, because the true sum 17 exceeds 15 and wraps modulo $2^4 = 16$ (overflow).
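The wraparound can be modeled in C by computing the exact sum in a wider type and keeping only the low w bits, which is the same as reducing modulo 2^w. This is a sketch: add_uw and its overflow flag are hypothetical, and w is limited to 16 so the exact sum cannot itself wrap.

```c
#include <assert.h>

/* Adds two w-bit unsigned values; *overflowed is set when the true sum
   does not fit in w bits. */
unsigned add_uw(unsigned a, unsigned b, int w, int *overflowed)
{
    assert(w >= 1 && w <= 16);
    unsigned long mask = (1UL << w) - 1;
    assert(a <= mask && b <= mask);            /* inputs must fit in w bits */
    unsigned long sum = (unsigned long)a + b;  /* exact: at most 17 bits */
    *overflowed = sum > mask;
    return (unsigned)(sum & mask);             /* low w bits = sum mod 2^w */
}
```

For example, add_uw(10, 7, 4, &ov) returns 1 and sets ov to 1.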
3. Fractional Binary Numbers
Fractional Binary Representation:
Bits to the right of the binary point represent negative powers of 2.
Example:
$101.101_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = 5.625$.
Precision Limitation:
Only numbers of the form $x/2^k$ (with integers $x$ and $k$) can be represented exactly, and only when they fit in the available bits.
Example:
1/3 cannot be represented exactly in binary; its expansion $0.010101\ldots_2$ repeats forever.
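A short C sketch that evaluates a fractional binary string digit by digit (frac_bin_to_double is a hypothetical name). Bits left of the point are accumulated as for an integer; each bit right of the point adds the next negative power of 2. Evaluating longer prefixes of 0.010101... (0.3125, then 0.33203125, ...) approaches 1/3 without ever reaching it.

```c
#include <assert.h>

/* Value of a binary string such as "101.101". */
double frac_bin_to_double(const char *s)
{
    double value = 0.0;
    double weight = 0.5;   /* weight of the first bit after the point: 2^-1 */
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') {
            assert(!after_point);          /* at most one binary point */
            after_point = 1;
            continue;
        }
        assert(*s == '0' || *s == '1');
        int bit = *s - '0';
        if (!after_point) {
            value = value * 2.0 + bit;     /* integer part */
        } else {
            value += bit * weight;         /* fractional part */
            weight /= 2.0;
        }
    }
    return value;
}
```

For example, frac_bin_to_double("101.101") returns 5.625.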
4. IEEE Floating Point Standard (IEEE 754)
Floating Point Representation:
Sign bit (s): Determines if the number is negative or positive.
Significand (M): A fractional value, in the range [1.0, 2.0) for normalized values and [0.0, 1.0) for denormalized values.
Exponent (E): Weights the value by a power of 2.
Formula: $v = (-1)^s \times M \times 2^E$.
Normalized Values:
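Exponent field is neither all 0s nor all 1s. The exponent is $E = \mathit{exp} - \mathit{Bias}$ (Bias = 127 for single precision, 1023 for double precision), and the significand is $M = 1.\mathit{frac}_2$, with an implied leading 1.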
Example:
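0100 0110 0110 1101 1011 0100 0000 0000 (0x466DB400) represents 15213.0 in single precision: $15213 = 1.1101101101101_2 \times 2^{13}$, so $s = 0$, $\mathit{exp} = 13 + 127 = 140 = 10001100_2$, and frac = 11011011011010000000000.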
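The fields of this example can be checked in C by copying the bits of a float into a 32-bit integer. This sketch assumes float is IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 fraction bits); decode_float and float_fields are hypothetical names. memcpy is used rather than a pointer cast to avoid strict-aliasing problems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes float is IEEE 754 single precision. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

typedef struct {
    uint32_t bits;  /* raw 32-bit pattern */
    unsigned sign;  /* s: 1 bit */
    unsigned exp;   /* biased exponent field: 8 bits */
    uint32_t frac;  /* fraction field: 23 bits */
} float_fields;

float_fields decode_float(float f)
{
    float_fields r;
    memcpy(&r.bits, &f, sizeof r.bits);  /* copy the bits; avoids aliasing issues */
    r.sign = (unsigned)(r.bits >> 31);
    r.exp  = (unsigned)((r.bits >> 23) & 0xFFu);
    r.frac = r.bits & 0x7FFFFFu;
    return r;
}
```

For 15213.0f this yields bits 0x466DB400, sign 0, exp 140 (so E = 140 - 127 = 13) and frac 0x6DB400.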
Denormalized Values:
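Exponent field is all 0s; the exponent is $E = 1 - \mathit{Bias}$ and the significand is $M = 0.\mathit{frac}_2$ (no implied leading 1).
Used to represent 0 and very small numbers close to 0.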
Special Values:
Infinity: Exponent is all 1s, fraction field is 0.
NaN (Not a Number): Exponent is all 1s, fraction field is non-zero.
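The exponent and fraction fields alone determine which category a value falls into. A C sketch assuming IEEE 754 single precision; classify_float and the enum names are hypothetical (the standard fpclassify macro in <math.h> performs a similar classification).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

typedef enum { CAT_ZERO, CAT_DENORM, CAT_NORM, CAT_INF, CAT_NAN } float_category;

float_category classify_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t exp  = (bits >> 23) & 0xFFu;   /* exponent field */
    uint32_t frac = bits & 0x7FFFFFu;       /* fraction field */
    if (exp == 0)                           /* all 0s: zero or denormalized */
        return frac == 0 ? CAT_ZERO : CAT_DENORM;
    if (exp == 0xFFu)                       /* all 1s: infinity or NaN */
        return frac == 0 ? CAT_INF : CAT_NAN;
    return CAT_NORM;                        /* anything else: normalized */
}
```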
5. Floating Point Arithmetic
Addition:
Align exponents, add significands, and normalize the result.
Example: $1.5 \times 2^3 + 1.25 \times 2^1 = 1.5 \times 2^3 + 0.3125 \times 2^3 = 1.8125 \times 2^3$ (that is, $12 + 2.5 = 14.5$).
Multiplication:
Multiply significands, add exponents, and normalize the result.
Example: $(1.5 \times 2^3) \times (1.25 \times 2^1) = 1.875 \times 2^4$ (that is, $12 \times 2.5 = 30$).
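A C sketch of the align/add/normalize and multiply/add-exponents steps, working on hypothetical (M, E) pairs with value M × 2^E. It assumes positive operands and keeps significands in double, which is exact for values like 1.5 and 1.25; real hardware also rounds the result to a fixed number of significand bits, which this sketch omits.

```c
#include <assert.h>

typedef struct { double m; int e; } fp_val;  /* value = m * 2^e */

/* Bring m back into [1.0, 2.0) by adjusting e. */
static fp_val normalize(fp_val v)
{
    assert(v.m > 0.0);  /* sketch handles positive values only */
    while (v.m >= 2.0) { v.m /= 2.0; v.e++; }
    while (v.m < 1.0)  { v.m *= 2.0; v.e--; }
    return v;
}

fp_val fp_add(fp_val a, fp_val b)
{
    if (a.e < b.e) { fp_val t = a; a = b; b = t; }  /* a has the larger exponent */
    while (b.e < a.e) { b.m /= 2.0; b.e++; }          /* align: shift b right */
    fp_val sum = { a.m + b.m, a.e };                  /* add significands */
    return normalize(sum);
}

fp_val fp_mul(fp_val a, fp_val b)
{
    fp_val prod = { a.m * b.m, a.e + b.e };  /* multiply M, add E */
    return normalize(prod);
}
```

With a = (1.5, 3) and b = (1.25, 1), fp_add returns (1.8125, 3) and fp_mul returns (1.875, 4).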
Rounding:
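IEEE 754's default rounding mode is round-to-nearest, ties-to-even: a value exactly halfway between two candidates rounds to the one whose last bit is 0, which avoids the upward bias of always rounding halves up.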
Example:
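1.1011 rounded to 3 fraction bits becomes 1.110: the discarded part (0.0001) is exactly halfway, so it rounds to the even candidate 1.110 rather than 1.101.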
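Round-to-nearest-even can be sketched in C on a fixed-point bit pattern: drop the lowest k bits, round up if the dropped part is more than half, and on an exact tie round up only when that makes the kept part even. The name round_drop_bits_even is hypothetical, and the sketch does not handle a carry that would require renormalization.

```c
#include <assert.h>
#include <stdint.h>

/* x is a fixed-point bit pattern; returns x with its lowest k bits
   removed, rounded to nearest with ties to even. */
uint32_t round_drop_bits_even(uint32_t x, int k)
{
    assert(k >= 1 && k <= 31);
    uint32_t kept    = x >> k;                        /* bits that survive */
    uint32_t dropped = x & ((UINT32_C(1) << k) - 1);  /* bits being discarded */
    uint32_t half    = UINT32_C(1) << (k - 1);        /* exactly half a unit */
    if (dropped > half || (dropped == half && (kept & 1u)))
        kept += 1;                                    /* round up */
    return kept;
}
```

1.1011 with 4 fraction bits is the pattern 0x1B; round_drop_bits_even(0x1B, 1) returns 0xE, i.e. 1.110.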
6. Bit Shift Operations
Left Shift (<<): Shifts bits to the left, filling with 0s. Shifting left by $k$ is equivalent to multiplying by $2^k$, as long as no 1 bits are shifted out.
Example:
1010 << 2 = 101000 (10 × 4 = 40); in a fixed 4-bit word the high bits are lost and the result is 1000.
Right Shift (>>): Shifts bits to the right.
Logical Shift: Fills with 0s (for unsigned).
Arithmetic Shift: Replicates the sign bit (for signed).
Example:
1010 >> 2 = 0010 (logical), 1110 (arithmetic).
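These shifts can be sketched in C on w-bit patterns held in the low bits of a uint32_t (the helper names are hypothetical). The arithmetic right shift is written out by hand because C makes >> on a negative signed value implementation-defined, even though most compilers perform an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

uint32_t shl_w(uint32_t x, int k, int w)
{
    assert(w >= 1 && w <= 31 && k >= 0 && k < w);
    uint32_t mask = (UINT32_C(1) << w) - 1;
    return (x << k) & mask;              /* bits pushed past bit w-1 are lost */
}

uint32_t shr_logical_w(uint32_t x, int k, int w)
{
    assert(w >= 1 && w <= 31 && k >= 0 && k < w);
    uint32_t mask = (UINT32_C(1) << w) - 1;
    return (x & mask) >> k;              /* 0s enter from the left */
}

uint32_t shr_arith_w(uint32_t x, int k, int w)
{
    assert(w >= 1 && w <= 31 && k >= 0 && k < w);
    uint32_t mask = (UINT32_C(1) << w) - 1;
    x &= mask;
    uint32_t r = x >> k;
    if ((x >> (w - 1)) & 1u)             /* sign bit set: fill top k bits with 1s */
        r |= mask & ~(mask >> k);
    return r;
}
```

For 1010 with w = 4: shl_w(0xA, 2, 4) gives 1000, shr_logical_w(0xA, 2, 4) gives 0010, and shr_arith_w(0xA, 2, 4) gives 1110.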
7. Byte Ordering
Big Endian:
Most significant byte is stored at the lowest address.
Example:
0x01234567 is stored as 01 23 45 67.
Little Endian:
Least significant byte is stored at the lowest address.
Example:
0x01234567 is stored as 67 45 23 01.
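The byte order of the machine running the code can be inspected in C by copying a 32-bit value into a byte array, lowest address first; bytes_of and is_little_endian are hypothetical names, and the sketch ignores mixed-endian layouts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of x into out[0..3], lowest address first. */
void bytes_of(uint32_t x, unsigned char out[4])
{
    assert(out != NULL);
    memcpy(out, &x, sizeof x);
}

/* Returns 1 if the least significant byte sits at the lowest address. */
int is_little_endian(void)
{
    unsigned char b[4];
    bytes_of(0x01234567u, b);
    return b[0] == 0x67;
}
```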
8. Casting in C
Explicit Casting:
Convert between data types.
Example:
int x = (int) 3.14; (truncates toward zero, so x is 3).
Implicit Casting:
The compiler converts types automatically, for example in mixed-type expressions and in assignments.
Example:
int x = 3.14; (truncates to 3).
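A minimal C sketch of both conversions (the function names are hypothetical). Converting a double to int truncates toward zero, and the C standard leaves the result undefined when the truncated value does not fit in an int, so the sketch asserts that precondition.

```c
#include <assert.h>
#include <limits.h>

/* Explicit cast: (int) discards the fractional part, truncating toward zero. */
int explicit_cast(double d)
{
    assert(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0);
    return (int) d;
}

/* Implicit conversion: assigning a double to an int converts the same way
   (compilers may warn about this with -Wconversion). */
int implicit_cast(double d)
{
    assert(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0);
    int x = d;
    return x;
}
```

Both return 3 for 3.14, and -3 (not -4) for -3.14.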
9. Mathematical Properties of Floating Point
Commutativity:
Addition and multiplication are commutative.
Example: a + b = b + a, a × b = b × a.
Associativity:
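Addition and multiplication are not associative, because each operation rounds its result.
Example: $(a + b) + c$ can differ from $a + (b + c)$; in single precision, $(3.14 + 10^{10}) - 10^{10} = 0$ but $3.14 + (10^{10} - 10^{10}) = 3.14$.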
Distributivity:
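Multiplication does not always distribute over addition, because of rounding and overflow.
Example: $a \times (b + c)$ can differ from $a \times b + a \times c$; in single precision, $10^{20} \times (10^{20} - 10^{20}) = 0$ but $10^{20} \times 10^{20} - 10^{20} \times 10^{20}$ is NaN, because both products overflow to infinity.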
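The following C sketch checks two single-precision counterexamples: (3.14 + 1e10) - 1e10 versus 3.14 + (1e10 - 1e10), and 1e20 × (1e20 - 1e20) versus 1e20 × 1e20 - 1e20 × 1e20. Every intermediate result is stored in a float variable so it is rounded to single precision; the function names are hypothetical, and the results assume IEEE 754 float arithmetic without options such as -ffast-math that let the compiler reassociate.

```c
#include <assert.h>

/* Each intermediate is stored in a float, which rounds it to single
   precision before the next operation. */
float add_left(float a, float b, float c)   { float t = a + b; float r = t + c; return r; }
float add_right(float a, float b, float c)  { float t = b + c; float r = a + t; return r; }
float mul_dist(float a, float b, float c)   { float t = b + c; float r = a * t; return r; }
float mul_expand(float a, float b, float c) { float t1 = a * b; float t2 = a * c; float r = t1 + t2; return r; }
```

add_left(3.14f, 1e10f, -1e10f) yields 0 because 3.14 is lost when rounded against 1e10, while add_right yields 3.14f; mul_dist(1e20f, 1e20f, -1e20f) yields 0, while mul_expand yields NaN.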
10. Key Takeaways
Precision vs. Range:
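Floating point trades precision for range: with a fixed number of significand bits, the gap between adjacent representable values grows with magnitude.
Small numbers get finer absolute precision than large numbers, while the relative precision stays roughly constant.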
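To make the trade-off concrete, this C sketch measures the gap between a positive float and the next representable float, assuming IEEE 754 single precision; float_gap is a hypothetical name. It relies on the fact that, for finite positive floats, incrementing the bit pattern by 1 gives the next larger float.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Gap between a finite positive float x and the next larger float. */
float float_gap(float x)
{
    assert(x > 0.0f && x < FLT_MAX);
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;                         /* bit pattern of the next float up */
    float next;
    memcpy(&next, &bits, sizeof next);
    return next - x;                   /* exact: adjacent floats differ by one unit */
}
```

The gap is 2^-23 (about 1.2 × 10^-7) at 1.0 but 1024 at 1e10, while the gap relative to the value stays near 10^-7.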
Overflow and Underflow:
Be cautious of overflow in integer arithmetic (unsigned results silently wrap modulo $2^w$) and of overflow to infinity or underflow toward 0 in floating point.
Bitwise Operations:
Useful for low-level manipulation, but shifts require care with overflow (bits shifted out are lost) and with sign (logical vs. arithmetic right shift).
