Python Image Processing: Core Techniques with OpenCV

This document demonstrates various fundamental image processing techniques using Python libraries such as OpenCV, NumPy, and Matplotlib. Each section provides a code example and a brief explanation of the concept.

Image Transformations

Image transformations involve altering the spatial arrangement of pixels within an image. This section covers common transformations like rotation, scaling, and translation.

Rotation (45° Around Center)

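Rotation turns an image about a fixed point. Here, the image is rotated 45 degrees counterclockwise about its center; because the output canvas keeps the original size, the corners of the rotated image are cropped.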

import cv2
import numpy as np

# Load the input image (cv2.imread returns None if the file cannot be read)
img = cv2.imread('input.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'input.jpg'")

# Get image height and width
h, w = img.shape[:2]

# Rotation matrix: center, angle, scale=1
M_rot = cv2.getRotationMatrix2D((w/2, h/2), 45, 1)

# Apply rotation
rotated = cv2.warpAffine(img, M_rot, (w, h))
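
To make the rotation matrix less of a black box, the sketch below builds the same kind of 2x3 affine matrix by hand in NumPy, following the formula given in the OpenCV documentation for getRotationMatrix2D. The helper name rotation_matrix_2d and the sample center (100, 50) are illustrative assumptions, not part of the original example.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Hypothetical helper: 2x3 affine matrix for a rotation about `center`."""
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    alpha = scale * np.cos(theta)
    beta = scale * np.sin(theta)
    # The third column shifts the image so that `center` stays where it is
    return np.array([
        [alpha,  beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

M_demo = rotation_matrix_2d((100.0, 50.0), 45, 1.0)

# Apply the matrix to the center itself, written as (x, y, 1)
mapped_center = M_demo @ np.array([100.0, 50.0, 1.0])
```

The center of rotation is the one point that the matrix maps onto itself, which is a quick sanity check for any rotation matrix of this form.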

Scaling (Down to 50%)

Scaling resizes an image, either enlarging or shrinking it. This example scales the image down to 50% of its original size.

# Scaling (Down to 50%)
scaled = cv2.resize(img, None, fx=0.5, fy=0.5)  # fx, fy are scaling factors
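
As a rough illustration of what a scale factor does to the pixel grid, here is a nearest-neighbour downscale written in plain NumPy. This is a simpler technique than the bilinear interpolation cv2.resize uses by default, and the function name downscale_nearest is a hypothetical one chosen for this sketch.

```python
import numpy as np

def downscale_nearest(image, fx, fy):
    """Hypothetical helper: resize by picking the nearest source pixel."""
    h, w = image.shape[:2]
    new_h, new_w = int(round(h * fy)), int(round(w * fx))
    # For each output row/column, find the source index it samples from
    rows = np.minimum((np.arange(new_h) / fy).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / fx).astype(int), w - 1)
    return image[rows[:, None], cols]

demo = np.arange(24, dtype=np.uint8).reshape(4, 6)
small = downscale_nearest(demo, 0.5, 0.5)  # keeps every second row and column
```

With fx = fy = 0.5 the output has half the rows and half the columns of the input, so it holds a quarter of the pixels.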

Translation (Right 100px, Down 50px)

Translation shifts an image along the X and Y axes. Here, the image is moved 100 pixels to the right and 50 pixels down.

# Translation (Right 100px, Down 50px)
M_trans = np.float32([[1, 0, 100], [0, 1, 50]])  # Translation matrix
translated = cv2.warpAffine(img, M_trans, (w, h))  # Apply translation
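
The translation matrix can be read as "add 100 to x, add 50 to y". The sketch below applies it to a single coordinate and then shows an equivalent shift on a small array using slicing. The translate helper is hypothetical and assumes non-negative integer offsets with zero fill.

```python
import numpy as np

M_demo = np.float32([[1, 0, 100], [0, 1, 50]])

# A pixel at (x=10, y=20) in homogeneous form (x, y, 1)
moved = M_demo @ np.array([10, 20, 1], dtype=np.float32)

def translate(image, tx, ty):
    """Hypothetical helper: shift right by tx and down by ty, filling with zeros."""
    out = np.zeros_like(image)
    h, w = image.shape[:2]
    if tx < w and ty < h:
        out[ty:, tx:] = image[:h - ty, :w - tx]
    return out

grid = np.arange(12).reshape(3, 4)
shifted = translate(grid, 1, 1)
```

Pixels pushed past the right or bottom border are lost, and the vacated area is filled with zeros, which matches what the translated image above looks like with the default border settings.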

Displaying Transformation Results

The transformed images are displayed using OpenCV’s imshow function.

# Display All
cv2.imshow("Original", img)
cv2.imshow("Rotated", rotated)
cv2.imshow("Scaled", scaled)
cv2.imshow("Translated", translated)

cv2.waitKey(0)  # Wait for any key press to close windows
cv2.destroyAllWindows()  # Close all OpenCV windows

Frequency Domain Processing

Processing images in the frequency domain involves transforming the image from its spatial representation to a frequency representation, often using the Fast Fourier Transform (FFT). This allows for operations like filtering based on frequency components.

import cv2
import numpy as np
import matplotlib.pyplot as plt

1D Fast Fourier Transform (FFT)

The 1D FFT is applied to a single row of the grayscale image to analyze its frequency components.

# Load grayscale image
img = cv2.imread('input.jpg', 0)  # Read image in grayscale

# 1D FFT of middle row
row = img[img.shape[0] // 2]  # Take middle row of the image
fft1d = np.fft.fft(row)       # Compute 1D FFT

# Plot original row and its FFT magnitude
plt.subplot(1,2,1), plt.plot(row), plt.title("Row")
plt.subplot(1,2,2), plt.plot(np.abs(fft1d)), plt.title("1D FFT")
plt.show()
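
To see how the FFT output should be read, it helps to try it on a signal whose frequency content is known. The sketch below uses a synthetic "row" (an assumption for illustration, not taken from the image) made of a constant offset plus a sinusoid with 5 cycles across 64 samples.

```python
import numpy as np

n = 64
t = np.arange(n)
signal = 10 + np.sin(2 * np.pi * 5 * t / n)  # DC offset plus 5 cycles per row

spectrum_1d = np.abs(np.fft.fft(signal))
dc_term = spectrum_1d[0]  # bin 0 holds n * mean(signal)

# Strongest non-DC bin in the first half of the spectrum
peak_bin = int(np.argmax(spectrum_1d[1:n // 2])) + 1
```

Bin 0 carries the average brightness, and the peak lands at the bin equal to the number of cycles per row. The second half of the spectrum mirrors the first for real-valued input.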

2D Fast Fourier Transform (FFT)

The 2D FFT transforms the entire image into its frequency components. The zero-frequency component is shifted to the center for better visualization of the magnitude spectrum.

# 2D FFT
dft = np.fft.fft2(img)                  # Perform 2D FFT
dft_shift = np.fft.fftshift(dft)        # Shift zero-freq component to center
mag = 20 * np.log(np.abs(dft_shift) + 1)  # Log-magnitude for better visualization
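
The effect of fftshift is easiest to see on a flat image, whose only frequency component is the zero-frequency (DC) term. The following sketch, using a made-up 8x8 constant array, locates that term before and after the shift.

```python
import numpy as np

flat = np.full((8, 8), 3.0)
F = np.fft.fft2(flat)
F_shift = np.fft.fftshift(F)

# Position of the largest-magnitude coefficient (the DC term)
dc_before = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(F)), F.shape))
dc_after = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(F_shift)), F_shift.shape))
```

Before the shift the DC term sits in the top-left corner; afterwards it sits at (rows // 2, cols // 2), which is why masks for shifted spectra are built around the center.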

Frequency Masking: Low-Pass Filter

A low-pass filter in the frequency domain allows low-frequency components (smooth regions) to pass while attenuating high-frequency components (edges, noise). This effectively blurs the image.

# Frequency Masking: Low-pass filter
rows, cols = img.shape
mask = np.zeros((rows, cols), np.uint8)  # Initialize mask with zeros
r = 30  # Half-width of the square low-pass region
mask[rows//2 - r:rows//2 + r, cols//2 - r:cols//2 + r] = 1  # Fill center with ones
masked = dft_shift * mask  # Apply mask (element-wise multiplication)
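
A square mask treats diagonal frequencies differently from horizontal and vertical ones. As an alternative sketch (not the method used above), a circular mask keeps every frequency within a fixed distance of the center. The helper name circular_lowpass_mask is hypothetical.

```python
import numpy as np

def circular_lowpass_mask(shape, radius):
    """Hypothetical helper: 1 inside a centered circle, 0 outside."""
    rows, cols = shape
    y, x = np.ogrid[:rows, :cols]
    dist2 = (y - rows // 2) ** 2 + (x - cols // 2) ** 2
    return (dist2 <= radius ** 2).astype(np.uint8)

circ_mask = circular_lowpass_mask((64, 64), 10)
```

The result has the same shape and dtype as the square mask, so it could be multiplied with a shifted spectrum in the same way.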

Inverse FFT and Display

After applying the frequency mask, the inverse FFT is used to reconstruct the image back into the spatial domain. The results are then displayed.

# Inverse FFT to reconstruct image
idft = np.fft.ifft2(np.fft.ifftshift(masked))  # Undo shift and apply inverse FFT
img_back = np.abs(idft)  # Take the magnitude of the complex result to get a real-valued image

# Display Results
plt.subplot(131), plt.imshow(img, cmap='gray'), plt.title("Input")
plt.subplot(132), plt.imshow(mag, cmap='gray'), plt.title("2D FFT")
plt.subplot(133), plt.imshow(img_back, cmap='gray'), plt.title("Filtered")
plt.show()
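
The forward and inverse transforms can be checked end to end on synthetic data. The sketch below uses a random noise image (an assumption made so the example is self-contained) to confirm that an unmasked round trip restores the input and that a low-pass mask reduces pixel variation. The mask here spans frequencies -8 to +8 symmetrically, so the filtered result is real up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
noisy_img = rng.normal(128.0, 20.0, size=(64, 64))

shifted_spec = np.fft.fftshift(np.fft.fft2(noisy_img))

# Round trip with no masking: should reproduce the input
restored = np.fft.ifft2(np.fft.ifftshift(shifted_spec)).real

# Symmetric square low-pass mask around the center
demo_mask = np.zeros(noisy_img.shape)
centre, half = 32, 8
demo_mask[centre - half:centre + half + 1, centre - half:centre + half + 1] = 1
smoothed = np.fft.ifft2(np.fft.ifftshift(shifted_spec * demo_mask)).real
```

The mean brightness is unchanged because the DC term lies inside the mask, while the standard deviation drops because most of the noise energy sits at the discarded high frequencies.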

Image Enhancement

Image enhancement techniques aim to improve the visual quality of an image or to make certain features more discernible. This section covers contrast stretching, histogram equalization, and RGB channel separation.

import cv2
import numpy as np
import matplotlib.pyplot as plt

Contrast Stretching

Contrast stretching improves image contrast by expanding the range of pixel intensities to span the full dynamic range (e.g., 0-255).

# Step 1: Load the image
img = cv2.imread('low_contrast.jpg')  # Load a low contrast image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to grayscale for processing

# Step 2: Contrast Stretching (improves contrast by stretching intensity range)
p1, p2 = np.percentile(gray, (2, 98))  # Get 2nd and 98th percentile of intensities
# Map [p1, p2] linearly onto [0, 255]; values outside that range are clipped
stretched = np.clip((gray.astype(np.float32) - p1) * 255.0 / max(p2 - p1, 1e-6), 0, 255).astype(np.uint8)
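
The percentile-based mapping can be tried on synthetic data without an image file. In this sketch the function name stretch_contrast and the 16x16 test ramp are illustrative assumptions; the ramp only covers intensities 100 to 150, imitating a low-contrast image.

```python
import numpy as np

def stretch_contrast(gray, low_pct=2, high_pct=98):
    """Hypothetical helper: map the [low_pct, high_pct] percentile range to 0-255."""
    lo, hi = np.percentile(gray, (low_pct, high_pct))
    if hi <= lo:
        return gray.copy()  # flat image: nothing to stretch
    scaled = (gray.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return np.clip(scaled, 0, 255).astype(np.uint8)

low_contrast = np.linspace(100, 150, 256).reshape(16, 16).astype(np.uint8)
stretched_demo = stretch_contrast(low_contrast)
```

Using percentiles rather than the absolute minimum and maximum makes the stretch less sensitive to a few outlier pixels, at the cost of clipping the darkest and brightest 2% of values.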

Histogram Equalization

Histogram equalization enhances contrast by redistributing pixel intensities to create a more uniform histogram. This is particularly effective for images with low contrast.

# Step 3: Histogram Calculation
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])  # Compute grayscale histogram

# Step 4: Histogram Equalization (enhances contrast using histogram)
equalized = cv2.equalizeHist(gray)  # Equalize histogram of grayscale image
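
Histogram equalization amounts to a lookup table built from the cumulative histogram. The NumPy sketch below follows the textbook formula; it is not claimed to match cv2.equalizeHist bit for bit, and the function name equalize_hist_numpy is hypothetical.

```python
import numpy as np

def equalize_hist_numpy(gray):
    """Hypothetical helper: equalize a uint8 image via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest intensity present
    total = gray.size
    if total == cdf_min:
        return gray.copy()  # only one intensity present
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]  # apply the lookup table to every pixel

dark = np.array([[50, 50, 51], [51, 52, 53]], dtype=np.uint8)
equalized_demo = equalize_hist_numpy(dark)
```

Input values that spanned only 50 to 53 are spread across the full 0 to 255 range, while their relative ordering is preserved.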

RGB Channel Separation

A color image is composed of three channels: red, green, and blue. OpenCV stores them in BGR order, so cv2.split returns the blue plane first. Separating the channels allows each color component to be processed or analyzed individually.

# Step 5: Split into color planes
B, G, R = cv2.split(img)  # Split original image into Blue, Green, Red channels
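
Because an OpenCV image is a NumPy array of shape (height, width, 3), the same separation can be sketched with plain indexing. The tiny 2x2 array below is made up for illustration.

```python
import numpy as np

bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 10  # blue plane
bgr[..., 1] = 20  # green plane
bgr[..., 2] = 30  # red plane

# Index the last axis to pull out each plane
blue, green, red = bgr[..., 0], bgr[..., 1], bgr[..., 2]

# Reverse the channel axis to get RGB order, e.g. for Matplotlib
rgb = bgr[..., ::-1]
```

Indexing returns views rather than copies, so it is cheap, but edits to a plane also change the source array.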

Displaying Enhanced Images and Histogram

The original, contrast-stretched, and histogram-equalized grayscale images are displayed, along with the individual RGB channels and the histogram of the original grayscale image.

# Step 6: Display images using OpenCV
cv2.imshow("Original Gray", gray)
cv2.imshow("Contrast Stretched", stretched)
cv2.imshow("Histogram Equalized", equalized)
cv2.imshow("Red Plane", R)
cv2.imshow("Green Plane", G)
cv2.imshow("Blue Plane", B)

# Step 7: Plot histogram using Matplotlib
plt.title("Histogram of Grayscale Image")
plt.xlabel("Pixel Intensity")
plt.ylabel("Frequency")
plt.plot(hist)
plt.show()

# Step 8: Wait and close OpenCV windows
cv2.waitKey(0)
cv2.destroyAllWindows()

Image Filtering

Image filtering is used to modify or enhance an image by applying a mathematical operation to each pixel based on its neighborhood. This section demonstrates mean and median filtering for noise reduction.

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load grayscale image
img = cv2.imread('input.jpg', 0)  # 0 means load as grayscale

Mean Filtering

Mean filtering, also known as averaging filtering, replaces each pixel’s value with the average of its neighbors. It’s effective for reducing random noise but can blur edges.

# Mean Filtering
# This is also called averaging filter
mean_filtered = cv2.blur(img, (5, 5))  # Apply 5x5 mean filter
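
The averaging step can be written out explicitly. This NumPy sketch sums the k x k shifted copies of the image and divides by the number of taps; the helper name mean_filter and the reflect-style border handling are assumptions for illustration, and border pixels may differ from cv2.blur depending on its border mode.

```python
import numpy as np

def mean_filter(image, k=3):
    """Hypothetical helper: k x k box filter with reflected borders."""
    pad = k // 2
    padded = np.pad(image.astype(np.float64), pad, mode='reflect')
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate each of the k*k neighbour offsets
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

impulse = np.zeros((5, 5))
impulse[2, 2] = 9.0
blurred = mean_filter(impulse, 3)
```

A single bright pixel of value 9 is spread into a 3x3 patch of ones: the outlier is reduced but also smeared over its neighbours, which is the blurring side effect described above.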

Median Filtering

Median filtering replaces each pixel’s value with the median of its neighbors. It is particularly effective at removing salt-and-pepper noise while preserving edges better than mean filtering.

# Median Filtering
# Great for salt-and-pepper noise
median_filtered = cv2.medianBlur(img, 5)  # Kernel size must be odd (e.g., 3, 5, 7...)
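
To illustrate why the median handles salt-and-pepper noise well, the sketch below filters a flat patch containing one "salt" and one "pepper" pixel. The median_filter helper is a hypothetical NumPy version with replicated borders, written for grayscale arrays only.

```python
import numpy as np

def median_filter(image, k=3):
    """Hypothetical helper: k x k median filter for a 2D array."""
    pad = k // 2
    padded = np.pad(image, pad, mode='edge')
    h, w = image.shape
    # Stack the k*k shifted copies, then take the median across them
    windows = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(k) for dx in range(k)
    ])
    return np.median(windows, axis=0).astype(image.dtype)

noisy_patch = np.full((5, 5), 100, dtype=np.uint8)
noisy_patch[2, 2] = 255  # "salt"
noisy_patch[1, 3] = 0    # "pepper"
cleaned = median_filter(noisy_patch, 3)
```

Each 3x3 window contains at most two outliers among nine values, so the median is always the background value and both impulses are removed completely rather than averaged in.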

Displaying Filtered Images

The original image is compared with its mean-filtered and median-filtered versions.

# Display Results
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.imshow(img, cmap='gray')
plt.title('Original')
plt.axis('off')

plt.subplot(1, 3, 2)
plt.imshow(mean_filtered, cmap='gray')
plt.title('Mean Filtered (5x5)')
plt.axis('off')

plt.subplot(1, 3, 3)
plt.imshow(median_filtered, cmap='gray')
plt.title('Median Filtered (5x5)')
plt.axis('off')

plt.tight_layout()
plt.show()

Image Sharpening and Edge Detection

These techniques are crucial for highlighting details and identifying boundaries within an image. Sharpening enhances fine details, while edge detection identifies significant changes in pixel intensity.

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load image in grayscale
img = cv2.imread('input.jpg', 0)

Image Sharpening with Laplacian Filter

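The Laplacian filter is a second-order derivative filter that responds to regions of rapid intensity change. Subtracting the Laplacian response from the original image accentuates those regions, which sharpens the image.

# 1. Image Sharpening using Laplacian Filter
# Laplacian kernel (second-order derivative)
laplacian_kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])

# Compute the Laplacian with a float output depth so negative responses are kept
laplacian = cv2.filter2D(img, cv2.CV_64F, laplacian_kernel)

# Sharpen: subtract the Laplacian from the image and clip back to 8-bit
sharpened_img = np.clip(img - laplacian, 0, 255).astype(np.uint8)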
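
One common way to sharpen with the Laplacian is to subtract its response from the image, which is equivalent to filtering once with an "identity minus Laplacian" kernel. The sketch below checks this on a synthetic step edge; the filter3x3 helper and the test row are assumptions made for illustration.

```python
import numpy as np

lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
sharpen_kernel = identity - lap  # centre weight 5, four neighbours -1

def filter3x3(image, kernel):
    """Hypothetical helper: 3x3 correlation with reflected borders."""
    padded = np.pad(image.astype(np.float64), 1, mode='reflect')
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Four identical rows with a step from 50 to 150
step_img = np.tile(np.array([50.0, 50.0, 50.0, 150.0, 150.0, 150.0]), (4, 1))
sharp_step = np.clip(filter3x3(step_img, sharpen_kernel), 0, 255)
```

Flat regions pass through unchanged, while the pixels on either side of the step are pushed apart (0 and 250 instead of 50 and 150). That overshoot is what makes edges look crisper.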

Edge Detection with Sobel Filters

Sobel operators are first-order derivative filters used to detect edges by approximating the image gradient in horizontal (X) and vertical (Y) directions. The magnitude of the gradient indicates the strength of the edge.

# 2. Edge Detection using Gradient Filters
# Sobel operators for edge detection (gradient filters)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # Horizontal gradient (responds to vertical edges)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])  # Vertical gradient (responds to horizontal edges)

# Apply Sobel filters with a float output depth so negative gradients are kept
edges_x = cv2.filter2D(img, cv2.CV_64F, sobel_x)  # Gradient in x-direction
edges_y = cv2.filter2D(img, cv2.CV_64F, sobel_y)  # Gradient in y-direction

# Combine the gradients (magnitude of the gradient)
edges = np.sqrt(np.square(edges_x) + np.square(edges_y))
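
Which kernel responds to which edge orientation is easy to mix up, so here is a small NumPy check on a synthetic image containing a single vertical edge. The names kx, ky, and correlate3x3 are hypothetical, and the float arithmetic is an assumption made so that negative gradients are not lost.

```python
import numpy as np

kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
ky = kx.T  # the y-kernel is the transpose of the x-kernel

def correlate3x3(image, kernel):
    """Hypothetical helper: 3x3 correlation with reflected borders."""
    padded = np.pad(image.astype(np.float64), 1, mode='reflect')
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Left half dark, right half bright: one vertical edge
edge_img = np.zeros((5, 6))
edge_img[:, 3:] = 100.0

gx = correlate3x3(edge_img, kx)
gy = correlate3x3(edge_img, ky)
magnitude = np.hypot(gx, gy)
```

The x-gradient is large along the vertical edge and the y-gradient is zero everywhere, so for this image the magnitude equals the absolute x-gradient.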

Displaying Sharpened and Edge Images

This section displays the original image, its sharpened version, and the results of horizontal, vertical, and combined edge detection.

# Show Results
plt.figure(figsize=(12, 8))

# Original Image
plt.subplot(2, 3, 1)
plt.imshow(img, cmap='gray')
plt.title("Original Image")

# Sharpened Image (using Laplacian filter)
plt.subplot(2, 3, 2)
plt.imshow(sharpened_img, cmap='gray')
plt.title("Sharpened Image")

# Sobel X Gradient (horizontal gradient, highlights vertical edges)
plt.subplot(2, 3, 3)
plt.imshow(edges_x, cmap='gray')
plt.title("Sobel X (Vertical Edges)")

# Sobel Y Gradient (vertical gradient, highlights horizontal edges)
plt.subplot(2, 3, 4)
plt.imshow(edges_y, cmap='gray')
plt.title("Sobel Y (Horizontal Edges)")

# Combined Edge Detection (Magnitude of gradient)
plt.subplot(2, 3, 5)
plt.imshow(edges, cmap='gray')
plt.title("Edge Detection (Magnitude)")

plt.tight_layout()
plt.show()

Image Compression Techniques

Image compression reduces the amount of data required to represent an image, making it more efficient for storage and transmission. This section demonstrates three common compression methods: Discrete Cosine Transform (DCT), Differential Pulse Code Modulation (DPCM), and Huffman Coding.

import cv2
import numpy as np
import heapq
from collections import defaultdict
import matplotlib.pyplot as plt

# Load image in grayscale
img = cv2.imread('input.jpg', 0)

Discrete Cosine Transform (DCT) Compression

The DCT is the transform at the core of lossy image and video compression standards such as JPEG. It converts image blocks into frequency coefficients; the transform itself is invertible, and the loss comes from discarding or coarsely quantizing the high-frequency (less visually significant) coefficients.

# 1. DCT (Discrete Cosine Transform) Compression
block_size = 8  # Block size for DCT
compressed_img_dct = np.zeros_like(img)

# Apply DCT to 8x8 blocks of the image
# (assumes height and width are multiples of block_size; cv2.dct needs even-sized input)
for i in range(0, img.shape[0], block_size):
    for j in range(0, img.shape[1], block_size):
        block = img[i:i+block_size, j:j+block_size]

        # Apply DCT to each block
        dct_block = cv2.dct(np.float32(block))

        # Zero out the highest-frequency coefficients (bottom-right 3x3 corner)
        dct_block[5:, 5:] = 0

        # Apply inverse DCT to reconstruct the block from the remaining coefficients
        compressed_block = cv2.idct(dct_block)

        # Clip to the valid 8-bit range before storing, to avoid uint8 wrap-around
        compressed_img_dct[i:i+block_size, j:j+block_size] = np.clip(np.round(compressed_block), 0, 255)
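
The block transform can also be written as a pair of matrix products, which makes its invertibility explicit. The sketch below builds an orthonormal DCT-II matrix in NumPy; the normalization is assumed to correspond to the one cv2.dct uses, and the names dct_matrix and ramp_block are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Hypothetical helper: orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)
    return D

D = dct_matrix(8)

# A smooth diagonal ramp: values 10 * (row + col)
ramp_block = np.add.outer(np.arange(8), np.arange(8)).astype(np.float64) * 10

coeffs = D @ ramp_block @ D.T        # forward 2D DCT
truncated = coeffs.copy()
truncated[5:, 5:] = 0                # drop the highest-frequency corner
approx = D.T @ truncated @ D         # inverse 2D DCT
max_error = np.abs(approx - ramp_block).max()
```

For a smooth block like this ramp, the discarded corner holds essentially no energy, so the reconstruction error is negligible. Blocks with fine texture or sharp edges would lose visible detail under the same truncation.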

Differential Pulse Code Modulation (DPCM) Compression

DPCM encodes the difference between neighboring pixel values rather than the absolute values, exploiting the spatial redundancy in images. It is lossless when the differences are stored exactly, as in this example, and lossy when they are quantized. Here each row is predicted from the row above it.

# 2. DPCM (Differential Pulse Code Modulation) Compression
# Use a signed type so negative differences are not wrapped by uint8 arithmetic
signed_img = img.astype(np.int16)
encoded_dpcm = np.zeros_like(signed_img)
encoded_dpcm[0, :] = signed_img[0, :]  # First row is stored as-is

# Encode by storing differences between consecutive rows
for i in range(1, img.shape[0]):
    encoded_dpcm[i, :] = signed_img[i, :] - signed_img[i - 1, :]

# Decode the image by adding the differences back
decoded_dpcm = np.zeros_like(signed_img)
decoded_dpcm[0, :] = encoded_dpcm[0, :]
for i in range(1, img.shape[0]):
    decoded_dpcm[i, :] = encoded_dpcm[i, :] + decoded_dpcm[i - 1, :]
decoded_dpcm = decoded_dpcm.astype(np.uint8)  # Back to the original pixel type
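
The row-difference scheme can be expressed without explicit loops using np.diff for encoding and np.cumsum for decoding. The 3x3 array below is made-up sample data, and the signed int16 working type is an assumption chosen so that negative differences are representable.

```python
import numpy as np

sample = np.array([[10, 200, 30],
                   [12, 190, 35],
                   [11, 195, 33]], dtype=np.uint8)

signed = sample.astype(np.int16)

# Encode: keep the first row, store row-to-row differences for the rest
encoded = signed.copy()
encoded[1:] = np.diff(signed, axis=0)

# Decode: a running sum down the rows restores the original values
decoded = np.cumsum(encoded, axis=0).astype(np.uint8)
```

The differences are small numbers clustered around zero, which is what makes them cheaper to entropy-code than the raw intensities.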

Huffman Coding Compression

Huffman coding is a lossless data compression algorithm that assigns variable-length, prefix-free codes to symbols (here, pixel intensities) based on their frequencies. More frequent symbols get shorter codes, which reduces the average number of bits per symbol.

# 3. Huffman Coding Compression
# Calculate histogram of pixel intensities
hist, _ = np.histogram(img.flatten(), bins=256, range=[0, 256])

# Create a priority queue (min-heap) for Huffman coding
class Node:
    def __init__(self, value, freq):
        self.value = value
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

# Build Huffman tree
def build_huffman_tree(hist):
    heap = [Node(value, freq) for value, freq in enumerate(hist) if freq > 0]
    heapq.heapify(heap)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)

    return heap[0]

# Generate Huffman codes from the tree
def generate_huffman_codes(root, prefix='', codes=None):
    if codes is None:
        codes = {}  # Fresh dict per call; avoids a shared mutable default argument
    if root:
        if root.value is not None:
            codes[root.value] = prefix or '0'  # A single-symbol image still gets a 1-bit code
        generate_huffman_codes(root.left, prefix + '0', codes)
        generate_huffman_codes(root.right, prefix + '1', codes)
    return codes

# Compress the image using Huffman coding
huffman_tree = build_huffman_tree(hist)
codes = generate_huffman_codes(huffman_tree)

# Example: Replace pixel values with their Huffman codes
compressed_img_huffman = ''.join([codes[pixel] for pixel in img.flatten()])
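
Encoding is only half of the scheme: the bitstream must also be decodable. The sketch below is a compact, self-contained variant (not the Node-based implementation above) that builds a code table for a short made-up pixel list and decodes the bitstream again. The function names huffman_codes and decode are hypothetical.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Hypothetical helper: return {symbol: bitstring} for a symbol sequence."""
    counts = Counter(symbols)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far})
    heap = [(freq, idx, {sym: ''})
            for idx, (sym, freq) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: '0' for sym in heap[0][2]}  # lone symbol gets one bit
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prepend a bit: 0 for the left subtree, 1 for the right subtree
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

def decode(bits, table):
    """Read bits until they match a code; valid because codes are prefix-free."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ''
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ''
    return out

pixels = [0, 0, 0, 0, 255, 255, 128, 64]
table = huffman_codes(pixels)
bitstream = ''.join(table[p] for p in pixels)
recovered = decode(bitstream, table)
```

For this input the encoded stream is 14 bits long, compared with 64 bits at a fixed 8 bits per pixel. A real codec would also need to store the code table (or the histogram) alongside the bitstream so the decoder can rebuild it.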

Displaying Compression Results and Statistics

The results of DCT and DPCM compression are displayed visually, while Huffman coding compression is represented by its resulting bit size, as it’s a bitstream and not directly displayable as an image.

# Show Results
plt.figure(figsize=(12, 8))

# Original Image
plt.subplot(2, 3, 1)
plt.imshow(img, cmap='gray')
plt.title("Original Image")

# DCT Compressed Image
plt.subplot(2, 3, 2)
plt.imshow(compressed_img_dct, cmap='gray')
plt.title("DCT Compressed")

# DPCM Encoded Image (Delta Values)
plt.subplot(2, 3, 3)
plt.imshow(encoded_dpcm, cmap='gray')
plt.title("DPCM Encoded")

# DPCM Decoded Image
plt.subplot(2, 3, 4)
plt.imshow(decoded_dpcm, cmap='gray')
plt.title("DPCM Decoded")

# Image using Huffman Coding (Displaying Size as Placeholder)
plt.subplot(2, 3, 5)
plt.text(0.5, 0.5, f"Huffman Size: {len(compressed_img_huffman)} bits", ha='center', va='center', fontsize=12)
plt.title("Huffman Compressed")

plt.tight_layout()
plt.show()

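print(f"Original image size: {img.size * 8} bits ({img.size} pixels at 8 bits each)")
kept = block_size**2 - (block_size - 5)**2  # Coefficients left non-zero in each block
print(f"DCT: {kept} of {block_size**2} coefficients retained per block")
print(f"Compressed image size (Huffman): {len(compressed_img_huffman)} bits")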