Python Image Processing: Core Techniques with OpenCV
This document demonstrates various fundamental image processing techniques using Python libraries such as OpenCV, NumPy, and Matplotlib. Each section provides a code example and a brief explanation of the concept.
Image Transformations
Image transformations involve altering the spatial arrangement of pixels within an image. This section covers common transformations like rotation, scaling, and translation.
Rotation (45° Around Center)
Rotation transforms an image by turning it around a central point. Here, we rotate an image by 45 degrees.
import cv2
import numpy as np
# Load the input image
img = cv2.imread('input.jpg') # Reads the image from file
# Get image height and width
h, w = img.shape[:2]
# Rotation matrix: center, angle, scale=1
M_rot = cv2.getRotationMatrix2D((w/2, h/2), 45, 1)
# Apply rotation
rotated = cv2.warpAffine(img, M_rot, (w, h))
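To see what the rotation matrix contains, the sketch below rebuilds it in NumPy from the formula given in the OpenCV documentation for cv2.getRotationMatrix2D. The helper names rotation_matrix_2d and apply_affine are hypothetical and exist only for this illustration.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build a 2x3 affine matrix rotating about `center`.

    Positive angles rotate counter-clockwise as displayed
    (image coordinates, origin at the top-left corner, y pointing down).
    """
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    return np.array([
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ])

def apply_affine(M, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return M @ np.array([x, y, 1.0])

M = rotation_matrix_2d((50, 40), 45)
center_after = apply_affine(M, (50, 40))      # the center is a fixed point
right_after = apply_affine(rotation_matrix_2d((50, 40), 90), (60, 40))
```

The translation column of the matrix is what keeps the chosen center in place. Without it, the image would rotate about the top-left corner.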
Scaling (Down to 50%)
Scaling resizes an image, either enlarging or shrinking it. This example scales the image down to 50% of its original size.
# Scaling (Down to 50%)
scaled = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA) # fx, fy are scaling factors; INTER_AREA suits shrinking
Translation (Right 100px, Down 50px)
Translation shifts an image along the X and Y axes. Here, the image is moved 100 pixels to the right and 50 pixels down.
# Translation (Right 100px, Down 50px)
M_trans = np.float32([[1, 0, 100], [0, 1, 50]]) # Translation matrix
translated = cv2.warpAffine(img, M_trans, (w, h)) # Apply translation
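For integer offsets, the same shift can be written as a plain array copy. This is a minimal sketch, assuming non-negative integer shifts and zero fill for the uncovered area. The function name translate is hypothetical.

```python
import numpy as np

def translate(img, tx, ty):
    """Shift `img` right by tx and down by ty pixels, filling the gap with zeros.

    Assumes tx and ty are non-negative integers.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    if tx < w and ty < h:
        # Pixels that would leave the frame are dropped
        out[ty:, tx:] = img[:h - ty, :w - tx]
    return out

demo = np.arange(12, dtype=np.uint8).reshape(3, 4)
shifted = translate(demo, tx=1, ty=1)
```

The output keeps the original size, so content pushed past the right or bottom border is lost, just as with the (w, h) output size passed to warpAffine.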
Displaying Transformation Results
The transformed images are displayed using OpenCV's imshow function.
# Display All
cv2.imshow("Original", img)
cv2.imshow("Rotated", rotated)
cv2.imshow("Scaled", scaled)
cv2.imshow("Translated", translated)
cv2.waitKey(0) # Wait for any key press to close windows
cv2.destroyAllWindows() # Close all OpenCV windows
Frequency Domain Processing
Processing images in the frequency domain involves transforming the image from its spatial representation to a frequency representation, often using the Fast Fourier Transform (FFT). This allows for operations like filtering based on frequency components.
import cv2
import numpy as np
import matplotlib.pyplot as plt
1D Fast Fourier Transform (FFT)
The 1D FFT is applied to a single row of the grayscale image to analyze its frequency components.
# Load grayscale image
img = cv2.imread('input.jpg', 0) # Read image in grayscale
# 1D FFT of middle row
row = img[img.shape[0] // 2] # Take middle row of the image
fft1d = np.fft.fft(row) # Compute 1D FFT
# Plot original row and its FFT magnitude
plt.subplot(1,2,1), plt.plot(row), plt.title("Row")
plt.subplot(1,2,2), plt.plot(np.abs(fft1d)), plt.title("1D FFT")
plt.show()
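A synthetic row makes the FFT output easier to read than a real image row. The sketch below assumes a 256-sample signal with a constant offset of 128 and a single sinusoid completing 5 cycles. These values are chosen only for illustration.

```python
import numpy as np

n = 256
x = np.arange(n)
# Constant brightness of 128 plus a sinusoid with 5 cycles across the row
row = 128 + 50 * np.sin(2 * np.pi * 5 * x / n)

# rfft returns only the non-negative frequencies of a real signal
spectrum = np.abs(np.fft.rfft(row))
dc = spectrum[0]                               # sum of all samples
peak_bin = int(np.argmax(spectrum[1:])) + 1    # strongest non-DC frequency
```

Bin 0 holds the sum of the samples (the average brightness times the row length), which is why it usually dominates the plot of an image row. The remaining energy sits in the bin matching the number of cycles.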
2D Fast Fourier Transform (FFT)
The 2D FFT transforms the entire image into its frequency components. The zero-frequency component is shifted to the center for better visualization of the magnitude spectrum.
# 2D FFT
dft = np.fft.fft2(img) # Perform 2D FFT
dft_shift = np.fft.fftshift(dft) # Shift zero-freq component to center
mag = 20 * np.log(np.abs(dft_shift) + 1) # Log-magnitude for better visualization
Frequency Masking: Low-Pass Filter
A low-pass filter in the frequency domain allows low-frequency components (smooth regions) to pass while attenuating high-frequency components (edges, noise). This effectively blurs the image.
# Frequency Masking: Low-pass filter
rows, cols = img.shape
mask = np.zeros((rows, cols), np.uint8) # Initialize mask with zeros
r = 30 # Half-width of the square low-pass region (the mask spans 2r pixels per side)
mask[rows//2 - r:rows//2 + r, cols//2 - r:cols//2 + r] = 1 # Fill center with ones
masked = dft_shift * mask # Apply mask (element-wise multiplication)
Inverse FFT and Display
After applying the frequency mask, the inverse FFT is used to reconstruct the image back into the spatial domain. The results are then displayed.
# Inverse FFT to reconstruct image
idft = np.fft.ifft2(np.fft.ifftshift(masked)) # Undo shift and apply inverse FFT
img_back = np.abs(idft) # Get magnitude (discard complex part)
# Display Results
plt.subplot(131), plt.imshow(img, cmap='gray'), plt.title("Input")
plt.subplot(132), plt.imshow(mag, cmap='gray'), plt.title("2D FFT")
plt.subplot(133), plt.imshow(img_back, cmap='gray'), plt.title("Filtered")
plt.show()
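The whole filter can be checked on a synthetic image where the expected result is known. The sketch below is one possible arrangement, not the only one: the function name fft_lowpass is hypothetical, and it keeps a symmetric window of 2r + 1 bins around the center so that the result of filtering a real image stays real.

```python
import numpy as np

def fft_lowpass(img, r):
    """Keep only frequencies within r bins of the center (assumes r < min(shape) // 2)."""
    rows, cols = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    mask = np.zeros((rows, cols), dtype=bool)
    # Symmetric window around the zero-frequency bin at (rows // 2, cols // 2)
    mask[rows // 2 - r:rows // 2 + r + 1, cols // 2 - r:cols // 2 + r + 1] = True
    filtered = np.where(mask, spectrum, 0)
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

yy, xx = np.indices((64, 64))
# Flat gray level plus a pixel-level checkerboard (the highest possible frequency)
checker = np.where((xx + yy) % 2 == 0, 20.0, -20.0)
image = 100.0 + checker
smoothed = fft_lowpass(image, r=8)
```

The checkerboard lives entirely at the highest frequency, so the filter removes it and leaves the flat gray level of 100.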
Image Enhancement
Image enhancement techniques aim to improve the visual quality of an image or to make certain features more discernible. This section covers contrast stretching, histogram equalization, and RGB channel separation.
import cv2
import numpy as np
import matplotlib.pyplot as plt
Contrast Stretching
Contrast stretching improves image contrast by expanding the range of pixel intensities to span the full dynamic range (e.g., 0-255).
# Step 1: Load the image
img = cv2.imread('low_contrast.jpg') # Load a low contrast image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Convert to grayscale for processing
# Step 2: Contrast Stretching (improves contrast by stretching intensity range)
p1, p2 = np.percentile(gray, (2, 98)) # Get 2nd and 98th percentile of intensities
# Map [p1, p2] linearly onto [0, 255]; intensities outside that range are clipped
stretched = np.clip((gray.astype(np.float32) - p1) * 255.0 / max(p2 - p1, 1e-6), 0, 255).astype(np.uint8)
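The reason for using percentiles rather than the raw minimum and maximum shows up when an image contains a few outlier pixels. The sketch below uses a synthetic low-contrast image with two outliers. The function name percentile_stretch and the test data are assumptions made for this example.

```python
import numpy as np

def percentile_stretch(gray, low_pct=2, high_pct=98):
    """Linearly map the [low_pct, high_pct] percentile range onto [0, 255]."""
    lo, hi = np.percentile(gray, (low_pct, high_pct))
    if hi <= lo:
        return gray.copy()          # flat image: nothing to stretch
    scaled = (gray.astype(np.float32) - lo) * (255.0 / (hi - lo))
    return np.clip(scaled, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Intensities confined to 100..150, plus one black and one white outlier
low_contrast = rng.integers(100, 151, size=(64, 64), dtype=np.uint8)
low_contrast[0, 0] = 0
low_contrast[0, 1] = 255

stretched = percentile_stretch(low_contrast)

# Plain min-max scaling for comparison: the outliers already span 0..255
span = float(low_contrast.max()) - float(low_contrast.min())
minmax = ((low_contrast.astype(np.float32) - low_contrast.min()) * 255.0 / span).astype(np.uint8)
```

Here min-max scaling leaves the image essentially unchanged because the two outliers already define the full range, while the percentile version spreads the bulk of the intensities across 0 to 255.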
Histogram Equalization
Histogram equalization enhances contrast by redistributing pixel intensities to create a more uniform histogram. This is particularly effective for images with low contrast.
# Step 3: Histogram Calculation
hist = cv2.calcHist([gray], [0], None, [256], [0, 256]) # Compute grayscale histogram
# Step 4: Histogram Equalization (enhances contrast using histogram)
equalized = cv2.equalizeHist(gray) # Equalize histogram of grayscale image
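Histogram equalization amounts to remapping each intensity through the normalized cumulative histogram. The NumPy sketch below shows that idea for 8-bit images. The function name equalize_hist is hypothetical, and its rounding may differ slightly from cv2.equalizeHist.

```python
import numpy as np

def equalize_hist(gray):
    """Equalize an 8-bit grayscale image via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]       # CDF value at the darkest level present
    total = gray.size
    if total == cdf_min:
        return gray.copy()          # single intensity: nothing to equalize
    # Lookup table: darkest level -> 0, brightest level -> 255
    lut = np.rint((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

rng = np.random.default_rng(1)
gray_demo = rng.integers(100, 151, size=(32, 32), dtype=np.uint8)
equalized_demo = equalize_hist(gray_demo)
```

Because the lookup table is non-decreasing, the brightness ordering of pixels is preserved. Only the spacing between levels changes.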
RGB Channel Separation
Color images are composed of red, green, and blue channels. Separating these channels allows for individual processing or analysis of each color component. Note that OpenCV stores the channels in BGR order, so cv2.split returns the blue plane first.
# Step 5: Display RGB Planes
B, G, R = cv2.split(img) # Split original image into Blue, Green, Red channels
Displaying Enhanced Images and Histogram
The original, contrast-stretched, and histogram-equalized grayscale images are displayed, along with the individual RGB channels and the histogram of the original grayscale image.
# Step 6: Display images using OpenCV
cv2.imshow("Original Gray", gray)
cv2.imshow("Contrast Stretched", stretched)
cv2.imshow("Histogram Equalized", equalized)
cv2.imshow("Red Plane", R)
cv2.imshow("Green Plane", G)
cv2.imshow("Blue Plane", B)
# Step 7: Plot histogram using Matplotlib
plt.title("Histogram of Grayscale Image")
plt.xlabel("Pixel Intensity")
plt.ylabel("Frequency")
plt.plot(hist)
plt.show()
# Step 8: Wait and close OpenCV windows
cv2.waitKey(0)
cv2.destroyAllWindows()
Image Filtering
Image filtering is used to modify or enhance an image by applying a mathematical operation to each pixel based on its neighborhood. This section demonstrates mean and median filtering for noise reduction.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load grayscale image
img = cv2.imread('input.jpg', 0) # 0 means load as grayscale
Mean Filtering
Mean filtering, also known as averaging filtering, replaces each pixel’s value with the average of its neighbors. It’s effective for reducing random noise but can blur edges.
# Mean Filtering
# This is also called averaging filter
mean_filtered = cv2.blur(img, (5, 5)) # Apply 5x5 mean filter
Median Filtering
Median filtering replaces each pixel’s value with the median of its neighbors. It is particularly effective at removing salt-and-pepper noise while preserving edges better than mean filtering.
# Median Filtering
# Great for salt-and-pepper noise
median_filtered = cv2.medianBlur(img, 5) # Kernel size must be odd (e.g., 3, 5, 7...)
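The difference between the two filters is easiest to see on a flat image with a single "salt" pixel. The sketch below implements both with NumPy sliding windows. The helper window_filter is hypothetical, and its border handling (edge replication) may differ from OpenCV's defaults.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_filter(img, size, reducer):
    """Apply `reducer` (e.g. np.mean, np.median) over each size x size neighborhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1))

flat = np.full((7, 7), 100, dtype=np.uint8)
noisy = flat.copy()
noisy[3, 3] = 255                   # one salt pixel

mean_out = window_filter(noisy, 3, np.mean)
median_out = window_filter(noisy, 3, np.median)
```

The mean filter smears the outlier over its neighbors (the center becomes (8 * 100 + 255) / 9), while the median filter discards it entirely because a single extreme value never reaches the middle of the sorted window.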
Displaying Filtered Images
The original image is compared with its mean-filtered and median-filtered versions.
# Display Results
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.imshow(img, cmap='gray')
plt.title('Original')
plt.axis('off')
plt.subplot(1, 3, 2)
plt.imshow(mean_filtered, cmap='gray')
plt.title('Mean Filtered (5x5)')
plt.axis('off')
plt.subplot(1, 3, 3)
plt.imshow(median_filtered, cmap='gray')
plt.title('Median Filtered (5x5)')
plt.axis('off')
plt.tight_layout()
plt.show()
Image Sharpening and Edge Detection
These techniques are crucial for highlighting details and identifying boundaries within an image. Sharpening enhances fine details, while edge detection identifies significant changes in pixel intensity.
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load image in grayscale
img = cv2.imread('input.jpg', 0)
Image Sharpening with Laplacian Filter
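The Laplacian filter is a second-order derivative filter that highlights regions of rapid intensity change. Subtracting its response from the original image (the kernel's center coefficient is negative) boosts those regions and sharpens the image.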
# 1. Image Sharpening using Laplacian Filter
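# Laplacian Kernel for sharpening
laplacian_kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)
# Compute the Laplacian response in floating point so negative values are not clipped
laplacian = cv2.filter2D(img, cv2.CV_32F, laplacian_kernel)
# Subtract the response from the image, then clip back to the 8-bit range
sharpened_img = np.clip(img.astype(np.float32) - laplacian, 0, 255).astype(np.uint8)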
Edge Detection with Sobel Filters
Sobel operators are first-order derivative filters used to detect edges by approximating the image gradient in horizontal (X) and vertical (Y) directions. The magnitude of the gradient indicates the strength of the edge.
# 2. Edge Detection using Gradient Filters
# Sobel operators for edge detection (gradient filters)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32) # Horizontal gradient (responds to vertical edges)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32) # Vertical gradient (responds to horizontal edges)
# Apply Sobel filters with a float output depth so negative gradients are not clipped
edges_x = cv2.filter2D(img, cv2.CV_64F, sobel_x) # Gradient in x-direction
edges_y = cv2.filter2D(img, cv2.CV_64F, sobel_y) # Gradient in y-direction
# Combine the gradients (magnitude of the gradient)
edges = np.sqrt(np.square(edges_x) + np.square(edges_y))
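A synthetic step edge shows which kernel responds to which edge orientation, and why an 8-bit output is too small for gradient values. The sketch below applies the kernels with a small NumPy correlation helper. The name correlate3x3 is hypothetical, and edge replication is assumed at the borders.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate3x3(img, kernel):
    """Correlate a 2D image with a 3x3 kernel (no kernel flip), replicating the border."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.einsum("ijkl,kl->ij", windows, kernel)

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
sobel_y = sobel_x.T

# Dark left half, bright right half: a vertical edge between columns 2 and 3
step = np.zeros((6, 6), dtype=np.uint8)
step[:, 3:] = 200

gx = correlate3x3(step, sobel_x)
gy = correlate3x3(step, sobel_y)
magnitude = np.hypot(gx, gy)
```

Only the x-kernel responds to this vertical edge, and the response (800) is well beyond 255. A bright-to-dark step would give the same magnitude with a negative sign, which an unsigned 8-bit result cannot represent.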
Displaying Sharpened and Edge Images
This section displays the original image, its sharpened version, and the results of horizontal, vertical, and combined edge detection.
# Show Results
plt.figure(figsize=(12, 8))
# Original Image
plt.subplot(2, 3, 1)
plt.imshow(img, cmap='gray')
plt.title("Original Image")
# Sharpened Image (using Laplacian filter)
plt.subplot(2, 3, 2)
plt.imshow(sharpened_img, cmap='gray')
plt.title("Sharpened Image")
# Sobel X Gradient (horizontal intensity changes, i.e. vertical edges)
plt.subplot(2, 3, 3)
plt.imshow(edges_x, cmap='gray')
plt.title("Sobel X (Vertical Edges)")
# Sobel Y Gradient (vertical intensity changes, i.e. horizontal edges)
plt.subplot(2, 3, 4)
plt.imshow(edges_y, cmap='gray')
plt.title("Sobel Y (Horizontal Edges)")
# Combined Edge Detection (Magnitude of gradient)
plt.subplot(2, 3, 5)
plt.imshow(edges, cmap='gray')
plt.title("Edge Detection (Magnitude)")
plt.tight_layout()
plt.show()
Image Compression Techniques
Image compression reduces the amount of data required to represent an image, making it more efficient for storage and transmission. This section demonstrates three common compression methods: Discrete Cosine Transform (DCT), Differential Pulse Code Modulation (DPCM), and Huffman Coding.
import cv2
import numpy as np
import heapq
from collections import defaultdict
import matplotlib.pyplot as plt
# Load image in grayscale
img = cv2.imread('input.jpg', 0)
Discrete Cosine Transform (DCT) Compression
DCT is a lossy compression technique widely used in image and video compression standards (e.g., JPEG). It transforms image blocks into frequency coefficients, allowing high-frequency (less visually significant) components to be discarded.
# 1. DCT (Discrete Cosine Transform) Compression
block_size = 8 # Block size for DCT
compressed_img_dct = np.zeros_like(img)
# Apply DCT to 8x8 blocks of the image
for i in range(0, img.shape[0], block_size):
    for j in range(0, img.shape[1], block_size):
        block = img[i:i+block_size, j:j+block_size]
        # Partial blocks at the border are copied unchanged (cv2.dct needs even-sized input)
        if block.shape != (block_size, block_size):
            compressed_img_dct[i:i+block_size, j:j+block_size] = block
            continue
        # Apply DCT to each block
        dct_block = cv2.dct(np.float32(block))
        # Zero out high-frequency components for compression
        dct_block[5:, 5:] = 0 # Zero out the 3x3 highest-frequency corner
        # Apply inverse DCT to get the compressed block
        compressed_block = cv2.idct(dct_block)
        # Round and clip before storing so out-of-range values do not wrap around in uint8
        compressed_img_dct[i:i+block_size, j:j+block_size] = np.clip(np.rint(compressed_block), 0, 255)
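The block transform itself is a pair of matrix multiplications. The sketch below builds an orthonormal 8x8 DCT-II matrix in NumPy. The function name dct_matrix is hypothetical, and the scaling is assumed (not verified here) to match the convention used by cv2.dct.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)     # frequency index
    i = np.arange(n).reshape(1, -1)     # sample index
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)          # the constant (DC) basis vector
    return D

D = dct_matrix(8)
block = np.full((8, 8), 50.0)           # a perfectly flat block

coeffs = D @ block @ D.T                # forward 2D DCT
restored = D.T @ coeffs @ D             # inverse 2D DCT
```

For the flat block, all the energy lands in the single top-left (DC) coefficient. Smooth real-world blocks behave similarly, which is why discarding high-frequency coefficients costs little visual quality.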
Differential Pulse Code Modulation (DPCM) Compression
DPCM is a lossless or lossy compression technique that encodes the difference between consecutive pixel values rather than the absolute values. This exploits the spatial redundancy in images.
# 2. DPCM (Differential Pulse Code Modulation) Compression
encoded_dpcm = np.zeros(img.shape, dtype=np.int16) # Signed type: differences range from -255 to 255
encoded_dpcm[0, :] = img[0, :] # First row remains the same
# Encode by storing differences between consecutive rows
for i in range(1, img.shape[0]):
    encoded_dpcm[i, :] = img[i, :].astype(np.int16) - img[i - 1, :]
# Decode the image by adding the differences back
decoded_dpcm = np.zeros(img.shape, dtype=np.int16)
decoded_dpcm[0, :] = encoded_dpcm[0, :]
for i in range(1, img.shape[0]):
    decoded_dpcm[i, :] = encoded_dpcm[i, :] + decoded_dpcm[i - 1, :]
decoded_dpcm = decoded_dpcm.astype(np.uint8) # Back to 8-bit for display
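The row-wise encode and decode loops can also be written without explicit loops, which makes the lossless round trip easy to verify. This is a minimal sketch without quantization. The function names dpcm_encode and dpcm_decode are hypothetical.

```python
import numpy as np

def dpcm_encode(img):
    """Keep the first row; replace every other row by its difference from the row above."""
    signed = img.astype(np.int16)       # differences can be negative
    encoded = signed.copy()
    encoded[1:] = signed[1:] - signed[:-1]
    return encoded

def dpcm_decode(encoded):
    """Rebuild the image by accumulating the differences down each column."""
    return np.cumsum(encoded, axis=0, dtype=np.int64).astype(np.uint8)

# Vertical ramp: each row is 10 levels brighter than the one above
ramp = np.tile(np.arange(0, 250, 10, dtype=np.uint8).reshape(-1, 1), (1, 4))
encoded = dpcm_encode(ramp)
decoded = dpcm_decode(encoded)
```

On the ramp, every encoded row after the first contains the same small value. That narrow value distribution is what a subsequent entropy coder such as Huffman coding can exploit.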
Huffman Coding Compression
Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters based on their frequencies. More frequent characters get shorter codes, leading to overall compression.
# 3. Huffman Coding Compression
# Calculate histogram of pixel intensities
hist, _ = np.histogram(img.flatten(), bins=256, range=[0, 256])
# Create a priority queue (min-heap) for Huffman coding
class Node:
    def __init__(self, value, freq):
        self.value = value
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq

# Build Huffman tree
def build_huffman_tree(hist):
    heap = [Node(value, freq) for value, freq in enumerate(hist) if freq > 0]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        heapq.heappush(heap, merged)
    return heap[0]

# Generate Huffman codes from the tree
def generate_huffman_codes(root, prefix='', codes=None):
    if codes is None:
        codes = {} # Fresh dict per call (a mutable default would be shared across calls)
    if root:
        if root.value is not None:
            codes[root.value] = prefix or '0' # A single-symbol image still needs a 1-bit code
        generate_huffman_codes(root.left, prefix + '0', codes)
        generate_huffman_codes(root.right, prefix + '1', codes)
    return codes
# Compress the image using Huffman coding
huffman_tree = build_huffman_tree(hist)
codes = generate_huffman_codes(huffman_tree)
# Example: Replace pixel values with their Huffman codes
compressed_img_huffman = ''.join([codes[pixel] for pixel in img.flatten()])
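Encoding is only half of the scheme, so a small round trip helps confirm that the codes are uniquely decodable. The sketch below is a compact, standard-library-only variant that builds the code table directly instead of an explicit tree. The names huffman_codes and huffman_decode are hypothetical, and a non-empty input is assumed.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(symbols):
    """Return {symbol: bitstring} for a non-empty sequence of symbols."""
    freqs = Counter(symbols)
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}     # lone symbol still needs one bit
    tie = count()                           # tie-breaker so dicts are never compared
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def huffman_decode(bits, codes):
    """Decode a bitstring by matching prefixes against the code table."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

pixels = [10, 10, 10, 10, 20, 20, 30]
codes = huffman_codes(pixels)
bits = "".join(codes[p] for p in pixels)
decoded = huffman_decode(bits, codes)
```

Because no code is a prefix of another, the decoder can emit a symbol as soon as the accumulated bits match a table entry. A real decoder also needs the code table (or the histogram) stored alongside the bitstream, which adds to the compressed size.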
Displaying Compression Results and Statistics
The results of DCT and DPCM compression are displayed visually, while Huffman coding compression is represented by its resulting bit size, as it’s a bitstream and not directly displayable as an image.
# Show Results
plt.figure(figsize=(12, 8))
# Original Image
plt.subplot(2, 3, 1)
plt.imshow(img, cmap='gray')
plt.title("Original Image")
# DCT Compressed Image
plt.subplot(2, 3, 2)
plt.imshow(compressed_img_dct, cmap='gray')
plt.title("DCT Compressed")
# DPCM Encoded Image (Delta Values)
plt.subplot(2, 3, 3)
plt.imshow(encoded_dpcm, cmap='gray')
plt.title("DPCM Encoded")
# DPCM Decoded Image
plt.subplot(2, 3, 4)
plt.imshow(decoded_dpcm, cmap='gray')
plt.title("DPCM Decoded")
# Image using Huffman Coding (Displaying Size as Placeholder)
plt.subplot(2, 3, 5)
plt.text(0.5, 0.5, f"Huffman Size: {len(compressed_img_huffman)} bits", ha='center', va='center', fontsize=12)
plt.title("Huffman Compressed")
plt.tight_layout()
plt.show()
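print(f"Original image size: {img.size} pixels ({img.size * 8} bits at 8 bits per pixel)")
# The DCT step reconstructs a full-size image; its savings come from the zeroed coefficients, not from fewer pixels
print(f"Reconstructed image size (DCT): {compressed_img_dct.size} pixels")
print(f"Compressed image size (Huffman): {len(compressed_img_huffman)} bits")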
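One way to judge the Huffman result is to compare it with the Shannon entropy of the intensity histogram, which is a lower bound on the average bits per pixel for any code that treats pixels independently. The sketch below computes it for an 8-bit image. The function name entropy_bits_per_pixel and the demo image are assumptions for illustration.

```python
import numpy as np

def entropy_bits_per_pixel(img):
    """Shannon entropy of the intensity distribution of an 8-bit image, in bits."""
    counts = np.bincount(img.ravel(), minlength=256)
    p = counts[counts > 0] / img.size
    return float(-(p * np.log2(p)).sum())

# Four gray levels, each used equally often
demo = np.repeat(np.array([0, 64, 128, 192], dtype=np.uint8), 16).reshape(8, 8)
h = entropy_bits_per_pixel(demo)
lower_bound_bits = h * demo.size
```

With four equally likely levels the entropy is 2 bits per pixel, against 8 bits per pixel uncompressed. A Huffman code's average length falls between the entropy and the entropy plus one bit.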