Parallel Computer Architectures and Optimization Techniques
Parallel Computer Architectures
Instruction and Data Streams
SISD (Single Instruction Single Data): The classical von Neumann architecture, where a single data stream is processed by one instruction stream (uniprocessor).
SIMD (Single Instruction Multiple Data): A single instruction stream is broadcast to many processing elements, each executing it on its own data. Commonly used in array processors and vector processors.
MISD (Multiple Instruction Single Data): A largely theoretical category in which multiple processing units apply different instructions to a single data stream; essentially no practical machines fit it.
MIMD (Multiple Instruction Multiple Data): The most common type of parallel machine, where each processor runs its own program and instruction stream on its own data. This includes shared-memory (tightly coupled) multiprocessors and multicomputer (loosely coupled) systems.
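As a toy illustration (plain Python, no real vector hardware involved), the SISD and SIMD styles can be contrasted as one-datum-per-step versus one operation conceptually applied across many data at once:

```python
# Sketch: the same computation in SISD style (a single instruction
# stream handles one datum at a time) and SIMD style (one operation
# conceptually applied to every element of the data set at once).
# Pure-Python stand-in for illustration only.

data = [1, 2, 3, 4]

# SISD: process one element per step.
sisd_result = []
for x in data:
    sisd_result.append(x * 2)

# SIMD: one "instruction" (doubling) over the whole data set,
# modeled here with a comprehension.
simd_result = [x * 2 for x in data]

assert sisd_result == simd_result == [2, 4, 6, 8]
```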
Analogy: Airport Check-in Desks
- SISD: A single check-in desk.
- SIMD: Many desks with a supervisor giving instructions that every desk follows.
- MIMD: Many desks working independently, synchronized through a central database.
GPU/CPU
GPU Computing: A type of heterogeneous computing that combines many-core GPUs with multicore CPUs for higher performance. GPUs excel in data-intensive, parallel computations.
Differences between GPU and CPU Threads
- GPU Threads: Extremely lightweight; thousands are needed to keep the hardware fully utilized.
- CPU Threads: Heavyweight; a multicore CPU needs only a few to stay busy.
| Feature | CPU | GPU |
|---|---|---|
| Memory | Large, directly accessible | Relatively small, managed by the CPU |
| Control Logic | Independent for each core | Shared by groups of compute cores |
| Execution | Independent | Shared within groups |
| Cache & Synchronization | Coherent caches between cores, shared & synchronized | Shared cache & synchronization within groups, none between groups |
Hazards in Pipelining
Structural Hazards: A hardware resource conflict, where two instructions in the pipeline need the same resource in the same cycle.
Data Hazards: An instruction depends on the result of an earlier instruction that has not yet completed.
Control Hazards: A branch changes the program counter (PC), so instructions fetched after it may have to be discarded.
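The data-hazard case can be sketched as a small checker. The instruction encoding `(dest, srcs)` and the pipeline window size are invented for illustration:

```python
# Minimal sketch of read-after-write (RAW) data-hazard detection.
# An instruction is modeled as (dest_register, source_registers);
# `window` approximates how many following instructions can still
# see a stale value before the result is written back.

def raw_hazards(instructions, window=2):
    """Return pairs (i, j) where instruction j reads a register
    written by a recent earlier instruction i."""
    hazards = []
    for j, (_, srcs_j) in enumerate(instructions):
        for i in range(max(0, j - window), j):
            dest_i, _ = instructions[i]
            if dest_i in srcs_j:
                hazards.append((i, j))
    return hazards

prog = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # r4 = r1 + r5  -> RAW hazard on r1
    ("r6", ("r7", "r8")),  # independent of both
]
assert raw_hazards(prog) == [(0, 1)]
```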
Types of Cache Misses
Compulsory Miss: The first reference to a block (a "cold" miss); unavoidable regardless of cache size.
Capacity Miss: The cache is too small to hold the working set, so blocks are evicted and must later be re-fetched; these misses would occur even in a fully associative cache of the same size.
Conflict Miss: Multiple blocks map to the same cache location (set) and evict each other, even though the cache as a whole is not full.
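A minimal direct-mapped cache simulation can make these categories concrete. Block numbering and cache geometry are invented for illustration, and note that cleanly separating conflict from capacity misses would require a second run against a fully associative cache of the same size, so they are lumped together here:

```python
# Sketch: classify accesses to a direct-mapped cache of `num_sets`
# one-block sets. A miss on a never-seen block is compulsory; a
# miss on a previously seen block is conflict or capacity.

def classify_accesses(blocks, num_sets):
    cache = {}    # set index -> block currently resident there
    seen = set()  # blocks referenced at least once
    results = []
    for b in blocks:
        s = b % num_sets
        if cache.get(s) == b:
            results.append("hit")
        elif b not in seen:
            results.append("compulsory")
        else:
            results.append("conflict/capacity")
        cache[s] = b
        seen.add(b)
    return results

# Blocks 0 and 4 collide in a 4-set cache (0 % 4 == 4 % 4):
assert classify_accesses([0, 4, 0], num_sets=4) == \
    ["compulsory", "compulsory", "conflict/capacity"]
```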
Optimization Techniques
Techniques covered in the course include:
- Branch prediction
- GPU acceleration
- Cache and memory system optimization
- Pipelining
- Bus width enhancement
- VLIW (Very Long Instruction Word)
- Shared memory architecture
- Multiprocessors
- Vector machines
- Instruction design
- Multithreading
Topologies in Multiprocessor Systems
Topology: The pattern of connections between processors, impacting performance and cost.
Key Characteristics:
- Diameter: The maximum distance (in hops) between any two processors.
- Total Bandwidth: The bandwidth of each communication link multiplied by the number of links.
- Bisection Bandwidth: The bandwidth across the narrowest cut that divides the network into two equal halves; it bounds the data transfer possible at the bottleneck.
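For a concrete case, the three metrics can be computed for a ring of p processors, assuming every link has the same bandwidth bw (values illustrative):

```python
# Sketch: topology metrics for a ring of p processors, each link
# of bandwidth `bw`. A ring has p links; the worst-case message
# travels halfway around (either direction), and cutting the ring
# into two equal halves severs exactly 2 links.

def ring_metrics(p, bw):
    diameter = p // 2        # hops to the farthest processor
    total_bw = p * bw        # p links, each of bandwidth bw
    bisection_bw = 2 * bw    # a bisecting cut crosses 2 links
    return diameter, total_bw, bisection_bw

assert ring_metrics(8, 1) == (4, 8, 2)
```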
Common Topologies
Shared Bus Topology: Processors communicate via a single bus, handling one transmission at a time.
Ring Topology: Each processor connects directly to its two neighbors; multiple transfers can proceed simultaneously, but a message may have to pass through intermediate processors.
Tree Topology: Processors are connected in a tree where each internal node has up to three links (one to its parent, two to its children), giving exactly one path between any two processors.
Mesh Topology: Each processor connects to its immediate neighbors (above, below, left, and right).
Hypercube Topology: A multi-dimensional mesh where processors connect based on binary representation differences.
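A small sketch of why hypercubes scale well: numbering processors so that neighbors differ in exactly one address bit makes the minimum hop count between two nodes the Hamming distance of their binary addresses, so a d-dimensional hypercube of 2^d processors has diameter only d:

```python
# Sketch: routing distance in a hypercube. Neighbors differ in
# exactly one bit, so the minimum number of hops between nodes
# a and b is the number of 1-bits in a XOR b (Hamming distance).

def hypercube_hops(a, b):
    return bin(a ^ b).count("1")

# In a 3-dimensional hypercube (8 processors):
assert hypercube_hops(0b000, 0b001) == 1  # direct neighbors
assert hypercube_hops(0b000, 0b111) == 3  # opposite corners = diameter
```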