Parallel Computing and Data Decomposition Fundamentals
Three Decomposition Techniques
1. Functional Decomposition
This is the process of taking a complex process and breaking it down into its individual functions or tasks. It focuses on what the system does rather than how the data is structured. In engineering and software, this results in a hierarchy of functions where the top level is the broad goal and the lower levels are specific operations.
Example: Imagine designing an Automated Teller Machine (ATM). The broad goal, "serve the customer," decomposes into specific functions such as authenticating the PIN, checking a balance, and dispensing cash.
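The ATM example can be sketched as a hierarchy of functions. This is a minimal illustration, not a real ATM design; all function names and the toy PIN check are assumptions made for the sketch.

```python
# Functional decomposition of an ATM session: the top-level function
# expresses the broad goal, the lower-level functions are the specific
# operations. All names here are illustrative, not a real ATM API.

def authenticate(pin: str) -> bool:
    """Leaf operation: verify the customer's PIN (toy check)."""
    return pin == "1234"

def check_balance(account: dict) -> int:
    """Leaf operation: report the current balance."""
    return account["balance"]

def withdraw_cash(account: dict, amount: int) -> bool:
    """Leaf operation: dispense cash if funds allow."""
    if amount <= account["balance"]:
        account["balance"] -= amount
        return True
    return False

def handle_session(account: dict, pin: str, amount: int) -> dict:
    """Top of the hierarchy: delegates to the functions above."""
    if not authenticate(pin):
        return {"ok": False, "balance": check_balance(account)}
    ok = withdraw_cash(account, amount)
    return {"ok": ok, "balance": check_balance(account)}
```

Note how the top-level function says nothing about *how* a PIN is verified or cash is dispensed; it only sequences the lower-level operations.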
2. Work Breakdown Structure (WBS)
Widely used in project management, WBS decomposes a project into “work packages.” It is deliverable-oriented, meaning it breaks down the total scope of work to be carried out by the project team to accomplish project objectives.
Example: Building a House. The total scope breaks down into work packages such as laying the foundation, framing, plumbing, and electrical work, each of which can be estimated and assigned separately.
3. Object-Oriented Decomposition
Unlike functional decomposition, this technique identifies the objects (entities) within a system and the relationships between them. Each object contains both the data and the logic (methods) related to it. This is the backbone of modern software development.
Example: An E-commerce System. The system decomposes into objects such as Product, Cart, and Order, each bundling its own data (price, items, status) with the methods that operate on it.
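The e-commerce example can be sketched as a handful of collaborating classes. The class names and fields are illustrative assumptions, not a reference design.

```python
# Object-oriented decomposition of a tiny e-commerce system: each class
# bundles its data with the methods (logic) that act on it.
from dataclasses import dataclass, field

@dataclass
class Product:
    name: str
    price: float

@dataclass
class Cart:
    items: list = field(default_factory=list)

    def add(self, product: Product, qty: int = 1) -> None:
        """The Cart owns its item list and the logic to grow it."""
        self.items.append((product, qty))

    def total(self) -> float:
        return sum(p.price * q for p, q in self.items)

@dataclass
class Order:
    cart: Cart

    def amount_due(self) -> float:
        # The Order collaborates with the Cart object it holds.
        return self.cart.total()
```

Contrast this with the functional decomposition above: here the entities come first, and the operations live inside them.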
Understanding Tasks and Their Characteristics
A task is the smallest unit of work that needs to be completed to reach a specific goal. If a project is a puzzle, a task is a single piece. Think of it as a bridge between “thinking” and “doing”; it transforms an abstract objective into a concrete action.
The Characteristics of Tasks
- Specificity (Well-Defined): A task must have a clear beginning and end. Vague instructions like “work on marketing” are objectives, not tasks. A true task is specific, such as: “Write a 500-word blog post about summer shoes.”
- Measurability: You should be able to determine exactly when a task is “Done.” This is often defined by Acceptance Criteria. For example, a coding task isn’t finished just because the code is written; it’s finished when it passes all unit tests.
- Assignability: A task needs an “owner.” In project management, if three people are responsible for one task, usually no one is. Tasks are most effective when assigned to a single individual or a specific functional unit.
- Time-Bound (Duration): Every task requires an estimated timeframe or a hard deadline. This prevents “task creep,” where a small piece of work expands to fill all available time. Example: “Complete the data entry by Thursday at 5:00 PM.”
- Atomicity and Independence: In ideal decomposition, tasks are atomic, meaning they are broken down to a level where they cannot be divided further without becoming trivial. They should also ideally be independent, so one person can work on a task without being constantly stalled by someone else’s progress.
Parallel Algorithm Models
1. Data Parallel Model
In this model, the same operation is performed on different subsets of a large data set. It is the most straightforward form of parallelism. If you have 1,000 numbers to square and 10 processors, each processor takes 100 numbers and performs the exact same squaring function.
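The squaring example can be sketched directly. This is a minimal illustration using a thread pool to stand in for the 10 processors; on CPython the GIL serializes the arithmetic, so it demonstrates the decomposition pattern rather than an actual speedup.

```python
# Data-parallel sketch: the SAME squaring function is applied to
# different chunks of one large data set.
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    """Identical operation applied to every element of one chunk."""
    return [x * x for x in chunk]

def parallel_square(data, workers=10):
    size = (len(data) + workers - 1) // workers           # chunk size
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(square_chunk, chunks)          # one chunk per worker
    return [y for chunk in results for y in chunk]        # reassemble in order
```

With 1,000 numbers and 10 workers, each worker squares a chunk of 100, exactly as described above.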
2. Task Graph Model
In this model, the focus is on the functions rather than the data. A complex problem is decomposed into many smaller tasks. These tasks might be different from one another, and they are mapped out in a Directed Acyclic Graph (DAG) to show which tasks depend on others.
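A task graph can be sketched with the standard library's topological sorter: heterogeneous tasks, with a DAG recording which ones depend on which. The task names and the tiny graph below are made up for illustration.

```python
# Task-graph sketch: unlike the data-parallel model, the tasks here
# differ from one another; a DAG records the dependencies and the
# graph is executed in dependency order.
from graphlib import TopologicalSorter

def run_task_graph(tasks, deps):
    """tasks: name -> callable; deps: name -> set of prerequisite names."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in sorted(deps.get(name, ()))]
        results[name] = tasks[name](*inputs)
    return results

tasks = {
    "load":   lambda: [3, 1, 2],                        # produce raw data
    "sort":   lambda xs: sorted(xs),                    # depends on load
    "sum":    lambda xs: sum(xs),                       # depends on load
    "report": lambda s, total: {"sorted": s, "total": total},
}
deps = {"sort": {"load"}, "sum": {"load"}, "report": {"sort", "sum"}}
```

The DAG makes the available parallelism explicit: "sort" and "sum" have no edge between them, so a scheduler could run them concurrently once "load" finishes.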
3. Master-Worker Model
This model involves two types of entities: a single Master and multiple Workers. The Master node prioritizes the workload, divides it into smaller jobs, and assigns them to the workers. The Workers execute the assigned jobs and send the results back to the master.
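The master-worker pattern can be sketched with a shared job queue, using threads to stand in for worker processors. The workload here (squaring numbers) and the sentinel protocol are assumptions of the sketch.

```python
# Master-worker sketch: the master enqueues jobs plus one sentinel per
# worker; workers pull jobs, execute them, and push results back.
import threading
import queue

def worker(jobs, results):
    while True:
        job = jobs.get()
        if job is None:                   # sentinel: no more work
            return
        results.put((job, job * job))     # execute job, report to master

def master(workload, n_workers=4):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job in workload:                  # divide and assign the workload
        jobs.put(job)
    for _ in threads:
        jobs.put(None)                    # tell each worker to stop
    for t in threads:
        t.join()
    out = {}
    while not results.empty():            # gather the workers' results
        job, res = results.get()
        out[job] = res
    return out
```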
4. Pipeline Model
This model mimics an assembly line. A task is broken into a sequence of stages. While the first piece of data is being processed in Stage 2, the second piece of data is already being processed in Stage 1.
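The assembly-line idea can be sketched with generators: each stage consumes items from the previous one, so item 2 enters Stage 1 while item 1 is in Stage 2. Note that plain generators run on one thread; this shows the streaming structure of a pipeline, not real concurrency. The two stage operations are arbitrary placeholders.

```python
# Pipeline sketch: data flows through a fixed sequence of stages.
def stage1(items):
    for x in items:
        yield x + 10        # first processing step (placeholder)

def stage2(items):
    for x in items:
        yield x * 2         # second processing step (placeholder)

def pipeline(items):
    # stage2 pulls from stage1 item by item, assembly-line style
    return list(stage2(stage1(items)))
```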
Mapping Techniques for Load Balancing
In parallel computing, mapping is the process of assigning tasks to specific processors. The goal of mapping is to achieve load balancing—ensuring that every processor has an equal amount of work so that no single processor sits idle while others are overloaded. If mapping is done poorly, the entire system slows down to the speed of the slowest (most burdened) processor.
1. Dynamic Mapping
Dynamic mapping happens during execution. It is used when the amount of work per task is unpredictable or when the processing power of the nodes varies. Common dynamic techniques include:
- Self-Scheduling: A “Master” maintains a pool of tasks. When a Worker processor finishes its current job, it requests a new one from the master.
- Work Stealing: There is no central master. Every processor has its own local queue of tasks. If Processor A runs out of work, it “steals” a task from the back of Processor B’s queue. This is highly efficient for recursive algorithms like Quicksort.
- Dynamic Load Balancing (Diffusion): Processors “look” at their immediate neighbors. If a processor sees its neighbor is idle, it “diffuses” (sends) some of its tasks to that neighbor to level out the pressure.
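Self-scheduling, the first technique above, can be sketched with a shared task pool: each worker grabs a new task only when it finishes its current one, so faster workers automatically do more of the work. The simulated task costs are arbitrary.

```python
# Self-scheduling sketch: a shared pool of tasks with unpredictable
# costs; idle workers pull the next task, balancing load at runtime.
import threading
import queue
import time

def self_schedule(task_costs, n_workers=3):
    pool = queue.Queue()
    for cost in task_costs:
        pool.put(cost)
    done_by = {i: 0 for i in range(n_workers)}   # tasks completed per worker

    def worker(wid):
        while True:
            try:
                cost = pool.get_nowait()         # request a new job
            except queue.Empty:
                return                           # pool drained: stop
            time.sleep(cost)                     # simulate unpredictable work
            done_by[wid] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done_by
```

No work assignment is decided up front; the distribution emerges from which worker happens to be free.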
2. Static Mapping
Static mapping occurs before the algorithm starts executing. The tasks are distributed among processors based on a predetermined strategy. This works best when the amount of work per task is predictable and doesn’t change during runtime. A common static technique is Block Mapping.
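Block mapping can be sketched as a small index calculation performed before execution begins. The remainder-spreading rule below is one common convention, assumed here for illustration.

```python
# Block-mapping sketch: contiguous, near-equal blocks of the data are
# assigned to processors before the algorithm starts running.
def block_mapping(n_items, n_procs):
    """Return, per processor, the half-open (start, end) range it owns."""
    base, extra = divmod(n_items, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)   # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges
```

Because the assignment is fixed up front, it costs nothing at runtime, but it only balances well when every item represents roughly the same amount of work.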
Characteristics of Inter-Task Interactions
1. Timing (Static vs. Dynamic)
Static Interaction: The timing and the identity of the tasks that interact are known in advance (at compile time). These are predictable and easier to optimize.
Dynamic Interaction: The timing or the “who-interacts-with-whom” is only determined during execution (at runtime). These are harder to manage because they can lead to unpredictable delays or “race conditions.”
2. Spatial Pattern (Regular vs. Irregular)
Regular: Interactions follow a specific, repeating pattern (like a grid, ring, or tree). For example, in image processing, a task might only ever talk to its four immediate neighbors.
Irregular: Tasks interact with others in a seemingly random or data-dependent way. This usually requires complex data structures (like graphs or linked lists) to manage.
3. Direction of Data Flow (One-Way vs. Two-Way)
Unidirectional (One-Way): Data flows from a producer task to a consumer task without a required immediate response.
Bidirectional (Two-Way): Both tasks send and receive information from each other. This is common in iterative scientific simulations where neighbors must swap boundary data at every step.
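The boundary-swap in two-way interactions can be sketched on a 1-D grid split between two tasks. The halo-cell layout (one ghost cell at each shared edge) is an assumption of this toy example.

```python
# Bidirectional-interaction sketch: two neighbouring subdomains swap
# their edge values each step. The last cell of `left` and the first
# cell of `right` are "halo" cells that mirror the neighbour's edge.
def exchange_boundaries(left, right):
    """Each side sends its edge cell and receives the neighbour's."""
    left[-1], right[0] = right[1], left[-2]   # fill both halo cells
    return left, right
```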
The Importance of Interaction Overhead
Every interaction carries a cost called Interaction Overhead. This is the time a processor spends communicating rather than calculating. It is usually composed of:
- Latency: The time it takes to start the communication.
- Bandwidth: The rate at which data is actually transferred once communication has started; a message’s transfer time is roughly its size divided by the available bandwidth.
- Synchronization: The time a task spends “waiting” for another task to be ready to talk.
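The latency and bandwidth components combine into a common first-order cost model: the time for one message is the startup latency plus size divided by bandwidth. The numbers below are illustrative assumptions, not measurements of any real network.

```python
# First-order cost model for one message (synchronization cost omitted):
#   time = latency + message_size / bandwidth
def transfer_time(bytes_sent, latency_s, bandwidth_bps):
    return latency_s + bytes_sent / bandwidth_bps

# Illustrative comparison on an assumed 10 us latency, 1 GB/s link:
# many small messages pay the latency many times over; one big message
# pays it once, even though the total payload is identical.
many_small = 1000 * transfer_time(1_000, 1e-5, 1e9)   # 1000 x 1 KB
one_large  = transfer_time(1_000_000, 1e-5, 1e9)      # 1 x 1 MB
```

This is why a standard optimization is to aggregate many small interactions into fewer, larger ones.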
Principles of Superscalar Architecture
Superscalar architecture is a method used in modern CPUs to execute more than one instruction per clock cycle. While a scalar processor issues at most one instruction per cycle (like a single-lane road), a superscalar processor has multiple execution units, allowing it to handle several instructions in parallel (like a multi-lane highway).
The Basic Working Principle
The core principle relies on Instruction Level Parallelism (ILP). Here is the step-by-step summary:
- Fetch & Decode: The CPU fetches multiple instructions from memory simultaneously and decodes them.
- Instruction Dispatch: A specialized piece of hardware, the Dispatcher, looks at the pool of decoded instructions. It identifies which ones can be run at the same time without interfering with each other.
- Parallel Execution: The dispatcher sends these independent instructions to different Execution Units (such as ALUs for math, FPUs for floating-point numbers, or Load/Store units for memory).
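The dispatch step can be sketched as a toy in-order scheduler: instructions are grouped into issue cycles, and an instruction waits for the next cycle if it has a register conflict (a data hazard) with anything already issued this cycle. The instruction encoding (name, registers read, registers written) is an assumption of the sketch, not how real hardware represents instructions.

```python
# Dispatcher sketch: group decoded "instructions" into issue cycles.
# An instruction joins the current group only if it has no read-after-
# write, write-after-write, or write-after-read conflict with it.
def schedule(instrs):
    """instrs: list of (name, reads, writes) with reads/writes as sets.
    Returns a list of groups; each group issues in one cycle."""
    groups, i = [], 0
    while i < len(instrs):
        group, written, read = [], set(), set()
        while i < len(instrs):
            name, r, w = instrs[i]
            if r & written or w & written or w & read:
                break                 # hazard: wait for the next cycle
            group.append(name)
            written |= w
            read |= r
            i += 1
        groups.append(group)
    return groups
```

For example, an add and an independent multiply can issue together, while a subtract that reads both of their results must wait a cycle.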
Process vs. Task Comparison
| Feature | Process | Task |
|---|---|---|
| Definition | An executing program instance. | A unit of execution within a program. |
| Memory | Has its own private memory space. | Shares memory with other tasks in the same process. |
| Weight | Heavyweight: Resource-intensive to create. | Lightweight: Fast to create and destroy. |
| Communication | Uses IPC (Inter-Process Communication). | Can communicate directly via shared memory. |
| Independence | Highly independent and isolated. | Dependent on the parent process. |
