Distributed Systems Concepts and Fault Management

Posted on Mar 5, 2026 in Computers

Distributed System Definition and Characteristics

A Distributed System is a collection of independent computers that communicate with each other through a network and appear to the users as a single coherent system.

Characteristics

Resource Sharing: Hardware, software, and data are shared among nodes.
Concurrency: Multiple processes run simultaneously.
Scalability: The system can grow by adding more nodes.
Fault Tolerance: The system continues working even if some nodes fail.
Transparency: Hides distribution details (location, access, replication).

Types of Faults and Recovery Methods

Faults in distributed systems require specific recovery strategies:

Hardware Fault: Physical failure of components. Recovery: Redundancy, replacement, backup hardware.
Software Fault: Errors in programs or OS. Recovery: Restart, patching, rollback.
Network Fault: Communication failure. Recovery: Rerouting, retransmission, link redundancy.
Transient Fault: Occurs once and then disappears. Recovery: Retry, error correction.
Intermittent Fault: Appears, disappears, and reappears irregularly. Recovery: Monitoring, diagnostics, replacement.
Permanent Fault: Persists until the faulty component is repaired or replaced. Recovery: Repair, component replacement, failover.
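Byzantine Fault: A node behaves arbitrarily or maliciously, for example by sending conflicting information to different nodes. Recovery: Byzantine fault-tolerant consensus algorithms, message authentication.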
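As an illustration of the retry strategy listed for transient faults, here is a minimal sketch. The names TransientError, retry, and make_flaky are hypothetical and do not come from any library; the exponential backoff with jitter is one common choice, not the only one.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a temporary fault."""


def retry(operation, attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on TransientError wait and try again, up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # the fault did not clear: let the caller treat it as permanent
            # Exponential backoff with jitter so that clients do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))


def make_flaky(failures):
    """Return an operation that fails `failures` times and then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("temporary fault")
        return "ok"

    return operation
```

Calling retry(make_flaky(2)) succeeds on the third attempt. A fault that outlasts every attempt is re-raised, which is the point where a system would switch to a permanent-fault strategy such as failover.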

System Coordination and Consistency Models

Coordinator Role

A coordinator is a special process or node in a distributed system that controls, manages, and synchronizes the activities of other processes or nodes. It is responsible for tasks like resource allocation, transaction control, synchronization, and maintaining consistency among distributed components.

Example: In a distributed database, the coordinator manages transactions and ensures that all participating nodes either commit or roll back their changes together.
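One well-known way a coordinator can enforce this all-or-nothing outcome is a two-phase commit. The sketch below assumes that technique. The Participant class and run_transaction function are hypothetical, and the sketch ignores timeouts and coordinator crashes, which a real protocol must handle.

```python
class Participant:
    """Hypothetical database node that can vote on and apply a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local changes can be applied.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def run_transaction(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2a: unanimous yes, so tell every node to commit.
        for p in participants:
            p.commit()
        return "committed"
    # Phase 2b: at least one no vote, so every node rolls back.
    for p in participants:
        p.rollback()
    return "aborted"
```

With two willing participants the transaction commits on both. If either one is created with can_commit=False, both end in the aborted state.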

Consistency Models

Continuous Consistency

Continuous consistency ensures that all replicas of shared data remain nearly identical at all times, allowing only a small, bounded difference between them. Updates are propagated continuously, and systems define limits on how stale or different the data can be.

Key Idea: Data inconsistency is allowed only within a specified bound (time or value).

Example: In a stock trading system, stock prices at different servers may differ by at most 1 second or 0.5%, ensuring near-real-time accuracy.
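The bounds in the stock example can be expressed as a simple check. This is a sketch under assumptions: the function name within_bounds and the (price, update time) representation are hypothetical, and both bounds are enforced together here, although a system might enforce only one of them.

```python
def within_bounds(replica_a, replica_b, max_age_gap=1.0, max_relative_diff=0.005):
    """Return True if two replicas of a price stay inside both allowed bounds.

    Each replica is a (price, last_update_time_in_seconds) pair.
    """
    price_a, time_a = replica_a
    price_b, time_b = replica_b
    age_gap = abs(time_a - time_b)
    largest = max(abs(price_a), abs(price_b))
    # Two zero prices are identical, so the relative difference is zero.
    relative_diff = abs(price_a - price_b) / largest if largest else 0.0
    return age_gap <= max_age_gap and relative_diff <= max_relative_diff
```

A replica that fails this check would need to pull or receive updates before serving further reads.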

Sequential Consistency

Sequential consistency ensures that the result of any execution is the same as if the operations of all processes were executed in some single sequential order, and the operations of each individual process appear in that sequence in the order specified by its program.

Key Idea: All nodes see updates in the same order, but not necessarily at the same time.

Example: If two processes update a shared variable X:

Process P1 writes X = 10
Process P2 writes X = 20

Every process will observe the same order: either X = 10 followed by X = 20, or X = 20 followed by X = 10. No two processes will observe different orders.
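One simple way to obtain a single agreed order is to route every write through one sequencer and have each replica apply the resulting log in order. The sketch below assumes that design. Sequencer and Replica are hypothetical names, and real systems use other mechanisms as well.

```python
class Sequencer:
    """Hypothetical component that puts all writes into one global order."""

    def __init__(self):
        self.log = []

    def submit(self, process, value):
        self.log.append((process, value))


class Replica:
    def __init__(self):
        self.applied = 0   # number of log entries applied so far
        self.history = []  # values of X in the order this replica saw them

    def catch_up(self, log, upto=None):
        # A replica may lag behind, but it always applies entries in log order.
        end = len(log) if upto is None else min(upto, len(log))
        for _, value in log[self.applied:end]:
            self.history.append(value)
        self.applied = max(self.applied, end)


seq = Sequencer()
seq.submit("P1", 10)
seq.submit("P2", 20)

r1, r2 = Replica(), Replica()
r1.catch_up(seq.log)          # r1 sees both writes immediately
r2.catch_up(seq.log, upto=1)  # r2 lags and has seen only the first write
r2.catch_up(seq.log)          # later it catches up, in the same order
```

Both replicas end with the same history even though r2 saw the second write later than r1, which matches the key idea of same order but not necessarily same time.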

Remote Procedure Call (RPC)

Definition and Operation

Remote Procedure Call (RPC) is a communication mechanism that allows a program to call a procedure (function) located on another computer (remote system) as if it were a local function call.

Operation of RPC

The client calls a local stub (the client stub).
The client stub marshals (packs) the parameters into a message.
The message is sent over the network to the server.
The server stub unmarshals the parameters.
The server executes the requested procedure.
The server stub marshals the result and sends it back to the client.
The client stub unmarshals the result and returns it to the client program.
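The steps above can be sketched in a few lines. This is an illustration only: JSON is an assumed wire format, the names add, PROCEDURES, server_stub, and client_stub are hypothetical, and the network is replaced by a direct function call so the example runs in one process.

```python
import json


def add(a, b):
    """The remote procedure that lives on the server."""
    return a + b


# Hypothetical table of procedures the server is willing to run.
PROCEDURES = {"add": add}


def server_stub(request_bytes):
    """Unmarshal the request, run the procedure, and marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(method, *params, transport=server_stub):
    """Marshal the call, send it, and unmarshal the reply."""
    request_bytes = json.dumps({"method": method, "params": params}).encode("utf-8")
    reply_bytes = transport(request_bytes)  # stands in for the network round trip
    return json.loads(reply_bytes.decode("utf-8"))["result"]
```

To the caller, client_stub("add", 2, 3) looks like an ordinary function call, while the arguments and the result cross the stub boundary only as bytes.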

Advantages and Disadvantages of RPC

Advantages

Makes distributed programming simpler because remote calls look like local calls
Hides network communication details
Supports modularity and code reuse
Improves productivity
Works across different platforms

Disadvantages

Slower than local procedure calls
Network failures affect execution
Difficult to debug and handle partial failures
Limited support for asynchronous communication, because the client normally blocks until the reply arrives
Data type compatibility issues

Middleware Organization Patterns

Middleware Definition

Middleware is software that acts as an intermediate layer between the operating system/network and applications in a distributed system. It provides common services and capabilities to applications, enabling them to communicate and manage data efficiently across heterogeneous environments.

Types of Design Patterns Used in Middleware

1. Client-Server Pattern

Description: Separates applications into clients (requesters) and servers (providers).

Example: Web applications where a browser (client) requests data from a web server.

Use in Middleware: Handles request/response communication.

2. Broker Pattern

Description: Introduces a broker component that mediates communication between clients and servers.

Example: CORBA ORB (Object Request Broker)

Use in Middleware: Supports decoupled and scalable communication; clients don’t need direct references to servers.
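The mediation role of a broker can be illustrated with a small in-memory registry. This is a sketch, not how CORBA is implemented; RequestBroker and its methods are hypothetical names.

```python
class RequestBroker:
    """Hypothetical broker that routes requests to servers registered by name."""

    def __init__(self):
        self.services = {}

    def register(self, name, handler):
        # A server announces the service it offers under a logical name.
        self.services[name] = handler

    def invoke(self, name, *args):
        # The client names the service; the broker finds the server for it.
        if name not in self.services:
            raise LookupError(f"no server registered for {name!r}")
        return self.services[name](*args)
```

A client calls broker.invoke("upper", "abc") without holding any reference to the server, so the server behind a name can be replaced without changing client code.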

3. Peer-to-Peer (P2P) Pattern

Description: All nodes act as both clients and servers.

Example: File-sharing networks like BitTorrent

Use in Middleware: Used in decentralized systems for data sharing without a central server.

4. Publish-Subscribe Pattern

Description: Subscribers register interest in events or topics, and publishers send messages without knowing who receives them. The middleware delivers each message to every matching subscriber.

Example: Event notification systems, message queues

Use in Middleware: Supports asynchronous communication and decoupling.
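A minimal in-memory sketch of the pattern follows. The Broker class is hypothetical, and delivery here is a direct callback in the same process; real middleware would usually queue messages so that publishers and subscribers run independently.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory message broker keyed by topic name."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, receives the message.
        for callback in self.subscribers[topic]:
            callback(message)
        return len(self.subscribers[topic])  # number of deliveries made
```

For example, after broker.subscribe("prices", received.append), every message published to "prices" is appended to the list received, while messages on other topics are ignored.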

5. Layered Pattern

Description: Middleware is organized in layers, each providing specific services to the upper layer.

Example: OSI model in networking

Use in Middleware: Enhances modularity, maintainability, and separation of concerns.

Virtualization Concepts

Definition

Virtualization is the technique of creating virtual (logical) versions of computing resources such as servers, operating systems, storage, or networks instead of using physical resources directly. It allows multiple virtual machines to run on a single physical system by sharing hardware resources efficiently. Virtualization is widely used in distributed systems, cloud computing, and data centers.

Types of Virtualization

Hardware (Server) Virtualization

In hardware virtualization, a software layer called a hypervisor runs either directly on the physical machine or on top of a host operating system. The hypervisor allows multiple operating systems to run simultaneously on the same hardware.

Example: Running Windows and Linux on the same computer using VMware or VirtualBox.

Operating System Virtualization

In OS virtualization, multiple isolated environments (containers) share the same operating system kernel. Containers are lighter and start faster than virtual machines because they do not boot a separate guest operating system.

Example: Docker containers running multiple applications.

Storage Virtualization

Storage virtualization combines multiple physical storage devices into a single logical storage system, making storage management easier.

Example: Storage Area Network (SAN).
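The core of storage virtualization is an address mapping from the logical volume to the physical devices behind it. The sketch below assumes the simplest layout, where devices are concatenated in order; the function name locate is hypothetical, and real systems also use striping or mirroring.

```python
def locate(logical_block, device_sizes):
    """Map a logical block number to (device_index, block_within_device).

    device_sizes lists how many blocks each physical device holds; the
    logical volume is the devices concatenated in that order.
    """
    if logical_block < 0:
        raise IndexError("negative block number")
    remaining = logical_block
    for index, size in enumerate(device_sizes):
        if remaining < size:
            return index, remaining
        remaining -= size  # skip past this device and keep looking
    raise IndexError("block is beyond the end of the logical volume")
```

With devices of 100, 50, and 200 blocks, logical block 120 lands on the second device at block 20. The application sees only one volume of 350 blocks.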

Network Virtualization

Network virtualization creates multiple virtual networks over a single physical network to improve security and flexibility.

Example: Virtual LAN (VLAN).

Advantages of Virtualization

Better utilization of hardware resources
Reduced cost and power consumption
Easy scalability and flexibility
Improved system management

Conclusion

Virtualization is an important technology that improves efficiency, reduces cost, and supports scalable and reliable distributed systems.