Distributed Systems Concepts and Fault Management
Distributed System Definition and Characteristics
A distributed system is a collection of independent computers that communicate with each other over a network and appear to users as a single coherent system.
Characteristics
- Resource Sharing: Hardware, software, and data are shared among nodes.
- Concurrency: Multiple processes run simultaneously.
- Scalability: The system can grow by adding more nodes.
- Fault Tolerance: The system continues working even if some nodes fail.
- Transparency: The system hides distribution details (location, access, replication) from users.
Types of Faults and Recovery Methods
Faults in distributed systems require specific recovery strategies:
- Hardware Fault: Physical failure of components. Recovery: Redundancy, replacement, backup hardware.
- Software Fault: Errors in programs or OS. Recovery: Restart, patching, rollback.
- Network Fault: Communication failure. Recovery: Rerouting, retransmission, link redundancy.
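- Transient Fault: Occurs once and then disappears (e.g., a dropped network packet). Recovery: Retry, error correction.
- Intermittent Fault: Appears, vanishes, and reappears irregularly (e.g., a loose connector). Recovery: Monitoring, diagnostics, replacement.
- Permanent Fault: Persists until the faulty component is repaired or replaced. Recovery: Repair, component replacement, failover.
- Byzantine Fault: A node behaves arbitrarily or maliciously, e.g., sending conflicting information to different nodes. Recovery: Byzantine fault-tolerant consensus algorithms, authentication.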
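Retry is the usual remedy for transient faults. The following is a minimal sketch of retry with exponential backoff, assuming the failing operation signals a temporary fault by raising `TransientError` (a hypothetical exception type introduced only for this illustration). If the operation still fails after `max_attempts`, the error is re-raised, since the fault may not be transient after all.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical exception type signalling a temporary fault (e.g., a dropped packet)."""


def retry(operation, max_attempts=5, base_delay=0.01):
    """Call operation(), retrying with exponential backoff on TransientError."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # still failing: treat it as a non-transient fault
            # The delay doubles on every attempt; random jitter keeps many
            # clients from retrying in lockstep.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails twice, then succeeds: simulates a transient fault.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientError("temporary failure")
        return "ok"

    print(retry(flaky), "after", calls["n"], "attempts")  # ok after 3 attempts
```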
System Coordination and Consistency Models
Coordinator Role
A coordinator is a special process or node in a distributed system that controls, manages, and synchronizes the activities of other processes or nodes. It is responsible for tasks like resource allocation, transaction control, synchronization, and maintaining consistency among distributed components.
Example: In a distributed database, the coordinator manages transactions and ensures that all participating nodes either commit or roll back their changes consistently.
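A coordinator's commit-or-roll-back decision is commonly implemented with the two-phase commit (2PC) protocol. The notes do not name a specific protocol, so the following is only a simplified, in-process sketch of 2PC: the `Participant` class and its `prepare`/`commit`/`rollback` methods are hypothetical, and real concerns such as network messages, timeouts, and crash recovery are omitted.

```python
class Participant:
    """Hypothetical database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote YES only if the local changes can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every participant votes YES."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): a single NO vote forces a global rollback.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.rollback()
    return "rollback"


if __name__ == "__main__":
    nodes = [Participant("A"), Participant("B", can_commit=False)]
    print(two_phase_commit(nodes))   # rollback
    print([p.state for p in nodes])  # ['rolled_back', 'rolled_back']
```

The key design point is that the coordinator decides only after hearing from every node, so no node commits while another rolls back.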
Consistency Models
Continuous Consistency
Continuous consistency allows replicas of shared data to differ from one another, but only by a bounded amount. Instead of requiring replicas to be identical at all times, the system defines limits on how far they may drift apart and propagates updates before a limit is exceeded.
Key Idea: Inconsistency is allowed only within specified bounds, measured as numerical deviation (value), staleness (time), or the number of out-of-order updates (ordering).
Example: In a stock trading system, the price of a stock at different servers may be at most 1 second out of date or differ in value by at most 0.5%, ensuring near-real-time accuracy.
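The value bound in the stock example can be sketched as a replica that propagates its local changes only when the un-propagated change would exceed an allowed deviation. This is a simplified illustration under several assumptions: a single writer, an absolute bound (prices are kept in cents and 50 cents stands in for 0.5% of a 10,000-cent price), peers updated by direct method calls rather than network messages, and no staleness or ordering bounds. The `Replica` class is hypothetical.

```python
class Replica:
    """Hypothetical replica that bounds numerical deviation from its peers (single writer)."""

    def __init__(self, name, value, max_deviation):
        self.name = name
        self.value = value
        self.max_deviation = max_deviation  # largest allowed un-propagated change
        self.pending = 0                    # local change not yet sent to peers
        self.peers = []

    def update(self, delta):
        """Apply a local write; push it to peers only if the bound would be exceeded."""
        self.value += delta
        self.pending += delta
        if abs(self.pending) > self.max_deviation:
            self.propagate()

    def propagate(self):
        """Send all pending change to every peer, bringing them back in sync."""
        for peer in self.peers:
            peer.value += self.pending
        self.pending = 0


if __name__ == "__main__":
    # Prices in cents; a bound of 50 cents is 0.5% of the initial 10,000-cent price.
    a = Replica("A", 10_000, max_deviation=50)
    b = Replica("B", 10_000, max_deviation=50)
    a.peers.append(b)
    a.update(30)
    print(a.value, b.value)  # 10030 10000 (difference 30 <= 50, no propagation)
    a.update(30)
    print(a.value, b.value)  # 10060 10060 (difference would be 60 > 50, so A propagates)
```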
Sequential Consistency
Sequential consistency ensures that the result of any execution is the same as if the operations of all processes were executed in some single sequential order, with the operations of each individual process appearing in that sequence in the order specified by its program.
Key Idea: All nodes see updates in the same order, but not necessarily at the same time.
Example: If two processes update a shared variable X:
- Process P1 writes X = 10
- Process P2 writes X = 20
Every process observes the same order: either X = 10 followed by X = 20, or X = 20 followed by X = 10. It can never happen that one process sees 10 then 20 while another sees 20 then 10.
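The property illustrated by this example can be checked mechanically: given the successive values each process read for X, no two processes may disagree on the relative order of any two writes. This is only a simplified check of that one property, not a full sequential-consistency checker; it assumes every write stores a distinct value and ignores program order. The function names are hypothetical.

```python
from itertools import combinations


def first_seen_order(reads):
    """Distinct values of X in the order one process first observed them."""
    order = []
    for value in reads:
        if value not in order:
            order.append(value)
    return order


def agree_on_write_order(observations):
    """Return True if no two processes observed any pair of writes in opposite orders.

    observations maps a process name to the successive values it read for X.
    """
    seen_before = set()  # (a, b) means some process saw a before b
    for reads in observations.values():
        for a, b in combinations(first_seen_order(reads), 2):
            if (b, a) in seen_before:
                return False  # another process saw these two writes the other way round
            seen_before.add((a, b))
    return True


if __name__ == "__main__":
    print(agree_on_write_order({"P3": [10, 20], "P4": [10, 10, 20]}))  # True
    print(agree_on_write_order({"P3": [10, 20], "P4": [20, 10]}))      # False
```

A process that reads only X = 20 (because it looked late) is still consistent with either order, which matches the "same order, not necessarily at the same time" idea.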
Remote Procedure Call (RPC)
Definition and Operation
Remote Procedure Call (RPC) is a communication mechanism that allows a program to call a procedure (function) located on another computer (remote system) as if it were a local function call.
Operation of RPC
- Client calls a local stub (client stub).
- The client stub marshals (packs) parameters into a message.
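- The message is sent over the network to the server.
- The server stub receives the message and unmarshals (unpacks) the parameters.
- The server stub calls the requested procedure, which executes on the server.
- The server stub marshals the result and sends it back to the client.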
- Client stub receives and returns the result to the client program.
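Python's standard library includes an XML-RPC implementation whose pieces map onto the steps above: `xmlrpc.client.ServerProxy` acts as the client stub, marshalling arguments into an XML request sent over HTTP, and `xmlrpc.server.SimpleXMLRPCServer` acts as the server stub, unmarshalling the request and calling the registered procedure. The following is a minimal sketch; the `add` procedure is only an illustration, and both ends run in one process (server in a background thread) purely for convenience.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The remote procedure: it executes on the server side."""
    return a + b


def start_server():
    """Start an XML-RPC server on a free localhost port in a background thread."""
    # Port 0 asks the OS to pick any free port.
    server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    proxy = ServerProxy(f"http://{host}:{port}/")  # client stub
    # Looks like a local call, but travels over HTTP to the server.
    print(proxy.add(2, 3))  # 5
    server.shutdown()
    server.server_close()
```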
Advantages and Disadvantages of RPC
Advantages
- Makes distributed programming simple and transparent
- Hides network communication details
- Supports modularity and code reuse
- Improves productivity
- Works across different platforms when both sides use a common data representation
Disadvantages
- Slower than local procedure calls
- Network failures affect execution
- Difficult to debug and handle partial failures
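- Limited support for asynchronous communication (the caller usually blocks until the result arrives)
- Data type compatibility issues (e.g., differing byte orders or character sets; pointers cannot be passed directly)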
Middleware Organization Patterns
Middleware Definition
Middleware is software that acts as an intermediate layer between the operating system/network and applications in a distributed system. It provides common services and capabilities to applications, enabling them to communicate and manage data efficiently across heterogeneous environments.
Types of Design Patterns Used in Middleware
1. Client-Server Pattern
Description: Separates applications into clients (requesters) and servers (providers).
Example: Web applications where a browser (client) requests data from a web server.
Use in Middleware: Handles request/response communication.
2. Broker Pattern
Description: Introduces a broker component that mediates communication between clients and servers.
Example: CORBA ORB (Object Request Broker)
Use in Middleware: Supports decoupled and scalable communication; clients don’t need direct references to servers.
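The indirection a broker provides can be sketched in a few lines: servers register under a service name, and clients invoke the service through the broker without holding a reference to any server. This is a simplified in-process sketch with hypothetical `Broker` and `InventoryServer` classes; a real ORB such as CORBA's also handles interface definitions, marshalling, and network transport, none of which is shown here.

```python
class Broker:
    """Hypothetical in-process broker: clients name a service, not a server."""

    def __init__(self):
        self._registry = {}  # service name -> object that implements it

    def register(self, service_name, server):
        """Called by a server to advertise a service."""
        self._registry[service_name] = server

    def invoke(self, service_name, method, *args):
        """Forward a client request to whichever server provides the service."""
        server = self._registry.get(service_name)
        if server is None:
            raise LookupError(f"no server registered for {service_name!r}")
        return getattr(server, method)(*args)


class InventoryServer:
    """Hypothetical server used only for this example."""

    def __init__(self, stock=None):
        self.stock = stock if stock is not None else {"widget": 3}

    def quantity(self, item):
        return self.stock.get(item, 0)


if __name__ == "__main__":
    broker = Broker()
    broker.register("inventory", InventoryServer())
    # The client holds no reference to InventoryServer, only the service name.
    print(broker.invoke("inventory", "quantity", "widget"))  # 3
```

Because clients go through the broker, a server can be replaced by re-registering under the same name without changing any client code.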
3. Peer-to-Peer (P2P) Pattern
Description: All nodes act as both clients and servers.
Example: File-sharing networks like BitTorrent
Use in Middleware: Used in decentralized systems for data sharing without a central server.
4. Publish-Subscribe Pattern
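Description: Subscribers register interest in particular topics or event types, and publishers send messages to those topics without knowing who the subscribers are. The middleware delivers each message to every matching subscriber.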
Example: Event notification systems, message queues
Use in Middleware: Supports asynchronous communication and decoupling.
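A minimal sketch of publish-subscribe middleware, assuming an in-process broker with one `queue.Queue` per subscriber (the `PubSubBroker` class and topic names are hypothetical). Publishing only enqueues the message, so the publisher never waits for subscribers, and each subscriber reads its queue whenever it is ready. Production systems add persistence, network transport, and topic filtering, which are omitted here.

```python
import queue
from collections import defaultdict


class PubSubBroker:
    """Hypothetical in-process publish-subscribe broker with one queue per subscriber."""

    def __init__(self):
        self._queues = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic):
        """Register interest in a topic; messages accumulate in the returned queue."""
        q = queue.Queue()
        self._queues[topic].append(q)
        return q

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic without waiting for them."""
        for q in self._queues.get(topic, []):
            q.put(message)  # an unbounded Queue never blocks on put


if __name__ == "__main__":
    broker = PubSubBroker()
    prices = broker.subscribe("price")
    broker.publish("price", {"symbol": "ACME", "price": 101})
    broker.publish("news", "nobody is subscribed, so this is dropped")
    # The subscriber reads later, decoupled in time from the publisher.
    print(prices.get_nowait())  # {'symbol': 'ACME', 'price': 101}
```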
5. Layered Pattern
Description: Middleware is organized in layers, each providing specific services to the upper layer.
Example: OSI model in networking
Use in Middleware: Enhances modularity, maintainability, and separation of concerns.
Virtualization Concepts
Definition
Virtualization is the technique of creating virtual (logical) versions of computing resources such as servers, operating systems, storage, or networks instead of using physical resources directly. It allows multiple virtual machines to run on a single physical system by sharing hardware resources efficiently. Virtualization is widely used in distributed systems, cloud computing, and data centers.
Types of Virtualization
Hardware (Server) Virtualization
In hardware virtualization, a software layer called a hypervisor (virtual machine monitor) runs either directly on the physical hardware (Type 1) or on top of a host operating system (Type 2). The hypervisor allows multiple guest operating systems to run simultaneously on the same hardware.
Example: Running Windows and Linux on the same computer using VMware or VirtualBox.
Operating System Virtualization
In OS virtualization, multiple isolated environments (containers) run on the same operating system kernel. Because containers share the host kernel instead of each running a full guest operating system, they are lighter-weight and start faster than virtual machines.
Example: Docker containers running multiple applications.
Storage Virtualization
Storage virtualization combines multiple physical storage devices into a single logical storage system, making storage management easier.
Example: Storage Area Network (SAN).
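At its core, storage virtualization maps logical block addresses onto blocks of physical devices. The following is a simplified sketch of one such layout, linear concatenation, in which several disks appear as one contiguous address space; other layouts such as striping or mirroring map blocks differently. The `LogicalVolume` class and the disk names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class LogicalVolume:
    """Hypothetical volume presenting several disks as one contiguous block range."""

    def __init__(self, disks):
        # disks: list of (disk_name, size_in_blocks), concatenated in order.
        self.names = [name for name, _ in disks]
        # starts[i] is the first logical block stored on disk i; the last
        # entry is the total number of logical blocks.
        self.starts = [0] + list(accumulate(size for _, size in disks))
        self.total_blocks = self.starts[-1]

    def locate(self, logical_block):
        """Map a logical block number to (disk_name, physical_block)."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block out of range")
        i = bisect_right(self.starts, logical_block) - 1
        return self.names[i], logical_block - self.starts[i]


if __name__ == "__main__":
    volume = LogicalVolume([("disk0", 100), ("disk1", 50)])
    print(volume.total_blocks)  # 150
    print(volume.locate(99))    # ('disk0', 99)
    print(volume.locate(100))   # ('disk1', 0)
```

Applications see only logical block numbers, so disks can be added or replaced by changing the mapping rather than the applications.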
Network Virtualization
Network virtualization creates multiple virtual networks over a single physical network to improve security and flexibility.
Example: Virtual LAN (VLAN).
Advantages of Virtualization
- Better utilization of hardware resources
- Reduced cost and power consumption
- Easy scalability and flexibility
- Easier system management (e.g., snapshots and migration of virtual machines)
Conclusion
Virtualization is an important technology that improves efficiency, reduces cost, and supports scalable and reliable distributed systems.
