Distributed Systems Concepts: Characteristics and Middleware

Characteristics of Distributed Systems

  1. Resource Sharing: Sharing of hardware, software, and data among multiple users.
  2. Openness: Uses standard protocols and interfaces for interoperability.
  3. Transparency: Hides the complexity of distribution (location, access, failure, etc.) from users.
  4. Scalability: Can expand in size and users without affecting performance.
  5. Concurrency: Multiple users or processes can work at the same time.
  6. Fault Tolerance: Continues working even if some components fail.
  7. Heterogeneity: Supports different hardware, operating systems, and networks.
  8. Security: Protects data and resources from unauthorized access.

Role of Middleware in Distributed Systems

  • Communication
    • Enables communication between different systems.
    • Provides message passing, remote procedure calls (RPC), etc.
  • Hides Complexity
    • Hides network details, hardware differences, and system distribution.
    • Makes the system appear as a single unified system (transparency).
  • Heterogeneity Support
    • Connects different operating systems, hardware, and programming languages.
    • Ensures interoperability.
  • Security Services
    • Provides authentication, authorization, and encryption.
    • Protects data during communication.
  • Transaction Management
    • Ensures data consistency during multiple operations.
    • Supports commit and rollback mechanisms.

What is Virtualization? Types of Virtualization

Virtualization is a technology that creates a virtual (software-based) version of physical resources such as servers, storage devices, networks, or operating systems.

Types of Virtualization

  1. Hardware Virtualization
    • Creates virtual machines (VMs) on a physical server.
    • Example: Running Windows and Linux on the same computer.
  2. Operating System (OS) Virtualization
    • Allows multiple isolated user spaces on a single OS.
    • Example: Containers (like Docker).
  3. Server Virtualization
    • Divides one physical server into multiple virtual servers.
    • Improves resource utilization and reduces cost.
  4. Storage Virtualization
    • Combines multiple physical storage devices into one virtual storage unit.
    • Makes data management easier.
  5. Network Virtualization
    • Creates virtual networks within physical networks.
    • Improves flexibility and security.
  6. Desktop Virtualization
    • User desktops are stored on a central server.
    • Users can access their desktop from any device.

Client vs. Server Comparison

ClientServer
A client is a computer or program that requests services from a server.A server is a computer or program that provides services to clients.
It initiates communication by sending requests.It waits for requests and sends responses.
Performs limited processing.Performs heavy processing and manages data/resources.
Depends on the server for resources.Controls and manages shared resources.
Example: Web browser.Example: Web server.

Stateful Server vs. Stateless Server

Stateful ServerStateless Server
Maintains the state of a client between requests.Does not keep any client state between requests.
Remembers previous interactions with clients.Treats each request independently.
Useful for complex transactions requiring context.Simpler, easier to scale, and more reliable.
Requires more memory and resources to track clients.Requires less memory and resources.
Example: Online shopping cart server.Example: RESTful web service.

RPC vs. Message-Oriented Communication (MOC)

Remote Procedure Call (RPC)Message-Oriented Communication (MOC)
Enables a program to execute a procedure on a remote system as if it were local.Enables systems to exchange messages asynchronously.
Synchronous communication (caller waits for response).Can be asynchronous (sender and receiver operate independently).
Focuses on invoking functions/methods.Focuses on sending and receiving data messages.
Tighter coupling between client and server.Looser coupling, more flexible and scalable.
Example: Java RMI, gRPC.Example: JMS (Java Message Service), RabbitMQ.

RPC Process Explained Briefly

Remote Procedure Call (RPC) allows a program to execute a procedure on a remote system as if it were local. The process involves the following steps:

  • Client Stub: The client calls a procedure, but the call is intercepted by the client stub.
  • Marshalling: The client stub converts procedure parameters into a standard format to send over the network.
  • Communication: The request is transmitted to the server over the network.
  • Server Stub: The server stub receives the request and unmarshals the parameters.
  • Procedure Execution: The server executes the requested procedure.
  • Return Results: The server stub marshals the results and sends them back to the client.
  • Client Receives Response: The client stub unmarshals the result and returns it to the calling program.

Distributed System Security and Coordination

The Bully Algorithm

The Bully Algorithm is a method used in distributed systems to elect a coordinator (leader) among multiple nodes. When a node detects that the current coordinator has failed, it initiates an election by sending messages to all nodes with higher IDs. If no higher-ID node responds, the initiating node becomes the new coordinator. However, if a higher-ID node responds, that node takes over the election process. The key feature of the Bully Algorithm is that the node with the highest ID always becomes the coordinator, ensuring that the most “most powerful” node leads the system. While it is simple and effective, it can generate many messages in large networks, which may affect performance.

Public Key Cryptography

Public Key Cryptography, also known as asymmetric cryptography, is a cryptographic system that uses a pair of keys: a public key and a private key. The public key is shared openly and is used to encrypt data, while the private key is kept secret and is used to decrypt the data. This system ensures confidentiality, because only the holder of the private key can read the encrypted message, and it also provides authentication, as messages signed with a private key can be verified using the corresponding public key. Public key cryptography is widely used in applications such as digital signatures, SSL/TLS for secure web communication, and secure email. An example of this system is the RSA algorithm, which allows secure communication over insecure networks without the need to share secret keys in advance.

Access Control Matrix (ACM)

An Access Control Matrix (ACM) is a security model used to define the permissions of subjects, such as users or processes, on objects like files, devices, or resources. It is represented as a matrix where the rows correspond to subjects, the columns correspond to objects, and each cell specifies the access rights a subject has on an object, such as read, write, or execute. The access control matrix helps ensure security by clearly defining who can do what on which resource. It can be implemented using Access Control Lists (ACLs), where each object lists the subjects and their rights, or using capabilities, where each subject lists the objects it can access. This model provides a systematic way to manage and enforce access permissions in a distributed or multi-user system.

Denial of Service (DoS) Attack

A Denial of Service (DoS) attack is a type of cyberattack in which an attacker tries to make a computer, network, or service unavailable to its legitimate users. This is usually achieved by overwhelming the system with excessive requests, consuming all its resources, or exploiting software vulnerabilities. DoS attacks can come from a single source or multiple sources, in which case it is called a Distributed Denial of Service (DDoS) attack. The result is that legitimate users cannot access the service, leading to disruptions, slowdowns, or crashes of websites, servers, or networks, causing loss of productivity and revenue.

Distributed Commit and Two-Phase Commit (2PC)

Define Distributed Commit

A Distributed Commit is a protocol used in distributed database systems to ensure that a transaction involving multiple nodes (databases) is either committed on all nodes or rolled back on all nodes. This guarantees atomicity across the distributed system, meaning the transaction is all-or-nothing.

Two-Phase Commit (2PC) Protocol

The Two-Phase Commit (2PC) is a common distributed commit protocol used to coordinate transactions across multiple nodes. It ensures that either all participating nodes commit a transaction or all abort, maintaining consistency.

Phases of 2PC:
  1. Phase 1 – Prepare (Voting Phase)
    • The coordinator sends a prepare message to all participating nodes.
    • Each participant checks if it can commit the transaction and replies with: Yes – Ready to commit, or No – Cannot commit (abort).
  2. Phase 2 – Commit (Decision Phase)
    • If all participants respond Yes, the coordinator sends a commit message to all.
    • If any participant responds No, the coordinator sends an abort message to all.
    • Participants then act according to the coordinator’s decision.

Reliable Client-Server Communication

Reliable client-server communication ensures that data exchanged between a client and a server is delivered accurately, completely, and in the correct order, even in the presence of network failures or errors. In a distributed system, unreliable networks can cause packet loss, duplication, or out-of-order delivery, so reliability mechanisms are essential.

Key Features of Reliable Communication:

  1. Acknowledgments (ACKs): The receiver confirms receipt of messages to the sender.
  2. Retransmission: Lost or corrupted messages are resent.
  3. Sequencing: Messages are delivered in the correct order.
  4. Error Detection and Correction: Detects corrupted data and requests retransmission if needed.
  5. Flow Control: Prevents sender from overwhelming the receiver with too many messages.

Data-Centric Consistency Models

In distributed systems, data-centric consistency models define how updates to a shared data item are propagated and seen by different nodes. These models ensure that all nodes in the system observe a consistent view of the data, even when multiple copies exist.

  1. Strict Consistency
    • The strongest consistency model.
    • Example: Reading a file immediately after it is updated anywhere.
  2. Sequential Consistency
    • Operations from all nodes are executed in some sequential order.
    • Easier to implement than strict consistency.
  3. Causal Consistency
    • Writes that are causally related must be seen by all nodes in the same order.
    • Preserves cause-effect relationships between operations.
  4. Eventual Consistency
    • Updates to a data item will eventually propagate to all nodes.
    • Common in highly available systems like NoSQL databases (e.g., DynamoDB, Cassandra).
  5. Weak Consistency
    • The system does not guarantee immediate consistency.
    • Guarantees are provided only under certain conditions (e.g., after a synchronization event).

Name, Address, and Identifier Definitions

  • Name

    A name is a human-readable label used to identify a resource or object in a distributed system. It is meaningful to users but may not indicate the location of the resource. Example: www.example.com (domain name).

  • Address

    An address specifies the location of a resource in the system. It tells the system where to find the resource but may not be meaningful to humans. Example: 192.168.1.10 (IP address of a server).