Introduction to Distributed Systems

Characteristics of Distributed Systems (DS)

  • Concurrency
  • Independent Failures
  • Parallelism
  • Scalability
  • No Global Clock: A distributed system has no single global clock; nodes at different locations coordinate solely through message passing over the network, so each node maintains its own notion of time.

Pros of DS

  • Horizontal Scaling
  • Low Latency
  • Fault Tolerance
  • Decentralization
  • Security

Cons of DS

  • Data Integration & Consistency
  • Network and Communication Failure
  • Management Overhead

Design Goals of DS

  • Making Resources Accessible
  • Distribution Transparency
  • Openness
  • Scalability

Types of DS

  1. Distributed Computing System (Cluster, Grid)
  2. Distributed Information System – Transaction Processing System, Enterprise Application Integration, Web Servers, Distributed Databases
  3. Distributed Pervasive System – Distributed systems of mobile and embedded devices, Home Systems, Sensor networks, surveillance systems, etc.

Architectural Styles of DS

  1. Layered architectures
  2. Object-based architectures (System as interacting objects with encapsulated data and behavior)
  3. Data-centered architectures (processes communicate through a common repository)
  4. Event-based architectures (processes communicate through the propagation of events; see the sketch below)
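
As an illustration of the event-based style, here is a minimal in-process publish/subscribe sketch in Java. The EventBus class and topic names are invented for this example; in a real distributed system the events would be propagated over the network rather than through a shared object.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal publish/subscribe bus: components communicate only through
// event propagation, never by holding direct references to each other.
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String topic, String event) {
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(event));
    }
}

public class EventDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.subscribe("orders", e -> System.out.println("billing saw: " + e));
        bus.subscribe("orders", e -> System.out.println("shipping saw: " + e));
        bus.publish("orders", "order created"); // both subscribers react
    }
}
```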

Overlay Networks

  1. Structured ON
  2. Unstructured ON
  3. Hierarchical ON
  4. Peer-to-Peer (P2P) ON
  5. Content Delivery Networks (CDNs)

Hybrid Architecture

In a hybrid architecture, client-server solutions are combined with decentralized architectures. Examples include BitTorrent, super-peer networks, and edge-server systems.

Remote Procedure Call (RPC)

RPC is a protocol that enables a program to execute a procedure on a remote server as if it were a local call, abstracting network communication complexities.

Operation of RPC

  • Client Stub: Marshals procedure arguments and sends them as a message over the network.
  • Message Sending: The marshaled message is transmitted to the remote server.
  • Server Stub: Receives the message, unmarshals arguments, and invokes the corresponding procedure.
  • Procedure Execution: The server executes the procedure and marshals the result.
  • Result Sending: The server sends the result back to the client.
  • Client Receives Result: The client stub unmarshals the result and returns it to the calling procedure.
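
The following sketch traces these six steps for a single add procedure, using plain Java sockets and object streams as a stand-in for a real RPC runtime. The procedure name, port number, and wire format are all invented for illustration; a real RPC system generates the stubs from an interface definition.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Client stub: marshals the procedure name and arguments, sends the
// message, and unmarshals the result, so the caller sees a local call.
class AddClientStub {
    static int add(String host, int port, int a, int b) throws Exception {
        try (Socket s = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream());
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            out.writeObject("add");   // 1. marshal the procedure name
            out.writeInt(a);          //    ...and the arguments
            out.writeInt(b);
            out.flush();              // 2. message is sent to the server
            return in.readInt();      // 6. unmarshal and return the result
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(add("localhost", 9000, 2, 3)); // prints 5
    }
}

// Server stub: unmarshals the request, invokes the procedure, and
// marshals the result back (start this before running the client).
class AddServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000)) {
            while (true) {
                try (Socket s = server.accept();
                     ObjectInputStream in = new ObjectInputStream(s.getInputStream());
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    String proc = (String) in.readObject(); // 3. unmarshal
                    int a = in.readInt(), b = in.readInt();
                    if ("add".equals(proc)) {
                        out.writeInt(a + b);                // 4-5. execute, marshal, send
                        out.flush();
                    }
                }
            }
        }
    }
}
```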

Pros of RPC

  • Simplicity: Abstracts network communication, making remote calls appear as local.
  • Transparency: Hides complexity, enabling easier development of distributed systems.
  • Interoperability: Allows interaction across different languages and platforms.
  • Modularity: Facilitates organized and scalable application design.
  • Error Handling: Provides a structured way to detect and report failures of remote calls to the caller.

Cons of RPC

  • Complex Implementation: Requires setup of infrastructure like stubs and protocols.
  • Network Dependency: Performance and reliability are tied to network conditions.
  • Security Risks: Introduces potential vulnerabilities, needing strong security measures.
  • Transparency Limits: Issues like latency and network failures can complicate usage.
  • Scalability Issues: Blocking, point-to-point calls can become a bottleneck as the number of clients grows.

Types of RPC

  • Conventional RPC: The client blocks until the server has executed the procedure and returned the result (synchronous request/reply).
  • Asynchronous RPC: The server acknowledges the request as soon as it arrives, and the client continues immediately without waiting for the result.
  • Deferred synchronous RPC: The client continues after the acknowledgment and collects the result later, for example via a callback or a second interaction (see the sketch below).
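
A minimal sketch of the three calling styles, modeling the remote call with Java's CompletableFuture. The remoteSquare method is a hypothetical stand-in for an actual network call; a real RPC framework would supply this plumbing.

```java
import java.util.concurrent.CompletableFuture;

public class RpcStyles {
    // Stand-in for a real remote call: sleeps to simulate network latency.
    static int remoteSquare(int x) {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
        return x * x;
    }

    public static void main(String[] args) throws Exception {
        // Conventional RPC: the caller blocks until the result arrives.
        System.out.println("conventional: " + remoteSquare(6));

        // Asynchronous RPC: the caller continues at once; the result is
        // handled by a callback whenever it eventually arrives.
        CompletableFuture<Void> callback = CompletableFuture
                .supplyAsync(() -> remoteSquare(7))
                .thenAccept(v -> System.out.println("callback: " + v));

        // Deferred synchronous RPC: fire the call, do local work, and
        // block only at the point where the result is actually needed.
        CompletableFuture<Integer> deferred = CompletableFuture.supplyAsync(() -> remoteSquare(8));
        System.out.println("local work while the call is in flight...");
        System.out.println("deferred: " + deferred.get()); // blocks here

        callback.join(); // keep the JVM alive until the callback has run
    }
}
```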

Middleware

Middleware is software that connects and facilitates communication between different applications or components in a distributed system.

Functions of Middleware

  1. Communication Management: Manages communication between components, providing a uniform interface.
  2. Data Management: Handles data storage, retrieval, and transformation across systems.
  3. Security and Authentication: Controls access, encrypts data, and ensures secure interactions.
  4. Transaction Management: Ensures consistency and coordination in distributed transactions.
  5. Load Balancing: Distributes workloads to optimize performance and prevent overload.

Threads

A thread is a minimal software processor in whose context a series of instructions can be executed. It is a lightweight unit of execution created within a process, sharing that process's address space and resources.

Lifecycle of a Thread

  • New
  • Runnable
  • Blocked
  • Waiting
  • Timed Waiting
  • Terminated
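
With the Blocked state included, these are exactly the values of Java's Thread.State enum, so the lifecycle can be observed directly. In this small demonstration, wait()/notify() and a short sleep are just one way to drive the transitions:

```java
public class ThreadStates {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) {}
            }
        });

        System.out.println(t.getState()); // NEW: created but not started
        t.start();
        System.out.println(t.getState()); // RUNNABLE (typically)
        Thread.sleep(100);                // give t time to reach wait()
        System.out.println(t.getState()); // WAITING: parked in lock.wait()

        synchronized (lock) { lock.notify(); }
        t.join();
        System.out.println(t.getState()); // TERMINATED: run() has returned
    }
}
```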

Types of Threads

  1. User-level threads:
    • All code and data structures for the library exist in user space.
    • Invoking a function in the API results in a local function call in user space and not a system call.
  2. Kernel-level threads:
    • All code and data structures for the library exist in kernel space.
    • Invoking a function in the API typically results in a system call to the kernel.

Virtualization

Virtualization allows multiple virtual instances (hardware, software, or both) to run on a single physical system, enhancing resource utilization, flexibility, and scalability.

Types of Virtualization

  1. Hardware Virtualization: Creates virtual machines (VMs) on a physical server, each with its own OS. Examples: VMware ESXi, Microsoft Hyper-V.
  2. Operating System Virtualization: Runs multiple containers on a single OS kernel, isolating applications. Examples: Docker, Linux Containers (LXC).
  3. Network Virtualization: Combines physical network resources into virtual networks. Examples: Software-Defined Networking (SDN), VLANs.
  4. Storage Virtualization: Merges physical storage devices into a single virtual storage resource. Examples: Storage Area Network (SAN), RAID.
  5. Application Virtualization: Runs applications in a virtual environment, independent of the underlying OS. Examples: Microsoft App-V, Citrix XenApp.
  6. Desktop Virtualization: Separates desktop environments from physical devices for remote access. Examples: Virtual Desktop Infrastructure (VDI), VMware Horizon.
  7. Data Virtualization: Unifies data from multiple sources without moving it. Examples: Denodo, IBM InfoSphere.

Server Clusters

A server cluster is a collection of machines connected through a network, where each machine runs one or more servers. As in any multitiered client-server architecture, many server clusters also contain servers dedicated to application processing. In cluster computing, these are typically servers running on high-performance hardware dedicated to delivering compute power.

TCP Handoff

A standard way of accessing a server cluster is to set up a TCP connection over which application-level requests are then sent as part of a session. A session ends by tearing down the connection. In the case of transport-layer switches, the switch accepts incoming TCP connection requests and hands off each connection to one of the servers; that server then responds to the client directly using the switch's address, while the client believes it is still talking to the switch.

Code Migration

Code migration refers to the process of transferring code, processes, or entire applications from one system environment to another. In distributed systems, code migration allows components of a program (like processes or objects) to move between different nodes or machines.

Migration Models

  • Process Migration: Moves an entire running process, including its code, data, and execution state.
  • Thread Migration: Moves individual threads within a process to different machines, while the process continues on the original machine.
  • Code Migration: Moves only the code (and some data) to another machine; execution state is recreated or restarted.
    • Strong Mobility: Moves code, data, and execution state.
    • Weak Mobility: Moves code and initial data, with execution starting anew.
  • Agent-Based Migration: Moves autonomous software agents across machines to perform tasks closer to data or services.
  • Object Migration: Moves objects (with state and methods) between machines in object-oriented systems.

Types of Communication

  • Transient Communication: Messages exist only while the sender and receiver are both active. If the receiver is not available, the message is lost (e.g., UDP).
  • Persistent Communication: Messages are stored by the communication system until the receiver retrieves them, even if the receiver is not active at the time of sending (e.g., Email).
  • Synchronous Communication: The sender blocks further operations until some sort of acknowledgment or response is received, hence the name blocking communication.
  • Asynchronous Communication: The sender continues execution without waiting for an acknowledgment or response; the message is placed in a local buffer at the sender (or in the middleware) until it can be transmitted, as in the sketch below.
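
A minimal sketch of asynchronous communication with a sender-side buffer, in Java. The transmit method is a placeholder for a real network send:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Asynchronous send: the sender appends to a local buffer and continues;
// a background thread drains the buffer toward the network later.
public class AsyncSender {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();

    AsyncSender() {
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    String msg = buffer.take(); // blocks until a message exists
                    transmit(msg);              // the actual (slow) network send
                }
            } catch (InterruptedException ignored) {}
        });
        drainer.setDaemon(true);
        drainer.start();
    }

    void send(String msg) { buffer.add(msg); }  // returns immediately

    private void transmit(String msg) {
        System.out.println("transmitted: " + msg); // placeholder for a socket write
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncSender s = new AsyncSender();
        s.send("hello");                           // the caller does not wait
        System.out.println("sender continues immediately");
        Thread.sleep(100);                         // let the drainer run
    }
}
```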

Java Remote Method Invocation (RMI)

The Java Remote Method Invocation (RMI) system allows an object running in one Java virtual machine to invoke methods on an object running in another Java virtual machine. RMI provides for remote communication between programs written in the Java programming language.

Architecture of an RMI Application

  1. Transport Layer: This layer connects the client and the server. It manages the existing connection and also sets up new connections.
  2. Stub: A stub is a representation (proxy) of the remote object at the client. It resides in the client system; it acts as a gateway for the client program.
  3. Skeleton: This is the object which resides on the server side. The stub communicates with this skeleton to pass requests to the remote object.
  4. RRL (Remote Reference Layer): It is the layer which manages the references made by the client to the remote object.

Working of an RMI Application

  • When the client makes a call to the remote object, the call is received by the stub, which passes the request on to the client-side RRL.
  • When the client-side RRL receives the request, it invokes the invoke() method of the remote reference object (remoteRef), which transmits the request to the RRL on the server side.
  • The RRL on the server side passes the request to the skeleton (the proxy on the server), which finally invokes the required method on the remote object. The result is passed all the way back through the same layers to the client.
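
A minimal end-to-end RMI example is shown below. The Hello interface, the binding name, and the registry port 1099 are choices made for this sketch; the stub, RRL, and transport described above are generated and managed by the RMI runtime behind these few calls.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: every remote method must declare RemoteException.
interface Hello extends Remote {
    String sayHello(String name) throws RemoteException;
}

// Server: exports the object (creating its stub) and registers it by name.
class HelloServer implements Hello {
    public String sayHello(String name) { return "Hello, " + name; }

    public static void main(String[] args) throws Exception {
        Hello stub = (Hello) UnicastRemoteObject.exportObject(new HelloServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("Hello", stub);
        System.out.println("server ready");
    }
}

// Client: looks up the stub and calls it like a local object.
class HelloClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        Hello hello = (Hello) registry.lookup("Hello");
        System.out.println(hello.sayHello("world")); // executes on the server
    }
}
```

Run HelloServer first, then HelloClient; the client's call to sayHello executes in the server's JVM.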

Naming in Distributed Systems

A name in a distributed system is a string of bits or characters that is used to refer to an entity. An entity in a distributed system can be practically anything. An entity is reached through an access point, whose name is called an address; since an entity may have multiple access points, it may also have multiple addresses.

Structured Naming

Structured naming involves organizing names within a hierarchical or systematic structure to manage and access resources in a distributed system. It uses a predefined format where names are composed of multiple components, each representing a specific part of the hierarchy. This type of naming is similar to how file paths or URLs are structured.

Characteristics of Structured Naming

  • Hierarchy: Names are organized in a tree-like structure, where each level represents a different category or domain.

Example of Structured Naming

In a URL like https://www.example.com/path/to/resource, https is the protocol, www.example.com is the domain name, and /path/to/resource represents the path to a specific resource within the domain. Domain Name System (DNS) also uses structured naming, where com is a top-level domain, example is a second-level domain, and www is a subdomain.

Advantages of Structured Naming

  • Scalability
  • Manageability
  • Uniqueness

Attribute-Based Naming

Attribute-based naming, also known as directory-based or content-based naming, uses a set of attributes to identify and locate resources, rather than relying on a structured hierarchy. Names are represented as a collection of attributes and values, which are used to search for resources in a distributed environment.

Characteristics of Attribute-Based Naming

  • Attribute Focused: Resources are identified by their attributes (e.g., “color=blue,” “size=large”) rather than a fixed name or path.

Example of Attribute-Based Naming

In a file system, instead of searching for a file by a specific path, you might search for all files created by a certain user or all documents with a specific keyword in their metadata. LDAP (Lightweight Directory Access Protocol) is an example of attribute-based naming, where resources are found based on attributes like cn=John Doe or ou=Engineering.
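
A sketch of such an attribute-based lookup using Java's JNDI LDAP provider. The server URL, search base, and filter are assumed values for illustration; any entry whose attributes match the filter is returned, wherever it sits in the directory tree.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapSearch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // assumed server

        DirContext ctx = new InitialDirContext(env);
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // The query names attributes, not a path: every entry under the
        // base whose attributes match the filter is returned.
        NamingEnumeration<SearchResult> results =
                ctx.search("dc=example,dc=com", "(&(ou=Engineering)(cn=John Doe))", controls);
        while (results.hasMore()) {
            System.out.println(results.next().getNameInNamespace());
        }
        ctx.close();
    }
}
```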

Advantages of Attribute-Based Naming

  • Flexibility
  • Dynamic

Physical and Logical Clocks

Physical clocks are hardware-based and measure real-world time, such as the system clocks on computers. These clocks are synchronized using protocols like NTP and are essential when the exact time of events is crucial, such as in timestamping transactions. However, synchronizing physical clocks across different machines is challenging due to issues like network delays and clock skew, which can lead to inconsistencies in time across the system.

Logical clocks are software-based mechanisms that focus on maintaining the order of events rather than tracking actual time. They are used to ensure consistency in event sequencing across distributed systems. Two common types are Lamport clocks, which determine the causal order of events, and vector clocks, which provide a more detailed view of event causality. Logical clocks excel at tracking the relative order of events, ensuring consistent operation in distributed systems, but they do not provide real-time information, and their implementation can be complex, especially in larger systems.

Coordinator in Distributed Systems

A coordinator manages specific tasks, ensures consistency, and handles decisions requiring central control, such as managing locks, coordinating updates, or resolving conflicts.

Approaches to Electing a Coordinator

  1. Bully Algorithm: Any process can start an election if the coordinator fails. The process with the highest ID becomes the coordinator after a series of challenges between processes.
  2. Ring Algorithm: Processes are arranged in a ring. When a coordinator fails, an election message circulates around the ring, and the process with the highest ID becomes the new coordinator.
  3. Randomized Algorithms: Processes randomly select themselves as coordinators, with conflicts resolved through additional random selection rounds.
  4. Consensus Protocols (e.g., Paxos, Raft): A leader (coordinator) is elected through a voting process as part of achieving distributed consensus, ensuring fault tolerance.

Clock Synchronization

Clock synchronization involves aligning the clocks of different nodes to ensure consistent and accurate timekeeping across the system.

Network Time Protocol (NTP) is a protocol that allows the synchronization of system clocks (from desktops to servers).

Berkeley’s Algorithm

Berkeley’s Algorithm is a clock synchronization technique used in distributed systems. It assumes that no node in the network has an accurate external time source (such as a UTC receiver); instead, the nodes agree on an average of their own clocks.

Algorithm

  1. A single node is chosen as the master from the pool of nodes in the network, using an election process/leader election algorithm; the remaining nodes act as slaves.
  2. The master node periodically polls the slave nodes and fetches the clock time at each of them, compensating for network delay using Cristian’s algorithm.
  3. The master node calculates the average difference between all the received clock times and its own system clock (outlier readings may be discarded to make the average fault-tolerant). It then sends each node the amount by which to adjust its clock, rather than an absolute time, so that transmission delay introduces less error (see the sketch after this list).
  4. If the master fails, a new master is elected and synchronization continues.
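
A sketch of the master's averaging step (step 3), with the measured slave offsets hard-coded for illustration; a real implementation would obtain them by polling, as in step 2.

```java
import java.util.List;

public class BerkeleyMaster {
    public static void main(String[] args) {
        // Offsets of each slave clock relative to the master, in ms, as
        // measured by polling each slave (e.g., via Cristian's method).
        List<Long> slaveOffsets = List.of(-150L, 200L, 50L);

        // Average over all clocks, including the master's own (offset 0).
        long sum = 0;
        for (long off : slaveOffsets) sum += off;
        long avg = sum / (slaveOffsets.size() + 1); // (-150 + 200 + 50) / 4 = 25

        // Each node receives an adjustment, not an absolute time, so the
        // correction is not distorted by transmission delay.
        System.out.println("master adjusts by " + avg + " ms");
        for (long off : slaveOffsets)
            System.out.println("slave at offset " + off + " adjusts by " + (avg - off) + " ms");
    }
}
```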

Cristian’s Algorithm

In Cristian’s Algorithm for clock synchronization, the client initiates the process by sending a request message to the server and records T1, the time on its own clock at which the request was sent. The server records T2, the time on its clock when the request arrives, and T3, the time when it sends the response; both timestamps are included in the reply. When the client receives the response, it notes T4, the time on its own clock at which the response arrived. With these four timestamps, the client calculates the round-trip delay and the clock offset using the following formulas:

  • Round-trip delay = (T4 – T1) – (T3 – T2)
  • Clock offset = ((T2 – T1) + (T3 – T4)) / 2

Finally, the client adjusts its local clock by the calculated offset to synchronize with the server’s clock. The offset formula assumes that the one-way delays in the two directions are roughly symmetric.
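
A worked example of the two formulas with assumed timestamp values; note that a useful offset is recovered even though the two clocks disagree by about half a second.

```java
public class CristianClient {
    public static void main(String[] args) {
        // Assumed timestamps in milliseconds, for illustration only.
        long t1 = 1000; // client clock when the request is sent
        long t2 = 1510; // server clock when the request arrives
        long t3 = 1512; // server clock when the response is sent
        long t4 = 1020; // client clock when the response arrives

        long delay  = (t4 - t1) - (t3 - t2);       // 20 - 2 = 18 ms on the wire
        long offset = ((t2 - t1) + (t3 - t4)) / 2; // (510 + 492) / 2 = 501 ms

        // Assuming symmetric one-way delays, the client's clock is
        // 501 ms behind the server's, so it adds the offset.
        System.out.println("delay = " + delay + " ms, offset = " + offset + " ms");
        System.out.println("corrected client time: " + (t4 + offset)); // 1521
    }
}
```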

Lamport’s Logical Clocks

Lamport’s Logical Clocks provide a mechanism to order events in a distributed system without relying on actual time. Each process in the system maintains a counter that increments with each event. When a process sends a message, it includes its counter value with the message. Upon receiving the message, the recipient process sets its own counter to one more than the maximum of its current value and the timestamp in the received message. This method helps in establishing a partial ordering of events, ensuring that if one event causally precedes another, it will be reflected in their timestamps. Lamport’s clocks are useful for determining the causal order of events but do not provide information about the exact timing of events, and they cannot distinguish concurrent events from causally related ones.
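
A minimal Lamport clock in Java; the class and method names are our own, but the two update rules (increment on a local event or send, maximum-plus-one on a receive) are the algorithm itself.

```java
import java.util.concurrent.atomic.AtomicLong;

// A Lamport clock: one counter per process, updated on every event.
class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    // Local event or message send: just tick the counter.
    long tick() { return time.incrementAndGet(); }

    // Message receive: jump past both our clock and the sender's timestamp.
    long receive(long msgTimestamp) {
        return time.updateAndGet(t -> Math.max(t, msgTimestamp) + 1);
    }

    long read() { return time.get(); }
}

public class LamportDemo {
    public static void main(String[] args) {
        LamportClock p1 = new LamportClock(), p2 = new LamportClock();
        long ts = p1.tick();   // p1 sends a message stamped 1
        p2.tick(); p2.tick();  // p2 has local events, clock = 2
        p2.receive(ts);        // p2 receives: max(2, 1) + 1 = 3
        System.out.println("p2 clock = " + p2.read()); // 3
    }
}
```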

Vector Clocks

Vector Clocks extend Lamport’s concept by providing a more detailed view of event causality. Each process maintains a vector of counters, where each entry corresponds to a counter for every process in the system. When a process performs an event or sends a message, it increments its own entry and includes its vector clock in the message. The receiving process then updates each entry of its own vector to the element-wise maximum of the two vectors (and increments its own entry for the receive event), ensuring that all entries reflect the latest known information. This approach allows for a precise understanding of the causality and concurrency of events, as it can distinguish between events that are concurrent and those that are causally related.
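
A minimal vector clock sketch for a fixed set of processes; the process indices and demo scenario are invented for illustration.

```java
import java.util.Arrays;

// A vector clock for a system with a fixed number of processes.
class VectorClock {
    private final long[] v;
    private final int self; // index of the owning process

    VectorClock(int numProcesses, int self) {
        this.v = new long[numProcesses];
        this.self = self;
    }

    // Local event or send: increment only our own entry.
    void tick() { v[self]++; }

    // Receive: take the element-wise maximum, then tick for the receive event.
    void receive(long[] other) {
        for (int i = 0; i < v.length; i++) v[i] = Math.max(v[i], other[i]);
        tick();
    }

    long[] snapshot() { return v.clone(); }
}

public class VectorDemo {
    public static void main(String[] args) {
        VectorClock p0 = new VectorClock(2, 0), p1 = new VectorClock(2, 1);
        p0.tick();                  // p0: [1, 0]
        long[] msg = p0.snapshot(); // message carries [1, 0]
        p1.tick();                  // p1: [0, 1] (concurrent with p0's event)
        p1.receive(msg);            // p1: max([0,1],[1,0]) then tick = [1, 2]
        System.out.println(Arrays.toString(p1.snapshot())); // [1, 2]
    }
}
```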

Gossip-Based Coordination

Gossip-Based Coordination is a decentralized method used for managing and disseminating information across distributed systems by mimicking the spread of gossip in social networks. In this approach, each node periodically exchanges information with a few randomly selected peers, spreading updates or status changes throughout the system. Over time, this process ensures that information reaches all nodes, achieving eventual consistency rather than immediate synchronization. The gossip-based method is fault-tolerant, as the failure of some nodes does not impede the overall dissemination of information, and it is highly scalable, as each node only needs to communicate with a small, constant number of other nodes. This approach is used in various applications, including failure detection, data replication, and configuration management, making it suitable for large, dynamic distributed systems where traditional centralized coordination might be impractical.
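
A toy simulation of push-style gossip, showing how quickly one update reaches all nodes when each informed node contacts a single random peer per round. All parameters are arbitrary; real implementations also handle membership changes and anti-entropy pulls.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Simulates push gossip: each round, every node that already knows the
// update forwards it to one randomly chosen peer.
public class GossipSim {
    public static void main(String[] args) {
        int n = 1000;
        Random rnd = new Random(42);
        Set<Integer> informed = new HashSet<>();
        informed.add(0); // node 0 starts with the update

        int rounds = 0;
        while (informed.size() < n) {
            Set<Integer> newlyInformed = new HashSet<>();
            for (int node : informed) {
                newlyInformed.add(rnd.nextInt(n)); // gossip to a random peer
            }
            informed.addAll(newlyInformed);
            rounds++;
        }
        // The update spreads in roughly O(log n) rounds.
        System.out.println("all " + n + " nodes informed after " + rounds + " rounds");
    }
}
```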

The Bully Algorithm

The Bully Algorithm is a method for electing a coordinator in a distributed system. When a process detects that the current coordinator has failed, it initiates an election by sending a message to all processes with higher IDs. If a higher-ID process is active, it responds, taking over the election. The initiating process will defer to this higher-ID process. If no higher-ID processes respond, the initiating process is declared the new coordinator. This algorithm ensures that the process with the highest ID becomes the coordinator, but it can generate significant message traffic and might be inefficient in large systems with many processes.
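
A simplified, single-threaded model of the election: real implementations exchange ELECTION, OK, and COORDINATOR messages over the network and use timeouts to detect silence, whereas here the set of live process IDs is global knowledge and messages become method calls.

```java
import java.util.Set;
import java.util.TreeSet;

public class BullyElection {
    // Live process IDs; the old coordinator (ID 9) has failed.
    static final TreeSet<Integer> alive = new TreeSet<>(Set.of(1, 3, 4, 7));

    static int elect(int initiator) {
        // Send ELECTION to every process with a higher ID.
        for (int id : alive) {
            if (id > initiator) {
                // A live higher process replies OK and takes over the
                // election; the initiator defers to it.
                return elect(id);
            }
        }
        // No higher process answered: the initiator wins and would now
        // broadcast a COORDINATOR message to all other processes.
        return initiator;
    }

    public static void main(String[] args) {
        System.out.println("new coordinator: " + elect(3)); // 7, the highest live ID
    }
}
```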

Consistency Models

Consistency models in distributed systems refer to the guarantees provided by the system about the order in which operations appear to occur to clients.

Data-Centric Consistency Models

Data-centric consistency models define how data operations are seen by different nodes in a distributed system, focusing on maintaining consistency of data across multiple replicas.

Types of Data-Centric Consistency Models

  1. Strong Consistency: Here, all nodes in the system agree on the order in which operations occurred. Reads will always return the most recent version of the data, and writes will be visible to all nodes immediately after they occur. This model provides the highest level of consistency, but at a cost: replicas must coordinate on every operation, which hurts performance and availability.
  2. Pipelined Random Access Memory (PRAM) or FIFO: PRAM is one of the weak consistency models. All processes see the operations of a single process in the order in which they were issued by that process, while operations issued by different processes may be seen in different orders by different processes.
  3. Sequential Consistency Model: This model was proposed by Lamport. The result of any execution is the same as if the read and write operations by all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program. The sequential consistency model is weaker than strict consistency.
  4. Causal Consistency Model: It ensures that all processes see causally related shared accesses in the same order. The causal consistency model is weaker than strict or sequential consistency, because causal consistency does not require a total order: concurrent writes may be seen in different orders by different processes.

Client-Centric Consistency Models

Client-centric consistency models define how a data-store presents the data value to an individual client when the client process accesses the data value across different replicas.

Types of Client-Centric Consistency Models

  1. Monotonic Reads: Monotonic reads ensure that once a client reads a data item, it will never see an older version of that data item in subsequent reads. This model provides a guarantee that once a client reads a value, it will always see that value or a more recent one in future read operations.
  2. Monotonic Writes: Monotonic writes ensure that if a client writes a value to a data item, all subsequent writes by that client to the same data item will be applied in the order they were issued. This model guarantees that the client’s writes are seen in the same sequence as they were issued, maintaining a consistent order of updates.
  3. Read-Your-Writes: Read-your-writes consistency, also known as session consistency, guarantees that once a client performs a write operation, it will always see its own write in subsequent read operations. This model ensures that clients have a consistent view of their updates, which is critical for user-facing applications where individuals need to view the results of their actions immediately.
  4. Writes-Follow-Reads: Writes-follow-reads consistency ensures that when a client writes to a data item after having read it, the write takes effect on a replica holding a value at least as recent as the one that was read. In other words, an update is propagated only after the reads it depends on; for example, a reply to a forum post can never become visible on any replica before the post it responds to. A minimal client-side enforcement of such session guarantees is sketched below.
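
A minimal sketch of how a client session might enforce the monotonic-reads and read-your-writes guarantees by tracking the highest version it has observed per data item. The Session class and the versioning scheme are assumptions; real systems attach comparable metadata (version vectors or timestamps) to each request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A client-side session that records the highest version seen per item.
// A replica may serve a read only if its copy is at least this fresh;
// otherwise it must catch up or the client must contact another replica.
class Session {
    private final Map<String, Long> lastSeenVersion = new ConcurrentHashMap<>();

    // Called after every read or write the client performs.
    void observe(String key, long version) {
        lastSeenVersion.merge(key, version, Math::max);
    }

    // Is a replica's copy fresh enough to serve this client's read?
    boolean acceptable(String key, long replicaVersion) {
        return replicaVersion >= lastSeenVersion.getOrDefault(key, 0L);
    }
}

public class SessionDemo {
    public static void main(String[] args) {
        Session s = new Session();
        s.observe("inbox", 7);                        // client wrote version 7
        System.out.println(s.acceptable("inbox", 5)); // false: stale replica
        System.out.println(s.acceptable("inbox", 7)); // true: read-your-writes holds
    }
}
```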

Failure Models/Faults

A failure model classifies the ways in which a component of a distributed system can fail:

  • Crash Failure: A server halts, but works correctly until it halts.
  • Omission Failure: A server fails to receive incoming requests or to send responses.
  • Timing Failure: A server’s response lies outside the specified time interval.
  • Response Failure: A server’s response is incorrect, either in its value or in its state transition.
  • Arbitrary (Byzantine) Failure: A server may produce arbitrary responses at arbitrary times.