Distributed Systems Concepts: Code Migration, Middleware, and Naming
What is Code Migration? Explain the Migration Model
Code migration in distributed systems refers to the transfer of executable code from one machine (or process) to another to be executed remotely. It allows a system to move computations closer to the data or resources, improving performance, flexibility, and resource utilization. There are two main types of code migration:
- Strong Migration: Moves the entire process, including code, data, and execution state, so execution can resume at the exact point on the target machine.
- Weak Migration: Moves only the code and initial data, and execution starts from the beginning on the target machine.
Code Migration Models
- Client-Server Model
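- Code, data, and resources stay on the server; the client only sends a request and receives the result, so no code actually migrates. It serves as the baseline against which the other models are compared.
- Example: A remote procedure call, where the client passes arguments and the server runs its own code.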
- Remote Evaluation Model
- A client sends code to a remote machine along with data.
- Useful when the data on the remote machine is large, so it is cheaper to ship the code to the data than the data to the code.
- Code on Demand Model
- A client downloads code from a remote server to execute locally.
- Example: Java applets or JavaScript executed in a browser.
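The remote evaluation model with weak migration can be sketched in a few lines. This is a minimal, single-process simulation, not a real network protocol: the names `SHIPPED_CODE`, `SERVER_LOG`, and `remote_evaluate` are hypothetical, and the "server" is just a function that loads the received source text and starts it from the beginning.

```python
# Hypothetical code fragment that a client ships to the server as plain text.
SHIPPED_CODE = """
def count_errors(lines):
    # Runs next to the data, so only the small result travels back.
    return sum(1 for line in lines if "ERROR" in line)
"""

# Data that lives on the "server" and is assumed too large to move.
SERVER_LOG = ["INFO start", "ERROR disk full", "INFO retry", "ERROR timeout"]


def remote_evaluate(source, entry_point, data):
    """Simulate the receiving side: load the shipped code and run it from the start."""
    namespace = {}
    exec(source, namespace)  # weak migration: code arrives without execution state
    return namespace[entry_point](data)


result = remote_evaluate(SHIPPED_CODE, "count_errors", SERVER_LOG)
print(result)  # 2
```

Executing received code with `exec` is unsafe for untrusted senders; real systems that accept migrated code usually run it in a sandbox.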
Explain Different Types of Middleware
Middleware facilitates communication and management between distributed applications. Key types include:
- Message-Oriented Middleware (MOM)
- Enables asynchronous communication between distributed applications.
- Example: RabbitMQ, IBM MQ.
- Remote Procedure Call (RPC) Middleware
- Allows a program to invoke procedures on a remote system as if they were local.
- Example: Java RMI, gRPC.
- Object Middleware
- Supports communication between distributed objects in object-oriented systems.
- Example: CORBA (Common Object Request Broker Architecture).
- Database Middleware
- Connects applications to distributed databases.
- Example: ODBC, JDBC.
- Application Server Middleware
- Provides a platform for running enterprise applications.
- Example: JBoss, WebLogic, Tomcat.
- Portal Middleware
- Provides unified access to multiple services and applications.
- Example: Enterprise web portals.
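To make the asynchronous style of message-oriented middleware concrete, here is a small sketch that uses Python's standard `queue.Queue` as a stand-in for a broker queue. It is an in-process illustration only, not the API of RabbitMQ or IBM MQ; the message names and the `None` shutdown sentinel are assumptions made for the example.

```python
import queue
import threading

broker = queue.Queue()  # stands in for a queue held by the middleware
received = []


def producer():
    for i in range(3):
        # put() returns immediately; the sender does not wait for a receiver.
        broker.put(f"order-{i}")
    broker.put(None)  # hypothetical sentinel meaning "no more messages"


def consumer():
    while True:
        message = broker.get()
        if message is None:
            break
        received.append(message)


producer()  # all messages are buffered before any consumer exists
worker = threading.Thread(target=consumer)
worker.start()
worker.join()
print(received)  # ['order-0', 'order-1', 'order-2']
```

The producer finishes before the consumer starts, which is the decoupling in time that distinguishes MOM from a blocking RPC call.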
Explain the Concept of Multicast Communication
Multicast communication is a method in distributed systems where a message is sent from one sender to a specific group of receivers simultaneously, rather than sending it individually to each receiver (unicast) or to all nodes (broadcast). This helps reduce network traffic and improves efficiency when the same data needs to be delivered to multiple nodes.
Key Features:
- One-to-Many Communication: A single message reaches multiple subscribed recipients.
- Group Management: Receivers join or leave multicast groups dynamically.
- Efficiency: Reduces duplicate messages compared to sending multiple unicast messages.
- Reliability Options: Can be reliable (acknowledged) or unreliable (best-effort).
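The group management and one-to-many delivery described above can be sketched as follows. This is an in-memory, best-effort simulation rather than real IP multicast; the class `MulticastGroup`, the node names, and the messages are hypothetical.

```python
class MulticastGroup:
    """Toy multicast group: one send() delivers to every current member."""

    def __init__(self, name, inboxes):
        self.name = name
        self.inboxes = inboxes  # node id -> list acting as that node's inbox
        self.members = set()

    def join(self, node_id):
        self.members.add(node_id)

    def leave(self, node_id):
        self.members.discard(node_id)

    def send(self, message):
        # The sender issues a single send; delivery fans out to the group.
        for node_id in self.members:
            self.inboxes[node_id].append(message)
        return len(self.members)


inboxes = {"n1": [], "n2": [], "n3": []}
group = MulticastGroup("sensors", inboxes)
group.join("n1")
group.join("n2")
group.send("t=21C")   # reaches n1 and n2 only
group.leave("n2")
group.join("n3")
group.send("t=22C")   # reaches n1 and n3 only
print(inboxes)
```

Node `n3` never sees the first message and `n2` never sees the second, which shows that delivery follows group membership at send time.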
Explain Structured Naming Schema
A structured naming schema in distributed systems is a method of naming entities (like files, resources, or objects) using a hierarchical and organized format. It makes names easy to understand, locate, and manage.
Key Features:
- Hierarchy: Names are arranged in levels, often resembling a tree structure.
- Example: department/employee/records/file1.txt
- Uniqueness: Each entity has a unique path or fully qualified name within the hierarchy.
- Ease of Resolution: The system can easily locate an entity by traversing the hierarchy.
- Human-Readable: Unlike raw identifiers or addresses, structured names are meaningful to users.
Example:
- In a file system: /home/user/documents/report.docx
- In DNS: www.example.com
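Resolution by traversing the hierarchy can be illustrated with a nested dictionary acting as the naming tree. This is a simplified sketch: the `NAMESPACE` contents and the value `"inode-42"` are hypothetical, and a real name service would spread the levels across several servers.

```python
# Hypothetical naming tree: inner dicts are directories, leaves are entities.
NAMESPACE = {
    "home": {
        "user": {
            "documents": {"report.docx": "inode-42"},
        },
    },
}


def resolve(path, root=NAMESPACE):
    """Walk the tree one path component at a time, starting at the root."""
    node = root
    for component in path.strip("/").split("/"):
        if not isinstance(node, dict) or component not in node:
            raise KeyError(f"cannot resolve {component!r} in {path!r}")
        node = node[component]
    return node


print(resolve("/home/user/documents/report.docx"))  # inode-42
```

A DNS name such as www.example.com is resolved in the same step-by-step way, except that the most significant component is on the right.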
Explain Attribute-Based Naming with an Example
Attribute-based naming is a method in distributed systems where an entity is identified not by a hierarchical name or location, but by a set of attributes or properties that describe it. The system resolves the entity based on the values of these attributes, allowing more flexible and dynamic lookup.
Key Features:
- Entity Identification by Properties: Entities are located using descriptive attributes rather than a fixed name.
- Supports Queries: Users can search for entities by specifying attribute values.
- Flexibility: Useful in dynamic environments where entities may move or change frequently.
Example:
Suppose a distributed file system uses attribute-based naming with the following attributes:
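- Type: Document
- Owner: Alice
- DateCreated: 2026-03-10
A query such as "Find file where Type = Document AND Owner = Alice" returns every file whose attributes match both conditions, regardless of where the file is stored or what it is called.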
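The query above can be expressed as a simple attribute match. In this sketch the directory is a plain list of dictionaries; the file names and the second and third entries are invented for illustration, and the helper `find` is hypothetical.

```python
# Hypothetical directory: each entity is described by (attribute, value) pairs.
FILES = [
    {"Name": "report.docx", "Type": "Document", "Owner": "Alice", "DateCreated": "2026-03-10"},
    {"Name": "photo.png", "Type": "Image", "Owner": "Alice", "DateCreated": "2026-03-11"},
    {"Name": "notes.txt", "Type": "Document", "Owner": "Bob", "DateCreated": "2026-03-10"},
]


def find(entities, **wanted):
    """Return every entity whose attributes equal all the requested values."""
    return [
        entity
        for entity in entities
        if all(entity.get(attr) == value for attr, value in wanted.items())
    ]


matches = find(FILES, Type="Document", Owner="Alice")
print([m["Name"] for m in matches])  # ['report.docx']
```

The linear scan keeps the example short; directory services that support attribute-based lookup typically index the attributes instead.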
Define Name, Address, and Identifier
- Name
- A name is a human-readable label used to identify an entity in a distributed system.
- It is meaningful to users but may not indicate the entity’s location.
- Example: www.example.com
- Address
- An address specifies the location of an entity in the system.
- It tells the system where to find the entity but may not be meaningful to humans.
- Example: 192.168.1.10 (IP address)
- Identifier
- An identifier uniquely identifies an entity regardless of its location or name.
- Used internally by the system to track and manage entities.
- Example: the UUID 550e8400-e29b-41d4-a716-446655440000
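One way to see how the three concepts fit together is a two-step lookup: name to identifier, then identifier to address. The sketch below reuses the example values from the definitions; the two tables, the `lookup` helper, and the new address 192.168.1.20 are assumptions made for illustration.

```python
import uuid

entity_id = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

# Hypothetical tables kept by a naming service.
name_to_id = {"www.example.com": entity_id}      # name -> identifier
id_to_address = {entity_id: "192.168.1.10"}      # identifier -> address


def lookup(name):
    """Resolve a human-readable name to the entity's current address."""
    return id_to_address[name_to_id[name]]


before = lookup("www.example.com")

# The entity moves: only the identifier -> address binding changes.
id_to_address[entity_id] = "192.168.1.20"
after = lookup("www.example.com")

print(before, after)  # 192.168.1.10 192.168.1.20
```

The name and the identifier stay the same across the move, which is why identifiers are useful for tracking entities that change location.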
What is Clock Synchronization? Why is it Needed?
Clock synchronization is the process of coordinating the clocks of different nodes (computers) in a distributed system so that they agree on a common notion of time. Each machine has its own hardware clock, and these clocks tick at slightly different rates (drift), so their readings gradually diverge (skew). Left uncorrected, this causes inconsistencies in event ordering, scheduling, and coordination.
Why Clock Synchronization is Needed
- Event Ordering: To maintain the correct sequence of events across nodes.
- Coordination: For coordinating tasks, transactions, and distributed algorithms.
- Consistency: Ensures timestamps in logs, databases, or distributed files are comparable across nodes.
- Fault Detection & Recovery: Helps in detecting failures and coordinating recovery actions.
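As one illustration of how a node might correct its clock, the sketch below estimates the offset to a time server from a single request and reply, in the style of Cristian's algorithm. That algorithm is not named in the text above; it is brought in here only as an example. The function name and the sample timestamps are hypothetical, and the estimate assumes the network delay is the same in both directions.

```python
def estimate_offset(t_send, server_time, t_receive):
    """Estimate how far the local clock is behind (+) or ahead of (-) the server.

    t_send and t_receive are local clock readings taken when the request left
    and when the reply arrived; server_time is the timestamp in the reply.
    """
    round_trip = t_receive - t_send
    # Assumption: the reply spent half of the round trip on the wire.
    estimated_server_now = server_time + round_trip / 2
    return estimated_server_now - t_receive


offset = estimate_offset(t_send=100.0, server_time=105.25, t_receive=100.5)
print(offset)  # 5.0
```

With these sample values the local clock is about 5 seconds behind the server. The error of the estimate is bounded by half the round-trip time, so shorter round trips give better synchronization.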
Explain Various Replica Management Strategies
In distributed systems, replica management refers to techniques used to manage multiple copies (replicas) of data across different nodes to improve availability, reliability, and performance. Different strategies are used to maintain consistency and coordination among replicas.
- Primary-Backup (Primary-Copy) Strategy
- One replica acts as the primary, and others act as backups.
- Simple but may create a bottleneck at the primary.
- Active Replication (State Machine Replication)
- All replicas process the same requests in the same order.
- Ensures high availability and fault tolerance.
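- Passive Replication
- Backups do not process client requests; they receive state updates from the primary and take over only if the primary fails. This is how the primary-backup strategy above typically operates.
- Uses less computation than active replication, but failover is slower.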
- Quorum-Based Replication
- Uses voting mechanisms for read and write operations.
- Balances consistency and availability.
- Lazy (Asynchronous) Replication
- Updates are propagated to replicas after some delay.
- Common in large-scale systems (eventual consistency).
- Eager (Synchronous) Replication
- Updates are sent to all replicas before confirming the transaction.
- Ensures strong consistency but increases latency.
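The voting idea behind quorum-based replication can be sketched with a toy register that has N replicas, a write quorum W, and a read quorum R. This is a simplified, single-process illustration: the class `QuorumStore` and the `reachable` lists (which replicas respond) are hypothetical, and concurrency and failure handling are left out.

```python
class QuorumStore:
    """Toy replicated register with N replicas, write quorum W, read quorum R."""

    def __init__(self, n, w, r):
        # R + W > N: every read quorum overlaps every write quorum.
        # 2W > N: any two write quorums overlap, so versions keep increasing.
        if r + w <= n or 2 * w <= n:
            raise ValueError("quorums must overlap")
        self.w, self.r = w, r
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]

    def write(self, value, reachable):
        if len(reachable) < self.w:
            raise RuntimeError("write quorum not reached")
        version = 1 + max(self.replicas[i]["version"] for i in reachable)
        for i in reachable:
            self.replicas[i] = {"version": version, "value": value}

    def read(self, reachable):
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        # At least one reachable replica holds the latest write; pick it by version.
        newest = max((self.replicas[i] for i in reachable),
                     key=lambda rep: rep["version"])
        return newest["value"]


store = QuorumStore(n=3, w=2, r=2)
store.write("a", reachable=[0, 1])
first = store.read(reachable=[1, 2])    # replica 2 is stale, replica 1 is not
store.write("b", reachable=[1, 2])
second = store.read(reachable=[0, 1])   # replica 0 is stale, replica 1 is not
print(first, second)  # a b
```

Choosing a small R and a large W favors fast reads, and the reverse favors fast writes; that is the balance between consistency and availability mentioned above.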
Discuss the Concept of Threads in Distributed Systems
A thread is the smallest unit of execution within a process. In distributed systems, threads allow multiple tasks to execute concurrently, improving system performance, responsiveness, and resource utilization.
Key Concepts:
- Concurrency
- Threads enable multiple operations (such as handling client requests) to run at the same time.
- A server can create a separate thread for each client request.
- Lightweight Process
- Threads share the same memory space of a process.
- They consume fewer resources than full processes.
- Multithreading in Distributed Systems
- Servers use multithreading to handle multiple remote requests simultaneously.
- Improves throughput and reduces response time.
- Types of Threads
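- User-Level Threads: Managed by a user-space library, so creating and switching threads is cheap, but a blocking system call made by one thread can block the entire process.
- Kernel-Level Threads: Managed by the operating system, so one thread can block without stopping the others and threads can run on different cores, but creation and context switches are more expensive.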
- Benefits in Distributed Systems
- Faster communication handling
- Efficient resource sharing
- Challenges
- Deadlocks and race conditions
- Complexity in programming
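A multithreaded server that guards shared state against race conditions might look like the sketch below. It uses Python's standard `ThreadPoolExecutor` as a pool of worker threads; the request handler, the `stats` counter, and the eight simulated requests are hypothetical, and no real network I/O takes place.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

stats = {"handled": 0}          # state shared by all worker threads
stats_lock = threading.Lock()   # protects the shared counter


def handle_request(request_id):
    """Simulated handler: build a response and record that it was served."""
    response = f"response to request {request_id}"
    with stats_lock:
        # Without the lock, concurrent read-modify-write steps could lose updates.
        stats["handled"] += 1
    return response


# Up to four requests are processed concurrently by the pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle_request, range(8)))

print(stats["handled"], responses[0])  # 8 response to request 0
```

A bounded pool is used instead of one new thread per request so that a burst of clients cannot create an unbounded number of threads.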
Explain Structured Naming Schema of an Entity Briefly
A structured naming schema is a method of naming entities in a distributed system using a hierarchical and organized structure. The name is composed of multiple components separated into levels, forming a tree-like structure. Each level represents a specific part of the entity’s location or classification.
In this schema, an entity is identified by a path name, where each component helps in locating the resource step by step. For example, in a file system: /home/user/documents/file.txt, or in DNS: www.example.com
Discuss Different Access Control Mechanisms
Access control mechanisms are methods used to restrict and manage access to system resources in distributed systems. They ensure that only authorized users or processes can access specific data or services.
- Discretionary Access Control (DAC)
- Access rights are controlled by the owner of the resource.
- Example: File permissions in operating systems.
- Mandatory Access Control (MAC)
- Access is controlled by a central authority based on security levels.
- Used in highly secure systems (e.g., military systems).
- Role-Based Access Control (RBAC)
- Access rights are assigned based on user roles.
- Example: Admin, Manager, Employee roles.
- Attribute-Based Access Control (ABAC)
- Access is granted based on attributes such as user identity, time, location, or device type.
- Provides fine-grained and flexible control.
- Access Control Matrix
- Represents permissions in a matrix form (subjects × objects).
- Can be implemented using Access Control Lists (ACLs) or Capabilities.
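The relationship between the access control matrix, ACLs, and capabilities can be sketched with a sparse matrix stored as a dictionary. The subjects, objects, and rights below are hypothetical. An ACL is the matrix read column by column (per object), and a capability list is the matrix read row by row (per subject).

```python
# Hypothetical sparse matrix: (subject, object) -> set of rights.
MATRIX = {
    ("alice", "report.docx"): {"read", "write"},
    ("bob", "report.docx"): {"read"},
    ("alice", "printer"): {"use"},
}


def is_allowed(subject, obj, right):
    """Check a single cell of the matrix; missing cells grant nothing."""
    return right in MATRIX.get((subject, obj), set())


def acl_for(obj):
    """Column view: who may do what to this object (an ACL)."""
    return {s: rights for (s, o), rights in MATRIX.items() if o == obj}


def capabilities_for(subject):
    """Row view: what this subject may do to which objects (capabilities)."""
    return {o: rights for (s, o), rights in MATRIX.items() if s == subject}


print(is_allowed("bob", "report.docx", "write"))  # False
print(acl_for("report.docx"))
print(capabilities_for("alice"))
```

Storing the matrix by column keeps the rights with the resource, while storing it by row hands the rights to the subject; both encode the same permissions.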
