Dynamo: A Highly Available Key-Value Store

Data Structure and Storage

Dynamo partitions data among nodes using consistent hashing, forming a circular key space, or ring. Each node is assigned a position on the ring, and a data item is stored on the first node encountered moving clockwise from its key's hash value. Replicas are placed on the next N−1 nodes clockwise, so each item is stored on N nodes in total. Virtual nodes ensure a uniform load distribution.
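
The ring placement described above can be sketched in a few lines of Python. This is a minimal illustration, not Dynamo's actual implementation; the names (Ring, VNODES) and the use of MD5 are assumptions for the example.

```python
import bisect
import hashlib

VNODES = 8  # virtual nodes per physical node (illustrative value)

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each physical node owns VNODES positions on the ring.
        self.ring = sorted((_hash(f"{n}#{v}"), n)
                           for n in nodes for v in range(VNODES))

    def replicas(self, key: str, n: int = 3):
        """First n distinct nodes found moving clockwise from the key."""
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _hash(key)) % len(self.ring)
        out = []
        while len(out) < n:
            node = self.ring[i][1]
            if node not in out:
                out.append(node)
            i = (i + 1) % len(self.ring)
        return out

ring = Ring(["A", "B", "C", "D"])
print(ring.replicas("user:42"))  # prints three distinct nodes from the ring
```

Because each physical node appears at many ring positions, adding or removing a node shifts only a small, evenly spread fraction of the keys.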

CAP Theorem and Dynamo

Dynamo prioritizes Availability and Partition tolerance, providing eventual consistency. Data replication ensures availability during failures, while consistency may require reconciliation later.

Lazy Replication Protocol

Pros:

  • Lower response time due to reduced coordination.
  • Less network traffic through batching updates.

Cons:

  • Potential inconsistencies and reconciliation overhead with an update-everywhere approach.
  • Stale reads from backups with a primary-copy approach.

Majority Quorum and Read Operations

A write quorum consists of nodes whose combined weight exceeds half the total system weight. Write operations update the data and timestamp in all quorum nodes. Read operations retrieve values from a read quorum, returning the value with the highest timestamp.
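
A weighted majority quorum and a timestamp-based read can be sketched as follows; the node names, weights, and stored values are hypothetical.

```python
# Hypothetical node weights; total system weight is 7.
weights = {"n1": 1, "n2": 1, "n3": 2, "n4": 3}
total = sum(weights.values())

def is_quorum(nodes):
    """A quorum's combined weight must exceed half the total weight."""
    return sum(weights[n] for n in nodes) > total / 2

# Each node stores a (value, timestamp) pair.
store = {"n1": ("old", 1), "n3": ("new", 2), "n4": ("new", 2)}

def quorum_read(nodes):
    """Read from a quorum and return the value with the highest timestamp."""
    assert is_quorum(nodes)
    return max((store[n] for n in nodes), key=lambda vt: vt[1])[0]

print(is_quorum({"n1", "n2"}))          # False: weight 2 of 7
print(quorum_read({"n1", "n3", "n4"}))  # 'new'
```

Because any two quorums each carry more than half the weight, they must share at least one node, so a read quorum always contains at least one node holding the latest committed write.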

Quorum Systems in a 7-Node Cluster

A quorum requires more than half the total weight. With 7 equally weighted nodes, a majority quorum needs 4 nodes and the system tolerates up to 3 failures; assigning unequal weights yields other quorum configurations with different fault-tolerance trade-offs.

Lazy Primary-Backup Replication

Only the primary replica processes write transactions, sending updates to the backups after commit. This reduces latency, but backups may serve stale reads, and committed updates can be lost if the primary fails before propagating them.
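
A toy simulation makes the stale-read window visible. The Primary class and its method names are invented for this sketch; real systems ship a log over the network rather than copying dictionaries.

```python
class Primary:
    """Commits writes locally, ships them to backups only later."""
    def __init__(self, backups):
        self.data, self.log, self.backups = {}, [], backups

    def write(self, k, v):
        self.data[k] = v          # commit locally: low latency
        self.log.append((k, v))   # queued for lazy propagation

    def propagate(self):
        # In a real protocol this runs asynchronously after commit.
        for k, v in self.log:
            for b in self.backups:
                b[k] = v
        self.log.clear()

b1, b2 = {}, {}
p = Primary([b1, b2])
p.write("x", 1)
print(b1.get("x"))   # None: the backup is stale until propagation runs
p.propagate()
print(b1["x"])       # 1
```

The gap between write() and propagate() is exactly where a primary crash would lose the update, and where a client reading the backup sees outdated data.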

Concurrency Control and Transactions

Concurrency control prevents anomalies such as lost updates and inconsistent retrievals. Transactions provide the ACID properties (Atomicity, Consistency, Isolation, Durability) through concurrency-control and recovery protocols.

Serializability and Snapshot Isolation

Serializability ensures results equivalent to some serial execution. Snapshot isolation avoids most anomalies by having each transaction read the data committed as of its start; only write-write conflicts between concurrent transactions force an abort.
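
The write-write conflict rule (first-committer-wins) can be sketched with per-item commit timestamps. This is a simplified model, assuming a single global clock; the names begin/commit are illustrative.

```python
last_commit = {}   # item -> timestamp of its last committed writer
clock = 0          # logical commit clock

def begin():
    """A transaction remembers the clock at its start (its snapshot)."""
    return clock

def commit(start_ts, write_set):
    """Abort if any written item was committed after our snapshot."""
    global clock
    if any(last_commit.get(item, -1) > start_ts for item in write_set):
        return False               # write-write conflict: abort
    clock += 1
    for item in write_set:
        last_commit[item] = clock
    return True

t1, t2 = begin(), begin()          # two concurrent transactions
print(commit(t1, {"x"}))           # True: first committer wins
print(commit(t2, {"x"}))           # False: conflicting write, abort
print(commit(t2, {"y"}))           # True: disjoint write sets are fine
```

Note that t2 reading items t1 wrote would not abort under snapshot isolation; only overlapping write sets conflict, which is why some serializability anomalies (e.g. write skew) remain possible.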

Isolation Anomalies

  • Dirty read: Reading uncommitted data that may be rolled back.
  • Non-repeatable read: Reading the same item twice with different values due to an intervening update.
  • Phantom: Retrieving a set of items twice with different results due to an intervening insert.

Scalability Techniques

  • Scale-up: Increasing resources of a single node (expensive, single point of failure).
  • Scale-out: Adding more nodes (cost-effective, fault-tolerant).
  • Partitioning: Splitting data horizontally (rows) or vertically (columns).
  • Replication: Storing data on multiple nodes for read scalability.
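
Horizontal partitioning from the list above is most often done by hashing the row key to a shard. A minimal sketch, using MD5 for a stable hash (the function name and shard count are assumptions):

```python
import hashlib

def shard_of(key: str, n_shards: int = 4) -> int:
    """Assign a row to a shard by the hash of its key (stable across runs)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_shards

for row_key in ["user:1", "user:2", "order:99"]:
    print(row_key, "->", shard_of(row_key))
```

Python's built-in hash() is deliberately avoided here because it is randomized per process; a partitioning function must map the same key to the same shard on every node.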

Consistency Models

  • Strong consistency: All nodes have the same data at any time.
  • Weak consistency: No guarantee of immediate data consistency, with eventual consistency as a common approach.

Replication Protocols

  • Eager (synchronous): Updates applied to all replicas within the transaction (high consistency, lower performance).
  • Lazy (asynchronous): Updates propagated asynchronously (lower latency, potential inconsistencies).
  • Update-everywhere: Updates executed at any replica.
  • Primary copy: Updates executed only at a designated replica.

Taxonomy of Replication Protocols

  • Primary copy / eager: Fault-tolerance with strong consistency.
  • Update anywhere / eager: Databases and NoSQL data stores with quorum-based consistency.
  • Primary copy / lazy: Fault-tolerance with weaker consistency guarantees.
  • Update anywhere / lazy: High availability, disconnected operations, and low conflict rates.

Relational Databases vs. NoSQL

Relational databases store structured data in tables with ACID guarantees, while NoSQL databases handle unstructured or semi-structured data with varying consistency models. Relational databases typically scale vertically, while NoSQL databases scale horizontally.

NoSQL Database Types

  • Graph databases: Optimized for graph data and traversals.
  • Document-oriented: Store and query JSON-like documents.
  • Key-value: Simple key-based access to values.
  • Text-oriented: Focus on text indexing and search.
  • In-memory: Keep all data in memory for low latency.

Cloud Computing and Service Models

Cloud computing provides on-demand access to computing resources over the internet. Service models include:

  • SaaS (Software as a Service): Applications delivered over the web.
  • PaaS (Platform as a Service): Platform for software development and deployment.
  • IaaS (Infrastructure as a Service): Virtualized computing resources (servers, storage, networking).

DynamoDB vs. HBase

DynamoDB offers scalability and performance with minimal maintenance, while HBase provides more flexibility in data storage and types. DynamoDB uses a key-value model, while HBase is column-oriented. Consider your use case and data model requirements when choosing between them.

Dynamo Design Principles

  • Data replication and eventual consistency.
  • Conflict resolution during read operations.
  • Direct routing of requests to appropriate nodes.

Data Partitioning and Replication

Dynamo uses consistent hashing with virtual nodes for load balancing. Each key is stored on N nodes, the coordinator and its successors on the ring, which are recorded in a preference list that includes extra nodes for fault tolerance. Object versioning supports eventual consistency.
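
Dynamo's versioning uses vector clocks to decide whether one object version supersedes another or the two are concurrent. A minimal sketch (function names are illustrative):

```python
def increment(clock: dict, node: str) -> dict:
    """New version: bump the counter of the node that handled the write."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def descends(a: dict, b: dict) -> bool:
    """True if version a dominates (is newer than or equal to) version b."""
    return all(a.get(n, 0) >= v for n, v in b.items())

v1 = increment({}, "A")       # write handled by node A
v2 = increment(v1, "A")       # later write, again via A
v3 = increment(v1, "B")       # concurrent write via node B
print(descends(v2, v1))                    # True: v2 supersedes v1
print(descends(v2, v3), descends(v3, v2))  # False False: concurrent
```

When neither version descends from the other, Dynamo cannot resolve the conflict syntactically and returns both versions to the client for semantic reconciliation on read.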

Consistency and Interface

Dynamo provides get and put operations with quorum-like consistency (R + W > N). A get collects versions from the N replicas, waits for R responses, and reconciles divergent versions; a put stores the object together with its version context and succeeds once at least W replicas have acknowledged it.
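
The point of R + W > N is that any read set must intersect any write set. The exhaustive check below verifies this for one common configuration (N = 3, R = W = 2):

```python
from itertools import combinations

N, R, W = 3, 2, 2
assert R + W > N   # the quorum-overlap condition

replicas = set(range(N))
# Every possible R-subset must intersect every possible W-subset.
overlap = all(set(r) & set(w)
              for r in combinations(replicas, R)
              for w in combinations(replicas, W))
print(overlap)  # True: every read quorum sees at least one fresh replica
```

With R = 1 and W = 1 (so R + W = 2 ≤ N) the same check would fail: a read could land entirely on replicas that missed the last write, which is the trade Dynamo makes when tuned for maximum availability.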

Handling Failures

The hinted handoff mechanism handles temporary failures by storing writes, with a hint, on an alternate node until the intended replica recovers. Data is also replicated across data centers for disaster recovery, and Merkle trees let replicas efficiently identify out-of-sync keys during recovery.
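
A Merkle-tree comparison can be sketched as follows. This toy version compares whole leaf levels once the roots differ; a real implementation (and Dynamo's, per the paper) descends the tree so replicas exchange only the hashes of differing subtrees. The helper names and the power-of-two leaf count are assumptions of the sketch.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build(leaves):
    """Build a Merkle tree; returns levels, leaf hashes first, root last."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def diff(a_leaves, b_leaves):
    """Indices of out-of-sync leaves (assumes equal, power-of-two length)."""
    ta, tb = build(a_leaves), build(b_leaves)
    if ta[-1] == tb[-1]:
        return []          # equal roots: replicas are in sync, one comparison
    return [i for i, (x, y) in enumerate(zip(ta[0], tb[0])) if x != y]

a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
b = [b"k1=v1", b"k2=OLD", b"k3=v3", b"k4=v4"]
print(diff(a, b))   # [1]: only the entry for k2 needs repair
```

The win is that synchronized replicas confirm it by exchanging a single root hash, and divergent replicas transfer data proportional to the number of stale key ranges rather than the whole key space.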

Membership and Failure Detection

Nodes are added and removed explicitly by an administrator, so transient failures do not trigger rebalancing. Membership information and partitioning data are exchanged periodically through a gossip-based protocol, and permanent failures require explicit removal.