Distributed Systems: Concepts, Technologies, and Examples

Two-Phase Commit Problems (Locking the entire cluster if one node is down, Possible to implement timeouts, Possible to use Quorum, Quorum: in a distributed environment, if there is a partition, then the nodes vote to commit or rollback)

Vector Clocks (Used for conflict detection of data, Timestamp-based resolution of conflicts is not enough, generating a partial ordering of events in a distributed system and detecting causality violations, A vector clock of a system of N processes is an array/vector of N logical clocks, one clock per process; a local “smallest possible values” copy of the global clock array is kept in each process, with the following rules for clock updates: (Initially all clocks are zero, Each time a process experiences an internal event, it increments its own logical clock in the vector by one, Each time a process prepares to send a message, it sends its entire vector along with the message being sent, Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element)).)

Gossip Protocol (All the nodes talk to each other peer-wise, There is no global state, No single point of coordinator, If one node goes down and there is a Quorum load for that node is shared among others, • Self-managing system, If a new node joins, load is also distributed)

Key-Value NoSQL (Focus on scaling to huge amounts of data, Designed to handle massive load, Data model: (global) collection of Key-value pairs, Dynamo ring partitioning and replication, Pros: very fast, very scalable, simple data model, eventual consistency, fault-tolerance, Cons: – Can’t model more complex data structures such as objects)

Document-Based (Can model more complex objects, Data model: collection of documents)

Column-Based (allow key-value pairs to be stored (and retrieved on key) in a massively parallel system, storing principle: big hashed distributed tables, properties: partitioning (horizontally and/or vertically), high availability etc. completely transparent to application, One column family can have variable numbers of columns, Cells within a column family are sorted “physically”)

RDBMS: must fetch data from several places on disk and glue together
• Column-based NOSQL: only fetch column families of those columns that are required by a query (all columns in a column family are stored together on the disk, so multiple rows can be retrieved in one read operation à data locality)

Graph Stores (Focus on modeling the structure of data (interconnectivity), Scale vertically, no clustering.)

NO SQL to use
Key / Value Based: quickly storing basic/processed information. extremely performant, efficient and usually easily scalable. Caching, Queueing, Keeping live information, Distributing information / tasks
Column Based: used when simple key / value pairs are not enough. Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns. Row-oriented organizations are more efficient when many columns of a single row are required at the same time. Keeping unstructured, non-volatile information, Scaling
Document Based: overcome the constraints of one or two levels of key / value nesting. any complex and arbitrary structure can form a document. need to retrieve entire document even for getting/updating a single value, hurting performance. Nested information, JavaScript friendly due to JSON format.
Graph Based: These databases are commonly used by applications whereby clear boundaries for connections are necessary to establish. They use tree-like structures (i.e. graphs) with nodes and edges connecting each other through relations. Handling complex relational information, Modelling and handling classifications

ACID (Strong consistency, Isolation, Focus on “commit”, Nested transactions, Availability, Conservative (pessimistic), Difficult evolution (e. g. schema))

BASE (Weak consistency – stale data OK, Availability first, Best effort, Approximate answers OK, Aggressive (optimistic), Simpler, Faster, Easier evolution)

Chubby System (A chubby cell consists of a small set of servers (replicas), A master is elected from the replicas via a consensus protocol, Client talks to the master via chubby library, Replicas maintain copies of a simple database, Clients send read/write requests only to the master, For a write: • The master propagates it to replicas via the consensus protocol • Replies after the write reaches a majority of replicas, For a read: • The master satisfies the read alone, If a replica fails and does not recover for a long time (a few hours) • A fresh machine is selected to be a new replica, replacing the failed one • It updates the DNS • Obtains a recent copy of the database • The current master polls DNS periodically to discover new replicas)

UNIX-like File System Interface (Chubby supports a strict tree of files and directories)

File Handles (Has check digits encoded in it; cannot be forged, Sequence number, Mode information at open time)

Locks and Sequences (Locks: advisory rather than mandatory)

Caching (Strict consistency: easy to understand, Lease based, master will invalidate cached copies upon a write request, Write-through caches)

Sessions (A client sends keep-alive requests to a master, A master responds by a keep-alive response, Immediately after getting the keep-alive response, the client sends another request for extension, The master will block keep-alives until close the expiration of a session, Extension is default to 12s)

Scaling (Proxies: • Handle KeepAlive and read requests, pass write requests to the master • Reduce traffic but reduce availability, Partitioning: partition name space • A master handles nodes with hash(name) mod N == id • Limited cross-partition messages)

Dynamo


rXF0az0x2hTrzr8vQYpNy3WTnWThCtxgLMe4dtH6


SSTable (block index (stored at the end of the SSTable) is used to locate blocks, lookup can be performed with a single disk seek, Optionally, an SSTable can be completely mapped into memory)