Database Systems: Concepts and Technologies

Database Transactions and Concurrency Control

Transactions

In a database, a transaction is a logical unit of work: a sequence of read and write operations that the system treats as a single unit. Relational databases typically guarantee that transactions are atomic, consistent, isolated, and durable (ACID).

Concurrency Control

Without concurrency control, interleaved transactions can cause at least two problems:

  • Lost update: Two transactions read the same old value of a data item and each writes a new value computed from it. The later write overwrites the earlier one, so one update is lost.
  • Inconsistent retrievals: A retrieval transaction reads some values before and others after an ongoing update transaction changes them, so its result does not correspond to any consistent database state.
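
The lost update problem can be illustrated with a small simulation. This is a minimal sketch, not a real DBMS: a dictionary stands in for a stored record, and the two "transactions" are interleaved by hand. The names `lost_update_schedule` and `serial_schedule` are hypothetical.

```python
# Hand-written interleaving of two transactions on one shared value.
# No real DBMS is involved; the dict stands in for a stored record.

def lost_update_schedule(store):
    # T1 and T2 both read the old value before either one writes.
    t1_read = store["balance"]
    t2_read = store["balance"]
    # T1 adds 50 and writes; T2 adds 20 and overwrites T1's result.
    store["balance"] = t1_read + 50
    store["balance"] = t2_read + 20
    return store["balance"]

def serial_schedule(store):
    # T1 runs to completion before T2 reads anything.
    store["balance"] = store["balance"] + 50
    store["balance"] = store["balance"] + 20
    return store["balance"]

interleaved = lost_update_schedule({"balance": 100})  # 120: T1's +50 is lost
serial = serial_schedule({"balance": 100})            # 170: both updates kept
```

A concurrency control mechanism (for example, locking) would prevent the first schedule by making T2 wait until T1 has finished.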

ACID Properties

  • Atomicity: Either all operations of a transaction take effect (commit) or none of them do (abort), so a partially executed transaction leaves no trace.
  • Consistency: Transactions maintain database consistency by adhering to predefined rules and constraints.
  • Isolation: Concurrent transactions do not interfere with each other; the outcome is as if they had been executed serially.
  • Durability: Once a transaction commits, its effects are permanent and survive system failures.
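
Atomicity and consistency can be observed with Python's built-in `sqlite3` module. The following is a minimal sketch under assumptions: the `account` table, the `CHECK` constraint, and the `transfer` helper are hypothetical and chosen only for illustration. The transfer credits one account and then debits the other; if the debit violates the constraint, the whole transaction is rolled back, including the credit that already ran.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; returns True on commit, False on abort."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # commits
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK, aborts
balances = dict(conn.execute("SELECT name, balance FROM account"))
# balances == {"alice": 70, "bob": 80}: the aborted credit to bob was undone
```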

Database Scalability and Replication

Approaches to Scalability

  • Scale-up: Increasing the resources of a single node, such as memory and CPU cores. This approach can be expensive, and the node remains a single point of failure.
  • Scale-out: Adding more nodes to the system, which is more cost-effective and allows for better fault tolerance through replication.

Techniques for Scaling

  • Partitioning: Splitting data into smaller fragments or partitions that can be handled by different nodes or cores.
  • Replication: Storing copies of data on multiple nodes to improve read scalability and fault tolerance.
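
As a rough illustration of how the two techniques combine, the sketch below assigns keys to partitions by hashing and then places each partition on several nodes. Hash partitioning is only one possible scheme (range partitioning is another), and the function names and the "next nodes in order" placement rule are assumptions made for this example.

```python
import hashlib

def partition_of(key, num_partitions):
    """Map a key to a partition using a stable hash (hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def replica_nodes(partition, num_nodes, replication_factor):
    """Place a partition on replication_factor consecutive nodes.

    Assumes replication_factor <= num_nodes so the nodes are distinct.
    """
    return [(partition + i) % num_nodes for i in range(replication_factor)]

p = partition_of("user:42", 4)       # some partition in 0..3
nodes = replica_nodes(3, 4, 2)       # partition 3 lives on nodes 3 and 0
```

Reads for a key can then be served by any node holding its partition, which is where the read scalability of replication comes from.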

Database Replication

Database replication aims to increase data availability and efficiency. It involves maintaining multiple copies of data across different nodes.

Consistency Models

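  • Strong consistency: Every read returns the most recent committed write, so all nodes appear to have the same data at any given time.
  • Weak consistency: There is no guarantee that all nodes have the same data at all times. Under eventual consistency, a common weak model, replicas converge to a consistent state once updates stop arriving.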

Replication Protocols

  • Eager replication: Updates are propagated to all replicas synchronously, as part of the transaction and before it commits.
  • Lazy replication: Updates are propagated asynchronously, after the transaction commits.
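
The difference between the two protocols is when replicas see an update relative to the commit. The sketch below is a simplified, single-process model under assumptions: the `Primary` and `Replica` classes are hypothetical, there is one primary copy, and failures and network delays are ignored.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class Primary:
    def __init__(self, replicas, eager):
        self.data = {}
        self.replicas = replicas
        self.eager = eager
        self.pending = deque()  # committed updates not yet shipped (lazy only)

    def commit(self, key, value):
        if self.eager:
            # Eager: every replica applies the update before commit returns.
            for replica in self.replicas:
                replica.data[key] = value
            self.data[key] = value
        else:
            # Lazy: commit locally and ship the update later.
            self.data[key] = value
            self.pending.append((key, value))

    def propagate(self):
        """Ship queued updates to all replicas (lazy mode)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value

eager_replica = Replica()
Primary([eager_replica], eager=True).commit("x", 1)
eager_view = dict(eager_replica.data)    # {"x": 1} right after commit

lazy_replica = Replica()
lazy_primary = Primary([lazy_replica], eager=False)
lazy_primary.commit("x", 1)
stale_view = dict(lazy_replica.data)     # {} until propagation runs
lazy_primary.propagate()
fresh_view = dict(lazy_replica.data)     # {"x": 1}
```

The window in which `stale_view` is observable is why lazy replication is usually paired with a weaker consistency model, while eager replication pays for strong consistency with slower commits.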

NoSQL Technologies

NoSQL databases offer alternatives to traditional relational databases, providing different data models and query languages.

Types of NoSQL Databases

  • Graph databases: Optimized for storing and querying graph-structured data.
  • Document-oriented databases: Store data in flexible, semi-structured documents, often in JSON format.
  • Key-value databases: Provide simple key-value access to data.
  • Text-oriented databases: Designed for text indexing and search.
  • In-memory databases: Store data in memory for fast access.

Data Streaming and CEP

Data streaming and complex event processing (CEP) involve processing data in real time as it arrives, without storing it first.

Stateful Operators

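  • Aggregate: Computes an aggregate function, such as sum or average, over a group or window of tuples, and therefore keeps a running result as state.
  • Equijoin: Matches tuples from two streams based on an equality predicate, and therefore remembers earlier tuples from each stream.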
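
Both operators are stateful because they must remember something about earlier tuples. The sketch below shows one possible form of each: a sum over tumbling (fixed, non-overlapping) time windows and a symmetric hash equijoin. The class names are hypothetical, timestamps are assumed to arrive in order, and the join keeps all past tuples, which a real engine would bound with a window.

```python
from collections import defaultdict

class TumblingSum:
    """Sum per fixed-size time window; state is the open window's running sum."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.current_window = None
        self.running_sum = 0

    def on_tuple(self, timestamp, value):
        window = timestamp // self.window_size
        emitted = None
        if self.current_window is not None and window != self.current_window:
            # The previous window is complete: emit its result and reset.
            emitted = (self.current_window, self.running_sum)
            self.running_sum = 0
        self.current_window = window
        self.running_sum += value
        return emitted

class HashEquijoin:
    """Join two streams on key equality; state is one hash table per stream."""
    def __init__(self):
        self.left = defaultdict(list)
        self.right = defaultdict(list)

    def on_left(self, key, payload):
        self.left[key].append(payload)
        return [(payload, r) for r in self.right.get(key, [])]

    def on_right(self, key, payload):
        self.right[key].append(payload)
        return [(l, payload) for l in self.left.get(key, [])]

agg = TumblingSum(window_size=10)
outputs = [agg.on_tuple(t, v) for t, v in [(1, 5), (4, 7), (12, 3)]]
# outputs == [None, None, (0, 12)]: window 0 closes when timestamp 12 arrives

join = HashEquijoin()
first = join.on_left("k1", "a")     # [] (no matching right tuple yet)
matches = join.on_right("k1", "b")  # [("a", "b")]
```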

Bigtable and HBase

Bigtable and HBase are distributed, scalable NoSQL databases designed for large datasets.

HBase Architecture

  • Regions: Units of data distribution that can be served by different region servers.
  • HBase Master: Manages cluster configuration and metadata, including the assignment of regions to region servers.
  • Region Servers: Serve read and write requests for the regions assigned to them.
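
Since each region covers a contiguous, sorted range of row keys, finding the region server for a row amounts to a search over region start keys. The sketch below is a simplified model, not the HBase client API: the start keys, the server names, and the `locate` function are hypothetical, and a real client obtains this mapping from HBase's metadata rather than from an in-memory list.

```python
from bisect import bisect_right

# Hypothetical region table: region i covers row keys from
# region_start_keys[i] up to (but excluding) region_start_keys[i + 1].
region_start_keys = ["", "g", "n", "t"]
region_to_server = ["rs1", "rs2", "rs1", "rs3"]

def locate(row_key):
    """Return (region index, region server) responsible for row_key."""
    index = bisect_right(region_start_keys, row_key) - 1
    return index, region_to_server[index]

apple = locate("apple")  # (0, "rs1")
grape = locate("g")      # (1, "rs2"): a start key belongs to its own region
zebra = locate("zebra")  # (3, "rs3")
```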

Dynamo

Dynamo is a highly available and scalable distributed data store developed by Amazon.

Dynamo Architecture

  • Data partitioning: Uses consistent hashing to distribute data across nodes.
  • Replication: Replicates each data item on multiple nodes for fault tolerance.
  • Quorum systems: Ensure data consistency by requiring read and write quorums to overlap.
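
The three ideas fit together as follows: a key is hashed onto a ring, the next N nodes clockwise store its replicas, and reads and writes each contact a subset of those N. The sketch below is a minimal illustration under assumptions: the `Ring` class and function names are hypothetical, SHA-256 is used purely as a convenient stable hash, and virtual nodes, failure handling, and versioning are omitted.

```python
import hashlib
from bisect import bisect_right

def ring_position(name):
    """Hash a node name or key to a position on the ring."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")

class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_position(n), n) for n in nodes)
        self.positions = [position for position, _ in self.points]

    def preference_list(self, key, n):
        """Return the first n nodes clockwise from the key's position.

        Assumes n <= number of nodes so the returned nodes are distinct.
        """
        start = bisect_right(self.positions, ring_position(key)) % len(self.points)
        return [self.points[(start + i) % len(self.points)][1] for i in range(n)]

def quorums_overlap(n, r, w):
    """With N replicas, reading R and writing W copies overlap when R + W > N."""
    return r + w > n

ring = Ring(["A", "B", "C", "D"])
replicas = ring.preference_list("cart:42", 3)  # three distinct nodes
strict = quorums_overlap(3, 2, 2)   # True: every read set meets every write set
relaxed = quorums_overlap(3, 1, 1)  # False: a read may miss the latest write
```

With consistent hashing, adding or removing a node only moves the keys adjacent to it on the ring, rather than reshuffling every key as a plain modulo scheme would.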