Database Fundamentals: Transactions, Concurrency, and Indexing
Understanding Database Fundamentals
Key Database Concepts
- Redundancy in a Schema: Unnecessary duplication of data.
- Cascading Rollback: Occurs when the failure of one transaction causes others to roll back.
- External Storage in DBMS: Refers to storage devices like hard drives or SSDs used to store large data files.
- Heap File: An unordered collection of records stored in blocks.
- Schema Refinement: Removes anomalies and redundancy through normalization.
ACID Properties for Reliable Transactions
ACID stands for Atomicity, Consistency, Isolation, and Durability—core properties ensuring reliable database transactions.
- Atomicity: Ensures a transaction is all-or-nothing. Like a vending machine: you either get the snack or your money back.
- Consistency: Ensures the database moves from one valid state to another. Like following a recipe correctly resulting in a valid dish.
- Isolation: Ensures concurrent transactions don’t interfere. Similar to multiple customers at ATMs—each one’s transaction appears independent.
- Durability: Guarantees that once a transaction commits, its results persist despite crashes. Like saving a file permanently.
Together, ACID ensures database operations are predictable and robust. For instance, in a bank transfer, money deducted from one account and credited to another must happen together (atomicity), maintain correct balances (consistency), not interfere with other operations (isolation), and remain committed even after a power loss (durability). These properties are essential for applications requiring high integrity like banking, reservations, and inventory systems. Without ACID, transactions could leave databases in inconsistent states, risking data loss, duplication, or corruption during system failures or crashes.
Serializability and Concurrency Control
Serializability ensures concurrent transactions yield results equivalent to some serial order. A precedence graph tests this. Each node represents a transaction; a directed edge from Ti to Tj means Ti’s action (read/write) on a data item conflicts and precedes Tj’s. If the graph has no cycles, the schedule is conflict-serializable.
For example, if T1 writes A, and T2 reads A, this leads to an edge T1→T2. If T2 then writes A before T1 reads it, a T2→T1 edge forms, creating a cycle. This means the schedule isn’t serializable. If no such cycle exists, transactions can be reordered to match a serial schedule. This method is effective for detecting correctness of concurrent transaction execution in databases. It prevents anomalies like lost updates or inconsistent reads. Precedence graphs help DBMS ensure transactions maintain data integrity under concurrency by identifying potential conflicts and verifying safe execution orders. Precedence graphs are essential in concurrency control, helping detect cycles early to maintain correctness and avoid transaction conflicts or anomalies.
Database Indexing: Hash-based vs. Tree-based
Hash-based indexing uses a hash function to directly compute the location of a record, offering very fast exact-match lookups. It’s ideal for queries like “find customer with ID 123.” However, it’s inefficient for range queries (e.g., IDs between 100–200) since data isn’t stored in order. Collisions—multiple keys mapping to the same location—must be handled using methods like chaining or linear probing, which can degrade performance. Hash indexes are simpler and faster in limited use-cases but can become inefficient with skewed data or frequent updates.
In contrast, tree-based indexing (especially B+ Trees) maintains sorted data, supports exact and range queries, and handles dynamic inserts/deletes efficiently. A B+ tree node contains keys and pointers (internal) or data pointers (leaf). It allows in-order traversal, which is valuable for sorted outputs or range filters. Tree-based indexes have higher overhead but scale better with dynamic data. For general-purpose relational databases, tree-based indexing is preferred due to its versatility.
Regarding index types:
- Dense Index: Has an index entry for every record.
- Sparse Index: Has entries for some records only.
Tree-based indexes outperform hash indexes in dynamic and range-based queries, making them ideal for general-purpose, high-concurrency database environments.