Database Management Systems: Core Concepts & Techniques
Key Database Management System Concepts
- Transaction: A sequence of operations performed as a single logical unit of work, ensuring atomicity, consistency, isolation, and durability (ACID).
- Primary Index: An index built on the primary key of a data file that is stored in primary-key order; its entries are ordered.
- Secondary Index: An index created on a non-primary-key attribute; the underlying data file is generally not ordered on that attribute.
- Functional Dependency: Attribute Y is functionally dependent on X if each value of X is associated with exactly one value of Y (X → Y).
- Concurrency Control: Ensures that database transactions are executed concurrently without violating data integrity.
- Hashing: A technique that maps a given key to a storage location using a hash function, e.g., hashing a student ID to locate its record (see the sketch after this list).
- Locks: Needed to prevent data inconsistency due to concurrent access and to maintain isolation in transactions.
- Multilevel Indexing: Uses a hierarchy of indexes to reduce access time, especially in large databases.
- Deadlock in DBMS: A situation where two or more transactions wait indefinitely for resources held by each other.
- Timestamp Ordering: A concurrency control protocol that assigns timestamps to transactions to enforce serializability.
- Decomposition: The process of breaking a relation into two or more relations to eliminate anomalies and maintain normalization.
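As a concrete illustration of the hashing entry above, the sketch below hashes a hypothetical student ID into one of a fixed number of buckets and looks it up again; the bucket count, record layout, and sample ID are assumptions made purely for illustration.

```python
# Minimal sketch: hashing a student ID to locate its record.
# The bucket count and sample record are illustrative assumptions.

NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]  # each bucket holds (key, record) pairs

def bucket_for(student_id: int) -> int:
    """Map a key to a bucket index with a simple modulo hash function."""
    return student_id % NUM_BUCKETS

def insert(student_id: int, record: dict) -> None:
    buckets[bucket_for(student_id)].append((student_id, record))

def lookup(student_id: int):
    """Search only the one bucket the hash function points to."""
    for key, record in buckets[bucket_for(student_id)]:
        if key == student_id:
            return record
    return None

insert(20231047, {"name": "Asha", "dept": "CSE"})
print(lookup(20231047))   # {'name': 'Asha', 'dept': 'CSE'}
```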
Concurrency Control with Locking
Concurrency control with the locking method ensures that multiple transactions can occur simultaneously without interfering with each other, thereby preserving data consistency. The Two-Phase Locking (2PL) protocol is commonly used, involving two distinct phases:
- Growing Phase: Locks are acquired, and no locks are released.
- Shrinking Phase: Locks are released, and no new locks are acquired.
There are two types of locks: shared (S) and exclusive (X). Shared locks allow multiple transactions to read a resource concurrently, while an exclusive lock allows only one transaction to write to a resource. If two transactions try to access the same data item with conflicting operations, one must wait. This method avoids issues such as lost updates, access to temporarily inconsistent data, and dirty reads of uncommitted data.
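The following minimal sketch illustrates the shared/exclusive compatibility rule just described: several transactions may hold an S lock on the same item, but an X request is granted only when no other transaction holds any lock on it. The class and method names are assumptions, not a real DBMS lock manager.

```python
# Minimal sketch of a lock table enforcing shared (S) / exclusive (X) compatibility.
# Names and structure are illustrative assumptions.

class LockTable:
    def __init__(self):
        # item -> {"mode": "S" or "X", "holders": set of transaction ids}
        self.locks = {}

    def acquire(self, txn: str, item: str, mode: str) -> bool:
        """Return True if the lock is granted, False if txn must wait."""
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)        # many readers may share the item
            return True
        if entry["holders"] == {txn}:        # lock upgrade by the sole holder
            entry["mode"] = "X" if mode == "X" else entry["mode"]
            return True
        return False                         # conflicting request: caller waits

    def release(self, txn: str, item: str) -> None:
        entry = self.locks.get(item)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))  # True  - first reader
print(lt.acquire("T2", "A", "S"))  # True  - shared locks are compatible
print(lt.acquire("T3", "A", "X"))  # False - writer must wait for the readers
```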
However, 2PL can lead to deadlocks, where transactions wait indefinitely for each other's resources, so deadlock detection and prevention techniques are used alongside it. Lock granularity, such as page-level or record-level locking, also plays a role in performance: finer granularity offers better concurrency but increases locking overhead. Strict 2PL, a variant that holds all exclusive locks until the transaction commits or aborts, additionally guarantees recoverability and avoids cascading rollbacks, making it suitable for most practical database systems that require a strong consistency model.
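One common detection technique builds a wait-for graph, with an edge from a waiting transaction to the transaction holding the lock it needs, and checks for a cycle. The sketch below, using hypothetical transaction names, performs that check with a depth-first search.

```python
# Minimal sketch: deadlock detection via a cycle search in a wait-for graph.
# Edge T1 -> T2 means "T1 is waiting for a lock held by T2".

def has_deadlock(wait_for: dict) -> bool:
    visiting, done = set(), set()

    def dfs(txn) -> bool:
        if txn in visiting:          # revisited a node on the current path: cycle
            return True
        if txn in done:
            return False
        visiting.add(txn)
        for held_by in wait_for.get(txn, ()):
            if dfs(held_by):
                return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(dfs(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: a classic two-transaction deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False
```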
Log-Based Recovery Techniques
Log-based recovery techniques ensure database reliability and integrity even in the event of system crashes. The primary concept is the use of a log file in which every transaction's operations are recorded before the corresponding changes are written to the database, a discipline known as Write-Ahead Logging (WAL). The log contains details such as transaction start, data item changes (with old and new values), and transaction commit or abort status.
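A minimal sketch of the write-ahead discipline, assuming a simple dictionary-based record format: the log record containing the old and new values is appended before the data item itself is changed.

```python
# Minimal sketch of write-ahead logging: the log record is appended before
# the data item is modified. The record layout is an illustrative assumption.

log = []          # stand-in for the stable log file
database = {}     # stand-in for the data pages

def log_write(txn: str, item: str, new_value):
    old_value = database.get(item)
    # 1. Write the log record first (the WAL rule) ...
    log.append({"txn": txn, "item": item, "old": old_value, "new": new_value})
    # 2. ... and only then apply the change to the data item.
    database[item] = new_value

log.append({"txn": "T1", "type": "start"})
log_write("T1", "A", 50)
log.append({"txn": "T1", "type": "commit"})
```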
There are two key operations in recovery: undo and redo. If a transaction fails before committing, all of its changes are rolled back using the old values recorded in the log (undo). If it had committed, its changes are reapplied from the log (redo) to ensure durability. The recovery process typically includes Analysis, Redo, and Undo phases. The Analysis phase identifies which transactions were active at the time of the crash. The Redo phase reapplies the operations of committed transactions, and the Undo phase rolls back uncommitted transactions.
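Assuming the simple log format from the previous sketch, the following pass illustrates the idea of redo and undo: writes of committed transactions are reapplied from the log, while writes of transactions with no commit record are rolled back to their old values. Real recovery managers are considerably more involved.

```python
# Minimal sketch of log-based recovery over the log format used above.
# Analysis: find committed transactions; Redo: reapply their writes;
# Undo: restore old values for transactions that never committed.

def recover(log, database):
    committed = {r["txn"] for r in log if r.get("type") == "commit"}

    # Redo phase: reapply every write of a committed transaction, in log order.
    for rec in log:
        if "item" in rec and rec["txn"] in committed:
            database[rec["item"]] = rec["new"]

    # Undo phase: roll back uncommitted writes, scanning the log backwards.
    for rec in reversed(log):
        if "item" in rec and rec["txn"] not in committed:
            database[rec["item"]] = rec["old"]

crashed_log = [
    {"txn": "T1", "type": "start"},
    {"txn": "T1", "item": "A", "old": 10, "new": 50},
    {"txn": "T1", "type": "commit"},
    {"txn": "T2", "type": "start"},
    {"txn": "T2", "item": "B", "old": 7, "new": 99},   # T2 never committed
]
db = {"A": 10, "B": 99}   # A missed T1's committed write; B holds T2's dirty write
recover(crashed_log, db)
print(db)                 # {'A': 50, 'B': 7}
```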
Checkpoints are used to limit the amount of log that must be processed during recovery, marking a consistent state of the database. Advanced recovery models like ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) enhance performance using fine-grained locking and partial rollbacks. Log-based recovery guarantees the ACID properties, especially atomicity and durability, making it vital for reliable DBMS operation in critical applications.
Indexed Sequential Access Method (ISAM)
The Indexed Sequential Access Method (ISAM) is a file organization technique that supports both sequential and direct access to records. In ISAM, records are stored in sorted order on disk, and an index is maintained for faster access. The index file contains entries pointing to the data blocks that hold the actual records. Searching for a record involves first locating the correct index entry and then reading the corresponding data block, which significantly reduces search time compared to a linear scan.
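A minimal sketch of that lookup path, assuming a small in-memory stand-in for the file: the index stores the lowest key of each data block, a binary search over the index selects the block, and only that block is scanned. The block contents are illustrative.

```python
# Minimal ISAM-style lookup: binary-search the index for the right block,
# then scan that block. Block contents and sizes are illustrative assumptions.
import bisect

# Data blocks hold records sorted by key; the index stores each block's lowest key.
blocks = [
    [(5, "rec5"), (12, "rec12"), (19, "rec19")],
    [(23, "rec23"), (31, "rec31"), (40, "rec40")],
    [(47, "rec47"), (55, "rec55")],
]
index_keys = [block[0][0] for block in blocks]   # [5, 23, 47]

def isam_search(key):
    pos = bisect.bisect_right(index_keys, key) - 1   # index entry for the block
    if pos < 0:
        return None
    for k, rec in blocks[pos]:                       # scan only that block
        if k == key:
            return rec
    return None

print(isam_search(31))   # rec31
print(isam_search(20))   # None (key would fall in the first block's range)
```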
ISAM supports a multi-level index structure, where the top-level index points to lower-level index blocks, enabling quick access to large datasets. To handle insertions and deletions, ISAM maintains overflow areas linked to the primary data blocks. When a data block is full, new records are added to these overflow blocks. Over time, however, frequent insertions can lead to long overflow chains, degrading performance, so periodic reorganization of the file is necessary. ISAM is particularly effective for applications where the data is mostly read-only or changes infrequently, such as report generation or historical data analysis. Although modern systems favor dynamic methods like B+ trees, ISAM remains foundational for understanding indexing and file organization in databases.
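The overflow mechanism might be sketched as follows, again with an illustrative block capacity and in-memory blocks: a record that belongs in a full primary block is appended to that block's overflow chain, and a periodic reorganization merges the chains back into sorted blocks (a real reorganization would also re-split blocks and rebuild the index).

```python
# Minimal sketch of ISAM insertion with an overflow chain per primary block.
# BLOCK_CAPACITY, the blocks, and the index are illustrative assumptions.
import bisect

BLOCK_CAPACITY = 3
blocks = [[(5, "r5"), (12, "r12"), (19, "r19")],     # already full
          [(23, "r23"), (31, "r31")]]                # still has room
index_keys = [b[0][0] for b in blocks]               # lowest key of each block
overflow = {i: [] for i in range(len(blocks))}       # overflow chain per block

def isam_insert(key, rec):
    pos = max(bisect.bisect_right(index_keys, key) - 1, 0)
    if len(blocks[pos]) < BLOCK_CAPACITY:
        blocks[pos].append((key, rec))
        blocks[pos].sort()                           # keep the primary block ordered
    else:
        overflow[pos].append((key, rec))             # block full: chain an overflow record

def reorganize():
    """Periodic reorganization merges overflow records back into sorted blocks.
    A real reorganization would also re-split blocks and rebuild the index."""
    for pos in overflow:
        blocks[pos] = sorted(blocks[pos] + overflow[pos])
        overflow[pos] = []

isam_insert(15, "r15")      # first block is full, so the record goes to overflow
isam_insert(27, "r27")      # second block has space, inserted in sorted order
print(overflow[0])          # [(15, 'r15')]
print(blocks[1])            # [(23, 'r23'), (27, 'r27'), (31, 'r31')]
```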
B+ Trees for Database Indexing
B+ Trees are dynamic, balanced tree structures widely used in databases for indexing. They support efficient insertion and deletion operations while maintaining sorted data for fast range and exact match queries. In B+ Trees, all data records are stored in leaf nodes, and internal nodes contain only keys used for traversal.
Insertion Operations in B+ Trees
Insertion in a B+ Tree involves:
- Finding the appropriate leaf node where the new key should go.
- If the node has space, the key is inserted in sorted order.
- If the node is full, it is split into two; for a leaf split, the smallest key of the new right node is copied into the parent, while for an internal-node split the middle key is moved up to the parent.
- This process may recursively propagate up the tree, possibly increasing its height (a leaf-level sketch follows this list).
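The sketch below shows only the leaf-level part of this procedure: the key is placed in sorted order, and if the leaf overflows it is split, with the smallest key of the new right leaf becoming the separator that would be copied into the parent. The leaf capacity is an assumption, and propagation into internal nodes is omitted.

```python
# Minimal sketch of a B+ Tree leaf insertion with a split (capacity is an
# assumption; propagation into parent/internal nodes is omitted).
import bisect

LEAF_CAPACITY = 3   # maximum keys per leaf in this illustration

def insert_into_leaf(leaf, key):
    """Insert key in sorted order; split if the leaf overflows.

    Returns (left_leaf, right_leaf, separator) after a split,
    or (leaf, None, None) if no split was needed."""
    bisect.insort(leaf, key)
    if len(leaf) <= LEAF_CAPACITY:
        return leaf, None, None
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    separator = right[0]        # smallest key of the right leaf is copied upward
    return left, right, separator

leaf = [10, 20, 30]
left, right, sep = insert_into_leaf(leaf, 25)
print(left, right, sep)        # [10, 20] [25, 30] 25
```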
Deletion Operations in B+ Trees
Deletion requires:
- Locating the key in the leaf node and removing it.
- If the node still holds at least the minimum required number of keys, no further action is needed.
- However, if it underflows, it may borrow a key from a sibling or merge with one, adjusting the parent node accordingly.
- Like insertion, this can propagate upwards if necessary (a leaf-level sketch follows this list).
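A complementary leaf-level sketch of deletion: after the key is removed, an underflowing leaf first tries to borrow from its right sibling and otherwise merges with it; updating the separator keys in the parent is again omitted, and the minimum-occupancy constant is an assumption.

```python
# Minimal sketch of B+ Tree leaf deletion with borrow-or-merge handling
# (minimum occupancy is an assumption; parent/separator updates are omitted).

MIN_KEYS = 2   # minimum keys a leaf must keep in this illustration

def delete_from_leaf(leaf, sibling, key):
    """Delete key from leaf, then repair an underflow using the right sibling.

    Returns (leaf, sibling), where sibling is None after a merge."""
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS:
        return leaf, sibling                 # still at or above minimum occupancy
    if sibling and len(sibling) > MIN_KEYS:
        leaf.append(sibling.pop(0))          # borrow the smallest key from the sibling
        return leaf, sibling
    return leaf + (sibling or []), None      # otherwise merge the two leaves

print(delete_from_leaf([10, 20], [30, 40, 50], 20))   # ([10, 30], [40, 50])  borrow
print(delete_from_leaf([10, 20], [30, 40], 20))       # ([10, 30, 40], None)  merge
```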
These operations ensure the B+ Tree remains balanced, keeping search time logarithmic relative to the number of records. This makes B+ Trees suitable for high-performance indexing in databases.