A Comprehensive Guide to Database Management Systems (DBMS)
What is a Database Management System (DBMS)?
A DBMS is a software system designed to manage and organize data in a structured manner. It allows users to:
- Create, modify, and query a database
- Manage security and access controls
Features of DBMS:
- Data modeling
- Data storage and retrieval
- Concurrency control
- Data integrity and security
- Backup and recovery
Data Manipulation Language (DML):
Used to store, modify, retrieve, delete, and update data in a database. Common DML statements include:
SELECTINSERTUPDATEDELETECALL
Data Definition Language (DDL):
Deals with database schemas and descriptions of how data should reside in the database. Common DDL statements include:
CREATEALTERDROPTRUNCATECOMMENTRENAME
Data Control Language (DCL):
Acts as an access specifier to the database. Common DCL statements include:
GRANTREVOKE
Transactional Control Language (TCL):
Manages all types of transactional data and transactions. Common TCL statements include:
ROLLBACKCOMMITSAVEPOINT
Query by Example (QBE)
QBE is a query language used in relational databases that allows users to search for information by providing an example of the data they want to access through a simple user interface.
Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. It is responsible for interacting with the file manager.
Entity-Relationship (ER) Diagram Symbols
- Rectangle: Entities
- Ellipse: Attributes
- Diamond: Relationships among Entities
- Line: Attributes to Entities and Entity Sets with Other Relationship Types
- Double Ellipse: Multi-Valued Attributes
- Double Rectangle: Weak Entity
Generalization
Generalization is a bottom-up approach where two or more lower-level entities combine to form a higher-level entity if they share common attributes. This process can continue to create even higher-level entities.
Relational Databases and RDBMS
A relational database organizes data into tables, allowing for relationships between different pieces of data. An RDBMS is a program that manages relational databases. Most RDBMS use SQL to access the database. Popular RDBMS include:
- MySQL
- SQL Server
- MS Access
- Oracle
Database Schema
A database schema is the logical structure of the entire database. It defines data organization, relationships, and constraints.
Relational Algebra
Relational algebra is a procedural language with operators that can be performed on relations.
Types of Relational Algebra Operators:
- UNION (∪): Combines all tuples from two relations, removing duplicates. Syntax: A UNION B (or) A ∪ B
- INTERSECTION (∩): Returns common tuples from two relations. Syntax: A INTERSECT B (or) A ∩ B
- DIFFERENCE (−): Returns tuples present in the first relation but not the second. Syntax: A MINUS B (OR) A − B
- CARTESIAN PRODUCT (×): Combines each tuple of the first relation with every tuple of the second. Syntax: A TIMES B (OR) A × B
- SELECTION (σ): Selects tuples based on a condition. Syntax: σ condition (relation name)
- PROJECTION (π): Selects specified attributes from a relation. Syntax: π(col1,col2…) Relation Name
- JOIN: Combines two or more relations based on a join condition.
- DIVIDE (÷): Divides tuples from one relation by another. Syntax: A DIVIDE B (OR) A ÷ B
- RENAME (ρ): Renames a relation. Syntax: ρ(OLD RELATION, NEW RELATION)
Database Keys
- Candidate Key: Minimal set of attributes uniquely identifying a tuple. Example: STUD_NO for STUDENT relation.
- Primary Key: Unique, non-null key. Example: STUD_NO in STUDENT(STUD_NO, SNAME, ADDRESS, PHONE).
- Super Key: Set of attributes uniquely identifying a tuple. Example: STUD_NO+PHONE for STUDENT relation.
- Alternate Key: Candidate key other than the primary key. Example: PHONE in STUDENT relation (if both STUD_NO and PHONE are candidate keys).
- Foreign Key: Attribute taking values present in another attribute, referencing the latter.
- Composite Key: Combination of attributes acting as a primary key when no single primary key exists. Example: FULLNAME + DOB for accessing student details.
JOIN Operations in SQL
- INNER JOIN: Returns rows with matching values in both tables.
- LEFT JOIN: Returns all rows from the left table and matching rows from the right table.
- RIGHT JOIN: Returns all rows from the right table and matching rows from the left table.
- FULL JOIN: Returns all rows from both tables, regardless of matches.
Functional Dependency
Functional dependency in a relational database is a constraint between two sets of attributes, where the value of one set determines the value of the other.
Types of Functional Dependencies:
- Full Functional Dependency: No attribute can be removed from the determinant set without breaking the dependency.
- Partial Functional Dependency: The dependent attribute is determined by a subset of the determinant set.
- Transitive Functional Dependency: A non-key attribute is dependent on another non-key attribute, which is dependent on a key attribute.
- Multi-valued Dependency (MVD): Two independent multi-valued attributes are dependent on the same attribute or set of attributes.
Database Anomalies
Database anomalies are problems caused by redundancy in relations. Types of anomalies include:
- Insertion Anomalies
- Deletion Anomalies
- Modification Anomalies
Normalization
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization Forms:
- First Normal Form (1NF): All attributes contain atomic values (single-valued).
- Second Normal Form (2NF): 1NF + every non-primary-key attribute is fully functionally dependent on the primary key.
- Third Normal Form (3NF): 2NF + no transitive dependency for non-prime attributes.
- Boyce-Codd Normal Form (BCNF): 3NF + every determinant is a candidate key.
- Fourth Normal Form (4NF): BCNF + no multi-valued dependencies.
- Fifth Normal Form (5NF): 4NF + cannot be further decomposed without loss of information.
Objectives of Normalization:
- Reduce data redundancy
- Minimize data anomalies
- Ensure data integrity
Disadvantages of Normalization:
- Increased complexity
- Performance overhead
- Data modification overhead
- Denormalization requirements
- Storage overhead
SQL Triggers
SQL triggers are stored procedures automatically executed in response to specific database events (e.g., INSERT, UPDATE, DELETE).
Types of Triggers:
- BEFORE Trigger: Executes before the event.
- AFTER Trigger: Executes after the event.
Example Trigger:
CREATE TRIGGER example_trigger
AFTER INSERT ON employees
FOR EACH ROW
BEGIN
INSERT INTO audit_log (action, timestamp)
VALUES ('Employee inserted', NOW());
END;
Query Optimization
Query optimization improves database query performance by selecting the most efficient execution plan. Techniques include:
- Using Indexes
- Avoiding
SELECT * - Using Joins Wisely
- Limiting the Result Set
- Avoiding Nested Queries
ACID Properties of Transactions
ACID properties ensure data integrity and reliability in database transactions.
- Atomicity: Transactions are treated as a single, indivisible unit of work.
- Consistency: Transactions maintain database consistency, transitioning between valid states.
- Isolation: Concurrent transactions do not interfere with each other.
- Durability: Committed transactions’ effects are permanent, even after system failures.
Concurrency Control Protocols
Concurrency control protocols manage concurrent transactions to ensure database consistency.
1. Lock-Based Concurrency Control:
- Shared Locks: Allow concurrent reads but no modifications.
- Exclusive Locks: Allow exclusive access for modification.
- Two-Phase Locking (2PL): Transactions acquire locks (growing phase) and release them after completion (shrinking phase).
- Strict Two-Phase Locking (Strict 2PL): Locks are held until transaction commit.
2. Timestamp-Based Concurrency Control:
- Transactions are assigned unique timestamps.
- Conflicts are resolved based on timestamps.
Serializability
Serializability ensures that concurrent transaction execution produces the same result as serial execution.
- Conflict Serializability: Ensures no conflicts leading to inconsistency.
- View Serializability: Considers read and write operations for stricter consistency.
Deadlock
Deadlock occurs when two or more transactions are blocked indefinitely, waiting for each other to release resources.
Deadlock Detection:
- Wait-for Graph: Detects cycles in transaction dependencies.
- Timeouts: Detect potential deadlocks based on excessive wait times.
- Periodic Checks: Regularly analyze transaction and resource states.
Deadlock Handling:
- Deadlock Detection and Recovery: Abort one or more deadlocked transactions.
- Transaction Rollback: Rollback deadlocked transactions to their initial state.
Deadlock Avoidance:
- Lock Ordering: Define a lock acquisition order to prevent circular dependencies.
- Resource Allocation Graph: Track resource allocation to avoid cycles.
Deadlock Prevention:
- Timeouts: Limit resource acquisition wait times.
- No Preemption: Prevent resource preemption from transactions.
States of Transactions
Transactions transition through various states:
- Active
- Partially Committed
- Failed
- Aborted
- Committed
