A Comprehensive Guide to Database Management Systems (DBMS)

What is a Database Management System (DBMS)?

A DBMS is a software system designed to manage and organize data in a structured manner. It allows users to:

  • Create, modify, and query a database
  • Manage security and access controls

Features of DBMS:

  • Data modeling
  • Data storage and retrieval
  • Concurrency control
  • Data integrity and security
  • Backup and recovery

Data Manipulation Language (DML):

Used to store, modify, retrieve, delete, and update data in a database. Common DML statements include:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • CALL

Data Definition Language (DDL):

Deals with database schemas and descriptions of how data should reside in the database. Common DDL statements include:

  • CREATE
  • ALTER
  • DROP
  • TRUNCATE
  • COMMENT
  • RENAME

Data Control Language (DCL):

Acts as an access specifier to the database. Common DCL statements include:

  • GRANT
  • REVOKE

Transactional Control Language (TCL):

Manages all types of transactional data and transactions. Common TCL statements include:

  • ROLLBACK
  • COMMIT
  • SAVEPOINT

Query by Example (QBE)

QBE is a query language used in relational databases that allows users to search for information by providing an example of the data they want to access through a simple user interface.

Storage Manager

A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. It is responsible for interacting with the file manager.

Entity-Relationship (ER) Diagram Symbols

  • Rectangle: Entities
  • Ellipse: Attributes
  • Diamond: Relationships among Entities
  • Line: Attributes to Entities and Entity Sets with Other Relationship Types
  • Double Ellipse: Multi-Valued Attributes
  • Double Rectangle: Weak Entity

Generalization

Generalization is a bottom-up approach where two or more lower-level entities combine to form a higher-level entity if they share common attributes. This process can continue to create even higher-level entities.

Relational Databases and RDBMS

A relational database organizes data into tables, allowing for relationships between different pieces of data. An RDBMS is a program that manages relational databases. Most RDBMS use SQL to access the database. Popular RDBMS include:

  • MySQL
  • SQL Server
  • MS Access
  • Oracle

Database Schema

A database schema is the logical structure of the entire database. It defines data organization, relationships, and constraints.

Relational Algebra

Relational algebra is a procedural language with operators that can be performed on relations.

Types of Relational Algebra Operators:

  1. UNION (∪): Combines all tuples from two relations, removing duplicates. Syntax: A UNION B (or) A ∪ B
  2. INTERSECTION (∩): Returns common tuples from two relations. Syntax: A INTERSECT B (or) A ∩ B
  3. DIFFERENCE (−): Returns tuples present in the first relation but not the second. Syntax: A MINUS B (OR) A − B
  4. CARTESIAN PRODUCT (×): Combines each tuple of the first relation with every tuple of the second. Syntax: A TIMES B (OR) A × B
  5. SELECTION (σ): Selects tuples based on a condition. Syntax: σ condition (relation name)
  6. PROJECTION (π): Selects specified attributes from a relation. Syntax: π(col1,col2…) Relation Name
  7. JOIN: Combines two or more relations based on a join condition.
  8. DIVIDE (÷): Divides tuples from one relation by another. Syntax: A DIVIDE B (OR) A ÷ B
  9. RENAME (ρ): Renames a relation. Syntax: ρ(OLD RELATION, NEW RELATION)

Database Keys

  • Candidate Key: Minimal set of attributes uniquely identifying a tuple. Example: STUD_NO for STUDENT relation.
  • Primary Key: Unique, non-null key. Example: STUD_NO in STUDENT(STUD_NO, SNAME, ADDRESS, PHONE).
  • Super Key: Set of attributes uniquely identifying a tuple. Example: STUD_NO+PHONE for STUDENT relation.
  • Alternate Key: Candidate key other than the primary key. Example: PHONE in STUDENT relation (if both STUD_NO and PHONE are candidate keys).
  • Foreign Key: Attribute taking values present in another attribute, referencing the latter.
  • Composite Key: Combination of attributes acting as a primary key when no single primary key exists. Example: FULLNAME + DOB for accessing student details.

JOIN Operations in SQL

  • INNER JOIN: Returns rows with matching values in both tables.
  • LEFT JOIN: Returns all rows from the left table and matching rows from the right table.
  • RIGHT JOIN: Returns all rows from the right table and matching rows from the left table.
  • FULL JOIN: Returns all rows from both tables, regardless of matches.

Functional Dependency

Functional dependency in a relational database is a constraint between two sets of attributes, where the value of one set determines the value of the other.

Types of Functional Dependencies:

  • Full Functional Dependency: No attribute can be removed from the determinant set without breaking the dependency.
  • Partial Functional Dependency: The dependent attribute is determined by a subset of the determinant set.
  • Transitive Functional Dependency: A non-key attribute is dependent on another non-key attribute, which is dependent on a key attribute.
  • Multi-valued Dependency (MVD): Two independent multi-valued attributes are dependent on the same attribute or set of attributes.

Database Anomalies

Database anomalies are problems caused by redundancy in relations. Types of anomalies include:

  • Insertion Anomalies
  • Deletion Anomalies
  • Modification Anomalies

Normalization

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.

Normalization Forms:

  1. First Normal Form (1NF): All attributes contain atomic values (single-valued).
  2. Second Normal Form (2NF): 1NF + every non-primary-key attribute is fully functionally dependent on the primary key.
  3. Third Normal Form (3NF): 2NF + no transitive dependency for non-prime attributes.
  4. Boyce-Codd Normal Form (BCNF): 3NF + every determinant is a candidate key.
  5. Fourth Normal Form (4NF): BCNF + no multi-valued dependencies.
  6. Fifth Normal Form (5NF): 4NF + cannot be further decomposed without loss of information.

Objectives of Normalization:

  • Reduce data redundancy
  • Minimize data anomalies
  • Ensure data integrity

Disadvantages of Normalization:

  • Increased complexity
  • Performance overhead
  • Data modification overhead
  • Denormalization requirements
  • Storage overhead

SQL Triggers

SQL triggers are stored procedures automatically executed in response to specific database events (e.g., INSERT, UPDATE, DELETE).

Types of Triggers:

  • BEFORE Trigger: Executes before the event.
  • AFTER Trigger: Executes after the event.

Example Trigger:

CREATE TRIGGER example_trigger
AFTER INSERT ON employees
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (action, timestamp)
    VALUES ('Employee inserted', NOW());
END;

Query Optimization

Query optimization improves database query performance by selecting the most efficient execution plan. Techniques include:

  • Using Indexes
  • Avoiding SELECT *
  • Using Joins Wisely
  • Limiting the Result Set
  • Avoiding Nested Queries

ACID Properties of Transactions

ACID properties ensure data integrity and reliability in database transactions.

  • Atomicity: Transactions are treated as a single, indivisible unit of work.
  • Consistency: Transactions maintain database consistency, transitioning between valid states.
  • Isolation: Concurrent transactions do not interfere with each other.
  • Durability: Committed transactions’ effects are permanent, even after system failures.

Concurrency Control Protocols

Concurrency control protocols manage concurrent transactions to ensure database consistency.

1. Lock-Based Concurrency Control:

  • Shared Locks: Allow concurrent reads but no modifications.
  • Exclusive Locks: Allow exclusive access for modification.
  • Two-Phase Locking (2PL): Transactions acquire locks (growing phase) and release them after completion (shrinking phase).
  • Strict Two-Phase Locking (Strict 2PL): Locks are held until transaction commit.

2. Timestamp-Based Concurrency Control:

  • Transactions are assigned unique timestamps.
  • Conflicts are resolved based on timestamps.

Serializability

Serializability ensures that concurrent transaction execution produces the same result as serial execution.

  • Conflict Serializability: Ensures no conflicts leading to inconsistency.
  • View Serializability: Considers read and write operations for stricter consistency.

Deadlock

Deadlock occurs when two or more transactions are blocked indefinitely, waiting for each other to release resources.

Deadlock Detection:

  • Wait-for Graph: Detects cycles in transaction dependencies.
  • Timeouts: Detect potential deadlocks based on excessive wait times.
  • Periodic Checks: Regularly analyze transaction and resource states.

Deadlock Handling:

  • Deadlock Detection and Recovery: Abort one or more deadlocked transactions.
  • Transaction Rollback: Rollback deadlocked transactions to their initial state.

Deadlock Avoidance:

  • Lock Ordering: Define a lock acquisition order to prevent circular dependencies.
  • Resource Allocation Graph: Track resource allocation to avoid cycles.

Deadlock Prevention:

  • Timeouts: Limit resource acquisition wait times.
  • No Preemption: Prevent resource preemption from transactions.

States of Transactions

Transactions transition through various states:

  • Active
  • Partially Committed
  • Failed
  • Aborted
  • Committed