A Comprehensive Guide to Database Management Systems (DBMS)

Posted on Jun 18, 2024 in Computers

What is a Database Management System (DBMS)?

A DBMS is a software system designed to manage and organize data in a structured manner. It allows users to:

Create, modify, and query a database
Manage security and access controls

Features of DBMS:

Data modeling
Data storage and retrieval
Concurrency control
Data integrity and security
Backup and recovery

Data Manipulation Language (DML):

Used to store, modify, retrieve, delete, and update data in a database. Common DML statements include:

SELECT
INSERT
UPDATE
DELETE
CALL

Data Definition Language (DDL):

Deals with database schemas and descriptions of how data should reside in the database. Common DDL statements include:

CREATE
ALTER
DROP
TRUNCATE
COMMENT
RENAME

Data Control Language (DCL):

Acts as an access specifier to the database. Common DCL statements include:

GRANT
REVOKE

Transactional Control Language (TCL):

Manages all types of transactional data and transactions. Common TCL statements include:

ROLLBACK
COMMIT
SAVEPOINT

Query by Example (QBE)

QBE is a query language used in relational databases that allows users to search for information by providing an example of the data they want to access through a simple user interface.

Storage Manager

A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. It is responsible for interacting with the file manager.

Entity-Relationship (ER) Diagram Symbols

Rectangle: Entities
Ellipse: Attributes
Diamond: Relationships among Entities
Line: Attributes to Entities and Entity Sets with Other Relationship Types
Double Ellipse: Multi-Valued Attributes
Double Rectangle: Weak Entity

Generalization

Generalization is a bottom-up approach where two or more lower-level entities combine to form a higher-level entity if they share common attributes. This process can continue to create even higher-level entities.

Relational Databases and RDBMS

A relational database organizes data into tables, allowing for relationships between different pieces of data. An RDBMS is a program that manages relational databases. Most RDBMS use SQL to access the database. Popular RDBMS include:

MySQL
SQL Server
MS Access
Oracle

Database Schema

A database schema is the logical structure of the entire database. It defines data organization, relationships, and constraints.

Relational Algebra

Relational algebra is a procedural language with operators that can be performed on relations.

Types of Relational Algebra Operators:

UNION (∪): Combines all tuples from two relations, removing duplicates. Syntax: A UNION B (or) A ∪ B
INTERSECTION (∩): Returns common tuples from two relations. Syntax: A INTERSECT B (or) A ∩ B
DIFFERENCE (−): Returns tuples present in the first relation but not the second. Syntax: A MINUS B (OR) A − B
CARTESIAN PRODUCT (×): Combines each tuple of the first relation with every tuple of the second. Syntax: A TIMES B (OR) A × B
SELECTION (σ): Selects tuples based on a condition. Syntax: σ condition (relation name)
PROJECTION (π): Selects specified attributes from a relation. Syntax: π(col1,col2…) Relation Name
JOIN: Combines two or more relations based on a join condition.
DIVIDE (÷): Divides tuples from one relation by another. Syntax: A DIVIDE B (OR) A ÷ B
RENAME (ρ): Renames a relation. Syntax: ρ(OLD RELATION, NEW RELATION)

Database Keys

Candidate Key: Minimal set of attributes uniquely identifying a tuple. Example: STUD_NO for STUDENT relation.
Primary Key: Unique, non-null key. Example: STUD_NO in STUDENT(STUD_NO, SNAME, ADDRESS, PHONE).
Super Key: Set of attributes uniquely identifying a tuple. Example: STUD_NO+PHONE for STUDENT relation.
Alternate Key: Candidate key other than the primary key. Example: PHONE in STUDENT relation (if both STUD_NO and PHONE are candidate keys).
Foreign Key: Attribute taking values present in another attribute, referencing the latter.
Composite Key: Combination of attributes acting as a primary key when no single primary key exists. Example: FULLNAME + DOB for accessing student details.

JOIN Operations in SQL

INNER JOIN: Returns rows with matching values in both tables.
LEFT JOIN: Returns all rows from the left table and matching rows from the right table.
RIGHT JOIN: Returns all rows from the right table and matching rows from the left table.
FULL JOIN: Returns all rows from both tables, regardless of matches.

Functional Dependency

Functional dependency in a relational database is a constraint between two sets of attributes, where the value of one set determines the value of the other.

Types of Functional Dependencies:

Full Functional Dependency: No attribute can be removed from the determinant set without breaking the dependency.
Partial Functional Dependency: The dependent attribute is determined by a subset of the determinant set.
Transitive Functional Dependency: A non-key attribute is dependent on another non-key attribute, which is dependent on a key attribute.
Multi-valued Dependency (MVD): Two independent multi-valued attributes are dependent on the same attribute or set of attributes.

Database Anomalies

Database anomalies are problems caused by redundancy in relations. Types of anomalies include:

Insertion Anomalies
Deletion Anomalies
Modification Anomalies

Normalization

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.

Normalization Forms:

First Normal Form (1NF): All attributes contain atomic values (single-valued).
Second Normal Form (2NF): 1NF + every non-primary-key attribute is fully functionally dependent on the primary key.
Third Normal Form (3NF): 2NF + no transitive dependency for non-prime attributes.
Boyce-Codd Normal Form (BCNF): 3NF + every determinant is a candidate key.
Fourth Normal Form (4NF): BCNF + no multi-valued dependencies.
Fifth Normal Form (5NF): 4NF + cannot be further decomposed without loss of information.

Objectives of Normalization:

Reduce data redundancy
Minimize data anomalies
Ensure data integrity

Disadvantages of Normalization:

Increased complexity
Performance overhead
Data modification overhead
Denormalization requirements
Storage overhead

SQL Triggers

SQL triggers are stored procedures automatically executed in response to specific database events (e.g., INSERT, UPDATE, DELETE).

Types of Triggers:

BEFORE Trigger: Executes before the event.
AFTER Trigger: Executes after the event.

Example Trigger:

CREATE TRIGGER example_trigger
AFTER INSERT ON employees
FOR EACH ROW
BEGIN
    INSERT INTO audit_log (action, timestamp)
    VALUES ('Employee inserted', NOW());
END;

Query Optimization

Query optimization improves database query performance by selecting the most efficient execution plan. Techniques include:

Using Indexes
Avoiding SELECT *
Using Joins Wisely
Limiting the Result Set
Avoiding Nested Queries

ACID Properties of Transactions

ACID properties ensure data integrity and reliability in database transactions.

Atomicity: Transactions are treated as a single, indivisible unit of work.
Consistency: Transactions maintain database consistency, transitioning between valid states.
Isolation: Concurrent transactions do not interfere with each other.
Durability: Committed transactions’ effects are permanent, even after system failures.

Concurrency Control Protocols

Concurrency control protocols manage concurrent transactions to ensure database consistency.

1. Lock-Based Concurrency Control:

Shared Locks: Allow concurrent reads but no modifications.
Exclusive Locks: Allow exclusive access for modification.
Two-Phase Locking (2PL): Transactions acquire locks (growing phase) and release them after completion (shrinking phase).
Strict Two-Phase Locking (Strict 2PL): Locks are held until transaction commit.

2. Timestamp-Based Concurrency Control:

Transactions are assigned unique timestamps.
Conflicts are resolved based on timestamps.

Serializability

Serializability ensures that concurrent transaction execution produces the same result as serial execution.

Conflict Serializability: Ensures no conflicts leading to inconsistency.
View Serializability: Considers read and write operations for stricter consistency.

Deadlock

Deadlock occurs when two or more transactions are blocked indefinitely, waiting for each other to release resources.

Deadlock Detection:

Wait-for Graph: Detects cycles in transaction dependencies.
Timeouts: Detect potential deadlocks based on excessive wait times.
Periodic Checks: Regularly analyze transaction and resource states.

Deadlock Handling:

Deadlock Detection and Recovery: Abort one or more deadlocked transactions.
Transaction Rollback: Rollback deadlocked transactions to their initial state.

Deadlock Avoidance:

Lock Ordering: Define a lock acquisition order to prevent circular dependencies.
Resource Allocation Graph: Track resource allocation to avoid cycles.

Deadlock Prevention:

Timeouts: Limit resource acquisition wait times.
No Preemption: Prevent resource preemption from transactions.

States of Transactions

Transactions transition through various states:

Active
Partially Committed
Failed
Aborted
Committed