Essential Principles of Database Management Systems

Data Models

A data model is a conceptual representation of data structures required for a database system. It defines how data is stored, organized, and manipulated.

Types of Data Models

  • Hierarchical Model: Arranges records in a hierarchy like an organization chart. Each record is a node; the top-most is the root. A child node can have only one parent, preventing multiple parent relationships.
  • Network Model: Similar to the hierarchical model, but a child node can have multiple parents. It supports many-to-many relationships, offering greater flexibility.
  • Relational Model: The most common model used by systems like MySQL and Oracle. Data is stored in tables (relations) consisting of rows (tuples) and columns (attributes). It is highly efficient for database management.

Database Architecture

Database architecture defines the structure and design of a system, explaining how components are organized and how users interact with data. It is categorized into:

  • One-Tier: User, application, and database reside on the same machine. Suitable for local applications like MS Access.
  • Two-Tier: The client application communicates directly with the database server via APIs like ODBC or JDBC.
  • Three-Tier: Includes an application server between the client and database, improving security, scalability, and data integrity.

Data Independence

Data independence is the ability to modify a database schema at one level without affecting another. It improves flexibility and maintenance.

Types of Data Independence

  • Logical Data Independence: Modifying the logical schema without changing external views or application programs.
  • Physical Data Independence: Changing physical storage structures (indexing, file organization) without affecting the logical schema.

Relational Model and Algebra

The relational model, introduced by Edgar F. Codd in 1970, organizes data into tables. It uses primary keys for unique identification and foreign keys to establish relationships.

Relational Algebra Operations

  • Projection (π): Selects specific columns from a table.
  • Selection (σ): Retrieves specific rows based on a condition.
  • Cross Product (×): Combines every row of one table with another.
  • Union (∪): Combines tuples from two relations, removing duplicates.
  • Set Difference (−): Returns tuples existing in one relation but not another.

SQL (Structured Query Language)

SQL is the standard language for managing relational databases. Key command categories include:

  • DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE.
  • DML (Data Manipulation Language): INSERT, UPDATE, DELETE.
  • DQL (Data Query Language): SELECT.
  • DCL (Data Control Language): GRANT, REVOKE.
  • TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT.

Functional Dependency and Normalization

Functional dependency (X → Y) occurs when attribute X determines attribute Y. Normalization is the process of organizing data to reduce redundancy through normal forms (1NF, 2NF, 3NF, and BCNF).

Lossless Decomposition

Lossless decomposition divides a table into smaller parts such that no data is lost, ensuring the original table can be reconstructed via Natural Join.

ACID Properties

ACID ensures reliable transaction processing:

  • Atomicity: “All or nothing” execution.
  • Consistency: Database remains in a valid state.
  • Isolation: Concurrent transactions do not interfere with each other.
  • Durability: Committed data remains permanent after system failures.

Concurrency and Indexing

Concurrency allows multiple users to access the database simultaneously. Techniques like locking and serializability prevent data inconsistency.

Indexing Techniques

  • Primary Indexing: Created on the primary key.
  • Secondary Indexing: Created on non-primary key attributes.
  • Multilevel Indexing: Uses B-Trees and B+ Trees to reduce search time and improve disk I/O efficiency.