Essential Database Management Systems Concepts

1. Data, Databases, and DBMS

Data refers to raw facts and figures which by themselves have no meaning (e.g., numbers, names, marks, dates). When processed and organized, it becomes meaningful information.

A database is an organized collection of related data stored in a structured manner for easy access, management, and updates (e.g., a student database).

A Database Management System (DBMS) is software that allows users to create, store, retrieve, update, and manage databases efficiently (e.g., MySQL, Oracle, SQL Server, PostgreSQL).

Characteristics of the Database Approach

  • Self-describing nature: Stores both data and metadata.
  • Data independence: Changes in structure do not affect application programs.
  • Reduced data redundancy: Prevents duplicate storage.
  • Data sharing: Multiple users can access data simultaneously.
  • Data consistency: All users see the same updated data.
  • Centralized control: Managed centrally by the DBMS.
  • Security and integrity: Enforces access control and constraints.

Advantages of DBMS

  • Reduced redundancy and inconsistency.
  • Improved security via authorization.
  • Better integrity through constraints.
  • Efficient access using queries.
  • Concurrent multi-user access.
  • Backup and recovery facilities.
  • Improved decision-making.

2. Types of DBMS

DBMS can be classified based on data organization and structure:

  • Centralized DBMS: Data is stored at one central location. Easy to manage but a single point of failure.
  • Distributed DBMS: Data is stored at multiple locations but appears as a single database. Offers high availability.
  • Hierarchical DBMS: Organizes data in a tree-like structure (one-to-many).
  • Network DBMS: Uses a graph structure allowing many-to-many relationships.
  • Relational DBMS (RDBMS): Stores data in tables (relations) using rows and columns.
  • Object-Oriented DBMS (OODBMS): Stores data as objects, supporting classes and inheritance.

3. Data Independence and Architecture

Three-Level ANSI/SPARC Architecture

  1. External Level (View Level): Represents the user’s specific view of the database.
  2. Conceptual Level (Logical Level): Describes the overall logical structure (entities, attributes, constraints).
  3. Internal Level (Physical Level): Describes how data is stored in memory (indexing, file structures).

Data Independence

The ability to change the schema at one level without affecting the next higher level:

  • Physical Data Independence: Changes at the internal level do not affect the conceptual level.
  • Logical Data Independence: Changes at the conceptual level do not affect the external level.

4. Relational Data Model

The relational model represents data in tables (relations). Key terms include:

  • Relation: A table with rows and columns.
  • Attribute: A column representing a property.
  • Tuple: A row representing a single record.
  • Domain: The set of allowable values for an attribute.

5. Keys in Relational Databases

  • Super Key: A set of attributes that uniquely identifies a tuple.
  • Candidate Key: A minimal super key.
  • Primary Key: The chosen candidate key to uniquely identify tuples; cannot contain NULL values.
  • Alternate Key: A candidate key not chosen as the primary key.
  • Foreign Key: An attribute that refers to the primary key of another table to maintain referential integrity.

6. Cardinality and Modality

These define how entities participate in relationships:

  • Cardinality: Specifies the maximum number of entity instances (1:1, 1:M, M:N).
  • Modality: Specifies the minimum participation (mandatory or optional).

7. SQL Queries and Aggregate Functions

Simple Queries: Retrieve data from a single table using SELECT and WHERE clauses.

Nested Queries (Subqueries): A query written inside another query. The inner query executes first.

Aggregate Functions: Perform calculations on a group of rows:

  • COUNT(), SUM(), AVG(), MAX(), MIN().

8. Short Notes

  • Functional Dependency: Occurs when one attribute uniquely determines another (X → Y).
  • Concurrency Control: Manages simultaneous transactions to maintain consistency (prevents lost updates).
  • Indexing: A technique to speed up data retrieval by reducing disk access time.
  • Schemas and Instances: A schema is the static structure; an instance is the data at a specific time.