Database Systems: Architecture, Design, and Normalization

Three-Level Database Architecture

The three-level (three-schema) architecture separates how data is physically stored, logically structured, and viewed by different users. This is the ANSI/SPARC architecture, and it exists mainly to provide data abstraction and data independence.

External, Conceptual, and Internal Levels

  • External (View) level: This is the level closest to the users; it defines multiple user views (external schemas). Each view shows only part of the database relevant to that user or application, hiding the rest (e.g., a Student_View showing only RollNo, Name, and Course).
  • Conceptual (Logical) level: This level describes the entire logical structure of the database for the whole organization in a single conceptual schema. It defines entities, attributes, relationships, and integrity constraints, independent of physical storage (e.g., full STUDENT(rollno, name, course, address, phone), COURSE, and ENROLL tables with relationships).
  • Internal (Physical) level: This level describes how data is physically stored on disk: file organization, indexes, record formats, storage paths, etc. It includes details like STUDENT stored in heap file STUD.F, a clustered index on rollno, a secondary index on course, block sizes, and access methods.

Database Mapping and Data Independence

Mapping connects the three levels like bridges, ensuring that changes in one level do not break the others. It translates user requests from simple views down to actual disk storage. [tutorialspoint +1]

External-Conceptual Bridge

This bridge links a user’s personal view to the full logical database design.

  • Full logical table: STUDENT (id, name, class, marks, phone).
  • User’s view: MY_STUDENTS (id, name, class) – hides marks and phone.

When a user types: SHOW MY_STUDENTS WHERE class = 'Math', the bridge quietly changes it to: SHOW id, name, class FROM STUDENT WHERE class = 'Math'. If someone adds ’email’ to the full table later, your simple view and query stay exactly the same—just tweak the bridge. No app changes are needed. [beginnersbook +1]

Conceptual-Internal Bridge

This bridge links the logical table to real files on your hard drive.

  • Logical table: STUDENT (id, name, class, marks, phone).
  • Real storage: Data in a file called STUD_DATA, with a fast search tool (index) on ‘class’.

For SHOW name FROM STUDENT WHERE class = 'Math', the bridge picks the file and uses the index to grab only Math rows quickly, skipping slow full searches. If storage moves to a new file or a faster index, you only update the bridge; your logical table and user queries do not change.

ER Diagram Symbols and Database Design

ER diagrams use simple shapes to show database tables, their details, and connections—acting like a blueprint for your data.

Main Symbols Explained

  • Rectangle: Strong entity (main table, e.g., “Student”). A double rectangle represents a weak entity (depends on another).
  • Oval: Attribute (detail like “name”). An underline indicates a primary key (unique ID like “rollno”).
  • Diamond: Relationship (connection like “Enrolls”).
  • Line (solid): Normal link between entity and relationship. A dashed line represents optionality or inheritance.
  • Crow’s foot (fork ||>): The “many” side of a connection.
  • Straight line (—): The “one” side.
  • Circle (O—): “Zero or one” (optional).
  • Examples: 1—||> means one-to-many.

Example Structure:

  • [Student] rectangle: oval: rollno (underlined), oval: name
  • [Course] rectangle: oval: code (underlined), oval: title
  • Diamond: enrolls_in (between Student and Course)
  • Line: Student --1--||> enrolls_in --||>1-- Course

Relational Algebra: The Five Core Operators

A “complete set of operators” in relational algebra refers to the smallest group of basic tools that can perform every possible database query by combining them—like a basic toolkit that builds anything.

The 5 Core Operators

These are: Selection (σ), Projection (π), Union (∪), Set Difference (−), and Cartesian Product (×). [geeksforgeeks +1]

  • Selection (σ): Pick rows that match a condition (e.g., “age > 18”).
  • Projection (π): Pick only needed columns (e.g., “name, rollno”).
  • Union (∪): Combine two tables (e.g., all students from BCA ∪ BTech).
  • Set Difference (−): Rows in one table but not the other (e.g., BCA − BTech).
  • Cartesian Product (×): Every row of table1 paired with every row of table2.

Justification for Completeness

Advanced operators like Join, Intersection, and Division can be built from these five:

  • Join: σ (R × S) (filtering after combining).
  • Intersection: R − ( (R − S) ∪ (S − R) ).
  • Division: Using difference and projection tricks.

Proof: Any SQL query can be rewritten using just these five steps. These alone express all relational power, similar to how addition and multiplication build all mathematics from basics.

Relational Calculus: TRC and DRC Explained

Relational Calculus is like writing a shopping list for your database: you describe exactly what data you want with conditions, without telling the system step-by-step how to fetch it. It is non-procedural (“what,” not “how”), unlike algebra’s commands.

Tuple Relational Calculus (TRC)

TRC focuses on complete rows (tuples). You declare a tuple variable that scans tables and picks matching ones.

  • Simple form: { t | conditions on t } – where t means “any row”.
  • Example: Students with age > 20 from the STUDENT table: { t | t ∈ STUDENT ∧ t.age > 20 }.
  • t checks every STUDENT row; if age > 20, it grabs the whole row t.
  • This is equivalent to: SELECT * FROM STUDENT WHERE age > 20.
  • Uses ∃ (some row exists) or ∀ (every row satisfies), e.g., { t | ∃ c ∈ COURSE (t.id = c.sid) } for students taking courses. [scaler +1]

Domain Relational Calculus (DRC)

DRC focuses on individual column values (domains). You specify exact columns to return.

  • Simple form: { <x, y> | conditions on x, y } – where x and y are column values.
  • Example: Names and IDs of students > 20: { <name, id> | <id, name, age> ∈ STUDENT ∧ age > 20 }.
  • This pulls just the name and id values where the condition is true.
  • This is equivalent to: SELECT name, id FROM STUDENT WHERE age > 20.
  • Also uses ∃/∀ for joins or checks. [scaler +1]

While TRC returns full rows, DRC picks specific columns. Both match the power of relational algebra and inspire SQL queries.

Functional Dependency and Armstrong’s Axioms

Functional Dependency (FD) is a rule stating: “If you know this value, you automatically know that value too.” For example, if you know rollno, you know the student’s name (rollno → name). In normalization, FDs act as a guide to split messy tables into clean ones, removing redundancy and errors.

Role in Normalization

Normalization uses FDs to fix problems like duplicate data or update anomalies. Example: A poorly designed table with rollno, name, and dept_name repeated for many students can be split using the FDs rollno → name and dept → dept_name. This avoids copying “CSE” 100 times, saving space and preventing mistakes (e.g., updating one “CSE” to “IT” while others remain unchanged). [geeksforgeeks +1]

Armstrong’s Axioms (Inference Rules)

These three simple rules allow you to derive “hidden” FDs from given ones, acting as mathematical proofs for databases.

  • Rule 1: Reflexivity – If B is a subset of A, then A → B. Example: If rollno → name + age, then rollno → name. This is trivial and always true. [geeksforgeeks]
  • Rule 2: Augmentation – If A → B, adding the same C to both sides results in AC → BC. Example: If rollno → name, then rollno + dept → name + dept. This expands FDs safely. [geeksforgeeks]
  • Rule 3: Transitivity – If A → B and B → C, then A → C. Example: If rollno → dept_id and dept_id → dept_name, then rollno → dept_name (a chain reaction). [geeksforgeeks]

Extra derived rules include Union (combine), Decomposition (split), and Pseudotransitivity. These three basics prove them all, providing a complete toolkit for FD math in normalization.

Database Normalization: 1NF to BCNF

Normal forms are steps to clean up database tables by removing repeats and errors, similar to organizing a messy room step by step.

1NF (First Normal Form)

Every cell must hold one single value—no lists or repeats within a cell. Rows must be unique with a key.

  • Bad example: A student table with “hobbies” listed as “cricket, football” in one cell.
  • Fixed 1NF: Split into two rows—one for cricket and one for football.
  • Result: 101 | Ravi | cricket and 101 | Ravi | football.

2NF (Second Normal Form)

The table must be in 1NF and have no partial dependencies (non-key columns must not depend on only part of a composite key).

  • Bad example: An order table with a composite key (order_id + item_id), where item_name depends only on item_id.
  • Fixed 2NF: Split into ORDERS(order_id, date) and ITEMS(item_id, name). This removes repeats of item_name.

3NF (Third Normal Form)

The table must be in 2NF and have no transitive dependencies (a non-key column should not depend on another non-key column).

  • Bad example: STUDENT(rollno, name, dept, dept_head)—where dept_head depends on dept, not rollno.
  • Fixed 3NF: Split into STUDENT(rollno, name, dept) and DEPT(dept, dept_head). You only need to update the head once.

BCNF (Boyce-Codd Normal Form)

This is a stricter version of 3NF. A table must be in 3NF, and every determinant (the left side of an FD) must be a candidate key.

  • Bad example: TEACHES(teacher, subject, student)—where teacher → subject, but teacher alone is not a full candidate key.
  • Fixed BCNF: Split into TEACHER_SUBJECT(teacher, subject) and TEACHER_STUDENT(teacher, student).

Limitations of Hierarchical and Network Models

Demerits of the Hierarchical Model

  • Rigid tree structure: Data must fit a strict parent-child relationship (one parent, many children). It cannot easily show many-to-many links, such as one student in many classes, which forces data repetition.
  • Hard to change: Adding or moving data requires redesigning the whole tree, which can break the system. [databasetown]
  • Data redundancy: The same information is copied across different branches, wasting space and causing update errors.
  • Slow searches: You must climb the tree top-down; there are no quick jumps.

Demerits of the Network Model

  • Complex web: While it allows many parents per child (using CODASYL pointers), designing and coding these links is a nightmare—pointers break easily.

Common Problems in Legacy Models

  • Problem 1: Copy-Paste Data: If Ravi studies Math and Science, his details (name, roll: 101) are copied in both places. Changing his name requires updating multiple records, or errors occur.
  • Problem 2: Deletion Loss: If you delete the Math course, Ravi’s data under Math vanishes—even if he is still enrolled in Science. You lose data accidentally.
  • Problem 3: Inability to Add New Links: Adding a “Project” team involving students and teachers is difficult because the tree cannot connect both ways easily without rebuilding the structure.
  • Problem 4: Slow Retrieval: Finding Ravi’s courses requires starting at the top (College → Class → Ravi → Courses). This takes much longer than a simple relational search.