Relational Database Theory: Algebra, Calculus, and FDs

SQL Constraints and Implementation

Constraints are rules applied to columns to ensure Data Integrity. They are usually defined during the CREATE TABLE process.

ConstraintSQL Implementation ExamplePurpose
NOT NULLname VARCHAR(50) NOT NULLEnsures a column cannot have an empty value.
UNIQUEemail VARCHAR(50) UNIQUEEnsures all values in a column are different.
PRIMARY KEYid INT PRIMARY KEYUniquely identifies each row (implies NOT NULL + UNIQUE).
FOREIGN KEYdept_id INT REFERENCES Dept(id)Ensures the value exists in another table.
CHECKage INT CHECK (age >= 18)Ensures the value satisfies a specific condition.
DEFAULTstatus VARCHAR(10) DEFAULT 'Active'Provides a value if none is specified.

Practical Implementation Example

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,           -- Entity Integrity
    Email VARCHAR(100) UNIQUE,           -- Unique Constraint
    Age INT CHECK (Age >= 17),           -- Domain Constraint
    MajorID INT,
    FOREIGN KEY (MajorID) REFERENCES Majors(ID) -- Referential Integrity
);

Relational Algebra Operations

Relational Algebra is a procedural query language that provides the theoretical foundation for how data is manipulated in a relational database. It uses a set of operations that take one or two relations (tables) as input and produce a new relation as output.

1. Unary Operations

These operations work on a single table at a time.

Select (σ)

The Select operation filters rows (tuples) that satisfy a specific condition. It is equivalent to the WHERE clause in SQL.

  • Symbol: σp(R) where p is the predicate (logic) and R is the relation.
  • Example: σsalary > 50000(Employee) retrieves all employees earning more than 50,000.

Project (Π)

The Project operation filters columns (attributes). It discards all columns not listed and removes duplicate rows from the result.

  • Symbol: ΠA1, A2(R)
  • Example: Πname, city(Student) shows only the names and cities of students.

2. Set Theory Operations

For these operations (except Cartesian Product), the two tables must be Union Compatible (meaning they have the same number of columns with matching data types).

  • Union (∪): Combines all rows from two tables, removing duplicates. Example: Staff ∪ Managers results in a list of everyone in both categories.
  • Intersection (∩): Returns only the rows that appear in both tables. Example: Engineering Students ∩ Scholarship Holders finds students who are in both groups.
  • Set Difference (-): Returns rows that are in the first table but not in the second. Example: All Customers – Inactive Customers gives you only the active customers.
  • Cartesian Product (×): Combines every row of the first table with every row of the second. If Table A has n rows and Table B has m rows, the result has n × m rows. Symbol: R × S

3. The Join Operation (⋈)

The Join is a “binary” operation that combines related tuples from two relations into a single tuple. It is essentially a Cartesian Product followed by a Selection based on a common attribute.

  • Theta Join (⋈θ): Joins tables based on a general condition (like >, <, or =).
  • Equi-Join: A special case of Theta Join where only the “equals” (=) operator is used.
  • Natural Join (⋈): The most common join. It automatically joins tables based on columns with the same name and same data type, then removes the redundant duplicate column.

Summary Table

OperationNotationSQL EquivalentResult Focus
SelectσWHEREFiltering Rows
ProjectΠSELECT columnsFiltering Columns
UnionUNIONCombining Records
DifferenceEXCEPT / MINUSExcluding Records
Cartesian×CROSS JOINAll Combinations
JoinINNER JOINLinking related data

Relational Calculus Concepts

Relational Calculus is a non-procedural or declarative query language. Unlike Relational Algebra, which tells the computer how to get the data, Relational Calculus simply describes what data is needed using mathematical logic.

1. Tuple Relational Calculus (TRC)

TRC focuses on selecting entire tuples (rows) from a relation based on a condition. It uses a tuple variable t that ranges over a relation.

  • Syntax: {t | P(t)}
  • t: The resulting tuple variable.
  • P(t): The predicate (logical condition) that must be true for t to be included.
  • Logical Operators: Uses ∧ (AND), ∨ (OR), ¬ (NOT).
  • Quantifiers:
    • Existential (∃): “There exists” at least one tuple.
    • Universal (∀): “For all” tuples in a relation.

Example: To find the names of all employees who work in Department 5:

2. Domain Relational Calculus (DRC)

DRC is similar to TRC but uses domain variables (variables representing individual column values) rather than entire rows.

  • Syntax: {〈x1, x2, &dots;, xn〉 | P(x1, x2, &dots;, xn)}
  • 〈x1, x2, &dots;, xn〉: The set of attributes to be returned.
  • P(x1, &dots;): The condition the domain variables must satisfy.

Example: To find the names and ages of students older than 20 from a Student(Name, Age, Grade) table:

Relational Algebra vs. Relational Calculus

The main difference lies in the approach to data retrieval.

FeatureRelational AlgebraRelational Calculus
TypeProcedural (How to get it)Declarative (What to get)
PhilosophyPrescriptive: Defines the sequence of steps.Descriptive: Defines the end result.
Logic BasisSet Theory and Operations.First-Order Predicate Logic.
Language BaseBasis for implementation plans.Basis for the SQL language.
OperationsUses σ, Π, ∪, ⋈, etc.Uses t, ∃, ∀, ∧, ∨, etc.
OrderOrder of operations matters.No order is specified.

Summary of Basic Operations (Relational Algebra)

  • Select (σ): Extracts rows based on a condition.
  • Project (Π): Extracts specific columns.
  • Union (∪): Combines rows from two tables (removing duplicates).
  • Intersection (∩): Keeps only rows found in both tables.
  • Difference (-): Rows in the first table but not the second.
  • Cartesian Product (×): Combines every row of table A with every row of table B.
  • Join (⋈): Combines related rows based on a common attribute.

Functional Dependencies and Normalization

Functional dependencies (FDs) are the rules that govern the relationships between data columns in a table. They are the primary tool used by database designers to eliminate redundancy and prevent data errors through a process called Normalization.

1. What is Functional Dependency?

A Functional Dependency occurs when the value of one attribute (or set of attributes) uniquely determines the value of another attribute.

  • Notation: X → Y
  • Meaning: “X functionally determines Y.” If two rows have the same value for X, they must have the same value for Y.
  • Example: In a university database, Student_ID → Student_Name. If you know the ID is 101, the name will always be John Doe.

2. Characteristics of Functional Dependency

  • Determinant & Dependent: In X → Y, X is called the Determinant and Y is the Dependent.
  • Tuple Constraint: It is a constraint on all possible rows (tuples) in a relation. If t1[X] = t2[X], then t1[Y] must equal t2[Y].
  • Directional: It is a one-way relationship. While Student_ID determines Student_Name, the reverse (Student_Name → Student_ID) might not be true, as two students can have the same name.
  • Trivial vs. Non-Trivial:
    • Trivial: Y is a subset of X (e.g., {ID, Name} → {Name}). It is always true by default.
    • Non-Trivial: Y is not a subset of X (e.g., ID → Name). These are the “useful” dependencies.

3. Inference Rules (Armstrong’s Axioms)

Inference rules allow you to discover “hidden” dependencies that aren’t explicitly stated but are logically true. The three primary rules are known as Armstrong’s Axioms.

The Primary Axioms

  • Reflexivity Rule: If Y is a subset of X, then X → Y. Example: {Name, Age} → {Name}.
  • Augmentation Rule: If X → Y, then for any Z, XZ → YZ. Example: If ID → Name, then {ID, Phone} → {Name, Phone}.
  • Transitivity Rule: If X → Y and Y → Z, then X → Z. Example: If ID → Dept and Dept → Building, then ID → Building.

Secondary (Derived) Rules

These are shortcuts derived from the three primary axioms above:

  • Union: If X → Y and X → Z, then X → YZ.
  • Decomposition: If X → YZ, then X → Y and X → Z.
  • Pseudo-Transitivity: If X → Y and WY → Z, then WX → Z.

4. Why Do We Need Them?

Functional dependencies are the “tests” we use to see if a database is well-designed:

  • Identify Keys: They help us find Candidate Keys (the set of attributes that determine all other attributes in the table).
  • Normalize Tables: They tell us when to split a table. For example, if we find a Partial Dependency (where an attribute depends on only part of a primary key), we know the table is not in 2nd Normal Form (2NF).