Relational Database Theory: Algebra, Calculus, and FDs
SQL Constraints and Implementation
Constraints are rules applied to columns to ensure Data Integrity. They are usually defined during the CREATE TABLE process.
| Constraint | SQL Implementation Example | Purpose |
|---|---|---|
| NOT NULL | name VARCHAR(50) NOT NULL | Ensures a column cannot have an empty value. |
| UNIQUE | email VARCHAR(50) UNIQUE | Ensures all values in a column are different. |
| PRIMARY KEY | id INT PRIMARY KEY | Uniquely identifies each row (implies NOT NULL + UNIQUE). |
| FOREIGN KEY | dept_id INT REFERENCES Dept(id) | Ensures the value exists in another table. |
| CHECK | age INT CHECK (age >= 18) | Ensures the value satisfies a specific condition. |
| DEFAULT | status VARCHAR(10) DEFAULT 'Active' | Provides a value if none is specified. |
Practical Implementation Example
CREATE TABLE Students (
StudentID INT PRIMARY KEY, -- Entity Integrity
Email VARCHAR(100) UNIQUE, -- Unique Constraint
Age INT CHECK (Age >= 17), -- Domain Constraint
MajorID INT,
FOREIGN KEY (MajorID) REFERENCES Majors(ID) -- Referential Integrity
);Relational Algebra Operations
Relational Algebra is a procedural query language that provides the theoretical foundation for how data is manipulated in a relational database. It uses a set of operations that take one or two relations (tables) as input and produce a new relation as output.
1. Unary Operations
These operations work on a single table at a time.
Select (σ)
The Select operation filters rows (tuples) that satisfy a specific condition. It is equivalent to the WHERE clause in SQL.
- Symbol: σp(R) where p is the predicate (logic) and R is the relation.
- Example: σsalary > 50000(Employee) retrieves all employees earning more than 50,000.
Project (Π)
The Project operation filters columns (attributes). It discards all columns not listed and removes duplicate rows from the result.
- Symbol: ΠA1, A2(R)
- Example: Πname, city(Student) shows only the names and cities of students.
2. Set Theory Operations
For these operations (except Cartesian Product), the two tables must be Union Compatible (meaning they have the same number of columns with matching data types).
- Union (∪): Combines all rows from two tables, removing duplicates. Example: Staff ∪ Managers results in a list of everyone in both categories.
- Intersection (∩): Returns only the rows that appear in both tables. Example: Engineering Students ∩ Scholarship Holders finds students who are in both groups.
- Set Difference (-): Returns rows that are in the first table but not in the second. Example: All Customers – Inactive Customers gives you only the active customers.
- Cartesian Product (×): Combines every row of the first table with every row of the second. If Table A has n rows and Table B has m rows, the result has n × m rows. Symbol: R × S
3. The Join Operation (⋈)
The Join is a “binary” operation that combines related tuples from two relations into a single tuple. It is essentially a Cartesian Product followed by a Selection based on a common attribute.
- Theta Join (⋈θ): Joins tables based on a general condition (like >, <, or =).
- Equi-Join: A special case of Theta Join where only the “equals” (=) operator is used.
- Natural Join (⋈): The most common join. It automatically joins tables based on columns with the same name and same data type, then removes the redundant duplicate column.
Summary Table
| Operation | Notation | SQL Equivalent | Result Focus |
|---|---|---|---|
| Select | σ | WHERE | Filtering Rows |
| Project | Π | SELECT columns | Filtering Columns |
| Union | ∪ | UNION | Combining Records |
| Difference | – | EXCEPT / MINUS | Excluding Records |
| Cartesian | × | CROSS JOIN | All Combinations |
| Join | ⋈ | INNER JOIN | Linking related data |
Relational Calculus Concepts
Relational Calculus is a non-procedural or declarative query language. Unlike Relational Algebra, which tells the computer how to get the data, Relational Calculus simply describes what data is needed using mathematical logic.
1. Tuple Relational Calculus (TRC)
TRC focuses on selecting entire tuples (rows) from a relation based on a condition. It uses a tuple variable t that ranges over a relation.
- Syntax: {t | P(t)}
- t: The resulting tuple variable.
- P(t): The predicate (logical condition) that must be true for t to be included.
- Logical Operators: Uses ∧ (AND), ∨ (OR), ¬ (NOT).
- Quantifiers:
- Existential (∃): “There exists” at least one tuple.
- Universal (∀): “For all” tuples in a relation.
Example: To find the names of all employees who work in Department 5:
2. Domain Relational Calculus (DRC)
DRC is similar to TRC but uses domain variables (variables representing individual column values) rather than entire rows.
- Syntax: {〈x1, x2, &dots;, xn〉 | P(x1, x2, &dots;, xn)}
- 〈x1, x2, &dots;, xn〉: The set of attributes to be returned.
- P(x1, &dots;): The condition the domain variables must satisfy.
Example: To find the names and ages of students older than 20 from a Student(Name, Age, Grade) table:
Relational Algebra vs. Relational Calculus
The main difference lies in the approach to data retrieval.
| Feature | Relational Algebra | Relational Calculus |
|---|---|---|
| Type | Procedural (How to get it) | Declarative (What to get) |
| Philosophy | Prescriptive: Defines the sequence of steps. | Descriptive: Defines the end result. |
| Logic Basis | Set Theory and Operations. | First-Order Predicate Logic. |
| Language Base | Basis for implementation plans. | Basis for the SQL language. |
| Operations | Uses σ, Π, ∪, ⋈, etc. | Uses t, ∃, ∀, ∧, ∨, etc. |
| Order | Order of operations matters. | No order is specified. |
Summary of Basic Operations (Relational Algebra)
- Select (σ): Extracts rows based on a condition.
- Project (Π): Extracts specific columns.
- Union (∪): Combines rows from two tables (removing duplicates).
- Intersection (∩): Keeps only rows found in both tables.
- Difference (-): Rows in the first table but not the second.
- Cartesian Product (×): Combines every row of table A with every row of table B.
- Join (⋈): Combines related rows based on a common attribute.
Functional Dependencies and Normalization
Functional dependencies (FDs) are the rules that govern the relationships between data columns in a table. They are the primary tool used by database designers to eliminate redundancy and prevent data errors through a process called Normalization.
1. What is Functional Dependency?
A Functional Dependency occurs when the value of one attribute (or set of attributes) uniquely determines the value of another attribute.
- Notation: X → Y
- Meaning: “X functionally determines Y.” If two rows have the same value for X, they must have the same value for Y.
- Example: In a university database, Student_ID → Student_Name. If you know the ID is 101, the name will always be John Doe.
2. Characteristics of Functional Dependency
- Determinant & Dependent: In X → Y, X is called the Determinant and Y is the Dependent.
- Tuple Constraint: It is a constraint on all possible rows (tuples) in a relation. If t1[X] = t2[X], then t1[Y] must equal t2[Y].
- Directional: It is a one-way relationship. While Student_ID determines Student_Name, the reverse (Student_Name → Student_ID) might not be true, as two students can have the same name.
- Trivial vs. Non-Trivial:
- Trivial: Y is a subset of X (e.g., {ID, Name} → {Name}). It is always true by default.
- Non-Trivial: Y is not a subset of X (e.g., ID → Name). These are the “useful” dependencies.
3. Inference Rules (Armstrong’s Axioms)
Inference rules allow you to discover “hidden” dependencies that aren’t explicitly stated but are logically true. The three primary rules are known as Armstrong’s Axioms.
The Primary Axioms
- Reflexivity Rule: If Y is a subset of X, then X → Y. Example: {Name, Age} → {Name}.
- Augmentation Rule: If X → Y, then for any Z, XZ → YZ. Example: If ID → Name, then {ID, Phone} → {Name, Phone}.
- Transitivity Rule: If X → Y and Y → Z, then X → Z. Example: If ID → Dept and Dept → Building, then ID → Building.
Secondary (Derived) Rules
These are shortcuts derived from the three primary axioms above:
- Union: If X → Y and X → Z, then X → YZ.
- Decomposition: If X → YZ, then X → Y and X → Z.
- Pseudo-Transitivity: If X → Y and WY → Z, then WX → Z.
4. Why Do We Need Them?
Functional dependencies are the “tests” we use to see if a database is well-designed:
- Identify Keys: They help us find Candidate Keys (the set of attributes that determine all other attributes in the table).
- Normalize Tables: They tell us when to split a table. For example, if we find a Partial Dependency (where an attribute depends on only part of a primary key), we know the table is not in 2nd Normal Form (2NF).
