Database Normalization: Functional Dependencies and Normal Forms
1. Trivial vs Non-Trivial Dependency
These types describe the basic mathematical relationship between the sets of attributes.
- Trivial Functional Dependency: Occurs when the dependent (RHS) is a subset of the determinant (LHS). It is “trivial” because it doesn’t provide new information.
- Logic: X → Y is trivial if Y ⊆ X.
- Example: {Student_ID, Name} → {Name}.
- Non-Trivial Functional Dependency: Occurs when the dependent is not a subset of the determinant.
- Logic: X → Y is non-trivial if Y ⊄ X.
- Example: Student_ID → Name.
- Completely Non-Trivial: A subset of non-trivial where there is no overlap at all (X ∩ Y = ∅).
- Example: Student_ID → Age.
2. Full vs Partial Dependency
These are critical for Second Normal Form (2NF), which aims to eliminate partial dependencies.
- Full Functional Dependency: An attribute is fully dependent if it depends on the entire composite primary key, not just a part of it.
- Example: {Student_ID, Project_ID} → Grade. You need both pieces of info to know the grade.
- Partial Functional Dependency: Occurs when an attribute depends on only part of a composite primary key.
- Example: {Student_ID, Project_ID} → Student_Name. Here, Student_Name only needs Student_ID; it doesn’t care about the Project_ID.
3. Transitive Dependency
This is the target of Third Normal Form (3NF). It represents an indirect relationship between attributes.
- Logic: If A → B and B → C, then A → C is a transitive dependency.
- Example: Employee_ID → Department_ID and Department_ID → Dept_Location.
- The relationship Employee_ID → Dept_Location is transitive because it happens through the “middle-man” (Department_ID).
4. Multivalued Dependency (MVD)
Multivalued dependencies are handled in Fourth Normal Form (4NF). They occur when one attribute determines a set of values for another, rather than just a single value.
- Notation: X ↠ Y
- Condition: It requires at least three attributes (X, Y, Z) where for one value of X, there are multiple independent values of Y and Z.
- Example: A Teacher might teach multiple Subjects and have multiple Hobbies. The subjects they teach have nothing to do with their hobbies, but both are linked to the teacher. This creates redundancy.
Comparison of Normal Forms and Dependencies
| Normal Form | Dependency it Eliminates | Key Focus |
|---|---|---|
| 1NF | Atomic values / Multi-valued attributes | Flat tables |
| 2NF | Partial Dependency | Entire Primary Key |
| 3NF | Transitive Dependency | Non-key to non-key links |
| BCNF | Anomalies in overlapping keys | Every determinant is a Super Key |
| 4NF | Multivalued Dependency | Independent multi-valued facts |
Normalization: Purpose and Benefits
Normalization is the systematic process of organizing data in a database to reduce redundancy and eliminate undesirable characteristics like insertion, update, and deletion anomalies. It involves decomposing large, complex tables into smaller, well-structured ones.
1. Benefits and Need of Normalization
Without normalization, databases often suffer from “anomalies” that make data management difficult and error-prone.
- Eliminates Redundancy: Prevents the same piece of data from being stored in multiple places, saving storage space.
- Data Integrity: Ensures that when data changes, it only needs to be updated in one place, preventing inconsistencies.
- Faster Queries: Smaller tables often mean faster sorting and indexing, though very high normalization can sometimes require more joins.
- Prevents Anomalies:
- Insertion Anomaly: You can’t add data because some other required data is missing.
- Update Anomaly: Updating a record in one place but missing it in another due to duplicates.
- Deletion Anomaly: Deleting a record accidentally removes other unrelated, important information.
2. Normal Forms Based on Primary Keys
Each normal form (NF) is a set of rules. To reach a higher level, a table must first satisfy all the rules of the previous levels.
| Normal Form | Rule / Requirement | Problem Solved |
|---|---|---|
| 1NF | Data must be atomic (no sets or arrays in a cell). Each row must be unique. | Repeating groups and non-atomic data. |
| 2NF | Must be in 1NF + No Partial Dependency (non-key attributes must depend on the entire primary key). | Partial functional dependencies. |
| 3NF | Must be in 2NF + No Transitive Dependency (non-key attributes shouldn’t depend on other non-key attributes). | Transitive functional dependencies. |
| BCNF | A stricter version of 3NF. For every X → Y, X must be a Super Key. | Anomalies in tables with multiple overlapping candidate keys. |
3. Advanced Normal Forms
Fourth Normal Form (4NF)
4NF deals with Multi-valued Dependencies (MVD).
- MVD (X ↠ Y): Occurs when one attribute (X) determines a set of values for another attribute (Y), and this set is independent of other attributes in the table.
- Rule: A table is in 4NF if it is in BCNF and contains no non-trivial multi-valued dependencies.
- Solution: Split the independent multi-valued facts into separate tables.
Fifth Normal Form (5NF)
Also known as Project-Join Normal Form (PJ/NF), it deals with Join Dependencies.
- Join Dependency: Occurs when a table can be decomposed into multiple tables, but it cannot be recreated by joining only two of them; you need all three or more to get the original data back without “spurious” (fake) rows.
- Rule: A table is in 5NF if it cannot be decomposed into smaller tables without losing data or creating fake records.
Domain-Key Normal Form (DKNF)
DKNF is the “ultimate” normal form. It is more of a theoretical goal than a practical step.
- Definition: A table is in DKNF if every constraint on the table is a logical consequence of its Domain Constraints (valid values) and Key Constraints (unique identifiers).
- Goal: A database in DKNF would be free of all possible non-temporal anomalies.
Would you like me to walk through a real-world example showing how to transform a “raw” unnormalized table all the way up to 3NF or BCNF?
