Functional Dependency & Normalization in DBMS — Keys, MVD, 4NF, 5NF
Functional Dependency in DBMS
Functional Dependency: In a Database Management System (DBMS), a functional dependency is a relationship between two sets of attributes in a relation. It shows how one attribute or a set of attributes uniquely determines another attribute. It is written as X → Y, which means attribute X determines attribute Y. Functional dependency is a core concept used in normalization to reduce redundancy and maintain data consistency.
Types of Functional Dependency
Trivial Functional Dependency
A functional dependency is called trivial if the dependent attribute is a subset of the determinant. Example: if A → A or (A, B) → A, then it is a trivial dependency. These dependencies always hold true and do not provide any new information; they are mainly used for theoretical purposes.
Non-Trivial Functional Dependency
A functional dependency is non-trivial if the dependent attribute is not a subset of the determinant. Example: A → B, where B is not part of A. Non-trivial dependencies are important because they help in identifying keys and performing normalization.
Fully Functional Dependency
A functional dependency X → Y is said to be fully functional if Y depends on the entire X and not on any proper subset of X. Example: if (StudentID, CourseID) → Grade, and Grade does not depend on StudentID or CourseID alone, then it is a fully functional dependency. This type is important for Second Normal Form (2NF).
Partial Functional Dependency
A dependency is partial when a non-prime attribute depends on only a part of a composite primary key. Example: if (StudentID, CourseID) → StudentName, and StudentName depends only on StudentID, then it is a partial dependency. Partial dependency causes data redundancy and is removed in 2NF.
Transitive Functional Dependency
A dependency is transitive if an attribute depends on another attribute through a third attribute. Example: if A → B and B → C, then A → C is a transitive dependency. Transitive dependency leads to update anomalies and is removed in Third Normal Form (3NF).
Armstrong’s Axioms (Inference Rules)
Inference rules are logical rules used to derive new functional dependencies from the given ones. These rules are called Armstrong’s Axioms and are sound and complete.
Reflexivity Rule
If Y ⊆ X, then X → Y. This means an attribute set always determines its own subsets.
Example: If X = {A, B, C}, then ABC → A.
Augmentation Rule
If X → Y, then XZ → YZ for any attribute Z. Adding the same attribute to both sides keeps the dependency valid.
Example: If A → B, then AC → BC.
Transitivity Rule
If X → Y and Y → Z, then X → Z. This rule helps in finding indirect dependencies.
Example: If Roll No → Course and Course → Dept, then Roll No → Dept.
Additional Derived Rules
- Union Rule: If X → Y and X → Z, then X → YZ.
Example: If A → B and A → C, then A → BC. - Decomposition Rule: If X → YZ, then X → Y and X → Z.
Example: If A → BC, then A → B and A → C. - Pseudotransitivity Rule: If X → Y and WY → Z, then WX → Z.
Example: If A → B and CB → D, then AC → D.
Use of Inference Rules: Inference rules are used to find the closure of functional dependencies, check whether a dependency is logically implied, find candidate keys, and compute a minimal cover. They help in testing whether a relation satisfies a specific normal form.
Functional Dependency: Detailed Explanation and Example
Functional Dependency: In a Database Management System (DBMS), a functional dependency is a very important concept that explains the relationship between attributes of a relation. It is used to decide how data is stored in tables and how attributes depend on each other. A functional dependency is written as X → Y, which means that attribute X functionally determines attribute Y. This implies that if two rows have the same value of X, then they must have the same value of Y. Functional dependency is mainly used in database normalization to remove redundancy, avoid update anomalies, and maintain data consistency. It helps in identifying candidate keys, primary keys, and in converting tables into higher normal forms like 2NF, 3NF, and BCNF.
Example of Functional Dependency
Consider a STUDENT table with columns:
Roll No Name Course Dept
Roll No → Name (each roll number has only one student name)
Roll No → Course Course → Dept
Here, Roll No determines Name and Course, so it is a functional dependency. Also, Course determines Department, which leads to a transitive dependency.
Importance of Functional Dependency
Functional dependency plays a key role in designing a good database. It helps to:
- Identify keys of a relation
- Reduce data redundancy
- Prevent update, insert, and delete anomalies
- Maintain data integrity and consistency
- Achieve proper normalization
Characteristics of Functional Dependency
- Functional dependency represents a constraint between attributes in a relation.
- It helps in identifying candidate keys and primary keys.
- FD is used as the foundation of normalization.
- It reduces data redundancy and inconsistency.
- Functional dependency ensures data integrity in the database.
- It applies only within a single relation.
- FDs are based on real-world rules of data.
Normalization
Normalization: Normalization is a systematic process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, well-structured tables and defining relationships between them using keys. Normalization is based on functional dependency and is applied using different normal forms such as 1NF, 2NF, 3NF, and BCNF. The main aim of normalization is to design an efficient and reliable database.
Need of Normalization
Normalization is needed to overcome the problems of an unnormalized database. Without normalization, a database may suffer from various anomalies and inconsistencies.
- To Reduce Data Redundancy: In unnormalized tables, the same data is stored repeatedly in different places. Normalization removes this duplication by storing data only once, which saves storage space.
- To Avoid Update Anomalies: If the same data appears in multiple rows, updating it in one place but not in others causes inconsistency. Normalization ensures that data is updated at a single location.
- To Avoid Insertion Anomalies: An insertion anomaly occurs when new data cannot be inserted without adding unnecessary data. Normalization removes this problem by separating data into proper tables.
- To Avoid Deletion Anomalies: A deletion anomaly happens when deleting one record unintentionally removes important data. Normalization prevents such data loss.
- To Improve Data Integrity: Normalization maintains correct and consistent data by properly defining relationships and constraints between tables.
Benefits of Normalization
- Eliminates data redundancy: Normalization minimizes duplication of data, making the database more compact and efficient.
- Ensures data consistency: Since data is stored in one place, changes made in one table are reflected everywhere, maintaining consistency.
- Reduces anomalies: Normalization removes insertion, deletion, and update anomalies, leading to a stable database.
- Improves database structure: It organizes data into logical tables, making the database easier to understand and maintain.
- Enhances data integrity: By using primary and foreign keys, normalization enforces referential integrity between tables.
- Easier maintenance and modification: Normalized databases are easier to modify and extend without affecting existing data or applications.
Multivalued Dependency (MVD)
Multivalued Dependency: In DBMS, a multivalued dependency occurs when one attribute of a relation determines multiple independent values of another attribute. It is a special type of dependency that is different from functional dependency. A multivalued dependency is written as X →→ Y, which means that for a given value of X, there can be multiple values of Y, and these values are independent of other attributes. Multivalued dependency usually appears when a table stores more than one multi-valued fact about an entity. MVD is mainly used to achieve Fourth Normal Form (4NF).
Example of Multivalued Dependency
Consider a STUDENT table with columns:
StudentID PhoneNo Hobby
A student can have multiple phone numbers and multiple hobbies.
So we can write:
StudentID →→ PhoneNo
StudentID →→ Hobby
Here, phone numbers and hobbies are independent of each other but both depend on StudentID. This creates a multivalued dependency.
Why Multivalued Dependency Occurs
- MVD occurs when one attribute determines more than one value of another attribute.
- These values are not dependent on each other.
- Data is stored in a single table instead of separate tables.
Problems Caused by Multivalued Dependency
- Data redundancy increases because combinations of values are repeated.
- Insertion anomaly occurs when adding a new phone number or hobby.
- Deletion anomaly may cause loss of important data.
- Update anomaly occurs when modifying multi-valued attributes.
Difference Between Functional Dependency and Multivalued Dependency
Functional Dependency Multivalued Dependency
- Written as X → Y Written as X →→ Y
- X determines a single value of Y X determines multiple values of Y
- Used in 2NF and 3NF Used in 4NF
- Y depends on X Y is independent of other attributes
Fourth Normal Form (4NF)
Fourth Normal Form (4NF): Fourth Normal Form is an advanced level of database normalization. It mainly deals with the problem of multivalued dependencies (MVD). A relation is said to be in Fourth Normal Form (4NF) if it satisfies the following conditions:
- The relation must be in Boyce–Codd Normal Form (BCNF).
- There should be no non-trivial multivalued dependency in the relation.
This means that for every multivalued dependency X →→ Y, X must be a superkey.
Example of 4NF
Consider a STUDENT table: StudentID PhoneNo Hobby
Here: StudentID →→ PhoneNo and StudentID →→ Hobby. Phone numbers and hobbies are independent of each other. This table is not in 4NF because of multivalued dependency.
Decomposition into 4NF
To convert it into 4NF, the table is decomposed into:
- STUDENT_PHONE (StudentID, PhoneNo)
- STUDENT_HOBBY (StudentID, Hobby)
Now, there is no multivalued dependency, and redundancy is removed.
Importance of 4NF
- Improves data integrity and database structure
- Removes redundancy caused by multivalued dependencies
- Prevents insertion, deletion, and update anomalies
Fifth Normal Form (5NF)
Fifth Normal Form (5NF): Fifth Normal Form is also known as Project-Join Normal Form (PJNF). It deals with join dependencies. A relation is in Fifth Normal Form (5NF) if it cannot be decomposed into smaller relations without losing data and without creating spurious tuples.
Conditions of 5NF
- The relation must be in Fourth Normal Form (4NF).
- Every join dependency in the relation must be implied by the candidate keys.
Example of 5NF
Consider a SUPPLY table with columns: Supplier Part Project
A supplier supplies a part to a project. If the relationship between these three attributes is complex, the table may need decomposition.
Decomposition into 5NF
The table can be decomposed into:
- SUPPLIER_PART (Supplier, Part)
- SUPPLIER_PROJECT (Supplier, Project)
- PART_PROJECT (Part, Project)
This decomposition ensures that data can be joined back without any loss and without redundancy.
Importance of 5NF
- Ensures lossless decomposition
- Removes redundancy caused by complex join dependencies
- Improves data consistency and integrity
- Useful in complex database design
