Database Normalization and Management Principles
Database Normalization and 3rd Normal Form
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It divides large tables into smaller related tables.
Objectives of Normalization
- Remove duplicate data.
- Eliminate insertion anomalies.
- Eliminate deletion anomalies.
- Eliminate update anomalies.
- Improve data consistency.
3rd Normal Form (3NF)
A table is in 3NF if:
- It is already in 2NF.
- No transitive dependency exists.
- Non-key attributes depend only on the primary key.
Example
Student(StudentID, StudentName, DeptID, DeptName)
Dependencies:
- StudentID → StudentName
- StudentID → DeptID
- DeptID → DeptName
Transitive Dependency: StudentID → DeptName (through DeptID)
Conversion into 3NF
Student Table
- StudentID
- StudentName
- DeptID
Department Table
- DeptID
- DeptName
Advantages of 3NF
- Reduces redundancy.
- Improves consistency.
- Easier maintenance.
- Better data integrity.
- Efficient updates.
Understanding Data Independence
Definition
Data independence is the ability to modify the database schema without affecting application programs.
Importance
- Easy database maintenance.
- Increased flexibility.
- Reduced development cost.
- Better performance.
Types of Data Independence
1. Physical Data Independence
The ability to change physical storage without affecting the logical structure. This includes:
- Changes in file organization.
- Changes in indexing methods.
- Changes in storage devices.
Advantages
- Better performance.
- Easier maintenance.
2. Logical Data Independence
The ability to modify the logical schema without affecting external views. This includes:
- Adding new columns.
- Creating new tables.
- Modifying relationships.
Advantages
- Database flexibility.
- Easy expansion.
Comparison
- Physical independence affects the storage level.
- Logical independence affects the logical level.
- Physical independence is easier to achieve.
- Logical independence is more difficult.
Database Triggers: Uses and Types
Definition
A trigger is a stored procedure that automatically executes when a specified event occurs in a database.
Features
- Executes automatically.
- Associated with a table or view.
- Activated by INSERT, UPDATE, or DELETE.
Uses of Triggers
- Data validation.
- Audit tracking.
- Automatic calculations.
- Security enforcement.
- Maintaining data integrity.
- Automatic notifications.
Types of Triggers
Based on Timing
- BEFORE Trigger: Executes before the event.
- AFTER Trigger: Executes after the event.
Based on Events
- INSERT Trigger: Executes when a record is inserted.
- UPDATE Trigger: Executes when data is updated.
- DELETE Trigger: Executes when data is deleted.
Advantages and Disadvantages
Advantages: Automation of tasks, improved security, better data consistency, and reduced manual work.
Disadvantages: Difficult debugging and performance overhead.
Conclusion
Triggers automate database operations and improve integrity.
Essential Database Constraints
Definition
Constraints are rules applied to database tables to maintain data accuracy and integrity.
Importance
- Ensures valid data.
- Prevents errors.
- Maintains consistency.
Types of Constraints
- NOT NULL: Does not allow NULL values.
- UNIQUE: Ensures all values are unique.
- PRIMARY KEY: Uniquely identifies each row; cannot contain NULL values.
- FOREIGN KEY: Creates a relationship between tables and maintains referential integrity.
- CHECK: Restricts values according to specific conditions.
- DEFAULT: Assigns a default value if none is provided.
Advantages of Constraints
- Data accuracy and consistency.
- Improved integrity.
- Reduced redundancy.
- Better security.
Defining 1NF, 2NF, and 3rd Normal Form
1st Normal Form (1NF)
Definition: A table is in 1NF if all attributes contain atomic values.
Conditions: No repeating groups, no multivalued attributes, and each row must be unique.
Advantages: Simple table structure and easy data retrieval.
2nd Normal Form (2NF)
Definition: A table is in 2NF if it is in 1NF and has no partial dependency.
Conditions: Must be in 1NF; non-key attributes must depend on the entire primary key.
Advantages: Removes partial dependency and reduces redundancy.
3rd Normal Form (3NF)
Definition: A table is in 3NF if it is in 2NF and has no transitive dependency.
Conditions: Must be in 2NF; non-key attributes must depend only on the primary key.
Advantages: Eliminates transitive dependency and improves consistency.
Types of Functional Dependencies
Definition
Functional dependency describes the relationship between attributes in a relation.
Importance
- Basis of normalization.
- Helps identify redundancy.
- Improves database structure.
Types of Functional Dependencies
- Full Functional Dependency: Attribute depends on the entire primary key. Example: (StudentID, CourseID) → Grade
- Partial Dependency: Attribute depends on part of a composite key. Example: StudentID → StudentName
- Transitive Dependency: One non-key attribute depends on another non-key attribute. Example: StudentID → DeptID, DeptID → DeptName
- Trivial Dependency: Attribute determines itself. Example: {A, B} → A
- Non-Trivial Dependency: Determinant and dependent are different. Example: A → B
Impact on Database Design
- Identifies redundancy and helps achieve normalization.
- Improves consistency and reduces anomalies.
- Enhances integrity and improves performance.
Conclusion
Functional dependencies are essential for efficient database design.
Primary Key Impact on Database Design
Definition
A primary key is an attribute or set of attributes that uniquely identifies each record in a table.
Characteristics
- Unique and cannot be NULL.
- Stable and minimal.
Importance
- Uniquely identifies records and prevents duplicate entries.
- Maintains entity integrity and establishes relationships.
Impact on Database Design
- Ensures Uniqueness: No duplicate records.
- Supports Relationships: Used as a reference in foreign keys.
- Improves Query Performance: Indexing speeds up searching.
- Helps Normalization: Supports proper table decomposition.
- Maintains Data Integrity: Prevents invalid records.
- Simplifies Data Management: Easy record identification.
Advantages and Disadvantages
Advantages: Faster data retrieval, better consistency, improved integrity, efficient joins, and reduced redundancy.
Disadvantages: Poor key selection may reduce performance, and changing primary keys is difficult.
