Database Normalization and Management Principles

Database Normalization and 3rd Normal Form

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It divides large tables into smaller related tables.

Objectives of Normalization

  • Remove duplicate data.
  • Eliminate insertion anomalies.
  • Eliminate deletion anomalies.
  • Eliminate update anomalies.
  • Improve data consistency.

3rd Normal Form (3NF)

A table is in 3NF if:

  • It is already in 2NF.
  • No transitive dependency exists.
  • Non-key attributes depend only on the primary key.

Example

Student(StudentID, StudentName, DeptID, DeptName)

Dependencies:

  • StudentID → StudentName
  • StudentID → DeptID
  • DeptID → DeptName

Transitive Dependency: StudentID → DeptName (through DeptID)

Conversion into 3NF

Student Table
  • StudentID
  • StudentName
  • DeptID
Department Table
  • DeptID
  • DeptName

Advantages of 3NF

  • Reduces redundancy.
  • Improves consistency.
  • Easier maintenance.
  • Better data integrity.
  • Efficient updates.

Understanding Data Independence

Definition

Data independence is the ability to modify the database schema without affecting application programs.

Importance

  • Easy database maintenance.
  • Increased flexibility.
  • Reduced development cost.
  • Better performance.

Types of Data Independence

1. Physical Data Independence

The ability to change physical storage without affecting the logical structure. This includes:

  • Changes in file organization.
  • Changes in indexing methods.
  • Changes in storage devices.
Advantages
  • Better performance.
  • Easier maintenance.

2. Logical Data Independence

The ability to modify the logical schema without affecting external views. This includes:

  • Adding new columns.
  • Creating new tables.
  • Modifying relationships.
Advantages
  • Database flexibility.
  • Easy expansion.

Comparison

  • Physical independence affects the storage level.
  • Logical independence affects the logical level.
  • Physical independence is easier to achieve.
  • Logical independence is more difficult.

Database Triggers: Uses and Types

Definition

A trigger is a stored procedure that automatically executes when a specified event occurs in a database.

Features

  • Executes automatically.
  • Associated with a table or view.
  • Activated by INSERT, UPDATE, or DELETE.

Uses of Triggers

  • Data validation.
  • Audit tracking.
  • Automatic calculations.
  • Security enforcement.
  • Maintaining data integrity.
  • Automatic notifications.

Types of Triggers

Based on Timing

  • BEFORE Trigger: Executes before the event.
  • AFTER Trigger: Executes after the event.

Based on Events

  • INSERT Trigger: Executes when a record is inserted.
  • UPDATE Trigger: Executes when data is updated.
  • DELETE Trigger: Executes when data is deleted.

Advantages and Disadvantages

Advantages: Automation of tasks, improved security, better data consistency, and reduced manual work.

Disadvantages: Difficult debugging and performance overhead.

Conclusion

Triggers automate database operations and improve integrity.

Essential Database Constraints

Definition

Constraints are rules applied to database tables to maintain data accuracy and integrity.

Importance

  • Ensures valid data.
  • Prevents errors.
  • Maintains consistency.

Types of Constraints

  1. NOT NULL: Does not allow NULL values.
  2. UNIQUE: Ensures all values are unique.
  3. PRIMARY KEY: Uniquely identifies each row; cannot contain NULL values.
  4. FOREIGN KEY: Creates a relationship between tables and maintains referential integrity.
  5. CHECK: Restricts values according to specific conditions.
  6. DEFAULT: Assigns a default value if none is provided.

Advantages of Constraints

  • Data accuracy and consistency.
  • Improved integrity.
  • Reduced redundancy.
  • Better security.

Defining 1NF, 2NF, and 3rd Normal Form

1st Normal Form (1NF)

Definition: A table is in 1NF if all attributes contain atomic values.

Conditions: No repeating groups, no multivalued attributes, and each row must be unique.

Advantages: Simple table structure and easy data retrieval.

2nd Normal Form (2NF)

Definition: A table is in 2NF if it is in 1NF and has no partial dependency.

Conditions: Must be in 1NF; non-key attributes must depend on the entire primary key.

Advantages: Removes partial dependency and reduces redundancy.

3rd Normal Form (3NF)

Definition: A table is in 3NF if it is in 2NF and has no transitive dependency.

Conditions: Must be in 2NF; non-key attributes must depend only on the primary key.

Advantages: Eliminates transitive dependency and improves consistency.

Types of Functional Dependencies

Definition

Functional dependency describes the relationship between attributes in a relation.

Importance

  • Basis of normalization.
  • Helps identify redundancy.
  • Improves database structure.

Types of Functional Dependencies

  1. Full Functional Dependency: Attribute depends on the entire primary key. Example: (StudentID, CourseID) → Grade
  2. Partial Dependency: Attribute depends on part of a composite key. Example: StudentID → StudentName
  3. Transitive Dependency: One non-key attribute depends on another non-key attribute. Example: StudentID → DeptID, DeptID → DeptName
  4. Trivial Dependency: Attribute determines itself. Example: {A, B} → A
  5. Non-Trivial Dependency: Determinant and dependent are different. Example: A → B

Impact on Database Design

  • Identifies redundancy and helps achieve normalization.
  • Improves consistency and reduces anomalies.
  • Enhances integrity and improves performance.

Conclusion

Functional dependencies are essential for efficient database design.

Primary Key Impact on Database Design

Definition

A primary key is an attribute or set of attributes that uniquely identifies each record in a table.

Characteristics

  • Unique and cannot be NULL.
  • Stable and minimal.

Importance

  • Uniquely identifies records and prevents duplicate entries.
  • Maintains entity integrity and establishes relationships.

Impact on Database Design

  • Ensures Uniqueness: No duplicate records.
  • Supports Relationships: Used as a reference in foreign keys.
  • Improves Query Performance: Indexing speeds up searching.
  • Helps Normalization: Supports proper table decomposition.
  • Maintains Data Integrity: Prevents invalid records.
  • Simplifies Data Management: Easy record identification.

Advantages and Disadvantages

Advantages: Faster data retrieval, better consistency, improved integrity, efficient joins, and reduced redundancy.

Disadvantages: Poor key selection may reduce performance, and changing primary keys is difficult.