Database Design Fundamentals: Constraints, Dependencies, ACID Properties, and More
1. Domain Constraints and Check Constraints
Domain Constraints
Domain constraints are rules that define the permissible values that can be stored in a column of a database table. These constraints ensure that the data entered into a database adheres to a specific type and format, helping to maintain data integrity.
Example:
Consider a table Employees with columns EmployeeID, Name, and Age. A domain constraint for the Age column might specify that the values must be integers between 18 and 65. This can be implemented as follows:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Age INT CHECK (Age >= 18 AND Age <= 65)
);
In this example, the domain constraint on the Age column ensures that only integer values between 18 and 65 are allowed.
Check Constraints
Check constraints are specific conditions that each row in a database table must satisfy. They are used to enforce business rules at the column or table level.
Example:
In the same Employees table, suppose there is a requirement that the Name column should not be null, and Age should not only be between 18 and 65 but also an even number. This can be implemented using check constraints:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100) NOT NULL,
Age INT CHECK (Age >= 18 AND Age <= 65 AND Age % 2 = 0)
);
Here, the check constraint on Age ensures that values are not only within the specified range but also even, and the NOT NULL constraint on Name ensures that it cannot be empty.
2. Transitive Dependency
A transitive dependency in a database occurs when a non-prime attribute depends on another non-prime attribute indirectly through a chain of dependencies. This typically happens in a relation that is not in the third normal form (3NF).
Example:
Consider a table StudentCourses with the following columns: StudentID, CourseID, InstructorID, and InstructorName. The functional dependencies are:
StudentID, CourseID→InstructorIDInstructorID→InstructorName
Here, InstructorName is transitively dependent on the composite key (StudentID, CourseID) via InstructorID.
To eliminate the transitive dependency and bring the table to 3NF, we should decompose it into two tables:
CREATE TABLE StudentCourses (
StudentID INT,
CourseID INT,
InstructorID INT,
PRIMARY KEY (StudentID, CourseID)
);
CREATE TABLE Instructors (
InstructorID INT PRIMARY KEY,
InstructorName VARCHAR(100)
);
Now, InstructorName is directly dependent on InstructorID in the Instructors table, and InstructorID is directly dependent on the composite key in the StudentCourses table.
3. ACID Properties
ACID properties are a set of properties that guarantee reliable processing of database transactions. They ensure that transactions are processed reliably and help in maintaining the integrity of the database.
- Atomicity: Ensures that a transaction is treated as a single unit, which either completes entirely or not at all. If any part of the transaction fails, the entire transaction fails and the database state is left unchanged.
Example: Consider a banking system where you transfer $100 from Account A to Account B. Atomicity ensures that either both the debit from Account A and the credit to Account B happen, or neither happens. - Consistency: Ensures that a transaction brings the database from one valid state to another valid state, maintaining database invariants.
Example: In a database maintaining inventory, a constraint may specify that total inventory in the warehouse should not go negative. Consistency ensures that any transaction adheres to this rule. - Isolation: Ensures that transactions occur independently without interference. Changes made by a transaction are not visible to other transactions until the transaction is committed.
Example: If two transactions are running concurrently, one updating an inventory count and another querying it, isolation ensures that the query sees a consistent count before or after the update, but not during the update. - Durability: Ensures that once a transaction is committed, it will remain committed even in the case of a system failure. Changes made by the transaction are permanently recorded.
Example: After transferring $100 from Account A to Account B, if the transaction is committed, the new balances of both accounts should persist even if the system crashes immediately after the commit.
These ACID properties together ensure that database transactions are processed reliably, maintaining data integrity and consistency across multiple transactions and system failures.
1. Normalization
Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. It involves dividing large tables into smaller, more manageable pieces and defining relationships between them to enhance data integrity and reduce data anomalies.
1st Normal Form (1NF)
A table is in 1NF if it only contains atomic (indivisible) values, and each entry in a column is of the same data type. This ensures that each column contains unique, scalar values and that each record is unique.
Example:
Consider the following table with repeating groups, which is not in 1NF:
| StudentID | Name | Courses |
|---|---|---|
| 1 | John Doe | Math, Science |
| 2 | Jane Smith | English, Art |
To convert this into 1NF, we separate the repeating groups into individual rows:
| StudentID | Name | Course |
|---|---|---|
| 1 | John Doe | Math |
| 1 | John Doe | Science |
| 2 | Jane Smith | English |
| 2 | Jane Smith | Art |
2nd Normal Form (2NF)
A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means that non-key attributes should depend on the entire primary key, not just part of it.
Example:
Consider a table that is in 1NF but not in 2NF:
| StudentID | Course | Instructor |
|---|---|---|
| 1 | Math | Dr. Smith |
| 1 | Science | Dr. Jones |
| 2 | English | Dr. Brown |
| 2 | Art | Dr. White |
Here, Instructor depends only on Course, not on the full primary key (StudentID, Course). To bring it into 2NF, we create a separate table for Courses:
Students Table:
| StudentID | Course |
|---|---|
| 1 | Math |
| 1 | Science |
| 2 | English |
| 2 | Art |
Courses Table:
| Course | Instructor |
|---|---|
| Math | Dr. Smith |
| Science | Dr. Jones |
| English | Dr. Brown |
| Art | Dr. White |
2. Data Abstraction
Data abstraction is the process of hiding the complexities of a database system from users and exposing only essential features. It simplifies user interaction with the system and enhances security by limiting access to specific data.
Levels of Data Abstraction:
- Physical Level: This is the lowest level of abstraction which describes how data is actually stored. It deals with complex low-level data structures in detail.
- Logical Level: This middle level of abstraction describes what data is stored in the database and the relationships among those data. Users and developers often interact with this level.
- View Level: This is the highest level of abstraction which describes only part of the entire database. It allows different users to view the data they need while hiding the rest.
Example:
In a banking system, a clerk might see a customer’s basic details (view level), a database administrator might see the schema design (logical level), and the physical storage details (physical level) are hidden from both.
3. E-R Diagram for Banking Management System
Symbols in E-R Diagram:
- Entity: Represented by a rectangle. An entity is a thing or object in the real world with an independent existence.
- Attribute: Represented by an oval. An attribute is a property of an entity.
- Relationship: Represented by a diamond. It shows the relationship between entities.
- Primary Key: Underlined attribute.
- Composite Attribute: Oval with other ovals connected to it.
- Multi-valued Attribute: Double oval.
- Derived Attribute: Dashed oval.
E-R Diagram for Banking Management System:
+------------------+
| Customer |
+------------------+
| CustomerID (PK) |
| Name |
| Address |
| Phone |
+------------------+
|
|
| +-------------+
+------>| Account |
| +-------------+
| | AccountNo (PK) |
| | Type |
| | Balance |
| +-------------+
|
+------------------+
| Transaction |
+------------------+
| TransactionID (PK) |
| Date |
| Amount |
| Type |
| AccountNo (FK) |
+------------------+
|
|
|
+------------------+
| Branch |
+------------------+
| BranchID (PK) |
| BranchName |
| Location |
+------------------+
|
|
|
+------------------+
| Employee |
+------------------+
| EmployeeID (PK) |
| Name |
| Position |
| BranchID (FK) |
+------------------+
In this E-R diagram:
– Customer and Account have a one-to-many relationship.
– Account and Transaction have a one-to-many relationship.
– Branch and Employee have a one-to-many relationship.
– Each Account is associated with one Branch.
– Each Transaction is linked to one Account.
1. Explain the Stored Procedure
A stored procedure is a precompiled set of one or more SQL statements that are stored on the database server and can be executed as a single unit. Stored procedures are used to encapsulate complex business logic, perform repetitive tasks, and ensure consistent data operations. They can accept input parameters, return output parameters, and provide a return status.
Example:
Here is a simple stored procedure to add a new employee to an Employees table:
CREATE PROCEDURE AddEmployee
@EmployeeID INT,
@Name VARCHAR(100),
@Position VARCHAR(100),
@Salary DECIMAL(10, 2)
AS
BEGIN
INSERT INTO Employees (EmployeeID, Name, Position, Salary)
VALUES (@EmployeeID, @Name, @Position, @Salary);
END;
To call this stored procedure, you would use:
EXEC AddEmployee 1, 'John Doe', 'Developer', 60000.00;
2. Disadvantages of File System
Disadvantages of File System:
- Data Redundancy and Inconsistency:
– In a file system, the same data might be duplicated in multiple files, leading to redundancy. Inconsistent updates across these files can result in data inconsistency. - Limited Security:
– File systems typically offer basic security mechanisms, lacking the granular access control and security features provided by modern database management systems (DBMS). This makes it difficult to protect sensitive data effectively.
These disadvantages highlight why modern systems often prefer using DBMS to manage and maintain data efficiently and securely.
Q-2 (B) Answer the following question (any 3)
1. Difference between E-R Model and Relational Model
The E-R (Entity-Relationship) Model and the Relational Model are two different approaches to designing and representing data structures in a database system.
E-R Model:
– Conceptual Representation: The E-R Model is used for conceptual design of the database. It provides a high-level view of data and the relationships among data.
– Components: The primary components are entities, attributes, and relationships. Entities represent real-world objects or concepts, attributes are properties of entities, and relationships depict how entities are related.
– Diagrammatic Representation: E-R diagrams use specific symbols (rectangles for entities, ovals for attributes, diamonds for relationships) to visually represent data.
– Use Case: It is mainly used in the initial stages of database design to model the data requirements of an organization or application.
Example of E-R Diagram:
+------------------+
| Customer |
+------------------+
| CustomerID (PK) |
| Name |
| Address |
+------------------+
|
|
|
+------------------+
| Account |
+------------------+
| AccountNo (PK) |
| Type |
| Balance |
+------------------+
Relational Model:
– Logical Representation: The Relational Model represents data logically using tables (also called relations). Each table consists of rows and columns.
– Components: The primary components are tables, rows, and columns. Tables represent entity sets, rows represent records, and columns represent attributes.
– Normalization: It emphasizes eliminating redundancy through normalization, ensuring the database structure is efficient and free from anomalies.
– Use Case: It is used for the actual implementation of the database. The design is converted into a set of tables that can be implemented using a relational database management system (RDBMS).
Example of Relational Model:
CREATE TABLE Customer (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
Address VARCHAR(255)
);
CREATE TABLE Account (
AccountNo INT PRIMARY KEY,
Type VARCHAR(50),
Balance DECIMAL(10, 2),
CustomerID INT,
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);
Key Differences:
– Purpose: E-R Model is for high-level conceptual design; Relational Model is for detailed logical implementation.
– Representation: E-R Model uses diagrams with entities, attributes, and relationships; Relational Model uses tables with rows and columns.
– Stage of Use: E-R Model is used in the early design phase; Relational Model is used in the implementation phase.
– Detail: E-R Model is less detailed and abstract; Relational Model is more detailed and concrete.
2. Explain Different Types of Security Levels
Database security involves multiple layers to protect data from unauthorized access and ensure data integrity and confidentiality. The main security levels include:
- Physical Security:
– Description: Protects the hardware and physical infrastructure of the database.
– Measures: Controlled access to server rooms, surveillance cameras, and secure hardware disposal. - Network Security:
– Description: Protects the data as it travels across networks.
– Measures: Firewalls, intrusion detection systems (IDS), virtual private networks (VPN), and secure communication protocols (e.g., SSL/TLS). - Access Control:
– Description: Manages who can access the database and what operations they can perform.
– Measures: Authentication (usernames, passwords, biometrics), authorization (roles, permissions), and auditing (tracking user actions). - Data Security:
– Description: Protects the data itself from unauthorized access and corruption.
– Measures: Encryption (both at rest and in transit), data masking, and backup and recovery strategies. - Application Security:
– Description: Ensures that applications accessing the database are secure.
– Measures: Secure coding practices, regular security testing (e.g., penetration testing, code reviews), and patch management. - Administrative Security:
– Description: Ensures that administrative processes and policies support security.
– Measures: Security policies, regular training for staff, incident response plans, and compliance with regulations (e.g., GDPR, HIPAA).
3. What are the Different Types of Keys in the Database?
Keys are crucial in a relational database for uniquely identifying records and establishing relationships between tables. Here are the main types of keys:
- Primary Key:
– Description: A unique identifier for each record in a table. It must contain unique values and cannot be null.
– Example:CustomerIDin aCustomerstable. - Foreign Key:
– Description: A field in one table that uniquely identifies a row of another table, creating a relationship between the two tables.
– Example:CustomerIDin anOrderstable referencingCustomerIDin theCustomerstable. - Candidate Key:
– Description: A field, or combination of fields, that can uniquely identify a record. A table can have multiple candidate keys, but only one primary key.
– Example:EmailandSSNcan be candidate keys in aUserstable. - Composite Key:
– Description: A primary key consisting of two or more fields to uniquely identify a record.
– Example:OrderIDandProductIDtogether form a composite key in anOrderDetailstable. - Alternate Key:
– Description: A candidate key that is not chosen as the primary key.
– Example: IfEmailis a candidate key butUserIDis the primary key, thenEmailis an alternate key. - Super Key:
– Description: A set of one or more columns (attributes) that can uniquely identify a row in a table. A super key includes the primary key and any additional attributes.
– Example:{CustomerID, Email}in aCustomerstable. - Unique Key:
– Description: Ensures all values in a column or a set of columns are unique. It can contain null values (though typically one null value).
– Example:Usernamein aUserstable, ensuring no two users have the same username.
4. Explain View, Index, and Database Trigger
View:
– Description: A virtual table based on the result-set of a SQL query. It does not store data physically but presents data from one or more tables.
– Usage: Simplifies complex queries, provides a layer of security by restricting access to specific rows/columns, and abstracts underlying table structures.
– Example:
CREATE VIEW ActiveCustomers AS
SELECT CustomerID, Name, Email
FROM Customers
WHERE IsActive = 1;
Index:
– Description: A database object that improves the speed of data retrieval operations on a table at the cost of additional space and slower write operations.
– Types: Primary (unique), secondary (non-unique), clustered (data stored in the index order), and non-clustered (separate from the data storage).
– Usage: Enhances performance for search queries, sorting, and join operations.
– Example:
CREATE INDEX idx_customer_email ON Customers (Email);
Database Trigger:
– Description: A set of SQL statements that automatically execute in response to certain events on a particular table or view, such as INSERT, UPDATE, or DELETE.
– Usage: Enforces business rules, maintains audit trails, synchronizes tables, and enforces referential integrity.
– Example:
CREATE TRIGGER trg_after_insert
ON Orders
AFTER INSERT
AS
BEGIN
INSERT INTO OrderAudit (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM inserted;
END;
These concepts and components play crucial roles in designing, optimizing, and securing a database system, ensuring both performance and reliability.
1. Explain Normalization in Detail
Normalization is a process in database design to organize the columns (attributes) and tables (relations) of a database to minimize redundancy and dependency. The main objective is to divide larger tables into smaller, related tables and link them using relationships. This process helps in improving the data integrity and reducing data anomalies.
Normalization typically involves several normal forms (NF), each with specific rules and objectives. Here’s a detailed explanation of the first three normal forms (1NF, 2NF, and 3NF), which are the most commonly used:
First Normal Form (1NF)
A table is in 1NF if:
- All the values in the table are atomic (indivisible).
- Each column contains only one value for each row (no repeating groups or arrays).
Example:
Consider a table with customer orders that is not in 1NF:
OrderID | CustomerName | ProductIDs
------- | ------------- | ----------
1 | John Doe | 101, 102
2 | Jane Smith | 103, 104, 105
To convert it to 1NF, each product should have its own row:
OrderID | CustomerName | ProductID
------- | ------------- | ----------
1 | John Doe | 101
1 | John Doe | 102
2 | Jane Smith | 103
2 | Jane Smith | 104
2 | Jane Smith | 105
Second Normal Form (2NF)
A table is in 2NF if:
- It is in 1NF.
- All non-key attributes are fully functional dependent on the primary key.
This means removing partial dependencies, where an attribute is dependent on only part of a composite primary key.
Example:
Consider a table with the following structure that is in 1NF but not in 2NF:
OrderID | ProductID | CustomerName | OrderDate
------- | ---------- | ------------- | ----------
1 | 101 | John Doe | 2023-01-01
1 | 102 | John Doe | 2023-01-01
2 | 103 | Jane Smith | 2023-01-02
Here, CustomerName depends only on OrderID and not on the combination of OrderID and ProductID. To convert it to 2NF, create separate tables:
Orders Table:
OrderID | CustomerName | OrderDate
------- | ------------- | ----------
1 | John Doe | 2023-01-01
2 | Jane Smith | 2023-01-02
OrderDetails Table:
OrderID | ProductID
------- | ----------
1 | 101
1 | 102
2 | 103
Third Normal Form (3NF)
A table is in 3NF if:
- It is in 2NF.
- All non-key attributes are non-transitively dependent on the primary key.
This means removing transitive dependencies, where a non-key attribute depends on another non-key attribute.
Example:
Consider a table that is in 2NF but not in 3NF:
OrderID | CustomerID | CustomerName | OrderDate
------- | ---------- | ------------- | ----------
1 | 1001 | John Doe | 2023-01-01
2 | 1002 | Jane Smith | 2023-01-02
Here, CustomerName is dependent on CustomerID, which is not a primary key. To convert it to 3NF, create separate tables:
Orders Table:
OrderID | CustomerID | OrderDate
------- | ---------- | ----------
1 | 1001 | 2023-01-01
2 | 1002 | 2023-01-02
Customers Table:
CustomerID | CustomerName
---------- | -------------
1001 | John Doe
1002 | Jane Smith
Advantages of Normalization:
- Eliminates Data Redundancy: Reduces the duplication of data.
- Improves Data Integrity: Ensures that data is consistent and accurate.
- Enhances Data Organization: Makes it easier to organize data logically.
- Simplifies Maintenance: Facilitates easier data updates and maintenance.
Disadvantages of Normalization:
- Performance Overhead: Can introduce performance overhead due to the increased number of tables and joins.
- Complexity: Can make the database schema more complex and harder to understand for users.
2. Explain the Terms Entity, Entity Type, Entity Attributes, and Entity Set in DBMS
Entity:
An entity is an object or thing in the real world that is distinguishable from other objects. Entities can be anything, such as a person, place, event, or object that has a distinct existence in the context of a database.
Example:
- A student in a university database.
- A product in an inventory system.
Entity Type:
An entity type defines a collection of entities that have the same attributes. It is essentially a template or blueprint for creating entities and is represented by a table in a relational database.
Example:
- The
Studententity type could include attributes such asStudentID,Name,DateOfBirth, andEmail. - The
Productentity type might includeProductID,ProductName,Price, andCategory.
Entity Attributes:
Attributes are the properties or characteristics of an entity. Each attribute represents a specific piece of information about the entity.
Example:
For the Student entity type:
StudentID(unique identifier)NameDateOfBirthEmail
For the Product entity type:
ProductID(unique identifier)ProductNamePriceCategory
Entity Set:
An entity set is a collection of entities of the same type. It represents the actual instances or records stored in the database table for a particular entity type.
Example:
- All students in a university represent an entity set for the
Studententity type. - All products in an inventory represent an entity set for the
Productentity type.
Visual Representation:
Entity-Relationship Diagram (ERD):
+------------------+ +------------------+
| Student | | Product |
+------------------+ +------------------+
| StudentID (PK) | | ProductID (PK) |
| Name | | ProductName |
| DateOfBirth | | Price |
| Email | | Category |
+------------------+ +------------------+
- Entity: An instance of
StudentorProduct. - Entity Type: The structure/template of
StudentorProduct. - Entity Attributes:
StudentID,Name,DateOfBirth,EmailforStudent;ProductID,ProductName,Price,CategoryforProduct. - Entity Set: All records of
StudentorProductstored in the respective tables.
These concepts are fundamental to understanding database design and structure, enabling efficient data organization and retrieval.
