Database Design and SQL Concepts

Database Concepts

Relation: Table (e.g., Pets table)

Attribute: Column (e.g., Name)

Domain: Set of possible values (e.g., Age: 0-20)

Tuple: Row (e.g., (1, 'Buddy', 'Dog', 3))

Degree: Number of attributes (e.g., 4 in Pets)

Cardinality: Number of tuples (e.g., 10 rows in Pets)

Candidate Key: Unique identifier (e.g., CourseID)

Primary Key: Selected unique identifier (e.g., CourseID as primary key)

Foreign Key: Reference to primary key in another table (e.g., OwnerID in Pets)

Domain Constraint: Valid value range (e.g., Age >= 0)

Entity Integrity: Primary key cannot be NULL

Referential Integrity: Foreign key must match primary key or be NULL

NULL: Missing or unknown data

Views: Virtual tables with custom views for security, customization, and simplification

Key Types

Primary Key: Unique + not null.

Foreign Key: Links tables, enforces referential integrity, allows duplicates, links to a primary key in a different table.

Composite Key: Multiple columns combined for uniqueness.

Unique Key: Ensures uniqueness, can be null.

Candidate Key: Potential primary keys, unique identifiers.

Alternate Key: Candidate keys that are not the primary key.

Surrogate Key: Auto-generated/incrementing, no inherent meaning.

Superkey: Set of all columns that can uniquely identify a record.

Integrity Constraints

Null: Represents an unknown or not applicable value. Different from zero or blank spaces.

Entity Integrity: Primary key attributes cannot be Null.

Referential Integrity: Foreign key must match a candidate key in its home relation or be wholly Null.

General Constraints: Additional rules set by users/admins to define or constrain database aspects.

Views

Base Relation: Named relation corresponding to an entity in the conceptual schema, physically stored in the database.

View: Dynamic result of operations on base relations. Produced on request, at query time. Defined by a query on one or more base relations.

Purpose of Views

Definition: Virtual tables created by queries on one or more base tables.

Purpose:

  • Security: Restrict access to specific data.
  • Customization: Present data differently for various users.
  • Simplicity: Simplify complex queries.

Examples

COMP 3611: Create a view to show ORDER with product prices:

CREATE VIEW OrderDetails AS
SELECT ORDER.ORD_ID, PRODUCT.PROD_NAME, PRODUCT.PRICE
FROM ORDER
JOIN ORDER_PRODUCT ON ORDER.ORD_ID = ORDER_PRODUCT.ORD_ID
JOIN PRODUCT ON ORDER_PRODUCT.PROD_ID = PRODUCT.PROD_ID;

Erehwon Clinic: Create a view to list all pets by client:

CREATE VIEW ClientPets AS
SELECT Client.Name, Pet.Name
FROM Client
JOIN Pet ON Client.ClientID = Pet.ClientID;

SQL

salary DECIMAL(7,2) CHECK (salary BETWEEN 6000.00 AND 40000.00)
CREATE TABLE Employees (
    ID INT PRIMARY KEY,
    Salary DECIMAL(10, 2) CHECK (Salary BETWEEN 30000 AND 200000)
);
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,  -- This creates a primary index
    Name VARCHAR(100),
    DepartmentID INT
); // Database index

Write a query to list all courses with titles and credits:

SELECT COURSE_CODE, COURSE_TITLE, CREDITS
FROM COURSE;

Total Credits for all courses:

SELECT SUM(CREDITS) AS TOTAL_CREDITS
FROM COURSE;

Write an SQL subquery to find the course with the highest number of credits:

SELECT COURSE_CODE, COURSE_TITLE, CREDITS
FROM COURSE
WHERE CREDITS = (SELECT MAX(CREDITS) FROM COURSE);

Insert a new course into the table:

INSERT INTO COURSE (COURSE_CODE, COURSE_TITLE, CREDITS)
VALUES ('NEW101', 'New Course Title', 3);

Update credits for a course:

UPDATE course
SET CREDITS = 4
WHERE COURSE_CODE = 'EXIST101';

Delete a course from the table:

DELETE FROM COURSE
WHERE COURSE_CODE = 'OLD101';

Inner Join CLIENT & PET tables to combine rows from two or more tables, return only rows that match in both tables (same ClientID in both), filter rows that don’t match:

SELECT c.ClientID, c.FirstName, C.LastName, p.PetNbr, p.PetName, P.PetType
FROM CLIENT c
INNER JOIN Pet p ON c.ClientID = p.ClientID;

Left Outer Join: Combine rows from two or more tables, return rows from the Client (left) table even if there is no match in the right table (Pet), puts NULL for no match:

SELECT C.ClientID, C.FirstName, C.LastName, P.PetNbr, P.PetName, P.PetType
FROM CLIENT C
LEFT OUTER JOIN PET P ON C.ClientID = P.ClientID;

Right Outer Join: Combine rows from two tables, return rows from the right (Pet) table even if there is no match in the left (Client) table:

SELECT C.ClientID, C.FirstName, C.LastName, P.PetNbr, P.PetName, P.PetType
FROM CLIENT C
RIGHT OUTER JOIN PET P ON C.ClientID = P.ClientID;

Full Outer Join: Combine rows from two or more tables, return all rows from both even if there is no match (all nulls):

SELECT C.ClientID, C.FirstName, C.LastName, P.PetNbr, P.PetName, P.PetType
FROM CLIENT C
FULL OUTER JOIN PET P ON C.ClientID = P.ClientID;

Subquery to find all clients who have a pet of type ‘Cat’:

SELECT C.ClientID, C.FirstName, C.LastName
FROM CLIENT C
WHERE C.ClientID IN (SELECT P.ClientID FROM PET P WHERE P.PetType = 'Cat');

Not In Subquery: Check if values do not exist (Clients who don’t have a dog):

SELECT C.ClientID, C.FirstName, C.LastName
FROM CLIENT C
WHERE C.ClientID NOT IN (SELECT P.ClientID FROM PET P WHERE P.PetType = 'Dog');

Correlated Subquery for aggregate operations and use the result in the outer query (find all clients who have more than 2 pets):

-- Example Placeholder Query -- 

Subquery with Join: Find all clients who have a pet with a higher pet number than the average pet number for their pet type:

SELECT C.ClientID, C.FirstName, C.LastName
FROM CLIENT C
WHERE C.ClientID IN (
    SELECT P.ClientID
    FROM PET P
    INNER JOIN (
        SELECT PetType, AVG(PetNbr) AS AvgPetNbr
        FROM PET
        GROUP BY PetType
    ) AS AvgPetNbr ON P.PetType = AvgPetNbr.PetType AND P.PetNbr > AvgPetNbr.AvgPetNbr
);

Subquery with Aggregate function and use the result in the outer query (find all clients who have a total pet number greater than the average total pet number for all clients):

SELECT C.ClientID, C.FirstName, C.LastName
FROM CLIENT C
WHERE C.ClientID IN (
    SELECT P.ClientID
    FROM PET P
    GROUP BY P.ClientID
    HAVING SUM(P.PetNbr) > (SELECT AVG(SUM(PetNbr)) FROM PET GROUP BY ClientID)
);

Triggers

TRIGGER_CLIENT_ID_SEQ: Generates a unique ClientID sequence number for each new client inserted.

CREATE TRIGGER TRIGGER_CLIENT_ID_SEQ
BEFORE INSERT ON CLIENT
FOR EACH ROW
BEGIN
    SELECT CLIENT_ID_SEQ.NEXTVAL INTO :NEW.ClientID FROM DUAL;
END;

TRIGGER_CLIENT_NAME_CHECK: Checks if the Name column is not null and has a minimum length of 2 characters.

CREATE TRIGGER TRIGGER_CLIENT_NAME_CHECK
BEFORE INSERT OR UPDATE ON CLIENT
FOR EACH ROW
BEGIN
    IF :NEW.Name IS NULL OR LENGTH(:NEW.Name) < 2 THEN
        -- Handle the error (e.g., raise an exception)
    END IF;
END;

TRIGGER_PET_ID_SEQ: Generates a unique PetNbr sequence number for each new pet inserted, starting from 1 for each client.

CREATE TRIGGER TRIGGER_PET_ID_SEQ
BEFORE INSERT ON PET
FOR EACH ROW
BEGIN
    SELECT PET_ID_SEQ.NEXTVAL INTO :NEW.PetNbr FROM DUAL;
    :NEW.PetNbr := :NEW.PetNbr + (SELECT MAX(PetNbr) FROM PET WHERE ClientID = :NEW.ClientID);
END;

CLIENT_ID_SEQ: Generates a unique sequence number for each new client inserted.

CREATE SEQUENCE CLIENT_ID_SEQ START WITH 1 INCREMENT BY 1 NOCACHE;

Database Analysis & Design

Software Crisis/Depression

High maintenance costs and failures include:

  1. Lack of requirement specifications
  2. Lack of development methods
  3. Poor decomposition of design

This gave way to a structured approach to development called the Information System Lifecycle (ISLC). The database is a component viewed from a wider perspective.

Database System Development Lifecycle (DSDLC)

  1. Database Planning
  2. System Definition
  3. Requirement Collection and Analysis
  4. Database Design
  5. DBMS Selection (optional)
  6. Application Design
  7. Prototyping (optional)
  8. Implementation
  9. Data Conversion and Loading
  10. Testing
  11. Operational Maintenance

1. Database Planning

  • Management activities affecting the SDLC
  • Integrated with the IS strategy/mission statement
  • Define aims/clear path/mission objectives per task
  • Standards for collecting data/formatting/documentation/design and implementation
  • Scope/Objectives/Requirements

2. System Definition

  • Scope and boundaries of the database system
  • User view defines the requirements of the database for management and supervision
  • One or more user views (don’t leave out users)
  • Breakdown requirements using views (e.g., Pet Manager System Environment/Boundaries)

3. Requirement Collection and Analysis

  • Information about parts of the organization that use the database
  • Use for requirements of the new system
  • Fact-finding: Gather information for views, descriptions of data use or generation, details of how data is used or generated, additional requirements for the new database. Information used to identify requirements and how to manage requirements for the database.
  • For databases with many views:
  1. Centralized approach
  2. View integration approach
  3. Combination of approaches

Centralized Approach: Requirements for user views are merged into a single set of requirements. The data model is created representing all user views during the database design stage.

View Integration: Requirements for each user are separate lists. Data models for each view are created and merged later in the database design. The data model for a single user view or a subset of all views is a local data model. Each model includes diagrams and documentation describing requirements, but not all user views. Local data models are merged later in the design to produce a global data model that incorporates all views.

4. Database Design

  • Create a design that supports the mission statement, objectives, and requirements (user and system requirements).
  • Approaches (4 types):
    1. Top-down
    2. Bottom-up
    3. Inside-out
    4. Mixed
  • Main purpose of data modeling: To understand the meaning/semantics of data, facilitate communication about requirements, and build a data model that answers questions about entities, relations, and attributes.
  • Data models are used to understand three things:
    1. User perspective of data
    2. Nature of data independent of physical representation
    3. Use of data across views

Criteria for an optimal data model: Structural validity, simplicity, expressibility, non-redundancy, shareability, extensibility, integrity, diagrammatic representation.

3 Database Design Phases:

  1. Conceptual
  2. Logical
  3. Physical

1. Conceptual Database Design: Construct a model of data independent of physical considerations. The data model is built using information in user requirement specifications and is the source of logical information.

2. Logical Database Design: Construct a model of the data model (e.g., relational) independent of the DBMS and physical considerations. The refined conceptual model is mapped onto the logical model.

3. Physical Database Design: Produce a description of the database implementation on secondary storage. This describes base relations, file organization, indexes for accessing data, integrity and security steps, and is specific to the DBMS.

Logical/Conceptual Database Design: External schema and conceptual schema.

Physical Database: Internal schema and physical storage.

DBMS Selection

Select a DBMS to support the system. This is done before logical design if there is sufficient information about system requirements.

Steps for DBMS Selection:

  1. Define the terms of reference for the study.
  2. Shortlist 2-3 products.
  3. Evaluate products.
  4. Recommend a selection and produce a report.

Database Design Types:

  1. Bottom-up: Fundamental attributes, entities, and relations are grouped (e.g., normalization). Suitable for simple databases with a small number of attributes, but not for complex systems.
  2. Top-down: Suitable for complex systems. Starts with data models with a few high-level entities and relations and uses a top-down approach to identify low-level entities. Uses an ER model.
  3. Inside-out: Similar to bottom-up, but identifies major entity sets and branches out.
  4. Mixed: Combines bottom-up and top-down strategies before combining.

DBMS Selection Factors: Scalability, compatibility, support, security, cost, open source vs. commercial (e.g., Oracle), functionality.

5. Application Design Phase

  • Design the user interface and application programs that use and process the database (e.g., user GUI).

6. Prototyping (Optional)

  • Building a working data model of the database system allows users and designers to visualize the final system.
  • Two types of prototypes:
    1. Requirements prototype (discarded)
    2. Evolutionary prototype (included in the system)

7. Implementation

  • Create physical database definitions and application programs using DDL.

8. Data Conversion and Loading

  • Load data from the old system and, when possible, convert old existing applications to run on the new database.

9. Testing

  • Test for faults and errors.
  • Validate against user requirements.
  • Evaluate learnability, robustness, performance, recoverability, and adaptability.

10. Operational Maintenance

  • The system is fully implemented.
  • Continuously monitor and maintain the system.
  • Incorporate new requirements through the SDLC.

Fact-Finding

Fact-finding involves examining documentation, conducting interviews, observing the enterprise in operation, conducting research, and using questionnaires.

Mission Statement for the Database

The purpose of the database is to support the veterinary clinic, maintain and manage data between [entity/department] and [entity/department], and facilitate communication between [stakeholders].

Mission Objectives:

  • To maintain data on [specific data elements].
  • To perform searches on [specific data elements].

User Requirements: Detailed information about the data being held and how it is used.

System Requirements: Specific features and functionalities required.

Data Models

ER Model

  • Entities: Client and Pet
  • Attributes:
    • Client: ClientID (PK), Name, Deductible
    • Pet: PetNbr (PK), PetName, PetType, ClientID (FK)
  • Relationships:
    • 1:M (Clients can have one to many pets)
    • M:1 (A pet belongs to one client – many to one)

Example: Customer Database

  • Entities: Customers, Products, Orders, Order_Product
  • Attributes:
    • Order_Product: Ord_ID (FK), Product_ID (FK)

EER Diagrams

Enhanced entity-relationship diagrams (EERDs) are used to overcome the limitations of ER diagrams. They introduce more semantic concepts for complex systems that need specialization/generalization/inheritance/aggregation and composition.

  • Superclass: An entity type that has one or more distinct subclasses (e.g., STAFF).
  • Subclass: Distinct subgroups of entity types that can be found in the data model (e.g., types of staff, which are a subclass of the superclass STAFF). Subclasses inherit all attributes of their superclass, avoiding describing similar concepts multiple times, saving time, and improving readability.
  • Type Hierarchy: The relationship between an entity and its subclass is a type hierarchy. Type hierarchies are also known as specialization hierarchies or generalization hierarchies.
  • Shared Subclass: A subclass that has more than one superclass, representing multiple inheritance.
  • Specialization: A process that maximizes the differences between members of an entity by identifying distinct features. It is a top-down approach to defining a set of superclasses and related subclasses.
  • Generalization: A process that minimizes differences in entities with common features. It is the reverse of specialization.

Types of Constraints for Specialization/Generalization:

  1. Participation Constraint: Specifies whether every member in the superclass must participate as a member of a subclass (mandatory or optional). For example, types of staff don’t have to have a job role. Optional participation is represented as {Optional, OR}.
  2. Disjoint Constraints: Specifies whether a superclass can have more than one subclass. Disjoint subclasses can use”O” next to the participation constraint. If a subclass generation is not disjoint, it is considered non-disjoint.

Aggregation and Composition

  • Aggregation: Represents a”HAS ” or”IS PART O” relationship between entities, where one entity is the whole and the other is the part.
  • Composition: A type of aggregation that represents strong ownership between the whole and the part.

Fact-Finding Techniques for the DSDLC

These techniques help gather the necessary information to design the database system.

  • Interviews: Interview the clinic’s staff to understand what information they need about clients and pets.
  • Questionnaires: Distribute questionnaires to all employees to gather input on what features they need in the new system.
  • Document Analysis: Review existing paper records of clients and pets to understand what data is currently being tracked.
  • Observation: Observe how the staff interacts with the current system or manual processes to identify inefficiencies.
  • Prototyping: Create a prototype of the database to get feedback from users on whether it meets their needs.

Method for deciding between a centralized or view integration approach (or a combination): Examine the overlap in data between user views.

Database Normalization

  • Identify required attributes: Determine the necessary data elements.
  • Aggregation into relations: Group attributes into normalized relations based on functional dependencies.
  • Identify keys: Determine candidate keys and primary keys for unique identification.
  • Primary Key: The selected unique identifier for a relation.
  • Alternate Keys: Candidate keys that are not chosen as the primary key.

Multivalued and Composite Attributes

  • Multivalued Attribute: An attribute that can have more than one value for a single entity instance (e.g., phone numbers).
  • Composite Attribute: An attribute composed of multiple sub-attributes (e.g., address: street number, street name, city, zip code).

Normalization Forms

  • 1NF (First Normal Form): Eliminate repeating groups and ensure atomic values.
  • 2NF (Second Normal Form): Remove partial dependencies (dependencies on part of a composite key).
  • 3NF (Third Normal Form): Remove transitive dependencies (non-key attributes dependent on other non-key attributes).
  • BCNF (Boyce-Codd Normal Form): Further refine 3NF to remove remaining anomalies.

Functional Dependencies

A relationship where the value of one attribute (the determinant) determines the value of another attribute (the dependent). For example, StudentID -> StudentName.

Entity-Relationship Model (ERM)

A graphical representation of entities, their attributes, and the relationships between them. ER models are typically represented as Entity-Relationship Diagrams (ERDs).

Enhanced Entity-Relationship Model (EERM)

is introduced as a way of representing additional semantic concepts. These include: specialization/generalization, aggregation, and composition.  

Specify and use fact-finding techniques for the DSDLC.Fact-Finding Techniques: These techniques help gather necessary information to design the database system.  Interviews: Interview the clinic’s staff to understand what information they need about clients and pets.Questionnaires:Distribute questionnaires to all employees to gather input on what features they need in the new system.Document Analysis:Review existing paper records of clients and pets to understand what data is currently being tracked.Observation:Observe how the staff interacts with the current system or manual processes to identify inefficiencies.Prototyping: Create a prototype of the database to get feedback from users on whether it meets their needs. Method for desciding central or view intrgrstion approach or combo = examine overlap in data between user view.

Missions Statement for DB: The purposeof the DB is to support the vert clinic and maintain and manage data between__ and the __ and facilitate communication between ____. Mission objectives: to main data on xyz, to perofrm searches on xyz. User requirements = detail opf data being held and how its used. System requirements:featues features.

DATA MODELS: ER model: Entities = Client and Pet. Attributes Client=ClientId, Name, Deductive. Pet: PetNbr,PetName,PetType,ClientID(FK). Relationships: 1:M Clients can have one to many pets. M:1 A pet belongs to one client many to one. CUSTOMER:  Customers, Products, Orders, Order_product. Order_Product ord_id (fk), and produc_id(fk)

EER DIAGRAMS: Enhanced entity relationship diagrams used to overcome limitations of ER diagrams. More semantic concepts for complex systems that need specialization/generalization/inheritance/aggregation and composition. Superclass: entity type of one or more distinct subclass eg STAFF. Subclass are distinct subgorups of entity types that can be found in data model. Eg types of staff which are a subclass of the superclass allows subclasses to posses all attributes of superclas avoid diescribing similar concepts more than once and save time and improve readability. Entity and sublclass = a type hierarchy type hierarchies are known by diff names like specialization hierarvchy or generalization hierarchy. A shared subclas has more than one superclass which is multiple inerhitance. Specialization process maximizes the differencebetween members of an enetity by diff features top down approcah to defining set of superclasses and related subclasses. Generalization process= minimize differences in entities with common features. reverse specialization. Types of constriants for Specialization/Generalization: 1. Participation constraint: whether every member in superclass must participate as a member of subclass (mandadatory or optional {Optional, OR}. eg typesof staff dont have to havea job role. Disjoiint constraints: whether superclass has more than one subclass, disjoint subclasses can use OR next to participation constrain. If a subclass generation not disjoint = non disjoint

Aggregation: represents a HAS A or IS PART OF relationship between entity ahnd type one is whole and other is part. Composition: a type of aggregation that is an association between entities and strong ownership between whole and part

LmscWAMIJuAAAAAElFTkSuQmCC