Understanding Database Concepts: A Comprehensive Guide
Understanding Database Concepts
Basic Terminology
Let’s break down some fundamental database terms:
- Relation: A table (e.g.,
Pets
table). - Attribute: A column within a table (e.g.,
Name
). - Domain: The set of possible values for an attribute (e.g., Age: 0-20).
- Tuple: A row in a table (e.g.,
(1, 'Buddy', 'Dog', 3)
). - Degree: The number of attributes (columns) in a table (e.g., 4 in
Pets
). - Cardinality: The number of tuples (rows) in a table (e.g., 10 rows in
Pets
). - Candidate Key: An attribute or set of attributes that uniquely identifies a tuple (e.g.,
CourseID
). - Primary Key: The selected candidate key used as the main unique identifier (e.g.,
CourseID
as the primary key). - Foreign Key: An attribute that references the primary key of another table, establishing a link between them (e.g.,
OwnerID
inPets
referencingClientID
inClients
). - Domain Constraint: Rules that define the valid range of values for an attribute (e.g.,
Age
>= 0). - Entity Integrity: Ensures that the primary key is unique and not NULL.
- Referential Integrity: Ensures that a foreign key value matches a candidate key value in another table or is NULL.
- NULL: Represents missing or unknown data. It’s important to note that NULL is different from zero or blank spaces.
- Views: Virtual tables based on queries, offering customized and secure data access.
Key Types Explained
- Primary Key: Uniquely identifies each record in a table and cannot be NULL.
- Foreign Key: Links tables by referencing a primary key in another table, enforcing referential integrity. It can contain duplicates (multiple rows with the same foreign key value) as it links to the primary key in a different table.
- Composite Key: Formed by combining multiple columns to ensure uniqueness when a single column cannot guarantee it.
- Unique Key: Ensures uniqueness within a column but allows for NULL values.
- Candidate Key: Potential primary keys, each capable of uniquely identifying a record.
- Alternate Key: Candidate keys that were not chosen as the primary key.
- Surrogate Key: An auto-generated, system-defined key with no inherent meaning, often used for simplicity.
- Superkey: Any set of columns that can uniquely identify a record in a table.
Integrity Constraints
Integrity constraints ensure data accuracy and consistency:
- Entity Integrity: Primary key attributes cannot be NULL.
- Referential Integrity: Foreign keys must match a candidate key in the referenced relation or be wholly NULL.
- General Constraints: User-defined rules to enforce specific data requirements.
Views: A Closer Look
Views are virtual tables derived from base tables, offering several benefits:
- Definition: Created by queries on one or more base tables.
- Security: Restrict data access to authorized users.
- Customization: Tailor data presentation for different user roles.
- Simplicity: Simplify complex queries by representing them as virtual tables.
Examples of Views
- COMP 3611: Create a view to show orders with product prices:
CREATE VIEW OrderDetails AS SELECT ORDER.ORD_ID, PRODUCT.PROD_NAME, PRODUCT.PRICE FROM ORDER JOIN ORDER_PRODUCT ON ORDER.ORD_ID = ORDER_PRODUCT.ORD_ID JOIN PRODUCT ON ORDER_PRODUCT.PROD_ID = PRODUCT.PROD_ID;
- Erehwon Clinic: Create a view to list all pets by client:
CREATE VIEW ClientPets AS SELECT Client.Name, Pet.Name FROM Client JOIN Pet ON Client.ClientID = Pet.ClientID;
SQL Essentials
SQL (Structured Query Language) is used for managing data in relational databases. Here are some common SQL statements:
Data Definition Language (DDL)
- Define a column with constraints:
salary DECIMAL(7,2) CHECK (salary BETWEEN 6000.00 AND 40000.00)
- Create a table:
CREATE TABLE Employees ( ID INT PRIMARY KEY, Salary DECIMAL(10, 2) CHECK (Salary BETWEEN 30000 AND 200000) );
- Create a table with comments:
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, -- This creates a primary index Name VARCHAR(100), DepartmentID INT ); -- Database index
Data Manipulation Language (DML)
- Select data:
SELECT COURSE_CODE, COURSE_TITLE, CREDITS FROM COURSE;
- Calculate aggregate values:
SELECT SUM(CREDITS) AS TOTAL_CREDITS FROM COURSE;
- Use subqueries:
SELECT COURSE_CODE, COURSE_TITLE, CREDITS FROM COURSE WHERE CREDITS = (SELECT MAX(CREDITS) FROM COURSE);
- Insert data:
INSERT INTO COURSE (COURSE_CODE, COURSE_TITLE, CREDITS) VALUES ('NEW101', 'New Course Title', 3);
- Update data:
UPDATE course SET CREDITS = 4 WHERE COURSE_CODE = 'EXIST101';
- Delete data:
DELETE FROM COURSE WHERE COURSE_CODE = 'OLD101';
Joins
- Inner Join: Retrieve matching rows from both tables:
SELECT c.ClientID, c.FirstName, c.LastName, p.PetNbr, p.PetName, p.PetType FROM CLIENT c INNER JOIN Pet p ON c.ClientID = p.ClientID;
- Left Outer Join: Retrieve all rows from the left table and matching rows from the right table:
SELECT c.ClientID, c.FirstName, c.LastName, p.PetNbr, p.PetName, p.PetType FROM CLIENT c LEFT OUTER JOIN PET p ON c.ClientID = p.ClientID;
- Right Outer Join: Retrieve all rows from the right table and matching rows from the left table:
SELECT c.ClientID, c.FirstName, c.LastName, p.PetNbr, p.PetName, p.PetType FROM CLIENT c RIGHT OUTER JOIN PET p ON c.ClientID = p.ClientID;
- Full Outer Join: Retrieve all rows from both tables:
SELECT c.ClientID, c.FirstName, c.LastName, p.PetNbr, p.PetName, p.PetType FROM CLIENT c FULL OUTER JOIN PET p ON c.ClientID = p.ClientID;
Subqueries
- IN Subquery:
SELECT C.ClientID, C.FirstName, C.LastName FROM CLIENT C WHERE C.ClientID IN (SELECT P.ClientID FROM PET P WHERE P.PetType = 'Cat');
- NOT IN Subquery:
SELECT C.ClientID, C.FirstName, C.LastName FROM CLIENT C WHERE C.ClientID NOT IN (SELECT P.ClientID FROM PET P WHERE P.PetType = 'Dog');
- Correlated Subquery:
SELECT C.ClientID, C.FirstName, C.LastName FROM CLIENT C WHERE C.ClientID IN (SELECT P.ClientID FROM PET P GROUP BY P.ClientID HAVING COUNT(*) > 2);
- Subquery with Join:
SELECT C.ClientID, C.FirstName, C.LastName FROM CLIENT C WHERE C.ClientID IN ( SELECT P.ClientID FROM PET P INNER JOIN ( SELECT PetType, AVG(PetNbr) AS AvgPetNbr FROM PET GROUP BY PetType ) AS AvgPetNbr ON P.PetType = AvgPetNbr.PetType AND P.PetNbr > AvgPetNbr.AvgPetNbr );
- Subquery with Aggregate Function:
SELECT C.ClientID, C.FirstName, C.LastName FROM CLIENT C WHERE C.ClientID IN ( SELECT P.ClientID FROM PET P GROUP BY P.ClientID HAVING SUM(P.PetNbr) > (SELECT AVG(SUM(PetNbr)) FROM PET GROUP BY ClientID) );
Triggers in SQL
Triggers are special stored procedures that automatically execute when specific events occur in the database.
Examples of Triggers
TRIGGER_CLIENT_ID_SEQ
: Generates a uniqueClientID
sequence number for each new client inserted:CREATE TRIGGER TRIGGER_CLIENT_ID_SEQ BEFORE INSERT ON CLIENT FOR EACH ROW BEGIN SELECT CLIENT_ID_SEQ.NEXTVAL INTO :NEW.ClientID FROM DUAL; END;
TRIGGER_CLIENT_NAME_CHECK
: Checks if theName
column is not null and has a minimum length of 2 characters:CREATE TRIGGER TRIGGER_CLIENT_NAME_CHECK BEFORE INSERT OR UPDATE ON CLIENT FOR EACH ROW BEGIN IF :NEW.Name IS NULL OR LENGTH(:NEW.Name) < 2 THEN -- Raise an error or handle the situation accordingly END IF; END;
TRIGGER_PET_ID_SEQ
: Generates a uniquePetNbr
sequence number for each new pet inserted, starting from 1 for each client:CREATE TRIGGER TRIGGER_PET_ID_SEQ BEFORE INSERT ON PET FOR EACH ROW BEGIN SELECT PET_ID_SEQ.NEXTVAL INTO :NEW.PetNbr FROM DUAL; :NEW.PetNbr := :NEW.PetNbr + (SELECT MAX(PetNbr) FROM PET WHERE ClientID = :NEW.ClientID); END;
Database Analysis & Design
The Software Crisis and the Rise of Structured Development
The”software crisi” emerged from the challenges of managing complex software projects, leading to high costs and frequent failures. Key contributing factors included:
- Lack of clear requirements specifications.
- Absence of standardized development methodologies.
- Poor decomposition of design into manageable modules.
These issues paved the way for structured development approaches, such as the Information System Lifecycle (ISLC), which emphasizes a systematic and phased approach to software development. Databases are integral components of information systems, and their design and implementation benefit from a structured lifecycle.
Database System Development Lifecycle (DSDLC)
The DSDLC outlines the stages involved in designing, developing, and maintaining a database system. The key phases include:
- Database Planning: Align database development with the organization’s overall IT strategy and mission statement. Define scope, objectives, and standards for data collection, formatting, and documentation.
- System Definition: Clearly define the scope and boundaries of the database system. Identify user views, which represent the specific needs and requirements of different user groups. For instance, a pet manager’s view might differ from a veterinarian’s view.
- Requirements Collection and Analysis: Gather detailed information about the data used and generated by the organization. This includes understanding how data is currently managed, identifying any data-related issues, and defining the requirements for the new database system. Fact-finding techniques, such as interviews, questionnaires, document analysis, and observation, are crucial in this phase.
- Database Design: Create a blueprint for the database that supports the mission statement, objectives, and user requirements. This involves choosing a suitable database model (e.g., relational, object-oriented) and designing the database schema, which defines the tables, attributes, relationships, and constraints. Database design can be approached using top-down, bottom-up, inside-out, or mixed strategies.
- DBMS Selection (Optional): Choose a Database Management System (DBMS) that meets the technical and functional requirements of the system. This step is typically performed before logical design if sufficient information about system requirements is available. Factors to consider include scalability, compatibility, vendor support, security features, and cost.
- Application Design: Design the user interface and application programs that will interact with the database. This includes developing forms, reports, and data entry screens.
- Prototyping (Optional): Build a working model of the database system to allow users and designers to visualize and provide feedback on the proposed system. Prototyping can be either requirements-driven (discarded after feedback) or evolutionary (integrated into the final system).
- Implementation: Create the physical database and load it with data. This involves using Data Definition Language (DDL) to create tables, indexes, and other database objects. Data conversion and migration from existing systems may also be required.
- Testing: Thoroughly test the database system to identify and correct errors. Testing should cover various aspects, including functionality, performance, security, and data integrity.
- Operational Maintenance: Once the system is deployed, it requires ongoing monitoring, maintenance, and support. This includes tasks such as performance tuning, backup and recovery, and implementing changes or enhancements based on user feedback or evolving business needs.
Fact-Finding Techniques
Effective fact-finding is essential for gathering accurate and comprehensive information during the requirements analysis phase. Common fact-finding techniques include:
- Interviews: Conduct structured or unstructured interviews with stakeholders to gather their perspectives and requirements.
- Questionnaires: Distribute questionnaires to a wider audience to collect data and opinions.
- Document Analysis: Review existing documentation, such as reports, forms, and data dictionaries, to understand current processes and data structures.
- Observation: Observe users interacting with the current system to identify pain points and areas for improvement.
- Prototyping: Develop prototypes to gather feedback on specific features or functionalities.
Data Modeling
Data modeling is a crucial aspect of database design, representing the structure and relationships of data. The Entity-Relationship Model (ERM) is a widely used conceptual data model, often depicted using Entity-Relationship Diagrams (ERDs).
- Entities: Represent real-world objects or concepts, such as clients, pets, or products.
- Attributes: Represent the properties or characteristics of entities, such as client name, pet type, or product price.
- Relationships: Represent associations between entities, such as a client owning a pet or an order containing products.
Enhanced Entity-Relationship Model (EERM)
is introduced as a way of representing additional semantic concepts. These include: specialization/generalization, aggregation, and composition.
Specify and use fact-finding techniques for the DSDLC.Fact-Finding Techniques: These techniques help gather necessary information to design the database system. Interviews: Interview the clinic’s staff to understand what information they need about clients and pets.Questionnaires:Distribute questionnaires to all employees to gather input on what features they need in the new system.Document Analysis:Review existing paper records of clients and pets to understand what data is currently being tracked.Observation:Observe how the staff interacts with the current system or manual processes to identify inefficiencies.Prototyping: Create a prototype of the database to get feedback from users on whether it meets their needs. Method for desciding central or view intrgrstion approach or combo = examine overlap in data between user view.
Missions Statement for DB: The purposeof the DB is to support the vert clinic and maintain and manage data between__ and the __ and facilitate communication between ____. Mission objectives: to main data on xyz, to perofrm searches on xyz. User requirements = detail opf data being held and how its used. System requirements:featues features.
DATA MODELS: ER model: Entities = Client and Pet. Attributes Client=ClientId, Name, Deductive. Pet: PetNbr,PetName,PetType,ClientID(FK). Relationships: 1:M Clients can have one to many pets. M:1 A pet belongs to one client many to one. CUSTOMER: Customers, Products, Orders, Order_product. Order_Product ord_id (fk), and produc_id(fk)
EER DIAGRAMS: Enhanced entity relationship diagrams used to overcome limitations of ER diagrams. More semantic concepts for complex systems that need specialization/generalization/inheritance/aggregation and composition. Superclass: entity type of one or more distinct subclass eg STAFF. Subclass are distinct subgorups of entity types that can be found in data model. Eg types of staff which are a subclass of the superclass allows subclasses to posses all attributes of superclas avoid diescribing similar concepts more than once and save time and improve readability. Entity and sublclass = a type hierarchy type hierarchies are known by diff names like specialization hierarvchy or generalization hierarchy. A shared subclas has more than one superclass which is multiple inerhitance. Specialization process maximizes the differencebetween members of an enetity by diff features top down approcah to defining set of superclasses and related subclasses. Generalization process= minimize differences in entities with common features. reverse specialization. Types of constriants for Specialization/Generalization: 1. Participation constraint: whether every member in superclass must participate as a member of subclass (mandadatory or optional {Optional, OR}. eg typesof staff dont have to havea job role. Disjoiint constraints: whether superclass has more than one subclass, disjoint subclasses can use OR next to participation constrain. If a subclass generation not disjoint = non disjoint
Aggregation: represents a HAS A or IS PART OF relationship between entity ahnd type one is whole and other is part. Composition: a type of aggregation that is an association between entities and strong ownership between whole and part