Introduction to Databases and SQL

Posted on Sep 11, 2024 in Computers

Data and Databases

Data

Data is facts and figures which when combined and interpreted can form information. Data is meaningless until it is interpreted and put into context.

Databases

Databases organize and store data. They provide ways to extract data and turn it into information. They are required by any small, medium, or large business.

Different Uses

OLTP – Online Transaction Processes
OLAP – Online Analytical Processes

Different Types

Different types of databases affect access and data modelling.

RDBMS – Relational Database Management System
ODBMS – Object Database Management System

Data is accessed using Structured Query Language (SQL).

Three Sub-sets of SQL

DCL – Data Control Language (User privileges)
DDL – Data Definition Language (Database and table structure)
DML – Data Manipulation Language (Accessing, inserting, modifying, and deleting data)

DBA Activity

Database Administrators are responsible for database security, including granting and revoking user access rights (privileges), redundancy/mirroring (disaster recovery), and ensuring sufficient disk space. They also monitor and improve database performance.

Database Vendors

Popular database vendors include Oracle, Microsoft SQL Server, MySQL (Open Source), Sybase, and PostgreSQL.

Relationships

Relationships are used to link tables together.

Primary Key Column

Uniquely identifies a row within a table
Cannot be null

Foreign Key Column

Links to another column in the same or another table
Usually refers to a Primary Key

Data Modelling

Data modelling is the process of defining and organizing data so that data becomes information.

Identify and define entities. Entities represent the “things” for which we are recording data. Each entity needs a clear definition that everyone understands.
Identify and define attributes. Attributes represent the individual pieces of data that we are recording about the entities. Each attribute needs a clear definition.
Identify and define relationships. The ERD documents the relationships between entities.

Normalization

Normalization can be applied as a process, leading from one normal form to the next. However, if you model your data, thinking about entities and the relationships between them, your design will likely be in 3rd Normal Form.

1NF

There are no duplicate rows.
Each column contains ONE value from the applicable domain; i.e., ATOMIC VALUES.
No repeated columns or groups of columns; i.e., no one-to-many relationships modelled within a single table.

2NF

It is in First Normal Form.
Every non-key attribute is functionally dependent on the entire primary key (and not just part of a composite primary key).

If the primary key consists of only one column, then the table automatically meets 2NF.

3NF

It is in Second Normal Form.
There are no transitive dependencies, i.e., every non-key attribute is functionally dependent on the primary key and not functionally dependent on any other column.

Joins

Joins are used to combine relational data across multiple tables. They allow you to get information from different tables in the same query based on an identified common column.

Inner Join

Gives rows only based on the common columns.

Outer Join

Obtains everything from one table and rows based on the matching columns from the second table.

Left Outer Join
Right Outer Join: Obtains everything from the second table mentioned.
Full Outer Join: Obtains everything from both tables and the corresponding links.

Aliases and ‘||’

For large and complex queries, retyping the table name each time can be bothersome.
Tables, column headings, and Result Sets can be aliased, i.e., a variable can be assigned that represents that table, column heading, or Result Set.
Column contents and strings can be concatenated together using ||.

Self Join

A table is joined to itself, e.g., an employee table containing a list of employees and their corresponding managers who are also employees.

Cartesian Join

Returns a Cartesian product of rows within the tables.
Combines each row in the first table with each row from the second table.
Useful for producing test data to achieve maximum combinations of different rows in several tables.

Views

A view is a stored query. Each time you SELECT from a view, the underlying query is re-run. A view can be queried as if it was a table:

It can be placed in the FROM clause.
It can be aliased and joined to tables.

Aggregate Functions

Aggregate functions act on sets of data.

COUNT – counts the number of entries in the set
SUM – adds all the values of the set
AVG – evaluates the average value of the set
MIN – chooses the minimum value in the set
MAX – chooses the maximum value in the set

Types of column aggregate functions can be used with:

COUNT can be used on any type of column.
SUM & AVG can only be used on columns with a NUMBER data type.
MIN & MAX can be used on NUMBER and DATE columns.

Group By

On its own, an aggregate function will run against a whole column.
GROUP BY allows you to run an aggregate function against subsets of data from the column.

Things to remember:

You must GROUP BY any non-aggregate columns in the SELECT.
You can GROUP BY columns not in the SELECT.
It’s pointless to GROUP BY a primary key column.

Having

Used to filter on aggregated columns where GROUP BY is being used.
Must be used instead of WHERE for these columns.
Cannot use column aliases in the HAVING clause.

Subqueries

Subqueries are queries that appear inside a query’s filter section. A query returns a Result Set which can, in turn, be used in the WHERE or HAVING clause of another query.

In-line Views

In-line views are queries appearing inside the FROM clause.

Create As Select

Pros

Inherits data types.
No need to pre-create a table.

Cons

Constraints not transferred.
No control over table structure.

Ideal for first backups, then INSERT from SELECT.

Truncate

Space is added to a table dynamically in batches.
Deleting records does not recover the space.
TRUNCATE empties the table and recovers the space.
Resets the high-water mark.

Explain Plan

Specific to Oracle SQL Developer, Explain Plan provides an analysis of how the query would be executed. It can evaluate the cost of the query based on processing, memory, read and/or write. It is not an absolute science but one of many indicators to help optimize queries. It makes more sense the more you use it.

Insert From Select

The previous query is not quite good enough. Why? The table may already have some data within it, resulting in either duplication of data within the same table or the query failing due to table constraints.

Sequences

Sequences help solve the problem of generating an Artificial Primary Key. They provide a numbering system synchronized between all transactions.

Correlation

There are 3 parts to any correlated subquery:

Display the trade_id with the (What we’re looking for)
maximum price_total (Aggregate column)
for each share_id (For each column(s)).

Things to notice:

Correlated subqueries always have the same basic structure.
Each of the queries uses a different alias of the same table.
The WHERE clause in the subquery is used to join the two queries together.

NVL

NVL is used to replace null with a fixed value. In the example below, nulls are replaced with zero for brokers who haven’t made a trade.

Commit and Rollback

All DML transactions are subject to commit and rollback.
No DML transaction is saved until one of three things occurs: you exit, the session expires, or you COMMIT.
This has the following effects: other users will not see your DML changes, and you can undo changes you have not committed using ROLLBACK to your last COMMIT.

Set Functions

Set functions create a result set from the results of two separate SELECT statements. The two SELECT statements must have an identical number of columns with the same data types.

Union

Adds two result sets together and returns the distinct values.

Union All

Same result as UNION, except every value is returned, not just distinct ones.

Intersect

Returns only the common rows from the two result sets.

Minus

Removes the rows of the 2nd result set from the 1st result set.