Introduction to Databases and SQL
Data and Databases
Data
Data is facts and figures which when combined and interpreted can form information. Data is meaningless until it is interpreted and put into context.
Databases
Databases organize and store data. They provide ways to extract data and turn it into information. They are required by any small, medium, or large business.
Different Uses
- OLTP – Online Transaction Processes
- OLAP – Online Analytical Processes
Different Types
Different types of databases affect access and data modelling.
- RDBMS – Relational Database Management System
- ODBMS – Object Database Management System
Data is accessed using Structured Query Language (SQL).
Three Sub-sets of SQL
- DCL – Data Control Language (User privileges)
- DDL – Data Definition Language (Database and table structure)
- DML – Data Manipulation Language (Accessing, inserting, modifying, and deleting data)
DBA Activity
Database Administrators are responsible for database security, including granting and revoking user access rights (privileges), redundancy/mirroring (disaster recovery), and ensuring sufficient disk space. They also monitor and improve database performance.
Database Vendors
Popular database vendors include Oracle, Microsoft SQL Server, MySQL (Open Source), Sybase, and PostgreSQL.
Relationships
Relationships are used to link tables together.
Primary Key Column
- Uniquely identifies a row within a table
- Cannot be null
Foreign Key Column
- Links to another column in the same or another table
- Usually refers to a Primary Key
Data Modelling
Data modelling is the process of defining and organizing data so that data becomes information.
- Identify and define entities. Entities represent the “things” for which we are recording data. Each entity needs a clear definition that everyone understands.
- Identify and define attributes. Attributes represent the individual pieces of data that we are recording about the entities. Each attribute needs a clear definition.
- Identify and define relationships. The ERD documents the relationships between entities.
Normalization
Normalization can be applied as a process, leading from one normal form to the next. However, if you model your data, thinking about entities and the relationships between them, your design will likely be in 3rd Normal Form.
1NF
- There are no duplicate rows.
- Each column contains ONE value from the applicable domain; i.e., ATOMIC VALUES.
- No repeated columns or groups of columns; i.e., no one-to-many relationships modelled within a single table.
2NF
- It is in First Normal Form.
- Every non-key attribute is functionally dependent on the entire primary key (and not just part of a composite primary key).
If the primary key consists of only one column, then the table automatically meets 2NF.
3NF
- It is in Second Normal Form.
- There are no transitive dependencies, i.e., every non-key attribute is functionally dependent on the primary key and not functionally dependent on any other column.
Joins
Joins are used to combine relational data across multiple tables. They allow you to get information from different tables in the same query based on an identified common column.
Inner Join
Gives rows only based on the common columns.
Outer Join
Obtains everything from one table and rows based on the matching columns from the second table.
- Left Outer Join
- Right Outer Join: Obtains everything from the second table mentioned.
- Full Outer Join: Obtains everything from both tables and the corresponding links.
Aliases and ‘||’
- For large and complex queries, retyping the table name each time can be bothersome.
- Tables, column headings, and Result Sets can be aliased, i.e., a variable can be assigned that represents that table, column heading, or Result Set.
- Column contents and strings can be concatenated together using ||.
Self Join
A table is joined to itself, e.g., an employee table containing a list of employees and their corresponding managers who are also employees.
Cartesian Join
- Returns a Cartesian product of rows within the tables.
- Combines each row in the first table with each row from the second table.
- Useful for producing test data to achieve maximum combinations of different rows in several tables.
Views
A view is a stored query. Each time you SELECT from a view, the underlying query is re-run. A view can be queried as if it was a table:
- It can be placed in the FROM clause.
- It can be aliased and joined to tables.
Aggregate Functions
Aggregate functions act on sets of data.
- COUNT – counts the number of entries in the set
- SUM – adds all the values of the set
- AVG – evaluates the average value of the set
- MIN – chooses the minimum value in the set
- MAX – chooses the maximum value in the set
Types of column aggregate functions can be used with:
- COUNT can be used on any type of column.
- SUM & AVG can only be used on columns with a NUMBER data type.
- MIN & MAX can be used on NUMBER and DATE columns.
Group By
- On its own, an aggregate function will run against a whole column.
- GROUP BY allows you to run an aggregate function against subsets of data from the column.
Things to remember:
- You must GROUP BY any non-aggregate columns in the SELECT.
- You can GROUP BY columns not in the SELECT.
- It’s pointless to GROUP BY a primary key column.
Having
- Used to filter on aggregated columns where GROUP BY is being used.
- Must be used instead of WHERE for these columns.
- Cannot use column aliases in the HAVING clause.
Subqueries
Subqueries are queries that appear inside a query’s filter section. A query returns a Result Set which can, in turn, be used in the WHERE or HAVING clause of another query.
In-line Views
In-line views are queries appearing inside the FROM clause.
Create As Select
Pros
- Inherits data types.
- No need to pre-create a table.
Cons
- Constraints not transferred.
- No control over table structure.
Ideal for first backups, then INSERT from SELECT.
Truncate
- Space is added to a table dynamically in batches.
- Deleting records does not recover the space.
- TRUNCATE empties the table and recovers the space.
- Resets the high-water mark.
Explain Plan
Specific to Oracle SQL Developer, Explain Plan provides an analysis of how the query would be executed. It can evaluate the cost of the query based on processing, memory, read and/or write. It is not an absolute science but one of many indicators to help optimize queries. It makes more sense the more you use it.
Insert From Select
The previous query is not quite good enough. Why? The table may already have some data within it, resulting in either duplication of data within the same table or the query failing due to table constraints.
Sequences
Sequences help solve the problem of generating an Artificial Primary Key. They provide a numbering system synchronized between all transactions.
Correlation
There are 3 parts to any correlated subquery:
- Display the trade_id with the (What we’re looking for)
- maximum price_total (Aggregate column)
- for each share_id (For each column(s)).
Things to notice:
- Correlated subqueries always have the same basic structure.
- Each of the queries uses a different alias of the same table.
- The WHERE clause in the subquery is used to join the two queries together.
NVL
NVL is used to replace null with a fixed value. In the example below, nulls are replaced with zero for brokers who haven’t made a trade.
Commit and Rollback
- All DML transactions are subject to commit and rollback.
- No DML transaction is saved until one of three things occurs: you exit, the session expires, or you COMMIT.
- This has the following effects: other users will not see your DML changes, and you can undo changes you have not committed using ROLLBACK to your last COMMIT.
Set Functions
Set functions create a result set from the results of two separate SELECT statements. The two SELECT statements must have an identical number of columns with the same data types.
Union
Adds two result sets together and returns the distinct values.
Union All
Same result as UNION, except every value is returned, not just distinct ones.
Intersect
Returns only the common rows from the two result sets.
Minus
Removes the rows of the 2nd result set from the 1st result set.