Understanding Data Units, Python, R, Excel, and SQL
Understanding Data Units
Common Data Units
Bit (b): The smallest unit of data storage, representing a binary digit (0 or 1).
Byte (B): A group of 8 bits, typically representing a single character.
Kilobyte (KB): 1,024 bytes.
Megabyte (MB): 1,024 KB.
Gigabyte (GB): 1,024 MB.
Terabyte (TB): 1,024 GB.
Petabyte (PB): 1,024 TB.
Exabyte (EB): 1,024 PB.
Zettabyte (ZB): 1,024 EB.
Yottabyte (YB): 1,024 ZB.
Python Basics
Key Concepts
- Python is case-sensitive.
- Indentation is crucial for code blocks.
- The
print()
function outputs to the standard output. - Variables have dynamic typing.
- Python 3 is not fully backward-compatible with Python 2.
R Programming
Data Analysis with R
- R can read CSV data using
read.table()
orread.csv()
. head()
displays the first 6 rows of data by default.- Access vector elements using square brackets (e.g.,
vector[1]
). - Matrices can only contain one data type.
- Lists can hold objects of different types.
R Syntax and Data Types
- R is case-sensitive.
- The boolean value for true is
TRUE
. - The AND operator is
&
. - Function names come before the function keyword (e.g.,
function_name <- function(...) {...}
). - The modulus operator is
%%
.
Data Visualization
- Bar charts are suitable for visualizing the frequency of qualitative data.
- Scatterplots are useful for visualizing quantitative data.
Excel Fundamentals
Formulas and Functions
- Median:
(Middle value 1 + middle value 2) / 2
- Maximum:
MAX(range)
Conditional Formatting
- Conditional formatting rules determine cell appearance based on specified criteria.
Sparklines
- Sparklines are small charts that fit within a cell.
Pivot Tables
- Pivot tables summarize and analyze data based on categories.
Analysis Tools
- Goal Seek helps find input values that produce a desired output.
SQL and Relational Databases
Database Structure
Consider a database with tables:
vendor(vendorid, vendorname, state, balance)
sales(storenum, upc, qty, price, salesdate, vendorid)
store(storenum, storename, opendate, salesestimate)
SQL Queries
Creating and Modifying Tables
- Create a table:
CREATE TABLE store (storenum int, storename varchar(50), opendate date, salesestimate decimal(10,2))
- Insert data:
INSERT INTO store VALUES (101, 'Bargain Shop', '2017-01-04', 500000)
- Update data:
UPDATE vendor SET balance = balance * 1.2 WHERE state = 'BC'
- Delete data:
DELETE FROM store WHERE salesestimate < 1000000
Retrieving Data
- Select specific data:
SELECT * FROM sales WHERE storenum = 50 AND qty < 50
- Aggregate data:
SELECT upc, SUM(qty * price) AS totalSales FROM sales WHERE storenum = 12 GROUP BY upc ORDER BY totalSales DESC LIMIT 2
- Join tables:
SELECT upc, sales.storenum, storename, AVG(qty) AS avgQty, COUNT(*) AS numSalesRecords FROM sales INNER JOIN store ON sales.storenum = store.storenum WHERE salesdate > '2017-07-01' GROUP BY upc, sales.storenum, storename ORDER BY upc ASC