Understanding Data Units, Python, R, Excel, and SQL

Understanding Data Units

Common Data Units

Bit (b): The smallest unit of data storage, representing a binary digit (0 or 1).

Byte (B): A group of 8 bits, typically representing a single character.

Kilobyte (KB): 1,024 bytes.

Megabyte (MB): 1,024 KB.

Gigabyte (GB): 1,024 MB.

Terabyte (TB): 1,024 GB.

Petabyte (PB): 1,024 TB.

Exabyte (EB): 1,024 PB.

Zettabyte (ZB): 1,024 EB.

Yottabyte (YB): 1,024 ZB.

Python Basics

Key Concepts

  • Python is case-sensitive.
  • Indentation is crucial for code blocks.
  • The print() function outputs to the standard output.
  • Variables have dynamic typing.
  • Python 3 is not fully backward-compatible with Python 2.

R Programming

Data Analysis with R

  • R can read CSV data using read.table() or read.csv().
  • head() displays the first 6 rows of data by default.
  • Access vector elements using square brackets (e.g., vector[1]).
  • Matrices can only contain one data type.
  • Lists can hold objects of different types.

R Syntax and Data Types

  • R is case-sensitive.
  • The boolean value for true is TRUE.
  • The AND operator is &.
  • Function names come before the function keyword (e.g., function_name <- function(...) {...}).
  • The modulus operator is %%.

Data Visualization

  • Bar charts are suitable for visualizing the frequency of qualitative data.
  • Scatterplots are useful for visualizing quantitative data.

Excel Fundamentals

Formulas and Functions

  • Median: (Middle value 1 + middle value 2) / 2
  • Maximum: MAX(range)

Conditional Formatting

  • Conditional formatting rules determine cell appearance based on specified criteria.

Sparklines

  • Sparklines are small charts that fit within a cell.

Pivot Tables

  • Pivot tables summarize and analyze data based on categories.

Analysis Tools

  • Goal Seek helps find input values that produce a desired output.

SQL and Relational Databases

Database Structure

Consider a database with tables:

  • vendor(vendorid, vendorname, state, balance)
  • sales(storenum, upc, qty, price, salesdate, vendorid)
  • store(storenum, storename, opendate, salesestimate)

SQL Queries

Creating and Modifying Tables

  • Create a table: CREATE TABLE store (storenum int, storename varchar(50), opendate date, salesestimate decimal(10,2))
  • Insert data: INSERT INTO store VALUES (101, 'Bargain Shop', '2017-01-04', 500000)
  • Update data: UPDATE vendor SET balance = balance * 1.2 WHERE state = 'BC'
  • Delete data: DELETE FROM store WHERE salesestimate < 1000000

Retrieving Data

  • Select specific data: SELECT * FROM sales WHERE storenum = 50 AND qty < 50
  • Aggregate data: SELECT upc, SUM(qty * price) AS totalSales FROM sales WHERE storenum = 12 GROUP BY upc ORDER BY totalSales DESC LIMIT 2
  • Join tables: SELECT upc, sales.storenum, storename, AVG(qty) AS avgQty, COUNT(*) AS numSalesRecords FROM sales INNER JOIN store ON sales.storenum = store.storenum WHERE salesdate > '2017-07-01' GROUP BY upc, sales.storenum, storename ORDER BY upc ASC