CIS 2500 Exam 1 Excel Cheat Sheet: Data and Statistics
CIS 2500 – Exam 1 Cheat Sheet (Excel Focused)
Chapter 1 – Data Basics
- Population: All items in the study.
- Sample: Subset of the population.
- Parameter: Numerical value describing a population.
- Statistic: Numerical value describing a sample.
- Cross-sectional: Many entities at one time.
- Time series: One entity across multiple points in time.
- Nominal: Labels only (numeric or non-numeric).
- Ordinal: Ranked categories (numeric or non-numeric).
- Interval: Numeric, no true zero.
- Ratio: Numeric, true zero.
- Qualitative
Essential Programming and Data Science Q&A
Python Fundamentals
Q: What is the difference between if, elif, and else?
if checks an initial condition. elif checks another condition if the previous one is false. else runs if none of the preceding conditions are true.
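A minimal sketch of the three branches in action. The function name `classify_score` and the grade cutoffs are hypothetical, chosen only for illustration:

```python
def classify_score(score):
    """Return a letter grade for a numeric score (hypothetical cutoffs)."""
    if score >= 90:       # checked first
        return "A"
    elif score >= 80:     # checked only when the `if` condition was false
        return "B"
    else:                 # runs when no earlier condition was true
        return "C or below"


print(classify_score(95))
print(classify_score(85))
print(classify_score(40))
```

Only one branch runs per call: once a condition matches, the remaining branches are skipped.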
Q: When do we use a for loop instead of while?
- We use for when we know the number of iterations. while is used when the condition controls the loop execution.
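A short sketch contrasting the two loops. The variable names and starting values are hypothetical:

```python
# for: the number of iterations is known in advance (here, exactly 3)
squares = []
for i in range(3):
    squares.append(i * i)

# while: the loop keeps running until its condition becomes false,
# so the number of iterations depends on the data
n = 100
halvings = 0
while n > 1:
    n //= 2          # integer-halve n each pass
    halvings += 1

print(squares)
print(halvings)
```

The for loop runs once per item in `range(3)`, while the while loop stops only when `n > 1` is no longer true.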
Q: What is an infinite loop?
A loop that never stops because its condition is always true.
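As an illustration, the loop below has a condition that is always true, so it would run forever without the `break`. The counter and the limit of 5 are hypothetical:

```python
count = 0
while True:           # the condition never becomes false on its own
    count += 1
    if count >= 5:    # explicit exit; remove this and the loop never stops
        break

print(count)
```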
Q: What is a function?
A reusable block
Statistical Inference: Sampling and Confidence Intervals
Chapter 9: Sampling Distributions
Quantile-Quantile Plot (QQ-Plot)
Empirical Rule: For data that are approximately normally distributed, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean, respectively.
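The three percentages can be checked numerically against the normal distribution. This is a sketch using only the standard library; the helper names `standard_normal_cdf` and `prob_within` are hypothetical:

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The printed values round to roughly 0.68, 0.95, and 0.997, which is where the rule's percentages come from.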
Standard Normal Distribution
The Standard Normal distribution has a mean of 0 and a standard deviation of 1.
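A value from any normal distribution can be placed on this scale by subtracting the mean and dividing by the standard deviation. The sketch below assumes a hypothetical value of 130 from a distribution with mean 100 and standard deviation 15; these numbers are for illustration only and do not come from any dataset in these notes:

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd


# Hypothetical inputs, chosen only to make the arithmetic easy to follow
z = z_score(130, mean=100, sd=15)

# Proportion of a standard normal distribution that falls below z
proportion_below = NormalDist().cdf(z)

print(z)
print(round(proportion_below, 4))
```

Here `NormalDist()` with no arguments is the standard normal distribution (mean 0, standard deviation 1).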
Example: If you want to know the percentage of babies that weigh less than 95 ounces at birth, you must first convert the value 95 to a standardized score (STAT).
Based
Statistical Analysis and Predictive Modeling in Excel
Descriptive Statistics and Central Tendency
Descriptive statistics are the numbers that summarize a dataset, giving you a quick “snapshot” of its typical values and how much they vary. These are divided into Measures of Central Tendency (the middle) and Measures of Dispersion (the spread).
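As a rough illustration of the two groups of measures, the sketch below summarizes a small hypothetical dataset with Python's standard library. The comments name the Excel worksheet functions that compute the same quantities:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical values, for illustration only

# Measures of central tendency (the middle)
mean = statistics.mean(data)       # Excel: AVERAGE
median = statistics.median(data)   # Excel: MEDIAN
mode = statistics.mode(data)       # Excel: MODE.SNGL

# Measures of dispersion (the spread)
data_range = max(data) - min(data)     # Excel: MAX minus MIN
sample_sd = statistics.stdev(data)     # sample standard deviation; Excel: STDEV.S

print(mean, median, mode)
print(data_range, round(sample_sd, 4))
```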
1. Measures of Central Tendency
These identify the “center” of your data where most values congregate.
- Mean (Average): The sum of all values divided by the total count. It is the most common measure but is highly
Statistics Concepts: Variables, Distributions, and Inference
Lesson 1: Variables
- Explanatory Variable – aka Independent Variable; explains variations in the response variable (x-axis). This is the predictor.
- Example: “Can quiz scores be used to predict exam scores?” (Explanatory = Quiz scores)
- Response Variable – aka Dependent Variable; its value is predicted or its variation is explained by the explanatory variable (y-axis). This is the outcome.
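To make the two roles concrete, the sketch below fits a least-squares line that predicts the response (exam score) from the explanatory variable (quiz score). The scores are hypothetical, and the helper name `fit_line` is an assumption made for this example:

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


quiz_scores = [6, 7, 8, 9, 10]       # explanatory variable (x-axis), hypothetical
exam_scores = [65, 70, 78, 83, 89]   # response variable (y-axis), hypothetical

slope, intercept = fit_line(quiz_scores, exam_scores)
predicted_exam = intercept + slope * 8   # predicted exam score for a quiz score of 8

print(round(slope, 2), round(intercept, 2), round(predicted_exam, 2))
```

Swapping the two lists would answer a different question (predicting quiz scores from exam scores), which is why it matters which variable is treated as explanatory.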
Lesson 2: Variable Types and Data Visualization
- Categorical vs. Quantitative Variables
- Categorical Variables = names,
Essential Causal Inference and Econometrics Techniques
Randomized Experiments and Causal Inference
Why are randomized experiments so desirable?
Randomization breaks the link between treatment assignment and confounders, making treated and untreated groups exchangeable. As a result, the difference in average outcomes between the groups is an unbiased estimate of the average causal effect: any systematic difference in outcomes can be attributed to the treatment rather than to selection.
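A small simulation can illustrate the point. The sketch below is built entirely on assumptions: a made-up data-generating process with one confounder, a true treatment effect of 2.0, and a hypothetical function name `simulate`. It compares a randomized assignment with a self-selected one in which the confounder influences who gets treated:

```python
import random


def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


def simulate(randomized, n=20000, true_effect=2.0, seed=0):
    """Estimate the treatment effect under randomized or self-selected assignment."""
    rng = random.Random(seed)
    outcomes, treated = [], []
    for _ in range(n):
        confounder = rng.gauss(0, 1)
        if randomized:
            d = rng.random() < 0.5                   # coin flip, unrelated to the confounder
        else:
            d = confounder + rng.gauss(0, 1) > 0     # confounder drives selection into treatment
        # The outcome depends on both the treatment and the confounder
        y = true_effect * d + 3.0 * confounder + rng.gauss(0, 1)
        outcomes.append(y)
        treated.append(d)
    return difference_in_means(outcomes, treated)


print("randomized estimate:", round(simulate(randomized=True), 2))
print("self-selected estimate:", round(simulate(randomized=False), 2))
```

Under these assumptions the randomized estimate lands close to the true effect of 2.0, while the self-selected estimate is pulled well above it because treated units also tend to have higher confounder values.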
Why might we not be able to run a randomized experiment?
They may be unethical (e.g., denying beneficial treatments), infeasible
