Essential Statistical Concepts and Tests

Posted on Jul 13, 2025 in Statistics

Simple Linear Regression

Purpose: Predict a numerical outcome (dependent variable Y) from a numerical predictor (independent variable X).
Equation: Y = a + bX
- a (intercept): Predicted Y when X = 0
- b (slope): For each 1-unit increase in X, predicted Y changes by b units (increases if b > 0, decreases if b < 0).
Example: Income = 20000 + 3000 × YearsOfEducation → Each extra year of education predicts $3,000 more income.
R² (Coefficient of Determination): Tells us how much of the variation in Y is explained by X. Ranges from 0 to 1.
Interpretation: If R² = 0.64 → 64% of the variance in income is explained by years of education.
Important Concepts: Outliers, residuals, linear relationship assumption.
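To make the intercept, slope, and R² concrete, here is a minimal sketch of a least-squares fit using only the Python standard library. The function name fit_line and the education/income numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    # Residuals are observed Y minus predicted Y
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical data: years of education and annual income
years = [10, 12, 14, 16, 18]
income = [51000, 55000, 63000, 67000, 74000]

a, b, r2 = fit_line(years, income)
print(f"Income = {a:.0f} + {b:.0f} x YearsOfEducation, R^2 = {r2:.3f}")
```

With this made-up data the slope is 2900, so each extra year of education predicts $2,900 more income.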

ANOVA (Analysis of Variance)

Purpose: Compare means across 3 or more groups to see if at least one group mean is different.
Key Idea: Looks at how much total variability (SST) is due to differences between groups (SSB) vs. within groups (SSW).
Formulas:
- SST = SSB + SSW
- MSB = SSB / dfB, MSW = SSW / dfW
- F = MSB / MSW
Interpretation: If F > critical value → Reject H₀ → At least one group mean is different.
Example: Comparing average calorie burn in 5 types of exercise.
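The formulas above can be traced by hand on a small data set. The sketch below assumes three hypothetical groups of three observations each; one_way_anova is an illustrative name, not a library function.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of numeric observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means vs. the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations vs. their own group mean
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    f_stat = (ssb / df_b) / (ssw / df_w)   # F = MSB / MSW
    return ssb, ssw, f_stat

# Hypothetical measurements for three groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ssb, ssw, f_stat = one_way_anova(groups)
print(ssb, ssw, f_stat)  # 24 6 12.0
```

Here SST = 30 = 24 + 6, and F would be compared with the critical value for dfB = 2 and dfW = 6 from an F table.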

Correlation

Purpose: Measure the strength and direction of a linear relationship between two numeric variables.
Pearson’s r: Ranges from -1 to 1
- r = 0 → No linear relationship
- r = +1 → Perfect positive relationship
- r = -1 → Perfect negative relationship
R² = r²: In simple linear regression, interpreted as the proportion of variance in Y explained by X.
Fisher Z-transformation: Used to compare two different correlation coefficients.
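As a rough sketch of both ideas, the code below computes Pearson's r from its definition and then compares two correlations from independent samples using the Fisher Z-transformation, z = atanh(r), with standard error √(1/(n₁ − 3) + 1/(n₂ − 3)). The function names and the sample values are assumptions for illustration.

```python
import math
from statistics import mean, NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_correlations(r1, n1, r2, n2):
    """z statistic and two-sided p-value for H0: the two population
    correlations are equal (assumes independent samples)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)   # Fisher Z-transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

r_pos = pearson_r([1, 2, 3], [2, 4, 6])   # perfect positive relationship
r_neg = pearson_r([1, 2, 3], [6, 4, 2])   # perfect negative relationship

# Hypothetical: r = 0.8 in one sample of 103, r = 0.5 in another sample of 103
z, p = compare_correlations(0.8, 103, 0.5, 103)
print(round(r_pos, 3), round(r_neg, 3), round(z, 2), p < 0.05)
```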

Chi-Square Tests

Use when: Your variables are categorical (e.g., yes/no, categories, ranks).
Two Types:
- Test of Independence: Are two categorical variables related (e.g., Netflix-hours category and fitness level)?
- Goodness of Fit: Do observed frequencies match a theoretical distribution?
Formula: χ² = Σ (O − E)² / E, summed over all cells
- O = Observed frequency
- E = Expected frequency (for the Test of Independence: row total × column total / grand total)
Assumption: Expected counts should generally be ≥ 5 in each cell.
Interpretation: If chi-square statistic > critical value → Reject H₀.
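A minimal sketch of the Test of Independence follows. It builds each expected count from the row and column totals and sums (O − E)² / E over the cells. The 2×2 counts are hypothetical, and chi_square_independence is an illustrative name.

```python
def chi_square_independence(table):
    """Return (chi2, degrees_of_freedom, expected) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, obs in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            exp_row.append(e)
            chi2 += (obs - e) ** 2 / e
        expected.append(exp_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, expected

# Hypothetical counts: rows = low/high Netflix hours, columns = fit/not fit
observed = [[30, 10],
            [20, 40]]
chi2, dof, expected = chi_square_independence(observed)
print(round(chi2, 2), dof, expected)
```

All expected counts here are at least 5, and the statistic (about 16.67 with 1 degree of freedom) exceeds the 0.05 critical value of 3.84, so H₀ would be rejected for this made-up table.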

Measures of Center and Spread

Measures of Central Tendency

Mean: Average
Median: Middle value when ordered
Mode: Most frequent value
Use median when outliers/skewed data are present.

Measures of Dispersion

Variance: Average squared deviation from mean → Shows data spread
Standard Deviation: √Variance → Easier to interpret (in same units as original data)

Interpretation: Larger SD = more spread = less consistent data.
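These measures are all available in Python's statistics module. The short sketch below uses hypothetical scores and also shows why the median is preferred when an outlier is present. Note that pvariance and pstdev divide by n (the "average squared deviation" described above), while variance and stdev divide by n − 1 for samples.

```python
import statistics as st

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

center = {
    "mean": st.mean(scores),
    "median": st.median(scores),
    "mode": st.mode(scores),
}
spread = {
    "variance": st.pvariance(scores),   # average squared deviation from mean
    "sd": st.pstdev(scores),            # square root of the variance
}

# Adding one extreme value pulls the mean far more than the median
with_outlier = scores + [100]
mean_shift = st.mean(with_outlier) - st.mean(scores)
median_shift = st.median(with_outlier) - st.median(scores)

print(center, spread)
print(round(mean_shift, 2), median_shift)
```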

Scatterplots and Frequency Tables

Scatterplots

X-axis: Independent variable
Y-axis: Dependent variable
Each point: One individual
Used to visualize correlation/regression patterns.
Look for: Linear trend, outliers, direction (positive/negative).

Frequency Tables

Shows count of occurrences per category (e.g., Netflix hours by fitness level).
Used for Chi-square tests (calculate expected values).

Measures of Effect Size

Phi: For 2×2 tables (calculate it!)
Gamma, Tau-b (ordinal variables) and Lambda, Cramer’s V (nominal variables): Know what these are used for → you don’t calculate these.
They tell us strength of association, not causation.
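Since Phi is the one measure to calculate, here is a small sketch for a 2×2 table with cells a, b (top row) and c, d (bottom row), using φ = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). The counts are hypothetical and phi_coefficient is an illustrative name.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]] of counts."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Hypothetical 2x2 table of counts
phi = phi_coefficient(30, 10, 20, 40)
print(round(phi, 3))  # 0.408
```

For a 2×2 table, the absolute value of Phi also equals √(χ² / n), so it can be read off a chi-square result.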

Levels of Measurement

Level	Description	Example
Nominal	Names/labels (no order)	Gender, Race
Ordinal	Categories with logical order	Satisfaction rating
Interval	Numerical, equal spacing, no true zero	Temperature (°C)
Ratio	Interval + true zero	Age, Weight, Income

Choosing the Right Statistical Test

If You Want To…	Use This Test
Compare one mean to a known value	Z-test (if SD known)
Compare one mean to a value (SD unknown)	t-test
Compare two group means	Two-sample t-test
Compare same group before and after	Paired t-test
Compare 3+ means across groups	ANOVA
Test association between 2 categories	Chi-square Test of Independence
Compare frequencies with theoretical values	Chi-square Goodness of Fit
Predict numeric outcome from numeric variable	Simple Linear Regression
Predict binary outcome (e.g., yes/no)	Logistic Regression
Compare 2 correlations	Fisher Z transformation

Key Assumptions:

T-tests: Approximately normal data, equal variances (unless Welch’s t-test is used)
ANOVA: Normality + equal variances across groups
Chi-square: Expected count ≥ 5 per cell
Regression: Linearity, independent observations, normally distributed residuals with constant variance
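To illustrate the Welch variant mentioned for t-tests, the sketch below computes the two-sample t statistic without assuming equal variances, along with the Welch-Satterthwaite degrees of freedom. The data and the name welch_t are assumptions for illustration; the p-value would still come from a t table or a statistics package.

```python
import math
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Welch's t statistic and approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2  # squared SEs
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical measurements for two independent groups
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)  # -1.0 8.0
```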

Hypothesis Testing, Errors, and Confidence Intervals

H₀ (Null Hypothesis): No effect, no difference
H₁ (Alternative Hypothesis): There is a difference
Alpha (α): Significance level, the probability of making a Type I Error when H₀ is true (typically 0.05)

Errors:

Type I Error: Rejecting H₀ when it’s true (false positive)
Type II Error: Failing to reject H₀ when it’s false (false negative)

Confidence Intervals (CI):

Definition: A range of values, computed from the sample, that is likely to contain the population parameter
95% CI: If we repeated the study many times and built a 95% CI each time, about 95% of those intervals would contain the true value
Narrow CI = more precision
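As a minimal sketch, a confidence interval for a mean can be computed as mean ± critical value × SD / √n. The version below assumes a large-sample normal (z) critical value. For small samples a t critical value would be more appropriate. The sample values and the function name are hypothetical.

```python
import math
from statistics import mean, stdev, NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (z-based) confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    m = mean(sample)
    margin = z * stdev(sample) / math.sqrt(len(sample))
    return m - margin, m + margin

# Hypothetical sample of 40 measurements
sample = [48, 52, 50, 49, 51, 47, 53, 50] * 5

low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99% interval is wider than the 95% interval: more confidence costs precision, while a larger sample narrows both.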
