Statistical Fundamentals and Key Concepts Reference

Hypothesis Testing and P-Values

P-Value Definition

The p-value is the probability of obtaining a test statistic at least as extreme as the one observed in the sample, assuming the null hypothesis (H₀) is true.

Interpretation

  • Large p-value: Little or no evidence against H₀ (Null Hypothesis); the data are consistent with H₀.
  • Small p-value: Evidence against H₀ and in favor of Hₐ (Alternative Hypothesis).

Types of Errors

  • Type I Error (α): Rejecting H₀ when H₀ is true.
  • Type II Error (β): Failing to reject H₀ when H₀ is false.

Study Design Fundamentals

  • Population: The entire group of interest.
  • Sample: A subset of the population that is observed or measured.

Variables

  • Explanatory Variable (x): The factor believed to influence the outcome.
  • Response Variable (y): The outcome that is measured.

Study Types

  • Observational Study: Researchers observe subjects without assigning treatments.
  • Experimental Study: Treatments are assigned randomly to subjects to determine causality.

Causality: Causal relationships can only be inferred from well-designed randomized experiments; observational studies can establish association but not causation.

Descriptive Statistics

Distribution Shape and Measures of Center

  • Symmetric: Mean ≈ Median
  • Right Skewed: Mean > Median
  • Left Skewed: Mean < Median

Robustness to Outliers

  • Mean: Sensitive to outliers.
  • Median: Robust to outliers.
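
Both points, how skewness orders the mean and median and how a single outlier affects each, can be checked directly. A minimal sketch in Python with made-up values:

```python
import numpy as np

# Right-skewed sample (hypothetical data): most values are small, a few are large.
skewed = np.array([2, 3, 3, 4, 4, 5, 6, 25, 40])
print(np.mean(skewed), np.median(skewed))       # mean > median (right skew)

# Adding one extreme outlier shifts the mean but barely moves the median.
with_outlier = np.append(skewed, 500)
print(np.mean(with_outlier), np.median(with_outlier))
```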

Z-Score

A Z-score measures the number of standard deviations an observation lies from the mean: z = (x − x̄) / s for a sample (or z = (x − μ) / σ for a population). It quantifies how unusual an observation is relative to the rest of the distribution.
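
A quick numeric illustration of the formula (the observation, mean, and standard deviation below are made up):

```python
# Hypothetical values: an observation of 85 from a distribution
# with mean 70 and standard deviation 10 lies 1.5 SDs above the mean.
x, mean, sd = 85, 70, 10
z = (x - mean) / sd
print(z)  # 1.5
```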

Bootstrap and Randomization Methods

Bootstrap

  • Resamples the original sample with replacement.
  • Used to estimate variability and construct Confidence Intervals (CI).
  • The resulting distribution is centered at the observed sample statistic (e.g., the sample mean).

Randomization (Permutation Test)

  • Simulates the null distribution.
  • The resulting distribution is centered at the H₀ parameter (null value).

Note on Sample Size: Each bootstrap sample must have the same size as the original sample.
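
A minimal sketch of both resampling procedures using NumPy; the sample values, group labels, and number of resamples are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Bootstrap: resample the ORIGINAL sample, with replacement, same size ---
sample = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4])  # hypothetical data
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])
# Percentile 95% CI; the bootstrap distribution is centered near the sample mean.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")

# --- Permutation (randomization) test: simulate the null distribution ---
group_a = np.array([5.0, 6.1, 5.7, 6.3])   # hypothetical group data
group_b = np.array([4.1, 4.8, 4.5, 5.0])
observed_diff = group_a.mean() - group_b.mean()

pooled = np.concatenate([group_a, group_b])
null_diffs = np.empty(10_000)
for i in range(10_000):
    shuffled = rng.permutation(pooled)     # reshuffle labels under H0: no difference
    null_diffs[i] = shuffled[:group_a.size].mean() - shuffled[group_a.size:].mean()

# Two-sided p-value: proportion of null differences at least as extreme as observed.
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"Observed difference: {observed_diff:.2f}, permutation p-value: {p_value:.3f}")
```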

Key Probability Rules

  • Joint Probability: P(A ∩ B) = P(A) * P(B | A)
  • Total Probability: P(B) = P(A ∩ B) + P(Ā ∩ B)
  • Conditional Probability: P(A | B) = P(A ∩ B) / P(B)
  • Independence: If A and B are independent, P(A ∩ B) = P(A) * P(B)
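
A short numeric check of these rules, using made-up probabilities P(A) = 0.30, P(B | A) = 0.50, and P(B | Ā) = 0.20:

```python
# Hypothetical probabilities used only to illustrate the rules above.
p_a = 0.30
p_b_given_a = 0.50
p_b_given_not_a = 0.20

p_a_and_b = p_a * p_b_given_a                      # joint probability
p_b = p_a_and_b + (1 - p_a) * p_b_given_not_a      # total probability
p_a_given_b = p_a_and_b / p_b                      # conditional probability

print(p_a_and_b, p_b, round(p_a_given_b, 3))       # 0.15 0.29 0.517
```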

Chi-Square Test

Use: Testing for association between two categorical variables.

Formulas

  • Expected Counts (E): E = (Row Total * Column Total) / n
  • Degrees of Freedom (df): df = (r - 1)(c - 1) (where r is rows, c is columns)

Hypotheses

  • Null Hypothesis (H₀): No association exists between the variables.
  • Alternative Hypothesis (Hₐ): An association exists between the variables.
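
A minimal sketch using scipy.stats.chi2_contingency on a hypothetical 2 × 3 table of observed counts:

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for two categorical variables (2 rows x 3 columns).
observed = np.array([
    [30, 45, 25],
    [20, 35, 45],
])

chi2, p_value, df, expected = stats.chi2_contingency(observed)

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("Expected counts (row total * column total / n):")
print(expected)          # check that all expected counts are >= 5
# df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
```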

Regression and Correlation Analysis

Simple Linear Regression Model

Equation: ŷ = b₀ + b₁x

  • Slope (b₁): The predicted change in the response variable (y) for every one-unit increase in the explanatory variable (x).
  • Intercept (b₀): The predicted value of y when x equals 0.

Correlation Coefficient (r)

  • Range: -1 ≤ r ≤ 1
  • Direction: Indicated by the sign (Positive or Negative).
  • Strength:
    • |r| close to 1 indicates a strong linear relationship.
    • |r| close to 0 indicates a weak linear relationship.

Prediction: To predict y, plug the value of x into the regression equation.
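
A minimal sketch using scipy.stats.linregress on made-up (x, y) data, recovering the slope, intercept, correlation, and a prediction as described above:

```python
import numpy as np
from scipy import stats

# Hypothetical explanatory (x) and response (y) values.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])

fit = stats.linregress(x, y)
print(f"slope b1 = {fit.slope:.3f}, intercept b0 = {fit.intercept:.3f}")
print(f"correlation r = {fit.rvalue:.3f}, R^2 = {fit.rvalue**2:.3f}")

# Prediction: plug a new x value into y-hat = b0 + b1 * x.
x_new = 5.5
y_hat = fit.intercept + fit.slope * x_new
print(f"predicted y at x = {x_new}: {y_hat:.2f}")
```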

Discrete Random Variables

Valid Probability Function Conditions

  • The probability of any outcome must be between 0 and 1: 0 ≤ P(X=x) ≤ 1
  • The sum of all probabilities must equal 1: ∑P(X=x) = 1

Example (Mutually Exclusive Events): P(X=3 or X=4) = P(X=3) + P(X=4)
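
A quick check of both conditions on a hypothetical probability function:

```python
# Hypothetical probability function for a discrete random variable X.
pmf = {1: 0.10, 2: 0.25, 3: 0.40, 4: 0.25}

# Condition 1: every probability is between 0 and 1.
# Condition 2: the probabilities sum to 1.
valid = all(0 <= p <= 1 for p in pmf.values()) and abs(sum(pmf.values()) - 1) < 1e-9
print(valid)              # True: both conditions hold

# Mutually exclusive outcomes: P(X=3 or X=4) = P(X=3) + P(X=4)
print(pmf[3] + pmf[4])    # 0.65
```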

Binomial Distribution Conditions (BINS)

  • Binary: Only two outcomes (success/failure).
  • Independence: Each trial is independent.
  • Number: Fixed number (n) of trials.
  • Success: Constant probability (p) of success per trial.
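
A minimal sketch using scipy.stats.binom with hypothetical values n = 10 and p = 0.3:

```python
from scipy import stats

n, p = 10, 0.3                      # hypothetical number of trials and success probability

# P(X = 4): exactly 4 successes in 10 independent trials.
print(stats.binom.pmf(4, n, p))     # ~0.200

# P(X <= 4): at most 4 successes (cumulative probability).
print(stats.binom.cdf(4, n, p))     # ~0.850

# Mean and standard deviation: np and sqrt(np(1 - p)).
print(stats.binom.mean(n, p), stats.binom.std(n, p))
```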

Confidence Intervals (CI)

Interpretation

“I am 95% confident that the true population parameter lies within this calculated interval.”

Formulas

  • CI for Proportion: p̂ ± z* * √[p̂(1 - p̂) / n]
  • CI for Mean (t-distribution): x̄ ± t* * (s / √n)
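
A minimal sketch of both formulas with hypothetical summary statistics (p̂ = 0.42 from n = 200; x̄ = 24.5, s = 6.2 from n = 35):

```python
import numpy as np
from scipy import stats

# 95% CI for a proportion: p-hat +/- z* * sqrt(p-hat(1 - p-hat) / n)
p_hat, n_prop = 0.42, 200                  # hypothetical sample proportion and size
z_star = stats.norm.ppf(0.975)             # ~1.96 for 95% confidence
me_prop = z_star * np.sqrt(p_hat * (1 - p_hat) / n_prop)
print(f"Proportion CI: ({p_hat - me_prop:.3f}, {p_hat + me_prop:.3f})")

# 95% CI for a mean: x-bar +/- t* * s / sqrt(n), with df = n - 1
x_bar, s, n_mean = 24.5, 6.2, 35           # hypothetical sample mean, SD, and size
t_star = stats.t.ppf(0.975, df=n_mean - 1)
me_mean = t_star * s / np.sqrt(n_mean)
print(f"Mean CI: ({x_bar - me_mean:.2f}, {x_bar + me_mean:.2f})")
```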

Essential Statistical Tips

Conditions to Check Before Running Tests

  • Proportion Tests: Check that np ≥ 10 and n(1-p) ≥ 10 (Success/Failure Condition).
  • T-Tests: Check that the data come from an approximately normal population or that the sample size is large (commonly n ≥ 30).
  • Chi-Square Test: Ensure all expected counts are ≥ 5.

Hypothesis Test Decision Rule

  • If p-value ≤ α (Significance Level), Reject H₀.
  • If p-value > α, Fail to Reject H₀.
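
A minimal sketch of the decision rule using a one-sample t-test on made-up data, testing H₀: μ = 5 at α = 0.05:

```python
import numpy as np
from scipy import stats

alpha = 0.05
sample = np.array([5.2, 4.8, 6.1, 5.9, 5.5, 6.3, 5.7, 6.0])  # hypothetical data

# One-sample t-test of H0: mu = 5 against a two-sided alternative.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")

if p_value <= alpha:
    print("Reject H0")            # sufficient evidence against the null hypothesis
else:
    print("Fail to reject H0")    # insufficient evidence against the null hypothesis
```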

Applied Statistics Examples

  1. Correlation Interpretation (r = 0.66)

    Question: The correlation between the variables is approximately r = 0.66. Explain what this correlation tells us about the strength and direction of the association between the variables.

    Answer: A correlation of r = 0.66 indicates that the relationship between social media engagement score and average onsite spend is moderately strong and positive.

  2. Identifying Variables

    Question: Which variable is the explanatory variable, and which is the response variable for this linear regression model?

    • Explanatory Variable (x-axis): Social media engagement score.
    • Response Variable (y-axis): Average onsite spend.

  3. Y-Intercept Interpretation

    Question: Interpret the y-intercept of the linear regression model in context.

    Answer: When the social media engagement score is 0, the average onsite spend is predicted to be $33.40.

  4. Using the Regression Equation for Prediction

    Question: Show how you would use the regression equation to calculate the predicted onsite spend for a festival attendee with a social media engagement score of 50. (You do NOT need to work this out.)

    Calculation: Predicted Average Onsite Spend = 33.40 + 1.67 * (50)

  5. R-Squared Interpretation (R² = 0.548)

    Question: The R² (R-squared) value for this linear regression model is 0.548. Interpret this value in context.

    Answer: 54.8% of the variability in the average onsite spend is explained by the social media engagement score (the model).