Essential Statistics & Probability Concepts: A Quick Reference

Core Statistical Concepts

  • Basic Statistical Concepts

    • Population: The entire group of interest.
    • Sample: A subset of the population.
    • Parameter: A numerical characteristic describing a population (e.g., μ for population mean).
    • Statistic: A numerical characteristic describing a sample (e.g., x̄ for sample mean).
    • Descriptive Statistics: Methods for organizing and summarizing data.
    • Inferential Statistics: Methods for drawing conclusions about a population from a sample.
  • Data Types and Measurement Levels

    • Nominal: Categories only.
      • Examples: Gender, colors.
    • Ordinal: Ordered categories.
      • Examples: Ratings (good/fair/poor).
    • Interval: Ordered, equal intervals, no true zero.
      • Examples: Temperature (°C).
    • Ratio: Ordered, equal intervals, true zero.
      • Examples: Height, Weight.
  • Measures of Central Tendency

    • Mean: The average (μ or x̄).
    • Median: The middle value (resistant to outliers).
    • Mode: The most frequent value.
    • Skewness:
      • Right-skewed: Mean > Median.
      • Left-skewed: Mean < Median.
      • Symmetric: Mean = Median.
  • Measures of Variation

    • Range: Maximum value – Minimum value.
    • Variance: σ² (population), s² (sample).
    • Standard Deviation: σ or s.
    • Empirical Rule (Normal Distribution):
      • 68% of data within μ ± σ.
      • 95% of data within μ ± 2σ.
      • 99.7% of data within μ ± 3σ.
    • Chebyshev’s Theorem: At least (1 – 1/k²) of data within μ ± kσ for any k > 1.
  • Fundamental Probability Rules

    • Range: 0 ≤ P(A) ≤ 1.
    • Complement: P(A’) = 1 – P(A).
    • Addition:
      • General: P(A ∪ B) = P(A) + P(B) – P(A ∩ B).
      • Mutually Exclusive: P(A ∪ B) = P(A) + P(B).
    • Multiplication:
      • General: P(A ∩ B) = P(A) × P(B | A).
      • Independent Events: P(A ∩ B) = P(A) × P(B).
    • Conditional Probability: P(B | A) = P(A ∩ B) / P(A).
  • Counting Methods

    • Fundamental Counting Principle: n₁ × n₂ × … × nₖ.
    • Permutation (Order Matters):
      • No Repetition: n! / (n-r)!.
      • With Identical Items: n! / (n₁! n₂! … nₖ!).
    • Combination (Order Doesn’t Matter): n! / [(n-r)! r!].
  • Introduction to Probability Distributions

    • Discrete: Countable outcomes.
    • Continuous: Measurements (uncountable outcomes).
    • Binomial: n trials, p success probability.
    • Normal: Bell curve, characterized by μ and σ.
  • Essential Statistical Formulas (illustrated in the sketch at the end of this section)

    • Z-score: z = (x – μ) / σ.
    • Coefficient of Variation (CV): CV = (σ / μ) × 100%.
    • Interquartile Range (IQR): Q₃ – Q₁.
    • Outliers: Values < Q₁ – 1.5 × IQR or > Q₃ + 1.5 × IQR.
  • Data Visualization Techniques

    • Quantitative Data:
      • Histogram, Boxplot, Scatterplot.
    • Qualitative Data:
      • Bar Chart, Pie Chart.
    • Time Series Data: Line Chart.
    • Relationships Between Variables: Scatterplot.
  • Experimental Design and Sampling

    • Sampling Methods:
      • Simple Random Sampling.
      • Stratified Sampling.
      • Cluster Sampling.
      • Systematic Sampling.
    • Study Types:
      • Observational Study.
      • Experimental Study.
      • Simulation.
      • Survey.
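
The descriptive-statistics formulas above (mean, median, mode, standard deviation, z-score, CV, IQR, the 1.5 × IQR outlier fences, and the counting rules) can be checked with a few lines of code. Below is a minimal Python sketch using only the standard library; the data set is made up purely for illustration, and note that statistics.quantiles uses the "exclusive" quartile method by default, so Q₁ and Q₃ may differ slightly from a hand calculation.

```python
import statistics as st
from math import comb, perm

data = [4, 5, 8, 9, 11, 15, 16, 23, 42, 42]   # made-up sample data

mean   = st.mean(data)
median = st.median(data)
mode   = st.mode(data)
sigma  = st.pstdev(data)   # population standard deviation (sigma)
s      = st.stdev(data)    # sample standard deviation (s)

# Z-score of one observation and the coefficient of variation
z  = (data[0] - mean) / sigma      # z = (x - mu) / sigma
cv = (sigma / mean) * 100          # CV = (sigma / mu) x 100%

# IQR and the 1.5 * IQR outlier fences
q1, q2, q3 = st.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"sigma={sigma:.2f} s={s:.2f} z={z:.2f} CV={cv:.1f}%")
print(f"IQR={iqr:.2f} outliers={outliers}")

# Counting methods: permutations (order matters) vs. combinations (order doesn't)
print(perm(5, 2), comb(5, 2))   # 20 ordered arrangements, 10 unordered selections
```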

Discrete Probability Distributions

  • Bernoulli Distribution

    • Description: A single trial with two outcomes (success/failure).
    • Probability Mass Function (PMF): P(X=1) = p, P(X=0) = 1-p.
    • Mean: μ = p.
    • Variance: σ² = p(1-p).
    • Example: A single coin flip (p=0.5).
  • Binomial Distribution

    • Description: Counts the number of successes in n independent Bernoulli trials.
    • PMF: P(X=k) = C(n, k) pᵏ (1-p)ⁿ⁻ᵏ.
    • Mean: μ = np.
    • Variance: σ² = np(1-p).
    • Conditions:
      • Fixed number of trials (n).
      • Independent trials.
      • Constant probability of success (p).
      • Two outcomes per trial (success/failure).
    • Example: Number of heads in 10 coin flips.
  • Geometric Distribution

    • Description: Number of trials until the first success.
    • PMF: P(X=k) = (1-p)ᵏ⁻¹ p.
    • Mean: μ = 1/p.
    • Variance: σ² = (1-p)/p².
    • Example: Number of coin flips until the first head.
  • Negative Binomial Distribution

    • Description: Number of trials until the r-th success.
    • PMF: P(X=n) = C(n-1, r-1) pʳ (1-p)ⁿ⁻ʳ.
    • Mean: μ = r/p.
    • Variance: σ² = r(1-p)/p².
    • Example: Number of coin flips until the 3rd head.
  • Poisson Distribution

    • Description: Counts events occurring in a fixed interval of time or space.
    • PMF: P(X=k) = (λᵏ e^(−λ)) / k!.
    • Mean: μ = λ.
    • Variance: σ² = λ.
    • Conditions:
      • Events are independent.
      • Constant average rate (λ) over the interval.
      • Events do not occur simultaneously.
    • Example: Number of customers arriving per hour.
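
As a quick check on the PMFs and means listed above, the following minimal Python sketch codes each formula directly with the standard library (no external packages); the parameter values n = 10, p = 0.5, and λ = 4 are arbitrary illustration choices.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):            # k = trial on which the first success occurs (k >= 1)
    return (1 - p)**(k - 1) * p

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

n, p, lam = 10, 0.5, 4.0

# Each computed mean should match the formula in the notes: np, 1/p, and lambda.
binom_mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
geom_mean  = sum(k * geometric_pmf(k, p) for k in range(1, 1000))   # tail beyond 1000 is negligible
pois_mean  = sum(k * poisson_pmf(k, lam) for k in range(100))

print(binom_mean, n * p)    # ~5.0 vs 5.0
print(geom_mean, 1 / p)     # ~2.0 vs 2.0
print(pois_mean, lam)       # ~4.0 vs 4.0
```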

Continuous Probability Distributions

  • Uniform Distribution

    • Description: Equal probability over a given interval [a, b].
    • Probability Density Function (PDF): f(x) = 1 / (b-a) for a ≤ x ≤ b.
    • Mean: μ = (a+b) / 2.
    • Variance: σ² = (b-a)² / 12.
    • Example: A waiting time equally likely to fall anywhere between 0 and 5 minutes.
  • Normal Distribution

    • Description: A bell-shaped, symmetric curve.
    • PDF: f(x) = (1 / √(2πσ²)) e^(−(x−μ)² / (2σ²)).
    • Mean: μ.
    • Variance: σ².
    • Properties:
      • Symmetric about the mean.
      • Follows the 68-95-99.7 rule.
      • Inflection points at μ ± σ.
      • Standard Normal Distribution: μ=0, σ=1.
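
The normal PDF and the 68-95-99.7 rule can be checked numerically with the standard library alone, using math.erf for the normal CDF. In the minimal sketch below, μ = 100 and σ = 15 are arbitrary illustration values, and the uniform check reuses the 0-to-5-minute example above.

```python
from math import sqrt, pi, exp, erf

mu, sigma = 100.0, 15.0   # arbitrary illustration values

def normal_pdf(x):
    return (1 / sqrt(2 * pi * sigma**2)) * exp(-((x - mu)**2) / (2 * sigma**2))

def normal_cdf(x):        # closed form via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

print(normal_pdf(mu))     # peak density, 1 / (sigma * sqrt(2*pi)) ~ 0.0266

# Empirical rule: probability within mu +/- k*sigma for k = 1, 2, 3
for k in (1, 2, 3):
    prob = normal_cdf(mu + k * sigma) - normal_cdf(mu - k * sigma)
    print(f"k={k}: {prob:.4f}")   # ~0.6827, 0.9545, 0.9973

# Uniform on [a, b]: mean and variance from the formulas above
a, b = 0.0, 5.0
print((a + b) / 2, (b - a)**2 / 12)   # 2.5 and ~2.083
```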

Sampling and the Central Limit Theorem

  • Sampling Distributions

    • Sample Mean (x̄): μₓ̄ = μ, σₓ̄ = σ / √n.
    • Sample Proportion (p̂): μₚ̂ = p, σₚ̂ = √(p(1-p) / n).
    • Unbiased Estimators: x̄ (for mean), p̂ (for proportion), s² (for variance).
  • Central Limit Theorem (CLT)

    • For any population with mean μ and variance σ²:
      • The sampling distribution of x̄ approximates a Normal distribution when n ≥ 30.
      • The mean of the sampling distribution of x̄ is μₓ̄ = μ.
      • The standard deviation of the sampling distribution of x̄ is σₓ̄ = σ / √n.
    • For normal populations, the sampling distribution of x̄ is normal for any sample size n.
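
A small simulation makes these statements concrete: the sketch below draws many samples of size n = 30 from a skewed (exponential) population and compares the mean and standard deviation of the sample means with μ and σ / √n. It uses only the standard library; the population and sample size are arbitrary illustration choices.

```python
import random
import statistics as st
from math import sqrt

random.seed(0)
rate = 1.0                      # exponential population: mu = 1/rate, sigma = 1/rate
mu, sigma = 1 / rate, 1 / rate
n = 30                          # sample size (n >= 30, per the CLT rule of thumb)

# 20,000 sample means of size-n samples drawn from the skewed population
sample_means = [
    st.mean(random.expovariate(rate) for _ in range(n))
    for _ in range(20_000)
]

print(st.mean(sample_means), mu)                  # ~1.00 vs 1.00      (mu_xbar = mu)
print(st.pstdev(sample_means), sigma / sqrt(n))   # ~0.183 vs 0.1826   (sigma_xbar = sigma / sqrt(n))
```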

Advanced Probability Concepts

  • Random Variables

    • Discrete: Countable outcomes (e.g., dice rolls).
    • Continuous: Uncountable outcomes (e.g., height).
  • Expectation and Variance

    • Expected Value E(X): μ = ΣxP(x) (discrete) or ∫xf(x)dx (continuous).
    • Variance Var(X): σ² = E(X²) – [E(X)]².
    • Properties:
      • E(aX+b) = aE(X) + b.
      • Var(aX+b) = a²Var(X).
      • For independent X, Y: Var(X ± Y) = Var(X) + Var(Y).
  • Cumulative Distribution Function (CDF)

    • Definition: F(x) = P(X ≤ x).
    • For Continuous Variables: F(x) = ∫₋∞ˣ f(t) dt.
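
A minimal Python sketch of these definitions, using a fair six-sided die as the discrete distribution (an illustration choice); it also verifies the linear-transformation properties E(aX+b) = aE(X) + b and Var(aX+b) = a²Var(X).

```python
xs = [1, 2, 3, 4, 5, 6]        # fair six-sided die
px = [1 / 6] * 6

E   = sum(x * p for x, p in zip(xs, px))        # E(X)   = 3.5
E2  = sum(x**2 * p for x, p in zip(xs, px))     # E(X^2)
Var = E2 - E**2                                 # Var(X) = E(X^2) - [E(X)]^2, ~2.9167

a, b = 2, 10
E_t   = sum((a * x + b) * p for x, p in zip(xs, px))
Var_t = sum((a * x + b)**2 * p for x, p in zip(xs, px)) - E_t**2

print(E_t, a * E + b)        # both 17.0     -> E(aX+b) = aE(X) + b
print(Var_t, a**2 * Var)     # both ~11.667  -> Var(aX+b) = a^2 * Var(X)

# CDF for a discrete variable: F(x) = P(X <= x), e.g. F(4) = 4/6
F4 = sum(p for x, p in zip(xs, px) if x <= 4)
print(F4)                    # ~0.6667
```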

Key Statistical Formulas Summary

  • Important Formulas for Statistics

    • Z-score: z = (x – μ) / σ.
    • Binomial to Poisson Approximation: When n ≥ 100 and np ≤ 10, use λ = np.
    • Transforming Normal to Standard Normal: z = (x – μ) / σ.
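
The binomial-to-Poisson rule above can be seen directly: below is a minimal sketch comparing the two PMFs for n = 200 and p = 0.02 (so λ = np = 4, satisfying n ≥ 100 and np ≤ 10); the parameter values are arbitrary illustration choices.

```python
from math import comb, exp, factorial

n, p = 200, 0.02
lam = n * p                 # Poisson approximation uses lambda = np = 4

for k in range(8):
    binom   = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(f"k={k}: binomial={binom:.4f}  poisson={poisson:.4f}")   # the two columns nearly agree
```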