Essential Statistics & Probability Concepts: A Quick Reference
Posted on Sep 23, 2025 in Computational Mathematics
Core Statistical Concepts
Basic Statistical Concepts
- Population: The entire group of interest.
- Sample: A subset of the population.
- Parameter: A numerical characteristic describing a population (e.g., μ for population mean).
- Statistic: A numerical characteristic describing a sample (e.g., x̄ for sample mean).
- Descriptive Statistics: Methods for organizing and summarizing data.
- Inferential Statistics: Methods for drawing conclusions about a population from a sample.
Data Types and Measurement Levels
- Nominal: Categories only.
- Examples: Gender, colors.
- Ordinal: Ordered categories.
- Examples: Ratings (good/fair/poor).
- Interval: Ordered, equal intervals, no true zero.
- Examples: Temperature (°C).
- Ratio: Ordered, equal intervals, true zero.
- Examples: Height, Weight.
Measures of Central Tendency
- Mean: The average (μ or x̄).
- Median: The middle value (resistant to outliers).
- Mode: The most frequent value.
- Skewness:
- Right-skewed: Mean > Median.
- Left-skewed: Mean < Median.
- Symmetric: Mean = Median.
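As a quick illustration of these measures, here is a minimal Python sketch using the standard-library statistics module; the data values are purely hypothetical.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 9]   # hypothetical sample

mean = statistics.mean(data)       # arithmetic average
median = statistics.median(data)   # middle value, resistant to outliers
mode = statistics.mode(data)       # most frequent value

print(mean, median, mode)          # 4.5 4.5 5
# Here mean == median; replacing 9 with 90 pulls the mean to the right
# (mean > median), the signature of a right-skewed data set.
```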
Measures of Variation
- Range: Maximum value – Minimum value.
- Variance: σ² (population), s² (sample).
- Standard Deviation: σ or s.
- Empirical Rule (Normal Distribution):
- 68% of data within μ ± σ.
- 95% of data within μ ± 2σ.
- 99.7% of data within μ ± 3σ.
- Chebyshev’s Theorem: At least (1 – 1/k²) of data within μ ± kσ for any k > 1.
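A minimal sketch (hypothetical data) showing sample vs. population versions of these measures and the Chebyshev bound for k = 2:

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 5]          # hypothetical data

data_range = max(data) - min(data)        # range = max - min

s2 = statistics.variance(data)            # sample variance s²  (divides by n-1)
s = statistics.stdev(data)                # sample standard deviation s
sigma2 = statistics.pvariance(data)       # population variance σ²  (divides by n)
sigma = statistics.pstdev(data)           # population standard deviation σ

# Chebyshev's theorem: at least 1 - 1/k² of the data lies within μ ± kσ
k = 2
chebyshev_bound = 1 - 1 / k**2            # 0.75, i.e. at least 75% within μ ± 2σ

print(data_range, s2, s, sigma2, sigma, chebyshev_bound)
```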
Fundamental Probability Rules
- Range: 0 ≤ P(A) ≤ 1.
- Complement: P(A’) = 1 – P(A).
- Addition:
- General: P(A ∪ B) = P(A) + P(B) – P(A ∩ B).
- Mutually Exclusive: P(A ∪ B) = P(A) + P(B).
- Multiplication:
- General: P(A ∩ B) = P(A) × P(B | A).
- Independent Events: P(A ∩ B) = P(A) × P(B).
- Conditional Probability: P(B | A) = P(A ∩ B) / P(A).
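A tiny sanity check of these rules in Python, using a fair six-sided die; the events chosen here are just an illustration.

```python
from fractions import Fraction

# Fair die: A = "even" = {2, 4, 6}, B = "greater than 3" = {4, 5, 6}
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}

def P(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)     # 4/6 = 3/6 + 3/6 - 2/6

# Conditional probability: P(B | A) = P(A ∩ B) / P(A)
p_b_given_a = P(A & B) / P(A)                 # (2/6) / (3/6) = 2/3

# Multiplication rule: P(A ∩ B) = P(A) × P(B | A)
assert P(A & B) == P(A) * p_b_given_a
print(p_b_given_a)                            # 2/3
```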
Counting Methods
- Fundamental Counting Principle: n₁ × n₂ × … × nₖ.
- Permutation (Order Matters):
- No Repetition: n! / (n-r)!.
- With Identical Items: n! / (n₁! n₂! … nₖ!).
- Combination (Order Doesn’t Matter): n! / [(n-r)! r!].
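Python's math module (3.8+) computes these counts directly; the numbers below are chosen only for illustration.

```python
import math

# Fundamental counting principle: e.g. 3 shirts × 4 pants × 2 pairs of shoes
outfits = 3 * 4 * 2                           # 24 possible outfits

n, r = 5, 3
perms = math.perm(n, r)                       # n! / (n-r)!  = 60 ordered selections
combs = math.comb(n, r)                       # n! / ((n-r)! r!) = 10 unordered selections

# Arrangements of "LEVEL" (identical items): 5! / (2! 2! 1!)
level = math.factorial(5) // (math.factorial(2) * math.factorial(2))  # 30

print(outfits, perms, combs, level)
```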
Introduction to Probability Distributions
- Discrete: Countable outcomes.
- Continuous: Measurements (uncountable outcomes).
- Binomial: n trials, p success probability.
- Normal: Bell curve, characterized by μ and σ.
Essential Statistical Formulas
- Z-score: z = (x – μ) / σ.
- Coefficient of Variation (CV): CV = (σ / μ) × 100%.
- Interquartile Range (IQR): Q₃ – Q₁.
- Outliers: Values < Q₁ – 1.5 × IQR or > Q₃ + 1.5 × IQR.
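A minimal sketch of these formulas on hypothetical data; note that different software packages use slightly different quantile conventions, so the exact Q₁ and Q₃ values can vary.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]     # hypothetical data with one extreme value

mu = statistics.mean(data)
sigma = statistics.pstdev(data)

z_30 = (30 - mu) / sigma                 # z-score of the extreme value
cv = sigma / mu * 100                    # coefficient of variation, in percent

q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(round(z_30, 2), round(cv, 1), iqr, outliers)   # outliers -> [30]
```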
Data Visualization Techniques
- Quantitative Data:
- Histogram, Boxplot, Scatterplot.
- Qualitative Data:
- Bar Chart, Pie Chart.
- Time Series Data: Line Chart.
- Relationships Between Variables: Scatterplot.
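A sketch of the quantitative chart types, assuming matplotlib is available (it is not part of the standard library); the data is synthetic.

```python
import random
import matplotlib.pyplot as plt      # assumes matplotlib is installed

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [2 * xi + random.gauss(0, 0.5) for xi in x]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(x, bins=20)             # histogram: distribution of one variable
axes[0].set_title("Histogram")
axes[1].boxplot(x)                   # boxplot: median, quartiles, potential outliers
axes[1].set_title("Boxplot")
axes[2].scatter(x, y, s=10)          # scatterplot: relationship between two variables
axes[2].set_title("Scatterplot")
plt.tight_layout()
plt.show()
```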
Experimental Design and Sampling
- Sampling Methods:
- Simple Random Sampling.
- Stratified Sampling.
- Cluster Sampling.
- Systematic Sampling.
- Study Types:
- Observational Study.
- Experimental Study.
- Simulation.
- Survey.
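A minimal sketch of two of these sampling schemes using only the standard library; the population here is just 100 labeled units made up for illustration.

```python
import random

population = list(range(1, 101))      # hypothetical population of 100 labeled units
random.seed(42)

# Simple random sampling: every subset of size n is equally likely
srs = random.sample(population, k=10)

# Systematic sampling: random starting point, then every k-th unit
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

print(srs)
print(systematic)
```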
Discrete Probability Distributions
Bernoulli Distribution
- Description: A single trial with two outcomes (success/failure).
- Probability Mass Function (PMF): P(X=1) = p, P(X=0) = 1-p.
- Mean: μ = p.
- Variance: σ² = p(1-p).
- Example: A single coin flip (p=0.5).
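A minimal simulation sketch (seed and number of trials are arbitrary) checking the stated mean and variance:

```python
import random
import statistics

p = 0.5
random.seed(1)
trials = [1 if random.random() < p else 0 for _ in range(10_000)]  # simulated flips

print(statistics.mean(trials))        # close to mean p = 0.5
print(statistics.pvariance(trials))   # close to variance p(1-p) = 0.25
```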
Binomial Distribution
- Description: Counts the number of successes in n independent Bernoulli trials.
- PMF: P(X=k) = C(n, k) pᵏ (1-p)ⁿ⁻ᵏ.
- Mean: μ = np.
- Variance: σ² = np(1-p).
- Conditions:
- Fixed number of trials (n).
- Independent trials.
- Constant probability of success (p).
- Two outcomes per trial (success/failure).
- Example: Number of heads in 10 coin flips.
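A minimal sketch implementing the PMF exactly as written above and checking the mean against np, using the 10-coin-flip example:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

print(pmf[5])                                          # P(exactly 5 heads) ≈ 0.246
print(sum(k * q for k, q in zip(range(n + 1), pmf)))   # mean ≈ np = 5.0
```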
Geometric Distribution
- Description: Number of trials until the first success.
- PMF: P(X=k) = (1-p)ᵏ⁻¹ p.
- Mean: μ = 1/p.
- Variance: σ² = (1-p)/p².
- Example: Number of coin flips until the first head.
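A minimal sketch of the PMF above, with a numerical check of the mean 1/p:

```python
def geom_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) p, for k = 1, 2, 3, ... trials until the first success."""
    return (1 - p)**(k - 1) * p

p = 0.5
print(geom_pmf(3, p))                                    # first head on the 3rd flip: 0.125
print(sum(k * geom_pmf(k, p) for k in range(1, 200)))    # ≈ mean 1/p = 2.0
```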
Negative Binomial Distribution
- Description: Number of trials until the r-th success.
- PMF: P(X=n) = C(n-1, r-1) pʳ (1-p)ⁿ⁻ʳ.
- Mean: μ = r/p.
- Variance: σ² = r(1-p)/p².
- Example: Number of coin flips until the 3rd head.
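A minimal sketch of the PMF above (note that some statistics libraries parameterize this distribution by the number of failures rather than the number of trials):

```python
from math import comb

def nbinom_pmf(n, r, p):
    """P(X = n) = C(n-1, r-1) p^r (1-p)^(n-r): the r-th success occurs on trial n."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

r, p = 3, 0.5
print(nbinom_pmf(5, r, p))                                     # 3rd head on the 5th flip: 0.1875
print(sum(n * nbinom_pmf(n, r, p) for n in range(r, 200)))     # ≈ mean r/p = 6.0
```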
Poisson Distribution
- Description: Counts events occurring in a fixed interval of time or space.
- PMF: P(X=k) = (e⁻λ λᵏ) / k!.
- Mean: μ = λ.
- Variance: σ² = λ.
- Conditions:
- Events are independent.
- Constant average rate (λ) over the interval.
- Events do not occur simultaneously.
- Example: Number of customers arriving per hour.
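A minimal sketch of the PMF above, using a hypothetical rate of 4 customers per hour:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = λ^k e^(-λ) / k!."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4                                # e.g. an average of 4 customers per hour
print(poisson_pmf(2, lam))             # P(exactly 2 arrivals) ≈ 0.147
print(sum(k * poisson_pmf(k, lam) for k in range(100)))   # ≈ mean λ = 4.0
```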
Continuous Probability Distributions
Uniform Distribution
- Description: Equal probability over a given interval [a, b].
- Probability Density Function (PDF): f(x) = 1 / (b-a) for a ≤ x ≤ b.
- Mean: μ = (a+b) / 2.
- Variance: σ² = (b-a)² / 12.
- Example: Waiting time between 0-5 minutes.
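A minimal sketch, with the interval [0, 5] taken from the waiting-time example:

```python
def uniform_pdf(x, a, b):
    """f(x) = 1/(b-a) on [a, b], 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

a, b = 0, 5                             # waiting time uniform on [0, 5] minutes
print(uniform_pdf(2, a, b))             # density = 0.2 everywhere on [0, 5]
print((a + b) / 2, (b - a)**2 / 12)     # mean 2.5, variance ≈ 2.083

# For a uniform variable, P(X ≤ x) is just the fraction of the interval covered:
x = 2
print((x - a) / (b - a))                # P(wait ≤ 2 minutes) = 0.4
```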
Normal Distribution
- Description: A bell-shaped, symmetric curve.
- PDF: f(x) = (1 / √(2πσ²)) e^(−(x−μ)² / (2σ²)).
- Mean: μ.
- Variance: σ².
- Properties:
- Symmetric about the mean.
- Follows the 68-95-99.7 rule.
- Inflection points at μ ± σ.
- Standard Normal Distribution: μ=0, σ=1.
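A minimal sketch of the PDF above, plus the standard normal CDF written via the error function (Φ(z) = (1 + erf(z/√2)) / 2, a standard identity), used here to recover the 68-95-99.7 rule:

```python
from math import sqrt, pi, exp, erf

def normal_pdf(x, mu, sigma):
    """f(x) = (1 / sqrt(2πσ²)) e^(-(x-μ)² / (2σ²))."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def standard_normal_cdf(z):
    """Φ(z) via the error function: Φ(z) = (1 + erf(z/√2)) / 2."""
    return (1 + erf(z / sqrt(2))) / 2

# 68-95-99.7 rule, recovered from the standard normal CDF:
for k in (1, 2, 3):
    print(k, standard_normal_cdf(k) - standard_normal_cdf(-k))
# ≈ 0.683, 0.954, 0.997
```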
Sampling and the Central Limit Theorem
Sampling Distributions
- Sample Mean (x̄): μₓ̄ = μ, σₓ̄ = σ / √n.
- Sample Proportion (p̂): μₚ̂ = p, σₚ̂ = √(p(1-p) / n).
- Unbiased Estimators: x̄ (for mean), p̂ (for proportion), s² (for variance).
Central Limit Theorem (CLT)
- For any population with mean μ and finite variance σ²:
- The sampling distribution of x̄ is approximately Normal for sufficiently large n (a common rule of thumb is n ≥ 30).
- The mean of the sampling distribution of x̄ is μₓ̄ = μ.
- The standard deviation of the sampling distribution of x̄ is σₓ̄ = σ / √n.
- For normal populations, the sampling distribution of x̄ is normal for any sample size n.
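A small simulation sketch (all parameters arbitrary) drawing samples from a right-skewed exponential population and checking the claims above:

```python
import random
import statistics

random.seed(0)
lam = 1.0                       # exponential population: μ = 1, σ = 1 (right-skewed)
n = 36                          # sample size
num_samples = 5_000             # number of repeated samples

sample_means = [
    statistics.mean(random.expovariate(lam) for _ in range(n))
    for _ in range(num_samples)
]

print(statistics.mean(sample_means))    # ≈ μ = 1.0
print(statistics.stdev(sample_means))   # ≈ σ / √n = 1/6 ≈ 0.167
```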
Advanced Probability Concepts
Random Variables
- Discrete: Countable outcomes (e.g., dice rolls).
- Continuous: Uncountable outcomes (e.g., height).
Expectation and Variance
- Expected Value E(X): μ = ΣxP(x) (discrete) or ∫xf(x)dx (continuous).
- Variance Var(X): σ² = E(X²) – [E(X)]².
- Properties:
- E(aX+b) = aE(X) + b.
- Var(aX+b) = a²Var(X).
- For independent X, Y: Var(X ± Y) = Var(X) + Var(Y).
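A minimal sketch for a small hypothetical PMF, computing E(X), Var(X) via E(X²) − [E(X)]², and the effect of a linear transformation aX + b:

```python
# Hypothetical discrete random variable: value -> probability
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

ex = sum(x * p for x, p in pmf.items())        # E(X) = Σ x P(x)
ex2 = sum(x**2 * p for x, p in pmf.items())    # E(X²)
var = ex2 - ex**2                              # Var(X) = E(X²) - [E(X)]²

a, b = 3, 2
print(ex, var)                                 # 1.1, 0.49
print(a * ex + b, a**2 * var)                  # E(aX+b) = aE(X)+b, Var(aX+b) = a²Var(X)
```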
Cumulative Distribution Function (CDF)
- Definition: F(x) = P(X ≤ x).
- For Continuous Variables: F(x) = ∫₋∞ˣ f(t) dt.
Key Statistical Formulas Summary
Important Formulas for Statistics
- Z-score (transforming a Normal variable to the Standard Normal): z = (x – μ) / σ.
- Binomial-to-Poisson Approximation: When n ≥ 100 and np ≤ 10, use λ = np.
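A small numeric check of the approximation, with n and p chosen arbitrarily within the stated range:

```python
from math import comb, exp, factorial

n, p = 200, 0.02          # n ≥ 100 and np = 4 ≤ 10, so the approximation applies
lam = n * p

for k in (2, 4, 6):
    exact = comb(n, k) * p**k * (1 - p)**(n - k)       # Binomial PMF
    approx = lam**k * exp(-lam) / factorial(k)          # Poisson PMF with λ = np
    print(k, round(exact, 4), round(approx, 4))         # the two columns nearly agree
```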