Mastering Statistical Methods and Data Analysis

Sampling Methods

  • Simple Random Sampling: Every subject has an equal probability of being selected. This provides a good representation but may be subject to non-response bias.
  • Systematic Sampling: Subjects are selected at a fixed interval k from a random starting point. Every subject has an equal probability of being selected and the method is simple to carry out, but it may not provide a good representation if there is a pattern in the way subjects are lined up.
  • Stratified Sampling: The sampling frame is divided into strata, which do not need to be the same size. Simple random sampling is applied within each stratum, so every subject in a given stratum has an equal probability of being selected. This offers good representation but requires detailed information about the sampling frame and strata.
  • Cluster Sampling: The sampling frame is divided into clusters, and a fixed number of clusters are selected using simple random sampling. Every subject has an equal probability of being selected and the method is less tedious to carry out, but it may not provide a good representation if the clusters are dissimilar.
  • Convenience Sampling: Subjects are chosen from those who are easily available to participate. This leads to selection bias and non-response bias, failing to provide a good representation.
  • Volunteer Sampling: Subjects volunteer themselves into a sample. These individuals might have strong opinions on the research questions, leading to selection bias, non-response bias, and poor representation.
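
The four probability-based methods above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sampling frame of 100 subject IDs, the two strata, the ten clusters, and all function names are hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n subjects without replacement; each subject is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k subjects, then take every k-th one."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Apply simple random sampling separately within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every subject in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: subject IDs 1..100.
frame = list(range(1, 101))
# Hypothetical strata of unequal size (30 and 70 subjects).
strata = {"low": frame[:30], "high": frame[30:]}
# Hypothetical clusters of 10 consecutive subjects each.
clusters = {f"c{i}": frame[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(frame, 10, rng)
sys_s = systematic_sample(frame, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)

print(len(srs), len(sys_s), {k: len(v) for k, v in strat.items()}, len(clus))
```

Note that drawing the same number of subjects from strata of different sizes, as this sketch does, gives subjects in the smaller stratum a higher chance of selection; drawing in proportion to stratum size would be one way to equalise it.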

Variables and Descriptive Statistics

  • Independent Variable (IV): The variable subjected to manipulation in the study.
  • Dependent Variable (DV): The variable hypothesized to change depending on how the IV is manipulated.
  • Categorical Variables: Variables whose values are qualitative categories rather than quantities.
  • Numerical Variables: Variables whose values are quantitative, so arithmetic on them is meaningful.
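  • Mean: The average value of a data set, calculated as the sum of all values divided by the number of values.
  • Variance: A measure of how far a set of numbers is spread out from their mean, based on the average squared deviation from the mean.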
  • Standard Deviation: The square root of the variance, representing the dispersion of a dataset relative to its mean.
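
As a worked illustration of the mean, variance, and standard deviation, the sketch below computes each by hand and with Python's standard `statistics` module. The data set is made up for the example. The notes do not say whether the variance divides by n (population) or by n - 1 (sample), so both are shown.

```python
import math
import statistics

# Hypothetical data set, chosen so the numbers come out round.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                          # 5.0
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / n               # population variance: 4.0
sample_var = sum(squared_devs) / (n - 1)      # sample variance: 32 / 7
pop_sd = math.sqrt(pop_var)                   # population standard deviation: 2.0
sample_sd = math.sqrt(sample_var)

# The standard library gives the same results.
print(mean, statistics.mean(data))
print(pop_var, statistics.pvariance(data))
print(sample_var, statistics.variance(data))
print(pop_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```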

Measures of Central Tendency and Spread

  • Median: The middle value of the observations after arranging them in ascending order. If there are two middle values, the average of the two is taken.
  • Quartiles and IQR:
    • Q1: The 25th percentile, representing the middle of the lower half.
    • Q3: The 75th percentile, representing the middle of the upper half.
    • IQR: The Interquartile Range (Q3 – Q1), which is always non-negative.
  • Mode: The value that appears most often, interpreted as the peak of the distribution.
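
The median, quartiles, IQR, and mode can be computed as in the sketch below. It follows the "middle of the lower/upper half" description of Q1 and Q3 given above. One assumption is added: when the number of observations is odd, the median itself is left out of both halves. Other quartile conventions exist and can give slightly different values. The data set and the function name `quartiles` are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: for an odd number of observations, the median is
    excluded from both the lower and the upper half.
    Requires at least two observations.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]      # lower half of the ordered data
    upper = ordered[-half:]     # upper half of the ordered data
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

# Hypothetical data set with nine observations.
data = [7, 3, 10, 1, 6, 3, 9, 4, 8]

q1, median, q3 = quartiles(data)
iqr = q3 - q1
mode = statistics.mode(data)

print(q1, median, q3, iqr, mode)
```

For this data set the ordered values are 1, 3, 3, 4, 6, 7, 8, 9, 10, so the median is 6, the lower half is 1, 3, 3, 4 and the upper half is 7, 8, 9, 10.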

Experimental vs Observational Studies

  • Experimental/Controlled Study: Intentionally applies a “treatment” of interest, manipulating the IV to observe whether it affects the DV. This can provide evidence for a cause-and-effect relationship.
    • The treatment group is exposed to the treatment or IV being tested.
    • The control group does not receive the treatment, or receives a placebo. When subjects do not know which group they are in, the study is single-blinded.
    • Double blinding occurs when, in addition, the assessors do not know whether they are assessing the treatment or control group.
  • Observational Study: Observes individuals and measures variables of interest without any manipulation of the IV. This can only provide evidence of association but not cause-and-effect relationships, as there could be confounders.

Association and Confounding Factors

  • Topic 2.1: Association
  • Topic 2.2: Rules on rates
  • Topic 2.3: Simpson’s Paradox
  • Topic 2.4: Confounders

Data Visualization and Regression

  • Topic 3.1: Histogram (univariate)
  • Topic 3.2: Boxplot (univariate)
  • Topic 3.3: Scatter plot (bivariate)
  • Topic 3.4: Correlation Coefficient
  • Topic 3.5: Linear Regression

Probability and Statistical Inference

  • Topic 4.1: Probability
  • Topic 4.2: Conjunction Fallacy, Base Rate Fallacy, and Random Variables
  • Topic 4.3: Statistical Inference and Confidence Intervals
  • Topic 4.4: Hypothesis Testing