Sampling, Correlation, and Multivariate Methods for Research

Sampling: Population, Sample, Census

Population, sample, census: The population of interest is the entire group researchers want to generalize to. A sample is the smaller group that is actually observed or measured. A census collects data from every single member of the population. Population = who you care about. Sample = who you study. Census = everyone in the population.

Representative vs. Biased Samples

Representative vs. biased samples: A representative sample (unbiased) gives every member of the population an equal chance of being selected. A biased sample (unrepresentative) gives some people a higher chance of being selected than others.

When a Sample Is Biased

A sample is biased when the selection process favors some individuals. Example: choosing chips from the bottom of a bag gives a biased sample because broken chips are overrepresented.

Biased vs. Representative Techniques

Biased techniques include convenience sampling and self-selection (people who are easiest to reach or who volunteer themselves). Representative techniques involve probability sampling (random selection where everyone has an equal chance). Biased = not generalizable. Representative = generalizable.

Random Sampling and External Validity

Random sampling increases external validity because it allows results to generalize from the sample to the population.

Different meanings of “random”: Random sampling is choosing people by chance from the population (used for generalization). Random assignment is placing people into conditions by chance (used for cause and effect). Random in everyday language just means unexpected or odd, not scientifically random.
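
To make the distinction concrete, here is a minimal Python sketch (the population list, sample size, and group sizes are invented for illustration): random sampling decides who gets into the study, while random assignment decides which condition each participant ends up in.

```python
import random

# Hypothetical population of 1,000 people (names are placeholders).
population = [f"person_{i}" for i in range(1000)]

# Random SAMPLING: choose study participants by chance from the population
# (supports generalizing from the sample back to the population).
sample = random.sample(population, k=50)

# Random ASSIGNMENT: place the sampled participants into conditions by chance
# (supports cause-and-effect comparisons between conditions).
random.shuffle(sample)
treatment_group = sample[:25]
control_group = sample[25:]

print(len(treatment_group), len(control_group))  # 25 25
```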

Key Sampling Terms

  • Convenience sampling: selecting people who are easiest to access.
  • Biased sampling: the method overrepresents some individuals and limits external validity.
  • Representative (unbiased) sampling: the sample accurately reflects the population and supports generalization.

Why a Larger Sample Is Not Always Better

A bigger sample cannot fix a biased sampling method. If the sample is biased, a large sample still leads to wrong conclusions. Representativeness is more important than size. Larger samples also increase cost and effort, and beyond a certain point they yield only small improvements in precision.
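
A small simulation can show why representativeness matters more than size; this sketch uses made-up scores and a deliberately biased, convenience-style selection rule, so the specific numbers are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of 100,000 scores with a true mean of 50.
population = rng.normal(loc=50, scale=10, size=100_000)

def biased_sample(pop, n):
    """Convenience-style sampling: only the 'easiest to reach' half
    (here, the higher scorers) has any chance of being chosen."""
    reachable = pop[pop > np.median(pop)]
    return rng.choice(reachable, size=n, replace=False)

for n in (50, 500, 5000):
    random_mean = rng.choice(population, size=n, replace=False).mean()
    biased_mean = biased_sample(population, n).mean()
    print(f"n={n:5d}  random mean={random_mean:5.1f}  "
          f"biased mean={biased_mean:5.1f}  (true mean = 50)")

# The biased estimates stay far too high no matter how large n gets;
# only the random samples converge on the true mean.
```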

Describing Correlations: Direction, Strength, Significance

When explaining a correlation or association, you should describe three things: the direction of the relationship (positive, negative, or essentially no relationship), the strength of the effect (how closely the dots cluster around a line, or the absolute value of r), and the statistical significance (whether the result is unlikely to be due to chance, usually judged by p < .05 or by a confidence interval that excludes zero).
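
As a rough illustration, the sketch below computes all three pieces for two made-up variables (the variable names, sample size, and data are invented); it uses scipy for r and p and a Fisher z transformation for an approximate 95% confidence interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: hours of sleep and mood ratings for 80 people.
sleep = rng.normal(7, 1, size=80)
mood = 0.5 * sleep + rng.normal(0, 1, size=80)

r, p = stats.pearsonr(sleep, mood)

direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"direction:    {direction}")    # sign of r
print(f"strength:     r = {r:.2f}")    # absolute size of r
print(f"significance: p = {p:.4f}")    # p < .05 -> unlikely to be due to chance

# Approximate 95% CI for r via the Fisher z transformation;
# the result is significant if this interval excludes zero.
z = np.arctanh(r)
se = 1 / np.sqrt(len(sleep) - 3)
lower, upper = np.tanh([z - 1.96 * se, z + 1.96 * se])
print(f"95% CI:       [{lower:.2f}, {upper:.2f}]")
```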

Using Scatterplots to Estimate Associations

To estimate results from a correlational study with two quantitative variables using a scatterplot, look for the following (a minimal plotting sketch follows the list):

  • Direction (upward slope = positive, downward slope = negative),
  • Strength (dots tightly clustered around a line = strong, dots widely scattered = weak), and
  • Whether the pattern looks clear enough that it might be statistically significant.
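
Here is the minimal plotting sketch referenced above, using invented data; the axis labels and numbers are placeholders, and the point is simply to eyeball slope (direction) and scatter (strength) before running any statistics.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Hypothetical study: daily screen time and stress scores for 100 people.
screen_time = rng.normal(4, 1.5, size=100)
stress = 2 * screen_time + rng.normal(0, 3, size=100)

r = np.corrcoef(screen_time, stress)[0, 1]

plt.scatter(screen_time, stress)
plt.xlabel("Screen time (hours/day)")
plt.ylabel("Stress score")
plt.title(f"Upward slope = positive; tight cloud around a line = strong (r = {r:.2f})")
plt.show()
```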

Correlation Coefficient r

The correlation coefficient r is a single number, ranging from −1.00 to +1.00, that describes the direction and strength of a linear relationship between two quantitative variables. A positive r means a positive relationship, a negative r means a negative relationship, and an r near zero means little or no linear relationship. Strength is based on the absolute size of r: roughly .10 is small, .20 is moderate, .30 is fairly strong, and .40 or higher is considered strong in psychology.

Analyzing Categorical and Continuous Variables

If one variable is categorical and one variable is continuous, you use a mean-differences test, usually a t-test. You would also use a bar graph to compare the means of the groups. To analyze a correlational study with at least one categorical variable using a bar graph, look at the means of each group, compare how far apart the bars are, and compute the difference between the two means. A larger difference suggests a stronger association.
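
A minimal sketch of this analysis, with made-up well-being scores for two hypothetical groups, is below; it runs an independent-samples t-test on the group means and draws the corresponding bar graph.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Hypothetical data: well-being scores for pet owners vs. non-owners.
pet_owners = rng.normal(72, 10, size=60)
non_owners = rng.normal(68, 10, size=60)

# Mean-differences test for one categorical and one continuous variable.
t, p = stats.ttest_ind(pet_owners, non_owners)
mean_diff = pet_owners.mean() - non_owners.mean()
print(f"difference between means = {mean_diff:.1f}, t = {t:.2f}, p = {p:.4f}")

# Bar graph of the group means: the farther apart the bars,
# the stronger the association.
plt.bar(["Pet owners", "Non-owners"], [pet_owners.mean(), non_owners.mean()])
plt.ylabel("Mean well-being score")
plt.show()
```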

Interrogating the Validity of Association Claims

To interrogate the construct validity of an association claim, ask whether each variable was measured well. This includes asking if the measure was reliable (test-retest reliability, internal reliability, or interrater reliability) and valid (face validity, content validity, convergent validity, discriminant validity, or criterion validity).

Statistical Validity

To interrogate the statistical validity of an association claim, consider elements that might distort the correlation. These include effect size (how strong r is), statistical significance (whether the result is likely due to chance), outliers (extreme scores that can inflate or deflate r), restriction of range (when one variable does not show its full range, which can make r appear smaller), and whether the relationship might be curvilinear (a curved pattern that r cannot detect).
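
These threats are easy to demonstrate with simulated data; the sketch below (all numbers invented) shows how a single extreme outlier can change r and how restricting the range of one variable shrinks it.

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(0, 1, size=50)
y = 0.5 * x + rng.normal(0, 1, size=50)

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"full sample:           r = {r(x, y):.2f}")

# One extreme score can inflate (or deflate) r in a small sample.
x_out = np.append(x, 10)
y_out = np.append(y, 10)
print(f"with one outlier:      r = {r(x_out, y_out):.2f}")

# Restriction of range: keeping only the top half of x makes r look smaller.
keep = x > np.median(x)
print(f"restricted range of x: r = {r(x[keep], y[keep]):.2f}")
```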

External Validity

To interrogate the external validity of an association claim, ask to whom the results can generalize. Consider who was sampled, how they were sampled, and whether the sampling method allows generalization to the population of interest. Random sampling increases external validity.

Association vs. Causation

An association claim only requires covariance, meaning the two variables are related. A causal claim requires all three rules for causation: covariance, temporal precedence (the cause must come before the effect), and internal validity (no third variable can explain the relationship). Because correlational studies measure variables at the same time and do not control for third variables, they can support association claims but not causal claims.


Multivariate Designs: Longitudinal, Regression, Mediation

A multivariate design measures more than two variables in a single study and includes longitudinal designs, multiple regression, and mediation models. These designs are important because simple bivariate correlations cannot establish causation; multivariate approaches help address internal validity and move researchers closer to causal reasoning by examining temporal precedence, ruling out third variables, and testing mechanisms.

Three Main Multivariate Tools

  • Longitudinal designs: show temporal precedence.
  • Multiple regression: statistically controls for third variables.
  • Mediation: identifies possible mechanisms linking variables.

Longitudinal Designs and Cross-Lag Correlations

A longitudinal design measures the same variables in the same people over multiple time points and provides cross-sectional correlations, autocorrelations, and cross-lag correlations. Longitudinal studies help determine which variable comes first, reveal how variables change over time, and allow stronger causal inferences without manipulating variables.

Interpreting Cross-Lag Patterns

A cross-lag correlation tests whether an earlier measure of one variable predicts a later measure of another. Researchers compare A at Time 1 predicting B at Time 2 versus B at Time 1 predicting A at Time 2. Interpretation patterns include: A predicts B only (A likely comes first), B predicts A only, both directions predict each other (reciprocal), or neither predicts the other (no temporal precedence).
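
The sketch below computes these correlations for a tiny invented longitudinal dataset (the variable labels and values are placeholders, loosely echoing the classic TV-violence-and-aggression example); in practice the samples would be far larger.

```python
import pandas as pd

# Hypothetical longitudinal data: the same 8 people measured at two time points.
df = pd.DataFrame({
    "A_time1": [3, 5, 2, 6, 4, 7, 5, 3],   # e.g., exposure to TV violence at Time 1
    "B_time1": [2, 4, 1, 5, 3, 6, 4, 2],   # e.g., aggression at Time 1
    "A_time2": [4, 6, 2, 7, 5, 8, 6, 3],
    "B_time2": [3, 5, 2, 6, 4, 7, 5, 3],
})

# Cross-sectional correlation: A and B measured at the same time.
print(f"A1 with B1: {df['A_time1'].corr(df['B_time1']):.2f}")

# Autocorrelation: the same variable with itself across time.
print(f"A1 with A2: {df['A_time1'].corr(df['A_time2']):.2f}")

# Cross-lag correlations: compare the two directions.
print(f"A1 -> B2:   {df['A_time1'].corr(df['B_time2']):.2f}")  # A earlier, B later
print(f"B1 -> A2:   {df['B_time1'].corr(df['A_time2']):.2f}")  # B earlier, A later
# If A1 -> B2 is clearly stronger than B1 -> A2, A more plausibly comes first.
```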

Multiple Regression and Beta (β) Values

Popular press phrases such as “controlled for,” “after adjusting for,” or “even after accounting for” signal that multiple regression was used. Multiple regression tests whether the relationship between two variables still exists after controlling for additional predictors, allowing researchers to rule out some third variables and identify the unique contribution of each predictor. Although regression strengthens internal validity, it cannot prove causation, because only variables that were actually measured can be controlled.

A beta value (β) shows the direction and strength of a predictor’s relationship with an outcome after holding other predictors constant. Positive β means higher predictor values relate to higher outcome values; negative β means the opposite; betas are significant when their confidence interval does not include zero. Beta differs from correlation (r) because r represents a simple bivariate relationship, while β shows the unique effect of a predictor in the context of other variables.
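
A minimal regression sketch is below; the predictors, outcome, and data are all invented, and the variables are standardized first so the fitted coefficients can be read as betas (statsmodels is used here as one common choice, not the only one).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200

# Hypothetical question: does screen time predict depression, controlling for age?
age = rng.normal(30, 8, size=n)
screen = 5 - 0.05 * age + rng.normal(0, 1, size=n)
depression = 0.4 * screen + 0.02 * age + rng.normal(0, 1, size=n)

df = pd.DataFrame({"screen": screen, "age": age, "depression": depression})

# Standardize every variable so the coefficients are betas (comparable in size).
z = (df - df.mean()) / df.std(ddof=0)

X = sm.add_constant(z[["screen", "age"]])
model = sm.OLS(z["depression"], X).fit()

print(model.params)      # betas: each predictor's unique effect, others held constant
print(model.conf_int())  # a beta is significant when its 95% CI excludes zero
```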

Mediation and the Pattern-and-Parsimony Approach

Mediation analysis asks whether a third variable (the mediator) carries the relationship between two others; if the link between the predictor and the outcome weakens once the proposed mediator is statistically controlled, the mediator is a plausible mechanism. The pattern-and-parsimony approach uses many correlational studies, ideally with different methods, that all point to the same causal conclusion; parsimony means favoring the simplest explanation that accounts for the whole pattern of results. This approach is strongest when alternative explanations become implausible across studies, as in the decades of converging evidence linking smoking to lung cancer.
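
As a rough sketch of how a basic mediation check can be run with a series of regressions (the classic three-step logic; the variables and data here are entirely made up), one can look at whether the predictor's coefficient shrinks once the proposed mediator is controlled.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 300

# Hypothetical mediation: exercise -> better sleep (mediator) -> lower stress.
exercise = rng.normal(0, 1, size=n)
sleep = 0.5 * exercise + rng.normal(0, 1, size=n)
stress = -0.6 * sleep + rng.normal(0, 1, size=n)
data = pd.DataFrame({"exercise": exercise, "sleep": sleep, "stress": stress})

def fit(outcome, predictors, d):
    """Ordinary least squares regression of outcome on the listed predictors."""
    X = sm.add_constant(d[predictors])
    return sm.OLS(d[outcome], X).fit()

# Step 1: the predictor relates to the outcome.
print(fit("stress", ["exercise"], data).params["exercise"])
# Step 2: the predictor relates to the proposed mediator.
print(fit("sleep", ["exercise"], data).params["exercise"])
# Step 3: with the mediator controlled, the predictor's coefficient shrinks,
# suggesting the mediator carries (part of) the relationship.
print(fit("stress", ["exercise", "sleep"], data).params)
```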

How Multivariate Designs Relate to Internal Validity

Multivariate designs strengthen causal reasoning by addressing all three causal criteria: covariance (every multivariate study establishes that the variables are related), temporal precedence (supported by longitudinal cross-lag findings), and internal validity (addressed by regression's ability to hold measured third variables constant). Longitudinal designs handle the “which came first” problem, and regression filters out measured confounds, creating stronger, though still not experimental, evidence for causation.