Causal Inference Methods: RCTs vs. Quasi-Experimental Designs
RCTs and the Foundations of Causal Inference
A Randomized Controlled Trial (RCT) earns its privileged status because the researcher dictates treatment assignment through explicit randomization, thereby severing any systematic link between the treatment and unobserved confounders. That design choice secures internal validity outright: the difference in average outcomes across treatment arms is, by construction, an unbiased estimate of the causal effect, conditional only on compliance and sample integrity.
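That property is easy to verify by simulation. In the minimal sketch below (not from the lecture; all names and magnitudes are illustrative), the outcome depends on a confounder the analyst never observes, yet the raw difference in means still recovers the true effect because assignment is random:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 2.0                        # illustrative causal effect

# A confounder that drives the outcome but is never observed by the analyst.
confounder = rng.normal(size=n)

# Randomized assignment: independent of the confounder by construction.
treated = rng.integers(0, 2, size=n)

outcome = true_effect * treated + 1.5 * confounder + rng.normal(size=n)

# The difference in means across arms estimates the causal effect without bias.
diff_in_means = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"difference in means: {diff_in_means:.3f}  (true effect: {true_effect})")
```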
The quasi-experimental alternatives—Standard Regression (OLS), Differences-in-Differences (DiD), Instrumental Variables (IV), and Regression-Discontinuity (RD)—can replicate that causal clarity only by leaning on identifying assumptions that are credible in some environments and fragile in others. Each non-RCT method carries its own Achilles’ heel:
- Ordinary Least Squares (OLS): Presumes that every determinant of the outcome is either included as a control or orthogonal to the regressor of interest. Weakness: Omitted-variable bias (simulated in the sketch after this list).
- Differences-in-Differences (DiD): Requires that untreated (“control”) units trace out the counterfactual time path that treated units would have followed absent the intervention—the parallel-trends assumption. Weakness: Divergent pre-trends.
- Instrumental Variables (IV): Trades on an exclusion restriction: the instrument must affect the outcome exclusively through its impact on the endogenous regressor, not through any other channel. Weakness: Violation of the exclusion restriction.
- Regression-Discontinuity (RD): Hinges on the notion that units just above and just below a predetermined cutoff are, in expectation, identical save for treatment status. Weakness: Manipulation or sorting around the cutoff.
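The first of those weaknesses lends itself to a quick demonstration. In the sketch below (synthetic data; names and coefficients are illustrative), the short regression that omits the confounder overstates the return, while the long regression that controls for it recovers the truth:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000

ability = rng.normal(size=n)                     # unobserved confounder
schooling = 0.8 * ability + rng.normal(size=n)   # regressor correlated with it
log_wage = 0.10 * schooling + 0.50 * ability + rng.normal(scale=0.5, size=n)

# Short regression omits ability: the coefficient absorbs its influence.
short_reg = sm.OLS(log_wage, sm.add_constant(schooling)).fit()

# Long regression includes ability: the bias disappears.
long_reg = sm.OLS(
    log_wage, sm.add_constant(np.column_stack([schooling, ability]))
).fit()

print(f"short (omits ability):    {short_reg.params[1]:.3f}")  # ~0.34, biased up
print(f"long  (controls ability): {long_reg.params[1]:.3f}")   # ~0.10, the truth
```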
In short, whereas an RCT embeds the identification strategy in the data-generating process itself, the other four designs import identification through assumptions that must be articulated, defended, and empirically interrogated.
Differences-in-Differences (DiD) Methodology
Differences-in-Differences (DiD) tackles causal inference by viewing time as an experiment’s second dimension. It compares the change in outcomes for a treated group before and after an intervention to the contemporaneous change for a comparison group left untouched by the intervention. If the two groups were on parallel trajectories prior to treatment, any post-intervention divergence can be interpreted as the treatment effect.
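In regression form, the DiD estimate is the coefficient on the interaction between a treated-group dummy and a post-period dummy. A minimal sketch on synthetic data (all names and effect sizes are illustrative, not the lecture’s):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 2_000
true_did = -1.0                                 # illustrative treatment effect

df = pd.DataFrame({
    "treated": rng.integers(0, 2, size=n),      # group dummy
    "post": rng.integers(0, 2, size=n),         # period dummy
})
df["y"] = (
    0.5 * df["treated"]                         # permanent gap between groups
    + 0.3 * df["post"]                          # common time trend
    + true_did * df["treated"] * df["post"]     # effect only for treated, post
    + rng.normal(size=n)
)

# The interaction coefficient is the DiD estimate of the treatment effect.
fit = smf.ols("y ~ treated * post", data=df).fit()
print(f"DiD estimate: {fit.params['treated:post']:.3f}  (true: {true_did})")
```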
The logic is dramatized by the example of the 6th and 8th Federal Reserve Districts during the early 1930s banking crisis (as featured in Professor Angrist’s lecture). Both districts displayed nearly identical slopes in the number of operating banks during the tranquil, pre-collapse years of 1928–1929, bolstering the parallel-trends claim.
In late 1930, the Caldwell & Company debacle hit the 6th District, triggering a steep, immediate decline in its bank count, while the 8th District experienced only a modest, gradual contraction. By anchoring the two series at their pre-crisis gap and tracing their subsequent evolution, the graph illustrates the vertical “treatment-effect” bracket: the 6th District’s post-collapse shortfall relative to its own pre-trend, net of whatever erosion the 8th District also endured. That vertical distance—the change-in-changes—is the DiD estimator.
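The change-in-changes arithmetic reduces to two subtractions. The bank counts below are placeholders chosen only to mimic the shape of the graph, not the lecture’s actual data:

```python
# Hypothetical bank counts, pre- and post-collapse; illustrative only.
d6_pre, d6_post = 135, 120   # 6th District (hit by the Caldwell collapse)
d8_pre, d8_post = 165, 160   # 8th District (comparison group)

change_treated = d6_post - d6_pre        # -15: decline in the treated district
change_control = d8_post - d8_pre        # -5: decline the control also endured

did = change_treated - change_control    # -10: excess losses due to treatment
print(f"DiD estimate: {did} banks")
```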
The entire inference rests on a disciplined thought experiment: had the collapse never occurred, the treated group’s outcome path would have shadowed the control group’s path. Any skepticism should therefore be aimed squarely at the plausibility of that counterfactual alignment.
Instrumental Variables (IV) Estimation
Instrumental Variables (IV) estimation becomes indispensable when the regressor of interest is endogenous—meaning it is correlated with unobserved determinants of the outcome—so that an Ordinary Least Squares (OLS) coefficient would entangle causal and spurious associations.
The IV remedy unfolds in two regressions (sketched in code after this list):
- First Stage: The suspect regressor (for example, an indicator for having at least three children) is projected on an external instrument (such as a twin birth or a same-sex pair among the first two children) plus any covariates. The fitted values isolate the component of treatment status driven by the instrument, purged of its endogenous variation.
- Second Stage: The outcome (e.g., maternal labor supply) is regressed on those fitted values, attributing any remaining variation strictly to the instrument-induced exogenous shock.
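A minimal two-stage sketch on simulated data follows (every name and magnitude is illustrative, not an Angrist–Evans figure). Running the stages by hand yields the right point estimate but invalid second-stage standard errors, so in practice one would use a dedicated 2SLS routine:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50_000
true_effect = -0.5                         # illustrative causal effect

u = rng.normal(size=n)                     # unobserved confounder
z = rng.integers(0, 2, size=n)             # instrument, e.g. a twin second birth
# Endogenous treatment: driven by both the instrument and the confounder.
treatment = (1.0 * z + 0.8 * u + rng.normal(size=n) > 0.5).astype(float)
outcome = true_effect * treatment + 0.5 * u + rng.normal(size=n)

# Naive OLS is biased because `treatment` is correlated with `u`.
naive = sm.OLS(outcome, sm.add_constant(treatment)).fit()

# First stage: project the endogenous regressor on the instrument.
first = sm.OLS(treatment, sm.add_constant(z)).fit()
# Second stage: regress the outcome on the first-stage fitted values.
second = sm.OLS(outcome, sm.add_constant(first.fittedvalues)).fit()

print(f"OLS: {naive.params[1]:+.3f}   2SLS: {second.params[1]:+.3f}   "
      f"true: {true_effect}")
```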
Angrist and Evans operationalized this machinery in their 1998 American Economic Review study. The first-stage coefficients starkly contrast the two instruments’ power: a multiple second birth elevates the probability of a third child by roughly sixty percentage points, whereas a same-sex sibship nudges it upward by only six percentage points. Yet when the reduced-form impacts on employment, hours, weeks, and earnings are divided by these respective first-stage effects—the Wald ratios—the resulting causal effects per extra child are strikingly similar across instruments.
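Schematically, each Wald ratio divides a reduced-form effect by the corresponding first-stage effect. In the sketch below, the first-stage magnitudes come from the text, but the reduced-form numbers are invented placeholders meant only to show the arithmetic, not the paper’s estimates:

```python
# First-stage effects on P(third child), as reported above.
first_stage = {"twins": 0.60, "same_sex": 0.06}

# Reduced-form effects on maternal employment: hypothetical placeholders,
# deliberately scaled so the two Wald ratios agree, as in the paper.
reduced_form = {"twins": -0.050, "same_sex": -0.005}

for instrument in first_stage:
    wald = reduced_form[instrument] / first_stage[instrument]
    print(f"{instrument}: LATE of {wald:.3f} per extra child")
```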
That concordance signals that the Local Average Treatment Effect (LATE) is robust to whether compliance is driven by chance twinning or by parents’ desire for gender variety. It also reassures us that the exclusion restriction is not obviously violated: both shocks appear to depress maternal labor supply solely through the expansion of family size, not via independent channels of stress, health, or income. The exercise therefore showcases IV’s capacity to rescue causality in observational settings marred by self-selection, provided we can muster an instrument with bite and a defensible exclusion story.
OLS and the Challenge of Omitted-Variable Bias (OVB)
A final table, drawn from Angrist’s NLSY walkthrough, tracks how the estimated wage payoff to schooling contracts as progressively richer control variables enter the regression, illustrating the danger of Omitted-Variable Bias (OVB).
The progression of the estimated return to education (per additional year) is as follows (a synthetic re-creation of the pattern appears after the list):
- Bare-bones specification: Each additional year of education is associated with a hefty 13.2% rise in hourly wages.
- Introducing age dummies: Scarcely dents that figure, hinting that cohort composition is not the culprit.
- Augmenting with demographic covariates (parental education, region): Shaves the coefficient modestly to 11.4%, suggesting some upward bias from correlated background factors.
- Adding AFQT score (proxy for cognitive ability): The return tumbles to 8.7%, exposing the extent to which innate ability had been masquerading as schooling’s productivity boost.
- Adding occupation fixed effects: Drags the estimate down to 6.6%, indicating that part of the schooling premium merely reflects access to higher-paying occupations rather than higher productivity within a given occupation.
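The shrinking-coefficient pattern can be re-created schematically on synthetic data. Everything below is an illustrative construction, not the NLSY extract: the variable names, the magnitudes, and the continuous occ_quality stand-in for occupation fixed effects are all assumptions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5_000

# Synthetic stand-in for the NLSY extract; names and magnitudes are invented.
afqt = rng.normal(size=n)                        # ability proxy
educ = 12 + 0.9 * afqt + rng.normal(size=n)      # schooling, driven by ability
occ_quality = 0.5 * educ + rng.normal(size=n)    # occupational sorting channel
log_wage = (0.06 * educ + 0.04 * afqt + 0.03 * occ_quality
            + rng.normal(scale=0.4, size=n))

df = pd.DataFrame({"log_wage": log_wage, "educ": educ,
                   "afqt": afqt, "occ_quality": occ_quality})

# Progressively richer controls shrink the estimated return to education.
specs = {
    "bare bones":         "log_wage ~ educ",
    "+ ability proxy":    "log_wage ~ educ + afqt",
    "+ occupation proxy": "log_wage ~ educ + afqt + occ_quality",
}
for label, formula in specs.items():
    coef = smf.ols(formula, data=df).fit().params["educ"]
    print(f"{label:<20s} return to education: {coef:.3f}")
```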
Collectively, these results tell a cautionary tale: the less we control for talent and occupational sorting, the more we overstate the causal return to education. They also motivate the pivot to quasi-experimental estimators (IV or RD) that aim to purge precisely those embedded selection forces.