Essential Causal Inference and Econometrics Techniques

Randomized Experiments and Causal Inference

Why are randomized experiments so desirable?
Randomization breaks the link between treatment assignment and confounders, making treated and untreated groups exchangeable. Because the groups are comparable in expectation, any difference in outcomes can be attributed to the treatment rather than to selection, yielding unbiased estimates of causal effects.

Why might we not be able to run a randomized experiment?
They may be unethical (e.g., denying a beneficial treatment to a control group), infeasible due to cost or logistics, or illegal; even when they can be run, the results may not generalize beyond the study sample (limited external validity).

What is observational data? How does it differ from experimental data?
Observational data arise when the researcher does not control treatment assignment. Unlike in an experiment, treatment may be correlated with confounders, reintroducing selection bias.

Why can’t we estimate treatment effects from observational data the same way as experiments?
Because treatment is not randomly assigned, naïve comparisons mix causal effects with confounding bias.
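A minimal simulation makes the problem concrete. In this sketch (all numbers invented), an unobserved confounder u raises both the chance of treatment and the outcome, so the naïve difference in means overstates the true effect of 2:

```python
import random

# Toy simulation (all numbers invented): an unobserved confounder u
# raises both the chance of treatment and the outcome, biasing the
# naive difference in means upward relative to the true effect of 2.
random.seed(42)
TRUE_EFFECT = 2.0

treated_y, control_y = [], []
for _ in range(50_000):
    u = random.random()                    # unobserved confounder
    t = 1 if random.random() < u else 0    # treatment more likely when u is high
    y = TRUE_EFFECT * t + 3.0 * u + random.gauss(0, 1)
    (treated_y if t else control_y).append(y)

naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(f"true effect: {TRUE_EFFECT}, naive estimate: {naive:.2f}")
```

If t were instead assigned by a fair coin, the same comparison would recover approximately 2.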


Regression Analysis and Coefficient Interpretation

What is regression and how does it work?
Regression models the relationship between an outcome and one or more predictors by estimating coefficients that minimize prediction error (usually the sum of squared residuals).

How do we interpret coefficients in multiple regression?
Each coefficient represents the average change in the outcome associated with a one-unit change in that predictor, holding the other predictors in the model constant.

What does linear regression mean?
It is linear in parameters, not necessarily linear in variables. We can include nonlinear transformations (e.g., squared terms) and still run linear regression.
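A quick illustration of "linear in parameters": regressing y on z = x² is still ordinary least squares, even though the fitted curve is a parabola in x. The data below are made up to follow y ≈ 2x²:

```python
# "Linear" means linear in the coefficients: regressing y on z = x**2
# is still least squares, though the fit is a parabola in x.
# Toy data, constructed to follow y ≈ 2 * x**2.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 8.3, 18.2, 31.9, 50.4]

zs = [x ** 2 for x in xs]                  # nonlinear transformation of x
zbar, ybar = sum(zs) / len(zs), sum(ys) / len(ys)
slope = (sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
         / sum((z - zbar) ** 2 for z in zs))
intercept = ybar - slope * zbar
print(f"y ≈ {intercept:.2f} + {slope:.2f} * x^2")
```

The recovered slope is close to 2 even though the relationship between y and x is strongly nonlinear.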


Methods for Controlling Confounding Bias

Approaches to control for confounding directly:

  • Restriction (limiting the sample to units with the same confounder values)
  • Regression adjustment
  • Matching and propensity score methods

When does controlling change the treatment coefficient?
When the added variable is correlated with both the treatment and the outcome (i.e., it is a confounder).

Why might these approaches still be biased?
They only adjust for observed confounders. Unmeasured confounding remains.

Big weakness:
We don’t know what we don’t know—hidden confounders cannot be fixed with regression.

Adjusted regression vs. naïve ATE:
Adjusted regression estimates a conditional causal effect (holding covariates constant), not a raw difference in means.

What does matching seek to do?
It constructs treated and untreated groups that look similar on observed covariates to approximate randomization.

Strengths and Weaknesses:

  • Flexible and widely applicable.
  • Relies on strong assumptions and remains vulnerable to unobserved confounding.
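A toy stratification sketch (data invented for illustration): compare treated and control units within each level of a binary confounder x, then average the within-stratum differences:

```python
# Exact-stratification sketch (data invented): within each level of a
# binary confounder x, the treatment effect is +2; the pooled (naive)
# comparison is biased because treatment is more common when x = 1.
data = [  # (x, treated, outcome)
    (0, 0, 1.0), (0, 0, 1.2), (0, 1, 3.1),
    (1, 0, 5.0), (1, 1, 7.1), (1, 1, 6.9),
]

effects = []
for stratum in (0, 1):
    t = [y for x, d, y in data if x == stratum and d == 1]
    c = [y for x, d, y in data if x == stratum and d == 0]
    effects.append(sum(t) / len(t) - sum(c) / len(c))

stratified = sum(effects) / len(effects)   # equal weight per stratum
print(f"stratified estimate: {stratified:.2f}")
```

In this toy data the pooled (naïve) difference in means is 3.3, while comparing like with like recovers the within-stratum effect of 2.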

Difference-in-Differences (DiD) Methodology

Why is before–after for one unit biased?
Time trends and other shocks may affect outcomes even without treatment.

Required data:
Panel or repeated cross-sectional data with treated and untreated groups over time.

DiD estimator:
(Ȳ_after − Ȳ_before)_treated − (Ȳ_after − Ȳ_before)_control, where Ȳ is the group mean outcome.
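Plugging invented group means into the estimator:

```python
# Numeric DiD sketch with made-up group means.
treated_before, treated_after = 10.0, 15.0
control_before, control_after = 8.0, 11.0   # shared time trend of +3

did = (treated_after - treated_before) - (control_after - control_before)
print(f"DiD estimate: {did}")
```

Subtracting the control group's change removes the shared +3 time trend, leaving a treatment effect of 2.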

Key assumption:
Parallel trends: In the absence of treatment, treated and control groups would have followed the same trend.

Strengths and Weaknesses:

  • Controls for time-invariant unobservables.
  • Fails if the parallel trends assumption is violated.

Regression Discontinuity Design (RDD) Principles

Required data and treatment allocation:
A continuous running variable with treatment assigned based on a specific cutoff.

Can the running variable correlate with the outcome?
Yes, provided it does so smoothly. Only discontinuities at the cutoff matter.

Continuity assumption:
Potential outcomes are continuous at the cutoff in the absence of treatment.

As-if randomization:
Units near the cutoff are comparable, approximating random assignment.

How is the effect estimated?
By comparing the limits of the outcome just above and below the cutoff.
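A bare-bones sketch of that comparison, fitting a least-squares line on each side of an assumed cutoff at 0 (toy data with a jump of +2):

```python
# Toy sharp-RDD sketch: fit a least-squares line on each side of the
# cutoff (here 0) and compare predictions at the cutoff. The fake data
# have slope 1 everywhere and a jump of +2 at the cutoff.
def fit_line(points):
    """Return (intercept, slope) of the least-squares line through points."""
    xs, ys = zip(*points)
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in points)
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

below = [(-0.4, -0.4), (-0.3, -0.3), (-0.2, -0.2), (-0.1, -0.1)]
above = [(0.1, 2.1), (0.2, 2.2), (0.3, 2.3), (0.4, 2.4)]

b0, _ = fit_line(below)   # prediction just below the cutoff (x = 0)
a0, _ = fit_line(above)   # prediction just above the cutoff (x = 0)
print(f"estimated discontinuity: {a0 - b0:.2f}")
```

Because the cutoff is at 0, each line's intercept is its prediction at the cutoff; their gap estimates the local treatment effect.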

What does “L” stand for?
Local: the estimate is a Local Average Treatment Effect (LATE), applying only to units near the cutoff.

Big caveat:
Limited external validity; the effect is not global.

Strengths and Weaknesses:

  • Highly credible causal inference near the cutoff.
  • Narrow applicability and sensitive to manipulation of the running variable (e.g., units sorting themselves just past the cutoff).

Instrumental Variables (IV) Framework

How does IV mimic randomization?
It isolates variation in treatment driven by an external, random-like instrument.

Admissibility conditions:

  1. Relevance
  2. Exclusion restriction
  3. Exchangeability
  4. Monotonicity

Exclusion restriction:
The instrument affects the outcome only through the treatment. Violations cause bias.

Strengths and Weaknesses:

  • Handles unobserved confounding.
  • Instruments are hard to find; estimates are local.

IV for Noncompliance in Experiments

Why noncompliance is a problem:
Actual treatment differs from assignment, breaking randomization.

Why ITT is a valid instrument:
Assignment (the grouping variable in an Intent-to-Treat analysis) is randomized by design, correlated with the treatment actually received, and affects outcomes only through that treatment.

Types of subjects:

  • Never-takers
  • Always-takers
  • Compliers
  • Defiers

Why assume no defiers?
Without it, the difference in treatment rates between assignment groups would mix compliers with defiers, so the complier share (and hence the complier effect) could not be identified.

Proportion of compliers:
The difference in treatment rates between Z=1 and Z=0.

CATE (Complier Average Treatment Effect, also known as LATE or CACE):
ITT effect ÷ proportion of compliers.
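A quick worked example (invented numbers) tying the last two answers together, the Wald estimator:

```python
# Wald-estimator sketch with invented trial summaries.
# Z = random assignment, D = treatment actually taken, Y = outcome.
p_treated_z1 = 0.80        # P(D = 1 | Z = 1)
p_treated_z0 = 0.10        # P(D = 1 | Z = 0)
y_mean_z1, y_mean_z0 = 6.4, 5.0

itt = y_mean_z1 - y_mean_z0                # intent-to-treat effect
compliers = p_treated_z1 - p_treated_z0    # proportion of compliers
cate = itt / compliers                     # effect among compliers
print(f"ITT = {itt:.2f}, compliers = {compliers:.2f}, CATE = {cate:.2f}")
```

Dividing the ITT effect (1.4) by the complier share (0.7) rescales it to the subpopulation whose treatment status was actually moved by assignment.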

Why CATE is local:
It applies only to compliers, not to the entire population.


Bayes’ Theorem and Conditional Probability

Conditional probability:
The probability of one event occurring given that another event is known to have occurred.

Bayes’ Theorem:
P(A|B) = [P(B|A) · P(A)] / P(B)

Importance of base rates:
Ignoring priors leads to the base rate fallacy: overestimating the probability of a rare condition given a positive signal, because the low base rate is neglected.
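The classic rare-disease example, computed directly from Bayes' Theorem (illustrative numbers):

```python
# Classic base-rate example with illustrative numbers: a test with 99%
# sensitivity and a 1% false-positive rate for a condition affecting
# 0.1% of the population.
prior = 0.001              # P(condition)
sensitivity = 0.99         # P(positive | condition)
false_positive = 0.01      # P(positive | no condition)

p_positive = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / p_positive    # Bayes' Theorem
print(f"P(condition | positive) ≈ {posterior:.3f}")
```

Despite the 99% sensitivity, a positive result implies only about a 9% chance of having the condition: true positives are swamped by false positives from the much larger unaffected group.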

p-values vs. truth:
p-values represent P(data at least this extreme | null), not P(null | data). They do not tell us the probability that a hypothesis is true.


Causal Inference Summary Cheat Sheet

  • Randomization: Leads to exchangeability and unbiased causal effects.
  • Observational data: Leads to confounding.
  • Regression: Controls for observed confounders only.
  • DiD: Difference of differences (requires parallel trends).
  • RDD: Discontinuity at the cutoff (provides LATE).
  • IV: Requires relevance, exclusion, exchangeability, and monotonicity.
  • CATE: Calculated as ITT divided by the compliance rate.
  • Bayes: Always consider priors and base rates.