Pre-Lecture: Causal Inference
Notation
A: Exposure
Y: Outcome
L: Measured known covariates
Can condition on these
U: Unmeasured covariates
Can never condition on these
Counterfactual
What would have happened to the exposed had they not been exposed?
What would have happened to the person who got treatment A, had they got treatment B?
The Causal Effect for an Individual
Potential outcomes or counterfactual outcomes:
Y^{a=1}
Y under treatment a=1
Outcome variable that would have been observed under treatment a=1
Y^{a=0}
Outcome variable that would have been observed with no treatment (a=0)
The treatment A has a causal effect on an individual's outcome Y if: Y^{a=1} ≠ Y^{a=0}
Causation
DIFFERENT RISK IN THE ENTIRE POPULATION UNDER TWO EXPOSURE VALUES
Pr[Y^a = 1]: risk in all subjects of the population had they received the counterfactual exposure level a
If everybody had been exposed, what's the probability of the outcome?
Causal Risk Ratio: Pr[Y^{a=1} = 1] / Pr[Y^{a=0} = 1]
Probability of the outcome had everyone been exposed / probability of the outcome had everyone not been exposed
But we don't have this!
= 1 if no causal effect
Association
DIFFERENT RISK IN TWO DISJOINT SUBSETS OF THE POPULATION DETERMINED BY THE SUBJECTS’ ACTUAL EXPOSURE VALUE
Pr[Y=1|A=a] is the risk in subjects of the population that meet the condition "having actually received exposure level a"
Probability of the outcome Y given that you actually received treatment A
Associational Risk Ratio: Pr[Y=1|A=1] / Pr[Y=1|A=0] (= 1 if no association)
Probability of the outcome for those who were treated / probability of the outcome for those who weren't treated
Causal: we would need to know what happens if the whole diamond (population) got treatment and if the whole diamond didn't get treatment
Association: each half of the diamond gets either treatment or no treatment, and the two halves are compared to each other
What we want to be true
Pr[Y^{a=1} = 1] = Pr[Y=1|A=1]
Pr[Y^{a=0} = 1] = Pr[Y=1|A=0]
We want the probability of the outcome for people who were treated to be the same as the probability of the outcome had everyone been treated
We want the probability of the outcome for people who were untreated to be the same as the probability of the outcome had everyone been untreated
Confounding via the counterfactual
Confounding arises when the outcome in the truly non-exposed differs from what would have occurred in the exposed group in the absence of exposure
Pr[Y^{a=1} = 1] ≠ Pr[Y=1|A=1]
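A small simulation may make the causal/associational distinction concrete. This is a sketch with made-up probabilities (not from the lecture): the treatment truly does nothing (sharp null, so the causal risk ratio is exactly 1), but a confounder L drives both exposure and outcome, so the associational risk ratio comes out well above 1:

```python
import random
random.seed(0)

N = 200_000
pop = []
for _ in range(N):
    L = random.random() < 0.5                   # binary confounder
    A = random.random() < (0.7 if L else 0.2)   # L makes exposure more likely
    # Potential outcomes: sharp null, so Y^{a=1} == Y^{a=0} for everyone,
    # but L raises the baseline risk.
    y0 = random.random() < (0.4 if L else 0.1)
    y1 = y0
    Y = y1 if A else y0                         # consistency: observed = potential
    pop.append((L, A, Y, y1, y0))

# Causal risk ratio uses the potential outcomes of EVERYONE: Pr[Y^{a=1}=1] / Pr[Y^{a=0}=1]
p_y1 = sum(y1 for *_, y1, _ in pop) / N
p_y0 = sum(y0 for *_, _, y0 in pop) / N
print("causal RR:", p_y1 / p_y0)                # exactly 1.0 (the truth)

# Associational risk ratio uses only the observed A and Y: Pr[Y=1|A=1] / Pr[Y=1|A=0]
exposed   = [Y for L, A, Y, *_ in pop if A]
unexposed = [Y for L, A, Y, *_ in pop if not A]
print("associational RR:",
      (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed)))  # ~1.8
```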
Conditional exchangeability
Critical criterion for causal inference
Exchangeability says that within levels of L (measured covariates):
Exposed subjects would have had the same risk as unexposed subjects had they been unexposed
Unexposed subjects would have had the same risk as exposed subjects had they been exposed
In other words, the unexposed group is a fine approximation of the counterfactual for the exposed had they not been exposed (and vice versa)
Conditional exchangeability says:
There's something about these groups that is different (usually confounders)
Something is different about those who are exposed vs. those who are not exposed
Ex. those who choose/are able to take treatment and those who don't
When we look within levels of the confounder (ex. gender) we should have exchangeability
Ex. when we look at just women, do we have exchangeability? When we look at just men, do we have exchangeability?
The goal when trying to control confounding is to achieve the greatest degree of conditional exchangeability possible
Directed Acyclic Graphs (DAG)
Aka causal diagrams
Counterfactual models underlie DAGs
DAG basics
Time as an invisible X-axis (Helpful when in longitudinal structures)
Directed – edges/arrows imply direction
Acyclic – no cycles, variable cannot cause itself
Graph
Simple picture that:
1. Encodes subject-matter knowledge
2. Our assumptions
Under the Null
A is not associated with Y
They are independent of one another (no arrow between them)
L (confounder) is associated with A and Y
Precedes exposure and outcome in time
Is associated with exposure and outcome
Is not an intermediate in the pathway between A and Y
Even though A and Y are not associated with each other, there’s a backdoor path between them
Structural / causal properties of confounders:
precede both the exposure and outcome in time
associated with exposure and the outcome
not an intermediate in the causal pathway between A and Y
We say we’ve conditioned on L to block it (accounting for it in some way)
When conditioned, the backdoor path of association between A and Y is blocked
Ex. of conditioning: restriction
D-separation
Graphical rules to assess whether 2 variables are independent (vs. D-connected, which implies they are not independent)
Can we get from one variable to the other through a backdoor path?
Path: arrow-based route between the 2 variables in the graph
Rule 1:
If there are no variables being conditioned on, a path is blocked if and only if two arrowheads on the path collide at some variable on the path
(On the slide, D should be Y)
2 arrow heads come together on D, so D is called a collider
Rule 2:
Any path that contains a noncollider that has been conditioned on is blocked
Conditioning on is depicted by a box
Think of it as a door is closed
When you condition on a noncollider, it blocks that path (for backdoor paths, blocked is what we want, not open!)
So we want noncolliders on backdoor paths to be conditioned on
Rule 3:
A collider that has been conditioned on does not block a path
Conditioning on D opens the path between L and A
By conditioning on a common outcome: knowing D, having information on either L or A gives me information on the other
Arrows pointing toward it remain
So we want colliders that aren't conditioned on (conditioning on a collider leads to selection bias)
Rule 4:
A collider that has a descendant (something that comes after) that has been conditioned on does not block a path (opens the path)
Conditioning on D opens the path between L and A
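These four rules are mechanical enough to check in code. Below is a minimal sketch in Python (the graph, node names, and helper functions are all illustrative, not from the lecture) that decides whether a single path is blocked given a conditioning set:

```python
# A toy DAG as {node: set of children}; here L is a common cause of A and Y.
DAG = {"L": {"A", "Y"}, "A": set(), "Y": set()}

def descendants(dag, node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in dag[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_collider(dag, prev, node, nxt):
    """Two arrowheads collide at `node` if both neighbors point into it."""
    return node in dag[prev] and node in dag[nxt]

def path_blocked(dag, path, conditioned):
    """Apply d-separation rules 1-4 to one path (a list of node names)."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if is_collider(dag, prev, node, nxt):
            # Rules 1, 3, 4: a collider blocks the path unless it, or one of
            # its descendants, has been conditioned on.
            if not ({node} | descendants(dag, node)) & conditioned:
                return True
        elif node in conditioned:
            # Rule 2: a conditioned-on noncollider blocks the path.
            return True
    return False

backdoor = ["A", "L", "Y"]  # the path A <- L -> Y
print(path_blocked(DAG, backdoor, conditioned=set()))   # False: open backdoor path
print(path_blocked(DAG, backdoor, conditioned={"L"}))   # True: conditioning on L blocks it
```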
If exposure precedes disease (A→Y) (as we want it to) then the overall association has 2 components
1. Spurious association due to the sharing of a common cause – CONFOUNDING
2. Causal effect of A on Y – THE GOAL
Example 1: Interested in A→D – is there confounding?
First, assume that null is true
Can you still get from outcome to exposure? (make sure you don’t include arrow from exposure to outcome when considering collider)
Then look for d-separation to determine if A ┴ D
┴ means "is independent of"
In this case, L is a collider and it isn't conditioned on, so the path through L is blocked (no backdoor path): L is NOT a confounder
If you condition on the collider, it opens up the backdoor path between A and D
Backdoor path = confounding
Example 2: Interested in A→D – is there confounding?
1. Suppose the null is true (remove the arrows coming out of A)
2. Is there a back door path? (YES)
Arrow direction doesn’t matter UNLESS there are colliders
3. Can we block it?
YES, by conditioning on L
U is the confounder
Should we adjust for L?
There is no backdoor path that L is involved in
But U and L are correlated
How much adjusting for L helps is dictated by the correlation between U and L
How do people look for confounding?
Change in estimate
Comparing crude and adjusted (without cutpoint)
A priori knowledge and DAG
Autopilot (what others have done, period)
Automated selection methods
Satisfying the definition of a confounder
What if you don’t know which DAG is correct?
Sensitivity Analyses
Must distinguish between incidence and prevalence
How do you define an individual?
Steps to get diagnosis
Takes a while to go to the doctor
Doesn’t get better
Primary care physician runs tests (symptomatic but not yet diagnosed)
Referred to specialist
WHAT HAPPENS WHEN PEOPLE AREN'T IN YOUR SYSTEM?
Methodologic Challenges – Incidence vs Prevalence
Is the first time a patient appears in a dataset with a specific diagnosis actually when they were diagnosed?
It depends
We need to understand where these data come from
If the exposure occurs a lot earlier than the outcome is recorded, this may be an issue
Defining outcome
Timing of outcomes with complex multistage diagnoses
Multiple sclerosis (>1 episode) (may have to have the outcome a few times before getting diagnosed)
Systemic lupus erythematosus (avg 2-4 yrs onset to dx)
Myocardial infarction
Prostate cancer
What is the time at which the outcome occurs?
1st suspected MS episode?
Symptom onset?
Admission to the emergency department?
Elevated PSA?
If exposure can change over time (ex. Someone is on a medication, then off the medication), how does the outcome get assigned?
Do you define onset as when first diagnosed?
Why people like using Kaiser (a closed system)
Primary care, pharmacy, specialists all there
Case reports and case series
Important in evidence-based practice
Often first line of evidence, hypothesis generating
Not stand-alone nor definitive; prone to selection bias
No comparator group
Describe rare clinical events or unusual manifestations
Describes series of cases
Ecologic study
Ex. fat intake and breast cancer measured at the group level
We don't know whether the people who had high fat intake were the ones who got breast cancer; we don't know at the individual level
Ecological fallacy: the associations observed at the population or group level may not hold up when looking at the same association among individuals within the group
Use of aggregate data to draw individual level inferences
Why conduct ecological studies?
Individual level-study is not possible
Measurement impossible
Design not possible, including unethical
Relatively new hypothesis
Time or money is limited but data are easily available
Interested at the ecological level
Ecologic as a level
Can still be interested in a biologic hypothesis/mechanism but have ecologic-level exposure data (e.g., environmental measurements)
Can have ecologic exposure and outcome measures
Might be interested in group and individual level effects
Cross-sectional studies
Snapshot of a community or group
Exposure and outcome are measured at the same time
Usually captures prevalent outcomes/disease
Can be descriptive and/or analytic
Relationship Between Prevalence, Incidence Density/Rate, and Duration of Disease:
P = I × D
where I = incidence rate, P = prevalence, D = average duration of disease
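A quick worked example with assumed numbers: if I = 2 new cases per 1,000 person-years (0.002/year) and the average duration D = 5 years, then P ≈ 0.002 × 5 = 0.01, i.e., about 1% of the population has the disease at any given time (the approximation holds when the disease is rare).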
Prevalent outcome interpretation
With prevalence, you don't know whether exposure causes incident disease, whether exposure affects survival with disease, or whether disease causes exposure
If we see an association between a prevalent outcome and exposure it may be that:
1. Exposure → incident disease
2. Exposure → survival with disease
3. Disease → exposure
Cross-sectional studies use the Point Prevalence Rate Ratio (PPRR) to estimate the relative risk
2 types of potential bias will differentiate these 2 measures
The ratio of the disease durations
The ratio of the complements of the point prevalence estimates in the exposed and unexposed groups.
Cohort Studies
Extension of the RCT into the observational study realm
If this question could be answered by a randomized experiment, what would that experiment look like?
Why is this appealing?
How does randomization change a DAG?
“Causal inference from observational data then revolves around the hope that the observational study can be viewed as a conditionally randomized experiment.”
How does the DAG change with randomization?
The arrow from C to D stays
BUT the arrow from C to E disappears, getting rid of the backdoor path
Simplest transition from experimental studies
Except that subjects "choose" exposure rather than having it assigned by the investigator
What is a cohort?
A group of individuals (potential subjects)
defined on the basis of the presence or absence of exposure to a suspected risk factor for a disease
whose disease or mortality is measured over time
We want to know the effect of an exposure on the occurrence of a particular outcome during some observation period.
We determine what the outcome is in the exposed group.
Do the same for unexposed group
For external comparison group, use source population (represents what happens to population in absence of exposure)
Old Test Question:
What is the randomized experiment that we would like to conduct (but cannot)?
How does the observational study emulate that randomized experiment?
(Note: Target trials and emulating a trial to come later this term)
When using the OR as an estimate of the RR
There is a "built-in" bias, which is away from the null hypothesis
The OR will always be farther from the null than the RR
Closer when the disease is rare (rare disease assumption)
The OR is always farther away from 1.0 than the RR
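A quick numeric check of this built-in bias, with a made-up 2×2 table (counts are purely illustrative):

```python
# Hypothetical 2x2 table:           outcome+   outcome-
a, b = 40, 160                    # exposed
c, d = 20, 180                    # unexposed

risk_exposed   = a / (a + b)         # 0.20
risk_unexposed = c / (c + d)         # 0.10
rr = risk_exposed / risk_unexposed   # 2.00

odds_exposed   = a / b               # 0.250
odds_unexposed = c / d               # ~0.111
odds_ratio = odds_exposed / odds_unexposed  # 2.25

# The OR (2.25) is farther from 1.0 than the RR (2.00); the gap shrinks
# as the outcome gets rarer (rare disease assumption).
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```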
OR vs. RR: Advantages
OR can be estimated from a case-control study.
OR can be estimated from logistic regression.
OR of an “event” is the reciprocal of the OR of a “non-event.”
Closed Cohort
Once a member, always a member
Fixed population
Membership-defining event
The number of people in the cohort can be counted and is fixed at baseline (T0)
Open cohort: people come in and out
Because people come in and out of the study population, they contribute person-time for the time that they are observed
Person-time accrues from a changing population of individuals
Can account for differing length of follow-up as well as loss to follow-up
Participants can contribute time to multiple exposure categories
Person-time may not be intuitive (see the sketch below)
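A sketch of how person-time accrual might be tallied, with hypothetical follow-up records (subject 1 switches exposure mid-study, so their time is split across categories):

```python
from datetime import date

# Hypothetical follow-up intervals: (subject, exposure category, start, end).
intervals = [
    (1, "exposed",   date(2020, 1, 1), date(2020, 7, 1)),
    (1, "unexposed", date(2020, 7, 1), date(2021, 1, 1)),  # switched exposure
    (2, "unexposed", date(2020, 3, 1), date(2020, 9, 1)),  # lost to follow-up
]

# Tally person-years per exposure category from a changing population.
person_years = {}
for _, exposure, start, end in intervals:
    person_years[exposure] = person_years.get(exposure, 0.0) + (end - start).days / 365.25

print(person_years)  # e.g. {'exposed': ~0.5, 'unexposed': ~1.0}
```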
Follow-up
Participants are followed for the outcome of interest, and therefore must be at risk for the outcome.
In practice we can only follow-up until:
Outcome
Death
Emigration
Drop out
Lost for unknown reasons
Prospective Cohort Study: Exposures are measured by the investigator before the outcomes have occurred
Retrospective Cohort Study: Exposures are measured by the investigator after the outcomes have occurred
Cohort is not the same as prospective (a cohort study can be either prospective or retrospective)
Relative risk is not the same as risk ratio
Relative risk is a generic term for any ratio-based measure of association (odds ratio, rate ratio, hazard ratio, risk ratio)
Risk ratio = risk of outcome in exposed/risk of outcome in unexposed
Induction vs Latent Period
Induction: how long it takes exposure to induce disease
Latent Period: from the time the event happens until it's on our radar (detected)
Induction time is an important part of the study hypothesis
In studies of chronic exposures, it is easy to confuse the time during which exposure occurs with the time at risk of exposure effects
Ex. atomic bomb survivors have a very long risk period due to the exposure
WHAT ABOUT WHEN EXPOSURE CHANGES OVER TIME?
Start treatment A, switch to treatment B.
If outcome occurs, when is it attributed to A and when to B?
Changing exposure
There are many exposures where patients vary between exposed and unexposed, or vary across different exposures.
How do you handle events that occur during transition periods or shortly after treatment switch?
Assumptions about washout, induction, and the underlying biology are all important in making these decisions.
Immortal time bias
Time under observation or during follow-up during which the outcome could not have occurred (ex. a heart transplant patient counted as a transplant patient before actually receiving the transplant; that waiting time should technically be counted as unexposed time, since they had to survive through it to get the exposure, they are essentially "immortal" during it)
Also has been referred to as survivor treatment selection bias in some studies
Case-Control Study
How do we get controls?
We generally understand where cases come from…
The biggest concern in case-control studies is control selection
Imagine underlying hypothetical cohort
Cases are all cases that occurred in the hypothetical cohort during the study
Controls are selected from among those without the disease of interest (non-cases)
Nested case-control study: have a cohort and do a case-control study within it (you know your entire cohort, but aren't doing the full cohort study)
Ex. the Nurses' Health Study: run analyses on a subset
2 main ways for sampling for controls
Cumulative incidence sampling
Wait until the end of follow-up (assuming an underlying closed cohort) and sample all cases regardless of when they occurred.
The controls are those that did not get the outcome
People who survive all that time may be different!!!! (healthier, very adherent to certain lifestyle)
Risk-set sampling (Incidence density sampling)
Choosing controls from the cohort at the time each case becomes a case
Every time a case becomes a case, you pull a control (matched or unmatched) from other members of the cohort who don't yet have the disease
Advantage: samples on person-time
Cases and controls are eligible up until the same amount of time
Sampling must be independent of exposure
Controls are matched to cases on time at risk (same amount of follow-up time)
Because controls are matched on time, the probability of being selected is proportional to an individual’s person-time in the study base
Someone who is a control at one time can later be a control again as well as a case
BEST WAY (LESS BIAS)
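A minimal sketch of risk-set sampling, with a made-up five-person cohort (the ids, times, and case statuses are all hypothetical):

```python
import random
random.seed(1)

# Hypothetical follow-up data: subject id -> (entry time, event/censor time, is_case).
cohort = {
    1: (0, 4, True), 2: (0, 9, False), 3: (1, 7, True),
    4: (0, 10, False), 5: (2, 8, False),
}

def risk_set_sample(cohort, n_controls=1):
    """At each case's event time, draw controls from subjects still at risk."""
    sampled = []
    case_times = sorted((end, pid) for pid, (_, end, is_case) in cohort.items() if is_case)
    for t, case_id in case_times:
        risk_set = [pid for pid, (start, end, _) in cohort.items()
                    if pid != case_id and start <= t < end]
        controls = random.sample(risk_set, min(n_controls, len(risk_set)))
        sampled.append((case_id, controls))
    return sampled

# Subject 3 is in the risk set at t=4 even though they become a case at t=7,
# illustrating that a future case can serve as an earlier control.
print(risk_set_sample(cohort))
```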
Are there differences in the type of controls we get from these two different sampling mechanisms?
YES
Rare disease assumption & sampling
Cumulative incidence sampling requires that the outcome is rare for the OR to approximate the risk ratio
The rare disease assumption is not needed with risk-set (density) sampling of controls for the OR to approximate the rate ratio
Nested vs. Non-nested case-control studies
Nested: the hypothetical cohort is real…
Non-nested: you have to think about the hypothetical cohort
Goal of controls: sample the study base to get an unbiased estimate of the exposure distribution in the population that gave rise to the cases
Incidence density sampling preferable
Where does the information on controls come from?
Source population: the larger population that the cohort was derived from
How does it compare to the study population?
These are the individuals in my cohort, sampled from the source population
And what is this study base that I keep hearing about?
The person-time within the cohort from which the cases arise
Primary Study Base (Population Controls)
Base population identified first
Cases identified from this population (or person-time experience)
Nested case-control
Roster
Enumerated (everyone in source population is enumerated)
Secondary Study Base
Source of cases identified first
Thinking backwards
Investigator determines where the cases came from
Cases from a hospital with no pre-defined base population
Disease clusters
More error prone
Where can I find controls?
Population register
Neighborhood
Friends/family
Hospitals
Dead case = dead control
Random digit dialing
Population controls (primary base)
Often relies on a roster or register (same study base)
In the absence of a roster or register, it is possible that not every person eligible has the same chance of being selected (possible selection bias)
Random digit dialing
May cause problems because some people have multiple phone numbers (selection is no longer proportional)
Neighborhood controls (residences instead of phone numbers)
May cause problems due to shared environmental exposures
Hospital or disease registry controls
Not all cases within a hospital are the same
Ex. Multiple sclerosis patients referred to an academic center and those who live nearby
Some disease groups may travel from really far because specific/complicated conditions
Other disease groups may be from near by
Therefore, how does one control group reflect the same differences in referral patterns within that one academic center? (some people come from far away for complicated conditions, some come from nearby)
Secondary study base may not be identifiable
Berkson’s Bias
If exposure is related to the risk of being hospitalized with the "control disease", then the distribution of exposure in our control group will be different from the distribution in the study base
(controls are more likely to be exposed)
Friends and family as controls
Relying on the case to identify controls (behaviors are more similar in family and friend groups)
Dead controls
Records or family interviews for exposure information
No longer at risk of the outcome
What if case is dead?
What if exposure is related to mortality?
Questions:
False, you can leave them in. They still get an outcome and they are unexposed (there are lots of reasons people can be exposed or unexposed)
For controls, the population must be at risk for the outcome, NOT the exposure
What about women who had a hysterectomy before menopause? Should they be excluded?
No, not at risk for the outcome
Matched case-control studies
Cases are matched to controls with respect to important confounders
Case:Control ratio can vary from 1:1 to 1:x (x>1)
Remember that matching makes cases and controls more similar to each other than they would be with random sampling
Applications of Stratified Analysis Methods
Analysis of matched data involves the same statistical methods as used for unmatched data. Even though many textbooks present special "matched-data" techniques, these… are just special cases of general stratified methods for sparse data!
Stratified Analysis
Stratify data on the confounding variables to form strata (perhaps these are your matching factors)
Test of no association uses the Mantel-Haenszel chi-square test statistic
Can do 1:1 matching
McNemar's test: tests discordance vs. concordance (for matched data; the information comes only from the discordant pairs — a worked sketch follows below)
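A sketch of both calculations with made-up counts (the strata, cell counts, and discordant-pair numbers are all hypothetical; scipy is used only for the chi-square p-value):

```python
from scipy.stats import chi2

# Stratified 2x2 tables (a, b, c, d) = (exposed cases, exposed controls,
# unexposed cases, unexposed controls); counts are purely illustrative.
strata = [(20, 10, 15, 30), (8, 40, 5, 60)]

# Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n) across strata.
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
print("MH OR:", num / den)

# McNemar's test for 1:1 matched pairs uses ONLY the discordant pairs:
# b = case exposed / control unexposed, c = case unexposed / control exposed.
b, c = 25, 10
stat = (b - c) ** 2 / (b + c)
print("McNemar chi-square:", stat, "p =", chi2.sf(stat, df=1))
```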
What could we do differently?
Can we learn something by only looking at cases?
Can we design a study similar to a case-control design but have multiple outcomes/cases defined?
Can a case-control study look at multiple exposures for the same outcome?
Any other thoughts?
Could we do this?
Yes, with a subcohort
Case-cohort
Pick a subcohort of the full cohort
Randomly sampled subcohort from the entire cohort/base
PLUS all cases (who may or may not be in the subcohort)
This is your ”control” population
+ collect information on cases that might occur outside of subcohort within larger cohort
Pros of this
Efficiency: testing and detailed data not required for the entire cohort
Flexible: can test multiple hypotheses and multiple outcomes
Cases and subcohort arise from the same base (reduce selection bias)
Collect exposure information independent of outcome (reduce information/recall bias)
The subcohort can be used to calculate person-time
Sampling of subcohort in proximity to study start
Case-only designs
Gene-Environment Interaction Studies
Stratify by genes and environment to see if the environment is an effect modifier/there is interaction ONLY IN CASES
All this question can answer is the gene-environment association
Effect modifier: a third variable that is neither the exposure nor the outcome, such that when we look at different levels of that variable, the exposure-outcome relationship changes
This design is advantageous because:
Estimate the association between exposure and genotype among cases
Do not have to worry about control selection and the corresponding biases
Cannot assess main effects
Assumption that E and G are independent in the underlying source population
Cross-over trials
Each individual is exposed to both treatments
But still considered randomized!!!
Because order matters, the order of treatments is randomized
A washout period is defined as the time between treatment periods. Instead of immediately stopping one treatment and starting the new one, there is a period of time during which the drug from the first period is washed out of the patient's system
What are the strengths of cross-over designs?
Don’t have to identify a comparator/control population
Closer to the counterfactual, but without the time machine
Limitations
Assumptions about induction and washout period
Case-crossover
Observational study
Case serves as their own control
For each case, earlier time periods are selected as control periods
Tend to see it for acute events
Case-crossover assumptions
Triggers
Risk factors for outcomes in close temporal proximity
Acute events
No confounding by time-invariant factors
Challenges
Index and reference interval/times not always obvious
What about exposures that increase over time?
For example ambient air pollution
Solution: bidirectional/ambispective design (take the day before and the day after)
Assumption: case events will not influence subsequent exposure (e.g., this would be violated if someone who has a heart attack then avoids exercise the next day)
Homework 1 answer: it was supposed to be a case-control study but was actually a cohort
If assessing an effect modifier, you would look at it in CASES AND CONTROLS (exposed and unexposed)
Selection Bias
Statistical definition of bias: Bias occurs when the average value of the association measure obtained from an infinite number of studies is not the true value
Epidemiological Definition of Bias: deviation of results or inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth
What is selection bias?
The study population is not representative of the population one intended to analyze
Present when individuals have different probabilities of being included in the study sample according to relevant study characteristics
ISSUE OF INTERNAL VALIDITY
Generalizability: can the results be applied to another population? (not a bias)
Difference: internal validity asks "can I trust these results?"; external validity asks "to whom do these results apply?"
Most biases we have in epi can be reduced to missing data
Is it true that you can’t have selection bias in a prospective study? NOT TRUE
1. Differential loss to follow-up
2. Volunteer/self-selection
More in-depth discussion in “structural approach to selection bias”
Sampling fractions/selection probabilities
No selection bias is present if the cross product of the sampling fractions is 1 (i.e., selection introduces no association between exposure and disease)
The cross product is called the "selection bias factor" or the "selection odds ratio" = αδ/βγ
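A sketch of how the selection odds ratio distorts the observed OR. The sampling fractions and the true OR below are assumed numbers; α, β, γ, δ are the sampling fractions for exposed cases, exposed controls, unexposed cases, and unexposed controls respectively:

```python
# Hypothetical sampling fractions for the four cells of the 2x2 table.
alpha, beta, gamma, delta = 0.8, 0.4, 0.8, 0.8   # exposed controls under-sampled

true_or = 2.0                                     # assumed true OR in the study base
selection_or = (alpha * delta) / (beta * gamma)   # the "selection bias factor" = 2.0
observed_or = true_or * selection_or              # 4.0: biased away from the null

print(f"selection OR = {selection_or:.1f}, observed OR = {observed_or:.1f}")
# If the cross product were 1.0, the observed OR would equal the true OR.
```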
Selection bias can occur in many designs
Cross-sectional
Prevalence/incidence bias – survival, long duration
Non participation
Cohort
Non-response (incomplete data)
Unrepresentative group (E- for example)
Attrition
Case-control
Case selection
Control selection
Self-selection bias: Study can only occur among those who volunteer
Because volunteering is conditioned on, put a box around it (on DAGs, put a box around a factor when conditioning on it: restricting, stratifying, adjusting for, matching on)
U's are unmeasured covariates; missing data; could be confounders but not included
What you're doing when you look for a backdoor path is asking: "if I remove the arrow that connects my exposure to my outcome, can I start at the outcome and end up getting back to the exposure?" (traveling along arrows regardless of direction)
Volunteer is a collider: 2 arrows going into it
If you don't do anything to a collider, the path is naturally blocked
But if you condition on a collider, you open the path
Once open, the path continues (an induced backdoor path) and association flows between exposure and outcome
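This structure can be checked with the path_blocked sketch from the d-separation section above (that definition must be in scope). Here A → S ← U → Y, where S stands for volunteering (conditioned on by design) and U is an unmeasured cause of the outcome; all names are illustrative:

```python
# Volunteering S is a collider: caused by exposure A and by unmeasured U.
# Assumes path_blocked() from the d-separation sketch above is defined.
DAG2 = {"A": {"S"}, "U": {"S", "Y"}, "S": set(), "Y": set()}
path = ["A", "S", "U", "Y"]

print(path_blocked(DAG2, path, conditioned=set()))    # True: collider S blocks the path
print(path_blocked(DAG2, path, conditioned={"S"}))    # False: conditioning on S opens it
```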
Selection bias is not only about how people are selected, but also how they are retained
This is called differential loss to follow-up. Remember:
The study population is not representative of the population one intended to analyze
Censoring = whether someone is included or not included in follow-up
Depletion of Susceptibles: studying people after the susceptibles are gone (ex. infections: people who are susceptible get the disease and then become immune, so the population of those still susceptible is depleted)
Example of depletion of susceptibles
Those who were susceptible to the disease had their highest risk in the first year; risk then drops off because the susceptibles have been depleted from the population (if the effect was going to happen, it would have happened shortly after exposure)
Those who remain are the "superwomen" who were basically immune to that effect (superwomen effect)
Homework 2 Answers: incidence vs prevalence, survivorship, depletion of susceptibles
Do the types of selection bias that we just discussed also have the potential to impact randomized trials?
Yes!
At least two different ways that the benefit of randomization may be lost:
1. Post-randomization run-in phase
2. Differential retention by treatment arm
Run-in phase: give people the treatment just to see if they can tolerate it
People who don't successfully complete the run-in phase may have had an adverse reaction to the treatment and can't complete the study
Could be a problem if differential
To fix this: randomize after the run-in, but make sure there is a proper washout
Practicalities
Cost (time & money)
Rare outcomes require large populations
Long induction time for event requires long follow-up
Loss to follow-up may lead to bias
Exposure changing over time
How to attribute outcome to exposure
How to define exposure
But, can there be potential biases with time-varying exposures?
Collider-stratification bias (on DAGs) is the structural form of selection bias
Measurement Error and Misclassification
Confounding isn’t the only important error/bias to look out for
We should always be concerned for measurement error
Measurement error can occur at any phase of a study
Instrument design
Errors in protocol RE: instrument
Improper execution of protocol during data collection
Individual subject limitations
Memory
Day-to-day variability in biologic characteristics
Social acceptance
Errors during data entry and analysis
Measurement error and misclassification are the same type of bias; misclassification occurs for categorical variables and measurement error for continuous variables
Exposure
Risk factor under investigation
Can be ascertained a number of ways, depending upon the study
Questionnaire
Register/record data
Direct measurement
Because we are interested in the association between exposure and outcome, we need to compare outcomes among those exposed and those unexposed.
Defining exposure is a key issue
Putting people into exposed and unexposed groups and seeing who gets the outcome and who doesn’t isn’t enough
How is exposure classified?
Yes/no; continuous dosage; high/low/none
Timing of exposure
Etiologically relevant time period
Misclassification vs Model misspecification?
What distinguishes them is missing data
Misclassification: you don't have the true data, so someone is put into a category they don't belong in
Model misspecification: you have the data but chose not to use it properly (have the data, just didn't model it the right way)
Chronic Exposures
Persist over time
Accumulation of exposure is a function of intensity and time
Options include (but not limited to):
Maximum intensity
Average intensity over some time
Cumulative amount
Example: Pack-years of cigarette smoking is a composite of duration and intensity. Often analyses include exposure reclassified as duration of smoking or packs per day.
The choice of exposure metric makes implicit assumptions.
What are the implicit assumptions in the above example?
Where can we have misclassification
Exposure
Outcome
Confounders
Misclassification of exposure
Exposure misclassification related to measurement error in onset time of an acute CV event
Misclassification of exposure DUE TO measurement error
Classification of outcome
What is the trajectory?
Pathogenesis?
Subclinical and clinical manifestations?
Can we get insights from understanding the disease or outcome that will help us better understand the degree and form of misclassification?
When is someone defined as having the outcome?
How can heterogeneity of disease impact our studies?
Ex. MS there are 3 types
Lumping vs splitting: do we create our own misclassification by categorizing or dichotomizing some things?
How can we prevent outcome misclassification?
Consider homogeneous subgroups or phenotypes
Think about the causal pathway are there precursors to consider?
Precursors instead of events!!
But are the precursors the only thing contributing to the event?
What about if we are talking about measurement error of a confounder?
Example: Pack-years of cigarette smoking is a composite of duration and intensity. Often analyses include exposure reclassified as duration of smoking or packs per day.
Because lumping together doesn't account for the dose-response effect and results in residual confounding
Is this residual confounding known or unknown?
Sources of data to reduce misclassification
Want something objective and close to the gold standard, if possible
imaging tests
pathology
databases
environmental measures
direct observation
Terminology:
Agreement: how close two measurements made on the same subject are; measured on the same scale as the measurements themselves
Kappa
Reliability: relates the magnitude of the measurement error in observed measurements to the inherent variability in the ‘error-free’, ‘true’, or underlying level of the quantity
ICC
Repeatability: variation in repeat measurements made on the same subject under identical conditions
Reproducibility: “variation in the measurements made on a subject under changing conditions”
Reproducibility can influence both the validity and statistical precision of your studies.
How reliable is our measurement?
Is the result reproducible?
Test-Retest (same instrument, 2 points in time)
% agreement
Cohen’s Kappa
Weighted Kappa
Pearson correlation coefficient
Intraclass correlation coefficient (ICC)
Inter-method reliability (between tests)
CC
Sensitivity and specificity
Misclassification matrix
What's the difference between correlation and Kappa?
Correlation: measures can be correlated but not necessarily agree
Can you predict one rater's value from the other's?
Ex. could even be a negative correlation
When one thing changes, the other does too
Shows association but not the same values
Kappa: how often do 2 people looking at the same thing agree, accounting for chance?
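A small illustration of the distinction, with made-up ratings: two raters whose scores are perfectly correlated but never equal, so Pearson correlation is 1.0 while kappa is at or below zero:

```python
# Hypothetical ratings of the same 6 subjects by two raters:
# rater2 is always exactly one category higher, so they never agree.
rater1 = [1, 2, 3, 4, 5, 6]
rater2 = [2, 3, 4, 5, 6, 7]

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def cohens_kappa(x, y):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    expected = sum((x.count(c) / n) * (y.count(c) / n) for c in set(x) | set(y))
    return (observed - expected) / (1 - expected)

print(pearson(rater1, rater2))       # 1.0 -- perfectly correlated
print(cohens_kappa(rater1, rater2))  # negative -- they never actually agree
```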
Validity
Do we measure what is intended to be measured?
Criterion validity: how well does the measure compare with a direct measure of the truth?
Content validity: Does the instrument capture all facets of a construct?
Construct validity: Instrument measures what it claims to be measuring
What is measurement error/misclassification?
Can be broken down into two components:
Systematic error (differential) – threatens validity
Random error (nondifferential) – threatens precision
Effects of measurement error
Measurement error leads to bias called: Misclassification bias or Information Bias
Differential misclassification
Detection of emphysema in smokers vs nonsmokers
Recall of pregnancy exposures by mothers who gave birth to a healthy baby vs. a baby with a malformation
Can bias in either direction
Time lapse between exposure and recall is an important marker/indicator of recall accuracy; therefore, if the exposed and unexposed have different durations of time to recall, you could also have differential misclassification
Non-Differential misclassification
Does not depend on the status of a subject with respect to other variables
More likely to bias towards the null, but not always
(Think of this as nondiff or diff misclassication of X with respect to Y)
With nondiff misclassification, the sensitivity and specificity of the measurement method is the same by groups
So if we are talking about the exposure, the sensitivity and specificity of the exposure measurement are the same for cases and controls.
Type 1 error: False Positive
Type 2 error: False Negative
Ex.
Remember that we have non-differential misclassification here because we are saying that it doesn’t differ by whether the individuals are cases or controls.
Now also remember that sensitivity represents the probability that individuals who have the disease are captured as having the disease – the truly positive. And we have a table here of the truth! We know that 60 cases and 200 controls are truly positive!
Specificity is the probability that you are correctly classified as unexposed given that you truly are unexposed.
Then because of the law of total probability we know that 1-sensitivity gives us false negatives and 1-specificity gives us false positives.
To figure out how many individuals are misclassified as exposed, we need to figure out how many of the cases will be true positives and how many will be false positives given the numbers that we have of true exposure distribution in this table.
Nondifferential because the sensitivity and specificity are the same for the cases and the controls
If differential, the validity (sensitivity and specificity) would differ between groups
Misclassified A = TPs + FPs
Sensitivity: A/(A+C) (true positive rate)
Sensitivity = Θ
Specificity = ????
Nondifferential misclassification occurs when neither sensitivity nor specificity for disease classification varies by exposure category. By contrast, differential misclassification occurs when misclassification of disease status varies by exposure category.
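A sketch of this arithmetic in code. The true exposed counts (60 cases, 200 controls) echo the example above; the unexposed counts (40 cases, 300 controls) and the sensitivity/specificity values are assumptions for illustration:

```python
# Nondifferential exposure misclassification: the SAME sensitivity and
# specificity apply to cases and to controls.
se, sp = 0.9, 0.8   # assumed sensitivity and specificity of exposure measurement

def classify(true_exposed, true_unexposed):
    """Expected counts after misclassification: classified exposed = TPs + FPs."""
    as_exposed = se * true_exposed + (1 - sp) * true_unexposed
    as_unexposed = (true_exposed + true_unexposed) - as_exposed
    return as_exposed, as_unexposed

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

cases_e, cases_u = classify(60, 40)     # 54 TPs + 8 FPs = 62 classified exposed
ctrls_e, ctrls_u = classify(200, 300)   # 180 TPs + 60 FPs = 240 classified exposed

print("true OR:", odds_ratio(60, 40, 200, 300))                        # 2.25
print("observed OR:", odds_ratio(cases_e, cases_u, ctrls_e, ctrls_u))  # ~1.77, toward the null
```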
INCLUDE HOMEWORK 3 AND UNDERSTAND IT
Include this in homework 3
ADD HETEROGENEITY AND FLOWCHART
Confounding
Is this a causal relationship or might it be due to some sort of bias?
Traditional definition of confounders:
They are associated with A in the base population
They are independently associated with Y in the unexposed (independent of exposure)
They are not intermediate variables (not on the causal pathway)
They must precede Y, and can precede or at least be at the same time as A
“common cause of both A and Y”
Properties of a confounder
A confounding factor must be an extraneous risk factor for disease
A confounding factor must be associated with the exposure under study in the source population (the population at risk from which the cases are derived)
A confounding factor must not be affected by the exposure or the disease
Want the adjusted estimate to be different from the crude
But the stratified estimates should be similar to one another
Could we have predicted the direction of potential bias?
Overestimates: away from the null
Individuals who have malaria are more likely to work outside
Individuals who work outside are more likely to be male
"The bias will result in a crude OR that is in absolute magnitude too big"
Assuming the null is true (no association between smoking and MI), what would we expect to see in an unadjusted OR?
Underestimates: toward the null
People who have MI are less likely to be moderate alcohol consumers
If they are less likely to be alcohol consumers, they are also less likely to smoke
“Bias is absolute downward, therefore observed crude OR <1”
What confounding isn’t
Effect modification
Outcome heterogeneity
Exposure heterogeneity
Mediation
Confounding vs Effect Modification
Confounding
Measure is Distorted
Source of bias
Crude vs adjusted
Effect Modification
Measure varies by modifier
“it depends”
Across strata
Not source of bias
Mediator: part of the causal pathway; our exposure has an effect on disease that goes through 2 pathways: one that goes through M and one that doesn't go through M
If we adjust for M, it blocks part of the effect of exposure on outcome (we no longer capture the full effect of exposure on outcome)
WITH EFFECT MODIFIERS: HAVE TO BE ABLE TO STRATIFY UNEXPOSED BY SAME CATEGORIES
Ex. if the exposure is coffee drinking, decaf vs. regular cannot be an effect modifier (the unexposed can't be stratified by it)
DAGS
L → A assumes a direct causal effect (that is not mediated by other variables) for at least one individual
The lack of an arrow is also important!
Arrows don't encode effect size or sign
Interaction is not encoded (i.e., we don't know how A and L, both causes of Y, might interact)
Causal DAG must include all common causes of any pair of variables in the graph whether U or C or L
Causation
Different risk in the entire population under two exposure values
Pr[Y^a = 1]: risk in all subjects of the population had they received the counterfactual exposure level a
Causal Risk Ratio: Pr[Y^{a=1} = 1] / Pr[Y^{a=0} = 1] (= 1 under the causal null)
Can also define/assess in terms of odds ratio or difference.
What is the null value of the causal risk difference?
What is the null value of the causal odds ratio?
Impossible to observe causation directly, though (we never see both potential outcomes for the same person)!!
What we observe
Pr[Y=1|A=a] : risk of outcome Y in subjects of the population that meet the condition “having actually received exposure level a”
Associational Risk Ratio:
Pr[Y=1|A=1] / Pr[Y=1|A=0]
If this ratio = 1, A and Y are independent
A ╨ Y
Counterfactual: what would happen if exposed were unexposed?
Identifiability Conditions: IN ORDER TO ANALYZE A CAUSAL EFFECT THESE MUST BE TRUE
1. The values of treatment under comparison correspond to well-defined interventions that, in turn, correspond to the versions of treatment in the data → consistency
Observed outcome for every treated individual equals their outcome if they had received treatment, and that the observed outcome for every untreated individual equals their outcome if they had remained untreated
Requires sufficiently well-defined treatments and treatment-variation irrelevance
The observed outcome and the potential outcome (theoretical world) are consistent
2. The conditional probability of receiving every value of treatment, though not decided by the investigators, depends only on the measured covariates → exchangeability
Y^a ╨ A for all a
Independence between the counterfactual outcome and the observed treatment
Potential outcomes are not conditional on what your actual treatment in the world is
Treated and untreated would experience the same risk of the outcome if they received the same level of treatment
Confounding leads to lack of exchangeability
Conditional exchangeability
Critical criterion for causal inference
Weaker than marginal exchangeability
Within levels of L:
Exposed subjects would have had the same risk as unexposed subjects had they been unexposed
Unexposed subjects would have had the same risk as exposed subjects had they been exposed
Goal for confounding: achieve the greatest degree of exchangeability possible
3. The conditional probability of receiving every value of treatment is greater than zero, i.e., positive → positivity
Probability of treatment
Everybody has to have a nonzero probability of each treatment
Pr[A=a | L=l] > 0 for all values l with Pr[L=l] ≠ 0 in the population of interest
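Putting the three conditions together, the causal risk under each treatment can be computed by standardizing over L (a minimal g-formula sketch; every probability below is an assumed number, not lecture data):

```python
# Standardization: Pr[Y^a = 1] = sum over l of Pr[Y=1 | A=a, L=l] * Pr[L=l].
pr_L = {0: 0.6, 1: 0.4}                       # Pr[L = l]
pr_Y = {(1, 0): 0.10, (1, 1): 0.30,           # Pr[Y=1 | A=a, L=l], keyed by (a, l)
        (0, 0): 0.05, (0, 1): 0.20}
pr_A_given_L = {0: 0.5, 1: 0.7}               # Pr[A=1 | L=l], for the positivity check

# Positivity: every stratum of L must have a nonzero probability of each treatment.
assert all(0 < p < 1 for p in pr_A_given_L.values()), "positivity violated"

risk_treated   = sum(pr_Y[(1, l)] * pr_L[l] for l in pr_L)   # Pr[Y^{a=1} = 1] = 0.18
risk_untreated = sum(pr_Y[(0, l)] * pr_L[l] for l in pr_L)   # Pr[Y^{a=0} = 1] = 0.11
print("causal RR under conditional exchangeability:", risk_treated / risk_untreated)
```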
There are 2 major benefits to randomization in an RCT:
It addresses confounding (gives exchangeability)
It guarantees positivity
Identification of confounding – lots of different approaches
But evaluation of confounding differs between case-control and cohort studies because of how the populations are sampled and what they represent
What can we do about confounding?
Design
Randomization
Restriction
Matching
Analysis
Stratification
Multivariable adjustment
Propensity scores
Instrumental variables