Week 3: Causes, Confounds, and Colliders

Elemental confounds

Workspace setup:

library(here)
library(tidyverse)
library(cowplot)
library(brms)
library(tidybayes)
library(patchwork)

Follow up from last week

  • Exponential distributions
  • b

Exponentials

We started by using uniform distributions for our priors on sigma (standard deviation). They do two nice things for us: (1) they’re bounded, so we can limit our prior to a positive value, and (2) they’re intuitive. However, they’re extremely inefficient. There are a couple other distributions that are better suited to priors for variances/standard deviations. We’re going to focus on one: the exponential.

The exponential distribution models rates (time between discrete events), but time is also a distance, which is what standard deviations are. The parameter for this distribution (\(\lambda\)) is the inverse of the rate. In other words, if we think the typical deviation is 2, we should use a rate of 1/2 (or .5). This is a very intuitive way of thinking about deviations, so we’ll use this function

Code
# Create a function to generate exponential distribution data
generate_exponential_data <- function(rate, n = 1000) {
  data.frame(
    x = seq(0, 6, length.out = n),
    density = dexp(seq(0, 6, length.out = n), rate = rate),
    distribution = paste0("rate = ", rate)
  )
}

# Generate data for three different exponential distributions
dist1 <- generate_exponential_data(rate = 0.5)
dist2 <- generate_exponential_data(rate = 1.0)
dist3 <- generate_exponential_data(rate = 2.0)

# Combine the data
all_dists <- bind_rows(dist1, dist2, dist3)

# Create the plot
ggplot(all_dists, aes(x = x, y = density, color = distribution)) +
  geom_line(size = 1.2) +
  scale_color_manual(values = c("#1c5253", "#e07a5f", "#81b29a")) +
  labs(
    title = "Three Different Exponential Distributions",
    x = "x",
    y = "Density",
    color = "Parameters"
  ) +
  theme(legend.position = "top")

b for beta

Let me try to clarify the different classes in brms. This package uses a set vocabulary of terms to refer to the different parameters in a model. Right now, we’re only working with three of these. Here’s some example code and the different classes involved:

m <- brm(
  data = d,
  family = gaussian,
  y ~ 1 + x, 
  prior = c( prior(normal(0, 1),   class = "Intercept"),
             prior(normal(0,.5),   class = "b"),
             prior(exponential(1), class="sigma")),
  < other stuff >
)

Connecting these priors to the mathematical model:

\[\begin{align*} y &\sim \text{Normal}(\mu, \sigma) \\ \mu &= a + (b\times x) \\ a &\sim \text{Normal}(0, 1) \\ b &\sim \text{Normal}(0, .5) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

Last week, we started to build out multiple regression models, those that include both categorical and continuous variables as “main effects” or predictors in a model. These are simple multiple regression models, and they can be extremely useful for things like revealing spurious correlations – zero-order correlations that suggest association even when the two variables are not causally related – and important correlations that are masked by other variables.

However, you cannot interpret the coefficients in any multiple regression model without identifying the underlying causal model. This week, we’ll use Directed Acyclic Graphs (DAGs) to develop and visualize our causal models. These DAGs will then help us determine which variables, if any, to control for when trying to estimate causal pathways. Along the way, we’ll discuss some common mistakes when it comes to controls and their disasterious consequences.

This will also be a good opportunity to practice the mathematical models and code we’ve discussed before, but there will be little new code this week.

Forks

library(dagitty)
dag3.1 <- dagitty( "dag{ Z -> X; Z -> Y }" )
coordinates(dag3.1) <- list( x=c(X=-1,Z=0,Y=1) , y=c(X=0,Z=-1,Y=0) )
rethinking::drawdag( dag3.1, cex = 3, lwd = 3 )

True confounds.

Not stratifying on (controlling for) Z will yield a spurious relationship between X and Y. That is, a correlation of X and Y (or regression) will be non-zero, even though there is no causal relationship from one to another.

Marriage example

data(WaffleDivorce, package = "rethinking")
d <- WaffleDivorce

exercise

Create two plots, one showing the relationship between marriage rate and divorce rate and another showing the relationship between median age at marriage and divorce rate.

solution

Code
p1 <- d %>% ggplot(aes(x = Marriage, y = Divorce)) +
  geom_point(color = "#1c5253") +
  geom_smooth(method = "lm", color = "black") +
  labs(x = "marriage rate", y = "divorce rate")

p2 <- d %>% ggplot(aes(x = MedianAgeMarriage, y = Divorce)) +
  geom_point(color = "#1c5253") +
  geom_smooth(method = "lm", color = "black") +
  labs(x = "median age at marriage", y = "divorce rate")

(p1 | p2)

exercise

Model the relationship between Divorce Rate (D) and Marriage Rate (M). (rethinking::standardize both first.) Be sure to do the following:

  • Write a mathematical model expressing this relationship including priors.

  • Calculate your posterior predictions for the relationship between D and M.

solution

\[\begin{align*} D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_MM_i \\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

d$D <- rethinking::standardize(d$Divorce)
d$M <- rethinking::standardize(d$Marriage)

solution

m1 <- brm(
  data = d,
  family = gaussian,
  D ~ 1 + M,
  prior = c( prior(normal(0, .2),  class=Intercept),
             prior(normal(0, .5),  class=b),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, seed=2, chains=1,
  file=here("files/models/31.1")
)

solution

posterior_summary(m1)
                 Estimate  Est.Error        Q2.5       Q97.5
b_Intercept   0.005581893 0.11208915  -0.2067139   0.2223152
b_M           0.355030263 0.12872363   0.0932213   0.6253811
sigma         0.947145157 0.09797891   0.7715463   1.1837365
Intercept     0.005581893 0.11208915  -0.2067139   0.2223152
lprior       -0.924918655 0.29721444  -1.6697629  -0.5096142
lp__        -69.002548031 1.19902799 -72.1796354 -67.5830244

Bonus: a plot

Code
post = as_draws_df(m1)


# plot it all
ggplot(d, aes(x = M, y = D)) +
  geom_point() +
  geom_abline( aes(intercept = b_Intercept, slope = b_M), 
               data = post[1:20, ],
               col= "#1c5253", 
               alpha = .3) +
  labs( x="marriage rate",
        y="divorce rate")

Now we’re going to incorporate state age (median age at marriage) into our model. This is the DAG proposed by RM. What does this DAG represent?

dag3.2 <- dagitty( "dag{ A -> D; A -> M; M -> D }" )
coordinates(dag3.2) <- list( x=c(A=0,D=1,M=2) , y=c(A=0,D=1,M=0) )
rethinking::drawdag( dag3.2, cex = 3, lwd = 3 )

forks

DAG models help us to see conditional independencies.

  • statements of which variables should be associated with each other (or not) in the data.
  • statements of which variables become disassociated when we condition on some other set of variables.
impliedConditionalIndependencies( dag3.2 ) #none

How does this change with a new DAG?

dag3.3 <- dagitty( "dag{ A -> D; A -> M }" )
coordinates(dag3.3) <- list( x=c(A=0,D=1,M=2) , y=c(A=0,D=1,M=0) )
rethinking::drawdag( dag3.3, cex = 3, lwd = 3 )
impliedConditionalIndependencies( dag3.3 ) 
D _||_ M | A

\[\begin{align*} D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_AA_i + \beta_MM_i\\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_A &\sim \text{Normal}(0, 0.5) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

d$A <- rethinking::standardize(d$MedianAgeMarriage)
m2 <- brm(
  data=d,
  family=gaussian,
  D ~ 1 + A + M,
  prior = c( prior(normal(0, .2),  class=Intercept),
             prior(normal(0, .5),  class=b, coef="A"), #optional
             prior(normal(0, .5),  class=b, coef="M"),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, seed=3, chains=1,
  file=here("files/models31.2")
)

Let’s compare the effect of marriage rate for these two models.

post1 = as_draws_df(m1) %>% mutate(model="total effect")
post2 = as_draws_df(m2) %>% mutate(model="direct effect")
full_join(post1, post2) %>% 
  group_by(model) %>% 
  mean_qi(b_M)
# A tibble: 2 × 7
  model             b_M  .lower .upper .width .point .interval
  <chr>           <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
1 direct effect -0.0598 -0.363   0.269   0.95 mean   qi       
2 total effect   0.355   0.0932  0.625   0.95 mean   qi       

Bonus: a plot

Code
full_join(post1, post2) %>% 
  ggplot( aes(y=model, x=b_M, fill = after_stat(x > 0) ) )  + 
  stat_halfeye() +
  geom_vline(xintercept=0, linetype="dashed") +
  scale_fill_manual(values=c("grey", "#1c5253")) +
  guides(fill="none") +
  labs(x="Marriage effect", y=NULL)

If we want to simulate the effect of manipulating marriage, we use “do calculus.” We do this by effectively “deleting” the arrows going into our manipulation variable (M).

set.seed(9)
As = sample(d$A, replace=T, size=1000)
nd0 = data.frame(A = As, M=0)
nd1 = data.frame(A = As, M=1)
ppd0 = predicted_draws(m2, newdata = nd0) 
ppd1 = predicted_draws(m2, newdata = nd1) 

ppd = full_join(ppd0, ppd1) %>% 
  pivot_wider(names_from = M, names_prefix = "M",
              values_from = .prediction) %>% 
  mutate(diff=M1-M0)
     
ppd  %>% 
  ggplot(aes(x=diff)) +
  geom_density() +
  labs(title="Implied causal effect of marriage rates on divorce")

fitting the direct and total at once

d_model <- bf(D ~ 1 + A + M)
m_model <- bf(M ~ 1 + A)
m2c <-
  brm(data = d, 
      family = gaussian,
      d_model + m_model + set_rescor(FALSE),
      prior = c(prior(normal(0, 0.2), class = Intercept, resp = D),
                prior(normal(0, 0.5), class = b,         resp = D),
                prior(exponential(1), class = sigma,     resp = D),
                
                prior(normal(0, 0.2), class = Intercept, resp = M),
                prior(normal(0, 0.5), class = b,         resp = M),
                prior(exponential(1), class = sigma,     resp = M)),
      iter = 2000, warmup = 1000, chains = 1, seed = 5,
      file = here("files/models/m31.2c"))
posterior_summary(m2c) %>% round(2)
              Estimate Est.Error    Q2.5   Q97.5
b_D_Intercept     0.00      0.10   -0.19    0.19
b_M_Intercept     0.00      0.09   -0.17    0.16
b_D_A            -0.60      0.15   -0.89   -0.30
b_D_M            -0.05      0.15   -0.35    0.25
b_M_A            -0.69      0.10   -0.89   -0.49
sigma_D           0.83      0.09    0.68    1.01
sigma_M           0.71      0.07    0.59    0.86
Intercept_D       0.00      0.10   -0.19    0.19
Intercept_M       0.00      0.09   -0.17    0.16
lprior           -2.84      0.55   -4.14   -1.94
lp__           -117.95      1.92 -122.67 -115.33

Pipes

A pipe in a DAG model represents a situation where a variable acts as a mediator between two other variables. In this context, the effect of one variable on another is transmitted through the mediator.

dag_pipe <- dagitty("dag{ X -> M -> Y }")
coordinates(dag_pipe) <- list(x=c(X=0, M=1, Y=2), y=c(X=0, M=1, Y=0))
rethinking::drawdag(dag_pipe, cex = 3, lwd = 3)

One place that pipes show up is in post-treatment bias.

Suppose you are studying plants in a greenhouse, and you want to know how effective a particular fungal treatment is. Fungus on plants tends to reduce their growth. You plant a bunch of plants, measure them, then apply one of two different treatments. After some time, you measure the plants again, and you measure the amount of fungus on the plants.

Simulate some fake plant data.

set.seed(71)
# number of plants
N <- 100
# simulate initial heights
h0 <- rnorm(N,10,2)
# assign treatments and simulate fungus and growth
treatment <- rep( 0:1 , each=N/2 )
fungus <- rbinom( N , size=1 , prob=0.5 - treatment*0.4 )
h1 <- h0 + rnorm(N, 5 - 3*fungus)
# compose a clean data frame
d <- data.frame( h0=h0 , h1=h1 , treatment=treatment , fungus=fungus )
rethinking::precis(d)
              mean        sd      5.5%    94.5%    histogram
h0         9.95978 2.1011623  6.570328 13.07874 ▁▂▂▂▇▃▂▃▁▁▁▁
h1        14.39920 2.6880870 10.618002 17.93369     ▁▁▃▇▇▇▁▁
treatment  0.50000 0.5025189  0.000000  1.00000   ▇▁▁▁▁▁▁▁▁▇
fungus     0.23000 0.4229526  0.000000  1.00000   ▇▁▁▁▁▁▁▁▁▂

exercise

Draw the dag that describes the relationships between these 4 variables.

What are the implied conditional independences?

solution

plant_dag <- dagitty( "dag {
  H_0 -> H_1
  F -> H_1
  T -> F}" )

coordinates( plant_dag ) <- list( x=c(H_0=1.0,T=0,F=0.5,H_1=1),
                                  y=c(H_0=-.5,T=0,F=0.5,H_1=0) )

rethinking::drawdag( plant_dag, cex = 3, lwd = 3 )
impliedConditionalIndependencies(plant_dag)
F _||_ H_0
H_0 _||_ T
H_1 _||_ T | F

Let’s start by just modeling growth using our two height variables.

\[\begin{align*} h_{1,i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= h_{0,i} \times p \\ p &\sim \text{Log Normal}(0, .25) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

m3 <- brm(
  data=d,
  family=gaussian,
  h1 ~ 0 + h0,
  prior = c( prior(lognormal(0, .25), class=b, lb=0),
             prior(exponential(1),    class=sigma)),
  iter=2000, warmup=1000, seed=3, chains=1,
  file=here("files/models/31.3")
)

posterior_summary(m3)
          Estimate  Est.Error        Q2.5       Q97.5
b_h0      1.427201 0.01809789    1.392588    1.464538
sigma     1.826721 0.13345199    1.597196    2.107640
lprior   -2.728095 0.15714669   -3.070622   -2.454636
lp__   -204.320089 0.99218185 -206.809566 -203.312433

Now add treatment to this model.

\[\begin{align*} h_{1,i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= h_{0,i} \times p \\ p &= \alpha + \beta \text{treatment}_i \\ \alpha &\sim \text{Log Normal}(0, .25) \\ \beta_T &\sim \text{Normal}(0, .5) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

The trick to this is combining the two deterministic formulas:

\[ \mu_i = h_{0i} \times (\alpha + \beta\text{treatment}_i) \]

m4 <- brm(
  data = d,
  family = gaussian,
  bf(
    h1 ~ h0 * (a + t * treatment),
    a + t ~ 1,
    nl = TRUE),
  prior = c( prior(normal(0, .25), nlpar=a),
             prior(normal(0, .50), nlpar=t),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, chains=1, seed=9,
  file = here("files/models/31.4")
)
posterior_summary(m4)
                   Estimate  Est.Error          Q2.5        Q97.5
b_a_Intercept    1.36739390 0.02464814    1.32018702    1.4151191
b_t_Intercept    0.09828429 0.03351381    0.03079803    0.1655035
sigma            1.79127343 0.12669988    1.58008919    2.0669569
lprior         -16.53425658 0.53339055  -17.57744797  -15.5344238
lp__          -216.09858340 1.20703182 -219.36473634 -214.7443121

exercise

Now add in both treatment and fungus to this model.

solution

\[\begin{align*} h_{1,i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= h_{0,i} \times p \\ p &= \alpha + \beta_TT_i + \beta_FF_i \\ \alpha &\sim \text{Log Normal}(0, .25) \\ \beta_T &\sim \text{Normal}(0, .5) \\ \beta_F &\sim \text{Normal}(0, .5) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]

m5 <- brm(
  data = d,
  family = gaussian,
  bf(
    h1 ~ h0 * (a + t * treatment + f * fungus),
    a + t + f ~ 1,
    nl = TRUE),
  prior = c( prior(normal(0, .25), nlpar=a),
             prior(normal(0, .50), nlpar=t),
             prior(normal(0, .50), nlpar=f),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, chains=1, seed=9,
  file = here("files/models/31.5")
)
Code
post4 = as_draws_df(m4) %>% mutate(model="without fungus")
post5 = as_draws_df(m5) %>% mutate(model="with fungus")
full_join(post4, post5) %>% 
  ggplot( aes(y=model, x=b_t_Intercept, fill = after_stat(x > 0)) ) +
  stat_halfeye() +
  scale_fill_manual(values=c("grey", "#1c5253")) +
  geom_vline(aes(xintercept=0), linetype = "dashed") +
  guides(fill="none") +
  labs(x = "Estimated effect of treatment",
       y=NULL)

colliders

dag3.4 <- dagitty( "dag{ X -> Z; Y -> Z }" )
coordinates(dag3.4) <- list( x=c(X=-1,Z=0,Y=1) , y=c(X=0,Z=1,Y=0) )
rethinking::drawdag( dag3.4, cex = 3, lwd = 3 )

Stratifying on Z opens up the association between X and Y. We do not want to stratify on Z.

collider of false sorrow

We’ll use the rethinking package to simulate data:

  • Each year, 20 people are born with uniformly distributed happiness values.
  • Each year, each person ages one year. Happiness does not change.
  • At age 18, individuals can become married. The odds of marriage each year are proportional to an individual’s happiness.
  • Once married, an individual remains married.
  • After age 65, individuals leave the sample. (They move to Spain.)
d <- rethinking::sim_happiness(seed = 1990, N_years = 1000)
rethinking::precis(d)
                  mean         sd      5.5%     94.5%     histogram
age       3.300000e+01 18.7688832  4.000000 62.000000 ▇▇▇▇▇▇▇▇▇▇▇▇▇
married   2.946154e-01  0.4560451  0.000000  1.000000    ▇▁▁▁▁▁▁▁▁▃
happiness 6.832142e-19  1.2144211 -1.789474  1.789474      ▇▅▇▅▅▇▅▇
Code
d %>% 
  mutate(married = factor(married, labels = c("unmarried", "married"))) %>% 
  ggplot(aes( x = age, y = happiness)) +
  geom_point( aes( color = married), size = 3) +
  scale_color_manual("",
                       values = c("unmarried" = "lightgrey",
                                  "married" = "#1c5253")) +
  theme(legend.position = "top")

exercise

Filter out people who are younger than 18. Then fit two models:

  1. a model in which happiness is influenced by both marriage and age.
  2. a model in which happiness is influenced only by age.

(You may want to center or rethinking::standardize age in some way.)

solution

d2 <- d[d$age >= 18, ]
d2$A <- rethinking::standardize(d2$age)
d2$mid <- as.factor(d2$married + 1)

m6a <- brm(
  data=d2, 
  family=gaussian,
  bf( happiness ~ 0 + a + b*A, 
      a ~ 0 + mid,
      b ~ 0 + mid,
      nl = TRUE),
  prior = c( prior(normal(0, .50), nlpar=a),
             prior(normal(0, .25), nlpar=b),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, seed=9, chains=1,
  file = here("files/models/31.6a")
)

solution

d2 <- d[d$age >= 18, ]
d2$A <- rethinking::standardize(d2$age)
d2$mid <- as.factor(d2$married + 1)

m6b <- brm(
  data=d2, 
  family=gaussian,
  happiness ~ 1 + A + mid,
  prior = c( prior(normal(0, .50), class=Intercept),
             prior(normal(0, .25), class=b),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, seed=9, chains=1,
  file = here("files/models/31.6b")
)

solution

m7 <- brm(
  data=d2, 
  family=gaussian,
  happiness ~ A,
  prior = c( prior(normal(0, .50), class=Intercept),
             prior(normal(0, .25), class=b),
             prior(exponential(1), class=sigma)),
  iter=2000, warmup=1000, seed=9, chains=1,
  file = here("files/models/31.7")
)

solution

Code
post6a = as_draws_df(m6a) 
post6b = as_draws_df(m6b)
post7  = as_draws_df(m7)

data.frame(
      b_mid1   = post6a$b_b_mid1, 
      b_mid2   = post6a$b_b_mid2, 
      b_A_avgb = post6b$b_A,
      b_A      = post7$b_A) %>% 
  mutate(b_A_avga = (b_mid1 + b_mid2)/2) %>% 
  select(b_A_avga, b_A_avgb, b_A) %>% 
  pivot_longer(everything()) %>% 
  mutate(name = case_when(
    name == "b_A_avga" ~ "Moderated by marriage",
    name == "b_A_avgb" ~ "Controlling for marriage",
    TRUE ~ "Unstratified"
  )) %>% 
  ggplot(aes( y=name, x=value, fill = after_stat(x < 0) ) )  + 
  stat_halfeye() +
  geom_vline(xintercept=0, linetype="dashed") +
  scale_fill_manual(values=c("grey", "#1c5253")) +
  guides(fill="none") +
  labs(x="Age effect", y=NULL)

testing DAG assumptions

  • Causal DAGs make strong assumptions about unobserved confounders, but analyzing them can still provide valuable insights about where our models might be wrong through testing implied relationships.

  • Conditional independencies are key testable implications of a DAG, representing pairs of variables that should show no association after controlling for specific sets of other variables.

  • We can identify conditional independencies using the same path analysis techniques used for finding backdoor paths - examining all paths between two variables and determining if there exists a conditioning set that blocks all paths.

  • While manually deriving conditional independencies in large graphs is complex due to the many variable pairs and paths to consider, computational tools can efficiently perform these calculations.