adventures in covariance
Let’s start by simulating cafe data.
# ---- set population-level parameters -----
a <- 3.5 # average morning wait time
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- (-0.7) #correlation between intercepts and slopes
# ---- create vector of means ----
Mu <- c(a, b)
# ---- create matrix of variances and covariances ----
sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix
# now matrix multiply to get covariance matrix
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)
# ---- simulate intercepts and slopes -----
N_cafes <- 20
library(MASS)
set.seed(5)
vary_effects <- mvrnorm( n=N_cafes, mu = Mu, Sigma=Sigma)
a_cafe <- vary_effects[, 1]
b_cafe <- vary_effects[, 2]
# ---- simulate observations -----
set.seed(22)
N_visits <- 10
afternoon <- rep(0:1,N_visits*N_cafes/2)
cafe_id <- rep( 1:N_cafes , each=N_visits )
mu <- a_cafe[cafe_id] + b_cafe[cafe_id]*afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm( N_visits*N_cafes , mu , sigma )
d3 <- data.frame( cafe=cafe_id , afternoon=afternoon , wait=wait )
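A quick look at the simulated data. The exact calls aren't shown in the original, but rethinking::precis() and head() produce summaries like the ones below:
rethinking::precis(d3)
head(d3, 10)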
mean sd 5.5% 94.5% histogram
cafe 10.500000 5.7807513 2.000000 19.000000 ▇▇▇▇▇▇▇▇▇▇
afternoon 0.500000 0.5012547 0.000000 1.000000 ▇▁▁▁▁▁▁▁▁▇
wait 3.073405 1.1082252 1.477719 4.869957 ▁▁▂▅▇▇▅▇▃▃▁▁▁▁
cafe afternoon wait
1 1 0 3.967893
2 1 1 3.857198
3 1 0 4.727875
4 1 1 2.761013
5 1 0 4.119483
6 1 1 3.543652
7 1 0 4.190949
8 1 1 2.533223
9 1 0 4.124032
10 1 1 2.764887
In this exercise, we are simulating data from a generative process and then analyzing those data with a model that reflects exactly the correct structure of that process. But in the real world, we’re never so lucky. Instead we are always forced to analyze data with a model that is MISSPECIFIED: the true data-generating process is different from the model. Simulation can, however, be used to explore misspecification. Just simulate data from a process and then see how a number of models, none of which matches the data-generating process exactly, perform (a minimal sketch of one such misspecified model follows below). And always remember that Bayesian inference does not depend upon data-generating assumptions, such as the likelihood, being true. Non-Bayesian approaches may depend upon sampling distributions for their inferences, but this is not the case for a Bayesian model. In a Bayesian model, a likelihood is a prior for the data, and inference about parameters can be surprisingly insensitive to its details.
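To see what this looks like with the cafe data, here is a minimal sketch of a deliberately misspecified fit: it keeps the varying intercepts but drops the varying slopes (and therefore the intercept–slope correlation) that we know are in the generative process. The model name m_mis is a placeholder of mine, and the priors simply mirror the ones used for the full model fit below.
# misspecified on purpose: varying intercepts only, even though the data
# were generated with correlated varying intercepts and slopes
m_mis <- brm(
  data = d3,
  family = gaussian,
  wait ~ 1 + afternoon + (1 | cafe),
  prior = c(
    prior( normal(5,2), class=Intercept ),
    prior( normal(-1, .5), class=b ),
    prior( exponential(1), class=sd ),
    prior( exponential(1), class=sigma )
  ),
  iter=2000, warmup=1000, chains=4, cores=4, seed=9
)
# once the full varying-slopes model m5 (fit below) is available,
# compare their out-of-sample performance:
# loo(m_mis, m5)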
Mathematical model:
likelihood function and linear model
$$
\begin{aligned}
W_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\text{CAFE}[i]} + \beta_{\text{CAFE}[i]} \, \text{Afternoon}_i
\end{aligned}
$$
varying intercepts and slopes
$$
\begin{aligned}
\begin{bmatrix} \alpha_{\text{CAFE}[i]} \\ \beta_{\text{CAFE}[i]} \end{bmatrix} &\sim \text{MVNormal}\!\left( \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \mathbf{S} \right) \\
\mathbf{S} &= \begin{pmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{pmatrix} \mathbf{R} \begin{pmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{pmatrix}
\end{aligned}
$$
priors
$$
\begin{aligned}
\alpha &\sim \text{Normal}(5, 2) \\
\beta &\sim \text{Normal}(-1, 0.5) \\
\sigma &\sim \text{Exponential}(1) \\
\sigma_\alpha &\sim \text{Exponential}(1) \\
\sigma_\beta &\sim \text{Exponential}(1) \\
\mathbf{R} &\sim \text{LKJcorr}(2)
\end{aligned}
$$
# examples
rlkj_1 = rethinking::rlkjcorr(1e4, K=2, eta=1)
rlkj_2 = rethinking::rlkjcorr(1e4, K=2, eta=2)
rlkj_4 = rethinking::rlkjcorr(1e4, K=2, eta=4)
rlkj_6 = rethinking::rlkjcorr(1e4, K=2, eta=6)
data.frame(rlkj_1= rlkj_1[,1,2],
rlkj_2= rlkj_2[,1,2],
rlkj_4= rlkj_4[,1,2],
rlkj_6= rlkj_6[,1,2]) %>%
ggplot() +
geom_density(aes(x=rlkj_1, color = "1"), alpha=.3) +
geom_density(aes(x=rlkj_2, color = "2"), alpha=.3) +
geom_density(aes(x=rlkj_4, color = "4"), alpha=.3) +
geom_density(aes(x=rlkj_6, color = "6"), alpha=.3) +
labs(x="R", color="eta") +
theme(legend.position = "top")
m5 <- brm(
data = d3,
family = gaussian,
wait ~ 1 + afternoon + (1 + afternoon | cafe),
prior = c(
prior( normal(5,2), class=Intercept ),
prior( normal(-1, .5), class=b),
prior( exponential(1), class=sd),
prior( exponential(1), class=sigma),
prior( lkj(2), class=cor)
),
iter=2000, warmup=1000, chains=4, cores=4, seed=9,
file=here("files/models/m72.5")
)
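The parameter summary below (the call isn't shown in the original) matches what brms's posterior_summary() returns, rounded to two decimals:
round( posterior_summary(m5), 2 )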
Estimate Est.Error Q2.5 Q97.5
b_Intercept 3.66 0.23 3.21 4.11
b_afternoon -1.13 0.14 -1.41 -0.85
sd_cafe__Intercept 0.97 0.17 0.71 1.37
sd_cafe__afternoon 0.59 0.12 0.39 0.87
cor_cafe__Intercept__afternoon -0.51 0.18 -0.80 -0.10
sigma 0.47 0.03 0.42 0.53
Intercept 3.10 0.20 2.71 3.50
r_cafe[1,Intercept] 0.55 0.29 -0.01 1.14
r_cafe[2,Intercept] -1.50 0.30 -2.10 -0.90
r_cafe[3,Intercept] 0.70 0.30 0.12 1.30
r_cafe[4,Intercept] -0.42 0.29 -0.98 0.15
r_cafe[5,Intercept] -1.79 0.30 -2.36 -1.20
r_cafe[6,Intercept] 0.60 0.29 0.02 1.18
r_cafe[7,Intercept] -0.05 0.29 -0.64 0.55
r_cafe[8,Intercept] 0.28 0.30 -0.31 0.86
r_cafe[9,Intercept] 0.32 0.29 -0.28 0.90
r_cafe[10,Intercept] -0.10 0.29 -0.68 0.49
r_cafe[11,Intercept] -1.74 0.29 -2.31 -1.17
r_cafe[12,Intercept] 0.18 0.29 -0.40 0.76
r_cafe[13,Intercept] 0.22 0.30 -0.37 0.82
r_cafe[14,Intercept] -0.49 0.30 -1.08 0.09
r_cafe[15,Intercept] 0.79 0.30 0.22 1.39
r_cafe[16,Intercept] -0.27 0.29 -0.86 0.32
r_cafe[17,Intercept] 0.55 0.30 -0.03 1.15
r_cafe[18,Intercept] 2.08 0.30 1.52 2.68
r_cafe[19,Intercept] -0.42 0.30 -1.00 0.16
r_cafe[20,Intercept] 0.07 0.30 -0.53 0.65
r_cafe[1,afternoon] -0.03 0.29 -0.60 0.53
r_cafe[2,afternoon] 0.23 0.30 -0.36 0.79
r_cafe[3,afternoon] -0.80 0.30 -1.39 -0.24
r_cafe[4,afternoon] -0.10 0.28 -0.65 0.43
r_cafe[5,afternoon] 1.00 0.30 0.42 1.57
r_cafe[6,afternoon] -0.16 0.28 -0.72 0.38
r_cafe[7,afternoon] 0.10 0.28 -0.46 0.64
r_cafe[8,afternoon] -0.50 0.29 -1.09 0.07
r_cafe[9,afternoon] -0.17 0.28 -0.73 0.38
r_cafe[10,afternoon] 0.18 0.29 -0.39 0.75
r_cafe[11,afternoon] 0.70 0.29 0.15 1.27
r_cafe[12,afternoon] -0.06 0.29 -0.63 0.50
r_cafe[13,afternoon] -0.68 0.29 -1.25 -0.12
r_cafe[14,afternoon] 0.19 0.29 -0.36 0.78
r_cafe[15,afternoon] -1.06 0.31 -1.65 -0.44
r_cafe[16,afternoon] 0.09 0.28 -0.47 0.64
r_cafe[17,afternoon] -0.09 0.28 -0.64 0.47
r_cafe[18,afternoon] 0.10 0.30 -0.49 0.70
r_cafe[19,afternoon] 0.87 0.30 0.29 1.47
r_cafe[20,afternoon] 0.07 0.29 -0.49 0.66
lprior -5.06 0.44 -6.07 -4.36
lp__ -197.20 7.16 -212.07 -184.27
Let’s get the slopes and intercepts for each cafe.
cafe_params = m5 %>%
  spread_draws(b_Intercept, b_afternoon, r_cafe[id, term]) %>%
  pivot_wider(names_from = term, values_from = r_cafe) %>%
  mutate( intercepts = b_Intercept + Intercept,
          slopes = b_afternoon + afternoon ) %>%
  mean_qi(intercepts, slopes)
cafe_params %>%
dplyr::select(cafe=id, intercepts, slopes) %>%
round(2)
cafe intercepts slopes
1 1 4.21 -1.16
2 2 2.16 -0.90
3 3 4.37 -1.93
4 4 3.24 -1.23
5 5 1.88 -0.14
6 6 4.26 -1.29
7 7 3.62 -1.03
8 8 3.94 -1.63
9 9 3.98 -1.31
10 10 3.56 -0.95
11 11 1.93 -0.43
12 12 3.84 -1.19
13 13 3.88 -1.81
14 14 3.18 -0.94
15 15 4.46 -2.19
16 16 3.39 -1.04
17 17 4.22 -1.22
18 18 5.75 -1.03
19 19 3.25 -0.26
20 20 3.73 -1.06
More about stat_ellipse here.
We can use the slopes and intercepts to calculate the expected morning and afternoon wait times for each cafe.
cafe_params %>%
mutate(
morning = intercepts,
afternoon = intercepts + slopes
) %>%
ggplot( aes(x=morning, y=afternoon) ) +
mapply(function(level) {
stat_ellipse(geom = "polygon", type = "norm",
linewidth = 0, alpha = .1, fill = "#1c5253",
level = level)
},
# enter the levels here
level = c(1:9 / 10, .99)) +
geom_point(size = 2)+
labs(x="morning wait time",
y="afternoon wait time")
Or, we can use epred_draws.
d3 %>%
distinct(cafe, afternoon) %>%
add_epred_draws(m5) %>%
group_by(cafe, afternoon) %>%
summarise(wait = mean(.epred), .groups = "drop") %>%
mutate(afternoon =ifelse(afternoon==1, "afternoon", "morning")) %>%
pivot_wider(names_from = afternoon, values_from=wait) %>%
ggplot( aes(x=morning, y=afternoon) ) +
mapply(function(level) {
stat_ellipse(geom = "polygon", type = "norm",
linewidth = 0, alpha = .1, fill = "#1c5253",
level = level)
},
# enter the levels here
level = c(1:9 / 10, .99)) +
geom_point(size = 2)+
labs(x="morning wait time",
y="afternoon wait time")
What is the correlation of our intercepts and slopes?
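The summary below is the posterior for the intercept–slope correlation. The exact call isn't shown in the original; with tidybayes, something like this reproduces it:
m5 %>%
  spread_draws(cor_cafe__Intercept__afternoon) %>%
  mean_qi(cor_cafe__Intercept__afternoon)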
# A tibble: 1 × 6
cor_cafe__Intercept__afternoon .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 -0.506 -0.799 -0.0956 0.95 mean qi
post = as_draws_df(m5)
rlkj_2 = rethinking::rlkjcorr(nrow(post), K=2, eta=2)
data.frame(prior= rlkj_2[,1,2],
posterior = post$cor_cafe__Intercept__afternoon) %>%
ggplot() +
geom_density(aes(x=prior, color = "prior"), alpha=.3) +
geom_density(aes(x=posterior, color = "posterior"), alpha=.3) +
scale_color_manual(values = c("#1c5253" , "#e07a5f")) +
labs(x="R") +
theme(legend.position = "top")
Let’s apply what we’ve learned to our affect data:
data_path = "https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/williams.csv"
d <- read.csv(data_path)
rethinking::precis(d)
mean sd 5.5% 94.5% histogram
id 98.61029694 63.7493291 10.0000000 207.000000 ▇▇▇▃▅▅▅▃▃▃▂▂
female 0.57016803 0.4950710 0.0000000 1.000000 ▅▁▁▁▁▁▁▁▁▇
PA.std 0.01438236 1.0241384 -1.6812971 1.751466 ▁▁▃▇▇▃▁▁
day 44.17962096 27.6985612 6.0000000 92.000000 ▇▇▇▇▅▅▅▃▃▃
PA_lag 0.01992587 1.0183833 -1.7204351 1.717036 ▁▂▃▅▇▇▅▃▂▁
NA_lag -0.04575229 0.9833161 -0.8750718 1.990468 ▇▃▂▁▁▁▁▁▁▁▁▁▁▁
steps.pm 0.05424387 0.6298941 -1.0258068 1.011356 ▁▂▇▇▃▁▁▁
steps.pmd 0.02839160 0.7575395 -1.1235951 1.229974 ▁▁▇▇▁▁▁▁▁▁▁▁▁▁
NA.std -0.07545093 0.9495660 -1.0484293 1.811061 ▁▁▁▇▂▁▁▁▁▁
Here’s our original intercept-only model (for comparison).
$$
\begin{aligned}
\text{PA.std}_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\text{id}[i]} \\
\alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma_\alpha) \quad \text{for } j = 1, \ldots, 239 \\
\bar{\alpha} &\sim \text{Normal}(0, 1.5) \\
\sigma_\alpha &\sim \text{Exponential}(1) \\
\sigma &\sim \text{Exponential}(1)
\end{aligned}
$$
m1 <- brm(
data=d,
family=gaussian,
PA.std ~ 1 + (1 | id),
prior = c( prior(normal(0, 1.5), class=Intercept),
prior(exponential(1), class=sd),
prior(exponential(1), class=sigma)),
iter=4000, warmup=1000, chains=4, cores=4, seed=9, #running this longer so it mixes
control = list(adapt_delta =.99),
file=here("files/models/m71.2")
)
seed=4
set.seed(seed)
p1 = d %>%
nest(data = -id) %>%
sample_n(size = 3) %>%
unnest() %>%
ggplot(aes(x=day, y=PA.std, group=id)) +
geom_line(alpha=.3) +
facet_wrap(~id, nrow=1) +
labs(x="day", y="PA")
set.seed(seed)
p2 = d %>%
nest(data = -id) %>%
sample_n(size = 3) %>%
unnest() %>%
ggplot(aes(x=PA_lag, y=PA.std, group=id)) +
geom_line(alpha=.3) +
facet_wrap(~id, nrow=1) +
labs(x="PA yesterday", y="PA today")
p1/p2
Mathematical model with varying slopes
Likelihood function and linear model
$$
\begin{aligned}
\text{PA}_{it} &\sim \text{Normal}(\mu_{it}, \sigma) \\
\mu_{it} &= \alpha_{\text{id}[i]} + \beta_{\text{id}[i]} \, \text{PA}_{i,t-1}
\end{aligned}
$$
Varying intercepts and slopes:
$$
\begin{aligned}
\alpha_{\text{id}[i]} &= \gamma_\alpha + U_{\alpha,\text{id}[i]} \\
\beta_{\text{id}[i]} &= \gamma_\beta + U_{\beta,\text{id}[i]}
\end{aligned}
$$
Random effects:
$$
\begin{aligned}
\begin{bmatrix} U_{\alpha,\text{id}[i]} \\ U_{\beta,\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf{S} \right) \\
\mathbf{S} &= \begin{pmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{pmatrix} \mathbf{R} \begin{pmatrix} \sigma_\alpha & 0 \\ 0 & \sigma_\beta \end{pmatrix}
\end{aligned}
$$
Priors:
$$
\begin{aligned}
\gamma_\alpha &\sim \text{Normal}(0, 1) \\
\gamma_\beta &\sim \text{Normal}(0, 1) \\
\sigma &\sim \text{Exponential}(1) \\
\sigma_\alpha &\sim \text{Exponential}(1) \\
\sigma_\beta &\sim \text{Exponential}(1) \\
\mathbf{R} &\sim \text{LKJcorr}(2)
\end{aligned}
$$
m6 <- brm(
data=d,
family=gaussian,
PA.std ~ 1 + PA_lag + (1 + PA_lag | id),
prior = c( prior( normal(0,1), class=Intercept ),
prior( normal(0,1), class=b ),
prior( exponential(1), class=sigma ),
prior( exponential(1), class=sd ),
prior( lkj(2), class=cor)),
iter=4000, warmup=1000, seed=9, cores=4, chains=4,
file=here("files/models/m72.6")
)
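The printout below is the standard brms summary for this model (the call itself isn't shown in the original):
summary(m6)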
Family: gaussian
Links: mu = identity; sigma = identity
Formula: PA.std ~ 1 + PA_lag + (1 + PA_lag | id)
Data: d (Number of observations: 13033)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~id (Number of levels: 193)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.43 0.03 0.39 0.49 1.00 1514
sd(PA_lag) 0.11 0.01 0.09 0.14 1.00 3422
cor(Intercept,PA_lag) 0.08 0.11 -0.13 0.28 1.00 5402
Tail_ESS
sd(Intercept) 2817
sd(PA_lag) 6761
cor(Intercept,PA_lag) 7765
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.01 0.03 -0.08 0.05 1.00 605 893
PA_lag 0.44 0.01 0.42 0.47 1.00 5254 7769
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.54 0.00 0.54 0.55 1.00 17859 8363
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
What can we learn from this model?
# grab posterior draws for the varying-slopes model
# (the earlier `post` object held draws from m5, which has no b_PA_lag)
post = as_draws_df(m6)
# plot posterior density
p1 = post %>%
ggplot(aes(x = b_PA_lag)) +
stat_halfeye(fill="#1c5253") +
geom_vline( aes(xintercept=0), linetype = "dashed" ) +
labs( title="posterior density",
x="slope",
y=NULL) +
scale_y_continuous(breaks=NULL)
# posterior lines
p2 = d %>%
sample_n(2000) %>%
ggplot(aes( x=PA_lag, y=PA.std )) +
geom_point(alpha=.2, shape = 20) +
geom_abline( aes(intercept=b_Intercept, slope=b_PA_lag),
data=post[1:50, ],
alpha=.3,
color="#1c5253") +
labs( x="yesterday",
y="today",
title="posterior lines")
p1 + p2
How much within-person variability is accounted for by yesterday’s positive affect? Let’s compare the residual-variability estimates from the two models.
# model with no predictors
m1_sigma = spread_draws(m1, sigma) %>% mutate(model = "intercept only")
m6_sigma = spread_draws(m6, sigma) %>% mutate(model = "with lag")
m1_sigma %>%
full_join(m6_sigma) %>%
group_by(model) %>%
mean_qi(sigma)
# A tibble: 2 × 7
model sigma .lower .upper .width .point .interval
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 intercept only 0.601 0.593 0.608 0.95 mean qi
2 with lag 0.543 0.536 0.550 0.95 mean qi
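How big is this drop in practical terms? One way to express it (an operationalization I'm assuming here, not in the original) is the proportional reduction in residual variance when the lag is added:
# proportional reduction in within-person residual variance:
# 1 - sigma_with_lag^2 / sigma_intercept_only^2, computed draw-wise
m1_sigma %>%
  rename(sigma_m1 = sigma) %>%
  dplyr::select(.draw, sigma_m1) %>%
  left_join(m6_sigma %>% rename(sigma_m6 = sigma) %>% dplyr::select(.draw, sigma_m6),
            by = ".draw") %>%
  mutate(prop_reduction = 1 - (sigma_m6^2 / sigma_m1^2)) %>%
  mean_qi(prop_reduction)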
To what extent do people differ in the strength of this holdover (lagged PA) association?
# pick a handful of participants to inspect (these ids match the output below;
# the original presumably sampled them at random)
sample_id = c(6, 17, 107, 116, 170, 202)
m6 %>% spread_draws(b_Intercept, b_PA_lag, r_id[id, term]) %>%
pivot_wider(names_from = term, values_from = r_id) %>%
filter(id %in% sample_id)
# A tibble: 72,000 × 8
# Groups: id [6]
.chain .iteration .draw b_Intercept b_PA_lag id Intercept PA_lag
<int> <int> <int> <dbl> <dbl> <int> <dbl> <dbl>
1 1 1 1 -0.0308 0.431 6 -0.254 0.00547
2 1 1 1 -0.0308 0.431 17 -0.0328 -0.357
3 1 1 1 -0.0308 0.431 107 0.304 0.126
4 1 1 1 -0.0308 0.431 116 -0.0160 -0.156
5 1 1 1 -0.0308 0.431 170 -0.593 0.0832
6 1 1 1 -0.0308 0.431 202 -0.190 -0.117
7 1 2 2 -0.0321 0.432 6 -0.220 0.0652
8 1 2 2 -0.0321 0.432 17 0.00782 -0.110
9 1 2 2 -0.0321 0.432 107 0.280 0.0646
10 1 2 2 -0.0321 0.432 116 -0.0333 0.00658
# ℹ 71,990 more rows
post2 = m6 %>% spread_draws(b_Intercept, b_PA_lag, r_id[id, term]) %>%
pivot_wider(names_from = term, values_from = r_id) %>%
mutate(intercept = b_Intercept + Intercept,
slope = b_PA_lag + PA_lag) %>%
filter(id %in% sample_id) %>%
with_groups(id, sample_n, 50)
d %>%
filter(id %in% sample_id) %>%
ggplot( aes( x=PA_lag, y=PA.std ) ) +
geom_point(alpha=.5, shape=20) +
geom_abline( aes(intercept=intercept, slope=slope, color=as.factor(id)),
data = post2,
alpha=.4) +
facet_wrap(~id) +
guides(color="none") +
labs(y="today",
x="yesterday")
Now overlay the population-average line (in black) on the same panels.
post2 = m6 %>% spread_draws(b_Intercept, b_PA_lag, r_id[id, term]) %>%
pivot_wider(names_from = term, values_from = r_id) %>%
mutate(intercept = b_Intercept + Intercept,
slope = b_PA_lag + PA_lag) %>%
filter(id %in% sample_id) %>%
with_groups(id, sample_n, 50)
d %>%
filter(id %in% sample_id) %>%
ggplot( aes( x=PA_lag, y=PA.std ) ) +
geom_point(alpha=.5, shape=20) +
geom_abline( aes(intercept=intercept, slope=slope, color=as.factor(id)),
data = post2,
alpha=.4) +
geom_abline( aes(intercept=b_Intercept, slope=b_PA_lag),
data = post[1:50, ],
alpha=.4) +
facet_wrap(~id) +
guides(color="none") +
labs(y="today",
x="yesterday")
Back to the question: how much do these slopes differ?
Is someone’s overall positive affect related to their holdover effect?
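The table below shows the participant-level offsets, one row per participant per posterior draw. The call that produced it isn't shown in the original; with tidybayes it would be something like:
m6 %>%
  spread_draws(r_id[id, term]) %>%
  pivot_wider(names_from = term, values_from = r_id)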
# A tibble: 2,316,000 × 6
# Groups: id [193]
id .chain .iteration .draw Intercept PA_lag
<int> <int> <int> <int> <dbl> <dbl>
1 1 1 1 1 0.0902 0.00725
2 1 1 2 2 0.0776 0.0551
3 1 1 3 3 0.122 -0.0274
4 1 1 4 4 0.0212 -0.00127
5 1 1 5 5 0.150 0.0538
6 1 1 6 6 -0.0100 -0.0403
7 1 1 7 7 0.0670 0.0744
8 1 1 8 8 0.0968 0.0747
9 1 1 9 9 0.0320 -0.0629
10 1 1 10 10 0.00455 0.132
# ℹ 2,315,990 more rows
m6 %>% spread_draws(r_id[id, term]) %>%
pivot_wider(names_from = term, values_from = r_id) %>%
ungroup() %>%
dplyr::select(.draw, Intercept, PA_lag) %>%
nest(data= -.draw) %>%
mutate(r = map_dbl(data, ~cor(.$Intercept, .$PA_lag)))
# A tibble: 12,000 × 3
.draw data r
<int> <list> <dbl>
1 1 <tibble [193 × 2]> 0.162
2 2 <tibble [193 × 2]> 0.134
3 3 <tibble [193 × 2]> 0.000967
4 4 <tibble [193 × 2]> -0.0256
5 5 <tibble [193 × 2]> -0.00712
6 6 <tibble [193 × 2]> 0.0129
7 7 <tibble [193 × 2]> 0.0624
8 8 <tibble [193 × 2]> 0.00134
9 9 <tibble [193 × 2]> 0.0653
10 10 <tibble [193 × 2]> -0.0965
# ℹ 11,990 more rows
Summarize the distribution of these draw-wise correlations:
m6 %>% spread_draws(r_id[id, term]) %>%
pivot_wider(names_from = term, values_from = r_id) %>%
ungroup() %>%
dplyr::select(.draw, Intercept, PA_lag) %>%
nest(data= -.draw) %>%
mutate(r = map_dbl(data, ~cor(.$Intercept, .$PA_lag))) %>%
mean_qi(r)
# A tibble: 1 × 6
r .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 0.0799 -0.0787 0.236 0.95 mean qi