week 8: multilevel models

multivariate and MELSM

library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) 
library(tidybayes) 

Models 1, 2, and 4 are linked throughout this lecture.

Models 3 and 5 are too big to share via GitHub. If you’ve registered for the class, you can find those files on our Canvas site.

double-advanced potions multilevel models

  • Multivariate MLMs (M-MLM)

    • Fit models with 2+ (don’t get crazy) outcomes.
    • The advantage is estimating the correlations between the outcomes, as well as among their Level 2 coefficients.
  • Mixed Effects Location Scale Models (MELSM)

    • Estimate both the mean (location) and variance (scale) of the repeated measures for each of your clusters.

We’ll use data provided by Williams and colleagues (2021).

d <- read_csv("https://raw.githubusercontent.com/sjweston/uobayes/refs/heads/main/files/data/external_data/williams.csv")
# scaled time variable
d <- d %>% mutate(day01 = (day - 2) / max((day - 2)))
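A quick check of what that scaling does, sketched on a hypothetical run of days rather than the actual data: day 2 maps to 0 and the final day maps to 1.

```r
# hypothetical day variable, mimicking a diary study that starts on day 2
day <- 2:92
day01 <- (day - 2) / max(day - 2)
range(day01)  # 0 to 1
```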
distinct(d, id) %>% count()
# A tibble: 1 × 1
      n
  <int>
1   193
d %>% 
  count(id) %>% 
  summarise(median = median(n),
            min = min(n),
            max = max(n))
# A tibble: 1 × 3
  median   min   max
   <int> <int> <int>
1     74     8    99
d %>% 
  count(id) %>% 
  ggplot(aes(x = n)) +
  geom_histogram(fill = "#1c5253", color = "white") +
  scale_x_continuous("number of days", limits = c(0, NA)) 
rethinking::precis(d) 
                 mean         sd        5.5%       94.5%      histogram
id        98.61029694 63.7493291 10.00000000 207.0000000   ▇▇▇▃▅▅▅▃▃▃▂▂
female     0.57016803  0.4950710  0.00000000   1.0000000     ▅▁▁▁▁▁▁▁▁▇
PA.std     0.01438236  1.0241384 -1.68129708   1.7514660       ▁▁▃▇▇▃▁▁
day       44.17962096 27.6985612  6.00000000  92.0000000     ▇▇▇▇▅▅▅▃▃▃
PA_lag     0.01992587  1.0183833 -1.72043510   1.7170361     ▁▂▃▅▇▇▅▃▂▁
NA_lag    -0.04575229  0.9833161 -0.87507181   1.9904675 ▇▃▂▁▁▁▁▁▁▁▁▁▁▁
steps.pm   0.05424387  0.6298941 -1.02580678   1.0113559       ▁▂▇▇▃▁▁▁
steps.pmd  0.02839160  0.7575395 -1.12359512   1.2299739 ▁▁▇▇▁▁▁▁▁▁▁▁▁▁
NA.std    -0.07545093  0.9495660 -1.04842926   1.8110607     ▁▁▁▇▂▁▁▁▁▁
day01      0.43040430  0.2826384  0.04081633   0.9183673     ▇▇▇▇▅▅▅▃▃▃
set.seed(14)

d %>% 
  nest(data=-id) %>% 
  slice_sample(n = 6) %>% 
  unnest(data) %>% 
  ggplot(aes(x = day, y = NA_lag)) +
  geom_line(color = "grey") +
  geom_point(color = "#1c5253", size = 1/2) +
  geom_line(aes(y = PA_lag), color = "darkgrey") +
  geom_point(aes(y = PA_lag), color = "#e07a5f", size = 1/2) +
  ylab("affect (standardized)") +
  facet_wrap(~ id)

Let’s fit an MLM with varying intercepts and slopes.

Likelihood function and linear model

\[\begin{align*} \text{NA}_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha_{\text{id}[i]} + \beta_{1\text{id}[i]}\text{NA_lag}_i + \beta_{2\text{id}[i]}\text{PA_lag}_i \end{align*}\]

Varying intercepts and slopes:

\[\begin{align*} \alpha_{\text{id}[i]} &= \alpha + u_{\alpha,\text{id}[i]} \\ \beta_{1\text{id}[i]} &= \beta_1 + u_{\beta,1\text{id}[i]} \\ \beta_{2\text{id}[i]} &= \beta_2 + u_{\beta,2\text{id}[i]} \end{align*}\]

Random effects:

\[\begin{align*} \begin{bmatrix} u_{\alpha,\text{id}[i]} \\ u_{\beta,1\text{id}[i]} \\ u_{\beta,2\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \mathbf{S}\right) \\ \mathbf{S} &= \begin{pmatrix} \sigma_{\alpha} & 0 & 0 \\ 0 & \sigma_{\beta 1} & 0 \\ 0 & 0 & \sigma_{\beta 2} \end{pmatrix}\mathbf{R}\begin{pmatrix} \sigma_{\alpha} & 0 & 0 \\ 0 & \sigma_{\beta 1} & 0 \\ 0 & 0 & \sigma_{\beta 2} \end{pmatrix} \end{align*}\]

Priors: \[\begin{align*} \alpha &\sim \text{Normal}(0,0.2) \\ \beta_1, \beta_2 &\sim \text{Normal}(0,1) \\ \sigma &\sim \text{Exponential}(1) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma_{\beta 1}, \sigma_{\beta 2} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]

Download model file here.

m1 <-
  brm(data = d,
      family = gaussian,
      NA.std ~ 1 + NA_lag + PA_lag + (1 + NA_lag + PA_lag | id),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(exponential(1), class = sd),
                prior(exponential(1), class = sigma),
                prior(lkj(2), class = cor)),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m82.1"))

What if I wanted to model positive affect? Usually, I’d have to fit a second model.

Multivariate MLMs

But what if I told you I could estimate both positive and negative affect simultaneously? Let’s start with an extremely basic, intercept-only model.

Likelihood function

\[\begin{align*} \text{NA}_i &\sim \text{Normal}(\mu_{\text{NA},i}, \sigma_{\text{NA}}) \\ \mu_{\text{NA},i} &= \alpha_{\text{NA},\text{id}[i]} \\ \text{PA}_i &\sim \text{Normal}(\mu_{\text{PA},i}, \sigma_{\text{PA}}) \\ \mu_{\text{PA},i} &= \alpha_{\text{PA},\text{id}[i]} \end{align*}\]

Varying intercepts:

\[\begin{align*} \alpha_{\text{NA},\text{id}[i]} &= \alpha_{\text{NA}} + u_{\alpha,\text{NA},\text{id}[i]} \\ \alpha_{\text{PA},\text{id}[i]} &= \alpha_{\text{PA}} + u_{\alpha,\text{PA},\text{id}[i]} \end{align*}\]

Random Effects: Multivariate Distribution

\[\begin{align*} \mathbf{u}_{\text{id}[i]} = \begin{bmatrix} u_{\alpha,\text{NA},\text{id}[i]} \\ u_{\alpha,\text{PA},\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}(\mathbf{0}, \boldsymbol{\Sigma}) \\ \boldsymbol{\Sigma} &= \mathbf{L} \mathbf{R} \mathbf{L} \quad \\ \text{with} \quad \mathbf{L} &= \text{diag}(\sigma_{\alpha,\text{NA}}, \sigma_{\alpha,\text{PA}}) \end{align*}\]
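To make the \(\mathbf{L}\mathbf{R}\mathbf{L}\) decomposition concrete, here is a small numeric sketch; the SDs and the correlation are made up, not estimates from any model in this lecture.

```r
L <- diag(c(0.7, 0.8))             # L = diag(sd_NA, sd_PA)
R <- matrix(c(1, -0.3,
              -0.3, 1), nrow = 2)  # correlation matrix
S <- L %*% R %*% L                 # covariance matrix Sigma
S[1, 2]  # off-diagonal: -0.3 * 0.7 * 0.8 = -0.168
```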

Priors

\[\begin{align*} \alpha_{\text{NA}}, \alpha_{\text{PA}} &\sim \text{Normal}(0, 0.2) \\ \sigma_{\text{NA}}, \sigma_{\text{PA}} &\sim \text{Exponential}(1) \\ \sigma_{\alpha,\text{NA}}, \sigma_{\alpha,\text{PA}} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]

Download model file here.

bf_na = bf(NA.std ~ 1 + (1  | c | id))
bf_pa = bf(PA.std ~ 1 + (1  | c | id))
m2 <-
  brm(data = d,
      family = gaussian,
      bf_na + bf_pa + set_rescor(TRUE),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(lkj(2), class = cor)),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m82.2"))

This model has partitioned our data into several key pieces. First, we can get the between-person variance and between-person correlations.

m2
 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: NA.std ~ 1 + (1 | c | id) 
         PA.std ~ 1 + (1 | c | id) 
   Data: d (Number of observations: 13033) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~id (Number of levels: 193) 
                                     Estimate Est.Error l-95% CI u-95% CI Rhat
sd(NAstd_Intercept)                      0.73      0.04     0.65     0.81 1.00
sd(PAstd_Intercept)                      0.81      0.04     0.73     0.90 1.00
cor(NAstd_Intercept,PAstd_Intercept)    -0.05      0.07    -0.20     0.09 1.02
                                     Bulk_ESS Tail_ESS
sd(NAstd_Intercept)                       434     1010
sd(PAstd_Intercept)                       509      866
cor(NAstd_Intercept,PAstd_Intercept)      299      572

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
NAstd_Intercept    -0.03      0.05    -0.14     0.07 1.01      231      486
PAstd_Intercept     0.00      0.06    -0.11     0.11 1.02      251      566

Further Distributional Parameters:
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_NAstd     0.63      0.00     0.63     0.64 1.00    20087    11406
sigma_PAstd     0.60      0.00     0.59     0.61 1.00    17390    10555

Residual Correlations: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(NAstd,PAstd)    -0.05      0.01    -0.06    -0.03 1.00    17687    11771

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

list of terms

between-person

  • sd(NAstd_Intercept) : variability in average negative affect
  • sd(PAstd_Intercept) : variability in average positive affect
  • cor(NAstd_Intercept,PAstd_Intercept): correlation between average NA and average PA

within-person

  • sigma_NAstd: variability within participants in NA
  • sigma_PAstd: variability within participants in PA
  • rescor(NAstd,PAstd): within-person correlation of PA and NA
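One way to build intuition for this partition is a toy simulation (purely illustrative, not the Williams data): construct person-level means that are uncorrelated across outcomes but daily deviations that are negatively correlated, then recover each piece.

```r
set.seed(1)
n_id <- 200; n_day <- 50

# between-person piece: each person's average NA and PA (independent here)
mu_na <- rnorm(n_id, 0, 0.7)
mu_pa <- rnorm(n_id, 0, 0.8)

# within-person piece: daily deviations with a built-in negative correlation
e_na <- rnorm(n_id * n_day, 0, 0.6)
e_pa <- -0.5 * e_na + rnorm(n_id * n_day, 0, 0.5)

# observed scores combine the two levels
na_obs <- rep(mu_na, each = n_day) + e_na
pa_obs <- rep(mu_pa, each = n_day) + e_pa

cor(mu_na, mu_pa)  # analogous to cor(NAstd_Intercept, PAstd_Intercept)
cor(e_na, e_pa)    # analogous to rescor(NAstd, PAstd)
```

The raw correlation of `na_obs` and `pa_obs` blends these two pieces, which is exactly why the multivariate MLM separates them.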

Now we’ll add in some predictors. Let’s predict affect from lagged affect.

Likelihood function

\[\begin{align*} \text{NA}_i &\sim \text{Normal}(\mu_{\text{NA},i}, \sigma_{\text{NA}}) \\ \mu_{\text{NA},i} &= \alpha_{\text{NA},\text{id}[i]} + \beta_{1\text{NA},\text{id}[i]}\,\text{PA_lag}_i + \beta_{2\text{NA},\text{id}[i]}\,\text{NA_lag}_i \\ \\ \text{PA}_i &\sim \text{Normal}(\mu_{\text{PA},i}, \sigma_{\text{PA}}) \\ \mu_{\text{PA},i} &= \alpha_{\text{PA},\text{id}[i]} + \beta_{1\text{PA},\text{id}[i]}\,\text{PA_lag}_i + \beta_{2\text{PA},\text{id}[i]}\,\text{NA_lag}_i \end{align*}\]

Varying intercepts and slopes:

\[\begin{align*} \alpha_{\text{NA},\text{id}[i]} &= \alpha_{\text{NA}} + u_{\alpha,\text{NA},\text{id}[i]} \\ \beta_{1\text{NA},\text{id}[i]} &= \beta_{1\text{NA}} + u_{\beta,1\text{NA},\text{id}[i]} \\ \beta_{2\text{NA},\text{id}[i]} &= \beta_{2\text{NA}} + u_{\beta,2\text{NA},\text{id}[i]} \\ \alpha_{\text{PA},\text{id}[i]} &= \alpha_{\text{PA}} + u_{\alpha,\text{PA},\text{id}[i]} \\ \beta_{1\text{PA},\text{id}[i]} &= \beta_{1\text{PA}} + u_{\beta,1\text{PA},\text{id}[i]} \\ \beta_{2\text{PA},\text{id}[i]} &= \beta_{2\text{PA}} + u_{\beta,2\text{PA},\text{id}[i]} \end{align*}\]

Random Effects: Multivariate Distribution

\[\begin{align*} \mathbf{u}_{\text{id}[i]} = \begin{bmatrix} u_{\alpha,\text{NA},\text{id}[i]} \\ u_{\beta,1\text{NA},\text{id}[i]} \\ u_{\beta,2\text{NA},\text{id}[i]} \\ u_{\alpha,\text{PA},\text{id}[i]} \\ u_{\beta,1\text{PA},\text{id}[i]} \\ u_{\beta,2\text{PA},\text{id}[i]} \end{bmatrix} &\sim \text{MVNormal}(\mathbf{0}, \boldsymbol{\Sigma}) \\ \boldsymbol{\Sigma} &= \mathbf{L} \mathbf{R} \mathbf{L} \quad \\ \text{with} \quad \mathbf{L} &= \text{diag}(\sigma_{\alpha,\text{NA}}, \sigma_{\beta,1\text{NA}}, \sigma_{\beta,2\text{NA}}, \sigma_{\alpha,\text{PA}}, \sigma_{\beta,1\text{PA}}, \sigma_{\beta,2\text{PA}}) \end{align*}\]

Priors

\[\begin{align*} \alpha_{\text{NA}}, \alpha_{\text{PA}} &\sim \text{Normal}(0, 0.2) \\ \beta_{1\text{NA}}, \beta_{2\text{NA}}, \beta_{1\text{PA}}, \beta_{2\text{PA}} &\sim \text{Normal}(0, 0.5) \\ \sigma_{\text{NA}}, \sigma_{\text{PA}} &\sim \text{Exponential}(1) \\ \sigma_{\alpha,\text{NA}}, \sigma_{\beta,1\text{NA}}, \sigma_{\beta,2\text{NA}}, \sigma_{\alpha,\text{PA}}, \sigma_{\beta,1\text{PA}}, \sigma_{\beta,2\text{PA}} &\sim \text{Exponential}(1) \\ \mathbf{R} &\sim \text{LKJcorr}(2) \end{align*}\]

bf_na = bf(NA.std ~ 1 + NA_lag + PA_lag + (1 + NA_lag + PA_lag  | c | id))
bf_pa = bf(PA.std ~ 1 + NA_lag + PA_lag + (1 + NA_lag + PA_lag  | c | id))
m3 <-
  brm(data = d,
      family = gaussian,
      bf_na + bf_pa + set_rescor(TRUE),
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 1), class = b),
                prior(lkj(2), class = cor)),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m82.3"))
m3
 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: NA.std ~ 1 + NA_lag + PA_lag + (1 + NA_lag + PA_lag | c | id) 
         PA.std ~ 1 + NA_lag + PA_lag + (1 + NA_lag + PA_lag | c | id) 
   Data: d (Number of observations: 13033) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~id (Number of levels: 193) 
                                     Estimate Est.Error l-95% CI u-95% CI Rhat
sd(NAstd_Intercept)                      0.44      0.03     0.39     0.50 1.00
sd(NAstd_NA_lag)                         0.15      0.01     0.12     0.18 1.00
sd(NAstd_PA_lag)                         0.07      0.02     0.03     0.10 1.00
sd(PAstd_Intercept)                      0.43      0.03     0.38     0.48 1.00
sd(PAstd_NA_lag)                         0.05      0.02     0.01     0.08 1.00
sd(PAstd_PA_lag)                         0.11      0.01     0.09     0.14 1.00
cor(NAstd_Intercept,NAstd_NA_lag)        0.09      0.10    -0.10     0.29 1.00
cor(NAstd_Intercept,NAstd_PA_lag)        0.30      0.15    -0.00     0.60 1.00
cor(NAstd_NA_lag,NAstd_PA_lag)           0.07      0.18    -0.30     0.40 1.00
cor(NAstd_Intercept,PAstd_Intercept)    -0.13      0.08    -0.28     0.02 1.00
cor(NAstd_NA_lag,PAstd_Intercept)       -0.00      0.11    -0.22     0.22 1.00
cor(NAstd_PA_lag,PAstd_Intercept)       -0.12      0.16    -0.42     0.18 1.01
cor(NAstd_Intercept,PAstd_NA_lag)       -0.06      0.17    -0.39     0.28 1.00
cor(NAstd_NA_lag,PAstd_NA_lag)          -0.03      0.21    -0.43     0.38 1.00
cor(NAstd_PA_lag,PAstd_NA_lag)           0.26      0.24    -0.24     0.69 1.00
cor(PAstd_Intercept,PAstd_NA_lag)       -0.24      0.18    -0.59     0.12 1.00
cor(NAstd_Intercept,PAstd_PA_lag)       -0.18      0.11    -0.39     0.04 1.00
cor(NAstd_NA_lag,PAstd_PA_lag)           0.10      0.15    -0.19     0.39 1.00
cor(NAstd_PA_lag,PAstd_PA_lag)          -0.12      0.18    -0.48     0.24 1.00
cor(PAstd_Intercept,PAstd_PA_lag)        0.09      0.11    -0.11     0.30 1.00
cor(PAstd_NA_lag,PAstd_PA_lag)           0.32      0.21    -0.13     0.68 1.00
                                     Bulk_ESS Tail_ESS
sd(NAstd_Intercept)                      3584     6292
sd(NAstd_NA_lag)                         6665    10373
sd(NAstd_PA_lag)                         1557     1372
sd(PAstd_Intercept)                      4387     7836
sd(PAstd_NA_lag)                         2112     2250
sd(PAstd_PA_lag)                         7127    10096
cor(NAstd_Intercept,NAstd_NA_lag)        6685     9729
cor(NAstd_Intercept,NAstd_PA_lag)        7006     7448
cor(NAstd_NA_lag,NAstd_PA_lag)           7576     7022
cor(NAstd_Intercept,PAstd_Intercept)     2349     4989
cor(NAstd_NA_lag,PAstd_Intercept)         897     1718
cor(NAstd_PA_lag,PAstd_Intercept)         280      469
cor(NAstd_Intercept,PAstd_NA_lag)       11866    10018
cor(NAstd_NA_lag,PAstd_NA_lag)           8743     9256
cor(NAstd_PA_lag,PAstd_NA_lag)           2757     5270
cor(PAstd_Intercept,PAstd_NA_lag)       14554    10682
cor(NAstd_Intercept,PAstd_PA_lag)       12167    12720
cor(NAstd_NA_lag,PAstd_PA_lag)           4607     8122
cor(NAstd_PA_lag,PAstd_PA_lag)           2163     3286
cor(PAstd_Intercept,PAstd_PA_lag)       14121    12777
cor(PAstd_NA_lag,PAstd_PA_lag)           1588     2031

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
NAstd_Intercept    -0.03      0.03    -0.10     0.03 1.00     1348     2892
PAstd_Intercept    -0.01      0.03    -0.07     0.05 1.00     2691     5026
NAstd_NA_lag        0.35      0.02     0.32     0.38 1.00     7233    10519
NAstd_PA_lag        0.01      0.01    -0.01     0.03 1.00     7186    11061
PAstd_NA_lag        0.04      0.01     0.02     0.06 1.00    16764    13101
PAstd_PA_lag        0.45      0.01     0.43     0.48 1.00    12268    12423

Further Distributional Parameters:
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_NAstd     0.59      0.00     0.59     0.60 1.00    27696    10725
sigma_PAstd     0.54      0.00     0.54     0.55 1.00    31518    11100

Residual Correlations: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
rescor(NAstd,PAstd)    -0.02      0.01    -0.04    -0.01 1.00    32445    12034

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

These models can help us address many questions. First, let’s examine the between- and within-person variance of these outcomes.

post = as_draws_df(m3)

post %>% 
  select(between = cor_id__NAstd_Intercept__PAstd_Intercept, 
         within = rescor__NAstd__PAstd) %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  mean_qi()
# A tibble: 2 × 7
  name      value  .lower   .upper .width .point .interval
  <chr>     <dbl>   <dbl>    <dbl>  <dbl> <chr>  <chr>    
1 between -0.133  -0.284   0.0217    0.95 mean   qi       
2 within  -0.0249 -0.0423 -0.00733   0.95 mean   qi       

To clarify: this is the within-person correlation after accounting for the predictors in the model. This can help us detect any additional patterns in the data.

m3 %>% 
  gather_draws(r_id__NAstd[id, term], r_id__PAstd[id, term]) %>% 
  mutate(.variable=str_remove(.variable, "r_id__")) %>% 
  pivot_wider(names_from = .variable, values_from = .value) %>% 
  # just for presenting to class 
  arrange(.draw, id, term) %>% select(-.chain, -.iteration) %>% 
  mutate(across(c(NAstd, PAstd), \(x) round(x, 2)))
# A tibble: 9,264,000 × 5
# Groups:   id, term [579]
      id term      .draw NAstd PAstd
   <int> <chr>     <int> <dbl> <dbl>
 1     1 Intercept     1  0.14  0.05
 2     1 NA_lag        1  0.11  0.08
 3     1 PA_lag        1  0.07  0.13
 4     2 Intercept     1 -0.4   0.88
 5     2 NA_lag        1 -0.03  0   
 6     2 PA_lag        1 -0.04  0.16
 7     3 Intercept     1  0.82  0   
 8     3 NA_lag        1  0.33 -0.09
 9     3 PA_lag        1  0.1  -0.2 
10     4 Intercept     1  0.2  -0.08
# ℹ 9,263,990 more rows

spaghetti plots

nd = data.frame(
  PA_lag = seq(from = min(d$PA_lag), to=max(d$PA_lag), length.out=50),
  NA_lag = 0,
  id = max(d$id) + 1
)

nd %>% add_epred_draws(m3, allow_new_levels=T) %>% 
  filter(.draw <= 50) %>% 
  ggplot(aes(x = PA_lag, y = .epred)) +
  geom_line(aes(group = .draw, color = .category), alpha=.2) +
  scale_color_manual(values = c("#1c5253" , "#e07a5f")) +
  facet_wrap(~.category) +
  guides(color="none") +
  labs(x = "Positive Affect (lagged)", y = "Expected score")
nd = data.frame(
  NA_lag = seq(from = min(d$NA_lag), to=max(d$NA_lag), length.out=50),
  PA_lag = 0,
  id = max(d$id) + 1
)

nd %>% add_epred_draws(m3, allow_new_levels=T) %>% 
  filter(.draw <= 50) %>% 
  ggplot(aes(x = NA_lag, y = .epred)) +
  geom_line(aes(group = .draw, color = .category), alpha=.2) +
  scale_color_manual(values = c("#1c5253" , "#e07a5f")) +
  facet_wrap(~.category) +
  guides(color="none") +
  labs(x = "Negative Affect (lagged)", y = "Expected score")

But of course, we’ve allowed these effects to vary for each individual.

nd = expand.grid(
  PA_lag = seq(from = min(d$PA_lag), to=max(d$PA_lag), length.out=50),
  NA_lag = 0,
  id = sample(d$id, size = 4, replace=F)
)

nd %>% add_epred_draws(m3) %>% 
  filter(.draw <= 20) %>% 
  ggplot(aes(x = PA_lag, y = .epred)) +
  geom_line(aes(group = .draw, color = .category), alpha=.2) +
  scale_color_manual(values = c("#1c5253" , "#e07a5f")) +
  facet_grid(.category~id) +
  guides(color="none") +
  labs(x = "Positive Affect (lagged)", y = "Expected score")

We may even want to see whether there’s a correlation between the effect of positive affect on PA and on NA.

postm2 = m3 %>% spread_draws(r_id__PAstd[id, term],
                    r_id__NAstd[id, term]) 

postm2 %>% 
  filter(term == "PA_lag") %>% 
  mean_qi %>% 
  ggplot(aes(x = r_id__PAstd, y = r_id__NAstd)) + 
  geom_point() +
  labs(x = "Effect of PA on PA", y="Effect of PA on NA")

But of course, each of these points is a summary of a distribution! The real benefit of modeling these outcomes jointly is that we can also get a distribution of the correlations between these effects.

m3 %>% spread_draws(`cor.*`, regex = T) %>% 
  pivot_longer(starts_with("cor"),
               names_prefix = "cor_id__") %>% 
  group_by(name) %>% 
  mean_qi(value) %>% 
  arrange(desc(abs(value)))
# A tibble: 15 × 7
   name                            value   .lower .upper .width .point .interval
   <chr>                           <dbl>    <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 PAstd_NA_lag__PAstd_PA_lag    0.318   -0.132   0.681    0.95 mean   qi       
 2 NAstd_Intercept__NAstd_PA_l…  0.303   -0.00369 0.601    0.95 mean   qi       
 3 NAstd_PA_lag__PAstd_NA_lag    0.261   -0.240   0.692    0.95 mean   qi       
 4 PAstd_Intercept__PAstd_NA_l… -0.241   -0.588   0.123    0.95 mean   qi       
 5 NAstd_Intercept__PAstd_PA_l… -0.183   -0.391   0.0356   0.95 mean   qi       
 6 NAstd_Intercept__PAstd_Inte… -0.133   -0.284   0.0217   0.95 mean   qi       
 7 NAstd_PA_lag__PAstd_PA_lag   -0.123   -0.479   0.242    0.95 mean   qi       
 8 NAstd_PA_lag__PAstd_Interce… -0.121   -0.419   0.180    0.95 mean   qi       
 9 NAstd_NA_lag__PAstd_PA_lag    0.102   -0.190   0.393    0.95 mean   qi       
10 PAstd_Intercept__PAstd_PA_l…  0.0950  -0.113   0.296    0.95 mean   qi       
11 NAstd_Intercept__NAstd_NA_l…  0.0946  -0.103   0.291    0.95 mean   qi       
12 NAstd_NA_lag__NAstd_PA_lag    0.0685  -0.300   0.405    0.95 mean   qi       
13 NAstd_Intercept__PAstd_NA_l… -0.0645  -0.393   0.283    0.95 mean   qi       
14 NAstd_NA_lag__PAstd_NA_lag   -0.0334  -0.433   0.383    0.95 mean   qi       
15 NAstd_NA_lag__PAstd_Interce… -0.00442 -0.223   0.220    0.95 mean   qi       

modeling variability

Traditionally, psycho-social research has focused on identifying trait-like behaviors: for example, how negative a person is on average. More recent work has broadened to include states: whether, and how much, individuals fluctuate in their thoughts, feelings, and behaviors over time.

On the macro time scale, we typically assume stable traits, but on the micro time scale (day to day), we often observe substantial variability in these same traits. This presents a methodological challenge: how do we simultaneously model both the stable personality traits (the “location”) and the fluctuations (the “scale”)? The standard approach has been a two-stage process:

  1. compute individual means (iMs) in a mixed effects model
  2. obtain individual standard deviations (iSDs) from the residuals

But this approach has several drawbacks:

  • It can result in unreliable estimates
  • It’s particularly sensitive to the number of measurement occasions
  • It assumes independence of means and variances, which seems unlikely
  • It results in biased variance estimates
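For reference, the two-stage computation criticized above looks roughly like this in base R, sketched on toy data since the point is the procedure, not the estimates:

```r
set.seed(14)
# toy diary data: 5 people, 20 occasions each
toy <- data.frame(id = rep(1:5, each = 20), PA.std = rnorm(100))

# stage 1: individual means (iMs); stage 2: individual SDs (iSDs)
iM  <- tapply(toy$PA.std, toy$id, mean)
iSD <- tapply(toy$PA.std, toy$id, sd)

# these point estimates then get correlated or used as predictors,
# carrying no uncertainty forward -- the problem the MELSM avoids
cor(iM, iSD)
```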

mixed-effects location scale models

In a standard linear mixed effects model with repeated measurements, we have:

\[ y_i = X_i\beta + Z_ib_i + \epsilon_i \] Where:

  • \(y_i\) is the observations for the ith person
  • \(X_i\beta\) represents the fixed effects
  • \(Z_ib_i\) represents the random effects (person-specific deviations)
  • \(\epsilon_i\) represents the error term (or states)

The key limitation of this standard model is that it assumes the same error variance \((\sigma^2)\) for everyone. The MELSM, instead, allows \(\sigma^2\) to differ at the individual level and even at different time points. The scale component is modeled as:

\[ \varphi_i = \text{exp}(W_i\eta + V_it_i) \] Where:

  • \(\varphi_i\) contains the error variances
  • \(\eta\) represents fixed effects for the scale
  • \(t_i\) represents random effects for the scale
  • The exponential function ensures positive variance estimates
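Because the model is linear on the log scale, person-specific random effects act multiplicatively on the residual SD. A sketch with made-up values for the fixed scale intercept \(\eta_0\) and a few hypothetical person deviations:

```r
eta0 <- -0.64            # hypothetical fixed scale intercept (log scale)
u_i  <- c(-0.5, 0, 0.5)  # hypothetical person-specific scale deviations

sigma_i <- exp(eta0 + u_i)  # person-specific residual SDs, always positive
round(sigma_i, 2)           # 0.32 0.53 0.87
```

Note that `exp(eta0)` is the "typical" person's residual SD, and `exp(u_i)` scales it up or down.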

Download model file here.

m4 <-
  brm(data = d,
      family = gaussian,
    bf(
    # location
    PA.std ~ 1 + (1 | c | id), 
    # scale (the c  in |c| allows for correlation between location and scale)
    sigma ~ 1 + (1 | c | id)
    ), 
    prior = c(prior(normal(0, 0.2), class = Intercept),
              prior(exponential(1), class = sd),
              prior(normal(0,1), class = Intercept, dpar=sigma),
              prior(exponential(1), class = sd, dpar=sigma),
              prior(lkj(2), class = cor)),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      seed = 14,
      file = here("files/models/m82.4"))
m4
 Family: gaussian 
  Links: mu = identity; sigma = log 
Formula: PA.std ~ 1 + (1 | c | id) 
         sigma ~ 1 + (1 | c | id)
   Data: d (Number of observations: 13033) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~id (Number of levels: 193) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                      0.81      0.04     0.73     0.90 1.00
sd(sigma_Intercept)                0.43      0.02     0.39     0.48 1.00
cor(Intercept,sigma_Intercept)    -0.34      0.07    -0.47    -0.21 1.00
                               Bulk_ESS Tail_ESS
sd(Intercept)                       400      789
sd(sigma_Intercept)                1517     3427
cor(Intercept,sigma_Intercept)      962     1969

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           0.01      0.06    -0.10     0.12 1.03      215      408
sigma_Intercept    -0.64      0.03    -0.70    -0.57 1.00      666     1753

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

We’ve seen the parameters for the location section before: these are the same “fixed” parameters we’ve been covering in MLMs over the past 4 lectures. Let’s talk about what the sigma part means.

posterior_summary(m4)["b_sigma_Intercept", ] |> round(2)
 Estimate Est.Error      Q2.5     Q97.5 
    -0.64      0.03     -0.70     -0.57 

As a reminder, this is the intercept of the log of sigma (\(\text{log}(\sigma)\)). So we need to exponentiate before we can interpret it as a standard deviation.

posterior_summary(m4)["b_sigma_Intercept", ] |> exp() |> round(2)
 Estimate Est.Error      Q2.5     Q97.5 
     0.53      1.03      0.50      0.56 

This is just the “average” variability of participants. Some of them vary a lot, and some vary a little.

m4 |> spread_draws(b_sigma_Intercept, r_id__sigma[id, term] ) |> 
  filter(term == "Intercept") |> 
  mutate(sigma=b_sigma_Intercept+r_id__sigma) |> 
  mean_qi(sigma) |> 
  mutate(across(c(sigma, .lower, .upper), exp)) |> 
  arrange(sigma) |> 
  mutate(rank=row_number()) |> 
  ggplot( aes(x=rank, y = sigma, ymin=.lower, ymax=.upper) ) +
  geom_pointrange(linewidth = 0.4, fatten = 0.3, color = "#5e8485") +
  scale_x_continuous("participants ranked by posterior mean sigma", breaks = NULL) +
  ylab(expression(exp(eta[0]+italic(u)[2][italic(i)])))

Our posterior distribution contains both locations (means) and scales \(\text{log}(\sigma)\) for each of our clusters (participants). We can plot these to see whether they’re related.

m4 |> 
  spread_draws( r_id[id, term], r_id__sigma[id, term] ) |> 
  summarise(  location=mean(r_id), 
              scale=mean(exp(r_id__sigma))) |> 
  ggplot( aes(x=location, y=scale) ) +
  geom_point() +
  geom_smooth(se=F, method="lm")

Of course, the correlation between these values can also be found in our model output.

posterior_summary(m4)["cor_id__Intercept__sigma_Intercept", ] |> round(2)
 Estimate Est.Error      Q2.5     Q97.5 
    -0.34      0.07     -0.47     -0.21 

adding predictors

The real power of these models comes when we incorporate predictors for both location and scale.

m5 <-brm(
  data = d,
  family = gaussian,
  bf(
    # location
    PA.std ~ 1 + day + PA_lag + NA_lag + steps.pm*steps.pmd + 
      (1 + day + steps.pmd | c | id), 
    sigma ~ 1 + PA_lag + NA_lag + steps.pm*steps.pmd + (1 + steps.pmd | c | id)
    ), 
    prior = c(prior(normal(0, 0.2), class = Intercept),
              prior(normal(0, 1), class = b),
              prior(exponential(1), class = sd),
              prior(normal(0,1), class = Intercept, dpar=sigma),
              prior(exponential(1), class = sd, dpar=sigma),
              prior(lkj(2), class = cor)),
    iter = 5000, warmup = 1000, chains = 4, cores = 4,
    seed = 14,
    file = here("files/models/m82.5"))

Our chains clearly have not mixed (note the Rhat values well above 1.00 and the tiny effective sample sizes below) – be sure to run this model for many more iterations if you actually want to use it. As it was, this model took 80 minutes to run on my computer, so I didn’t have time to go back and run it for longer.

m5
 Family: gaussian 
  Links: mu = identity; sigma = log 
Formula: PA.std ~ 1 + day + PA_lag + NA_lag + steps.pm * steps.pmd + (1 + day + steps.pmd | c | id) 
         sigma ~ 1 + PA_lag + NA_lag + steps.pm * steps.pmd + (1 + steps.pmd | c | id)
   Data: d (Number of observations: 13033) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~id (Number of levels: 193) 
                                     Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                            0.61      0.13     0.49     0.90 1.60
sd(day)                                  0.03      0.11     0.00     0.23 1.60
sd(steps.pmd)                            0.25      0.18     0.12     0.57 1.56
sd(sigma_Intercept)                      0.38      0.11     0.13     0.49 1.60
sd(sigma_steps.pmd)                      0.13      0.04     0.03     0.18 1.46
cor(Intercept,day)                      -0.08      0.38    -0.43     0.67 1.60
cor(Intercept,steps.pmd)                -0.29      0.37    -0.91     0.08 1.60
cor(day,steps.pmd)                      -0.14      0.11    -0.35     0.08 1.19
cor(Intercept,sigma_Intercept)          -0.47      0.28    -0.94    -0.18 1.60
cor(day,sigma_Intercept)                -0.14      0.11    -0.40     0.07 1.37
cor(steps.pmd,sigma_Intercept)           0.41      0.34     0.07     0.98 1.57
cor(Intercept,sigma_steps.pmd)          -0.09      0.46    -0.52     0.70 1.59
cor(day,sigma_steps.pmd)                -0.08      0.12    -0.28     0.15 1.10
cor(steps.pmd,sigma_steps.pmd)          -0.27      0.32    -0.79     0.12 1.58
cor(sigma_Intercept,sigma_steps.pmd)    -0.10      0.44    -0.85     0.33 1.58
                                     Bulk_ESS Tail_ESS
sd(Intercept)                               7       11
sd(day)                                     7       11
sd(steps.pmd)                               7       18
sd(sigma_Intercept)                         7       11
sd(sigma_steps.pmd)                         9       11
cor(Intercept,day)                          7       11
cor(Intercept,steps.pmd)                    7       11
cor(day,steps.pmd)                         14       22
cor(Intercept,sigma_Intercept)              7       11
cor(day,sigma_Intercept)                    9       12
cor(steps.pmd,sigma_Intercept)              7       11
cor(Intercept,sigma_steps.pmd)              7       11
cor(day,sigma_steps.pmd)                   24     1751
cor(steps.pmd,sigma_steps.pmd)              7       11
cor(sigma_Intercept,sigma_steps.pmd)        7       11

Regression Coefficients:
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                    0.91      2.28    -0.04     6.52 1.59        7
sigma_Intercept             -0.19      1.09    -0.85     2.76 1.60        7
day                         -0.01      0.04    -0.10    -0.00 1.58        7
PA_lag                       0.11      0.44    -0.76     0.38 1.59        7
NA_lag                       0.28      0.43     0.01     1.06 1.59        7
steps.pm                     0.05      0.19    -0.27     0.26 1.56        7
steps.pmd                   -0.39      0.87    -1.97     0.13 1.59        7
steps.pm:steps.pmd           0.25      0.49    -0.07     1.13 1.59        7
sigma_PA_lag                 0.02      0.09    -0.12     0.35 1.60        7
sigma_NA_lag                 0.04      0.04     0.00     0.14 1.51        7
sigma_steps.pm              -0.12      0.15    -0.52     0.05 1.59        7
sigma_steps.pmd             -0.06      0.06    -0.24    -0.00 1.59        7
sigma_steps.pm:steps.pmd     0.01      0.08    -0.07     0.21 1.59        7
                         Tail_ESS
Intercept                      11
sigma_Intercept                12
day                            11
PA_lag                         11
NA_lag                         12
steps.pm                       20
steps.pmd                      11
steps.pm:steps.pmd             11
sigma_PA_lag                   11
sigma_NA_lag                   11
sigma_steps.pm                 11
sigma_steps.pmd                11
sigma_steps.pm:steps.pmd       11

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

We can visualize the fixed effects:

m5 |> 
  gather_draws(`^b_.*`, regex=T) |> 
  mutate(
    out = ifelse( str_detect(.variable, "sigma"), "var", "location"),
    .variable = str_remove(.variable, "^b_sigma_"),
    .variable = str_remove(.variable, "^b_")
  ) |> 
    filter(.variable != "Intercept") |> 
    ggplot( aes(x=.value, y=.variable) ) +
    stat_halfeye() +
    facet_wrap(~out, nrow=2, scales="free") +
    labs(x="coef estimate", y=NULL, title="posterior")

People who walk more have higher PA (though note the interval includes zero):

posterior_summary(m5)["b_steps.pm", ] |> round(2)
 Estimate Est.Error      Q2.5     Q97.5 
     0.05      0.19     -0.27      0.26 

and are less variable (again, with an interval spanning zero):

posterior_summary(m5)["b_sigma_steps.pm", ] |> round(2)
 Estimate Est.Error      Q2.5     Q97.5 
    -0.12      0.15     -0.52      0.05 
means = d |> 
  select(PA_lag, day, NA_lag, steps.pmd) |> 
  summarise(across(everything(),mean))

tibble(steps.pm = seq( from=min(d$steps.pm), to=max(d$steps.pm), length=100)) |> 
  cross_join(means) |> 
  add_predicted_draws(m5, re_formula = NA) |> 
  ggplot( aes(x=steps.pm, y=.prediction) ) +
  geom_point(alpha=.5) +
  labs(x="Average steps", y="PA.std") 

summary

Now that you know the basic MLM structure, you can expand these models in complex ways:

  • estimate multiple outcomes simultaneously
  • allow within-cluster variability to vary and predict it

These models demand a lot more attention, forethought, and horsepower. But they can be worth it. Model-implied estimates appropriately account for uncertainty, whereas hand-calculated estimates do not.