library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) 
library(tidybayes) 

Ordered categorical outcomes

Ordered categories are variables that are discrete, like a count, except that the values merely indicate different ordered levels.1 Most Likert scale questions are ordered categories, even though we pretend they’re continuous (see polychoric correlations).

If we want to model an outcome as an ordered categorical variable, what we’re really asking is how moving along an associated predictor variable shifts predictions progressively through the outcome categories, in sequence. We must therefore model our outcome in the right order.

To do so, we’ll use the principles of the generalized linear model discussed last week. That is, we use a link function – the CUMULATIVE LINK FUNCTION – to model the cumulative probability of each value, i.e., the probability of observing that value or any smaller one.
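To make the link concrete, here is a small base-R sketch (with made-up probabilities for a four-category outcome) showing how category probabilities convert to cumulative log-odds and back:

```r
# made-up probabilities for a 4-category ordered outcome
p <- c(0.1, 0.2, 0.3, 0.4)

# cumulative probability: Pr(Y <= k)
cum_p <- cumsum(p)

# the cumulative link: log-odds of Y <= k
# (drop the last value, since Pr(Y <= max) is always 1 and logit(1) = Inf)
alpha <- qlogis(cum_p[-length(cum_p)])

# invert the link, then difference to recover the category probabilities
cum_back <- c(plogis(alpha), 1)
p_back   <- cum_back - c(0, head(cum_back, -1))
all.equal(p, p_back)
```

This round trip is exactly what the cumulative model does on the latent scale: estimate the α’s, then difference the implied cumulative probabilities to get per-category probabilities.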


example: the trolley problem

data(Trolley, package = "rethinking")
trolley <- Trolley
head(trolley)
##    case response order     id age male           edu action intention contact
## 1 cfaqu        4     2 96;434  14    0 Middle School      0         0       1
## 2 cfbur        3    31 96;434  14    0 Middle School      0         0       1
## 3 cfrub        4    16 96;434  14    0 Middle School      0         0       1
## 4 cibox        3    32 96;434  14    0 Middle School      0         1       1
## 5 cibur        3     4 96;434  14    0 Middle School      0         1       1
## 6 cispe        3     9 96;434  14    0 Middle School      0         1       1
##   story action2
## 1   aqu       1
## 2   bur       1
## 3   rub       1
## 4   box       1
## 5   bur       1
## 6   spe       1

trolley %>% 
  ggplot(aes( x=response )) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


trolley %>%
  count(response) %>% 
  mutate(pr_k     = n/nrow(trolley),
         cum_pr_k = cumsum(pr_k)) %>% 
  ggplot(aes(x = response, y = cum_pr_k)) +
  geom_point(size = 3, color = "#1c5253") +
  geom_line(color = "#1c5253", alpha = .3) + 
  labs(x = "response",
       y = "cumulative probability")


# primary data
trolley_plot <-
  trolley %>%
  count(response) %>%
  mutate(pr_k     = n / nrow(trolley),
         cum_pr_k = cumsum(n / nrow(trolley))) %>% 
  mutate(discrete_probability = ifelse(response == 1, cum_pr_k, cum_pr_k - pr_k))

# annotation
text <-
  tibble(text     = 1:7,
         response = seq(from = 1.25, to = 7.25, by = 1),
         cum_pr_k = trolley_plot$cum_pr_k - .065)

trolley_plot %>% 
  ggplot(aes(x = response, y = cum_pr_k,
             color = cum_pr_k, fill = cum_pr_k)) +
  geom_line() +
  geom_point(shape = 21, colour = "grey92", 
             size = 2.5, stroke = 1) +
  geom_linerange(aes(ymin = 0, ymax = cum_pr_k),
                 alpha = 1/2) +
  geom_linerange(aes(x = response + .025,
                     ymin = ifelse(response == 1, 0, discrete_probability), 
                     ymax = cum_pr_k),
                 color = "black") +
  # number annotation
  geom_text(data = text, 
            aes(label = text),
            size = 4) +
  scale_x_continuous(breaks = 1:7) +
  scale_y_continuous("cumulative proportion", breaks = c(0, .5, 1), 
                     limits = c(0, 1.1)) +
  theme(axis.ticks = element_blank(),
        axis.title.y = element_text(angle = 90),
        legend.position = "none")


On to the modeling. We’ll start with an intercept-only model.

$$
\begin{aligned}
R_i &\sim \text{Categorical}(\mathbf{p}) \\
\text{logit}(p_k) &= \alpha_k - \phi \\
\phi &= 0 \\
\alpha_k &\sim \text{Normal}(0, 1.5)
\end{aligned}
$$

Remember, ϕ doesn’t have an intercept because there’s an α_k for each cut point.
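As a quick prior check (my own sketch, not part of the lecture), we can draw one set of cut points from the Normal(0, 1.5) prior and look at the cumulative probabilities it implies:

```r
set.seed(1)  # arbitrary seed

# one prior draw of 6 cut points (for 7 response categories),
# sorted here just to respect the ordering constraint brms enforces internally
alpha <- sort(rnorm(6, mean = 0, sd = 1.5))

# implied cumulative probabilities Pr(Y <= k) on the outcome scale
round(plogis(alpha), 2)
```

A Normal(0, 1.5) prior on the logit scale keeps the cumulative probabilities spread across (0, 1) without piling them up at the extremes, which is why it’s a reasonable weakly informative default here.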

m1 <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1, 
  prior = c( prior(normal(0, 1.5), class = Intercept) ),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/models/62.1")
)

α_k denotes the K − 1 cut points or thresholds.

ϕ is a stand-in for a potential linear model.


summary(m1)
##  Family: cumulative 
##   Links: mu = logit; disc = identity 
## Formula: response ~ 1 
##    Data: trolley (Number of observations: 9930) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept[1]    -1.92      0.03    -1.98    -1.86 1.00     2725     3116
## Intercept[2]    -1.27      0.02    -1.32    -1.22 1.00     3960     3485
## Intercept[3]    -0.72      0.02    -0.76    -0.68 1.00     4323     3760
## Intercept[4]     0.25      0.02     0.21     0.29 1.00     4822     3486
## Intercept[5]     0.89      0.02     0.85     0.93 1.00     4364     3378
## Intercept[6]     1.77      0.03     1.71     1.83 1.00     4658     3543
## 
## Further Distributional Parameters:
##      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## disc     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

brms implements the cumulative-logit model as

P(Y ≤ k) = F(disc * (thres[k] - mu))

Where:

  • thres[k] - These are the threshold parameters (often called “cutpoints” or α_k in the lecture notes). They represent the boundaries between adjacent categories on the latent continuous scale. In brms, these are the “Intercept[k]” parameters you see in the model output.

  • mu - This is the linear predictor, which can include fixed effects, random effects, etc. It represents the location of an observation on the latent scale.

  • disc - This is a discrimination parameter (sometimes called a scale parameter). It controls how quickly the probabilities change as you move along the latent scale. Higher values mean sharper distinctions between categories.

In lecture, we formulate cumulative logit as

logit(p_k) = α_k - φ_i

Where:

  • α_k corresponds to thres[k] in brms
  • φ_i corresponds to mu in brms
  • The discrimination parameter is implicitly set to 1

The negative sign in front of φ_i in the lecture formulation corresponds to how brms parameterizes the model with (thres[k] - mu) rather than (mu - thres[k]).
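We can verify this parameterization by hand. The sketch below plugs the rounded posterior means of the m1 thresholds into P(Y ≤ k) = F(disc * (thres[k] - mu)), with mu = 0 and disc = 1, and differences the cumulative probabilities to recover per-category probabilities:

```r
# rounded posterior means of the 6 thresholds from m1
thres <- c(-1.92, -1.27, -0.72, 0.25, 0.89, 1.77)
mu    <- 0   # intercept-only model: no linear predictor
disc  <- 1   # fixed to 1 by default

# F is the logistic CDF (plogis) for the logit link
cum_p <- plogis(disc * (thres - mu))

# per-category probabilities, appending Pr(Y <= 7) = 1
p_k <- c(cum_p, 1) - c(0, cum_p)
round(p_k, 3)  # probabilities for responses 1 through 7; they sum to 1
```

The cumulative values here (e.g., plogis(-1.92) ≈ 0.128) match, up to rounding, the thresholds converted to the probability scale via inv_logit_scaled() below.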


The thresholds are in the logit-metric. Let’s convert these back to probabilities.

gather_draws(m1, `b_.*`, regex=T) %>% 
  mutate(.value = inv_logit_scaled(.value)) %>% 
  mean_qi
## # A tibble: 6 × 7
##   .variable      .value .lower .upper .width .point .interval
##   <chr>           <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 b_Intercept[1]  0.128  0.122  0.135   0.95 mean   qi       
## 2 b_Intercept[2]  0.220  0.212  0.228   0.95 mean   qi       
## 3 b_Intercept[3]  0.328  0.319  0.337   0.95 mean   qi       
## 4 b_Intercept[4]  0.562  0.552  0.571   0.95 mean   qi       
## 5 b_Intercept[5]  0.709  0.700  0.718   0.95 mean   qi       
## 6 b_Intercept[6]  0.854  0.847  0.861   0.95 mean   qi

exercise

Create a plot showing the posterior predictive distribution for the intercept-only model (m1). Can you visualize both the posterior and the observed data?


solution

predicted_draws(m1, newdata = data.frame(1)) %>% 
  count(.prediction) %>%
  mutate(pr_k = n / sum(n)) %>% 
  ggplot(aes(x = .prediction, y = pr_k)) +
  geom_col(fill = "#1c5253", alpha = 0.7) +
  geom_point(data = trolley_plot, aes(x = response), 
             color = "#3d405b", size = 2) +
  labs(x = "response",
       y = "probability",
       title = "Posterior Predictive Distribution") 


adding predictors

Now let’s add the features of the story – contact, action, and intention – as predictors in our model. Let’s expand our mathematical model. Here’s our original:

$$
\begin{aligned}
R_i &\sim \text{Categorical}(\mathbf{p}) \\
\text{logit}(p_k) &= \alpha_k - \phi_i \\
\phi_i &= 0 \\
\alpha_k &\sim \text{Normal}(0, 1.5)
\end{aligned}
$$

McElreath proposes the following causal formula:

$$
\phi_i = \beta_1 \text{action}_i + \beta_2 \text{contact}_i + \beta_3 \text{intention}_i
$$


m2 <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1 + action + contact + intention, 
  prior = c( prior(normal(0, 1.5), class = Intercept),
             prior(normal(0, 0.5), class = b)),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/models/62.2")
)

m2
##  Family: cumulative 
##   Links: mu = logit; disc = identity 
## Formula: response ~ 1 + action + contact + intention 
##    Data: trolley (Number of observations: 9930) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept[1]    -2.83      0.05    -2.92    -2.74 1.00     2960     2654
## Intercept[2]    -2.15      0.04    -2.23    -2.06 1.00     3324     2745
## Intercept[3]    -1.56      0.04    -1.64    -1.49 1.00     2997     2965
## Intercept[4]    -0.54      0.04    -0.61    -0.47 1.00     3180     2764
## Intercept[5]     0.12      0.04     0.05     0.20 1.00     3373     3358
## Intercept[6]     1.03      0.04     0.95     1.11 1.00     3586     3183
## action          -0.70      0.04    -0.78    -0.62 1.00     3744     3404
## contact         -0.95      0.05    -1.05    -0.85 1.00     3038     2764
## intention       -0.72      0.04    -0.79    -0.64 1.00     4178     3090
## 
## Further Distributional Parameters:
##      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## disc     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

gather_draws(m2, `b_.*`, regex=T) %>% 
  filter(!str_detect(.variable, "[0-9]")) %>% 
  mutate(.variable = str_remove(.variable, "b_")) %>% 
  ggplot( aes(x = .value, y = .variable) ) +
  geom_vline( aes(xintercept = 0), linetype = "dashed", alpha = .3 ) +
  stat_halfeye(fill = "#5e8485") +
  scale_x_continuous("marginal posterior", breaks = -5:0 / 4) +
  scale_y_discrete(NULL) 


nd <- 
  trolley %>% 
  distinct(action, contact, intention) %>% 
  mutate(combination = str_c(action, contact, intention, sep = "_"))

f <- predicted_draws( m2 , newdata = nd )

f
## # A tibble: 24,000 × 9
## # Groups:   action, contact, intention, combination, .row [6]
##    action contact intention combination  .row .chain .iteration .draw
##     <int>   <int>     <int> <chr>       <int>  <int>      <int> <int>
##  1      0       1         0 0_1_0           1     NA         NA     1
##  2      0       1         0 0_1_0           1     NA         NA     2
##  3      0       1         0 0_1_0           1     NA         NA     3
##  4      0       1         0 0_1_0           1     NA         NA     4
##  5      0       1         0 0_1_0           1     NA         NA     5
##  6      0       1         0 0_1_0           1     NA         NA     6
##  7      0       1         0 0_1_0           1     NA         NA     7
##  8      0       1         0 0_1_0           1     NA         NA     8
##  9      0       1         0 0_1_0           1     NA         NA     9
## 10      0       1         0 0_1_0           1     NA         NA    10
## # ℹ 23,990 more rows
## # ℹ 1 more variable: .prediction <fct>

f %>% 
  as.data.frame() %>% 
  mutate(
    across( c(action, contact, intention) , 
            as.character),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, color=intention) )+
  geom_point() +
  geom_line(aes(group=intention)) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count" )


exercise

Create a new model that models the interaction between intention and the other variables. Then:

  1. Compare the fits of these models using PSIS or WAIC.
  2. Plot the posterior predictive distribution. How does it differ from our previous model?

solution

m2_complex <- brm(
  data = trolley,
  family = cumulative, 
  response ~ 1 + action + contact + intention + 
    action:intention + contact:intention, 
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 0.5), class = b)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4,
  file=here("files/models/62.2com")
)

solution: compare

m2         <- add_criterion(m2,         "loo")
m2_complex <- add_criterion(m2_complex, "loo")

loo_compare(m2, m2_complex, criterion = "loo") %>% 
  print(simplify = F)
##            elpd_diff se_diff  elpd_loo se_elpd_loo p_loo    se_p_loo looic   
## m2_complex      0.0       0.0 -18464.7     40.4        11.0      0.1  36929.3
## m2            -80.3      12.5 -18545.0     37.9         9.0      0.0  37090.0
##            se_looic
## m2_complex     80.8
## m2             75.8
m2         <- add_criterion(m2,         "waic")
m2_complex <- add_criterion(m2_complex, "waic")

loo_compare(m2, m2_complex, criterion = "waic") %>% 
  print(simplify = F)
##            elpd_diff se_diff  elpd_waic se_elpd_waic p_waic   se_p_waic
## m2_complex      0.0       0.0 -18464.6      40.4         11.0      0.1 
## m2            -80.3      12.5 -18545.0      37.9          9.0      0.0 
##            waic     se_waic 
## m2_complex  36929.3     80.8
## m2          37090.0     75.8

solution: ppd

predicted_draws(object = m2_complex, newdata = nd) %>% 
  mutate(
    intention=as.character(intention),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, color=intention) )+
  geom_point() +
  geom_line(aes(group=intention)) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count" )


solution: ppd

p1 = predicted_draws(object = m2, newdata = nd) %>% 
  mutate(model = "simple")
p2 = predicted_draws(object = m2_complex, newdata = nd) %>% 
  mutate(model = "complex")

p1 %>% full_join(p2) %>% 
  mutate(
    intention=as.character(intention),
    action  = str_c("action = ",  action),
    contact = str_c("contact = ", contact)) %>% 
  count(model, action, contact, intention, .prediction) %>% 
  ggplot( aes(x=.prediction, y=n, 
              color=interaction(intention,model,
                                sep="-",lex.order=TRUE)) )+
  geom_point() +
  geom_line(aes(group=interaction(intention,model,sep="-",lex.order=TRUE))) +
  facet_grid(~action+contact) +
  labs( x="response", 
        y="count",
        color = "intention+model")
## Joining with `by = join_by(action, contact, intention, combination, .row,
## .chain, .iteration, .draw, .prediction, model)`


Ordered categorical predictors

distinct(trolley, edu)
##                    edu
## 1        Middle School
## 2    Bachelor's Degree
## 3         Some College
## 4      Master's Degree
## 5 High School Graduate
## 6      Graduate Degree
## 7     Some High School
## 8    Elementary School
trolley <-
  trolley %>% 
  mutate(edu_new = 
           recode(edu,
                  "Elementary School" = 1,
                  "Middle School" = 2,
                  "Some High School" = 3,
                  "High School Graduate" = 4,
                  "Some College" = 5, 
                  "Bachelor's Degree" = 6,
                  "Master's Degree" = 7,
                  "Graduate Degree" = 8) %>% 
           as.integer())

To incorporate this variable as a MONOTONIC CATEGORICAL PREDICTOR, we’ll set up a notation such that each step up in value comes with its own incremental or marginal effect on the outcome.

$$
\phi_i = \sum_{j=1}^{7} \delta_j
$$

We only need 7 because the first category is absorbed into the intercept for ϕ. So our full formula for this parameter is:

$$
\phi_i = \beta_1 \text{action}_i + \beta_2 \text{contact}_i + \beta_3 \text{intention}_i + \beta_E \sum_{j=0}^{E_i - 1} \delta_j
$$

where β_E is the effect of the maximum level of education: the δ_j sum to 1 (with δ_0 = 0), so lower levels receive only a fraction of β_E.
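To see what the δ_j do, here is a base-R sketch with a made-up simplex (the δ values and β_E below are illustrative, not estimates) showing how each education level accumulates a fraction of the full effect:

```r
# made-up simplex of 7 increments; a simplex sums to 1
delta <- c(0.05, 0.10, 0.20, 0.15, 0.25, 0.10, 0.15)
b_E   <- -0.1  # hypothetical maximum-education effect

# cumulative education effect for levels 1 (Elementary) through 8 (Graduate):
# level 1 contributes nothing (delta_0 = 0); level 8 receives the full b_E
edu_effect <- b_E * cumsum(c(0, delta))
round(edu_effect, 3)
```

This is exactly the construction `mo()` uses in brms: the simplex δ is estimated from the data, so the spacing between adjacent education levels need not be equal, only monotonic.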


$$
\begin{aligned}
\text{response}_i &\sim \text{Categorical}(\mathbf{p}) \\
\text{logit}(p_k) &= \alpha_k - \phi_i \\
\phi_i &= \beta_1 \text{action}_i + \beta_2 \text{contact}_i + \beta_3 \text{intention}_i + \beta_4 \, \text{mo}(\text{edu\_new}_i, \delta) \\
\alpha_k &\sim \text{Normal}(0, 1.5) \\
\beta_1, \ldots, \beta_3 &\sim \text{Normal}(0, 1) \\
\beta_4 &\sim \text{Normal}(0, 0.143) \\
\delta &\sim \text{Dirichlet}(2, 2, 2, 2, 2, 2, 2)
\end{aligned}
$$


m3 <- 
  brm(data = trolley, 
      family = cumulative,
      response ~ 1 + action + contact + intention + mo(edu_new),  # note the `mo()` syntax
      prior = c(prior(normal(0, 1.5), class = Intercept),
                prior(normal(0, 1), class = b),
                # note the new kinds of prior statements
                prior(normal(0, 0.143), class = b, coef = moedu_new),
                prior(dirichlet(2, 2, 2, 2, 2, 2, 2), class = simo, coef = moedu_new1)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4, 
      seed = 12,
      file = here("files/models/62.3"))

Warning: this will probably take 20 minutes or so to run on your laptop. You can download my model file here.


m3
##  Family: cumulative 
##   Links: mu = logit; disc = identity 
## Formula: response ~ 1 + action + contact + intention + mo(edu_new) 
##    Data: trolley (Number of observations: 9930) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept[1]    -3.13      0.17    -3.53    -2.84 1.00     1707     2012
## Intercept[2]    -2.45      0.17    -2.83    -2.16 1.00     1702     1934
## Intercept[3]    -1.87      0.17    -2.25    -1.58 1.00     1708     1804
## Intercept[4]    -0.85      0.17    -1.23    -0.56 1.00     1731     1956
## Intercept[5]    -0.18      0.17    -0.56     0.11 1.00     1718     1925
## Intercept[6]     0.73      0.17     0.35     1.01 1.00     1737     1988
## action          -0.71      0.04    -0.78    -0.63 1.00     4604     3182
## contact         -0.96      0.05    -1.06    -0.86 1.00     4521     2761
## intention       -0.72      0.04    -0.79    -0.65 1.00     4514     2992
## moedu_new       -0.05      0.03    -0.11    -0.00 1.00     1705     1866
## 
## Monotonic Simplex Parameters:
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## moedu_new1[1]     0.26      0.15     0.04     0.59 1.00     2292     3104
## moedu_new1[2]     0.14      0.09     0.02     0.35 1.00     4405     2501
## moedu_new1[3]     0.19      0.11     0.03     0.44 1.00     4005     2613
## moedu_new1[4]     0.16      0.09     0.03     0.39 1.00     3636     2564
## moedu_new1[5]     0.04      0.05     0.00     0.15 1.00     2606     1894
## moedu_new1[6]     0.09      0.06     0.01     0.25 1.00     3747     2484
## moedu_new1[7]     0.12      0.07     0.02     0.29 1.00     3829     2596
## 
## Further Distributional Parameters:
##      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## disc     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s see the effect of education after controlling for story parameters.

nd <- distinct(
  trolley, 
  action, contact, intention, edu_new
)
predicted_draws(m3, nd) %>% 
  ungroup() %>% 
  count(edu_new, .prediction) %>% 
  with_groups(edu_new, mutate, total=sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction),
         edu_new = as.factor(edu_new)) %>% 
  ggplot(aes(x = .prediction,  y = pk, color = edu_new)) +
  geom_point() +
  geom_line() +
  labs( x="response", 
        y="probability" ) +
  theme(legend.position = "top")


exercise

Create a new model that includes education as a regular categorical predictor (not monotonic) and compare it to m3. Your tasks:

  1. Fit the model using the original edu variable (not edu_new)
  2. Create a plot of the posterior predictive distribution from the new model.

Optional:

  3. Create a side-by-side visual comparison of the education effects.
  4. Interpret whether the monotonic constraint appears to improve the model fit.


solution: fit model

# Fit model with regular categorical education
m3_cat <- brm(
  data = trolley,
  family = cumulative,
  response ~ 1 + action + contact + intention + edu,
  prior = c(prior(normal(0, 1.5), class = Intercept),
            prior(normal(0, 1), class = b)),
  iter = 2000, warmup = 1000, cores = 4, chains = 4,
  file=here("files/models/62.3cat")
)

m3_cat
##  Family: cumulative 
##   Links: mu = logit; disc = identity 
## Formula: response ~ 1 + action + contact + intention + edu 
##    Data: trolley (Number of observations: 9930) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept[1]             -2.96      0.05    -3.06    -2.86 1.00     2945
## Intercept[2]             -2.28      0.05    -2.37    -2.18 1.00     3064
## Intercept[3]             -1.69      0.05    -1.78    -1.60 1.00     3071
## Intercept[4]             -0.66      0.04    -0.75    -0.58 1.00     3178
## Intercept[5]              0.01      0.04    -0.07     0.10 1.00     3191
## Intercept[6]              0.92      0.05     0.83     1.01 1.00     3731
## action                   -0.71      0.04    -0.79    -0.63 1.00     4290
## contact                  -0.96      0.05    -1.07    -0.87 1.00     3938
## intention                -0.72      0.04    -0.79    -0.65 1.00     4548
## eduElementarySchool       0.91      0.20     0.51     1.30 1.00     4417
## eduGraduateDegree        -0.17      0.06    -0.30    -0.06 1.00     3902
## eduHighSchoolGraduate    -0.06      0.07    -0.19     0.07 1.00     4060
## eduMastersDegree         -0.11      0.06    -0.22    -0.00 1.00     3674
## eduMiddleSchool          -0.25      0.15    -0.54     0.04 1.00     4061
## eduSomeCollege           -0.31      0.05    -0.40    -0.22 1.00     3476
## eduSomeHighSchool         0.13      0.09    -0.05     0.31 1.00     4017
##                       Tail_ESS
## Intercept[1]              3032
## Intercept[2]              3242
## Intercept[3]              3462
## Intercept[4]              3403
## Intercept[5]              3188
## Intercept[6]              3322
## action                    3188
## contact                   3308
## intention                 3197
## eduElementarySchool       2639
## eduGraduateDegree         2979
## eduHighSchoolGraduate     2920
## eduMastersDegree          3003
## eduMiddleSchool           2889
## eduSomeCollege            3245
## eduSomeHighSchool         3187
## 
## Further Distributional Parameters:
##      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## disc     1.00      0.00     1.00     1.00   NA       NA       NA
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

solution: ppd

nd <- distinct(
  trolley, 
  action, contact, intention, edu
)
predicted_draws(m3_cat, nd) %>% 
  ungroup() %>% 
  count(edu, .prediction) %>% 
  mutate(
    edu = factor(edu,
                 levels = c(
                  "Elementary School",
                  "Middle School",
                  "Some High School",
                  "High School Graduate",
                  "Some College" ,
                  "Bachelor's Degree",
                  "Master's Degree",
                  "Graduate Degree"))) %>% 
  with_groups(edu, mutate, total=sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction)) %>% 
  ggplot(aes(x = .prediction,  y = pk, color = edu)) +
  geom_point() +
  geom_line() +
  labs( x="response", 
        y="probability" ) +
  theme(legend.position = "top")


solution: side-by-side

p1 <- predicted_draws(m3, newdata = distinct(trolley, action, contact, intention, edu_new)) %>%
  mutate(model = "ordered",
         edu = factor(edu_new, 
                      levels = c(1:8),
                      labels = c(
                  "Elementary School",
                  "Middle School",
                  "Some High School",
                  "High School Graduate",
                  "Some College" ,
                  "Bachelor's Degree",
                  "Master's Degree",
                  "Graduate Degree")))
p2 <- predicted_draws(m3_cat, newdata = distinct(trolley, action, contact, intention, edu)) %>%
  mutate(model = "categorical")

full_join(p1, p2) %>% 
  ungroup() %>% 
  count(model, edu, .prediction) %>% 
  with_groups(c(model,edu), mutate, total = sum(n)) %>% 
  mutate(pk = n/total,
         .prediction = as.numeric(.prediction)) %>% 
  ggplot(aes(x = .prediction, y = pk, color = edu)) +
  geom_point(aes(shape = model)) +
  geom_line(aes(linetype = model)) +
  labs(x = "response", y = "probability") +
  guides(color = "none") + 
  facet_wrap(~edu) + 
  theme(legend.position = "top")
## Joining with `by = join_by(action, contact, intention, .row, .chain,
## .iteration, .draw, .prediction, model, edu)`


solution: compare fit

m3     <- add_criterion(m3,     "loo")
m3_cat <- add_criterion(m3_cat, "loo")

loo_compare(m3, m3_cat, criterion = "loo") %>% 
  print(simplify = F)
##        elpd_diff se_diff  elpd_loo se_elpd_loo p_loo    se_p_loo looic   
## m3_cat      0.0       0.0 -18510.5     38.9        15.3      0.2  37021.0
## m3        -30.2       7.7 -18540.7     38.1        11.2      0.1  37081.4
##        se_looic
## m3_cat     77.7
## m3         76.2
m3     <- add_criterion(m3,     "waic")
m3_cat <- add_criterion(m3_cat, "waic")

loo_compare(m3, m3_cat, criterion = "waic") %>% 
  print(simplify = F)
##        elpd_diff se_diff  elpd_waic se_elpd_waic p_waic   se_p_waic waic    
## m3_cat      0.0       0.0 -18510.5      38.9         15.2      0.2   37021.0
## m3        -30.2       7.7 -18540.7      38.1         11.2      0.1   37081.4
##        se_waic 
## m3_cat     77.7
## m3         76.2

  1. Actual counts communicate meaningful and consistent distance between possible values.↩︎