library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) 
library(tidybayes) 
library(santoku) # for data manipulation

today


counts

It’s less common in Psychology to use count variables as outcomes, but they’re extremely useful. They have an interpretable metric, and they’re usually in units that are meaningful (and possibly important).

This family of distributions with maximum entropy that matches the expectations of a count variable – non-negative, integers, with no maximum (different from binomial) – is the poisson family. These distributions are defined by a single parameter \((\lambda)\).

As a reminder, the mean and the variance of the Poisson distribution is equal to \(\lambda\); use that knowledge to interpret your estimates and set your priors.

The conventional link function for the poisson is the log link, which ensures that \(\lambda\) is always positive.


visualizing priors

Our brms code will use a log-link function to estimate \(\lambda\) from a linear model:

log(lambda) = a + b*var

Your priors for a and b will likely follow a typical Gaussian distribution.

a ~ Normal(0,1)

But the transformation is an exponential one, meaning that the relationship between your linear coefficients and resulting \(\lambda\) are difficult to predict.


tibble(x       = c(3, 22),
       y       = c(0.055, 0.04),
       meanlog = c(0, 3),
       sdlog   = c(10, 0.5)) %>% 
  expand_grid(number = seq(from = 0, to = 100, length.out = 200)) %>% 
  mutate(density = dlnorm(number, meanlog, sdlog),
         group   = str_c("alpha%~%Normal(", meanlog, ", ", sdlog, ")")) %>% 
  
  ggplot(aes(fill = group, color = group)) +
  geom_area(aes(x = number, y = density),
            alpha = 3/4, linewidth = 0, position = "identity") +
  geom_text(data = . %>% group_by(group) %>% slice(1),
            aes(x = x, y = y, label = group),
            family = "Times", parse = T,  hjust = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("counts") +
  theme(legend.position = "none")


Here’s some code to play with to help you visualize the effects of your prior.

# ---- change these ----
mean_of_prior = 1
sd_of_prior   = .5

# --- leave this alone ---

  # plot
  data.frame(x =seq(from = 0, to = 100, length.out = 200)) %>% 
    mutate(dx = dlnorm(x, mean_of_prior, sd_of_prior)) %>% 
    ggplot(aes( x=x, y=dx)) +
    geom_area(alpha = 3/4, linewidth = 0, position = "identity")

  # expected value of lambda
  rlnorm(1e5, mean_of_prior, sd_of_prior) %>% mean
## [1] 3.079656

populations and tools

data(Kline, package = "rethinking")
Kline <- Kline
Kline
##       culture population contact total_tools mean_TU
## 1    Malekula       1100     low          13     3.2
## 2     Tikopia       1500     low          22     4.7
## 3  Santa Cruz       3600     low          24     4.0
## 4         Yap       4791    high          43     5.0
## 5    Lau Fiji       7400    high          33     5.0
## 6   Trobriand       8000    high          19     4.0
## 7       Chuuk       9200    high          40     3.8
## 8       Manus      13000     low          28     6.6
## 9       Tonga      17500    high          55     5.4
## 10     Hawaii     275000     low          71     6.6

We’ll model the idea that:

  1. The number of tools increases with the log population size. Why log? Because that’s what the theory says: that it is the order of magnitude of the population that matters, not the absolute size of it.

  2. The number of tools increases with the contact rate among islands. No nation is an island, even when it is an island. Islands that are better networked may acquire or sustain more tool types.

  3. The impact of population on tool counts is moderated by high contact. This is to say that the association between total_tools and log population depends upon contact.


Intercept only model

m1 <- brm(
  data = Kline, 
  family = poisson,
  total_tools ~ 1, 
  prior = c( prior( normal(3, 0.5), class = Intercept) ),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, 
  seed = 11,
  file = here("files/models/61.1")
)

Interaction model

Kline = Kline %>% 
  mutate(
    P = log(population)
  )
m2 <- brm(
  data = Kline, 
  family = poisson,
  bf(total_tools ~ a + b*P, 
     a + b ~ 0 + contact,
     nl = TRUE), 
  prior = c( prior( normal(3, 0.5), nlpar = a),
             prior( normal(0, 0.2), nlpar = b)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, 
  seed = 11,
  file = here("files/models/61.2")
)

m1
##  Family: poisson 
##   Links: mu = log 
## Formula: total_tools ~ 1 
##    Data: Kline (Number of observations: 10) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.54      0.05     3.44     3.64 1.00     1716     1970
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

m2
##  Family: poisson 
##   Links: mu = log 
## Formula: total_tools ~ a + b * P 
##          a ~ 0 + contact
##          b ~ 0 + contact
##    Data: Kline (Number of observations: 10) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a_contacthigh     2.84      0.47     1.92     3.74 1.00     1378     1405
## a_contactlow      1.69      0.29     1.13     2.25 1.00     1439     1323
## b_contacthigh     0.09      0.05    -0.01     0.19 1.00     1399     1489
## b_contactlow      0.19      0.03     0.14     0.25 1.00     1444     1295
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

What do these mean?

Once we’ve moved outside of the Gaussian distribution, your best bet is to push everything back through the posterior. Do NOT try and evaluate the model estimates.

nd <- data.frame(P = 1) # intercept only model

epred_draws(object = m1, newdata = nd) %>% 
  median_qi
## # A tibble: 1 × 8
##       P  .row .epred .lower .upper .width .point .interval
##   <dbl> <int>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1     1     1   34.6   31.1   38.2   0.95 median qi

nd <-
  distinct(Kline, contact) %>% 
  expand_grid(P = seq(from = min(Kline$P), 
                            to=max(Kline$P), 
                            length.out = 100))

f2 <- epred_draws(object = m2, newdata = nd) %>% 
  group_by(contact, P) %>% 
  median_qi

f2
## # A tibble: 200 × 8
##    contact     P .epred .lower .upper .width .point .interval
##    <fct>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
##  1 high     7.00   31.7   24.9   40.2   0.95 median qi       
##  2 high     7.06   31.9   25.1   40.3   0.95 median qi       
##  3 high     7.11   32.0   25.4   40.3   0.95 median qi       
##  4 high     7.17   32.2   25.6   40.3   0.95 median qi       
##  5 high     7.23   32.4   25.8   40.3   0.95 median qi       
##  6 high     7.28   32.5   26.1   40.4   0.95 median qi       
##  7 high     7.34   32.7   26.3   40.4   0.95 median qi       
##  8 high     7.39   32.8   26.6   40.4   0.95 median qi       
##  9 high     7.45   33.0   26.9   40.5   0.95 median qi       
## 10 high     7.51   33.2   27.1   40.5   0.95 median qi       
## # ℹ 190 more rows

f2 %>% 
  ggplot(aes(x = P)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = contact), 
              alpha = .3) +
  geom_smooth(aes(y = .epred, color = contact)) +
  geom_point(data = Kline, aes(y = total_tools, color = contact)) +
  labs(x = "log population", y = "number of tools") 
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'


compare

m1  <- add_criterion(m1, "loo")
m2 <- add_criterion(m2, "loo")

loo_compare(m1, m2, criterion = "loo") %>% print(simplify = F)
##    elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
## m2   0.0       0.0   -45.7      7.2         8.2   3.4     91.3  14.3   
## m1 -24.9      14.0   -70.6     16.9         7.9   3.5    141.2  33.7
loo(m2) %>% loo::pareto_k_table()
## Pareto k diagnostic values:
##                          Count Pct.    Min. ESS
## (-Inf, 0.7]   (good)     9     90.0%   232     
##    (0.7, 1]   (bad)      0      0.0%   <NA>    
##    (1, Inf)   (very bad) 1     10.0%   <NA>

m2k = m2$criteria$loo$pointwise %>% 
  as.data.frame() %>% 
  mutate(culture = Kline$culture) %>% 
  arrange(desc(influence_pareto_k)) %>% 
  mutate_if(is.double, round, digits = 2)
m2k
##    elpd_loo mcse_elpd_loo p_loo looic influence_pareto_k    culture
## 1     -7.12          0.49  3.27 14.24               1.27     Hawaii
## 2     -6.44          0.06  1.60 12.87               0.56      Tonga
## 3     -9.23          0.05  1.82 18.45               0.43  Trobriand
## 4     -3.28          0.01  0.20  6.56               0.29      Manus
## 5     -4.51          0.04  0.77  9.02               0.29   Malekula
## 6     -3.71          0.02  0.30  7.42               0.26        Yap
## 7     -3.12          0.01  0.12  6.23               0.23   Lau Fiji
## 8     -2.74          0.01  0.06  5.47               0.17 Santa Cruz
## 9     -2.93          0.01  0.04  5.86               0.16      Chuuk
## 10    -2.59          0.00  0.03  5.19               0.13    Tikopia

Adding these to our plot:

m2k = m2k %>% full_join(Kline)
## Joining with `by = join_by(culture)`
library(ggrepel)
f2 %>% 
  ggplot(aes(x = P)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = contact), 
              alpha = .3) +
  geom_smooth(aes(y = .epred, color = contact)) +
  geom_point(data = m2k, 
             aes(y = total_tools, 
                 size = influence_pareto_k,
                 color = contact)) +
  geom_text_repel(data = m2k, 
             aes(y = total_tools,
                 label = culture)) +
  guides(size = "none") +
  labs(x = "log population", y = "number of tools") +
  theme(legend.position = "top")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'


Natural scale

m2k = m2k %>% full_join(Kline)
## Joining with `by = join_by(culture, population, contact, total_tools, mean_TU,
## P)`
f2 %>% 
  ggplot(aes(x = P)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = contact), 
              alpha = .3) +
  geom_smooth(aes(y = .epred, color = contact)) +
  geom_point(data = m2k, 
             aes(y = total_tools, 
                 size = influence_pareto_k,
                 color = contact)) +
  geom_text_repel(data = m2k, 
             aes(y = total_tools,
                 label = culture)) +
  scale_x_continuous(trans = "exp",
                     breaks = log(c(0, 50000, 150000, 250000)),
                     labels = c(0, 50000, 150000, 250000)) +
  guides(size = "none") +
  labs(x = "population", y = "number of tools") +
  theme(legend.position = "top")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'


multinomial models

The binomial distribution is appropriate when you only have two outcomes, but what if you have multiple unordered outcomes? In that case, you should use the multinomial distribution, of which the binomial is just a special case.If there are \(K\) types of events with probabilities \(p_1,...,p_K\), then the probability of observing \(y_1,...,y_K\) events of each type out of \(n\) total trials is:

\[ \text{Pr}(y_i,...,y_K| n, p_i,...,p_K) = \frac{n!}{\prod_i y_i!}\prod_{i=1}^Kp_i^{y_i} \]

The link function used with the multinomial is the multinomial logit, which is also called the softmax function:

\[ \text{Pr}(k|s_i,...,s_K) = \frac{\text{exp}(s_k)}{\sum_{i=1}^K\text{exp}(s_i)} \]

The biggest issue is what to do with the multiple linear models. In a binomial GLM, you can pick either of the two possible events and build a single linear model for its log odds. The other event is handled automatically. But in a multinomial GLM, you need \(K − 1\) linear models for K types of events. One of the outcome values is chosen as a “pivot” and the others are modeled relative to it. In each of the \(K − 1\) linear models, you can use any predictors and parameters you like—they don’t have to be the same, and there are often good reasons for them to be different. In the special case of two types of events, none of these choices arise, because there is only one linear model. And that’s why the binomial GLM is so much easier.


There are two basic cases:

  1. predictors have different values for different values of the outcome, and

  2. parameters are distinct for each value of the outcome.

The first case is useful when each type of event has its own quantitative traits, and you want to estimate the association between those traits and the probability each type of event appears in the data.

The second case is useful when you are interested instead in features of some entity that produces each event, whatever type it turns out to be.


example: predictors matched to outcomes

You are modeling career choice for young adults. One predictor of choice is income.

# simulate career choices among 500 individuals
N <- 500 # number of individuals
income <- c(1,2,5) # expected income of each career
score <- 0.5*income # scores for each career, based on income

# next line converts scores to probabilities
p <- rethinking:::softmax(score[1],score[2],score[3])

# now simulate choice
# outcome career holds event type values, not counts
career <- rep(NA,N) # empty vector of choices for each individual
# sample chosen career for each individual

set.seed(34302)
for ( i in 1:N ) career[i] <- sample( 1:3 , size=1 , prob=p )

Before moving on, it might be useful to examine what we just did. With the three lines below the “# simulate career choices among 500 individuals” comment, we defined the formulas for three scores. Those were

\[ s_1 = .5 \times \text{income}_1\\ s_3 = .5 \times \text{income}_2\\ s_1 = .5 \times \text{income}_3\\ \] where \(\text{income}_1 = 1\), \(\text{income}_2 = 2\), \(\text{income}_3 = 5\). What’s a little odd about this setup and conceptually important to get is that although \(\text{income}_i\) varies across the three levels of \(s\), the
\(\text{income}_i\) value is constant within each level of \(s\). E.g., \(\text{income}_1\) is not a variable within the context of \(s_1\). Therefore, we could also write the above as:

\[ s_1 = .5 \times 1 = 0.5\\ s_3 = .5 \times 2 = 1.0\\ s_1 = .5 \times 5 = 2.5 \]


# put them in a tibble
d <-
  tibble(career = career) %>% 
  mutate(career_income = ifelse(career == 3, 5, career))

# plot 
d %>%
  ggplot(aes(x = career)) +
  geom_bar(linewidth = 0)


We have to choose a reference category. For this example, we’ll choose career 3. Let’s fit an intercept-only model to start. The next question is: what are our priors?

get_prior(data = d, 
          family = categorical(link = logit, refcat = 3),
          career ~ 1)
##                 prior     class coef group resp dpar nlpar lb ub  source
##  student_t(3, 0, 2.5) Intercept                  mu1             default
##  student_t(3, 0, 2.5) Intercept                  mu2             default

m3 <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      career ~ 1,
      prior = c(prior(normal(0, 1), class = Intercept, dpar = mu1),
                prior(normal(0, 1), class = Intercept, dpar = mu2)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      file = here("files/models/61.3"))

posterior_summary(m3) %>% round(2)
##                 Estimate Est.Error    Q2.5   Q97.5
## b_mu1_Intercept    -2.01      0.16   -2.33   -1.71
## b_mu2_Intercept    -1.53      0.12   -1.77   -1.29
## Intercept_mu1      -2.01      0.16   -2.33   -1.71
## Intercept_mu2      -1.53      0.12   -1.77   -1.29
## lprior             -5.04      0.39   -5.83   -4.33
## lp__             -373.67      1.04 -376.56 -372.65

Turns out, these are related to our original data generating values!

tibble(income = c(1, 2, 5)) %>% 
  mutate(score = 0.5 * income) %>% 
  mutate(rescaled_score = score - 2.5)
## # A tibble: 3 × 3
##   income score rescaled_score
##    <dbl> <dbl>          <dbl>
## 1      1   0.5           -2  
## 2      2   1             -1.5
## 3      5   2.5            0

data.frame(id = 1) %>% 
  add_epred_draws(m3) %>% 
  median_qi()
## # A tibble: 3 × 9
##      id  .row .category .epred .lower .upper .width .point .interval
##   <dbl> <int> <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1     1     1 1         0.0993 0.0744  0.129   0.95 median qi       
## 2     1     1 2         0.161  0.131   0.195   0.95 median qi       
## 3     1     1 3         0.739  0.700   0.775   0.95 median qi
data.frame(id = 1) %>% 
  add_predicted_draws(m3)  %>% 
  count(.prediction)
## # A tibble: 3 × 4
## # Groups:   id, .row [1]
##      id  .row .prediction     n
##   <dbl> <int> <fct>       <int>
## 1     1     1 1             376
## 2     1     1 2             662
## 3     1     1 3            2962

Before we move on, here’s the exact same model with non-linear syntax (this will make more sense later)

# nonlinear syntax
m3_nonlinear <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      bf(career ~ 1,
         nlf(mu1 ~ a1),
         nlf(mu2 ~ a2),
         a1 + a2 ~ 1),
      prior = c(prior(normal(0, 1), class = b, nlpar = a1),
                prior(normal(0, 1), class = b, nlpar = a2)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      file = here("files/models/61.3nl"))

m3_nonlinear
##  Family: categorical 
##   Links: mu1 = logit; mu2 = logit 
## Formula: career ~ 1 
##          mu1 ~ a1
##          mu2 ~ a2
##          a1 ~ 1
##          a2 ~ 1
##    Data: d (Number of observations: 500) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a1_Intercept    -2.01      0.16    -2.33    -1.71 1.00     3390     2673
## a2_Intercept    -1.53      0.12    -1.77    -1.29 1.00     2964     2810
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now we can add income. Remember, in this model, the predictor is matched to the outcome (that is, the value is the same for all observations with that outcome). So rather than using a variable from our dataset, we’ll just input the value directly into our model.

m4 <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      bf(career ~ 1,
         nlf(mu1 ~ a1 + b * 1),
         nlf(mu2 ~ a2 + b * 2),
         a1 + a2 + b ~ 1),
      prior = c(prior(normal(0, 1),   class = b, nlpar = a1),
                prior(normal(0, 1),   class = b, nlpar = a2),
                prior(normal(0, 0.5), class = b, nlpar = b,  lb = 0)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      control = list(adapt_delta = .99),
      file = here("files/models/61.4"))

m4
##  Family: categorical 
##   Links: mu1 = logit; mu2 = logit 
## Formula: career ~ 1 
##          mu1 ~ a1 + b * 1
##          mu2 ~ a2 + b * 2
##          a1 ~ 1
##          a2 ~ 1
##          b ~ 1
##    Data: d (Number of observations: 500) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a1_Intercept    -2.15      0.19    -2.59    -1.81 1.00      777     1018
## a2_Intercept    -1.81      0.28    -2.52    -1.41 1.01      750      706
## b_Intercept      0.15      0.13     0.01     0.49 1.01      605      760
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

To interpret this coefficient, we can use counterfactual simulation. The question is, “How much more likely would someone be to choose Career 2 over Career 1 if the salary of Career 2 were twice what it is now?”

post = as_draws_df(m4) 
s1      = post$b_a1_Intercept + post$b_b_Intercept * 1
s2_orig = post$b_a2_Intercept + post$b_b_Intercept * 2
s2_new  = post$b_a2_Intercept + post$b_b_Intercept * 2 * 2

# use the softmax link function to create probs
p_orig = sapply(1:nrow(post), 
                function(i) rethinking:::softmax( s1[i], s2_orig[i], 0 ))
p_new = sapply(1:nrow(post), 
                function(i) rethinking:::softmax( s1[i], s2_new[i],  0 ))

p_diff = p_new[2, ] - p_orig[2, ]
rethinking::precis(p_diff)
##              mean         sd        5.5%     94.5% histogram
## p_diff 0.04624929 0.04570784 0.003072838 0.1335661   ▇▂▁▁▁▁▁

example: predictors matched to observations

Suppose you are still modeling career choice. But now you want to estimate the association between each person’s family income and which career they choose. So the predictor variable must have the same value in each linear model, for each row in the data. But now there is a unique parameter multiplying it in each linear model. This provides an estimate of the impact of family income on choice, for each type of career.

n <- 500
set.seed(11)

# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b      <- c(-2, 0, 2)
career <- rep(NA, n)  # empty vector of choices for each individual
for (i in 1:n) {
    score     <- 0.5 * (1:3) + b * family_income[i]
    p         <- rethinking:::softmax(score[1], score[2], score[3])
    career[i] <- sample(1:3, size = 1, prob = p)
}

d <-
  tibble(career = career) %>% 
  mutate(family_income = family_income)

\[\begin{align*} s_1 &= 0.5 + -2\times\text{family_income}_i \\ s_2 &= 1.0 + 0\times\text{family_income}_i \\ s_3 &= 1.5 + 2\times\text{family_income}_i \\ \end{align*}\]


p1 <-
  d %>% 
  mutate(career = as.factor(career)) %>% 
  
  ggplot(aes(x = family_income, fill = career)) +
  geom_density(linewidth = 0, alpha = 3/4) +
  theme(legend.position = "none")
  
p2 <-
  d %>% 
  mutate(career = as.factor(career)) %>%
  
  mutate(fi = santoku::chop_width(family_income, width = .1, start = 0, labels = 1:10)) %>% 
  count(fi, career) %>% 
  group_by(fi) %>% 
  mutate(proportion = n / sum(n)) %>% 
  mutate(f = as.double(fi)) %>% 
  
  ggplot(aes(x = (f - 1) / 9, y = proportion, fill = career)) +
  geom_area() +
  xlab("family_income, descritized")

p1 + p2


To model, we only need to adapt our code for model 4:

m5 <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      bf(career ~ 1,
         nlf(mu1 ~ a1 + b1 * family_income),
         nlf(mu2 ~ a2 + b2 * family_income),
         a1 + a2 + b1 + b2 ~ 1),
      prior = c(prior(normal(0, 1),   class = b, nlpar = a1),
                prior(normal(0, 1),   class = b, nlpar = a2),
                prior(normal(0, 0.5), class = b, nlpar = b1,  lb = 0), 
                prior(normal(0, 0.5), class = b, nlpar = b2,  lb = 0)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      control = list(adapt_delta = .99),
      file = here("files/models/61.5"))

m5
##  Family: categorical 
##   Links: mu1 = logit; mu2 = logit 
## Formula: career ~ 1 
##          mu1 ~ a1 + b1 * family_income
##          mu2 ~ a2 + b2 * family_income
##          a1 ~ 1
##          a2 ~ 1
##          b1 ~ 1
##          b2 ~ 1
##    Data: d (Number of observations: 500) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a1_Intercept    -2.33      0.18    -2.69    -1.99 1.00     2434     2318
## a2_Intercept    -1.61      0.13    -1.88    -1.36 1.00     1916     1852
## b1_Intercept     0.10      0.10     0.00     0.37 1.00     2019     1400
## b2_Intercept     0.11      0.10     0.00     0.38 1.00     1394     1177
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s get those posterior predictions.

nd <- tibble(family_income = seq(from = 0, to = 1, length.out = 60))

(pred = nd %>% add_epred_draws(m5))
## # A tibble: 720,000 × 7
## # Groups:   family_income, .row, .category [180]
##    family_income  .row .chain .iteration .draw .category .epred
##            <dbl> <int>  <int>      <int> <int> <fct>      <dbl>
##  1             0     1     NA         NA     1 1         0.0782
##  2             0     1     NA         NA     2 1         0.0658
##  3             0     1     NA         NA     3 1         0.0800
##  4             0     1     NA         NA     4 1         0.0725
##  5             0     1     NA         NA     5 1         0.0862
##  6             0     1     NA         NA     6 1         0.0843
##  7             0     1     NA         NA     7 1         0.0758
##  8             0     1     NA         NA     8 1         0.0980
##  9             0     1     NA         NA     9 1         0.0995
## 10             0     1     NA         NA    10 1         0.0564
## # ℹ 719,990 more rows

pred %>% 
  mean_qi() %>% 
  ggplot( aes( x=family_income, y=.epred ) ) + 
  geom_ribbon( aes(ymin = .lower, ymax=.upper, fill = .category),
               alpha = .3) +
  geom_line( aes(color = .category) ) +
  facet_wrap(~.category, nrow=1) +
  guides(color = "none", fill= "none") + 
  labs( x="family income", y="probability of career choice" )