Measurement
Measurement is the principled assignment of numbers to qualities. And no matter what you measure or how you measure it, you’ll have some error.
Some tools tend to produce very little error (e.g., measuring the length of this table in inches). Other tools tend to produce more. In the social sciences, our job is made harder by the fact that we often measure qualities that do not have an objective physical reality, like happiness or well-being. Measurement error is exacerbated when there is little available data.
Unfortunately, our statistical models will assume your measure has no error… unless you can tell the model how much error there is.
In measurement theory, we may assume that
\[ X_i = T_i + \epsilon_i \]
Meaning that for any observation \(i\), the observed score \(X\) on some measure is the sum of the true score \(T\) and error \(\epsilon\). Classical test theory assumes that \(\epsilon_i\) is randomly distributed, but other theories (IRT) disagree. Regardless, we can move forward with the assumption that observed scores are caused by some true score and some error.
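A quick simulation makes the decomposition concrete. This is a base-R sketch; the numbers (true-score mean 100, sd 15, error sd 5) are made up for illustration:

```r
# Classical test theory in miniature: observed score = true score + noise.
set.seed(15)

n     <- 1e4
true  <- rnorm(n, mean = 100, sd = 15)  # latent true scores T_i
error <- rnorm(n, mean = 0,   sd = 5)   # random error e_i
obs   <- true + error                   # observed scores X_i

# With independent error, variances add: var(X) = var(T) + var(e),
# so observed scores are noisier than the truth.
var(obs)        # close to 15^2 + 5^2 = 250
cor(obs, true)  # high, but less than 1 because of the error
```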
                        mean         sd     5.5%      94.5%  histogram
Location                 NaN         NA       NA         NA
Loc                      NaN         NA       NA         NA
Population              6.12       6.88     0.66      18.98  ▇▃▁▁▁▁▁▁
MedianAgeMarriage      26.05       1.24    24.27      28.26  ▁▁▂▂▃▇▅▃▁▁▂▁▁▁
Marriage               20.11       3.80    15.21      26.49  ▁▃▇▇▇▅▂▁▁▁
Marriage.SE             1.40       0.80     0.55       2.90  ▁▇▅▃▁▂▁▁
Divorce                 9.69       1.82     6.67      12.73  ▂▃▅▅▇▂▃▁
Divorce.SE              0.96       0.53     0.34       1.89  ▂▇▇▃▃▂▁▂▂▁▁▁
WaffleHouses           32.34      65.79     0.00     135.75  ▇▁▁▁▁▁▁▁
South                   0.28       0.45     0.00       1.00  ▇▁▁▁▁▁▁▁▁▂
Slaves1860          79378.34  149730.90     0.00  435553.10  ▇▁▁▁▁▁▁▁▁▁
Population1860     628729.30  781312.70     0.00 1903357.00  ▇▃▂▁▁▁▁▁
PropSlaves1860          0.09       0.17     0.00       0.46  ▇▁▁▁▁▁▁▁▁▁▁▁
dag_coords <-
tibble(name = c("A", "M", "D", "Dobs", "eD"),
x = c(1, 2, 2, 3, 4),
y = c(2, 3, 1, 1, 1))
dagify(M ~ A,
D ~ A + M,
Dobs ~ D + eD,
coords = dag_coords) %>%
tidy_dagitty() %>%
mutate(color = ifelse(name %in% c("D", "eD"), "a", "b")) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(aes(color = color),
size = 10, show.legend = F) +
geom_dag_text(parse = T, label = c("A", "D", expression(D[obs]),
"M",expression(italic(e)[D]))) +
geom_dag_edges() +
theme_void()

p1 <-
d %>%
ggplot(aes(x = MedianAgeMarriage,
y = Divorce,
ymin = Divorce - Divorce.SE,
ymax = Divorce + Divorce.SE)) +
geom_pointrange(shape = 20, alpha = 2/3, color="#1c5253") +
labs(x = "Median age marriage" ,
y = "Divorce rate")
p2 <-
d %>%
ggplot(aes(x = log(Population),
y = Divorce,
ymin = Divorce - Divorce.SE,
ymax = Divorce + Divorce.SE)) +
geom_pointrange(shape = 20, alpha = 2/3, color="#e07a5f") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("log population")
p1 | p2

How do you incorporate the standard error into the model? First, we’ll think through this logically, then we’ll use the code.
Measurement error refers to the amount of variability we would expect to see in a statistic across repeated studies. In other words, the error describes the spread of a distribution. What’s unknown in this case is the mean of that distribution: the true value \(D_i\).
\[ D_{\text{obs},i} \sim \text{Normal}(D_i, D_{\text{SE},i}) \]
Well hey, isn’t that the kind of parameter that Bayesian analysis has been trying to estimate all along?
\[\begin{align} D_{\text{obs},i} &\sim \text{Normal}(D_i, D_{\text{SE},i}) \\ D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_AA_i + \beta_MM_i \\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_A &\sim \text{Normal}(0, 0.5) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]
For the sake of easier priors, we’ll standardize all our variables first. Note that standardizing the standard errors means keeping them in the same unit as the variable they refer to.
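That step might look like the following. This is a sketch: it assumes the `d` data frame summarized above, and uses the `D_obs`, `D_sd`, `M_obs`, `M_sd`, and `A` names that appear in the model code.

```r
# Standardize each variable, and put each standard error on the same
# standardized scale by dividing by the sd of its raw variable.
d <-
  d %>%
  mutate(D_obs = (Divorce - mean(Divorce)) / sd(Divorce),
         D_sd  = Divorce.SE / sd(Divorce),
         M_obs = (Marriage - mean(Marriage)) / sd(Marriage),
         M_sd  = Marriage.SE / sd(Marriage),
         A     = (MedianAgeMarriage - mean(MedianAgeMarriage)) / sd(MedianAgeMarriage))
```

Dividing `Divorce.SE` by `sd(Divorce)` is what keeps the error in the same (now standardized) units as `D_obs`.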
m0 <-
brm(data = d,
family = gaussian,
D_obs ~ 1 + A + M,
prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, cores = 4, chains = 4,
seed = 15,
# note this line
save_pars = save_pars(latent = TRUE),
file = here("files/models/m92.0"))

m1 <-
brm(data = d,
family = gaussian,
D_obs | mi(D_sd) ~ 1 + A + M,
prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, cores = 4, chains = 4,
seed = 15,
# note this line
save_pars = save_pars(latent = TRUE),
file = here("files/models/m92.1"))

 Family: gaussian
Links: mu = identity; sigma = identity
Formula: D_obs | mi(D_sd) ~ 1 + A + M
Data: d (Number of observations: 50)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.05 0.10 -0.24 0.14 1.00 4597 3090
A -0.62 0.16 -0.92 -0.30 1.00 3829 3423
M 0.05 0.17 -0.27 0.38 1.00 3671 3297
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.58 0.11 0.39 0.81 1.00 1682 1720
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Estimate Est.Error Q2.5 Q97.5
b_Intercept -0.05287903 0.09560518 -0.23886745 0.13726794
b_A -0.61700465 0.15906799 -0.92245919 -0.29941922
b_M 0.05250207 0.16596210 -0.27093761 0.38138566
sigma 0.58407876 0.10676964 0.38881281 0.81491478
Intercept -0.05287903 0.09560518 -0.23886745 0.13726794
Yl[1] 1.16380958 0.36399705 0.45678248 1.88019251
Yl[2] 0.68798528 0.55545971 -0.40532881 1.78866913
Yl[3] 0.42646352 0.33696468 -0.22714592 1.07800780
Yl[4] 1.41990481 0.45735398 0.51982522 2.33835163
Yl[5] -0.90062131 0.12857833 -1.15488620 -0.64688189
Yl[6] 0.65994433 0.39848512 -0.10032271 1.45295620
Yl[7] -1.36526810 0.35504558 -2.05486652 -0.66823571
Yl[8] -0.33840244 0.47247230 -1.25609775 0.58214114
Yl[9] -1.90150049 0.58817751 -3.05336435 -0.72470074
Yl[10] -0.62132190 0.17186028 -0.95587821 -0.27755085
Yl[11] 0.76341246 0.28108009 0.20629632 1.31262617
Yl[12] -0.55322097 0.48671358 -1.53032059 0.41006406
Yl[13] 0.18014145 0.50648969 -0.85327426 1.15270187
Yl[14] -0.87210178 0.22594215 -1.31849712 -0.42656000
Yl[15] 0.56087538 0.30848278 -0.03307509 1.16024922
Yl[16] 0.28374664 0.40123574 -0.53788724 1.07797302
Yl[17] 0.49757257 0.42622575 -0.34424766 1.32234855
Yl[18] 1.25343126 0.34850197 0.58866869 1.93645181
Yl[19] 0.43293128 0.38544033 -0.30272767 1.20778192
Yl[20] 0.40469307 0.54221317 -0.60265472 1.49375005
Yl[21] -0.55866095 0.31505583 -1.18076309 0.06387204
Yl[22] -1.10085094 0.26189265 -1.60983803 -0.58917295
Yl[23] -0.26811533 0.26452671 -0.78206170 0.24950601
Yl[24] -1.00154932 0.30528291 -1.62109423 -0.40641259
Yl[25] 0.43228569 0.39575018 -0.33219269 1.22640132
Yl[26] -0.03375897 0.31355089 -0.66626319 0.58620335
Yl[27] -0.01176553 0.50233364 -0.99239479 1.00288051
Yl[28] -0.14619945 0.38981782 -0.92315210 0.61324160
Yl[29] -0.26257719 0.50330444 -1.20212247 0.74789655
Yl[30] -1.80061034 0.23677345 -2.25647365 -1.33209152
Yl[31] 0.17265156 0.41087733 -0.63822136 0.99652013
Yl[32] -1.66073596 0.16260915 -1.98035225 -1.34235191
Yl[33] 0.11852901 0.24138612 -0.36122545 0.60586708
Yl[34] -0.05418447 0.51141807 -1.11458891 0.91539123
Yl[35] -0.12291325 0.22737764 -0.57408527 0.32393579
Yl[36] 1.28479965 0.40761437 0.48882609 2.11017466
Yl[37] 0.22368318 0.35026426 -0.46485333 0.91459730
Yl[38] -1.02508669 0.21833912 -1.44644004 -0.59395760
Yl[39] -0.92956225 0.54071523 -1.96249272 0.16346135
Yl[40] -0.67944819 0.32342300 -1.31483203 -0.05196159
Yl[41] 0.24407462 0.54826453 -0.81917624 1.32536575
Yl[42] 0.74595423 0.33110341 0.09928354 1.38997465
Yl[43] 0.19636347 0.18408346 -0.16880604 0.55807729
Yl[44] 0.80183926 0.42335058 -0.05159185 1.60462754
Yl[45] -0.40557248 0.52661056 -1.42459914 0.68427256
Yl[46] -0.39240888 0.25684943 -0.89652352 0.09167503
Yl[47] 0.13440823 0.30195290 -0.46337386 0.72571630
Yl[48] 0.55957284 0.44795065 -0.31515399 1.44366592
Yl[49] -0.63550849 0.27979049 -1.18884521 -0.08282839
Yl[50] 0.85003581 0.58270897 -0.33293414 1.97866945
lprior -1.36690837 0.44820845 -2.44940868 -0.71213312
lp__ -78.15515223 6.57052200 -91.46921227 -65.53798980
states <- c("AL", "AR", "ME", "NH", "RI", "DC", "VT", "AK", "SD", "UT", "ID", "ND", "WY")
d_est <-
posterior_summary(m1) %>%
data.frame() %>%
rownames_to_column("term") %>%
mutate(D_est = Estimate) %>%
select(term, D_est) %>%
filter(str_detect(term, "Yl")) %>%
bind_cols(d)
d_est %>%
ggplot(aes(x = D_sd, y = D_est - D_obs)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_point(alpha = 2/3) +
geom_text_repel(data = . %>% filter(Loc %in% states),
aes(label = Loc),
size = 3, seed = 15)

states <- c("AR", "ME", "RI", "ID", "WY", "ND", "MN")
as_draws_df(m1) %>%
expand_grid(A = seq(from = -3.5, to = 3.5, length.out = 50)) %>%
mutate(fitted = b_Intercept + b_A * A) %>%
ggplot(aes(x = A)) +
stat_lineribbon(aes(y = fitted),
.width = .95, size = 1/3,
color = "grey20",
fill = "grey80") +
geom_segment(data = d_est,
aes(xend = A, y = D_obs, yend = D_est),
linewidth = 1/5) +
geom_point(data = d_est,
aes(y = D_obs)) +
geom_point(data = d_est,
aes(y = D_est),
shape = 1, stroke = 1/3) +
geom_text_repel(data = d %>% filter(Loc %in% states),
aes(y = D_obs, label = Loc),
size = 3, seed = 15) +
labs(x = "median age marriage (std)",
y = "divorce rate (std)") +
coord_cartesian(xlim = range(d$A),
ylim = range(d$D_obs))

Of course, predictors can have error too. Luckily, brms has a way to handle this. Here’s an updated mathematical model that incorporates measurement error in the marriage variable.
\[\begin{align} D_{\text{obs},i} &\sim \text{Normal}(D_i, D_{\text{SE},i}) \\ D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_AA_i + \beta_MM_{obs,i} \\ M_{obs,i} &\sim \text{Normal}(M_i, M_{\text{SE},i}) \\ M_i &\sim \text{Normal}(0, 1) \\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_A &\sim \text{Normal}(0, 0.5) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \end{align}\]
And here we incorporate the measurement error into the model code:
m2 <-
brm(data = d,
family = gaussian,
D_obs | mi(D_sd) ~ 1 + A + me(M, M_sd),
prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(normal(0, 1), class = meanme),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, cores = 4, chains = 4,
seed = 15,
# note this line
save_pars = save_pars(latent = TRUE),
file = here("files/models/m92.2"))

 Family: gaussian
Links: mu = identity; sigma = identity
Formula: D_obs | mi(D_sd) ~ 1 + A + me(M, M_sd)
Data: d (Number of observations: 50)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -0.03 0.10 -0.22 0.17 1.00 2692 2785
A -0.52 0.16 -0.84 -0.21 1.00 1800 2492
meMM_sd 0.27 0.23 -0.17 0.71 1.01 1311 2335
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.55 0.11 0.35 0.79 1.00 969 1345
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Here’s an example of when your causal model is worse at out-of-sample prediction.
m0 <- add_criterion(m0, "loo")
m1 <- add_criterion(m1, "loo")
m2 <- add_criterion(m2, "loo")
loo_compare(m0, m1, m2)

   elpd_diff se_diff
m0 0.0 0.0
m1 -89.6 46.7
m2 -140.7 65.6
But that’s ok! Remember, it’s not always in your best interest to prioritize prediction. We’re introducing more uncertainty in our model so that we’re not fooled into thinking we have precision. The noise is the point in this case.
Let’s compare our coefficient estimates.
# get posterior draws for coefficients starting with b_
post0 = as_draws_df(m0) |> select(starts_with("b")) |> mutate(model="0")
post1 = as_draws_df(m1) |> select(starts_with("b")) |> mutate(model="1")
post2 = as_draws_df(m2) |> select(starts_with("b")) |> mutate(model="2")
full_join(post0, post1) |>
full_join(post2) |>
pivot_longer(-model) |>
with_groups(c(model, name), mean_qi, value) |>
select(model, name, value, .lower, .upper) |>
mutate(estimate = sprintf("%.2f [%.2f, %.2f]", value, .lower, .upper)) |>
select(model, name, estimate) |>
pivot_wider(names_from = model, values_from = estimate) |>
rename("Model 0" = "0", "Model 1" = "1", "Model 2" = "2") |>
mutate(name = str_remove(name, "b_")) |>
knitr::kable(col.names = c("Parameter", "Model 0", "Model 1", "Model 2"),
caption = "Coefficient estimates [95% CI] for each model")

| Parameter | Model 0 | Model 1 | Model 2 |
|---|---|---|---|
| A | -0.61 [-0.91, -0.29] | -0.62 [-0.92, -0.30] | -0.52 [-0.84, -0.21] |
| Intercept | 0.00 [-0.20, 0.19] | -0.05 [-0.24, 0.14] | -0.03 [-0.22, 0.17] |
| M | -0.06 [-0.37, 0.25] | 0.05 [-0.27, 0.38] | NA [NA, NA] |
| bsp_meMM_sd | NA [NA, NA] | NA [NA, NA] | 0.27 [-0.17, 0.71] |
Our coefficient estimate for the marriage rate is stronger because we’ve regularized on both our outcome and our predictor.
color_y <- "#1c5253"
color_p <- "#e07a5f"
# wrangle
full_join(
# D
tibble(Loc = d %>% pull(Loc),
D_obs = d %>% pull(D_obs),
D_est = posterior_summary(m2) %>%
data.frame() %>%
rownames_to_column("term") %>%
filter(str_detect(term, "Yl")) %>%
pull(Estimate)) %>%
pivot_longer(-Loc, values_to = "d") %>%
mutate(name = if_else(name == "D_obs", "observed", "posterior")),
# M
tibble(Loc = d %>% pull(Loc),
M_obs = d %>% pull(M_obs),
M_est = posterior_summary(m2) %>%
data.frame() %>%
rownames_to_column("term") %>%
filter(str_detect(term, "Xme_")) %>%
pull(Estimate)) %>%
pivot_longer(-Loc, values_to = "m") %>%
mutate(name = if_else(name == "M_obs", "observed", "posterior")),
by = c("Loc", "name")
) %>%
# plot!
ggplot(aes(x = m, y = d)) +
geom_line(aes(group = Loc),
linewidth = 1/4) +
geom_point(aes(color = name)) +
scale_color_manual(values = c(color_p, color_y)) +
labs(subtitle = "Shrinkage of both divorce rate and marriage rate",
x = "Marriage rate (std)" ,
y = "Divorce rate (std)")

- incorporating measurement error into your model results in regularization: observations associated with more error are less influential on the model, and the estimated true scores of those points can be (but are not necessarily) quite different from the observations.
- this will result in a model that is more conservative but one you can have more faith in.
- of course, there's an obvious application of this kind of modeling...
The goal of a meta-analysis is estimating meaningful parameters. A set of studies that are broadly comparable are either…
1. identical replications of each other, in that all studies are identical samples of the same population with the same outcome measures, etc.; or
2. so different that the results of any one study provide no information about the results of any of the others; or
3. exchangeable but not necessarily either identical or completely unrelated; in other words, we allow differences from study to study, but differences that are not expected a priori to have predictable effects favoring one study over another.
In other words, you can answer the question with complete pooling, no pooling, or partial pooling. And generally speaking, the third of these is probably the one you want.
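In brms formula terms, the three options differ only in the study term. A sketch, using the `estimate`, `std.error`, and `loc` columns from the meta-analytic data below:

```r
# Complete pooling: one shared effect; studies treated as identical.
f_complete <- estimate | se(std.error) ~ 1

# No pooling: each study gets its own unrelated effect.
f_none <- estimate | se(std.error) ~ 0 + loc

# Partial pooling: study effects are exchangeable draws from a shared
# distribution -- the usual meta-analytic model, and the one we fit below.
f_partial <- estimate | se(std.error) ~ 1 + (1 | loc)
```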
Our data come from the second large-scale replication project by the Many Labs team (Klein et al., 2018). Of the 28 studies replicated in the study, we will focus on the replication of the trolley experiment from Hauser et al. (2007).
From Klein and colleagues:
According to the principle of double effect, an act that harms other people is more morally permissible if the act is a foreseen side effect rather than the means to the greater good. Hauser et al. (2007) compared participants’ reactions to two scenarios to test whether their judgments followed this principle. In the foreseen-side-effect scenario, a person on an out-of-control train changed the train’s trajectory so that the train killed one person instead of five. In the greater-good scenario, a person pushed a fat man in front of a train, killing him, to save five people. Whereas 89% of participants judged the action in the foreseen-side-effect scenario as permissible (95% CI = [87%, 91%]), only 11% of participants in the greater-good scenario judged it as permissible (95% CI = [9%, 13%]). The difference between the percentages was significant, \(\chi^2(1, N = 2{,}646) = 1{,}615.96\), \(p < .001\), \(w = .78\), \(d = 2.50\), 95% CI = [2.22, 2.86]. Thus, the results provided evidence for the principle of double effect. (p. 459, emphasis in the original)
You can download the data from OSF. You want the .csv file in Trolley Dilemma 1 > Hauser.1 > By Order > Data > Hauser_1_study_by_order_all_CLEAN_CASE.csv
Rows: 6,842
Columns: 27
$ uID <dbl> 65, 68, 102, 126, 145, 263, 267, 298, 309, 318, 350, …
$ variable <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes"…
$ factor <chr> "SideEffect", "SideEffect", "SideEffect", "SideEffect…
$ .id <chr> "ML2_Slate1_Brazil__Portuguese_execution_illegal_r.cs…
$ source <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ haus1.1 <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1,…
$ haus1.1t_1 <dbl> 39.054, 36.792, 56.493, 21.908, 25.635, 50.633, 58.66…
$ haus2.1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ haus2.1t_1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ Source.Global <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ Source.Primary <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ Source.Secondary <chr> "brasilia", "brasilia", "brasilia", "wilfredlaur", "w…
$ Country <chr> "Brazil", "Brazil", "Brazil", "Canada", "Canada", "Ca…
$ Location <chr> "Social and Work Psychology Department, University of…
$ Language <chr> "Portuguese", "Portuguese", "Portuguese", "English", …
$ Weird <dbl> 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Execution <chr> "illegal", "illegal", "illegal", "illegal", "illegal"…
$ SubjectPool <chr> "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", …
$ Setting <chr> "In a classroom", "In a classroom", "In a classroom",…
$ Tablet <chr> "Computers", "Computers", "Computers", "Computers", "…
$ Pencil <chr> "No, the whole study was on the computer (except mayb…
$ StudyOrderN <chr> "Hauser|Ross.Slate1|Rottenstrich|Graham|Kay|Inbar|And…
$ IDiffOrderN <chr> "ID: Global self-esteem SISE|ID: Mood|ID: Subjective …
$ study.order <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ analysis.type <chr> "Order", "Order", "Order", "Order", "Order", "Order",…
$ subset <chr> "all", "all", "all", "all", "all", "all", "all", "all…
$ case.include <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
You could pool all the observations into a single model. The original study used a chi-square test to compare two percentages, but what we’re really talking about is a trial with a binary outcome (yes or no) and predicting that response from a binary predictor (greater good vs side effect). So the better way to model these data is with a binomial regression with a logit link.
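The complete-pooling model that produces the output below is a single call. A sketch, assuming the yes/no judgment has been recoded as a 0/1 variable `y`:

```r
# One logistic regression over all responses: permissibility judgment
# predicted by scenario type (greater good vs. foreseen side effect).
fit_pooled <- glm(y ~ factor, family = binomial(logit), data = d)
summary(fit_pooled)
```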
Call:
glm(formula = y ~ factor, family = binomial(logit), data = d)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.57272 0.04575 -34.38 <2e-16 ***
factorSideEffect 2.44640 0.05894 41.51 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 9397.7 on 6841 degrees of freedom
Residual deviance: 7304.6 on 6840 degrees of freedom
AIC: 7308.6
Number of Fisher Scoring iterations: 4
But in a meta-analysis, you don’t have all of the data. Let’s pretend each of these sites did a separate study and shared only their effect size and standard error.
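Here is one way such a per-site table might be built, as a base-R sketch (the actual `glms` object was likely assembled with tidyverse tools; the column names are taken from the output below):

```r
# Fit the same logistic regression within each site; each site "reports"
# only its scenario coefficient and standard error, as in a meta-analysis.
glms_list <- lapply(split(d, d$source), function(dat) {
  fit <- glm(y ~ factor, family = binomial(logit), data = dat)
  s   <- summary(fit)$coefficients["factorSideEffect", ]
  data.frame(term      = "factorSideEffect",
             estimate  = unname(s["Estimate"]),
             std.error = unname(s["Std. Error"]),
             statistic = unname(s["z value"]),
             OR        = exp(unname(s["Estimate"])))
})
glms <- do.call(rbind, glms_list)
```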
# A tibble: 59 × 7
loc term estimate std.error statistic p.value OR
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 factorSideEffect 2.32 0.475 4.89 0 10.2
2 2 factorSideEffect 3.64 0.644 5.64 0 37.9
3 3 factorSideEffect 2.37 0.399 5.96 0 10.7
4 4 factorSideEffect 2.24 0.263 8.54 0 9.43
5 5 factorSideEffect 2.02 0.505 4.00 0 7.52
6 6 factorSideEffect 2.49 0.571 4.36 0 12.0
7 7 factorSideEffect 2.53 0.658 3.84 0 12.5
8 8 factorSideEffect 1.78 0.459 3.87 0 5.93
9 9 factorSideEffect 1.81 0.378 4.79 0 6.12
10 10 factorSideEffect 2.37 0.495 4.79 0 10.7
# ℹ 49 more rows
glms %>%
arrange(estimate) |>
mutate(
OR = exp(estimate),
rank = row_number()) |>
ggplot(aes(x = rank, y = estimate)) +
geom_point() +
geom_errorbar( aes(ymin=exp(estimate-1.96*std.error),
ymax=exp(estimate+1.96*std.error)),
alpha=.5, width=.5) +
coord_flip() +
labs(y = expression(italic(y[j])~("odds ratio")))

Given what we’ve done already, conducting a meta-analysis on this is relatively straightforward. The key is to recognize that the coefficient estimates are Gaussian distributed. Recall from our lecture on binary outcomes: the distribution of the outcome was Bernoulli, but the priors on the coefficients were normal. So we’ll work with the coefficients for the prediction of log odds; we can transform at the end to see how this translates into odds ratios.
\[\begin{align*} \text{estimate}_j &\sim \text{Normal}(\theta_j, \text{std.error}_j) \\ \theta_j &\sim \text{Normal}(\mu, \tau) \\ \mu &\sim \text{Normal}(0, 1.5) \\ \tau &\sim \text{Exponential}(1) \end{align*}\]
From Solomon Kurz:
`se()` is one of the brms helper functions designed to provide additional information about the criterion variable. Here it informs `brm()` that each `estimate` value has an associated measurement error defined in the `std.error` column. Unlike the `mi()` function, which we used earlier in the chapter to accommodate measurement error and the Bayesian imputation of missing data, the `se()` function is specially designed to handle meta-analyses. `se()` contains a `sigma` argument which is set to `FALSE` by default. This will return a model with no estimate for `sigma`, which is what we want. The uncertainty around the `estimate` value for each study, \(j\), has already been encoded in the data as `std.error`.
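The fit itself is then a short `brm()` call. A sketch, using the priors from the model above (the object name `m3` is my own; the formula matches the summary output below):

```r
library(brms)

# Partial-pooling meta-analysis: each study's estimate is measured with
# known error std.error; study effects share a common distribution.
m3 <-
  brm(data = glms,
      family = gaussian,
      estimate | se(std.error) ~ 1 + (1 | loc),
      prior = c(prior(normal(0, 1.5), class = Intercept),
                prior(exponential(1), class = sd)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 15)
```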
Family: gaussian
Links: mu = identity; sigma = identity
Formula: estimate | se(std.error) ~ 1 + (1 | loc)
Data: glms (Number of observations: 59)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~loc (Number of levels: 59)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.43 0.09 0.26 0.63 1.00 1665 2474
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.55 0.09 2.38 2.72 1.00 3487 2793
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.00 0.00 0.00 0.00 NA NA NA
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Remember that while our regression coefficients are likely to be Gaussian distributed, they’re predictors for a logit-link binomial outcome. So to estimate the relative likelihoods, we really need to transform these back.
These effect sizes represent a difference in relative probability. Relative probabilities are expressed as odds ratios, and a coefficient in a logit-link binomial regression is a log odds ratio. To get the odds ratio, just undo the log (i.e., exponentiate).
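In code, that’s one transformation of the posterior draws. A sketch, assuming the meta-analytic fit is named `m3` and tidybayes is loaded for `mean_qi()`:

```r
library(brms)
library(tidybayes)

# Exponentiate the meta-analytic intercept (a log odds ratio) draw by
# draw, then summarize with a posterior mean and 95% quantile interval.
as_draws_df(m3) %>%
  transmute(odds_ratio = exp(b_Intercept)) %>%
  mean_qi(odds_ratio)
```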
# A tibble: 1 × 6
odds_ratio .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 12.8 10.8 15.2 0.95 mean qi
For comparison, the replication project found an odds ratio of 11.54.