Causal inference with count regression
Part 5 of the GLM and causal inference series.
By A. Solomon Kurz
May 7, 2023
In the third post in this series, we extended our counterfactual causal inference framework to binary outcome data. We saw how logistic regression complicated the approach, particularly when using baseline covariates. In this post, we’ll practice causal inference with unbounded count data, using the Poisson and negative-binomial likelihoods.
We need data
We’ll be working with a subset of the epilepsy
data from the brms package. Based on the brms documentation (execute ?epilepsy
), the data have their proximal origins in a (1990) Biometrics paper by Thall and Vail. However, Thall and Vail cited an earlier paper by Leppik et al. (1985) as the ultimate origin.^{1} Here we load our primary packages, adjust the global plotting theme, load the epilepsy
data, and take a peek.
# packages
library(tidyverse)
library(brms)
library(flextable)
library(GGally)
library(broom)
library(ggdist)
library(lmtest)
library(sandwich)
library(tidybayes)
library(marginaleffects)
library(patchwork)
# adjust the global theme
theme_set(theme_gray(base_size = 12) +
theme(panel.grid = element_blank()))
# load the data
data(package = "brms", epilepsy)
# what?
glimpse(epilepsy)
## Rows: 236
## Columns: 9
## $ Age <dbl> 31, 30, 25, 36, 22, 29, 31, 42, 37, 28, 36, 24, 23, 36, 26, 26, 28, 31, 32, 21, 29, 21, 32, …
## $ Base <dbl> 11, 11, 6, 8, 66, 27, 12, 52, 23, 10, 52, 33, 18, 42, 87, 50, 18, 111, 18, 20, 12, 9, 17, 28…
## $ Trt <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,…
## $ patient <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2…
## $ visit <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ count <dbl> 5, 3, 2, 4, 7, 5, 6, 40, 5, 14, 26, 12, 4, 7, 16, 11, 0, 37, 3, 3, 3, 3, 2, 8, 18, 2, 3, 13,…
## $ obs <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2…
## $ zAge <dbl> 0.42499501, 0.26528351, -0.53327400, 1.22355252, -1.01240850, 0.10557201, 0.42499501, 2.1818…
## $ zBase <dbl> -0.757172825, -0.757172825, -0.944403322, -0.869511123, 1.302362646, -0.158035233, -0.719726…
The original study was designed to see how well the anticonvulsant drug
progabide reduced epileptic seizures in persons who experienced simple or complex partial seizures. After a baseline assessment, participants were randomized into treatment groups in which they received either progabide (Trt == 1
) or a placebo (Trt == 0
). The primary outcome variable in the data is count
, which is the number of epileptic seizures each participant had had since their last visit. Following their baseline assessments, participants came back every two weeks for assessments, which are denoted by the factor variable visit
To get a sense of the data, here are the seizure counts over the four assessment points, by treatment group.
epilepsy %>%
mutate(visit = as.double(visit)) %>%
ggplot(aes(x = visit, y = count, group = patient)) +
geom_line(linewidth = 1/3, alpha = 1/2) +
facet_wrap(~ Trt, labeller = label_both)
As we’ll see later, researchers often model counts like this with the log link. So here’s the same data, but with count
on the log scale.
epilepsy %>%
mutate(visit = as.double(visit)) %>%
ggplot(aes(x = visit, y = log(count), group = patient)) +
geom_line(linewidth = 1/3, alpha = 1/2) +
facet_wrap(~ Trt, labeller = label_both)
Here are some of the sample statistics, by Trt
and visit
, in a Table 1 type format.
epilepsy %>%
group_by(Trt, visit) %>%
summarise(mean = mean(count),
variance = var(count),
min = min(count),
max = max(count)) %>%
mutate_if(is.double, round, digits = 1) %>%
as_grouped_data(groups = c("Trt")) %>%
flextable()
Trt  visit  mean  variance  min  max 

0  
1  9.4  102.8  0  40  
2  8.3  66.7  0  29  
3  8.7  213.3  0  76  
4  8.0  58.2  0  29  
1  
1  8.6  332.7  0  102  
2  8.4  140.7  0  65  
3  8.1  193.0  0  72  
4  6.7  126.9  0  63 
It’ll become apparent why we computed variances instead of standard deviations in a little bit.
Subset, wrangle, inspect.
It’d be cool to play around with modeling the data from all four time points,^{2} but that would be too much of a distraction from our central goal. So we’re going to subset the data to only include the rows for the fourth assessment period (i.e., visit == 4
). We’ll call the reduced data set ep4
.
ep4 <- epilepsy %>%
  # subset
  filter(visit == 4) %>%
  # add two transformed variables
  mutate(lcBase = log(Base) - mean(log(Base)),
         lcAge  = log(Age) - mean(log(Age)))
# what?
glimpse(ep4)
## Rows: 59
## Columns: 11
## $ Age <dbl> 31, 30, 25, 36, 22, 29, 31, 42, 37, 28, 36, 24, 23, 36, 26, 26, 28, 31, 32, 21, 29, 21, 32, …
## $ Base <dbl> 11, 11, 6, 8, 66, 27, 12, 52, 23, 10, 52, 33, 18, 42, 87, 50, 18, 111, 18, 20, 12, 9, 17, 28…
## $ Trt <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,…
## $ patient <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2…
## $ visit <fct> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ count <dbl> 3, 3, 5, 4, 21, 7, 2, 12, 5, 0, 22, 4, 2, 14, 9, 5, 3, 29, 5, 7, 4, 4, 5, 8, 25, 1, 2, 12, 8…
## $ obs <fct> 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 19…
## $ zAge <dbl> 0.42499501, 0.26528351, -0.53327400, 1.22355252, -1.01240850, 0.10557201, 0.42499501, 2.1818…
## $ zBase <dbl> -0.757172825, -0.757172825, -0.944403322, -0.869511123, 1.302362646, -0.158035233, -0.719726…
## $ lcBase <dbl> -0.75635379, -0.75635379, -1.36248959, -1.07480752, 1.03540568, 0.14158780, -0.66934241, 0.7…
## $ lcAge <dbl> 0.11420370, 0.08141387, -0.10090768, 0.26373543, -0.22874106, 0.04751232, 0.11420370, 0.4178…
The glimpse()
output helps clarify there were 59 unique participants in this study. If you use the count()
function, you’ll find \(n = 28\)
are in the placebo condition, and the remaining \(n = 31\)
got the active treatment. The data also contain information from the baseline assessment in the Base
column, which has the epileptic seizure counts during the 8-week period before the baseline assessment. Bear in mind the counts in the count
column were for a 2-week period, which means we’d expect the values in Base
to be at least 4 times larger than those in count
. But since they’re all seizure counts, it seems like we’d still expect the two variables to have a strong positive correlation. The other baseline covariate in the data, which was also considered by Thall & Vail (1990), is age measured in years (Age
). The data from the brms package include standardized versions of these variables in the zBase
and zAge
columns. Thall and Vail, however, used log-transformed versions of these variables, which is what we’ll do in this blog post. Thus, our new variable lcBase
is the log-transformed and mean-centered version of Base
, and lcAge
is the log-transformed and mean-centered version of Age
.
Since it can be difficult to think in terms of the mean values of log-transformed variables, here are the means of those log-transformed variables, exponentiated back into their original metric.^{3}
ep4 %>%
summarise(`exponentiated_mean_of_log(Base)` = mean(log(Base)) %>% exp(),
`exponentiated_mean_of_log(Age)` = mean(log(Age)) %>% exp())
## exponentiated_mean_of_log(Base) exponentiated_mean_of_log(Age)
## 1 23.43543 27.65436
We might want to get a further sense of the outcome variable and the two baseline covariates with a couple of pairs plots, with help from the GGally package (Schloerke et al., 2021). We’ll look at them in both their natural metrics, and on the log scale.
# natural metric
ep4 %>%
select(count, Base, Age) %>%
ggpairs(diag = list(continuous = wrap("barDiag", bins = 12)),
upper = list(continuous = wrap("cor", stars = FALSE))) +
ggtitle("All variables in their natural metric")
# log scale
ep4 %>%
mutate(lcount = log(count)) %>%
select(lcount, lcBase, lcAge) %>%
ggpairs(diag = list(continuous = wrap("barDiag", bins = 12)),
upper = list(continuous = wrap("cor", stars = FALSE))) +
ggtitle("All variables on the log scale")
Because several of the count
values were zero, transforming them to the log scale resulted in \(-\infty\)
values. ggpairs()
dropped those values from the scatter plots and histograms, but otherwise returned the plots for the other values. However, those \(-\infty\)
values kept us from computing the Pearson’s correlations between log(count)
and the other two variables. If you’re willing to settle for listwise removal of those values, here are the Pearson’s correlation estimates for the remaining cases.
ep4 %>%
  mutate(lcount = log(count)) %>%
  filter(lcount != -Inf) %>%
  summarise(`r[lcBase, lcount]` = cor(lcBase, lcount),
            `r[lcAge, lcount]` = cor(lcAge, lcount))
## r[lcBase, lcount] r[lcAge, lcount]
## 1 0.7594784 0.06570049
Model framework
Our well-named focal variable count
is an unbounded count. Technically all count variables are bounded in that their lower limit is always zero.^{4} By unbounded count, I mean there is no clearly-defined upper limit, such as with binomial data. When you have unbounded counts, the two most popular likelihoods are the Poisson and the negative-binomial. The Poisson likelihood is parsimonious in that it contains a single parameter \(\lambda\)
, which is both the mean and the variance. As such, the Poisson likelihood assumes the variance scales perfectly with the mean, which is often called the equidispersion assumption. If you look back above at the sample statistics, you’ll notice the variances for count
were a lot larger than the means at all time points and for both conditions, which doesn’t bode well for that equidispersion assumption.^{5} Happily, the negative-binomial likelihood provides an alternative where its second parameter \(\phi\)
accounts for additional variance above what we’d expect from the Poisson. For the sake of practice, in this blog post we’ll analyze these data from both frequentist and Bayesian perspectives, and also with the Poisson and negative-binomial likelihoods. To complicate matters even further, we’ll also analyze the data with ANOVA- and ANCOVA-type frameworks. Putting all three binaries together, we’ll end up fitting and post-processing 6 models in total:
tibble(
name = c(str_c("fit", 1:4), str_c("brm", 1:2)),
framework = rep(c("frequentist", "Bayesian"), times = c(4, 2)),
model = rep(c("ANOVA", "ANCOVA", "ANOVA", "ANCOVA"), times = c(2, 2, 1, 1)),
likelihood = c("Poisson", "negativebinomial", "Poisson", "negativebinomial", "negativebinomial", "Poisson")
) %>%
flextable() %>%
autofit()
name  framework  model  likelihood 

fit1  frequentist  ANOVA  Poisson 
fit2  frequentist  ANOVA  negative-binomial 
fit3  frequentist  ANCOVA  Poisson 
fit4  frequentist  ANCOVA  negative-binomial 
brm1  Bayesian  ANOVA  negative-binomial 
brm2  Bayesian  ANCOVA  Poisson 
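Before we fit anything, the equidispersion point from above is easy to see in simulation. Here is a sketch (in Python with only the standard library, so not part of the R workflow, and with arbitrary \(\mu\) and \(\phi\) values) that draws Poisson counts and then negative-binomial counts via the classic gamma-Poisson mixture:

```python
import math
import random

random.seed(1)

n, mu, phi = 200_000, 8.0, 1.5

def rpois(lam):
    """Draw one Poisson variate (Knuth's algorithm; fine for modest lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def mean_var(xs):
    """Sample mean and (unbiased) sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

# Poisson: the variance is pinned to the mean (equidispersion)
pois = [rpois(mu) for _ in range(n)]

# negative binomial as a gamma-Poisson mixture: draw each case's rate from
# a Gamma(phi, mu / phi) distribution, then a Poisson count at that rate;
# the result has mean mu but variance mu + mu^2 / phi (overdispersion)
nb = [rpois(random.gammavariate(phi, mu / phi)) for _ in range(n)]

print(mean_var(pois))  # both values near 8
print(mean_var(nb))    # mean near 8, variance near 8 + 8^2 / 1.5, about 50.7
```

The gamma-Poisson construction matches the \((\mu, \phi)\) parameterization brms uses for the negative binomial, where smaller \(\phi\) means more extra-Poisson variance.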
For the sake of space, we’ll only fit two Bayesian models. My hope is by the time we’re in that section, you’ll get the gist of what’s going on. For the frequentist models, we’ll also discuss the issue of robust standard errors, which won’t apply to our Bayesian models.
We start as frequentists.
Frequentist models.
Define the equations and fit.
Our Poisson ANOVA model will be of the form
$$
\begin{align*} \text{count}_i & \sim \operatorname{Poisson}(\lambda_i) \\ \log(\lambda_i) & = \beta_0 + \beta_1 \text{Trt}_i, \end{align*}
$$
where we use the conventional log link to ensure we only make positive predictions. The \(\beta_0\)
parameter is the placebo mean on the log scale, and \(\beta_1\)
is the deviation from that mean in log-scale units for those in the active treatment group. In a similar way, the negative-binomial version of our ANOVA model will be of the form
$$
\begin{align*} \text{count}_i & \sim \operatorname{Negative Binomial}(\mu_i, \phi) \\ \log(\mu_i) & = \beta_0 + \beta_1 \text{Trt}_i, \end{align*}
$$
where instead of \(\lambda_i\)
, we now use the notation of \(\mu_i\)
. We continue to follow convention with the log link, and the \(\beta_0\)
and \(\beta_1\)
parameters have largely the same interpretation as before.
Our frequentist Poisson ANCOVA model will be of the form
$$
\begin{align*} \text{count}_i & \sim \operatorname{Poisson}(\lambda_i) \\ \log(\lambda_i) & = \beta_0 + \beta_1 \text{Trt}_i + \beta_2 \text{lcBase}_i + \beta_3 \text{lcAge}_i, \end{align*}
$$
where we use the log-transformed and mean-centered versions of baseline seizure counts (lcBase
) and age (lcAge
). In a similar way, the negative-binomial version of the ANCOVA model will be
$$
\begin{align*} \text{count}_i & \sim \operatorname{Negative Binomial}(\mu_i, \phi) \\ \log(\mu_i) & = \beta_0 + \beta_1 \text{Trt}_i + \beta_2 \text{lcBase}_i + \beta_3 \text{lcAge}_i. \end{align*}
$$
We can fit the frequentist Poisson models with the good-old base R glm()
function, provided we set family = poisson
. The frequentist negative-binomial models require the glm.nb()
function from the MASS package (Ripley, 2022; Venables & Ripley, 2002). You’ll note that instead of opening MASS directly with library()
, we’re instead using the focused MASS::glm.nb()
syntax. This is because MASS can create conflicts with some of the tidyverse functions, and I’d like to avoid those complications.
# Poisson ANOVA
fit1 <- glm(
  data = ep4,
  family = poisson,
  count ~ Trt)

# NB ANOVA
fit2 <- MASS::glm.nb(
  data = ep4,
  count ~ Trt)

# Poisson ANCOVA
fit3 <- glm(
  data = ep4,
  family = poisson,
  count ~ Trt + lcBase + lcAge)

# NB ANCOVA
fit4 <- MASS::glm.nb(
  data = ep4,
  count ~ Trt + lcBase + lcAge)
For the sake of space, I’m not going to show all the summary()
output for these models. Here’s a table of their \(\beta\)
coefficient summaries, instead.
bind_rows(tidy(fit1), tidy(fit2), tidy(fit3), tidy(fit4)) %>%
mutate(fit = rep(str_c("fit", 1:4), times = c(2, 2, 4, 4))) %>%
select(fit, everything()) %>%
mutate_if(is.double, round, digits = 3) %>%
as_grouped_data(groups = c("fit")) %>%
flextable()
fit  term  estimate  std.error  statistic  p.value 

fit1  
(Intercept)  2.075  0.067  30.986  0.000  
Trt1  -0.171  0.096  -1.778  0.075  
fit2  
(Intercept)  2.075  0.196  10.607  0.000  
Trt1  -0.171  0.271  -0.632  0.527  
fit3  
(Intercept)  1.676  0.083  20.076  0.000  
Trt1  -0.145  0.102  -1.415  0.157  
lcBase  1.179  0.068  17.241  0.000  
lcAge  0.370  0.234  1.584  0.113  
fit4  
(Intercept)  1.795  0.117  15.342  0.000  
Trt1  -0.308  0.162  -1.904  0.057  
lcBase  1.068  0.111  9.617  0.000  
lcAge  0.275  0.367  0.749  0.454 
We should note that the standard errors in that table are all based on conventional maximum likelihood (ML) estimation, which are typically fine defaults, to my eye. However, that equidispersion assumption for the Poisson models can result in very small standard errors, relative to those from their negative-binomial alternatives. Sometimes researchers prefer using so-called robust standard errors, instead. We can compute those with help from the coeftest()
function from the lmtest package (Hothorn et al., 2022; Zeileis & Hothorn, 2002). By setting vcov = vcovHC
, we are requesting the so-called HC3 sandwich standard errors, via the sandwich package (Zeileis, 2004, 2006; Zeileis et al., 2020; Zeileis & Lumley, 2022). Here’s how all this works for our Poisson ANOVA model fit1
, within the context of the tidy()
function.
coeftest(fit1, vcov = vcovHC) %>%
tidy()
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept)    2.07      0.184    11.3   2.13e-29
## 2 Trt1          -0.171     0.358    -0.479  6.32e- 1
If you compare those standard errors to the values in the first two rows of the table above, you’ll see they’re about 3 times larger. Larger standard errors will result in wider (i.e., less certain) 95% confidence intervals. Now that we have a sense of how this works, let’s compare the effects of the standard errors on the 95% confidence intervals in a coefficient plot.
bind_rows(
# conventional ML SE's
tibble(name = str_c("fit", 1:4),
model = rep(c("ANOVA", "ANCOVA"), each = 2),
likelihood = rep(c("Poisson", "NB"), times = 2),
se = "ML") %>%
mutate(fit = map(name, get)) %>%
mutate(tidy = map(fit, tidy, conf.int = TRUE)),
# sandwich SE's
tibble(name = str_c("fit", 1:4),
model = rep(c("ANOVA", "ANCOVA"), each = 2),
likelihood = rep(c("Poisson", "NB"), times = 2),
se = "sandwich") %>%
mutate(fit = map(name, get)) %>%
mutate(tidy = map(fit, ~tidy(coeftest(., vcov = vcovHC), conf.int = TRUE)))
) %>%
select(-fit) %>%
unnest(tidy) %>%
mutate(model = factor(model, levels = c("ANOVA", "ANCOVA")),
likelihood = factor(likelihood, levels = c("Poisson", "NB")),
beta = case_when(
term == "(Intercept)" ~ "beta[0]",
term == "Trt1" ~ "beta[1]",
term == "lcBase" ~ "beta[2]",
term == "lcAge" ~ "beta[3]"
)) %>%
ggplot(aes(x = estimate, xmin = conf.low, xmax = conf.high, y = model, color = se)) +
geom_pointinterval(position = position_dodge(width = 0.4)) +
scale_color_viridis_d(expression(italic(SE)*"'s:"), option = "A", end = .5) +
scale_x_continuous(expand = expansion(mult = 0.25)) +
labs(x = "parameter space",
y = NULL) +
facet_grid(likelihood ~ beta, labeller = label_parsed, scales = "free_x") +
theme(axis.text.y = element_text(hjust = 0),
legend.position = "top")
Overall, the sandwich standard errors resulted in wider CI’s in the Poisson models, particularly for the Poisson ANOVA. They made much less of a difference for the negative-binomial models. As we will see in the next few sections, the choice of one’s likelihood and standard error method will influence causal inference. Sadly, I can’t tell you what the right choice is for your research program. All I can do, here, is let you know about some of your options.
Can the \(\beta\)
coefficients help us estimate the ATE?
Much like with logistic regression models, the \(\beta_1\)
parameter alone cannot give us the average treatment effect (ATE) for either the Poisson or negativebinomial likelihoods. However, we can use a combination of the \(\beta_0\)
and \(\beta_1\)
parameters to give us the ATE, when they are from the ANOVA models. The formula is
$$\exp(\beta_0 + \beta_1) - \exp(\beta_0).$$
The reason we exponentiate is to convert the predictions from the log scale back onto the natural count scale. Here’s how to execute that formula in code, with our Poisson and negative-binomial ANOVA-type models.
exp(coef(fit1)[1] + coef(fit1)[2]) - exp(coef(fit1)[1]) # Poisson ANOVA
## (Intercept)
##   -1.254608
exp(coef(fit2)[1] + coef(fit2)[2]) - exp(coef(fit2)[1]) # negative-binomial ANOVA
## (Intercept)
##   -1.254608
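As a quick arithmetic check on that formula, we can plug in the rounded estimates from the coefficient table above (\(\hat \beta_0 \approx 2.075\) and \(\hat \beta_1 \approx -0.171\) for both ANOVA models); up to rounding error, this reproduces the value from the code output (sketched here in plain Python just to show the arithmetic):

```python
import math

# rounded ANOVA estimates from the coefficient table above
b0, b1 = 2.075, -0.171

# the ATE formula: back-transform both predictions, then contrast
ate_hat = math.exp(b0 + b1) - math.exp(b0)
print(round(ate_hat, 2))  # -1.25
```

The small discrepancy from -1.254608 comes from rounding the coefficients to three decimal places before exponentiating.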
Sadly, the Poisson and negative-binomial ANCOVA-type models give us no such way to compute the ATE. We can, however, compute some form of the conditional average treatment effect (CATE). For example, given how both of the covariates in our ANCOVA models are mean centered, we can simply use the equation from above,
$$\exp(\beta_0 + \beta_1) - \exp(\beta_0),$$
to compute the Poisson and negative-binomial CATE for a person of average baseline seizure counts (about 23.4) and age (about 27.7).
exp(coef(fit3)[1] + coef(fit3)[2]) - exp(coef(fit3)[1]) # Poisson ANCOVA
## (Intercept)
##  -0.7203651
exp(coef(fit4)[1] + coef(fit4)[2]) - exp(coef(fit4)[1]) # negative-binomial ANCOVA
## (Intercept)
##   -1.595425
Note how the ANCOVA models give different results from one another, and different results from their ANOVA counterparts. As we’ve seen in other contexts,
$$\tau_\text{ATE} \neq \tau_\text{CATE},$$
for Poisson and negative-binomial models using the conventional log link.
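To see why the log link drives this inequality, consider the Poisson ANCOVA mean with a single mean-centered covariate \(c_i\), and take the treatment contrast while holding the covariate fixed:

$$
\begin{align*} \lambda_i^1 - \lambda_i^0 & = \exp(\beta_0 + \beta_1 + \beta_2 c_i) - \exp(\beta_0 + \beta_2 c_i) \\ & = \left [ \exp(\beta_1) - 1 \right ] \exp(\beta_0) \exp(\beta_2 c_i). \end{align*}
$$

The case-level contrast scales with \(\exp(\beta_2 c_i)\), so it varies across cases. The ATE averages those contrasts over the sample, which involves \(\mathbb E [\exp(\beta_2 c_i)]\), whereas the CATE at the covariate mean uses \(\exp(\beta_2 \bar c) = \exp(0) = 1\) for a mean-centered covariate. By Jensen’s inequality, \(\mathbb E [\exp(\beta_2 c_i)] \geq \exp(\beta_2 \bar c)\), so the two estimands generally differ whenever \(\beta_2 \neq 0\).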
Compute \(\mathbb E (\lambda_i^1 - \lambda_i^0)\) or \(\mathbb E (\mu_i^1 - \mu_i^0)\).
When we have unbounded count data, it’s still the case that
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0).$$
Somewhat similar to what we saw with logistic regression models, we have some technical quirks to consider. Unbounded count data can take on non-negative integer values, sometimes denoted \(\mathbb N_0\)
(natural numbers starting from zero). Therefore their contrasts, such as \(y_i^1 - y_i^0\)
, can in principle take on any integer value, positive or negative, sometimes denoted \(\mathbb Z\)
. When you take the mean of several positive integer values, you get a positive real number, sometimes denoted \(\mathbb R^+\)
. But when you either take the mean of several integer-valued contrasts, or make a contrast of two non-negative averages, you can in principle get any real number, sometimes denoted \(\mathbb R\)
.
Our Poisson framework returns linear models for \(\log(\lambda_i)\)
, and our negativebinomial framework returns linear models for \(\log(\mu_i)\)
. After exponentiation, both \(\lambda_i\)
and \(\mu_i\)
can take on positive real numbers \((\mathbb R^+)\)
, and their contrasts can take on positive or negative real numbers \((\mathbb R)\)
. For example, see what happens when we use predict()
for the Poisson ANOVA, fit1
.
nd <- ep4 %>%
  select(patient) %>%
  expand_grid(Trt = factor(0:1))
# compute
predict(fit1,
newdata = nd,
se.fit = TRUE,
# request the lambda metric;
# the default is on the log scale
type = "response") %>%
data.frame() %>%
bind_cols(nd) %>%
# look at the first 6 rows
head()
## fit se.fit residual.scale patient Trt
## 1 7.964286 0.5333280 1 1 0
## 2 6.709677 0.4652324 1 1 1
## 3 7.964286 0.5333280 1 2 0
## 4 6.709677 0.4652324 1 2 1
## 5 7.964286 0.5333280 1 3 0
## 6 6.709677 0.4652324 1 3 1
The fit
column contains the \(\hat \lambda_i\)
values, rather than \(\hat y_i\)
values we like to think of when making causal inferences. This won’t be a problem, though. It turns out that when you take the average of the contrast of these values, you still get an estimate of the ATE (see Section 18.4 of
Greene, 2018). Thus within the context of our Poisson ANOVA model,
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0) = {\color{blueviolet}{\mathbb E (\lambda_i^1 - \lambda_i^0)}}.$$
In a similar way, for our negative-binomial ANOVA model,
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0) = {\color{blueviolet}{\mathbb E (\mu_i^1 - \mu_i^0)}}.$$
The same principles will hold for our Poisson and negative-binomial ANCOVA models. To expand this framework to ANCOVA-type models, let \(\mathbf C_i\)
stand for a vector of continuous covariates and let \(\mathbf D_i\)
stand for a vector of discrete covariates, both of which vary across the \(i\)
cases. We can use these to help estimate the ATE from the Poisson ANCOVA with the formula
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0 \mid \mathbf C_i, \mathbf D_i) = {\color{blueviolet}{\mathbb E (\lambda_i^1 - \lambda_i^0 \mid \mathbf C_i, \mathbf D_i)}},$$
where \(\lambda_i^1\)
and \(\lambda_i^0\)
are the counterfactual rates for each of the \(i\)
cases, estimated in light of their covariate values. In the same way, we estimate the ATE from the negative-binomial ANCOVA with the formula
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0 \mid \mathbf C_i, \mathbf D_i) = {\color{blueviolet}{\mathbb E (\mu_i^1 - \mu_i^0 \mid \mathbf C_i, \mathbf D_i)}}.$$
This, again, is sometimes called standardization or g-computation, this time applied to count models.
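Stripped of the modeling machinery, the standardization step is just: predict a rate for every case as if treated, predict again as if untreated, then average the case-level differences. Here is a minimal from-scratch sketch of that recipe, with hypothetical coefficients and covariate values (not estimates from ep4), applying the inverse of the log link by hand:

```python
import math

# hypothetical Poisson ANCOVA coefficients: intercept, treatment, covariate
b0, b1, b2 = 1.5, -0.2, 1.0

# hypothetical mean-centered covariate values for five cases
lc_base = [-0.8, -0.3, 0.0, 0.4, 0.7]

def rate(trt, c):
    """Predicted rate on the count scale (inverse of the log link)."""
    return math.exp(b0 + b1 * trt + b2 * c)

# counterfactual predictions for each case under both treatment levels
lam1 = [rate(1, c) for c in lc_base]
lam0 = [rate(0, c) for c in lc_base]

# g-computation / standardization: average the case-level contrasts
tau_hat = sum(l1 - l0 for l1, l0 in zip(lam1, lam0)) / len(lc_base)
```

This is the computation marginaleffects::avg_comparisons() automates for our fitted models, along with delta-method standard errors for the resulting estimate.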
Before we practice computing our \(\hat \tau_\text{ATE}\)
values, we should spend some time with the counterfactual \(\hat \lambda_i^1\)
, \(\hat \lambda_i^0\)
, \(\hat \mu_i^1\)
, and \(\hat \mu_i^0\)
values. As we’ve done in previous posts, we can compute those with the marginaleffects::predictions()
function. We’ll start with the counterfactual predictions from the Poisson models.
models <- c("ANOVA", "ANCOVA")
ses <- c("ML", "sandwich")

# update the predictor grid to include the covariates
nd <- ep4 %>%
  select(patient, lcBase, lcAge) %>%
  expand_grid(Trt = factor(0:1))
# compute and wrangle
bind_rows(
predictions(fit1, newdata = nd),
predictions(fit1, newdata = nd, vcov = "HC3"),
predictions(fit3, newdata = nd),
predictions(fit3, newdata = nd, vcov = "HC3")
) %>%
data.frame() %>%
mutate(y = ifelse(Trt == 0, "hat(lambda)^0", "hat(lambda)^1"),
model = rep(models, each = 2) %>% rep(., each = n() / 4) %>% factor(., levels = models),
se = rep(ses, times = 2) %>% rep(., each = n() / 4) %>% factor(., levels = ses)) %>%
# plot!
ggplot(aes(x = estimate, y = reorder(patient, estimate))) +
geom_interval(aes(xmin = conf.low, xmax = conf.high, color = y),
position = position_dodge(width = 0.2),
size = 1/5) +
geom_point(aes(color = y, shape = y),
size = 2) +
scale_color_viridis_d(NULL, option = "A", begin = .3, end = .6,
labels = scales::parse_format()) +
scale_shape_manual(NULL, values = c(20, 18),
labels = scales::parse_format()) +
scale_y_discrete(breaks = NULL) +
labs(subtitle = "Counterfactual Poisson rates, by model type and standard error method",
x = expression(lambda[italic(i)]),
y = "patient (ranked)") +
coord_cartesian(xlim = c(0, 50)) +
theme(legend.background = element_blank(),
legend.position = c(.9, .85)) +
facet_grid(se ~ model)
Unsurprisingly, the ANOVA-based point estimates are the same for all participants. Yet the estimates for the ANCOVA model showed great diversity among the participants. As to the 95% CI’s, they were markedly wider when based on the sandwich standard errors, suggesting the Poisson models did not adequately account for the overdispersion in the data. Now look at the negative-binomial versions of the figure.
# compute and wrangle
bind_rows(
predictions(fit2, newdata = nd),
predictions(fit2, newdata = nd, vcov = "HC3"),
predictions(fit4, newdata = nd),
predictions(fit4, newdata = nd, vcov = "HC3")
) %>%
data.frame() %>%
mutate(y = ifelse(Trt == 0, "hat(mu)^0", "hat(mu)^1"),
model = rep(models, each = 2) %>% rep(., each = n() / 4) %>% factor(., levels = models),
se = rep(ses, times = 2) %>% rep(., each = n() / 4) %>% factor(., levels = ses)) %>%
# plot!
ggplot(aes(x = estimate, y = reorder(patient, estimate))) +
geom_interval(aes(xmin = conf.low, xmax = conf.high, color = y),
position = position_dodge(width = 0.2),
size = 1/5) +
geom_point(aes(color = y, shape = y),
size = 2) +
scale_color_viridis_d(NULL, option = "A", begin = .3, end = .6,
labels = scales::parse_format()) +
scale_shape_manual(NULL, values = c(20, 18),
labels = scales::parse_format()) +
scale_y_discrete(breaks = NULL) +
labs(subtitle = "Counterfactual negativebinomial means, by model type and standard error method",
x = expression(mu[italic(i)]),
y = "patient (ranked)") +
coord_cartesian(xlim = c(0, 50)) +
theme(legend.background = element_blank(),
legend.position = c(.9, .85)) +
facet_grid(se ~ model)
For the ANOVA models, the negative-binomial point estimates are identical to those from the Poisson variants. For the ANCOVA models, the negative-binomial point estimates are similar to their Poisson counterparts, but not quite identical. Overall, the negative-binomial 95% CI’s are wider than those from the Poisson models, though much less notably so when computed with the sandwich standard errors.
Now we use the marginaleffects::comparisons()
function to compute the patientlevel treatment effect estimates for each combination of likelihood, model type, and standard error method.
# Poisson
p1 <- bind_rows(
comparisons(fit1, variables = "Trt", by = "patient"),
comparisons(fit1, variables = "Trt", by = "patient", vcov = "HC3"),
comparisons(fit3, variables = "Trt", by = "patient"),
comparisons(fit3, variables = "Trt", by = "patient", vcov = "HC3")
) %>%
data.frame() %>%
mutate(model = rep(models, each = 2) %>% rep(., each = n() / 4) %>% factor(., levels = models),
se = rep(ses, times = 2) %>% rep(., each = n() / 4) %>% factor(., levels = ses)) %>%
ggplot(aes(x = estimate, xmin = conf.low, xmax = conf.high, y = reorder(patient, estimate))) +
geom_vline(xintercept = 0, color = "white") +
geom_pointrange(size = 1/10, linewidth = 1/4) +
coord_cartesian(xlim = c(-25, 25)) +
scale_y_discrete(breaks = NULL) +
labs(subtitle = "Poisson models",
x = expression(hat(tau)[italic(i)]~("i.e., "*hat(lambda)[italic(i)]^1-hat(lambda)[italic(i)]^0)),
y = "patient (ranked)") +
facet_grid(se ~ model)
# Negative-binomial
p2 <- bind_rows(
comparisons(fit2, variables = "Trt", by = "patient"),
comparisons(fit2, variables = "Trt", by = "patient", vcov = "HC3"),
comparisons(fit4, variables = "Trt", by = "patient"),
comparisons(fit4, variables = "Trt", by = "patient", vcov = "HC3")
) %>%
data.frame() %>%
mutate(model = rep(models, each = 2) %>% rep(., each = n() / 4) %>% factor(., levels = models),
se = rep(ses, times = 2) %>% rep(., each = n() / 4) %>% factor(., levels = ses)) %>%
ggplot(aes(x = estimate, xmin = conf.low, xmax = conf.high, y = reorder(patient, estimate))) +
geom_vline(xintercept = 0, color = "white") +
geom_pointrange(size = 1/10, linewidth = 1/4) +
coord_cartesian(xlim = c(-25, 25)) +
scale_y_discrete(breaks = NULL) +
labs(subtitle = "Negativebinomial models",
x = expression(hat(tau)[italic(i)]~("i.e., "*hat(mu)[italic(i)]^1-hat(mu)[italic(i)]^0)),
y = "patient (ranked)") +
facet_grid(se ~ model)
# combine and entitle
(p1 / p2) & plot_annotation(
title = "Counterfactual patientlevel treatment effects",
subtitle = "Results presented by likelihood, model type, and standard error method")
The next step is to combine and summarize those participantlevel effects to compute the ATE with the avg_comparisons()
function. We’ll put the results right into another coefficient plot.
likelihoods <- c("Poisson", "Negative binomial")
bind_rows(
avg_comparisons(fit1, variables = "Trt"),
avg_comparisons(fit1, variables = "Trt", vcov = "HC3"),
avg_comparisons(fit3, variables = "Trt"),
avg_comparisons(fit3, variables = "Trt", vcov = "HC3"),
avg_comparisons(fit2, variables = "Trt"),
avg_comparisons(fit2, variables = "Trt", vcov = "HC3"),
avg_comparisons(fit4, variables = "Trt"),
avg_comparisons(fit4, variables = "Trt", vcov = "HC3")
) %>%
data.frame() %>%
mutate(model = rep(models, each = 2) %>% rep(., times = 2) %>% factor(., levels = models),
se = rep(ses, times = 4) %>% factor(., levels = ses),
likelihood = rep(likelihoods, each = 4) %>% factor(., levels = likelihoods)) %>%
ggplot(aes(x = estimate, xmin = conf.low, xmax = conf.high, y = model, color = se)) +
geom_pointinterval(position = position_dodge(width = 0.5)) +
scale_color_viridis_d(expression(italic(SE)*"'s"), option = "A", end = .5) +
scale_x_continuous(expand = expansion(mult = 0.25)) +
labs(x = expression(hat(tau)[ATE]),
y = NULL) +
facet_wrap(~ likelihood, ncol = 1) +
theme(axis.text.y = element_text(hjust = 0))
Broadly speaking, the ANCOVA models returned more precise estimates for \(\tau_\text{ATE}\)
. If you have high-quality baseline covariates lying around, put them to use!
Compute \(\lambda^1 - \lambda^0\) or \(\mu^1 - \mu^0\).
Within the context of our count models, it is still the case that
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0) = \mathbb E (y_i^1) - \mathbb E (y_i^0).$$
Within the context of our Poisson ANOVA model
$$
\begin{align*} \mathbb E (y_i^1) & = \lambda^1,\ \text{and} \\ \mathbb E (y_i^0) & = \lambda^0, \end{align*}
$$
which means
$$
\begin{align*} \tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0) & = \mathbb E (y_i^1) - \mathbb E (y_i^0) \\ & = {\color{blueviolet}{\lambda^1 - \lambda^0}}. \end{align*}
$$
In a similar way, for our negative-binomial ANOVA model
$$
\begin{align*} \mathbb E (y_i^1) & = \mu^1,\ \text{and} \\ \mathbb E (y_i^0) & = \mu^0, \end{align*}
$$
which means
$$
\begin{align*} \tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0) & = \mathbb E (y_i^1) - \mathbb E (y_i^0) \\ & = {\color{blueviolet}{\mu^1 - \mu^0}}. \end{align*}
$$
You can compute these predictions for either of the ANOVA models with base R predict()
. But let’s just jump straight to the nicer marginaleffects::predictions()
function. First we update our nd
prediction grid, then we compute the estimates.
# simplify the prediction grid
nd <- tibble(Trt = factor(0:1))
# compute
predictions(fit1, newdata = nd, by = "Trt")
##
## Trt Estimate Pr(>z) 2.5 % 97.5 %
## 0 7.96 <0.001 6.98 9.08
## 1 6.71 <0.001 5.86 7.69
##
## Columns: rowid, Trt, estimate, p.value, conf.low, conf.high, count
The first row is for the \(\hat \lambda^0\)
value from the Poisson ANOVA, and the second row is for \(\hat \lambda^1\)
. In this case we used the conventional ML standard errors. One might also include vcov = "HC3"
to use the sandwich standard errors instead. To compute the predictions from the negative-binomial ANOVA, just switch out fit1
for fit2
. To take the final step to compute the estimate for \(\tau_\text{ATE}\)
, just add hypothesis = "revpairwise"
.
predictions(fit1, newdata = nd, by = "Trt", hypothesis = "revpairwise")
##
## Term Estimate Std. Error z Pr(>z) 2.5 % 97.5 %
## 1 - 0 -1.25 0.708 -1.77 0.0763 -2.64 0.133
##
## Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high
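As a quick sanity check, the contrast in that output is nothing more than the difference between the two predicted means from the previous block; only the standard error needs the delta method. Using the rounded values transcribed from the output above:

```python
# predicted group means from the Poisson ANOVA output above (rounded to 2 decimals)
lambda0_hat = 7.96  # control
lambda1_hat = 6.71  # treatment

# the "revpairwise" hypothesis is simply treatment minus control
tau_hat = lambda1_hat - lambda0_hat
print(round(tau_hat, 2))  # -1.25, the contrast estimate
```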
For kicks and giggles, here’s a coefficient plot of \(\hat \tau_\text{ATE}\)
from both versions of the ANOVA, with both kinds of standard errors.
bind_rows(
predictions(fit1, newdata = nd, by = "Trt", hypothesis = "revpairwise"),
predictions(fit1, newdata = nd, by = "Trt", hypothesis = "revpairwise", vcov = "HC3"),
predictions(fit2, newdata = nd, by = "Trt", hypothesis = "revpairwise"),
predictions(fit2, newdata = nd, by = "Trt", hypothesis = "revpairwise", vcov = "HC3")
) %>%
data.frame() %>%
mutate(se = rep(ses, times = 2) %>% factor(., levels = ses),
likelihood = rep(likelihoods, each = 2)) %>%
ggplot(aes(x = estimate, xmin = conf.low, xmax = conf.high, y = likelihood, color = se)) +
geom_pointinterval(position = position_dodge(width = 0.5)) +
scale_color_viridis_d(expression(italic(SE)*"'s"), option = "A", end = .5) +
scale_x_continuous(expand = expansion(mult = 0.25)) +
labs(x = expression(hat(tau)[ATE]),
y = NULL) +
theme(axis.text.y = element_text(hjust = 0))
The point estimates are all the same, but the 95% intervals vary widely.
Do we compute \(\lambda^1 - \lambda^0\) or \(\mu^1 - \mu^0\) conditional on covariates?
As with the logistic regression context, we cannot use our Poisson or negative-binomial ANCOVA models to estimate \(\tau_\text{ATE}\)
via the method we just explored, above. This is because for the Poisson and negative-binomial ANCOVA models,
$$ \tau_\text{CATE} = \operatorname{\mathbb{E}} (y_i^1 \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d) - \operatorname{\mathbb{E}}(y_i^0 \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d). $$
In words, when you subtract the counterfactual mean for the control condition from the counterfactual mean for the treatment condition, conditional on baseline covariate values, you get a conditional ATE (i.e., CATE), not the ATE. To my knowledge, there’s no way around this.
To express this equation in terms of the Poisson ANCOVA
$$\tau_\text{CATE} = ({\color{blueviolet}{\lambda^1}} \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d) - ({\color{blueviolet}{\lambda^0}} \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d),$$
and to express this equation in terms of the negative-binomial ANCOVA
$$\tau_\text{CATE} = ({\color{blueviolet}{\mu^1}} \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d) - ({\color{blueviolet}{\mu^0}} \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d).$$
As a consequence, you need to use the standardization/g-computation method if you want to estimate the ATE from a count ANCOVA model. But if you actually wanted a CATE, this method will work just fine. For example, say we wanted the CATE for a person with an average number of baseline seizure counts (about 23.4) and an average age (about 27.7); we would set both covariate values to zero and make the computation with predictions()
. Here’s what that looks like for our Poisson ANCOVA model, using the sandwich standard errors.
# update the prediction grid
nd <- tibble(
Trt = factor(0:1),
lcBase = 0,
lcAge = 0)
# conditional lambda's
predictions(fit3, newdata = nd, by = "Trt", vcov = "HC3")
##
## Trt Estimate Pr(>z) 2.5 % 97.5 % lcBase lcAge
## 0 5.34 <0.001 3.68 7.75 0 0
## 1 4.62 <0.001 3.51 6.09 0 0
##
## Columns: rowid, Trt, estimate, p.value, conf.low, conf.high, lcBase, lcAge, count
# the CATE
predictions(fit3, newdata = nd, by = "Trt", vcov = "HC3", hypothesis = "revpairwise")
##
## Term Estimate Std. Error z Pr(>z) 2.5 % 97.5 %
## 1 - 0 -0.72 1.34 -0.537 0.591 -3.35 1.91
##
## Columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high
For a CATE based on different covariate values, update the nd
predictor grid as desired. The framework will work the same for either likelihood and whether you want ML or sandwich-based standard errors. Just bear in mind that with Poisson and negative-binomial ANCOVA models using the conventional log link,
$$\tau_\text{ATE} \neq \tau_\text{CATE}.$$
You can compute the CATE if you want; just don’t confuse it with the ATE.
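To see why the ATE and the CATE diverge under a log link, consider a small simulation. Because the mean function is nonlinear, averaging the person-level contrasts over the covariate distribution is not the same as taking the contrast at the average covariate value. This is a generic Python sketch with made-up coefficients, not our fitted Poisson ANCOVA:

```python
import math
import random

random.seed(5)

# hypothetical Poisson ANCOVA coefficients on the log scale
b0, b1, b2 = 2.0, -0.2, 1.0  # intercept, treatment, covariate slope

# a mean-zero covariate, like our mean-centered lcBase
x = [random.gauss(0, 0.8) for _ in range(100_000)]

# ATE: average the person-level contrasts over the covariate distribution
ate = sum(math.exp(b0 + b1 + b2 * xi) - math.exp(b0 + b2 * xi) for xi in x) / len(x)

# CATE at the covariate mean (x = 0): contrast of the two conditional means
cate = math.exp(b0 + b1) - math.exp(b0)

# with a log link these do not match, even though E(x) = 0
print(round(ate, 3), round(cate, 3))
```

Even though the covariate is centered at zero, the ATE here is nearly 40% larger in magnitude than the CATE at the covariate mean (for a normal covariate the multiplier works out to \(\exp(\beta_2^2 \sigma^2 / 2)\)). This is why the ANCOVA models need standardization/g-computation to recover the ATE.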
Bayesian models.
To save y’all from soul-crushing redundancy, I’m not going to repeat all the content from the frequentist models in the Bayesian section. Here we’ll explore some of the results with a Bayesian negative-binomial ANOVA and a Bayesian Poisson ANCOVA.
Define the equations and fit.
Our Bayesian negative-binomial ANOVA model will be of the form
$$
\begin{align*} \text{count}_i & \sim \operatorname{Negative Binomial}(\mu_i, \phi) \\ \log(\mu_i) & = \beta_0 + \beta_1 \text{Trt}_i \\ \beta_0 & \sim \operatorname{Normal}(\log(14), 1) \\ \beta_1 & \sim \operatorname{Normal}(0, 0.5) \\ \phi & \sim \operatorname{Gamma}(0.01, 0.01), \end{align*}
$$
where the final three lines in the formula express the priors. By centering the prior for \(\beta_0\)
on \(\log(14)\)
, I’m betting those in the control (placebo) condition will report about 1 seizure a day (i.e., 14 during the last 2-week period). But since I know very little about seizures, and even less about seizure medication trials, I’m very unsure about that prior, which is why I’ve set the prior standard deviation to 1. In my experience, 1 on the log scale is pretty big. In case you’re not accustomed to how Gaussian priors on the log scale translate to counts on the natural scale, keep in mind that an exponentiated normal distribution is the same as the lognormal distribution with the same \(\mu\)
and \(\sigma\)
values. To give you a sense of what our \(\operatorname{Normal}(\log(14), 1)\)
prior means on the count scale, we’ll plot with a little help from a few nice tidybayes functions. For the sake of comparison, we’ll take a look at the more confident \(\operatorname{Normal}(\log(14), 0.5)\)
prior, too.
# log(14) is about 2.639057
c(prior(lognormal(2.639057, 0.5)),
prior(lognormal(2.639057, 1))) %>%
parse_dist() %>%
ggplot(aes(y = prior, dist = .dist, args = .args)) +
stat_halfeye(.width = c(.5, .95)) +
scale_y_discrete(NULL, labels = str_c("lognormal(log(14), ", c(0.5, 1), ")"),
expand = expansion(add = 0.1)) +
xlab(expression(exp(italic(p)(beta[0])))) +
coord_cartesian(xlim = c(0, 120))
The dots mark the prior medians, and the thick and thinner horizontal lines mark the percentilebased 50% and 95% ranges, respectively. Even the seemingly specific \(\operatorname{Normal}(\log(14), 0.5)\)
prior spreads the bulk of the prior mass around a decent portion of seizure count possibilities. But since I’m no expert in this domain, I’ve opted for the humbler \(\operatorname{Normal}(\log(14), 1)\)
prior.
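If you'd like the exact numbers behind that plot, recall that lognormal quantiles are just exponentiated normal quantiles. Here's a quick Python check of what the \(\operatorname{Normal}(\log 14, 1)\) prior implies on the count scale:

```python
import math

mu, sigma = math.log(14), 1.0

# lognormal quantiles are exponentiated normal quantiles
median = math.exp(mu)                      # prior median on the count scale
lower95 = math.exp(mu - 1.959964 * sigma)  # 2.5th percentile
upper95 = math.exp(mu + 1.959964 * sigma)  # 97.5th percentile

print(round(median, 1), round(lower95, 1), round(upper95, 1))
```

So the prior median is 14 seizures, with 95% of the prior mass between roughly 2 and 99 seizures — vague, but not absurd for this outcome.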
To my eye, a one-point difference on the log scale would be a pretty large treatment effect. So the \(\operatorname{Normal}(0, 0.5)\)
prior for \(\beta_1\)
is meant to protect against very large treatment effects for the experimental condition. Thus, it’s mildly conservative.
Finally, the \(\operatorname{Gamma}(0.01, 0.01)\)
prior for \(\phi\)
is the brms default. If you’re not familiar with setting priors for \(\phi\)
, I’d recommend you start here, and then build up some intuition after practicing with open data sets. Otherwise, keep in mind \(\phi\)
can take on continuous positive values. Smaller values of \(\phi\)
indicate larger overdispersion, and as \(\phi \rightarrow \infty\)
, the negative-binomial distribution converges to the Poisson. So based on our quick-and-dirty sample-statistic analysis from above, I’d generally expect \(\phi\)
values somewhere in the single-digit range, or maybe in the teens at the largest. But do keep in mind that basing your priors on sample statistics isn’t really kosher, since priors are supposed to express your expectations before (i.e., prior to) the data. But if you’re still a student and this is all new to you, go ahead and practice liberally. This is how we learn.
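For intuition about \(\phi\), it can also help to look at the variance function directly. In the NB2 parameterization brms uses (more on NB1 versus NB2 below), the variance is \(\mu + \mu^2/\phi\). A small Python sketch:

```python
# negative-binomial (NB2) variance, as parameterized in brms: Var(y) = mu + mu^2 / phi
def nb2_variance(mu, phi):
    return mu + mu**2 / phi

mu = 8.0  # a mean in the ballpark of our seizure counts
for phi in (1.0, 10.0, 1e6):
    print(phi, nb2_variance(mu, phi))
```

At \(\phi = 1\) the variance is nine times the mean; by \(\phi = 10\) the overdispersion is modest; and as \(\phi \rightarrow \infty\) the variance collapses to the Poisson's \(\mu\).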
Our Bayesian Poisson ANCOVA model will be of the form
$$
\begin{align*} \text{count}_i & \sim \operatorname{Poisson}(\lambda_i) \\ \log(\lambda_i) & = \beta_0 + \beta_1 \text{Trt}_i + \beta_2 \text{lcBase}_i + \beta_3 \text{lcAge}_i \\ \beta_0 & \sim \operatorname{Normal}(\log(14), 1) \\ \beta_1 & \sim \operatorname{Normal}(0, 0.5) \\ \beta_2 & \sim \operatorname{Normal}(1, 0.5) \\ \beta_3 & \sim \operatorname{Normal}(0, 0.5), \end{align*}
$$
where the priors for \(\beta_0\)
and \(\beta_1\)
are the same as in the previous model. However, because we’ve added the two baseline covariates, the \(\beta_0\)
and \(\beta_1\)
parameters are now conditional on the covariates at zero, which corresponds to their mean values back in their original metrics. To the extent those covariate values change how you think about \(\beta_0\)
and \(\beta_1\)
, consider changing their priors accordingly.
I expect a strong positive correlation between the dependent variable and the baseline counts (log-transformed and mean-centered), and therefore the \(\operatorname{Normal}(1, 0.5)\)
prior is designed to convey that expectation. My expectations are less certain for our dependent variable and age, and thus I defaulted to the more conservative and regularizing \(\operatorname{Normal}(0, 0.5)\)
.
Here’s how to fit the two Bayesian models with the brm()
function.
# NB ANOVA
brm1 <- brm(
data = ep4,
family = negbinomial,
count ~ 0 + Intercept + Trt,
prior = prior(normal(log(14), 1), class = b, coef = Intercept) +
prior(normal(0, 0.5), class = b, coef = Trt1) +
prior(gamma(0.01, 0.01), class = shape),
cores = 4, seed = 5
)
# Poisson ANCOVA
brm2 <- brm(
data = ep4,
family = poisson,
count ~ 0 + Intercept + Trt + lcBase + lcAge,
prior = prior(normal(log(14), 1), class = b, coef = Intercept) +
prior(normal(0, 0.5), class = b, coef = Trt1) +
prior(normal(1, 0.5), class = b, coef = lcBase) +
prior(normal(0, 0.5), class = b, coef = lcAge),
cores = 4, seed = 5
)
Check the model summaries.
summary(brm1)
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: count ~ 0 + Intercept + Trt
## Data: ep4 (Number of observations: 59)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total postwarmup draws = 4000
##
## PopulationLevel Effects:
## Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 2.08 0.18 1.74 2.45 1.00 2313 2696
## Trt1 -0.14 0.24 -0.61 0.35 1.00 2391 2646
##
## Family Specific Parameters:
## Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.04 0.22 0.66 1.52 1.00 2338 2358
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
summary(brm2)
## Family: poisson
## Links: mu = log
## Formula: count ~ 0 + Intercept + Trt + lcBase + lcAge
## Data: ep4 (Number of observations: 59)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total postwarmup draws = 4000
##
## PopulationLevel Effects:
## Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.68 0.08 1.52 1.84 1.00 1870 2084
## Trt1 -0.15 0.10 -0.35 0.05 1.00 2267 2484
## lcBase 1.17 0.07 1.04 1.30 1.00 2344 2469
## lcAge 0.30 0.21 -0.11 0.70 1.00 2722 2621
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Both model summaries look fine, to me. If it wasn’t clear to you, the shape
line near the bottom of the summary(brm1)
output is for the \(\phi\)
parameter.
Can the \(\beta\)
posteriors give us the ATE?
I don’t want to spend a lot of time fooling with \(\beta\)
posteriors themselves, in this post, but the overall framework applies much like it did for our frequentist models. We can compute the ATE from a count ANOVA model with the formula
$$\exp(\beta_0 + \beta_1) - \exp(\beta_0).$$
In our case, we’d pull the posteriors from each parameter with the as_draws_df()
function, wrangle, and summarize as desired. Here’s what that can look like for our negative-binomial ANOVA brm1
.
# wrangle
as_draws_df(brm1) %>%
mutate(`beta[0]` = b_Intercept,
`beta[1]` = b_Trt1) %>%
# compute
mutate(ate = exp(`beta[0]` + `beta[1]`) - exp(`beta[0]`)) %>%
# summarize
mean_qi(ate)
## # A tibble: 1 × 6
## ate .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 -1.07 -4.93 2.64 0.95 mean qi
The first three columns display the posterior mean and percentile-based 95% interval for \(\hat \tau_\text{ATE}\)
. Just for kicks and giggles, here’s the whole distribution.
as_draws_df(brm1) %>%
mutate(`beta[0]` = b_Intercept,
`beta[1]` = b_Trt1) %>%
mutate(ate = exp(`beta[0]` + `beta[1]`) - exp(`beta[0]`)) %>%
ggplot(aes(x = ate)) +
stat_halfeye(.width = .95) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = "Behold our causal posterior",
x = expression(hat(tau)[ATE]))
You cannot, however, compute the posterior for \(\tau_\text{ATE}\)
based on the \(\beta\)
posteriors from an ANCOVA version of this model. You need the standardization/gcomputation method for that. Speaking of which…
Posteriors for \(\mathbb E (\lambda_i^1 - \lambda_i^0)\) and \(\mathbb E (\mu_i^1 - \mu_i^0)\).
With Bayes, it’s still the case that
$$\tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0),$$
even with your count models. But as we learned in the
last post, applied Bayesian inference via MCMC methods adds a new procedural complication for the \(\mathbb E (y_i^1 - y_i^0)\)
method. If you let \(j\)
stand for a given MCMC draw, we end up computing
$$\tau_{\text{ATE}_j} = \mathbb E_j (y_{ij}^1 - y_{ij}^0), \ \text{for}\ j = 1, \dots, J,$$
which in words means we compute the familiar \(\mathbb E (y_i^1 - y_i^0)\)
for each of the \(J\)
MCMC draws. This returns a \(J\)-row vector for the \(\tau_\text{ATE}\)
distribution, which we can then summarize the same as we would any other dimension of the posterior distribution. And since we’re working with count models, we might rewrite this as
$$\tau_{\text{ATE}_j} = \mathbb E_j ({\color{blueviolet}{\lambda}}_{ij}^1 - {\color{blueviolet}{\lambda}}_{ij}^0), \ \text{for}\ j = 1, \dots, J,$$
for a Poisson model, and
$$\tau_{\text{ATE}_j} = \mathbb E_j ({\color{blueviolet}{\mu}}_{ij}^1 - {\color{blueviolet}{\mu}}_{ij}^0), \ \text{for}\ j = 1, \dots, J,$$
for a negative-binomial model. In our last post, we practiced computing such quantities with three methods:
- a fitted()-based approach,
- an as_draws_df()-based approach, and
- a predictions()-/avg_comparisons()-based approach.
We’ll do the same thing here.
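Before turning to the R workflows, here is the bare logic of the per-draw standardization loop in a language-agnostic Python sketch. The posterior draws and covariate values are simulated stand-ins, not draws from our fitted models:

```python
import math
import random

random.seed(5)

# hypothetical posterior draws for a Poisson ANCOVA: beta0, beta1 (Trt), beta2 (covariate)
draws = [(random.gauss(2.0, 0.1), random.gauss(-0.2, 0.1), random.gauss(1.0, 0.1))
         for _ in range(4000)]

# a small hypothetical sample of mean-centered covariate values
x = [-0.8, -0.3, 0.0, 0.4, 0.7]

# for each MCMC draw j, average the person-level contrasts (standardization)
ate_draws = []
for b0, b1, b2 in draws:
    taus = [math.exp(b0 + b1 + b2 * xi) - math.exp(b0 + b2 * xi) for xi in x]
    ate_draws.append(sum(taus) / len(taus))

# then summarize across draws, like brms_summary() will
ate_draws.sort()
mean_ate = sum(ate_draws) / len(ate_draws)
ci = (ate_draws[int(0.025 * 4000)], ate_draws[int(0.975 * 4000)])
print(round(mean_ate, 2), [round(v, 2) for v in ci])
```

The crucial detail is the order of operations: average over persons within a draw first, then summarize across the draws.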
As a first step, let’s make our summarization tasks easier with a custom summarizing function I like to call brms_summary()
.
brms_summary <- function(x) {
posterior::summarise_draws(x, "mean", "sd", ~quantile(.x, probs = c(0.025, 0.975)))
}
You, of course, could also use something like summarise()
or median_qi()
, instead. Next, we update our nd
predictor grid.
nd <- ep4 %>%
select(patient, lcBase, lcAge) %>%
expand_grid(Trt = factor(0:1)) %>%
mutate(row = 1:n())
# what?
glimpse(nd)
## Rows: 118
## Columns: 5
## $ patient <fct> 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14…
## $ lcBase <dbl> -0.75635379, -0.75635379, -0.75635379, -0.75635379, -1.36248959, -1.36248959, -1.07480752, …
## $ lcAge <dbl> 0.11420370, 0.11420370, 0.08141387, 0.08141387, -0.10090768, -0.10090768, 0.26373543, 0.2637…
## $ Trt <fct> 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,…
## $ row <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 2…
Notice that unlike what we’ve done before, we added a row
index. This will help us join our nd
data to the fitted()
output, below. Speaking of which, here’s the posterior summary for the ATE, based on our Bayesian negativebinomial ANOVA.
fitted(brm1,
newdata = nd,
summary = F) %>%
data.frame() %>%
set_names(pull(nd, row)) %>%
mutate(draw = 1:n()) %>%
pivot_longer(-draw) %>%
mutate(row = as.double(name)) %>%
left_join(nd, by = "row") %>%
select(draw, value, patient, Trt) %>%
pivot_wider(names_from = Trt, values_from = value) %>%
mutate(tau = `1` - `0`) %>%
# first compute the ATE within each MCMC draw
group_by(draw) %>%
summarise(ate = mean(tau)) %>%
select(ate) %>%
# now summarize the ATE across the MCMC draws
brms_summary()
## # A tibble: 1 × 5
## variable mean sd `2.5%` `97.5%`
## <chr> <num> <num> <num> <num>
## 1 ate -1.07 1.90 -4.93 2.64
Here’s the add_epred_draws()
alternative.
nd %>%
add_epred_draws(brm1) %>%
ungroup() %>%
select(Trt, patient, .draw, .epred) %>%
pivot_wider(names_from = Trt, values_from = .epred) %>%
mutate(tau = `1` - `0`) %>%
# first compute the ATE within each MCMC draw
group_by(.draw) %>%
summarise(ate = mean(tau)) %>%
select(ate) %>%
# now summarize the ATE across the MCMC draws
brms_summary()
## # A tibble: 1 × 5
## variable mean sd `2.5%` `97.5%`
## <chr> <num> <num> <num> <num>
## 1 ate -1.07 1.90 -4.93 2.64
Here’s the marginaleffects::avg_comparisons()
approach.
# change the **marginaleffects** default to use the mean
options(marginaleffects_posterior_center = mean)
# now summarize the ATE across the MCMC draws
avg_comparisons(brm1, variables = list(Trt = 0:1))
##
## Term Contrast Estimate 2.5 % 97.5 %
## Trt 1 - 0 -1.07 -4.93 2.64
##
## Columns: term, contrast, estimate, conf.low, conf.high
Whether you prefer a fitted()
, as_draws_df()
, or avg_comparisons()
-based workflow, this is all just standardization/g-computation methodology as applied to MCMC-based Bayesian models. This process works exactly the same if you’d like to instead use the standardization/g-computation approach for a Bayesian count ANCOVA model. As a quick example, here’s the add_epred_draws()
based code for our Bayesian Poisson ANCOVA brm2
.
nd %>%
add_epred_draws(brm2) %>%
ungroup() %>%
select(Trt, patient, .draw, .epred) %>%
pivot_wider(names_from = Trt, values_from = .epred) %>%
mutate(tau = `1` - `0`) %>%
# first compute the ATE within each MCMC draw
group_by(.draw) %>%
summarise(ate = mean(tau)) %>%
select(ate) %>%
# now summarize the ATE across the MCMC draws
brms_summary()
## # A tibble: 1 × 5
## variable mean sd `2.5%` `97.5%`
## <chr> <num> <num> <num> <num>
## 1 ate -1.13 0.730 -2.55 0.335
I’ll leave it up to the interested reader to practice the same with a fitted()
or avg_comparisons()
workflow. They’ll all return the same results.
Posteriors for \(\lambda^1 - \lambda^0\) and \(\mu^1 - \mu^0\).
As with the frequentist framework, we can only compute our posteriors for the ATE with the group-mean difference method with the ANOVA versions of the Bayesian count models. Within the context of our negative-binomial brm1
, that means
$$
\begin{align*} \tau_\text{ATE} = \mathbb E (y_i^1 - y_i^0) & = \mathbb E (y_i^1) - \mathbb E (y_i^0) \\ & = {\color{blueviolet}{\mu^1 - \mu^0}}. \end{align*}
$$
Here’s what that looks like with a fitted()
-based workflow.
nd <- tibble(Trt = factor(0:1))
fitted(brm1,
newdata = nd,
summary = F) %>%
data.frame() %>%
set_names(pull(nd, Trt)) %>%
mutate(ate = `1` - `0`) %>%
brms_summary()
## # A tibble: 3 × 5
## variable mean sd `2.5%` `97.5%`
## <chr> <num> <num> <num> <num>
## 1 0 8.15 1.53 5.67 11.6
## 2 1 7.08 1.35 4.91 10.2
## 3 ate -1.07 1.90 -4.93 2.64
The top two rows are the posterior summaries for \(\mu^0\)
and \(\mu^1\)
, and the third row is their contrast, the ATE. Here’s the same via the add_epred_draws()
method.
nd %>%
add_epred_draws(brm1) %>%
ungroup() %>%
select(Trt, .draw, .epred) %>%
pivot_wider(names_from = Trt, values_from = .epred) %>%
mutate(ate = `1` - `0`) %>%
brms_summary()
## # A tibble: 3 × 5
## variable mean sd `2.5%` `97.5%`
## <chr> <num> <num> <num> <num>
## 1 0 8.15 1.53 5.67 11.6
## 2 1 7.08 1.35 4.91 10.2
## 3 ate -1.07 1.90 -4.93 2.64
Now we practice with the predictions()
method.
# predicted mu's
predictions(brm1, newdata = nd, by = "Trt")
##
## Trt Estimate 2.5 % 97.5 %
## 0 8.15 5.67 11.6
## 1 7.08 4.91 10.2
##
## Columns: rowid, Trt, estimate, conf.low, conf.high, count
# ATE
predictions(brm1, newdata = nd, by = "Trt", hypothesis = "revpairwise")
##
## Term Estimate 2.5 % 97.5 %
## 1 - 0 -1.07 -4.93 2.64
##
## Columns: term, estimate, conf.low, conf.high
Do we compute the Bayesian \(\lambda^1 - \lambda^0\) or \(\mu^1 - \mu^0\) conditional on covariates?
As with our frequentist count models, we can’t use the group-mean-contrast method to compute the ATE from a count ANCOVA. The best we can do with this method is compute a CATE. Here are the conditional \(\lambda\)
’s and their contrast for a person with a mean baseline seizure count and mean age, using the thrifty predictions()
method.
# update the prediction grid
nd <- tibble(
Trt = factor(0:1),
lcBase = 0,
lcAge = 0)
# conditional lambda's
predictions(brm2, newdata = nd, by = "Trt")
##
## Trt Estimate 2.5 % 97.5 % lcBase lcAge
## 0 5.40 4.57 6.31 0 0
## 1 4.62 3.92 5.40 0 0
##
## Columns: rowid, Trt, estimate, conf.low, conf.high, lcBase, lcAge, count
# the CATE
predictions(brm2, newdata = nd, by = "Trt", hypothesis = "revpairwise")
##
## Term Estimate 2.5 % 97.5 %
## 1 - 0 -0.772 -1.74 0.226
##
## Columns: term, estimate, conf.low, conf.high
I’ll leave it up to the interested reader to practice the same with a fitted()
or add_epred_draws()
workflow. Though our code here used our Bayesian Poisson ANCOVA, it would have worked the same way with a Bayesian negative-binomial ANCOVA.
Lingering issues
What about those “robust” SE’s for Bayesian models?
The whole robust sandwich standard error issue mainly applies to frequentist models. Some are starting to propose similar methods for Bayesian models (e.g., Szpiro et al., 2010), but I am not aware that those solutions are available in brms (see this discussion). If you don’t like the assumptions inherent in your likelihood function, think hard about alternative likelihoods. If you’re aware of another Bayesian alternative, tell me all about it in the comments section, below.
Can we trust the negative binomial?
There’s some concern over whether the negative-binomial model can show bias under certain conditions, which leads into a technical issue I’ve avoided up until this point. There are two major ways to parameterize the negative-binomial model. The canonical version is often called NB1, and the more popular version is called NB2. For a nice historical overview of the two versions, see Chapter 1 in Hilbe (
2011). Whether with the MASS::glm.nb()
function or the brms::brm()
function, all the negative-binomial models in this blog post used the NB2 parameterization. In a nice simulation study, Blackburn (
2015) showed the NB1 approach can return markedly biased \(\beta\)
coefficients when the variance in the sample data doesn’t perfectly match the model. The NB2 approach can show parameter bias in some conditions, but the magnitude of the bias was generally pretty small (to my eye), and it performed pretty well overall. These kinds of concerns have led some to prefer so-called quasi-likelihood approaches, which are not to my taste and will not be further explored in this blog series. But if you’re curious, you can learn about quasi-likelihood approaches in Chapter 9 of McCullagh & Nelder (
1989).
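For readers who want the NB1/NB2 distinction in concrete terms, the two parameterizations differ in their variance functions. With dispersion parameter \(\alpha > 0\) (in the NB2 case, \(\alpha = 1/\phi\)), a small Python sketch:

```python
# the two common negative-binomial variance functions (alpha > 0 is a dispersion parameter)
def nb1_variance(mu, alpha):
    # NB1: variance grows linearly with the mean
    return mu * (1 + alpha)

def nb2_variance(mu, alpha):
    # NB2: variance grows quadratically with the mean
    return mu * (1 + alpha * mu)

# at a mean of 1 the two coincide; at larger means they diverge
for mu in (1.0, 10.0, 50.0):
    print(mu, nb1_variance(mu, 0.5), nb2_variance(mu, 0.5))
```

Both allow overdispersion relative to the Poisson, but NB2's variance grows quadratically with the mean, which is what the MASS::glm.nb() and brms models in this post assume.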
I’d like to thank the great Noah Greifer for dragging me kicking and screaming into this topic.
Recap
In this post, some of the main points we covered were:
- With count regression, the \(\beta_1\) coefficient has no direct relationship with the ATE, regardless of whether you have included covariates.
- For the Poisson regression ANOVA model, \(\tau_\text{ATE} = \mathbb E (\lambda_i^1 - \lambda_i^0)\), and \(\tau_\text{ATE} = \lambda^1 - \lambda^0\).
- For the negative-binomial ANOVA model, \(\tau_\text{ATE} = \mathbb E (\mu_i^1 - \mu_i^0)\), and \(\tau_\text{ATE} = \mu^1 - \mu^0\).
- For the Poisson regression ANCOVA model, \(\tau_\text{ATE} = \mathbb E (\lambda_i^1 - \lambda_i^0 \mid \mathbf C_i, \mathbf D_i)\), but \(\tau_\text{CATE} = (\lambda^1 \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d) - (\lambda^0 \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d)\).
- For the negative-binomial ANCOVA model, \(\tau_\text{ATE} = \mathbb E (\mu_i^1 - \mu_i^0 \mid \mathbf C_i, \mathbf D_i)\), but \(\tau_\text{CATE} = (\mu^1 \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d) - (\mu^0 \mid \mathbf C = \mathbf c, \mathbf D = \mathbf d)\).
- For Poisson or negative-binomial ANCOVA models, there can be many different values for the conditional average treatment effect, \(\tau_\text{CATE}\), depending on which values one uses for the covariates.
- Sometimes frequentists prefer sandwich standard errors over the conventional ML standard errors for their count models. For Bayesians, the posterior \(\textit{SD}\) is just the posterior \(\textit{SD}\). If you don’t like it, think harder about your likelihood.
In the next post, we’ll explore how our causal inference methods work with gamma regression models.
Thank the reviewer
I’d like to publicly acknowledge and thank
for his kind efforts reviewing the draft of this post. Go team!
Do note the final editorial decisions were my own, and I do not think it would be reasonable to assume my reviewer has given a blanket endorsement of the current version of this post.
Session info
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF8/en_US.UTF8/en_US.UTF8/C/en_US.UTF8/en_US.UTF8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.1.2 marginaleffects_0.11.1.9008 tidybayes_3.0.4
## [4] sandwich_3.0-2 lmtest_0.9-40 zoo_1.8-10
## [7] ggdist_3.2.1.9000 broom_1.0.4 GGally_2.1.2
## [10] flextable_0.9.1 brms_2.19.0 Rcpp_1.0.10
## [13] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
## [16] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4
## [19] tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2
## [22] tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] uuid_1.10 backports_1.4.1 systemfonts_1.0.4 plyr_1.8.7
## [5] igraph_1.3.4 svUnit_1.0.6 splines_4.2.3 crosstalk_1.2.0
## [9] katex_1.4.0 TH.data_1.11 rstantools_2.2.0 inline_0.3.19
## [13] digest_0.6.31 htmltools_0.5.5 fansi_1.0.4 magrittr_2.0.3
## [17] checkmate_2.1.0 tzdb_0.3.0 RcppParallel_5.1.5 matrixStats_0.63.0
## [21] xslt_1.4.3 officer_0.6.2 xts_0.12.1 askpass_1.1
## [25] timechange_0.2.0 gfonts_0.2.0 prettyunits_1.1.1 colorspace_2.10
## [29] textshaping_0.3.6 xfun_0.39 callr_3.7.3 crayon_1.5.2
## [33] jsonlite_1.8.4 lme4_1.131 survival_3.53 glue_1.6.2
## [37] gtable_0.3.3 emmeans_1.8.0 V8_4.2.1 distributional_0.3.1
## [41] pkgbuild_1.3.1 rstan_2.21.8 abind_1.45 scales_1.2.1
## [45] fontquiver_0.2.1 mvtnorm_1.13 DBI_1.1.3 miniUI_0.1.1.1
## [49] viridisLite_0.4.1 xtable_1.84 stats4_4.2.3 StanHeaders_2.21.07
## [53] fontLiberation_0.1.0 DT_0.24 collapse_1.9.2 htmlwidgets_1.5.4
## [57] threejs_0.3.3 arrayhelpers_1.10 RColorBrewer_1.13 posterior_1.4.1
## [61] ellipsis_0.3.2 reshape_0.8.9 pkgconfig_2.0.3 loo_2.5.1
## [65] farver_2.1.1 sass_0.4.5 crul_1.2.0 utf8_1.2.3
## [69] labeling_0.4.2 tidyselect_1.2.0 rlang_1.1.0 reshape2_1.4.4
## [73] later_1.3.0 munsell_0.5.0 tools_4.2.3 cachem_1.0.7
## [77] cli_3.6.1 generics_0.1.3 evaluate_0.20 fastmap_1.1.1
## [81] ragg_1.2.5 yaml_2.3.7 processx_3.8.1 knitr_1.42
## [85] zip_2.2.0 nlme_3.1162 equatags_0.2.0 mime_0.12
## [89] projpred_2.2.1 xml2_1.3.3 compiler_4.2.3 bayesplot_1.10.0
## [93] shinythemes_1.2.0 rstudioapi_0.14 gamm4_0.26 curl_5.0.0
## [97] bslib_0.4.2 stringi_1.7.12 highr_0.10 ps_1.7.5
## [101] blogdown_1.16 Brobdingnag_1.28 gdtools_0.3.3 lattice_0.2045
## [105] Matrix_1.53 fontBitstreamVera_0.1.1 nloptr_2.0.3 markdown_1.1
## [109] shinyjs_2.1.0 tensorA_0.36.2 vctrs_0.6.2 pillar_1.9.0
## [113] lifecycle_1.0.3 jquerylib_0.1.4 bridgesampling_1.12 estimability_1.4.1
## [117] insight_0.19.1.6 data.table_1.14.8 httpuv_1.6.5 R6_2.5.1
## [121] bookdown_0.28 promises_1.2.0.1 gridExtra_2.3 codetools_0.219
## [125] boot_1.328.1 colourpicker_1.1.1 MASS_7.358.2 gtools_3.9.4
## [129] openssl_2.0.6 httpcode_0.3.0 withr_2.5.0 shinystan_2.6.0
## [133] multcomp_1.420 mgcv_1.842 parallel_4.2.3 hms_1.1.3
## [137] grid_4.2.3 coda_0.194 minqa_1.2.5 rmarkdown_2.21
## [141] shiny_1.7.2 base64enc_0.13 dygraphs_1.1.1.6
References
Blackburn, M. L. (2015). The relative performance of Poisson and negative binomial regression estimators. Oxford Bulletin of Economics and Statistics, 77(4), 605–616. https://doi.org/10.1111/obes.12074
Greene, W. H. (2018). Econometric analysis (Eighth Edition). Pearson.
Hilbe, J. M. (2011). Negative binomial regression (Second Edition). https://doi.org/10.1017/CBO9780511973420
Hothorn, T., Zeileis, A., Farebrother, R. W., & Cummins, C. (2022). Lmtest: Testing Linear Regression Models. https://CRAN.Rproject.org/package=lmtest
Leppik, I., Dreifuss, F., BowmanCloyd, T., Santilli, N., Jacobs, M., Crosby, C., Cloyd, J., Stockman, J., Graves, N., Sutula, T., et al. (1985). A doubleblind crossover evaluation of progabide in partial seizures. Neurology, 35(4), 285–295.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (Second Edition). Chapman and Hall.
Ripley, B. (2022). MASS: Support functions and datasets for venables and Ripley’s MASS. https://CRAN.Rproject.org/package=MASS
Schloerke, B., Crowley, J., Di Cook, Briatte, F., Marbach, M., Thoen, E., Elberg, A., & Larmarange, J. (2021). GGally: Extension to ’ggplot2’. https://CRAN.Rproject.org/package=GGally
Szpiro, A. A., Rice, K. M., & Lumley, T. (2010). Modelrobust regression and a Bayesian “sandwich” estimator. Annals of Applied Statistics, 4(4), 2099–2113. https://doi.org/10.1214/10AOAS362
Thall, P. F., & Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46(3), 657–671. https://doi.org/10.2307/2532086
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (Fourth Edition). Springer. http://www.stats.ox.ac.uk/pub/MASS4
Zeileis, A. (2004). Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software, 11(10), 1–17. https://doi.org/10.18637/jss.v011.i10
Zeileis, A. (2006). Objectoriented computation of sandwich estimators. Journal of Statistical Software, 16(9), 1–16. https://doi.org/10.18637/jss.v016.i09
Zeileis, A., & Hothorn, T. (2002). Diagnostic Checking in Regression Relationships. R News, 2(3), 7–10. https://CRAN.Rproject.org/doc/Rnews/
Zeileis, A., Köll, S., & Graham, N. (2020). Various versatile variances: An objectoriented implementation of clustered covariances in R. Journal of Statistical Software, 95(1), 1–36. https://doi.org/10.18637/jss.v095.i01
Zeileis, A., & Lumley, T. (2022). Sandwich: Robust covariance matrix estimators [Manual]. https://sandwich.RForge.Rproject.org/

I haven’t been able to locate the actual paper by Leppik et al. (1985). If you have a copy, I’d love to see it. ↩︎

For ideas on what that might look like, execute
?epilepsy
and scroll down to the bottom of the help page. ↩︎ 
Careful readers will note those sample statistics end up closer to the medians, rather than the means, of the original untransformed variables. Logs are tricky, friends. ↩︎

Okay, technically you can have a censored or truncated count, which complicates how you think about the lower limit. In this post, we’re not focusing on those issues. Thus, our lower limit is the conventional value zero. But if you ever find yourself in a position where you have to contend with those issues, brms can handle both censored and truncated data. ↩︎

Clever readers will note our quickanddirty samplestatistic analysis is a lot like checking for equidispersion without controlling for the baseline covariates
Base
andAge
. Maybe it’d be okay to presume equidispersion after conditioning on the baseline covariates. See how tricky this all gets? ↩︎