Regression models for 2-timepoint non-experimental data

Purpose

In the contemporary longitudinal data analysis literature, 2-timepoint data (a.k.a. pre/post data) get a bad wrap. Singer and Willett (2003, p. 10) described 2-timepoint data as only “marginally better” than cross-sectional data and Rogosa et al. (1982) give a technical overview on the limitations of 2-timepoint data. Limitations aside, sometimes two timepoints are all you have. In those cases, researchers should have a good sense of which data analysis options they have at their disposal. I recently came across Jeffrey Walker’s free (2018) text, Elements of statistical modeling for experimental biology, which contains a nice chapter on 2-timepoint experimental designs. Inspired by his work, this post aims to explore how one might analyze non-experimental 2-timepoint data within a regression model paradigm.

I make assumptions.

In this post, I’m presuming you are familiar with longitudinal data analysis with conventional and multilevel regression. Though I don’t emphasize it much, it will also help if you’re familiar with Bayesian statistics. All code is in R (R Core Team, 2020), with healthy doses of the tidyverse (Wickham, 2019; Wickham et al., 2019). The statistical models will be fit with brms (Bürkner, 2017, 2018, 2020) and we’ll also make some use of the tidybayes (Kay, 2020) and patchwork (Pedersen, 2019) packages. If you need to shore up, I list some educational resources at the end of the post.

Load the primary packages.

library(tidyverse)
library(brms)
library(tidybayes)
library(patchwork)

Warm-up

Before we jump into 2-timepoint data, we’ll first explore how one might analyze a fuller data set of 6 timepoints. We will then reduce the data set to two different 2-timepoint versions for use in the remainder of the post.

We need data.

We will simulate the data based on a conventional multilevel growth model of the kind you can learn about in Singer & Willett (2003), Hoffman (2015), or Kurz (2020, Chapter 14). We’ll have one criterion variable \(y\) which will vary across participants \(i\) and over time \(t\). For simplicity, the systemic change over time will be linear. We might express it in statistical notation1 as

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti} + u_{0i} + u_{1i} \text{time}_{ti} \\ \sigma & = \sigma_\epsilon \\ \begin{bmatrix} u_{0i} \\ u_{1i} \end{bmatrix} & \sim \operatorname{MVNormal} \left (\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf \Sigma \right) \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma_0 & 0 \\ 0 & \sigma_1 \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \end{align*} \]

where \(\beta_0\) is the population-level intercept (initial status) and \(\beta_1\) is the population-level slope (change over time). The \(u_{0i}\) and \(u_{1i}\) terms are the participant-level deviations around the population-level intercept and slope. Those \(u\) deviations follow a bivariate normal distribution centered on zero (they are deviations, after all) and including a covariance matrix, \(\mathbf \Sigma\). As is typical within the brms framework (see Bürkner, 2017), \(\mathbf \Sigma\) is decomposed into a matrix of standard deviations (\(\mathbf S\)) and a correlation matrix (\(\mathbf R\)). Also notice we renamed our original \(\sigma\) parameter as \(\sigma_\epsilon\) to help distinguish it from the multilevel standard deviations in the \(\mathbf S\) matrix (\(\sigma_0\) and \(\sigma_1\)). In this way, \(\sigma_0\) and \(\sigma_1\) capture differences between participants in their intercepts and slopes, whereas \(\sigma_\epsilon\) captures the differences within participants over time that occur apart from their linear trajectories.

To simulate data of this kind, we’ll first set the true values for \(\beta_0, \beta_1, \sigma_0, \sigma_1, \rho,\) and \(\sigma_\epsilon\).

b0     <- 0      # starting point (average intercept)
b1     <- 1      # growth over time (average slope)
sigma0 <- 1      # std dev in intercepts
sigma1 <- 1      # std dev in slopes
rho    <- -.5    # correlation between intercepts and slopes
sigma_e <- 0.75  # std dev within participants

Now combine several of those values to define the \(\mathbf S, \mathbf R,\) and \(\mathbf \Sigma\) matrices. Then simulate \(N = 100\) participant-level intercepts and slopes.

mu     <- c(b0, b1)          # combine the means in a vector
sigmas <- c(sigma0, sigma1)  # combine the std devs in a vector

s <- diag(sigmas)      # standard deviation matrix
r <- matrix(c(1, rho,  # correlation matrix
             rho, 1), nrow = 2)

# now matrix multiply s and r to get a covariance matrix
sigma <- s %*% r %*% s

# how many participants would you like?
n_id <- 100

# make the simulation reproducible
set.seed(1)

vary_effects <- 
  MASS::mvrnorm(n_id, mu = mu, Sigma = sigma) %>% 
  data.frame() %>% 
  set_names("intercepts", "slopes") %>% 
  mutate(id = 1:n_id) %>% 
  select(id, everything())

head(vary_effects)
##   id  intercepts     slopes
## 1  1  0.85270825  0.7676584
## 2  2 -0.18009772  1.1379818
## 3  3  1.17913643  0.7317852
## 4  4 -1.46056809  2.3025393
## 5  5  0.04193022  1.6126544
## 6  6 -0.17309717 -0.5941901

Now we have our random intercepts and slopes, we’re almost ready to simulate our \(y_{ti}\) values. We just need to decide on how many values we’d like to collect over time and how we’d like to structure those assessment periods. To keep things simple, I’m going to specify six evenly-spaced timepoints. The first timepoint will be set to 0, the last timepoint will be set to 1, and the four timepoints in the middle will be the corresponding fractions.

# how many timepoints?
time_points <- 6

d <-
  vary_effects %>% 
  # add in time
  expand(nesting(id, intercepts, slopes),
         time = seq(from = 0, to = 1, length.out = time_points)) %>% 
  # now use the model formula to compute y
  mutate(y = rnorm(n(), mean = intercepts + slopes * time, sd = sigma_e))

head(d, n = 10)
## # A tibble: 10 x 5
##       id intercepts slopes  time      y
##    <int>      <dbl>  <dbl> <dbl>  <dbl>
##  1     1      0.853  0.768   0    1.16 
##  2     1      0.853  0.768   0.2  2.27 
##  3     1      0.853  0.768   0.4  2.35 
##  4     1      0.853  0.768   0.6  1.07 
##  5     1      0.853  0.768   0.8 -0.247
##  6     1      0.853  0.768   1    3.49 
##  7     2     -0.180  1.14    0    0.320
##  8     2     -0.180  1.14    0.2  0.453
##  9     2     -0.180  1.14    0.4  0.265
## 10     2     -0.180  1.14    0.6  0.885

Before we move on, I should acknowledge that this simulation workflow is heavily influenced by McElreath (2020, Chapter 14). You can find a similar workflow in the (2020) preprint by DeBruine and Barr, Understanding mixed effects models through data simulation.

Explore the data.

Before fitting the model, it might help if we look at what we’ve done. Here’s a scatter plot of the random intercepts and slopes.

# set the global plotting theme
theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times"),
                  panel.grid = element_blank()))

p1 <-
  vary_effects %>% 
  ggplot(aes(x = intercepts, y = slopes)) +
  geom_point() +
  stat_ellipse(color = "grey50")

p1

The 95%-interval ellipse helps point out the negative correlation between the intercepts and slopes. Here’s the Pearson’s correlation.

vary_effects %>% 
  summarise(rho = cor(intercepts, slopes))
##          rho
## 1 -0.4502206

It’s no coincidence that value is very close to our data-generating rho value.

rho
## [1] -0.5

Now check the sample means and standard deviations of our intercepts and slopes.

vary_effects %>% 
  summarise(b0 = mean(intercepts),
            b1 = mean(slopes),
            sigma0 = sd(intercepts),
            sigma1 = sd(slopes)) %>% 
  mutate_all(round, digits = 2)
##      b0   b1 sigma0 sigma1
## 1 -0.08 1.11   0.91   0.91

Those aren’t quite the true data-generating values for b0 through sigma1, from above. But they’re pretty decent sample approximations. With only \(N = 100\) and \(T = 6\), this is about as close as we should expect.

To get a sense of the time and y values, we’ll plot them in two ways. First we’ll plot a random subset from nine of our simulated participants. Then we’ll plot the linear trajectories from all 100 participants, along with the grand mean trajectory.

set.seed(1)

p2 <-
  d %>% 
  nest(data = c(intercepts, slopes, time, y)) %>% 
  slice_sample(n = 9) %>% 
  unnest(data) %>% 
  
  ggplot(aes(x = time, y = y)) +
  geom_point() +
  geom_abline(aes(intercept = intercepts, slope = slopes),
              color = "blue") +
  labs(subtitle = "random subset of 9 participants") +
  theme(strip.background = element_blank(),
        strip.text = element_blank()) +
  facet_wrap(~slopes)

p3 <-
  d %>% 
  ggplot(aes(x = time, y = y)) +
  geom_point(color = "transparent") +
  geom_abline(aes(intercept = intercepts, slope = slopes, group = id),
              color = "blue", size = 1/10, alpha = 1/2) +
  geom_abline(intercept = b0, slope = b1,
              color = "blue", size = 2) +
  labs(subtitle = "All participant-level trajectories, along\nwith the grand mean")

# combine
(p2 + p3) &
  scale_x_continuous(breaks = 0:5 / 5, labels = c(0, "", "", "", "", 1)) &
  coord_cartesian(ylim = c(-2.5, 3.5))

See how the points in the plots on the left deviate quite a bit from their linear trajectories? That’s the result of \(\sigma_\epsilon\).

Fit the multilevel growth model.

Now we’ll use brms to fit the multilevel growth model. If all goes well, we should largely reproduce our data-generating values in the posterior. Before fitting the model, we should consider a few things about the brm() syntax.

In this model and in most of the models to follow, we’re relying on the default brm() priors. When fitting real-world models, you are much better off going beyond the defaults. However, I will generally deemphasize priors, in this post, to help keep the focus on the conceptual models.

Note how we set the seed argument. Though you don’t need to do this, setting the seed makes the results more reproducible.

Also, note the custom settings for iter and warmup. Often times, the default settings are fine. But since we’ll be comparing a lot of models, I want to make sure we have enough posterior draws from each to ensure stable estimates.

Okay, fit the model.

m0 <-
  brm(data = d,
      y ~ 1 + time + (1 + time | id),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)

Check the model summary.

print(m0)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + time + (1 + time | id) 
##    Data: d (Number of observations: 600) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)           0.89      0.09     0.72     1.07 1.00     3663     5989
## sd(time)                0.84      0.15     0.53     1.14 1.00     1782     2936
## cor(Intercept,time)    -0.40      0.15    -0.65    -0.05 1.00     3758     4228
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.13      0.11    -0.33     0.08 1.00     4153     6181
## time          1.16      0.13     0.91     1.42 1.00     6636     6477
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.81      0.03     0.76     0.87 1.00     4706     5720
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

A series of plots might help show how well our model captured the data-generating values.

# name the parameters with the Greek terms
names <- c("beta[0]", "beta[1]", "sigma[0]", "sigma[1]", "rho", "sigma[epsilon]")

# for the vertical lines marking off the true values
vline <-
  tibble(name = names,
         true_value = c(b0, b1, sigma0, sigma1, rho, sigma_e))

# wrangle
posterior_samples(m0) %>% 
  select(b_Intercept:sigma) %>% 
  set_names(names) %>% 
  pivot_longer(everything()) %>% 
  
  # plot
  ggplot(aes(x = value, y = 0)) +
  stat_halfeye(.width = .95, normalize = "panels", size = 1/2) +
  geom_vline(data = vline,
             aes(xintercept = true_value),
             size = 1/4, linetype = 2) +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("marginal posterior") +
  facet_wrap(~name, scales = "free", labeller = label_parsed)

The marginal posterior distribution for all the major summary parameters is summarized by the median (dot) and percentile-based 95% interval (horizontal line). The true values are shown in the dashed vertical lines. Overall, we did okay.

As fun as this has all been, we’ve just been warming up.

Make the 2-timepoint data.

Before fitting the 2-timepoint longitudinal models, we’ll need to adjust the data, which currently contains values over six timepoints. Since it’s easy to think of 2-timepoint data in terms of pre and post, we’ll keep the data points for which time == 0 and time == 1.

small_data_long <-
  d %>% 
  filter(time == 0 | time == 1) %>% 
  select(-intercepts, -slopes) %>% 
  mutate(`pre/post` = factor(if_else(time == 0, "pre", "post"),
                             levels = c("pre", "post"))) 

head(small_data_long)
## # A tibble: 6 x 4
##      id  time     y `pre/post`
##   <int> <dbl> <dbl> <fct>     
## 1     1     0 1.16  pre       
## 2     1     1 3.49  post      
## 3     2     0 0.320 pre       
## 4     2     1 1.27  post      
## 5     3     0 0.879 pre       
## 6     3     1 0.971 post

As the name implies, the small_data_long data are still in the long format. Hoffman (2015) described this as the stacked format and Singer and Willett (2003) called this a person-period data set. Each level of id has two rows, one for each level of time, which is an explicit variable. In this formulation, time == 0 is the same as the “pre” timepoint and time == 1 is the same as “post.” To help clarify that, we added a pre/post column.

We’ll need a second variant of this data set, this time in the wide format.

small_data_wide <-
  small_data_long %>% 
  select(-time) %>% 
  pivot_wider(names_from = `pre/post`, values_from = y) %>% 
  mutate(change = post - pre)

head(small_data_wide)
## # A tibble: 6 x 4
##      id    pre   post change
##   <int>  <dbl>  <dbl>  <dbl>
## 1     1  1.16   3.49  2.33  
## 2     2  0.320  1.27  0.953 
## 3     3  0.879  0.971 0.0920
## 4     4 -0.979  0.586 1.57  
## 5     5 -0.825  1.85  2.68  
## 6     6 -0.912 -0.841 0.0715

With our small_data_wide data, each level of id only has one row. The time-structured y column was broken up into a pre and post column, and we no longer have a variable explicitly defining time. We have a new column, change, which is the result of subtracting pre from post. In her text, Hoffman referred to this type of data structure as the multivariate format and Singer and Willett called it a person-level data set.

The information is essentially the same in these two data sets, small_data_long and small_data_wide. Yet, the models supported by them will provide different insights.

2-timepoint longitudinal models

Before we start fitting and interpreting models, we should prepare ourselves with an overview.

Overview.

We will consider 20 ways to fit models based on 2-timepoint data. It seems like there multiple ways to categorize these. Here we’ll break them up into four groupings.

The first four model types will take post as the criterion:

  • \(\mathcal M_1 \colon\) The unconditional post model (post ~ 1),
  • \(\mathcal M_2 \colon\) The simple autoregressive model (post ~ 1 + pre),
  • \(\mathcal M_3 \colon\) The bivariate autoregressive model (bf(post ~ 1 + pre) + bf(pre ~ 1) + set_rescor(rescor = FALSE)), and
  • \(\mathcal M_4 \colon\) The bivariate correlational pre/post model (bf(post ~ 1) + bf(pre ~ 1) + set_rescor(rescor = TRUE)).

The next four model types will take change as the criterion:

  • \(\mathcal M_5 \colon\) The unconditional change-score model (change ~ 1) and
  • \(\mathcal M_6 \colon\) The conditional change-score model (change ~ 1 + pre).
  • \(\mathcal M_7 \colon\) The bivariate conditional change-score model (bf(change ~ 1 + pre) + bf(pre ~ 1) + set_rescor(rescor = FALSE)), and
  • \(\mathcal M_8 \colon\) The bivariate correlational pre/change model (bf(change ~ 1) + bf(pre ~ 1) + set_rescor(rescor = TRUE)).

The next eight model types will take y as the criterion:

  • \(\mathcal M_9 \colon\) The grand-mean model (y ~ 1),
  • \(\mathcal M_{10} \colon\) The random-intercept model (y ~ 1 + (1 | id)),
  • \(\mathcal M_{11} \colon\) The cross-classified model (y ~ 1 + (1 | id) + (1 | time)),
  • \(\mathcal M_{12} \colon\) The simple liner model (y ~ 1 + time),
  • \(\mathcal M_{13} \colon\) The liner model with a random intercept (y ~ 1 + time + (1 | id)),
  • \(\mathcal M_{14} \colon\) The liner model with a random slope (y ~ 1 + time + (0 + time | id)),
  • \(\mathcal M_{15} \colon\) The multilevel growth model with regularizing priors (y ~ 1 + time + (1 + time | id)), and
  • \(\mathcal M_{16} \colon\) The fixed effects with correlated error model (y ~ 1 + time + ar(time = time, p = 1, gr = id).

The final four model types will expand previous ones with robust variance parameters:

  • \(\mathcal M_{17} \colon\) The cross-classified model with robust variances for discrete time (bf(y ~ 1 + (1 | id) + (1 | time), sigma ~ 0 + factor(time))),
  • \(\mathcal M_{18} \colon\) The simple liner model with robust variance for linear time (bf(y ~ 1 + time, sigma ~ 1 + time)),
  • \(\mathcal M_{19} \colon\) The liner model with correlated random intercepts for \(\mu\) and \(\sigma\) (bf(y ~ 1 + time + (1 |x| id), sigma ~ 1 + (1 |x| id))), and
  • \(\mathcal M_{20} \colon\) The liner model with a random slope for \(\mu\) and uncorrelated random intercept for \(\sigma\) (bf(y ~ 1 + time + (0 + time | id), sigma ~ 1 + (1 | id))).

As far as these model names go, I’m making no claim they are canonical. Call them what you want. My goal, here, is to use names that are minimally descriptive and similar to the terms you might find used by other authors.

Models focusing on the second timepoint, post.

\(\mathcal M_1 \colon\) The unconditional post model.

We can write the unconditional post model as

\[ \begin{align*} \text{post}_i & \sim \operatorname{Normal}(\mu, \sigma) \\ \mu & = \beta_0, \end{align*} \]

where \(\beta_0\) is both the model intercept and the estimate for the mean value of post. The focus this model places on post comes at the cost of any contextual information on what earlier values we might compare post to. Also, since the only variable in the model is post, this technically is not a 2-timepoint model. But given its connection to the models to follow, it’s worth working through.

Here’s how to fit the model with brms.

m1 <-
  brm(data = small_data_wide,
      post ~ 1,
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m1)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: post ~ 1 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.02      0.12     0.79     1.25 1.00     6972     5683
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.17      0.08     1.02     1.36 1.00     8412     6599
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We might compare those parameters with their sample values.

small_data_wide %>% 
  summarise(mean = mean(post),
            sd = sd(post))
## # A tibble: 1 x 2
##    mean    sd
##   <dbl> <dbl>
## 1  1.02  1.16

If you only care about computing the population estimates for \(\mu_\text{post}\) and \(\sigma_\text{post}\), this model does a great job. With no other variables in the model, this approach does a poor job telling us about growth processes.

\(\mathcal M_2 \colon\) The simple autoregressive model.

The simple model with the pre scores predicting post is a substantial improvement from the previous one. It follows the form

\[ \begin{align*} \text{post}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 \text{pre}_i, \end{align*} \]

where the intercept \(\beta_0\) is the expected value of post when pre is at zero. As with many other regression contexts, centering the predictor pre at the mean or some other meaningful value can help make \(\beta_0\) more interpretable. Of greater interest is the \(\beta_1\) coefficient, which is the expected deviation from \(\beta_0\) for a one-unit increase in pre. But since pre and post are really the same variable y measured at two timepoints, it might be helpful if we express this model in another way. In perhaps more technical form, the simple model with pre predicting post is really an autoregressive model following the form

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \phi y_{t - 1,i}, \end{align*} \]

where the criterion \(y\) varies across persons \(i\) and timepoints \(t\). Here we only have two timepoints, \(\text{post} = t\) and \(\text{pre} = t - 1\). The strength of association between \(y_t\) and \(y_{t - 1}\) captured by the autoregressive parameter \(\phi\), which is often expressed in a correlation metric.

Here’s how to fit the model with brm().

m2 <-
  brm(data = small_data_wide,
      post ~ 1 + pre,
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m2)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: post ~ 1 + pre 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.05      0.12     0.83     1.28 1.00     8995     6957
## pre           0.24      0.10     0.04     0.43 1.00     9548     6404
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.15      0.08     1.00     1.32 1.00     8407     7109
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s compare the pre coefficient with the Pearson’s correlation between pre and post.

small_data_wide %>% 
  summarise(correlation = cor(pre, post))
## # A tibble: 1 x 1
##   correlation
##         <dbl>
## 1       0.232

Well look at that. Recall that the \(\beta_0\) parameter is the expected value in post when the predictor pre is at zero. Though the sample mean for pre is very close to zero, it’s not exactly so.

small_data_wide %>% 
  summarise(pre_mean = mean(pre))
## # A tibble: 1 x 1
##   pre_mean
##      <dbl>
## 1   -0.154

Here’s how to use that information to predict the mean value for post.

fixef(m2)[1, 1] + fixef(m2)[2, 1] * mean(small_data_wide$pre)
## [1] 1.015851

We can get a full posterior summary with aid from fitted().

nd <- tibble(pre = mean(small_data_wide$pre))

fitted(m2, newdata = nd)
##      Estimate Est.Error      Q2.5    Q97.5
## [1,] 1.015851 0.1142147 0.7939083 1.241721

\(\mathcal M_3 \colon\) The bivariate autoregressive model.

Though the simple autoregressive model gives us a sense of the strength of association between pre and post–and thus a sense of the stability in \(y\) over time–, it still lacks an explicit parameter for mean value of \(y\) at \(t - 1\). Enter the bivariate autoregressive model,

\[ \begin{align*} \text{post}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \text{pre}_i & \sim \operatorname{Normal}(\nu, \tau) \\ \mu_i & = \beta_0 + \beta_1 \text{pre}_i \\ \nu & = \gamma_0, \end{align*} \]

where \(\text{post}_i\) is still modeled as a simple linear function of \(\text{pre}_i\), but now we also include an unconditional model for \(\text{pre}_i\). This will give us an explicit comparison for where we started at the outset (\(\nu\)) and where we ended up (\(\mu_i\)). We can fit this model using the brms multivariate syntax where the two submodels are encased in bf() statements (Bürkner, 2021b). Also, be careful to use set_rescor(rescor = FALSE) to omit a residual correlation between the two. Their association is already handled with the \(\beta_1\) parameter.

m3 <-
  brm(data = small_data_wide,
      bf(post ~ 1 + pre) +
        bf(pre ~ 1) +
        set_rescor(rescor = FALSE),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m3)
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: post ~ 1 + pre 
##          pre ~ 1 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## post_Intercept     1.05      0.12     0.82     1.28 1.00    10717     7470
## pre_Intercept     -0.15      0.11    -0.38     0.07 1.00    12822     7706
## post_pre           0.24      0.10     0.04     0.43 1.00    11030     7641
## 
## Family Specific Parameters: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_post     1.15      0.08     1.00     1.33 1.00    10955     7694
## sigma_pre      1.16      0.08     1.01     1.33 1.00    13273     7703
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now that we’ve broken out the multivariate syntax, we might consider a second bivariate model.

\(\mathcal M_4 \colon\) The bivariate correlational model.

The bivariate correlational model follows the form

\[ \begin{align*} \begin{bmatrix} \text{post}_i \\ \text{pre}_i \end{bmatrix} & \sim \operatorname{MVNormal} \left (\begin{bmatrix} \mu \\ \nu \end{bmatrix}, \mathbf \Sigma \right) \\ \mu & = \beta_0 \\ \nu & = \gamma_0 \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma & 0 \\ 0 & \tau \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \end{align*} \]

where means of both pre and post are modeled in intercept-only models. However, the association between the two timepoints is captured in the residual correlation \(\rho\). Yet because we have no predictors in for either variable, the “residual” correlation is really just a correlation. We might also gain some insights if we re-express this model in terms of \(y_{ti}\) and \(y_{t - 1,i}\):

\[ \begin{align*} \begin{bmatrix} y_{ti} \\ y_{t - 1,i} \end{bmatrix} & \sim \operatorname{MVNormal} \left (\begin{bmatrix} \mu_t \\ \mu_{t - 1} \end{bmatrix}, \mathbf \Sigma \right) \\ \mu_t & = \beta_t \\ \mu_{t - 1} & = \beta_{t - 1} \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma_t & 0 \\ 0 & \sigma_{t - 1} \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \end{align*} \]

where what we formerly called an autoregressive coefficient in \(\mathcal M_2\), we’re now calling a correlation. Note also that this model freely estimates \(\sigma_t\) and \(\sigma_{t - 1}\). In some contexts, these are presumed to be equal. Though we won’t be imposing that constraint, here, I believe it is possible with the brms non-linear syntax (Bürkner, 2021c). Anyway, here’s how to fit the model with the brm() function .

m4 <-
  brm(data = small_data_wide,
      bf(post ~ 1) +
        bf(pre ~ 1) +
        set_rescor(rescor = TRUE),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)

Note out use of the set_rescor(rescor = TRUE) syntax in the model formula. This explicitly told brm() to include the residual correlation. Here’s the summary.

print(m4)
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: post ~ 1 
##          pre ~ 1 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## post_Intercept     1.02      0.12     0.79     1.24 1.00    10992     8033
## pre_Intercept     -0.15      0.12    -0.39     0.08 1.00    10679     8012
## 
## Family Specific Parameters: 
##            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_post     1.18      0.08     1.03     1.36 1.00    11342     7819
## sigma_pre      1.16      0.08     1.02     1.34 1.00    11150     7777
## 
## Residual Correlations: 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## rescor(post,pre)     0.23      0.09     0.03     0.40 1.00    10074     7292
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now the intercept and sigma parameters do a good job capturing the sample statistics.

small_data_wide %>% 
  pivot_longer(pre:post) %>% 
  group_by(name) %>% 
  summarise(mean = mean(value),
            sd = sd(value)) %>% 
  mutate_if(is.double, round, digits = 2)
## # A tibble: 2 x 3
##   name   mean    sd
##   <chr> <dbl> <dbl>
## 1 post   1.02  1.16
## 2 pre   -0.15  1.14

The new ‘rescor’ line at the bottom of the print() summary approximates the Pearson’s correlation of the two variables, much like the autoregressive parameter did two models up.

small_data_wide %>% 
  summarise(correlation = cor(pre, post))
## # A tibble: 1 x 1
##   correlation
##         <dbl>
## 1       0.232

Another nice quality of this model is if you subtract \(\gamma_0\) from \(\beta_0\) (i.e., \(\beta_t - \beta_{t - 1}\)), you’d end up with the posterior mean of the change score.

posterior_samples(m4) %>% 
  mutate(change = b_post_Intercept - b_pre_Intercept) %>% 
  summarise(mu_change = mean(change))
##   mu_change
## 1  1.168898

Keeping that in mind, let’s switch gears to the first of the change-score models.

Change-score models.

Instead of modeling post, \(y_{ti}\), we might instead want to focus on the change from pre to post, \(y_{ti} - y_{t - 1,i}\). When you subtract pre from post in your data set–like we did to make the change variable–, the product is often referred to as a change score or difference score, \(y_\Delta\). Though they’re conceptually intuitive and simple to compute, change scores have a long history of criticisms in the methodological literature, particularly around issues of reliability (see Lord & Novick, 1968; Rogosa et al., 1982; cf. Kisbu-Sakarya et al., 2013). Here we consider four change-score models simply as options.

\(\mathcal M_5 \colon\) The unconditional change-score model.

We’ve already saved that in our small_data_wide data as change. Here’s what the unconditional change-score model2 looks like:

\[ \begin{align*} \text{change}_i & \sim \operatorname{Normal}(\mu, \sigma) \\ \mu & = \beta_0, \end{align*} \]

where \(\beta_0\) is the expected value for \(\text{change}_i\). In the terms of the last model, \(\text{change}_i = \text{post}_i - \text{pre}_i\) or, in the terms of the simple autoregressive model, \(y_{\Delta i} = y_{ti} - y_{t - 1,i}\).

m5 <-
  brm(data = small_data_wide,
      change ~ 1,
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m5)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: change ~ 1 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.17      0.14     0.89     1.45 1.00     8075     6240
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.44      0.10     1.26     1.66 1.00     8673     6829
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Thus, this model suggests the average change from pre to post was about 1.2 units. We can compute the sample statistics for that in two ways.

small_data_wide %>% 
  summarise(change = mean(change),
            `post - pre` = mean(post - pre))
## # A tibble: 1 x 2
##   change `post - pre`
##    <dbl>        <dbl>
## 1   1.17         1.17

The major deficit in the unconditional change model is that change is disconnected from any reference points. We have no explicit way of knowing what number we changed from or what number we changed to. The next model offers a solution.

\(\mathcal M_6 \colon\) The conditional change-score model.

Instead of fitting an unconditional model of change, why not condition on the initial pre value? We might express this as

\[ \begin{align*} \text{change}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 \text{pre}_i, \end{align*} \]

where \(\beta_0\) is now the expected value for change when pre is at zero. As with the simple autoregressive model, centering the predictor pre at the mean or some other meaningful value can help make \(\beta_0\) more interpretable. Perhaps of greater interest, the \(\beta_1\) coefficient allows us to predict different levels of change, conditional in the initial values at pre.

m6 <-
  brm(data = small_data_wide,
      change ~ 1 + pre,
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m6)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: change ~ 1 + pre 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.05      0.12     0.82     1.28 1.00     9315     7108
## pre          -0.76      0.10    -0.96    -0.57 1.00    10559     7804
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.15      0.08     1.00     1.33 1.00     9745     7268
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Since the intercept in this model is the expected change value based on when pre == 0, it might be easiest to interpret that value when using the mean of pre.

fixef(m6)[1, 1] + fixef(m6)[2, 1] * mean(small_data_wide$pre)
## [1] 1.167458

That is the point prediction for the mean of change. Let’s compare that to the sample value.

small_data_wide %>% 
  summarise(mean = mean(change))
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  1.17

Of course, we can get a fuller summary using the fitted() method.

nd <- tibble(pre = mean(small_data_wide$pre))

fitted(m6, newdata = nd)
##      Estimate Est.Error      Q2.5   Q97.5
## [1,] 1.167458 0.1151598 0.9428394 1.39075

Note how the coefficient for pre is about -0.76. This is a rough analogue of the negative correlation among the intercepts and slopes in the original data-generating model.

rho
## [1] -0.5

It tells us something very similar; participants with higher values at pre tended to have lower change values. Much like with the simple autoregressive model, a deficit of this model is there is no explicit parameter for the expected value of pre, which we can amend by fitting a bivariate model.

\(\mathcal M_7 \colon\) The bivariate conditional change-score model.

The bivariate conditional change-score model follows the form

\[ \begin{align*} \text{change}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \text{pre}_i & \sim \operatorname{Normal}(\nu, \tau) \\ \mu_i & = \beta_0 + \beta_1 \text{pre}_i \\ \nu & = \gamma_0, \end{align*} \]

where the simple linear model of \(\text{change}_i\) conditional on \(\text{pre}_i\) is coupled with an unconditional intercept-only model for \(\text{pre}_i\). We can fit this model with brms by way of the multivariate syntax, where the two submodels are encased in bf() statements and we set set_rescor(rescor = FALSE) to omit a residual correlation between the two.

m7 <-
  brm(data = small_data_wide,
      bf(change ~ 1 + pre) +
        bf(pre ~ 1) +
        set_rescor(rescor = FALSE),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m7)
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: change ~ 1 + pre 
##          pre ~ 1 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## change_Intercept     1.05      0.12     0.82     1.28 1.00    11908     7575
## pre_Intercept       -0.16      0.12    -0.39     0.07 1.00    13113     8030
## change_pre          -0.76      0.10    -0.96    -0.57 1.00    12023     7190
## 
## Family Specific Parameters: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_change     1.15      0.08     1.00     1.33 1.00    13064     7696
## sigma_pre        1.16      0.08     1.01     1.33 1.00    11419     7142
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now we have an intercept for pre, we can use the model parameters to compute the expected values for pre, change, and post.

posterior_samples(m7) %>% 
  mutate(pre    = b_pre_Intercept,
         change = b_change_Intercept + b_change_pre * b_pre_Intercept) %>% 
  mutate(post = pre + change) %>% 
  pivot_longer(pre:post) %>% 
  group_by(name) %>% 
  mean_qi(value) %>% 
  mutate_if(is.double, round, digits = 2)
## # A tibble: 3 x 7
##   name   value .lower .upper .width .point .interval
##   <chr>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 change  1.17   0.88   1.45   0.95 mean   qi       
## 2 post    1.01   0.78   1.25   0.95 mean   qi       
## 3 pre    -0.16  -0.39   0.07   0.95 mean   qi

The posterior means are in the value column and the lower- and upper-levels of the percentile-based 95% intervals are in the .lower and .upper columns. Now compare those mean estimates with the sample means.

small_data_wide %>% 
  pivot_longer(-id) %>% 
  group_by(name) %>% 
  summarise(mean = mean(value)) %>% 
  mutate_if(is.double, round, digits = 2)
## # A tibble: 3 x 2
##   name    mean
##   <chr>  <dbl>
## 1 change  1.17
## 2 post    1.02
## 3 pre    -0.15

As handy as this model is, the \(\beta_1\) coefficient might not be in the most intuitive metric. Let’s reparameterize.

\(\mathcal M_8 \colon\) The bivariate correlational pre/change model.

The bivariate correlational pre/change model follows the form

\[ \begin{align*} \begin{bmatrix} \text{change}_i \\ \text{pre}_i \end{bmatrix} & \sim \operatorname{MVNormal} \left (\begin{bmatrix} \mu \\ \nu \end{bmatrix}, \mathbf \Sigma \right) \\ \mu & = \beta_0 \\ \nu & = \gamma_0 \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma & 0 \\ 0 & \tau \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \end{align*} \]

where means of both pre and change are modeled in intercept-only models and the association between the two is captured by the correlation \(\rho\). And again, because we have no predictors in for either variable, the “residual” correlation is really just a correlation. Here’s how to fit the model with brms.

m8 <-
  brm(data = small_data_wide,
      bf(change ~ 1) +
        bf(pre ~ 1) +
        set_rescor(rescor = TRUE),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m8)
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: change ~ 1 
##          pre ~ 1 
##    Data: small_data_wide (Number of observations: 100) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## change_Intercept     1.17      0.15     0.88     1.45 1.00     8200     7551
## pre_Intercept       -0.15      0.12    -0.38     0.08 1.00     8160     7194
## 
## Family Specific Parameters: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_change     1.44      0.10     1.26     1.67 1.00     8811     7147
## sigma_pre        1.16      0.08     1.01     1.33 1.00     8996     7487
## 
## Residual Correlations: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## rescor(change,pre)    -0.60      0.06    -0.71    -0.46 1.00     8299     7233
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now the parameter in the ‘Residual Correlations’ section of the summary output is a close analogue to our original data-generating rho parameter.

rho
## [1] -0.5

Those with higher pre values tended to have lower change values. We can look at that with a plot of the data.

p2 <-
  small_data_wide %>% 
  ggplot(aes(x = pre, y = change)) +
  geom_point() +
  stat_ellipse(color = "grey50")

(p1 + p2) &
  coord_cartesian(xlim = range(small_data_wide$pre),
                  ylim = range(small_data_wide$change))

Here we’ve placed the scatter plot of the data-generating id-level intercepts and slopes next to the scatter plot of the pre and change scores. They are not exactly the same, but the latter are a partial consequence of the former. This is why the correlation parameter in our m8 model closely, but not exactly, resembled the data-generating rho parameter.

Okay, let’s switch gears again.

Models modeling the criterion \(y_{ti}\) directly.

All of the models focusing on post or change used the wide version of the data, small_data_wide. The remaining models will all take advantage of the long data set, small_data_long, and take \(y_{ti}\) as the criterion. Consequently, most of these models will use some version of the multilevel model.

\(\mathcal M_9 \colon\) The grand-mean model.

The simplest model we might fit using the long version of the 2-timepoint data, small_data_long, is what we might call the grand-mean model, or what Hoffman (2015) called the between-person empty model. It follows the form

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu, \sigma) \\ \mu & = \beta_0, \end{align*} \]

where the intercept \(\beta_0\) is the expected value value for \(y\) across all \(i\) participants and \(t\) timepoints. We fit this with brm() much like we fit the unconditional post model and the unconditional change-score model.

m9 <-
  brm(data = small_data_long,
      y ~ 1 ,
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m9)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.43      0.09     0.25     0.61 1.00     8486     6236
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.30      0.07     1.18     1.43 1.00     9455     6868
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We might check these with the sample statistics for y.

small_data_long %>% 
  summarise(mean = mean(y),
            sd = sd(y))
## # A tibble: 1 x 2
##    mean    sd
##   <dbl> <dbl>
## 1 0.431  1.29

Though this model did to a good job describing the population values for the mean and standard deviation for y, it did a terrible job telling us about change in y, about individual differences in that change, or about anything else of interest we might have as longitudinal researchers.

\(\mathcal M_{10} \colon\) The random-intercept model.

Now that we have a grand mean, we might want to ask what kinds of variables would help explain the variation around the grand mean. From a multilevel perspective, the first source of variation of interest will be across participants, which we can express by allowing the mean to vary by participant. This is what Hoffman called the within-person empty model and the empty means, random intercept model. It follows the form

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + u_{\text{id},i} \\ u_\text{id} & \sim \operatorname{Normal}(0, \sigma_\text{id}) \\ \sigma & = \sigma_\epsilon, \end{align*} \]

where \(\beta_0\) is now the grand mean among the participant-level means in \(y_{ti}\). The participant-level deviations from the grand mean are expressed as \(u_{\text{id},i}\), which is normally distributed with a mean at zero (these are deviations, after all) and a standard deviation of \(\sigma_\text{id}\). The \(\sigma_\epsilon\) parameter is a mixture of the variation within participants and over time. With brms, we can fit the random-intercept model model like so.

m10 <-
  brm(data = small_data_long,
      y ~ 1 + (1 | id),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000,
      control = list(adapt_delta = .99))
print(m10)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + (1 | id) 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.23      0.15     0.01     0.56 1.00     2079     4188
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.43      0.09     0.24     0.61 1.00    13669     6976
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.27      0.07     1.14     1.41 1.00     6211     5523
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Our \(\beta_0\) intercept returned a very similar estimate for the grand mean, but we now interpret it as the grand mean for the within-person means for y. The variation in the id-level deviations around the grand mean, which we called \(u_{\text{id},i}\), is summarized by \(\sigma_\text{id}\) in the ‘Group-Level Effects’ section of the summary output.

Authors of many longitudinal text books (e.g., Hoffman, 2015; Singer & Willett, 2003) typically present this model as a way to directly compare the between- and within-person variation in the data by way of the intraclass correlation coefficient (ICC),

\[\text{ICC} = \frac{\text{between-person variance}}{\text{total variance}} = \frac{\sigma_\text{id}^2}{\sigma_\text{id}^2 + \sigma_\epsilon^2}.\]

When using frequentist methods, the ICC is typically expressed with a point estimate. When working with all our posterior draws, we can get full posterior distributions within the Bayesian framework.

posterior_samples(m10) %>% 
  mutate(icc = sd_id__Intercept^2 / (sd_id__Intercept^2 + sigma^2)) %>% 
  ggplot(aes(x = icc, y = 0)) +
  stat_halfeye(.width = .95) +
  scale_x_continuous("Intraclass correlation coefficient (ICC)", 
                     breaks = 0:5 / 5, expand = c(0, 0), limits = 0:1) +
  scale_y_continuous(NULL, breaks = NULL)

The ICC is a proportion, which limits it to the range of zero to one. Here it suggests that 0–20% of the variation in our data is due to differences between participants; the remaining variation occurs within them. Given that our data were collected across time, it might make sense to fit a model that explicitly accounts for time.

\(\mathcal M_{11} \colon\) The cross-classified model.

A direct extension of the random-intercept model is the cross-classified multilevel model (McElreath, 2015, Chapter 12), which we might express as

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma) \\ \mu_{ti} & = \beta_0 + u_{\text{id},i} + u_{\text{time},i} \\ u_\text{id} & \sim \operatorname{Normal}(0, \sigma_\text{id}) \\ u_\text{time} & \sim \operatorname{Normal}(0, \sigma_\text{time}) \\ \sigma & = \sigma_\epsilon, \end{align*} \]

where \(\sigma_\text{id}\) captures systemic differences between participants, \(\sigma_\text{time}\) captures systemic variation across the two timepoints, and \(\sigma_\epsilon\) captures the variation within participants over time. Another way to think of this model is as a Bayesian multilevel version of the repeated-measures ANOVA3, where variance is partitioned into a between level (\(\sigma_\text{id} \approx \text{SS}_\text{between}\)), a model level (\(\sigma_\text{time} \approx \text{SS}_\text{model}\)), and error (\(\sigma_\epsilon \approx \text{SS}_\text{error}\)).

m11 <-
  brm(data = small_data_long,
      y ~ 1 + (1 | time) + (1 | id),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000,
      control = list(adapt_delta = .9999,
                     max_treedepth = 11))
print(m11)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + (1 | time) + (1 | id) 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.51      0.16     0.14     0.78 1.00     1519     1545
## 
## ~time (Number of levels: 2) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.67      1.30     0.41     5.00 1.00     5240     7037
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.46      1.07    -1.78     2.81 1.00     4541     4935
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.04      0.08     0.90     1.20 1.00     2313     3750
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The intercept is the grand mean across all measures of y. The first row in the ‘Group-Level Effects’ section is our summary for \(\sigma_\text{id}\), which gives us a sense of the variation between participants in their overall tendencies in the criterion y. If we use the posterior_samples(), we can even look at the posteriors for the \(u_{\text{id},i}\) parameters, themselves.

posterior_samples(m11) %>% 
  pivot_longer(starts_with("r_id")) %>% 
  
  ggplot(aes(x = value, y = reorder(name, value))) +
  stat_pointinterval(point_interval = mean_qi, .width = .95, size = 1/6) +
  scale_y_discrete(expression(italic(i)), breaks = NULL) +
  labs(subtitle = expression(sigma[id]~is~the~summary~of~the~variation~across~these),
       x = expression(italic(u)[id][','*italic(i)]))

Given that each of the \(u_{\text{id},i}\) parameters is based primarily on two data points (the two data points per participant), it should be no surprise they are fairly wide. Even a few more measurement occasions within participants will narrow them substantially. If you fit the same model using the original 6-timepoint data, you’ll see the 95% intervals are almost half as wide.

Perhaps of greater interest are the \(u_{\text{time},i}\) parameters. If you combine them with the intercept, you’ll get the model-based expected values at both timepoints.

posterior_samples(m11) %>% 
  transmute(pre  = b_Intercept + `r_time[0,Intercept]`,
            post = b_Intercept + `r_time[1,Intercept]`) %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  mean_qi(value) %>% 
  mutate_if(is.double, round, digits = 2)
## # A tibble: 2 x 7
##   name  value .lower .upper .width .point .interval
##   <chr> <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 post   1.01   0.78   1.24   0.95 mean   qi       
## 2 pre   -0.15  -0.38   0.09   0.95 mean   qi

The final variance parameter, \(\sigma_\text{time}\), captures the within-participant variation over time. With a model like this, it seems natural to directly compare the magnitudes of the three variance parameters, which answers the question: Where’s the variance at? Here we’ll do so with a plot.

posterior_samples(m11) %>% 
  select(sd_id__Intercept:sigma) %>% 
  set_names("sigma[id]", "sigma[time]", "sigma[epsilon]") %>% 
  pivot_longer(everything()) %>% 
  mutate(name = factor(name, levels = c("sigma[epsilon]", "sigma[time]", "sigma[id]"))) %>%
  
  ggplot(aes(x = value, y = name)) +
  tidybayes::stat_halfeye(.width = .95, size = 1, normalize = "xy") +
  scale_x_continuous("marginal posterior", expand = expansion(mult = c(0, 0.05)), breaks = c(0, 1, 2, 5)) +
  scale_y_discrete(NULL, labels = ggplot2:::parse_safe) +
  coord_cartesian(xlim = c(0, 5.25),
                  ylim = c(1.5, 3.5)) +
  theme(axis.text.y = element_text(hjust = 0))

At first glance, it might be surprising how wide the \(\sigma_\text{time}\) posterior is compared to the other two. Yet recall this parameter is summarizing the standard deviation of only two levels. If you have experience with multilevel models, you’ll know that it can be difficult to estimate a variance parameter with few levels–two levels is the extreme lower limit. This is why we had to fiddle with the adapt_delta and max_treedepth parameters within the brm() function to get the model to sample properly. Though we pulled this off using default priors, don’t be surprised if you have to use tighter priors when fitting a model like this.

\(\mathcal M_{12} \colon\) The simple liner model.

The last three models focused on the grand mean and sources of variance around that grand mean. A more familiar looking approach might be to fit a simple linear model with y conditional on time,

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti}, \end{align*} \]

where the intercept \(\beta_0\) is the expected value at the first timepoint and \(\beta_1\) captures the change in \(y_{ti}\) for the final timepoint.

m12 <-
  brm(data = small_data_long,
      y ~ 1 + time,
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m12)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + time 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.15      0.12    -0.38     0.08 1.00     9459     7344
## time          1.17      0.16     0.85     1.49 1.00     9582     7294
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.16      0.06     1.05     1.28 1.00     9982     7362
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now our \(\beta_0\) and \(\beta_1\) parameters are the direct pre/post single-level analogues to the population-level \(\beta_0\) and \(\beta_1\) parameters from our original multilevel model based on the full 6-timepoint data set.

fixef(m0) %>% round(digits = 2)
##           Estimate Est.Error  Q2.5 Q97.5
## Intercept    -0.13      0.11 -0.33  0.08
## time          1.16      0.13  0.91  1.42

The major deficit in this model, which is the reason you’ll see it criticized in the methodological literature, is it ignores how the y values are nested within levels of id. The only variance parameter, \(\sigma\), was estimated under the typical assumption that the residuals are all independent of one another. Sure, the model formula accounted for the overall trend in time, but it ignored the insights revealed from many of the other models the capture between-participant correlations in intercepts and slopes. This means that if you know something about the value of one’s residual for when time == 0, you’ll also know something about where to expect their residual for when time == 1. The two are not independent. As long as we’re working with the data in the long format, we’ll want to account for this, somehow.

\(\mathcal M_{13} \colon\) The liner model with a random intercept.

A natural first step to accounting for how the y values are nested within levels of id is to fit a random-intercept model, or what Hoffman (2015) called the fixed linear time, random intercept model. It follows the form

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma) \\ \mu_{ti} & = \beta_{0i} + \beta_1 \text{time}_{ti} \\ \beta_{0i} & = \gamma_0 + u_{0i} \\ u_{0i} & \sim \operatorname{Normal}(0, \sigma_0) \\ \sigma & = \sigma_\epsilon, \end{align*} \]

where \(\beta_{0i}\) is the intercept and \(\beta_1\) is the time slope. Although this parameterization holds the variation in slopes constant across participants, between-participant variation is at least captured in \(\beta_{0i}\), which is decomposed into a grand mean, \(\gamma_0\), and participant-level deviations around that grand mean, \(u_{0i}\). Those participant-level deviations are summarized by the \(\sigma_0\) parameter. In this model, \(\sigma_\epsilon\) is a mixture of within-participant variation and between-participant variation in slopes.

m13 <-
  brm(data = small_data_long,
      y ~ 1 + time + (1 | id),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m13)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + time + (1 | id) 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.52      0.15     0.14     0.78 1.00     1645     1960
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.15      0.12    -0.38     0.08 1.00     9441     7715
## time          1.17      0.15     0.88     1.46 1.00    14821     6993
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.03      0.08     0.90     1.19 1.00     2403     4360
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The \(\beta\) parameters are very similar to those from the simple linear model, above.

fixef(m12) %>% round(digits = 2)
##           Estimate Est.Error  Q2.5 Q97.5
## Intercept    -0.15      0.12 -0.38  0.08
## time          1.17      0.16  0.85  1.49

But now look at the size of \(\sigma_0\), which suggests substantial differences in staring points. To get a sense of what this means, we’ll plot all 100 participant-level trajectories with a little help from fitted().

nd <- distinct(small_data_long, id, time)

fitted(m13, 
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  ggplot(aes(x = time, y = Estimate, group = id)) +
  geom_abline(intercept = fixef(m13)[1, 1],
              slope = fixef(m13)[2, 1],
              size = 3, color = "blue") +
  geom_line(size = 1/4, alpha = 2/3) +
  scale_x_continuous(breaks = 0:1) +
  labs(subtitle = "Random intercepts, fixed slope",
       y = "y") +
  coord_cartesian(ylim = c(-1, 2))

The bold blue line in the middle is based on the population-level intercept and slope, whereas the thinner black lines are the participant-level trajectories. To keep from overplotting, we’re only showing the posterior means, here. Because we only allowed the intercept to vary across participants, all the slopes are identical. And indeed, look at all the variation we see in the intercepts–an insight lacking in the simple linear model.

\(\mathcal M_{14} \colon\) The liner model with a random slope.

The counterpoint to the last model is to allow the time slopes, but not the intercepts, vary across participants:

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma) \\ \mu_{ti} & = \beta_0 + \beta_{1i} \text{time}_{ti} \\ \beta_{1i} & = \gamma_1 + u_{1i} \\ u_{1i} & \sim \operatorname{Normal}(0, \sigma_1) \\ \sigma & = \sigma_\epsilon, \end{align*} \]

where \(\beta_0\) is the intercept for all participants. Now \(\beta_{1i}\) is the population mean for the distribution of slopes, which vary across participants, the standard deviation for which is measured by \(\sigma_1\).

m14 <-
  brm(data = small_data_long,
      y ~ 1 + time + (0 + time | id),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000,
      control = list(adapt_delta = .95))
print(m14)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + time + (0 + time | id) 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(time)     0.33      0.21     0.02     0.75 1.00     2240     4641
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.15      0.11    -0.38     0.08 1.00    20513     7195
## time          1.17      0.16     0.85     1.49 1.00    18079     7133
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.13      0.06     1.01     1.26 1.00     6825     6149
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Here’s random-slopes alternative to the random-intercepts plot from the last section.

fitted(m14, 
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  ggplot(aes(x = time, y = Estimate, group = id)) +
  geom_abline(intercept = fixef(m14)[1, 1],
              slope = fixef(m14)[2, 1],
              size = 3, color = "blue") +
  geom_line(size = 1/4, alpha = 2/3) +
  scale_x_continuous(breaks = 0:1) +
  labs(subtitle = "Fixed intercept, random slopes",
       y = "y") +
  coord_cartesian(ylim = c(-1, 2))

But why choose between random intercepts or random slopes?

\(\mathcal M_{15} \colon\) The multilevel growth model with regularizing priors.

If you were modeling 2-timepoint data with conventional frequentist estimators (e.g., maximum likelihood), you can have random intercepts or random slopes, but you can’t have both; that would require data from three timepoints or more. But because Bayesian models bring in extra information by way of the priors, you can actually fit a full multilevel growth model with both random intercepts and slopes:

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma_\epsilon ) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti} + u_{0i} + u_{1i} \text{time}_{ti} \\ \begin{bmatrix} u_{0i} \\ u_{1i} \end{bmatrix} & \sim \operatorname{MVNormal} \left (\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf \Sigma \right) \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma_0 & 0 \\ 0 & \sigma_1 \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. \end{align*} \]

The trick is you have to go beyond the diffuse brms default settings for the priors for \(\sigma_0\) and \(\sigma_1\). If you have high-quality information from theory or previous studies, you can base the priors on those. Another approach is to use regularizing priors. Given standardized data, members of the Stan team like either \(\operatorname{Normal}^+(0, 1)\) or \(\operatorname{Student-t}^+(3, 0, 1)\) for variance parameters (see the Generic prior for anything section from the Prior choice recommendations wiki). In the second edition of his text, McElreath generally favored the \(\operatorname{Exponential}(1)\) prior (McElreath, 2020), which is the approach we’ll experiment with, here. It’ll also help if we use a regularizing prior on the correlation among the intercepts and slopes, \(\rho\).

m15 <-
  brm(data = small_data_long,
      y ~ 1 + time + (1 + time | id),
      seed = 1,
      prior = prior(exponential(1), class = sd) +
        prior(lkj(4), class = cor),
      cores = 4, chains = 4, iter = 3500, warmup = 1000,
      control = list(adapt_delta = .9995))

Even with our tighter priors, we still had to adjust the adapt_delta parameter to improve the quality of the MCMC sampling. Take a look at the model summary.

print(m15)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 1 + time + (1 + time | id) 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)           0.52      0.18     0.12     0.84 1.01     1012     1473
## sd(time)                0.31      0.23     0.01     0.88 1.00     1021      876
## cor(Intercept,time)    -0.05      0.33    -0.63     0.59 1.00     4039     6272
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.15      0.11    -0.38     0.07 1.00    10022     8348
## time          1.17      0.15     0.88     1.46 1.00    13870     7175
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.01      0.10     0.80     1.18 1.00     1060      946
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Though the posteriors, particularly for the \(\sigma\) and \(\rho\) parameters, are not as precise as with the 6-timepoint data, we now have a model with a summary mirroring the structure of the data-generating model. Yet compared to the data-generating values, the estimates for \(\sigma_1\) and \(\rho\) are particularly biased toward zero. Here’s a look at the trajectories.

fitted(m15,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  ggplot(aes(x = time, y = Estimate, group = id)) +
  geom_abline(intercept = fixef(m15)[1, 1],
              slope = fixef(m15)[2, 1],
              size = 3, color = "blue") +
  geom_line(size = 1/4, alpha = 2/3) +
  scale_x_continuous(breaks = 0:1) +
  labs(subtitle = "Random intercepts AND random slopes\n(2-timepoint data)",
       y = "y") +
  coord_cartesian(ylim = c(-1, 2))

For comparision, here’s the plot for the original 6-timepoint model.

fitted(m0,
       newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  
  ggplot(aes(x = time, y = Estimate, group = id)) +
  geom_abline(intercept = fixef(m0)[1, 1],
              slope = fixef(m0)[2, 1],
              size = 3, color = "blue") +
  geom_line(size = 1/4, alpha = 2/3) +
  scale_x_continuous(breaks = 0:1) +
  labs(subtitle = "Random intercepts AND random slopes\n(6-timepoint data)",
       y = "y") +
  coord_cartesian(ylim = c(-1, 2))

There wasn’t enough information in the 2-timepoint data set to capture the complexity in the full 6-timepoint data set.

\(\mathcal M_{16} \colon\) The fixed effects with correlated error model.

Though our Bayesian 2-timepoint version of the full multilevel growth model was exciting, it’s not generally used in the wild. Even with our tighter regularizing priors, there just wasn’t enough information in the data to do the model justice. A very different and humbler approach is to combine the simple linear model with the autoregressive model,

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \mathbf \Sigma) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti} \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma & 0 \\ 0 & \sigma \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \end{align*} \]

where \(\rho\) captures the correlation between the responses in the two timepoints, \(t\) and \(t - 1\), which is an alternative to the way the mixed model from above handles the dependencies (a.k.a heteroskedasticity) inherent in longitudinal data. Note how \(\mathbf \Sigma\) in this model is defined very differently from the full multilevel growth model from above. To fit this model with brms, we use the ar() syntax.

m16 <-
  brm(data = small_data_long,
      y ~ time + ar(time = time, p = 1, gr = id),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m16)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ time + ar(time = time, p = 1, gr = id) 
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Correlation Structures:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## ar[1]     0.23      0.10     0.04     0.43 1.00     9766     7458
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.15      0.12    -0.38     0.07 1.00    10593     7136
## time          1.17      0.15     0.89     1.46 1.00     9607     6546
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.15      0.06     1.04     1.27 1.00    10604     7665
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

This summary suggests that, after you account for the linear trend, the correlation between \(y_{ti}\) and \(y_{t - 1,i}\) is about 0.23. Though we don’t get id-specific variance parameters, this model does account for the nonindependence of the data over time. If you scroll back up, notice how similar this is to the correlation from the bivariate correlational model, m4.

The models with robust variance parameters.

In the social sciences, many of our theories and statistical models are comparisons of or changes in group means. Every model in this blog post uses the normal likelihood, which parameterizes the criterion in terms of \(\mu\) and \(\sigma\). Every time we added some kind of linear model, we focused that model around the \(\mu\). But contemporary Bayesian software allows us to model the \(\sigma\) parameter, too. Within the brms framework, Bürkner (2021a) calls these distributional models. The final four models under consideration all use some form of the distributional modeling syntax to relax unnecessarily restrictive assumptions on the variance parameters. Though this section is not exhaustive, it should give a sense of how flexible this approach can be.

\(\mathcal M_{17} \colon\) The cross-classified model with robust variances for discrete time.

One of the criticisms of the conventional repeated measures ANOVA approach is how it presumes the variances in \(y\) are constant over time. However, we can relax that constraint with a model like

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma_{ti}) \\ \mu_{ti} & = \beta_0 + u_{\text{id},i} + u_{\text{time},t} \\ \log (\sigma_{ti}) & = \eta_{\text{time},t} \\ u_\text{id} & \sim \operatorname{Normal}(0, \sigma_\text{id}) \\ u_\text{time} & \sim \operatorname{Normal}(0, \sigma_\text{time}), \end{align*} \]

where we are now modeling both parameters in the likelihood, \(\mu\) AND \(\sigma\). The second line shows a typical-looking model for \(\mu_{ti}\). All the excitement lies in the third line, which contains the linear model for \(\log (\sigma_{ti})\). The reason we are modeling \(\log (\sigma_{ti})\) rather than directly modeling \(\sigma\) is to avoid solutions that predict negative values for \(\sigma\). For this model, it’s unlikely we’d run into that problem. But since the brms default is to use the log link anytime we model \(\sigma\) within the distributional modeling syntax, we’ll just get used to the log link right from the start. If you are unfamiliar with link functions, they’re widely used within the generalized linear modeling framework. Logistic regression with the logit link and Poisson regression with the log link are two widely-used examples. For more on link functions and the generalized linear model, check out the texts by Agresti (2015); Gelman, Hill, and Vehtari (2020); and McElreath (2020, 2015).

Anyway, the linear model for \(\mu_{ti}\) is exactly the same as with the original cross-classified model, m11; it includes a grand mean (\(\beta_0\)) and two kinds of deviations around that grand mean (\(u_{\text{id},i}\) and \(u_{\text{time},t}\)). The model for \(\log (\sigma_{ti})\) contains an intercept, which varies across the two levels of time, \(\eta_{\text{time},t}\). Here’s how to fit the model with brms::brm().

m17 <-
  brm(data = small_data_long,
      bf(y ~ 1 + (1 | time) + (1 | id),
         sigma ~ 0 + factor(time)),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000,
      control = list(adapt_delta = .999,
                     max_treedepth = 12))
print(m17)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: y ~ 1 + (1 | time) + (1 | id) 
##          sigma ~ 0 + factor(time)
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.53      0.15     0.15     0.79 1.00     1376     1194
## 
## ~time (Number of levels: 2) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.66      1.26     0.41     5.00 1.00     5133     6854
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept             0.45      1.07    -1.78     2.66 1.00     4653     5056
## sigma_factortime0     0.01      0.10    -0.18     0.20 1.00     3068     5577
## sigma_factortime1     0.03      0.10    -0.17     0.22 1.00     2678     4488
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Even though we didn’t explicitly ask to use the log link in our brm() syntax, you can look at the second line in the print() output to see that it was automatically used. Though I won’t explore how to do so, here, one can fit this model without the log link. Anyway, the primary focus in this model is the sigma_factortime0 and sigma_factortime1 lines in the ‘Population-Level Effects’ section of the output. Those are the summaries for the \(\sigma\) parameters, conditional on whether time == 0 or time == 1. Though is might be difficult to evaluate parameters on the log scale, we can simply exponentiate them to convert them back to their natural metric.

fixef(m17)[2:3, c(1, 3:4)] %>% exp()
##                   Estimate      Q2.5    Q97.5
## sigma_factortime0 1.013928 0.8352872 1.223336
## sigma_factortime1 1.032290 0.8446723 1.247435

In this case, it looks like the two parameters are largely overlapping. If we work directly with the posterior draws, we can compute a formal difference score and plot the results.

posterior_samples(m17) %>% 
  mutate(`sigma[time==0]` = exp(b_sigma_factortime0),
         `sigma[time==1]` = exp(b_sigma_factortime1),
         `sigma[time==1]-sigma[time==0]` = exp(b_sigma_factortime1) - exp(b_sigma_factortime0)) %>% 
  pivot_longer(`sigma[time==0]`:`sigma[time==1]-sigma[time==0]`) %>% 
  
  ggplot(aes(x = value, y = name)) + 
  stat_halfeye(.width = .95) +
  scale_y_discrete(NULL, labels = ggplot2:::parse_safe) +
  coord_cartesian(ylim = c(1.5, 3.1)) +
  xlab("marginal posterior") +
  theme(axis.text.y = element_text(hjust = 0))

In this case, it looks like there was little difference between \(\sigma_{\text{time} = 0}\) and \(\sigma_{\text{time} = 1}\). This shouldn’t be a surprise; we simulated that data that way. However, it won’t always be like this in real-world data.

\(\mathcal M_{18} \colon\) The simple liner model with robust variance for linear time.

Our cross-classified approach treated time as a factor. Here we’ll treat it as a continuous variable in the models of both \(\mu\) and \(\sigma\). This will be a straight extension of the simple linear model, m12, following the form

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma_{ti}) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti} \\ \log(\sigma_{ti}) & = \eta_0 + \eta_1 \text{time}_{ti}, \end{align*} \]

where the intercept \(\beta_0\) is the expected value at the first timepoint and \(\beta_1\) captures the change in \(y_i\) for the final timepoint. Now \(\log(\sigma_{ti})\) has a similar linear model, where \(\eta_0\) is the expected log of the standard deviation at the first timepoint and \(\eta_1\) captures the change standard deviation for the final timepoint.

m18 <-
  brm(data = small_data_long,
      bf(y ~ 1 + time, 
         sigma ~ 1 + time),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)
print(m18)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: y ~ 1 + time 
##          sigma ~ 1 + time
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          -0.15      0.11    -0.38     0.06 1.00    11620     6673
## sigma_Intercept     0.14      0.07     0.00     0.28 1.00    11090     7442
## time                1.17      0.16     0.85     1.49 1.00    11137     7818
## sigma_time          0.01      0.10    -0.18     0.21 1.00    11157     7620
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Even though this model looks very different from the last one, we can wrangle the posterior draws a little to make a similar plot comparing \(\sigma\) at the two timepoints.

posterior_samples(m18) %>% 
  mutate(`sigma[time==0]` = exp(b_sigma_Intercept),
         `sigma[time==1]` = exp(b_sigma_Intercept + b_sigma_time * 1),
         `sigma[time==1]-sigma[time==0]` = exp(b_sigma_Intercept) - exp(b_sigma_Intercept + b_sigma_time)) %>% 
  pivot_longer(`sigma[time==0]`:`sigma[time==1]-sigma[time==0]`) %>% 
  
  ggplot(aes(x = value, y = name)) + 
  stat_halfeye(.width = .95) +
  scale_y_discrete(NULL, labels = ggplot2:::parse_safe) +
  coord_cartesian(ylim = c(1.5, 3.1)) +
  xlab("marginal posterior") +
  theme(axis.text.y = element_text(hjust = 0))

Though this model is robust to differences in \(\sigma\) based on timepoint, it still ignores systemic differences across participants. The next model tackles that that limitation in spades.

\(\mathcal M_{19} \colon\) The liner model with correlated random intercepts for \(\mu\) and \(\sigma.\)

Here we return to the multilevel model framework to accommodate participant-level differences for both \(\mu\) and \(\sigma\):

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma_{ti}) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti} + u_{0i} \\ \log(\sigma_{ti}) & = \eta_0 + u_{2i} \\ \begin{bmatrix} u_{0i} \\ u_{2i} \end{bmatrix} & \sim \operatorname{MVNormal} \left (\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \mathbf \Sigma \right) \\ \mathbf \Sigma & = \mathbf{SRS} \\ \mathbf S & = \begin{bmatrix} \sigma_0 & 0 \\ 0 & \sigma_2 \end{bmatrix} \\ \mathbf R & = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \end{align*} \]

where \(\beta_0\) is the grand mean for the intercepts and \(u_{0i}\) are the participant-level deviations around that grand mean. \(\beta_1\) is the time slope, which is invariant across all participants in this model. The \(\eta_0\) parameter is the grand mean for the log standard deviations and \(u_{2i}\) captures the participant-level deviations around that grand mean. In the fourth line, we learn that \(u_{0i}\) and \(u_{2i}\) are multivariate normal, with a mean vector of two zeros and a variance/covariance matrix \(\mathbf \Sigma\). As is typical within the brms framework, we decompose \(\mathbf \Sigma\) into a variance matrix \(\mathbf S\) and correlation matrix \(\mathbf R\). Of particular interest is the \(\rho\) parameter, which captures the correlation in the participant-level intercepts and participant-level standard deviations. Here’s how to fit the model with brms.

m19 <-
  brm(data = small_data_long,
      bf(y ~ 1 + time + (1 |x| id),
         sigma ~ 1 + (1 |x| id)),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000,
      control = list(adapt_delta = .9))

You may have noticed the |x| parts in the formula lines for y and sigma. What that did was tell brms we wanted those parameters to be correlated. That is, that’s how we estimated the \(\rho\) parameter. There was nothing special about including x between the vertical lines. We could have used any other character. The important thing is that we used the same character in both. Anyway, here’s the model summary.

print(m19)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: y ~ 1 + time + (1 | x | id) 
##          sigma ~ 1 + (1 | x | id)
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##                                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                      0.52      0.16     0.14     0.78 1.00     1496     1443
## sd(sigma_Intercept)                0.11      0.08     0.01     0.29 1.00     3032     4589
## cor(Intercept,sigma_Intercept)    -0.34      0.49    -0.97     0.80 1.00     5949     6097
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          -0.15      0.12    -0.38     0.08 1.00    10102     7923
## sigma_Intercept     0.02      0.08    -0.13     0.17 1.00     2344     3931
## time                1.17      0.15     0.88     1.45 1.00    14868     8009
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Once again, brms used the log link for \(\sigma\). If you want to see \(\eta_0\) in its natural \(\sigma\) metric, exponentiate.

fixef(m19)["sigma_Intercept", c(1, 3:4)] %>% exp()
##  Estimate      Q2.5     Q97.5 
## 1.0171275 0.8759026 1.1849572

The sd(sigma_Intercept) row in the ‘Group-Level Effects’ section shows the variation in those \(\log \sigma\)’s. It might be easier to appreciate them in a plot.

posterior_samples(m19) %>% 
  pivot_longer(starts_with("r_id__sigma")) %>% 
  mutate(sigma_i = exp(sd_id__sigma_Intercept + value)) %>% 
  
  ggplot(aes(x = sigma_i, y = reorder(name, sigma_i))) +
  stat_pointinterval(point_interval = mean_qi, .width = .95, size = 1/6) +
  scale_y_discrete(expression(italic(i)), breaks = NULL) +
  labs(subtitle = expression(sigma[2]~summarizes~the~variation~across~these),
       x = expression(sigma[italic(i)]))

In this case, there was not a lot of variation in \(\sigma\) across participants. This is because we simulated the data that way. Though it may be hard to model participant-level variances with 2-timepoint data, I have found it comes in handy in real-world data sets based on more measurement occasions.

Finally, it might be useful to consider our \(\rho\) parameter, which suggested a mild negative correlation between the \(u_{0i}\) and \(u_{2i}\) deviations. Here’s how you might visualize that in a plot.

bind_rows(
  posterior_samples(m19) %>% select(starts_with("r_id[")) %>% set_names(1:100),
  posterior_samples(m19) %>% select(starts_with("r_id__sigma") %>% set_names(1:100))
) %>% 
  mutate(iter = rep(1:c(n() / 2), times = 2),
         type = rep(c("intercept", "log_sigma"), each = n() / 2)) %>% 
  pivot_longer(-c(iter, type)) %>% 
  pivot_wider(names_from = type, values_from = value) %>% 
  
  ggplot(aes(x = intercept, y = log_sigma, group = name)) +
  stat_ellipse(geom = "polygon", level = .01, alpha = 1/4) +
  labs(x = expression(italic(u)[0][italic(i)]),
       y = expression(log(italic(u)[2][italic(i)])))

Each of the ovals is a 1% ellipse of the bivariate posterior for \(u_{0i}\) and \(\log(u_{2i})\). Notice how using ellipses helps reveal the differences in the between- and within-person patterns.

This approach where residual variance parameters vary across participants has its origins in the work of Donald Hedeker and colleagues (Hedeker et al., 2008, 2012). More recently, Philippe Rast and colleagues (particularly graduate student, Donald Williams) have adapted this approach for use within the Stan/brms ecosystem (Williams, Liu, et al., 2019; Williams, Rouder, et al., 2019; Williams, Martin, et al., 2019; Williams, Zimprich, et al., 2019).

\(\mathcal M_{20} \colon\) The liner model with a random slope for \(\mu\) and uncorrelated random intercept for \(\sigma\).

Though we can find interesting things when we allow the random components in the \(\mu\) and \(\sigma\) models, we don’t have to think of them as covarying. Here we fit an extension of the linear model with a random time slope, where we add an orthogonal random intercept for \(\sigma\):

\[ \begin{align*} y_{ti} & \sim \operatorname{Normal}(\mu_{ti}, \sigma_{ti}) \\ \mu_{ti} & = \beta_0 + \beta_1 \text{time}_{ti} + u_{1i} \\ \log(\sigma_{ti}) & = \eta_0 + u_{2i} \\ u_{1i} & \sim \operatorname{Normal}(0, \sigma_1) \\ u_{2i} & \sim \operatorname{Normal}(0, \sigma_2), \end{align*} \]

where the two random components, \(u_{1i}\) and \(u_{2i}\), are now modeled with separate normal distributions, \(\operatorname{Normal}(0, \sigma_1)\) and \(\operatorname{Normal}(0, \sigma_2)\).

m20 <-
  brm(data = small_data_long,
      bf(y ~ 1 + time + (0 + time | id),
         sigma ~ 1 + (1 | id)),
      seed = 1,
      cores = 4, chains = 4, iter = 3500, warmup = 1000)

Note that in sharp contrast with our syntax for the previous model, this time we did not employ the |x| syntax in the formula lines for y and sigma. By omitting the |x| syntax, we omitted the correlation among those two random effects. Here’s the summary.

print(m20)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: y ~ 1 + time + (0 + time | id) 
##          sigma ~ 1 + (1 | id)
##    Data: small_data_long (Number of observations: 200) 
## Samples: 4 chains, each with iter = 3500; warmup = 1000; thin = 1;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~id (Number of levels: 100) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(time)                0.33      0.21     0.02     0.75 1.00     2031     4346
## sd(sigma_Intercept)     0.12      0.09     0.00     0.32 1.00     2463     3519
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          -0.15      0.11    -0.37     0.07 1.00    13451     7317
## sigma_Intercept     0.11      0.06    -0.02     0.23 1.00     4612     4412
## time                1.18      0.16     0.86     1.49 1.00    12600     7054
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We can see the \(\sigma_1\) and \(\sigma_2\) summaries in the ‘Group-Level Effects’ section. It’s hard to compare them directly, because one is based on parameters in the log metric. But we can at least get a sense of what these parameters are summarizing by plotting the bivariate posterior for \(u_{1i}\) and \(\log(u_{2i})\).

bind_rows(
  posterior_samples(m20) %>% select(starts_with("r_id[")) %>% set_names(1:100),
  posterior_samples(m20) %>% select(starts_with("r_id__sigma") %>% set_names(1:100))
) %>% 
  mutate(iter = rep(1:c(n() / 2), times = 2),
         type = rep(c("slope", "log_sigma"), each = n() / 2)) %>% 
  pivot_longer(-c(iter, type)) %>% 
  pivot_wider(names_from = type, values_from = value) %>% 
  
  ggplot(aes(x = slope, y = log_sigma, group = name)) +
  stat_ellipse(geom = "polygon", level = .01, alpha = 1/4) +
  labs(x = expression(italic(u)[1][italic(i)]),
       y = expression(log(italic(u)[2][italic(i)])))

Notice how, this time, the 1% ellipses suggest no clear association between these two dimensions.

Next steps

As promised, here I recommend some resources for understanding the models in this post.

Books focusing on longutidinal data analysis.

Books introducing regression.

Session info

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.1 tidybayes_2.3.1 brms_2.15.0     Rcpp_1.0.6      forcats_0.5.1   stringr_1.4.0  
##  [7] dplyr_1.0.5     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3     tibble_3.1.0    ggplot2_3.3.3  
## [13] tidyverse_1.3.0
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1         backports_1.2.1      plyr_1.8.6           igraph_1.2.6        
##   [5] splines_4.0.4        svUnit_1.0.3         crosstalk_1.1.0.1    TH.data_1.0-10      
##   [9] rstantools_2.1.1     inline_0.3.17        digest_0.6.27        htmltools_0.5.1.1   
##  [13] rsconnect_0.8.16     fansi_0.4.2          magrittr_2.0.1       modelr_0.1.8        
##  [17] RcppParallel_5.0.2   matrixStats_0.57.0   xts_0.12.1           sandwich_3.0-0      
##  [21] prettyunits_1.1.1    colorspace_2.0-0     rvest_0.3.6          ggdist_2.4.0.9000   
##  [25] haven_2.3.1          xfun_0.22            callr_3.5.1          crayon_1.4.1        
##  [29] jsonlite_1.7.2       lme4_1.1-25          survival_3.2-10      zoo_1.8-8           
##  [33] glue_1.4.2           gtable_0.3.0         emmeans_1.5.2-1      V8_3.4.0            
##  [37] distributional_0.2.2 pkgbuild_1.2.0       rstan_2.21.2         abind_1.4-5         
##  [41] scales_1.1.1         mvtnorm_1.1-1        DBI_1.1.0            miniUI_0.1.1.1      
##  [45] xtable_1.8-4         stats4_4.0.4         StanHeaders_2.21.0-7 DT_0.16             
##  [49] htmlwidgets_1.5.2    httr_1.4.2           threejs_0.3.3        arrayhelpers_1.1-0  
##  [53] ellipsis_0.3.1       farver_2.0.3         pkgconfig_2.0.3      loo_2.4.1           
##  [57] dbplyr_2.0.0         utf8_1.1.4           labeling_0.4.2       tidyselect_1.1.0    
##  [61] rlang_0.4.10         reshape2_1.4.4       later_1.1.0.1        munsell_0.5.0       
##  [65] cellranger_1.1.0     tools_4.0.4          cli_2.3.1            generics_0.1.0      
##  [69] broom_0.7.5          ggridges_0.5.2       evaluate_0.14        fastmap_1.0.1       
##  [73] yaml_2.2.1           processx_3.4.5       knitr_1.31           fs_1.5.0            
##  [77] nlme_3.1-152         mime_0.10            projpred_2.0.2       xml2_1.3.2          
##  [81] compiler_4.0.4       bayesplot_1.8.0      shinythemes_1.1.2    rstudioapi_0.13     
##  [85] gamm4_0.2-6          curl_4.3             reprex_0.3.0         statmod_1.4.35      
##  [89] stringi_1.5.3        highr_0.8            ps_1.6.0             blogdown_1.3        
##  [93] Brobdingnag_1.2-6    lattice_0.20-41      Matrix_1.3-2         nloptr_1.2.2.2      
##  [97] markdown_1.1         shinyjs_2.0.0        vctrs_0.3.6          pillar_1.5.1        
## [101] lifecycle_1.0.0      bridgesampling_1.0-0 estimability_1.3     httpuv_1.5.4        
## [105] R6_2.5.0             bookdown_0.21        promises_1.1.1       gridExtra_2.3       
## [109] codetools_0.2-18     boot_1.3-26          colourpicker_1.1.0   MASS_7.3-53         
## [113] gtools_3.8.2         assertthat_0.2.1     withr_2.4.1          shinystan_2.5.0     
## [117] multcomp_1.4-16      mgcv_1.8-33          parallel_4.0.4       hms_0.5.3           
## [121] grid_4.0.4           coda_0.19-4          minqa_1.2.4          rmarkdown_2.7       
## [125] shiny_1.5.0          lubridate_1.7.9.2    base64enc_0.1-3      dygraphs_1.1.1.6

References

Agresti, A. (2015). Foundations of linear and generalized linear models. John Wiley & Sons. https://www.wiley.com/en-us/Foundations+of+Linear+and+Generalized+Linear+Models-p-9781118730034
Bürkner, P.-C. (2021a). Estimating distributional models with brms. https://CRAN.R-project.org/package=brms/vignettes/brms_distreg.html
Bürkner, P.-C. (2021b). Estimating multivariate models with brms. https://CRAN.R-project.org/package=brms/vignettes/brms_multivariate.html
Bürkner, P.-C. (2021c). Estimating non-linear models with brms. https://CRAN.R-project.org/package=brms/vignettes/brms_nonlinear.html
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/RJ-2018-017
Bürkner, P.-C. (2020). brms: Bayesian regression models using ’Stan. https://CRAN.R-project.org/package=brms
DeBruine, L. M., & Barr, D. J. (2020). Understanding mixed effects models through data simulation. https://doi.org/10.1177/2515245920965119
Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and other stories. Cambridge University Press. https://doi.org/10.1017/9781139161879
Hedeker, D., Mermelstein, R. J., & Demirtas, H. (2008). An application of a mixed-effects location scale model for analysis of ecological momentary assessment (EMA) data. Biometrics, 64(2), 627–634. https://doi.org/10.1111/j.1541-0420.2007.00924.x
Hedeker, D., Mermelstein, R. J., & Demirtas, H. (2012). Modeling between- and within-subject variance in ecological momentary assessment (EMA) data using mixed-effects location scale models. Statistics in Medicine, 31(27). https://doi.org/10.1002/sim.5338
Hoffman, L. (2015). Longitudinal analysis: Modeling within-person fluctuation and change (1 edition). Routledge. https://www.routledge.com/Longitudinal-Analysis-Modeling-Within-Person-Fluctuation-and-Change/Hoffman/p/book/9780415876025
Kay, M. (2020). tidybayes: Tidy data and ’geoms’ for Bayesian models. https://mjskay.github.io/tidybayes/
Kisbu-Sakarya, Y., MacKinnon, D. P., & Aiken, L. S. (2013). A Monte Carlo comparison study of the power of the analysis of covariance, simple difference, and residual change scores in testing two-wave data. Educational and Psychological Measurement, 73(1), 47–62. https://doi.org/10.1177/0013164412450574
Kurz, A. S. (2020). Statistical rethinking with brms, Ggplot2, and the tidyverse: Second edition (version 0.1.1). https://bookdown.org/content/4857/
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (Second Edition). CRC Press. https://xcelab.net/rm/statistical-rethinking/
McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press. https://xcelab.net/rm/statistical-rethinking/
Newsom, J. T. (2015). Longitudinal structural equation modeling: A comprehensive introduction. Routledge. http://www.longitudinalsem.com/
Pedersen, T. L. (2019). patchwork: The composer of plots. https://CRAN.R-project.org/package=patchwork
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rogosa, D., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92(3), 726–748. https://doi.org/10.1037/0033-2909.92.3.726
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA. https://oxford.universitypressscholarship.com/view/10.1093/acprof:oso/9780195152968.001.0001/acprof-9780195152968
Walker, J. A. (2018). Elements of statistical modeling for experimental biology ("2020–11th–22" ed.). https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/
Wickham, H. (2019). tidyverse: Easily install and load the ’tidyverse’. https://CRAN.R-project.org/package=tidyverse
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Williams, D. R., Liu, S., Martin, S. R., & Rast, P. (2019). Bayesian multivariate mixed-effects location scale modeling of longitudinal relations among affective traits, states, and physical activity. https://doi.org/10.31234/osf.io/4kfjp
Williams, D. R., Martin, S. R., & Rast, P. (2019). Putting the individual into reliability: Bayesian testing of homogeneous within-person variance in hierarchical models. https://doi.org/10.31234/osf.io/hpq7w
Williams, D. R., Rouder, J., & Rast, P. (2019). Beneath the surface: Unearthing within-Person variability and mean relations with Bayesian mixed models. https://doi.org/10.31234/osf.io/gwatq
Williams, D. R., Zimprich, D. R., & Rast, P. (2019). A Bayesian nonlinear mixed-effects location scale model for learning. Behavior Research Methods, 51(5), 1968–1986. https://doi.org/10.3758/s13428-019-01255-9

  1. Throughout this post, my statistical notation will be a blend of sensibilities from Singer & Willett (2003), Hoffman (2015), and McElreath (2020).↩︎

  2. Though I’m no fan of the null-hypothesis significance testing paradigm, it might be helpful to point out if one were to focus on whether zero is within the 95% interval bounds of our \(\beta_0\) parameter, you be viewing this model through the lens of the repeated-measures \(t\)-test. For more on that connection, see Chapter 3 in Newsom (2015).↩︎

  3. There’s some debate over how to think about the repeated measures ANOVA and what its closest multilevel analogue might be. For a nice collection of perspectives, check out this Twitter thread.↩︎

Related