Bayesian power analysis: Part I. Prepare to reject $H_0$ with simulation.

Version 1.1.0

Edited on April 21, 2021, to remove the broom::tidy() portion of the workflow.


If you’d like to learn how to do Bayesian power calculations using brms, stick around for this multi-part blog series. Here with part I, we’ll set the foundation.

Power is hard, especially for Bayesians.

Many journals, funding agencies, and dissertation committees require power calculations for your primary analyses. Frequentists have a variety of tools available to perform these calculations (e.g., here). Bayesians, however, have a more difficult time of it. Most of our research questions and data issues are sufficiently complicated that we cannot solve the problems by hand. We need Markov chain Monte Carlo methods to iteratively sample from the posterior to summarize the parameters from our models. Same deal for power. If you’d like to compute the power for a given combination of \(N\), likelihood \(p(\text{data} | \theta)\), and set of priors \(p (\theta)\), you’ll need to simulate.

It’s been one of my recent career goals to learn how to do this. You know how they say: The best way to learn is to teach. This series of blog posts is the evidence of me learning by teaching. It will be an exploration of what a Bayesian power simulation workflow might look like. The overall statistical framework will be within R (R Core Team, 2020), with an emphasis on code style based on the tidyverse (Wickham, 2019; Wickham et al., 2019). We’ll be fitting our Bayesian models with Bürkner’s brms package (Bürkner, 2017, 2018, 2020a).

What this series is not, however, is an introduction to statistical power itself. Keep reading if you’re ready to roll up your sleeves, put on your applied hat, and learn how to get things done. If you’re more interested in introductions to power, see the references in the next section.

I make assumptions.

For this series, I’m presuming you are familiar with linear regression, familiar with the basic differences between frequentist and Bayesian approaches to statistics, and have a basic sense of what we mean by statistical power. Here are some resources if you’d like to shore up.

  • If you’re unfamiliar with statistical power, Kruschke covered it in chapter 13 of his (2015) text. You might also check out the (2008) review paper by Maxwell, Kelley, and Rausch. There’s always, of course, the original work by Cohen (e.g., Cohen, 1988). You might also like this Khan Academy video.
  • To learn about Bayesian regression, I recommend the introductory text books by either McElreath (2020, 2015) or Kruschke (2015). Both authors host blogs (here and here, respectively). If you go with McElreath, do check out his online lectures and my (2020a, 2020c) ebooks translating his text to brms and tidyverse code. I have an ebook for Kruschke’s text (Kurz, 2020b), too.
  • For even more brms-related resources, you can find vignettes and documentation at
  • For tidyverse introductions, your best bets are Grolemund and Wickham’s (2017) R for data science and Wickham’s (2020) The tidyverse style guide.
  • We’ll be simulating data. If that’s new to you, both Kruschke and McElreath cover that a little in their texts. You can find nice online tutorials here and here, too.
  • We’ll also be making a couple custom functions. If that’s new, you might check out R4DS, chapter 19 or chapter 14 of Roger Peng’s (2019) R Programming for Data Science.

We need to warm up before jumping into power.

Let’s load our primary packages. The tidyverse helps organize data and we model with brms.


Consider a case where you have some dependent variable \(Y\) that you’d like to compare between two groups, which we’ll call treatment and control. Here we presume \(Y\) is continuous and, for the sake of simplicity, is in a standardized metric for the control condition. Letting \(c\) stand for control and \(i\) index the data row for a given case, we might write that as \(y_{i, c} \sim \operatorname{Normal} (0, 1)\). The mean for our treatment condition is 0.5, with the standard deviation still in the standardized metric. In the social sciences a standardized mean difference of 0.5 would typically be considered a medium effect size. Here’s what that’d look like.

# set our theme because, though I love the default ggplot theme, I hate gridlines
theme_set(theme_grey() +
            theme(panel.grid = element_blank()))

# define the means
mu_c <- 0
mu_t <- 0.5

# set up the data
tibble(x = seq(from = -4, to = 5, by = .01)) %>%
  mutate(c = dnorm(x, mean = mu_c, sd = 1),
         t = dnorm(x, mean = mu_t, sd = 1)) %>% 
  # plot
  ggplot(aes(x = x)) +
  geom_area(aes(y = c),
            size = 0, alpha = 1/3, fill = "grey25") +
  geom_area(aes(y = t),
            size = 0, alpha = 1/3, fill = "blue2") +
 annotate(geom = "text",
           x = c(-.5, 1), y = .385,
           label = c("control", "treatment"),
           hjust = 1:0,
           size = 5) +
  scale_x_continuous(NULL, breaks = -4:5) +
  scale_y_continuous(NULL, breaks = NULL) +
  scale_color_manual(values = c("grey25", "blue2"))

Sure, those distributions have a lot of overlap. But their means are clearly different and we’d like to make sure we plan on collecting enough data to do a good job showing that. A power analysis will help.

Within the conventional frequentist paradigm, power is the probability of rejecting the null hypothesis \(H_0\) in favor of the alternative hypothesis \(H_1\), given the alternative hypothesis is “true.” In this case, the typical null hypothesis is

\[H_0\text{: } \mu_c = \mu_t,\]

or put differently,

\[ H_0\text{: } \mu_t - \mu_c = 0. \]

And the alternative hypothesis is often just

\[H_1\text{: } \mu_c \neq \mu_t,\]

or otherwise put,

\[ H_1\text{: } \mu_t - \mu_c \neq 0. \]

Within the regression framework, we’ll be comparing \(\mu\)s using the formula

\[ \begin{align*} y_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 \text{treatment}_i, \end{align*} \]

where \(\text{treatment}\) is a dummy variable coded 0 = control 1 = treatment and varies across cases indexed by \(i\). In this setup, \(\beta_0\) is the estimate for \(\mu_c\) and \(\beta_1\) is the estimate of the difference between condition means, \(\mu_t - \mu_c\). Thus our focal parameter, the one we care about the most in our power analysis, will be \(\beta_1\).

Within the frequentist paradigm, we typically compare these hypotheses using a \(p\)-value for \(H_0\) with the critical value, \(\alpha\), set to .05. Thus, power is the probability we’ll have \(p < .05\) when it is indeed the case that \(\mu_c \neq \mu_t\). We won’t be computing \(p\)-values in this project, but we will use 95% intervals. Recall that the result of a Bayesian analysis, the posterior distribution, is the probability of the parameters, given the data \(p (\theta | \text{data})\). With our 95% Bayesian credible intervals, we’ll be able to describe the parameter space over which our estimate of \(\mu_t - \mu_c\) is 95% probable. That is, for our power analysis, we’re interested in the probability our 95% credible intervals for \(\beta_1\) contain zero within their bounds when we know a priori \(\mu_c \neq \mu_t\).

The reason we know \(\mu_c \neq \mu_t\) is because we’ll be simulating the data that way. What our power analysis will help us determine is how many cases we’ll need to achieve a predetermined level of power. The conventional threshold is .8.

Dry run number 1.

To make this all concrete, let’s start with a simple example. We’ll simulate a single set of data, fit a Bayesian regression model, and examine the results for the critical parameter \(\beta_1\). For the sake of simplicity, let’s keep our two groups, treatment and control, the same size. We’ll start with \(n = 50\) for each.

n <- 50

We already decided above that

\[ \begin{align*} y_{i, c} & \sim \operatorname{Normal}(0, 1) \text{ and} \\ y_{i, t} & \sim \operatorname{Normal}(0.5, 1). \end{align*} \]

Here’s how we might simulate data along those lines.


d <-
  tibble(group = rep(c("control", "treatment"), each = n)) %>% 
  mutate(treatment = ifelse(group == "control", 0, 1),
         y         = ifelse(group == "control", 
                            rnorm(n, mean = mu_c, sd = 1),
                            rnorm(n, mean = mu_t, sd = 1)))

## Rows: 100
## Columns: 3
## $ group     <chr> "control", "control", "control", "control", "control", "cont…
## $ treatment <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ y         <dbl> -0.62645381, 0.18364332, -0.83562861, 1.59528080, 0.32950777…

In case it wasn’t clear, the two variables group and treatment are redundant. Whereas the former is composed of names, the latter is the dummy-variable equivalent (i.e., control = 0, treatment = 1). The main event was how we used the rnorm() function to simulate the normally-distributed values for y.

Before we fit our model, we need to decide on priors. To give us ideas, here are the brms defaults for our model and data.

get_prior(data = d,
          family = gaussian,
          y ~ 0 + Intercept + treatment)
##                 prior class      coef group resp dpar nlpar bound       source
##                (flat)     b                                            default
##                (flat)     b Intercept                             (vectorized)
##                (flat)     b treatment                             (vectorized)
##  student_t(3, 0, 2.5) sigma                                            default

A few things: Notice that here we’re using the 0 + Intercept syntax. This is because brms handles the priors for the default intercept under the presumption you’ve mean-centered all your predictor variables. However, since our treatment variable is a dummy, that assumption won’t fly. The 0 + Intercept allows us to treat the model intercept as just another \(\beta\) parameter, which makes no assumptions about centering. Along those lines, you’ll notice brms currently defaults to flat priors for the \(\beta\) parameters (i.e., those for which class = b). And finally, the default prior on \(\sigma\) is moderately wide student_t(3, 0, 2.5). By default, brms also sets the left bounds for \(\sigma\) parameters at zero, making that a folded-\(t\) distribution. If you’re confused by these details, spend some time with the brms reference manual (Bürkner, 2020b), particularly the brm and brmsformula sections.

In this project, we’ll be primarily using two kinds of priors: default flat priors and weakly-regularizing priors. Hopefully flat priors are self-explanatory. They let the likelihood (data) dominate the posterior and tend to produce results similar to those from frequentist estimators.

As for weakly-regularizing priors, McElreath covered them in his text. They’re mentioned a bit in the Stan team’s Prior Choice Recommendations wiki, and you can learn even more from Gelman, Simpson, and Betancourt’s (2017) The prior can only be understood in the context of the likelihood. These priors aren’t strongly informative and aren’t really representative of our research hypotheses. But they’re not as absurd as flat priors, either. Rather, with just a little bit of knowledge about the data, these priors are set to keep the MCMC chains on target. Since our y variable has a mean near zero and a standard deviation near one and since our sole predictor, treatment is a dummy, setting \(\operatorname{Normal}(0, 2)\) as the prior for both \(\beta\) parameters might be a good place to start. The prior is permissive enough that it will let likelihood dominate the posterior, but it also rules out ridiculous parts of the parameter space (e.g., a standardized mean difference of 20, an intercept of -93). And since we know the data are on the unit scale, we might just center our folded-Student-\(t\) prior on one and add a gentle scale setting of one.

Feel free to disagree and use your own priors. The great thing about priors is that they can be proposed, defended, criticized and improved. The point is to settle on the priors you can defend with written reasons. Select ones you’d feel comfortable defending to a skeptical reviewer.

Here’s how we might fit the model.

fit <-
  brm(data = d,
      family = gaussian,
      y ~ 0 + Intercept + treatment,
      prior = c(prior(normal(0, 2), class = b),
                prior(student_t(3, 1, 1), class = sigma)),
      seed = 1)

Before we look at the summary, we might check the chains in a trace plot. We’re looking for “stuck” chains that don’t appear to come from a normal distribution (the chains are a profile-like view rather than histogram, allowing for inspection of dependence between samples).


Yep, the chains all look good. Here’s the parameter summary.

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ 0 + Intercept + treatment 
##    Data: d (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.10      0.13    -0.16     0.36 1.00     2094     2048
## treatment     0.51      0.18     0.16     0.90 1.00     2079     1989
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.92      0.07     0.80     1.06 1.00     2684     2122
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The 95% credible intervals for our \(\beta_1\) parameter, termed treatment in the output, are well above zero.

Another way to look at the parameter summary is with the brms::fixef() function.

##            Estimate Est.Error       Q2.5     Q97.5
## Intercept 0.1033326 0.1312632 -0.1556192 0.3640170
## treatment 0.5142275 0.1847559  0.1587896 0.8976941

You can reuse a fit.

Especially with simple models like this, a lot of the time we spend waiting for brms::brm() to return the model is wrapped up in compilation. This is because brms is a collection of user-friendly functions designed to fit models with Stan (Stan Development Team, 2020, 2021a, 2021b). With each new model, brm() translates your model into Stan code, which then gets translated to C++ and is compiled afterwards (see here or here). However, we can use the update() function to update a previously-compiled fit object with new data. This cuts out the compilation time and allows us to get directly to sampling. Here’s how to do it.

# set a new seed

# simulate new data based on that new seed
d <-
  tibble(group = rep(c("control", "treatment"), each = n)) %>% 
  mutate(treatment = ifelse(group == "control", 0, 1),
         y         = ifelse(group == "control", 
                            rnorm(n, mean = mu_c, sd = 1),
                            rnorm(n, mean = mu_t, sd = 1)))

updated_fit <-
         newdata = d,
         seed = 2)

Behold the fixef() parameter summary for our updated model.

##             Estimate Est.Error       Q2.5     Q97.5
## Intercept 0.06410605 0.1692172 -0.2700568 0.3995847
## treatment 0.30654690 0.2387028 -0.1434469 0.7815116

Well how about that? In this case, our 95% credible intervals for the \(\beta_1\) treatment coefficient did include zero within their bounds. Though the posterior mean, 0.30, is still well away from zero, here we’d fail to reject \(H_0\) at the conventional level. This is why we simulate.

To recap, we’ve

  1. determined our primary data type,
  2. cast our research question in terms of a regression model,
  3. identified the parameter of interest,
  4. settled on defensible priors,
  5. picked an initial sample size,
  6. fit an initial model with a single simulated data set, and
  7. practiced reusing that fit with update().

We’re more than half way there! It’s time to do our first power simulation.

Simulate to determine power.

In this post, we’ll play with three ways to do a Bayesian power simulation. They’ll all be similar, but hopefully you’ll learn a bit as we transition from one to the next. Though if you’re impatient and all this seems remedial, you could probably just skip down to the final example, Version 3.

Version 1: Let’s introduce making a custom model-fitting function.

For our power analysis, we’ll need to simulate a large number of data sets, each of which we’ll fit a model to. Here we’ll make a custom function, sim_d(), that will simulate new data sets just like before. Our function will have two parameters: we’ll set our seeds with seed and determine how many cases we’d like per group with n.

sim_d <- function(seed, n) {
  mu_t <- .5
  mu_c <- 0

  tibble(group = rep(c("control", "treatment"), each = n)) %>% 
  mutate(treatment = ifelse(group == "control", 0, 1),
         y         = ifelse(group == "control", 
                            rnorm(n, mean = mu_c, sd = 1),
                            rnorm(n, mean = mu_t, sd = 1)))

Here’s a quick example of how our function works.

sim_d(seed = 123, n = 2)
## # A tibble: 4 x 3
##   group     treatment      y
##   <chr>         <dbl>  <dbl>
## 1 control           0 -0.560
## 2 control           0 -0.230
## 3 treatment         1  2.06 
## 4 treatment         1  0.571

Now we’re ready to get down to business. We’re going to be saving our simulation results in a nested data frame, s. Initially, s will have one column of seed values. These will serve a dual function. First, they are the values we’ll be feeding into the seed argument of our custom data-generating function, sim_d(). Second, since the seed values serially increase, they also stand in as iteration indexes.

For our second step, we add the data simulations and save them in a nested column, d. In the first argument of the purrr::map() function, we indicate we want to iterate over the values in seed. In the second argument, we indicate we want to serially plug those seed values into the first argument within the sim_d() function. That argument, recall, is the well-named seed argument. With the final argument in map(), n = 50, we hard code 50 into the n argument of sim_d().

For the third step, we expand our purrr::map() skills from above to purrr::map2(), which allows us to iteratively insert two arguments into a function. Within this paradigm, the two arguments are generically termed .x and .y. Thus our approach will be .x = d, .y = seed. For our function, we specify ~update(fit, newdata = .x, seed = .y). Thus we’ll be iteratively inserting our simulated d data into the newdata argument and will be simultaneously inserting our seed values into the seed argument.

Also notice that the number of iterations we’ll be working with is determined by the number of rows in the seed column. We are defining that number as n_sim. Since this is just a blog post, I’m going to take it easy and use 100. But if this was a real power analysis for one of your projects, something like 1,000 would be better.

Finally, you don’t have to do this, but I’m timing my simulation by saving Sys.time() values at the beginning and end of the simulation.

# how many simulations would you like?
n_sim <- 100

# this will help us track time
t1 <- Sys.time()

# here's the main event!
s <-
  tibble(seed = 1:n_sim) %>% 
  mutate(d = map(seed, sim_d, n = 50)) %>% 
  mutate(fit = map2(d, seed, ~update(fit, newdata = .x, seed = .y)))

t2 <- Sys.time()

The entire simulation took just about a minute on my new laptop.

t2 - t1
## Time difference of 44.20048 secs

Your mileage may vary.

Let’s take a look at what we’ve done.

## # A tibble: 6 x 3
##    seed d                  fit      
##   <int> <list>             <list>   
## 1     1 <tibble [100 × 3]> <brmsfit>
## 2     2 <tibble [100 × 3]> <brmsfit>
## 3     3 <tibble [100 × 3]> <brmsfit>
## 4     4 <tibble [100 × 3]> <brmsfit>
## 5     5 <tibble [100 × 3]> <brmsfit>
## 6     6 <tibble [100 × 3]> <brmsfit>

In our 100-row nested tibble, we have all our simulated data sets in the d column and all of our brms fit objects nested in the fit column. Next we’ll use fixef() and a little wrangling to extract the parameter of interest, treatment (i.e., \(\beta_1\)), from each simulation. We’ll save the results as parameters.

parameters <-
  s %>% 
  mutate(treatment = map(fit, ~ fixef(.) %>% 
                           data.frame() %>% 
                           rownames_to_column("parameter"))) %>% 

parameters %>% 
  select(-d, -fit) %>% 
  filter(parameter == "treatment") %>% 
## # A tibble: 6 x 6
##    seed parameter Estimate Est.Error    Q2.5 Q97.5
##   <int> <chr>        <dbl>     <dbl>   <dbl> <dbl>
## 1     1 treatment    0.514     0.185  0.159  0.898
## 2     2 treatment    0.307     0.239 -0.143  0.782
## 3     3 treatment    0.643     0.171  0.310  0.975
## 4     4 treatment    0.224     0.182 -0.128  0.574
## 5     5 treatment    0.429     0.189  0.0596 0.792
## 6     6 treatment    0.304     0.208 -0.114  0.711

As an aside, I know I’m moving kinda fast with all this wacky purrr::map()/purrr::map2() stuff. If you’re new to using the tidyverse for iterating and saving the results in nested data structures, I recommend fixing an adult beverage and cozying up with Hadley Wickham’s presentation, Managing many models. And if you really hate it, both Kruschke and McElreath texts contain many examples of how to iterate in a more base R sort of way.

Anyway, here’s what those 100 \(\beta_1\) summaries look like in bulk.

parameters %>% 
  filter(parameter == "treatment") %>% 
  ggplot(aes(x = seed, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
  geom_hline(yintercept = c(0, .5), color = "white") +
  geom_pointrange(fatten = 1/2) +
  labs(x = "seed (i.e., simulation index)",
       y = expression(beta[1]))

The horizontal lines show the idealized effect size (0.5) and the null hypothesis (0). Already, it’s apparent that most of our intervals indicate there’s more than a 95% probability the null hypothesis is not credible. Several do. Here’s how to quantify that.

parameters %>% 
  filter(parameter == "treatment") %>% 
  mutate(check = ifelse(Q2.5 > 0, 1, 0)) %>% 
  summarise(power = mean(check))
## # A tibble: 1 x 1
##   power
##   <dbl>
## 1  0.66

With the second mutate() line, we used a logical statement within ifelse() to code all instances where the lower limit of the 95% interval (Q2.5) was greater than 0 as a 1, with the rest as 0. That left us with a vector of 1’s and 0’s, which we saved as check. In the summarise() line, we took the mean of that column, which returned our Bayesian power estimate.

That is, in 66 of our 100 simulations, an \(n = 50\) per group was enough to produce a 95% Bayesian credible interval that did not straddle 0.

I should probably point out that a 95% interval for which Q97.5 < 0 would have also been consistent with the alternative hypothesis of \(\mu_c \neq \mu_t\). However, I didn’t bother to work that option into the definition of our check variable because I knew from the outset that that would be a highly unlikely result. But if you’d like to work more rigor into your checks, by all means do.

And if you’ve gotten this far and have been following along with code of your own, congratulations! You did it! You’ve estimated the power of a Bayesian model with a given \(n\). Now let’s refine our approach.

Version 2: We might should be more careful with memory.

I really like it that our s object contains all our brm() fits. It makes it really handy to do global diagnostics like making sure our \(\widehat R\) values are all within a respectable range.

s %>% 
  mutate(rhat = map(fit, rhat)) %>% 
  unnest(rhat) %>% 
  ggplot(aes(x = rhat)) +
  geom_histogram(bins = 20)

Man those \(\widehat R\) values look sweet. It’s great to have a workflow that lets you check them. But holding on to all those fits can take up a lot of memory. If the only thing you’re interested in are the parameter summaries, a better approach might be to do the model refitting and parameter extraction in one step. That way you only save the parameter summaries. Here’s how you might do that.

t3 <- Sys.time()

s2 <-
  tibble(seed = 1:n_sim) %>% 
  mutate(d = map(seed, sim_d, n = 50)) %>% 
  # here's the new part
  mutate(b1 = map2(d, seed, ~update(fit, newdata = .x, seed = .y) %>% 
                     fixef() %>% 
                     data.frame() %>% 
                     rownames_to_column("parameter") %>% 
                     filter(parameter == "treatment")))

t4 <- Sys.time()

Like before, this only about a minute.

t4 - t3
## Time difference of 48.12622 secs

As a point of comparison, here are the sizes of the results from our first approach to those from the second.

## 79822320 bytes
## 503120 bytes

That’s a big difference. Hopefully you get the idea. With more complicated models and 10+ times the number of simulations, size will eventually matter.

Anyway, here are the results.

s2 %>% 
  unnest(b1) %>% 

  ggplot(aes(x = seed, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
  geom_hline(yintercept = c(0, .5), color = "white") +
  geom_pointrange(fatten = 1/2) +
  labs(x = "seed (i.e., simulation index)",
       y = expression(beta[1]))

Same parameter summaries, lower memory burden.

Version 3: Still talking about memory, we can be even stingier.

So far, both of our simulation attempts resulted in our saving the simulated data sets. It’s a really nice option if you ever want to go back and take a look at those simulated data. For example, you might want to inspect a random subset of the data simulations with box plots.


s2 %>% 
  sample_n(12) %>% 
  unnest(d) %>% 
  ggplot(aes(x = group, y = y)) +
  geom_boxplot(aes(fill = group), 
               alpha = 2/3, show.legend = F) +
  scale_fill_manual(values = c("grey25", "blue2")) +
  xlab(NULL) +
  facet_wrap(~ seed)

In this case, it’s no big deal if we keep the data around or not. The data sets are fairly small and we’re only simulating 100 of them. But in cases where the data are larger and you’re doing thousands of simulations, keeping the data could become a memory drain.

If you’re willing to forgo the luxury of inspecting your data simulations, it might make sense to run our power analysis in a way that avoids saving them. One way to do so would be to just wrap the data simulation and model fitting all in one function. We’ll call it sim_d_and_fit().

sim_d_and_fit <- function(seed, n) {
  mu_t <- .5
  mu_c <- 0
  d <-
    tibble(group     = rep(c("control", "treatment"), each = n)) %>% 
    mutate(treatment = ifelse(group == "control", 0, 1),
           y         = ifelse(group == "control", 
                              rnorm(n, mean = mu_c, sd = 1),
                              rnorm(n, mean = mu_t, sd = 1)))
         newdata = d, 
         seed = seed) %>% 
    fixef() %>% 
    data.frame() %>% 
    rownames_to_column("parameter") %>% 
    filter(parameter == "treatment")

Now iterate 100 times once more.

t5 <- Sys.time()

s3 <-
  tibble(seed = 1:n_sim) %>% 
  mutate(b1 = map(seed, sim_d_and_fit, n = 50)) %>% 

t6 <- Sys.time()

That was pretty quick.

t6 - t5
## Time difference of 48.48654 secs

Here’s what it returned.

## # A tibble: 6 x 6
##    seed parameter Estimate Est.Error    Q2.5 Q97.5
##   <int> <chr>        <dbl>     <dbl>   <dbl> <dbl>
## 1     1 treatment    0.514     0.185  0.159  0.898
## 2     2 treatment    0.307     0.239 -0.143  0.782
## 3     3 treatment    0.643     0.171  0.310  0.975
## 4     4 treatment    0.224     0.182 -0.128  0.574
## 5     5 treatment    0.429     0.189  0.0596 0.792
## 6     6 treatment    0.304     0.208 -0.114  0.711

By wrapping our data simulation, model fitting, and parameter extraction steps all in one function, we simplified the output such that we’re no longer holding on to the data simulations or the brms fit objects. We just have the parameter summaries and the seed, making the product even smaller.

tibble(object = c("s", "s2", "s3")) %>% 
  mutate(bytes = map_dbl(object, ~get(.) %>% object.size()))
## # A tibble: 3 x 2
##   object    bytes
##   <chr>     <dbl>
## 1 s      79822320
## 2 s2       503120
## 3 s3         5952

But the primary results are the same.

s3 %>% 
  ggplot(aes(x = seed, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
  geom_hline(yintercept = c(0, .5), color = "white") +
  geom_pointrange(fatten = 1/2) +
  labs(x = "seed (i.e., simulation index)",
       y = expression(beta[1]))

We still get the same power estimate, too.

s3 %>% 
  mutate(check = ifelse(Q2.5 > 0, 1, 0)) %>% 
  summarise(power = mean(check))
## # A tibble: 1 x 1
##   power
##   <dbl>
## 1  0.66

Next steps

But my goal was to figure out what \(n\) will get me power of .8 or more!, you say. Fair enough. Try increasing n to 65 or something.

If that seems unsatisfying, welcome to the world of simulation. Since our Bayesian models are complicated, we don’t have the luxury of plugging a few values into some quick power formula. Just as simulation is an iterative process, determining on the right values to simulate over might well be an iterative process, too.


Anyway, that’s the essence of the brms/tidyverse workflow for Bayesian power analysis. You follow these steps:

  1. Determine your primary data type.
  2. Determine your primary regression model and parameter(s) of interest.
  3. Pick defensible priors for all parameters–the kinds of priors you intend to use once you have the real data in hand.
  4. Select a sample size.
  5. Fit an initial model and save the fit object.
  6. Simulate some large number of data sets all following your prechosen form and use the update() function to iteratively fit the models.
  7. Extract the parameter(s) of interest.
  8. Summarize.

In addition, we played with a few approaches based on logistical concerns like memory. In the next post, part II, we’ll see how the precision-oriented approach to sample-size planning is a viable alternative to power focused on rejecting null hypotheses.

I had help.

Special thanks to Christopher Peters (@statwonk) for the helpful edits and suggestions.

Session info

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
##  [1] brms_2.15.0     Rcpp_1.0.6      forcats_0.5.1   stringr_1.4.0  
##  [5] dplyr_1.0.5     purrr_0.3.4     readr_1.4.0     tidyr_1.1.3    
##  [9] tibble_3.1.0    ggplot2_3.3.3   tidyverse_1.3.0
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1         backports_1.2.1      plyr_1.8.6          
##   [4] igraph_1.2.6         splines_4.0.4        crosstalk_1.1.0.1   
##   [7] TH.data_1.0-10       rstantools_2.1.1     inline_0.3.17       
##  [10] digest_0.6.27        htmltools_0.5.1.1    rsconnect_0.8.16    
##  [13] fansi_0.4.2          magrittr_2.0.1       modelr_0.1.8        
##  [16] RcppParallel_5.0.2   matrixStats_0.57.0   xts_0.12.1          
##  [19] sandwich_3.0-0       prettyunits_1.1.1    colorspace_2.0-0    
##  [22] rvest_0.3.6          haven_2.3.1          xfun_0.22           
##  [25] callr_3.5.1          crayon_1.4.1         jsonlite_1.7.2      
##  [28] lme4_1.1-25          survival_3.2-10      zoo_1.8-8           
##  [31] glue_1.4.2           gtable_0.3.0         emmeans_1.5.2-1     
##  [34] V8_3.4.0             pkgbuild_1.2.0       rstan_2.21.2        
##  [37] abind_1.4-5          scales_1.1.1         mvtnorm_1.1-1       
##  [40] DBI_1.1.0            miniUI_0.1.1.1       xtable_1.8-4        
##  [43] stats4_4.0.4         StanHeaders_2.21.0-7 DT_0.16             
##  [46] htmlwidgets_1.5.2    httr_1.4.2           threejs_0.3.3       
##  [49] ellipsis_0.3.1       pkgconfig_2.0.3      loo_2.4.1           
##  [52] farver_2.0.3         dbplyr_2.0.0         utf8_1.1.4          
##  [55] tidyselect_1.1.0     labeling_0.4.2       rlang_0.4.10        
##  [58] reshape2_1.4.4       later_1.1.0.1        munsell_0.5.0       
##  [61] cellranger_1.1.0     tools_4.0.4          cli_2.3.1           
##  [64] generics_0.1.0       broom_0.7.5          ggridges_0.5.2      
##  [67] evaluate_0.14        fastmap_1.0.1        yaml_2.2.1          
##  [70] processx_3.4.5       knitr_1.31           fs_1.5.0            
##  [73] nlme_3.1-152         mime_0.10            projpred_2.0.2      
##  [76] xml2_1.3.2           compiler_4.0.4       bayesplot_1.8.0     
##  [79] shinythemes_1.1.2    rstudioapi_0.13      gamm4_0.2-6         
##  [82] curl_4.3             reprex_0.3.0         statmod_1.4.35      
##  [85] stringi_1.5.3        highr_0.8            ps_1.6.0            
##  [88] blogdown_1.3         Brobdingnag_1.2-6    lattice_0.20-41     
##  [91] Matrix_1.3-2         nloptr_1.2.2.2       markdown_1.1        
##  [94] shinyjs_2.0.0        vctrs_0.3.6          pillar_1.5.1        
##  [97] lifecycle_1.0.0      bridgesampling_1.0-0 estimability_1.3    
## [100] httpuv_1.5.4         R6_2.5.0             bookdown_0.21       
## [103] promises_1.1.1       gridExtra_2.3        codetools_0.2-18    
## [106] boot_1.3-26          colourpicker_1.1.0   MASS_7.3-53         
## [109] gtools_3.8.2         assertthat_0.2.1     withr_2.4.1         
## [112] shinystan_2.5.0      multcomp_1.4-16      mgcv_1.8-33         
## [115] parallel_4.0.4       hms_0.5.3            grid_4.0.4          
## [118] coda_0.19-4          minqa_1.2.4          rmarkdown_2.7       
## [121] shiny_1.5.0          lubridate_1.7.9.2    base64enc_0.1-3     
## [124] dygraphs_1.1.1.6
# for the hard-core scrollers:
# if you increase n to 65, the power becomes about .85
n_sim <- 100

t7 <- Sys.time()

s4 <-
  tibble(seed = 1:n_sim) %>% 
  mutate(b1 = map(seed, sim_d_and_fit, n = 65))

t8 <- Sys.time()

t8 - t7


s4 %>% 
  unnest(b1) %>% 
  mutate(check = ifelse(Q2.5 > 0, 1, 0)) %>% 
  summarise(power = mean(check))


Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411.
Bürkner, P.-C. (2020a). brms: Bayesian regression models using ’Stan.
Bürkner, P.-C. (2020b). brms reference manual, Version 2.14.4.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. L. Erlbaum Associates.
Gelman, A., Simpson, D., & Betancourt, M. (2017). The prior can often only be understood in the context of the likelihood. Entropy, 19(10), 555.
Grolemund, G., & Wickham, H. (2017). R for data science. O’Reilly.
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.
Kurz, A. S. (2020a). Statistical rethinking with brms, ggplot2, and the tidyverse (version 1.2.0).
Kurz, A. S. (2020b). Doing Bayesian data analysis in brms and the tidyverse (version 0.3.0).
Kurz, A. S. (2020c). Statistical rethinking with brms, Ggplot2, and the tidyverse: Second edition (version 0.1.1).
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59(1), 537–563.
McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (Second Edition). CRC Press.
McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press.
Peng, R. D. (2019). R programming for data science.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Stan Development Team. (2020). RStan: The R interface to Stan.
Stan Development Team. (2021a). Stan reference manual, Version 2.26.
Stan Development Team. (2021b). Stan user’s guide, Version 2.26.
Wickham, H. (2019). tidyverse: Easily install and load the ’tidyverse’.
Wickham, H. (2020). The tidyverse style guide.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.