tidyverse

Causal inference with beta regression

In this ninth post of the causal inference + GLM series, we explore the beta likelihood for continuous data restricted within the range of 0 to 1.

Causal inference with change scores

In this post, we explore how we can frame our causal inferences in terms of change from baseline, and how this may or may not involve change scores.

Causal inference with ordinal regression

In this seventh post of the causal inference series, we apply our approach to ordinal models. Ordinal models make causal inference tricky, and it’s not entirely clear what the causal estimand should even be. We explore two of the estimands that have been proposed in the literature, and I offer a third estimand of my own.

Causal inference with gamma regression or: The problem is with the link function, not the likelihood

So far the difficulties we have seen with covaraites, causal inference, and the GLM have all been restricted to discrete models (e.g., binomial, Poisson, negative binomial). In this sixth post of the series, we’ll see this issue can extend to models for continuous data, too. As it turns out, it may have less to do with the likelihood function, and more to do with the choice of link function. To highlight the point, we’ll compare Gaussian and gamma models, with both the identity and log links.

Causal inference with count regression

In this fifth post of the causal inference series, we practice with Poisson and negative-binomial models for unbounded count data. Since I’m a glutton for punishment, we practice both as frequentists and as Bayesians. You’ll find a little robust sandwich-based standard error talk, too.

Causal inference with Bayesian models

In this fourth post, we refit the models from the previous posts with Bayesian software, and show how to compute our primary estimates when working with posterior draws. The content will be very light on theory, and heavy on methods. So if you don’t love that Bayes, you can feel free to skip this one.

Causal inference with logistic regression

In this third post of the causal inference series, we switch to a binary outcome variable. As we will see, some of the nice qualities from the OLS paradigm fall apart when we want to make causal inferences with binomial models.

Causal inference with potential outcomes bootcamp

In this second post, we learn how the potential outcomes framework can help us connect our regression models to estimands from the contemporary causal inference literature. We start with simple OLS-based models. In future posts, we’ll expand to other models from the GLM.

Boost your power with baseline covariates

This is the first post in a series on causal inference. Our ultimate goal is to learn how to analyze data from true experiments, such as RCT’s, with various likelihoods from the generalized linear model (GLM), and with techniques from the contemporary causal inference literature. In this post, we review how baseline covariates help us compare our experimental conditions.

Sum-score effect sizes for multilevel Bayesian cumulative probit models

This is a follow-up to my earlier post, Notes on the Bayesian cumulative probit. This time, the topic we’re addressing is: After you fit a full multilevel Bayesian cumulative probit model of several Likert-type items from a multi-item questionnaire, how can you use the model to compute an effect size in the sum-score metric?

Just use multilevel models for your pre/post RCT data

I’ve been thinking a lot about how to analyze pre/post control group designs, lately. Happily, others have thought a lot about this topic, too. The goal of this post is to introduce the change-score and ANCOVA models, introduce their multilevel-model counterparts, and compare their behavior in a couple quick simulation studies. Spoiler alert: The multilevel variant of the ANCOVA model is the winner.

Example power analysis report, II

In an earlier post, I gave an example of what a power analysis report could look like for a multilevel model. At my day job, I was recently asked for a rush-job power analysis that required a multilevel model of a different kind and it seemed like a good opportunity to share another example.

Notes on the Bayesian cumulative probit

In this post, I have reformatted my personal notes into something of a tutorial on the Bayesian cumulative probit model. Using a single psychometric data set, we explore a variety of models, starting with the simplest single-level thresholds-only model and ending with a conditional multilevel distributional model.

Use emmeans() to include 95% CIs around your lme4-based fitted lines

You’re an R user and just fit a nice multilevel model to some grouped data and you’d like to showcase the results in a plot. In your plots, it would be ideal to express the model uncertainty with 95% interval bands. If you’re a frequentist and like using the popular lme4 package, you might be surprised how difficult it is to get those 95% intervals. I recently stumbled upon a solution with the emmeans package, and the purpose of this blog post is to show you how it works.

Conditional logistic models with brms: Rough draft.

After tremendous help from Henrik Singmann and Mattan Ben-Shachar, I finally have two (!) workflows for conditional logistic models with brms. These workflows are on track to make it into the next update of my ebook translation of Kruschke’s text. But these models are new to me and I’m not entirely confident I’ve walked them out properly. The goal of this blog post is to present a draft of my workflow, which will eventually make it’s way into Chapter 22 of the ebook.

If you fit a model with multiply imputed data, you can still plot the line.

If you’re an R user and like multiple imputation for missing data, you probably know all about the mice package. The bummer is there are no built-in ways to plot the fitted lines from models fit from multiply-imputed data using van Buuren’s mice-oriented workflow. However, there is a way to plot your fitted lines by hand and in this blog post I’ll show you how.

Sexy up your logistic regression model with logit dotplots

The major shortcoming in typical logistic regression line plots is they usually don’t show the data due to overplottong across the y-axis. Happily, new developments with Matthew Kay’s ggdist package make it easy to show your data when you plot your logistic regression curves. In this post I’ll show you how.

One-step Bayesian imputation when you have dropout in your RCT

Say you have 2-timepoint RCT, where participants received either treatment or control. Even in the best of scenarios, you’ll probably have some dropout in those post-treatment data. To get the full benefit of your data, you can use one-step Bayesian imputation when you compute your effect sizes. In this post, I’ll show you how.

Example power analysis report

If you plan to analyze your data with anything more complicated than a t-test, the power analysis phase gets tricky. I’m willing to bet that most applied researchers have never done a power analysis for a multilevel model and probably have never seen what one might look like, either. The purpose of this post is to give a real-world example of just such an analysis.

Don’t forget your inits

When your MCMC chains look a mess, you might have to manually set your initial values. If you’re a fancy pants, you can use a custom function.

Effect sizes for experimental trials analyzed with multilevel growth models: Two of two

This post is the second of a two-part series. In the first post, we explored how one might compute an effect size for two-group experimental data with only 2 time points. In this second post, we fulfill our goal to show how to generalize this framework to experimental data collected over 3+ time points. The data and overall framework come from Feingold (2009).

Regression models for 2-timepoint non-experimental data

I recently came across Jeffrey Walker’s free text, Elements of statistical modeling for experimental biology, which contains a nice chapter on 2-timepoint experimental designs. Inspired by his work, this post aims to explore how one might analyze non-experimental 2-timepoint data within a regression model paradigm.

Multilevel models and the index-variable approach

PhD candidate Huaiyu Liu recently reached out with a question about how to analyze clustered data. Liu’s basic setup was an experiment with four conditions. The dependent variable was binary, where success = 1, fail = 0. Each participant completed multiple trials under each of the four conditions. The catch was Liu wanted to model those four conditions with a multilevel model using the index-variable approach McElreath advocated for in the second edition of his text. Like any good question, this one got my gears turning. Thanks, Liu! The purpose of this post will be to show how to model data like this two different ways.

bookdown, My Process

The purpose of this post is to give readers a sense of how I used bookdown to make my first ebooks. I propose there are three fundamental skill sets you need basic fluency in before playing with bookdown: (a) R and R Studio, (b) scripts and R Markdown files, and (c) Git and GitHub.