31 Averaging models with AIC
This lab will guide you through averaging models with AIC. You can download the template .qmd file for this lab here:
The template will prompt you to work through the following sections.
31.1 Overall goal
We want to understand how to average models with AIC, and we also want to see for ourselves why model averaging is useful for prediction. So we will build a handful of models, average them, and then compare the predictive ability of each individual model against the predictive ability of the averaged model. To do all of this, we will split our real data in half, using one half for training our models and the other half for testing them.
We will return to the question of what drives the proportion of non-native trees in a plot—something that would be very useful for land managers to be able to predict! To model proportions we will again be using our old friend the binomial GLM.
31.2 Preparing the data
We need to join the tree survey, climate, and geo/human impact data.frames and then use the split-apply-combine workflow to calculate plot-level data. The plot-level data we will need are:

- all explanatory variables (hint: use `across`)
- number of non-native species
- number of native species

For ease of working together, let’s all name our final, combined data object `dat_non_nat`.
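As a very rough sketch of what this could look like with dplyr (the data.frame names `trees`, `climate`, and `impact`, the join key `plot_id`, and the columns `species`, `native`, `temp`, `precip`, and `human_impact` are all assumptions; substitute your actual names):

```r
library(dplyr)

# join the tree survey, climate, and geo/human impact data by plot
dat_non_nat <- trees |>
    left_join(climate, by = "plot_id") |>
    left_join(impact, by = "plot_id") |>
    # split-apply-combine: summarize down to one row per plot
    group_by(plot_id) |>
    summarize(
        # plot-level explanatory variables (constant within a plot)
        across(c(temp, precip, human_impact), unique),
        # number of non-native and native species in each plot
        # (assumed columns: species, and a logical native)
        n_non = n_distinct(species[!native]),
        n_nat = n_distinct(species[native])
    )
```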
Once those steps are completed, we’ll split our data into training and testing sets. The key is that we want these training and testing sets to be random samples from the data. For random samples, we use the function `sample`. We are sampling rows, so this is how it will work:
```r
# be sure to switch `eval` to `true` in this chunk
# (you may also want a set.seed() call here so your split is reproducible)

# size of half the data
n_subsamp <- round(nrow(dat_non_nat) / 2)
# random indexes to be used for sampling
rand_idx <- sample(nrow(dat_non_nat), n_subsamp)
# extract the training data
dat_train <- dat_non_nat[rand_idx, ]
# the test data is everything left over
dat_test <- dat_non_nat[-rand_idx, ]
```
31.3 Making models and averaging them
Now let’s make three different models based on genuine hypotheses about what processes might affect the proportion of non-native species in a plot. Again, for simplicity of working together, let’s name those models `mod1`, `mod2`, and `mod3`, even though each of us might have different explanatory variables in the models.
As a refresher, for making a binomial GLM when we know the number of “successes” and “failures”, we use this syntax:

```r
# `x` is just a placeholder for a real explanatory variable
mod1 <- glm(cbind(n_non, n_nat) ~ x, data = dat_train,
            family = binomial)
```

Notice that we are using `dat_train`! We are only fitting these models on the training data so we can then see how well (or how poorly) they predict data they have never seen before: `dat_test`.
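For instance, a set of three models might look like the following sketch (the explanatory variables `temp`, `precip`, and `human_impact` are placeholders for whatever variables your hypotheses call for):

```r
# each model encodes one hypothesis about what drives the
# proportion of non-native species in a plot
mod1 <- glm(cbind(n_non, n_nat) ~ temp, data = dat_train, family = binomial)
mod2 <- glm(cbind(n_non, n_nat) ~ precip, data = dat_train, family = binomial)
mod3 <- glm(cbind(n_non, n_nat) ~ human_impact, data = dat_train, family = binomial)
```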
Once your three models are built, let’s average them. We’ll use `model.avg` from the MuMIn package:

```r
library(MuMIn)

mod_avg <- model.avg(list(mod1, mod2, mod3))
```

For the sake of really understanding how the model averaging works, let’s calculate the Akaike weights by hand and compare them to what `model.avg` gives us. The equation for the weights is
\[ w_i = \frac{e^{-\Delta\text{AIC}_i / 2}}{\sum_j e^{-\Delta\text{AIC}_j / 2}} \]
where \(\Delta\text{AIC}_i = \text{AIC}(\text{model}_i) - \min(\text{AIC})\).
When calculating AIC, we might as well use AICc from MuMIn so our comparison with the output of model.avg will be equivalent.
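As a sketch of the by-hand calculation (assuming your three fitted models are named as above):

```r
# AICc for each model, from MuMIn
aicc <- c(AICc(mod1), AICc(mod2), AICc(mod3))
# Delta AICc: difference from the best (smallest) value
delta <- aicc - min(aicc)
# Akaike weights, following the equation above
w <- exp(-delta / 2) / sum(exp(-delta / 2))
w
```

These should match the weights reported by `model.avg`.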
Next let’s use the weights to calculate averaged coefficients for the explanatory variables we used in our models. As an example, if explanatory variable `x` showed up in models `mod1` and `mod2`, its averaged coefficient would be

```r
# assume w1 is the weight for mod1 and w2 is the weight for mod2
coefficients(mod1)["x"] * w1 + coefficients(mod2)["x"] * w2
```

You only need to do this for one of your explanatory variables. Compare your result to the output of `coefficients(mod_avg)`.
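As an aside (assuming a reasonably recent version of MuMIn), `model.avg` can report two kinds of averaged coefficient: the “full” average, in which models lacking a term contribute a coefficient of zero (this matches the calculation above), and the “conditional” average, taken only over the models containing the term:

```r
# full average: models without `x` contribute a zero coefficient
coefficients(mod_avg, full = TRUE)
# conditional average: only models containing each term
coefficients(mod_avg, full = FALSE)
```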
31.4 Testing the models
Finally we get to the why. Why is it useful to average models? Because the predictive ability of an averaged model will generally be greater than that of any individual model on its own. We’ll show this to ourselves by comparing the log likelihood of the test data set under each of our three individual models and under the averaged model.
We’ll need to calculate these log likelihoods by hand. But how do we calculate the log likelihood of the test data when our model was fit to the training data? By using `dbinom`.
```r
# first use `predict` to get the predicted binomial p parameter
# for the test data
p_mod1 <- predict(mod1, newdata = dat_test, type = "response")

# then plug the predicted p into dbinom and sum the log probabilities
ll_mod1 <- dbinom(x = dat_test$n_non,
                  size = dat_test$n_non + dat_test$n_nat,
                  prob = p_mod1,
                  log = TRUE) |>
    sum()
```

Compare the log likelihood values for `mod1`, `mod2`, `mod3`, and `mod_avg` on `dat_test`. Which model has the highest log likelihood?
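The same recipe should work for the averaged model (a sketch, assuming MuMIn’s `predict` method for averaged objects accepts `newdata` and `type = "response"` the way `predict.glm` does):

```r
# predicted p from the averaged model on the test data
p_avg <- predict(mod_avg, newdata = dat_test, type = "response")
# log likelihood of the test data under the averaged model
ll_avg <- dbinom(x = dat_test$n_non,
                 size = dat_test$n_non + dat_test$n_nat,
                 prob = p_avg,
                 log = TRUE) |>
    sum()
```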
Bonus question: why are we comparing log likelihoods even though our models have different numbers of parameters that may or may not be nested?