Model Comparison and Averaging

What predicts species richness best?

Temperature or Human Impact?

We saw that including both in one model caused problems with collinearity

But does one predict richness better than another?

We cannot use a likelihood ratio test (LRT) because these are not nested models; in fact, they have the same d.f.

\[ \begin{aligned} \log(\mu_\text{spp richness}) &= b_0 + b_1 \log(\text{area}) + b_2 \text{temperature} \\ \log(\mu_\text{spp richness}) &= b_0 + b_1 \log(\text{area}) + b_2 (\text{human impact}) \end{aligned} \]

This is a job for AIC

Akaike Information Criterion

\[ AIC = -2 \ell(\hat{\theta}_{MLE}) + 2k \]

  • \(\ell(\hat{\theta}_{MLE})\) is the maximum of the log likelihood
  • \(k\) is the number of parameters

We can compare non-nested models with AIC

This is a job for AIC

\[ AIC = -2 \ell(\hat{\theta}_{MLE}) + 2k \]

library(MASS)  # provides glm.nb

mod_temp <- glm.nb(nspp ~ log(Plot_Area) + avg_temp_annual_c, 
                   data = dat_plt)
AIC(mod_temp)
[1] 2321.152
mod_hii <- glm.nb(nspp ~ log(Plot_Area) + hii, 
                  data = dat_plt)
AIC(mod_hii)
[1] 2314.374

The human impact model is better supported (lower AIC)

This is a job for AIC

\[ AIC = -2 \ell(\hat{\theta}_{MLE}) + 2k \]

The purpose of the \(2k\) term is clearer for models with unequal d.f.

mod_temp_precip <- glm.nb(
    nspp ~ log(Plot_Area) + avg_temp_annual_c + rain_annual_mm, 
    data = dat_plt
)
AIC(mod_temp_precip)
[1] 2297.937
AIC(mod_hii)
[1] 2314.374

Now the temp + rain model is better supported

AIC penalizes more parameters

But…

logLik(mod_temp_precip)
'log Lik.' -1143.968 (df=5)
logLik(mod_temp)
'log Lik.' -1156.576 (df=4)

Adding parameters will always increase the log likelihood

Is the increase real? Or just “overfitting”?

“Overfitting” is a term for what we know already: more parameters could let a model wiggle to fit noise
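To see this concretely, here is a minimal sketch (the noise column and mod_noise are ours, purely for illustration): add a predictor of pure noise and the log likelihood still creeps up, while AIC gets worse.

set.seed(1)
# hypothetical predictor of pure noise
dat_plt$noise <- rnorm(nrow(dat_plt))
mod_noise <- glm.nb(nspp ~ log(Plot_Area) + avg_temp_annual_c + noise, 
                    data = dat_plt)
logLik(mod_noise)  # a little higher than logLik(mod_temp)
AIC(mod_noise)     # but AIC should come out a little worse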

AIC penalizes more parameters

How? and why?

Imagine you have one data set and fit a model to it by maximizing likelihood.

Now imagine you collect a new data set, and calculate the log likelihood of the new data using the parameter estimates from the model fit to the first data.

The difference between the log likelihood of the original data and the log likelihood of the new data will be about \(k\)

\[ \ell(\hat{\theta} \mid \text{data}_\text{orig}) - \ell(\hat{\theta} \mid \text{data}_\text{new}) \approx k \]
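A rough simulation of this idea (a sketch using a simple normal regression rather than our richness models; all names here are ours):

set.seed(2)
k <- 3  # intercept, slope, and sigma
diffs <- replicate(1000, {
    x1 <- rnorm(50); y1 <- rnorm(50, 1 + 2 * x1)  # original data
    x2 <- rnorm(50); y2 <- rnorm(50, 1 + 2 * x2)  # new data
    fit <- lm(y1 ~ x1)                     # fit to the original data
    sigma_mle <- sqrt(mean(resid(fit)^2))  # ML estimate of sigma
    ll_new <- sum(dnorm(y2, predict(fit, data.frame(x1 = x2)), 
                        sigma_mle, log = TRUE))
    as.numeric(logLik(fit)) - ll_new       # original minus new
})
mean(diffs)  # comes out near k = 3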

AIC penalizes more parameters



So AIC is trying to account for the chance that additional parameters are just fitting noise by including a penalty for those additional parameters

Returning to model comparisons

We can rest assured that the improvement in AIC for the temp + rain model is not due just to adding another parameter

AIC(mod_temp_precip)
[1] 2297.937
AIC(mod_hii)
[1] 2314.374

A common rule is that models with AIC values within 2 points are effectively equivalent; \(>2\) means one model really is better supported (Burnham & Anderson 2002)

Model averaging

  • model comparison tells us which model is better supported, a way of testing hypotheses
  • model averaging is less about testing hypotheses and more about prediction
  • for example: how can we best predict tree richness across the pae ʻāina?

Model averaging

Consider the 3 models we have made so far

nspp ~ log(Plot_Area) + hii
nspp ~ log(Plot_Area) + avg_temp_annual_c
nspp ~ log(Plot_Area) + avg_temp_annual_c + rain_annual_mm

Each model makes a different prediction about the expected number of species in a plot
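For example, a quick sketch (using the three fitted models from above) of each model’s predicted richness for the same plot:

# each model's expected species richness for the first plot
newplot <- dat_plt[1, ]
sapply(list(hii = mod_hii, temp = mod_temp, temp_precip = mod_temp_precip),
       function(m) predict(m, newdata = newplot, type = "response"))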

How could we combine the models to make one prediction?

Why do we want to?

Model averaging

The predictions from the combined model will often be more accurate than those of any one model by itself

And also better than one big model

nspp ~ log(Plot_Area) + hii + avg_temp_annual_c + rain_annual_mm

Because we know, e.g., there are issues of collinearity. For smaller data sets, overfitting one big model is also an issue

Model averaging

How do we combine models to make one prediction?

We could average the parameters of each model to make an average model

\[ \begin{aligned} \log(\mu_\text{nspp}) = &b_0^{\text{(avg.)}} + b_1^{\text{(avg.)}} \log(\text{area}) + b_2^{\text{(avg.)}} \text{hii} + \\ &b_3^{\text{(avg.)}} \text{temp} + b_4^{\text{(avg.)}} \text{rain} \end{aligned} \]

where, e.g., \(b_3^{\text{(avg.)}}\) is the average of all slope estimates for avg_temp_annual_c across all the models

Model averaging

Let’s make that more concrete with our actual models.

Here are the slope estimates for temperature in every model where it shows up

coefficients(mod_temp)["avg_temp_annual_c"]
avg_temp_annual_c 
       0.02427288 
coefficients(mod_temp_precip)["avg_temp_annual_c"]
avg_temp_annual_c 
       0.02797484 

So the average slope is \(b_3^{\text{(avg.)}} = \frac{0.024 + 0.028}{2} = 0.026\)

Model averaging

Note: there are two ways to average

average over only models where the variable shows up

\(b_3^{\text{(avg.)}} = \frac{0.024 + 0.028}{2}\)

average over all models no matter if the variable shows up

\(b_3^{\text{(avg.)}} = \frac{0 + 0.024 + 0.028}{3}\)


The default in the code we’ll use for model averaging is to average only over models where the variable shows up
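In code, a small sketch of the two conventions for the temperature slope (using the two estimates above):

# slope estimates from the models that contain temperature
b_temp <- c(coefficients(mod_temp)["avg_temp_annual_c"],
            coefficients(mod_temp_precip)["avg_temp_annual_c"])
mean(b_temp)           # only models containing the variable
sum(c(0, b_temp)) / 3  # all 3 models, with absent = slope of 0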

Model averaging

But this simple parameter average ignores the fact that some individual models are better at making predictions than others

The AIC values tell us about the support of each model

AIC(mod_hii)
[1] 2314.374
AIC(mod_temp)
[1] 2321.152
AIC(mod_temp_precip)
[1] 2297.937

We want to account for these differences in support when averaging

Model averaging

We turn AIC values into weights. For model \(i\):

\[ w_i = \frac{e^{-\Delta\text{AIC}_i / 2}}{\sum_j e^{-\Delta\text{AIC}_j / 2}} \]

where \(\Delta\text{AIC}_i = \text{AIC}(\text{model}_i) - (\text{minimum AIC})\)

The weights sum to 1, and they can be roughly interpreted as the probability that \(\text{model}_i\) is the best supported model

Model averaging

Again, let’s make this concrete

Model             AIC_i      ΔAIC_i    w_i
mod_hii           2314.374   16.438    0.00027
mod_temp          2321.152   23.216    0.00001
mod_temp_precip   2297.937    0.000    0.99972
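These weights are easy to compute by hand (a sketch using the models fit above):

# Akaike weights from the three AIC values
aic <- c(hii = AIC(mod_hii), temp = AIC(mod_temp), 
         temp_precip = AIC(mod_temp_precip))
delta <- aic - min(aic)                      # Delta AIC_i
w <- exp(-delta / 2) / sum(exp(-delta / 2))  # weights sum to 1
round(w, 5)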

Package MuMIn does this for us!

Create the averaged model

library(MuMIn)

# pass in a list of all models we want to consider
mod_avg <- model.avg(list(mod_hii, mod_temp, mod_temp_precip))

Have a look at the averaged coefficients

coefficients(mod_avg)
      (Intercept)    log(Plot_Area) avg_temp_annual_c    rain_annual_mm 
    -2.5267209876      0.4876844180      0.0279748056      0.0000782252 
              hii 
     0.0133591861 
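The averaged model can also make predictions. A minimal sketch (we assume the predictions come back on the link scale, as for the component glm.nb models, so we exponentiate to get expected richness):

# averaged prediction for the first few plots
eta <- predict(mod_avg, newdata = dat_plt[1:3, ])
exp(eta)  # expected number of species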

Package MuMIn does this for us!

Look at a similar table of AIC, \(\Delta\)AIC, and weights

mod_avg$msTable
    df    logLik     AICc    delta       weight
134  5 -1143.968 2298.052  0.00000 9.997160e-01
23   4 -1153.187 2314.451 16.39911 2.746982e-04
13   4 -1156.576 2321.229 23.17712 9.268898e-06

A few things to note:

  • the rows are ordered from best supported model to least supported
  • AICc is used instead of AIC

AICc is a sample-size-corrected version of AIC, and it is not a bad idea to use it
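For reference, the standard correction (with \(n\) the sample size) is

\[ \text{AICc} = \text{AIC} + \frac{2k(k+1)}{n - k - 1} \]

so the correction shrinks toward zero as \(n\) grows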

Package MuMIn does this for us!

You can get AICc with the AICc function from MuMIn

# uncorrected AIC
AIC(mod_temp_precip)
[1] 2297.937
# sample size corrected
AICc(mod_temp_precip)
[1] 2298.052

Our sample size is large, so the correction is small

Package MuMIn does other things

MuMIn lets us do some things we maybe shouldn’t

  • make all possible models with all possible variables
  • then let AIC pick one, or average over all of them
  • not recommended because we’re just kind of “fishing” for something “significant” rather than testing hypotheses based in ecology/evolution/conservation
  • MuMIn knows it’s bad: its function for this workflow is called dredge

Package MuMIn does other things

But dredging can be useful for identifying which variables in our data are most “meaningful”

  • dredge up all models
  • for each variable, sum up the weights of the models it is a part of
  • those summed weights already fall in [0, 1], because the model weights themselves sum to 1
  • the higher the summed weight, the more “important” that variable
  • should do this based on variables of interest biologically, not just every variable you could possibly get your hands on

Package MuMIn does other things

Variable importance via summed weights

# model with all possible variables
mod_big <- glm.nb(
    nspp ~ log(Plot_Area) + hii + avg_temp_annual_c + rain_annual_mm, 
    data = dat_plt, na.action = "na.fail"
)

# dredge it
big_dredge <- dredge(mod_big)

sw(big_dredge)
                     log(Plot_Area) rain_annual_mm hii  avg_temp_annual_c
Sum of weights:      1.00           1.00           0.99 0.45             
N containing models:    8              8              8    8             
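Under the hood, sw() is essentially summing the weight column over the models that contain each term. A rough sketch (treating the dredge table as a data frame; column names assumed from the output above):

big_tab <- as.data.frame(big_dredge)
# a term's coefficient column is non-NA only in models that include it
sum(big_tab$weight[!is.na(big_tab$hii)])  # summed weight for hii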

So area is important (no surprise). But a new insight: human impact and rain are quite important, and temperature less so

References

Burnham KP, Anderson DR. 2002. Model selection and multimodel inference: A practical information-theoretic approach. Springer.