Null hypothesis testing considerations and connections

The GLM

We’ve been working with models like

library(MASS)  # glm.nb() comes from the MASS package

sar_mod <- glm.nb(nspp ~ log(Plot_Area), data = dat_plt)


And testing “significance” with a likelihood ratio test (LRT)


But there are other approaches, perhaps some that seem more familiar

The GLM

The summary function gives us a lot of info (scroll down)

summary(sar_mod)

Call:
glm.nb(formula = nspp ~ log(Plot_Area), data = dat_plt, init.theta = 9.014755177, 
    link = log)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.75853    0.33458  -5.256 1.47e-07 ***
log(Plot_Area)  0.48172    0.05051   9.537  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(9.0148) family taken to be 1)

    Null deviance: 622.95  on 527  degrees of freedom
Residual deviance: 515.85  on 526  degrees of freedom
AIC: 2331.3

Number of Fisher Scoring iterations: 1

              Theta:  9.01 
          Std. Err.:  1.76 

 2 x log-likelihood:  -2325.269 

The GLM

Some of this info in fact directly relates to the LRT

    Null deviance: 622.95  on 527  degrees of freedom
Residual deviance: 515.85  on 526  degrees of freedom

(527 d.f.) - (526 d.f.) = 1 d.f., exactly the right number of degrees of freedom if we were comparing these two models

mod_full <- nspp ~ log(Plot_Area)
mod_reduced <- nspp ~ 1

The LRT statistic for those two models is

\[ \begin{aligned} \Lambda &= 2 (\log\mathscr{L}_\text{full} - \log\mathscr{L}_\text{reduced}) \\ &= 98.88 \end{aligned} \]

which is (approx.) equal to 622.95 - 515.85

(approx. because the null deviance printed by glm.nb holds \(\theta\) fixed at the full-model estimate, whereas the LRT re-estimates \(\theta\) for the reduced model)
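
To see where the 98.88 comes from, here is a minimal sketch (assuming dat_plt from above is available) that fits both models and computes \(\Lambda\) by hand:

library(MASS)

# fit the full and reduced (intercept-only) models
full_fit    <- glm.nb(nspp ~ log(Plot_Area), data = dat_plt)
reduced_fit <- glm.nb(nspp ~ 1, data = dat_plt)

# the LRT statistic: twice the difference in log likelihoods
lambda <- as.numeric(2 * (logLik(full_fit) - logLik(reduced_fit)))
lambda

# its P-value from a chi-squared distribution with 1 d.f.
pchisq(lambda, df = 1, lower.tail = FALSE)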

The GLM

What’s going on?

The deviance is an attempt to generalize the idea of “sum of squares”

\[ \begin{aligned} \text{Null deviance} &= 2 (\log\mathscr{L}_\text{saturated} - \log\mathscr{L}_\text{reduced}) \\ \text{Residual deviance} &= 2 (\log\mathscr{L}_\text{saturated} - \log\mathscr{L}_\text{full}) \end{aligned} \]

where \(\log\mathscr{L}_\text{saturated}\) is the log likelihood of a model in which every data point gets its own parameter (i.e. it's the highest possible log likelihood)
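
Subtracting the residual deviance from the null deviance cancels the saturated term, which is why the difference reproduces the LRT statistic:

\[ \begin{aligned} \text{Null deviance} - \text{Residual deviance} &= 2 (\log\mathscr{L}_\text{full} - \log\mathscr{L}_\text{reduced}) \\ &= \Lambda \end{aligned} \]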

The GLM

What’s going on?

Null deviance: corresponds to the “total sum of squares” in a least squares world

Residual deviance: corresponds to the “residual sum of squares”

In the least squares world we do an \(F\)-test (aka ANOVA), which compares the residual and total sums of squares

In the likelihood world we do an LRT (e.g. with the anova function, as sketched below), which compares the residual and null deviances
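
As a sketch of that likelihood-world comparison (assuming sar_mod and dat_plt from earlier), the anova function carries out the LRT directly:

# refit the intercept-only model
sar_null <- update(sar_mod, . ~ 1)

# likelihood ratio test of the two nested negative binomial models
anova(sar_null, sar_mod)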

The GLM versus the LM

The only difference between a GLM and an LM is that an LM assumes normality

It is a mathematical novelty that under normality,

least squares \(\Longleftrightarrow\) maximum likelihood

minimize the residual sum of squares \(\Longleftrightarrow\) maximize the log likelihood function

\(P\)-value of the sums-of-squares ANOVA \(=\) \(P\)-value of the LRT
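
A quick illustration of the first equivalence, with made-up data (a hypothetical example, not the course dataset): lm fits by least squares while glm with a gaussian family fits by maximum likelihood, and the coefficient estimates coincide.

set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# least squares fit and gaussian maximum likelihood fit
ls_fit <- lm(y ~ x)
ml_fit <- glm(y ~ x, family = gaussian)

# identical coefficient estimates
cbind(least_squares = coef(ls_fit), max_likelihood = coef(ml_fit))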

The GLM

The summary function also gave \(P\)-values

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.75853    0.33458  -5.256 1.47e-07 ***
log(Plot_Area)  0.48172    0.05051   9.537  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What are those about?

  • Those come from the asymptotic (CLT) normality of maximum likelihood estimates (so-called Wald tests)
  • They are generally not as robust as the \(t\)-test we can do for an LM under the normality assumption
  • The LRT is preferred for significance testing (a quick comparison is sketched below)
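
As a sketch of that comparison (assuming sar_mod and dat_plt from earlier), we can put the Wald \(P\)-value from the summary table next to the LRT \(P\)-value for the same term:

# Wald P-value for log(Plot_Area), straight from the summary table
coef(summary(sar_mod))["log(Plot_Area)", "Pr(>|z|)"]

# LRT P-value for the same term (refit without it and compare)
sar_null <- update(sar_mod, . ~ 1)
lambda   <- as.numeric(2 * (logLik(sar_mod) - logLik(sar_null)))
pchisq(lambda, df = 1, lower.tail = FALSE)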

The mixed effects model

Recall that for a mixed effects model we did not get any \(P\)-values, despite summary reporting \(t\)-values (\(t\) as in \(t\)-test)

library(lme4)

mixed_mod <- lmer(log_dbh_cm ~ Native_Status * hii + (1 | PlotID), 
                  data = dat, REML = FALSE)

summary(mixed_mod)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: log_dbh_cm ~ Native_Status * hii + (1 | PlotID)
   Data: dat

      AIC       BIC    logLik -2*log(L)  df.resid 
  66659.5   66711.3  -33323.8   66647.5     41447 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3415 -0.6791 -0.0499  0.5649  4.8794 

Random effects:
 Groups   Name        Variance Std.Dev.
 PlotID   (Intercept) 0.1367   0.3697  
 Residual             0.2818   0.5309  
Number of obs: 41453, groups:  PlotID, 530

Fixed effects:
                         Estimate Std. Error t value
(Intercept)              2.354021   0.050752  46.383
Native_Statusnative      0.619260   0.040328  15.356
hii                      0.000556   0.002294   0.242
Native_Statusnative:hii -0.007436   0.001750  -4.249

Correlation of Fixed Effects:
            (Intr) Ntv_St hii   
Ntv_Sttsntv -0.603              
hii         -0.915  0.488       
Ntv_Sttsnt:  0.540 -0.949 -0.478

The mixed effects model

Here again an LRT is the better option

# a model to single out the interaction term
reduced_mod <- lmer(log_dbh_cm ~ Native_Status + hii + (1 | PlotID), 
                  data = dat, REML = FALSE)

# the LRT
anova(mixed_mod, reduced_mod, test = "Chisq")
Data: dat
Models:
reduced_mod: log_dbh_cm ~ Native_Status + hii + (1 | PlotID)
mixed_mod: log_dbh_cm ~ Native_Status * hii + (1 | PlotID)
            npar   AIC   BIC logLik -2*log(L)  Chisq Df Pr(>Chisq)    
reduced_mod    5 66676 66719 -33333     66666                         
mixed_mod      6 66660 66711 -33324     66648 18.033  1  2.171e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
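
An alternative sketch of the same test (assuming mixed_mod from above): drop1 refits the model without each fixed-effect term that can be dropped and reports the corresponding likelihood ratio test.

# LRT for each droppable fixed-effect term (here just the interaction)
drop1(mixed_mod, test = "Chisq")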