Don’t Assume Anything – Normality Assumption Testing

Lisa Pilkington

16 Sep 2024

As noted in “How healthy is your regression? Linear model diagnostic residual plots”, several assumptions should be checked when carrying out linear regression. In addition to inspecting the residual diagnostic plots, the Shapiro-Wilk test can be used to assess whether the linear model residuals (i.e. the differences between the observed and fitted y values) are consistent with a normal distribution.

To conduct a Shapiro-Wilk test, the code is:

shapiro.test(lmExample.lm$residuals)

The call “shapiro.test(…)” asks R to conduct a Shapiro-Wilk test, here on the residuals of a linear model (in this case lmExample.lm) that has been fitted previously. The command prints the output of the test, which can then be interpreted.
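Because the test needs a model that has already been fitted, a minimal end-to-end sketch may help. The data below are simulated purely for illustration, and the names exampleData, x, y and swResult are assumptions that do not come from the original example; only lmExample.lm follows the naming used above.

```r
# Simulated data for illustration only; these values are not from the article.
set.seed(42)
exampleData <- data.frame(x = 1:20)
exampleData$y <- 3 + 2 * exampleData$x + rnorm(20, mean = 0, sd = 1.5)

# Fit the linear model first, so that its residuals are available.
lmExample.lm <- lm(y ~ x, data = exampleData)

# Run the Shapiro-Wilk test on the residuals of the fitted model.
swResult <- shapiro.test(lmExample.lm$residuals)
print(swResult)
```

Storing the result in an object (here swResult) is optional, but it makes the test statistic and p-value available for later use.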

An example of the output is shown below:

> shapiro.test(lmExample.lm$residuals)
    Shapiro-Wilk normality test
data: lmExample.lm$residuals
W = 0.94228, p-value = 0.2647

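Like other significance tests, the Shapiro-Wilk test is interpreted through its p-value. The null hypothesis is that the residuals come from a normal distribution, so the key value to look at in the output is the p-value: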

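If the p-value > 0.05, there is no evidence that the residuals are not normally distributed.
If the p-value < 0.05, there is evidence that the residuals are not normally distributed.

In the example output above, the p-value is 0.2647, which is greater than 0.05, so there is no evidence against normality of the residuals.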
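The decision rule above can also be applied in code, since shapiro.test() returns an object whose p-value is stored in its p.value element. The sketch below is one possible way to do this: interpretShapiro is a hypothetical helper written for illustration, and the input values are stand-ins rather than residuals from a real model.

```r
# Hypothetical helper (not part of base R): applies the 0.05 rule
# to the p-value returned by shapiro.test().
interpretShapiro <- function(residualValues, alpha = 0.05) {
  result <- shapiro.test(residualValues)
  pValue <- result$p.value
  if (pValue < alpha) {
    conclusion <- "evidence that the residuals are not normally distributed"
  } else {
    conclusion <- "no evidence that the residuals are not normally distributed"
  }
  list(W = unname(result$statistic), p.value = pValue, conclusion = conclusion)
}

# Stand-in values for illustration: evenly spaced normal quantiles,
# used here in place of residuals from a fitted model.
standInResiduals <- qnorm(ppoints(30))
interpretShapiro(standInResiduals)
```

For a fitted model, the same helper could be called as interpretShapiro(lmExample.lm$residuals).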

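The result of the Shapiro-Wilk test should be interpreted alongside the Normal Q-Q plot of the residuals (see “How healthy is your regression? Linear model diagnostic residual plots”). The sensitivity of the test depends on sample size: with few observations it can miss real departures from normality, while with many observations it can flag departures too small to matter.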
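As a rough sketch of how the two checks can be looked at together, the code below draws a Normal Q-Q plot of the residuals and then runs the test on the same values. The data are simulated for illustration only, and exampleData, x, y and modelResiduals are assumed names that are not taken from the original example.

```r
# Simulated data for illustration only; these values are not from the article.
set.seed(42)
exampleData <- data.frame(x = 1:20)
exampleData$y <- 3 + 2 * exampleData$x + rnorm(20, mean = 0, sd = 1.5)
lmExample.lm <- lm(y ~ x, data = exampleData)

# Extract the residuals once so the plot and the test use the same values.
modelResiduals <- residuals(lmExample.lm)

# Normal Q-Q plot of the residuals, with a reference line added.
qqnorm(modelResiduals, main = "Normal Q-Q plot of residuals")
qqline(modelResiduals)

# Shapiro-Wilk test on the same residuals.
shapiro.test(modelResiduals)
```

Points lying close to the reference line, together with a p-value above 0.05, would both point towards the normality assumption being reasonable. The built-in diagnostic version of this plot, which uses standardised residuals, can be drawn with plot(lmExample.lm, which = 2).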

If all is in order regarding normality and the other assumptions when fitting a linear model, then you are all set and can proceed with your linear regression – see “RModule Section 2D”.