As described in Section 2C Part III, ANOVA (Analysis of Variance) is used to determine whether there is a significant difference between the mean values of a measurement for different groups; these groups are often referred to as the levels of a factor.
Section 2C Part III introduced how to explore the effect of a single factor (one-way ANOVA), but the method is not limited to one factor: multiple factors can be analysed simultaneously, and with two factors this is called two-way ANOVA.
Carrying out a two-way ANOVA is almost identical to carrying out a one-way ANOVA, except that two (or more) grouping variables, i.e. factors, are included in the linear model. The data are stored in a data frame with one column for the measurements and one further column for each factor, so that each row holds a single measurement together with the level of each factor under which it was obtained. Once the data frame is in R, the ANOVA is carried out in two steps: first fit a linear model, then run the analysis on that model.
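As a minimal sketch of the layout described above, the following builds a small data frame in which each row is one measurement. The values and the level names are hypothetical placeholders, and the column names (measurements, Factor1, Factor2) are chosen only to match the generic code below; they are not real data.

```r
# Hypothetical data: 2 levels of Factor1 x 3 levels of Factor2,
# one measurement per combination (values are made up for illustration)
ExampleDataframe.df <- data.frame(
  measurements = c(10.2, 11.1, 9.8, 12.4, 13.0, 12.1),
  Factor1 = factor(rep(c("Low", "High"), each = 3)),
  Factor2 = factor(rep(c("X", "Y", "Z"), times = 2))
)

# One column for the response, one column per factor
str(ExampleDataframe.df)
```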
To conduct a two-way ANOVA, the code is:
anovaexample <- lm(measurements~Factor1 + Factor2, data = ExampleDataframe.df)
anova(anovaexample)
- The term "anovaexample" is the name of a linear model.
- "<-" assigns the fitted model to that name, and "lm(...)" instructs R to fit a linear model using the data in the data frame given by the data argument (in this case "ExampleDataframe.df"). In the formula, "measurements" (left of the ~) is the response (i.e. the quantitative data) and "Factor1", "Factor2", etc. (right of the ~, joined by +) are the names of the factor columns being explored.
- "anova(...)" instructs R to carry out an ANOVA on the linear model you defined (in this case "anovaexample") and print the resulting ANOVA table.
You are not limited to two factors: further factors can be included by adding them to the model formula with additional + terms, e.g. measurements ~ Factor1 + Factor2 + Factor3.
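The sketch below illustrates this with made-up data (generated with rnorm) and a hypothetical third factor, Factor3, in a fully crossed design with one measurement per combination. The model name anovaexample3 is likewise only illustrative. The resulting ANOVA table has one row per factor plus a Residuals row.

```r
# Made-up data: 2 x 2 x 3 fully crossed design, one measurement per combination
set.seed(1)
ExampleDataframe.df <- expand.grid(
  Factor1 = c("Low", "High"),
  Factor2 = c("X", "Y"),
  Factor3 = c("P", "Q", "R")
)
ExampleDataframe.df$measurements <- round(rnorm(nrow(ExampleDataframe.df), mean = 50, sd = 2), 1)

# Each extra factor is simply another "+ FactorName" term in the formula
anovaexample3 <- lm(measurements ~ Factor1 + Factor2 + Factor3, data = ExampleDataframe.df)
anova(anovaexample3)
```

With 12 measurements, the degrees of freedom are 1, 1 and 2 for the three factors, leaving 7 for the residuals.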
To give an example of this in action, take the following scenario.
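In an experiment to compare the percentage efficiency of four chelating agents (A, B, C and D) in extracting a metal ion from aqueous solution, the following results (% efficiency) were obtained over three days:
- Chelating agent A: 84 (Day 1), 79 (Day 2), 83 (Day 3)
- Chelating agent B: 80 (Day 1), 77 (Day 2), 78 (Day 3)
- Chelating agent C: 83 (Day 1), 80 (Day 2), 80 (Day 3)
- Chelating agent D: 79 (Day 1), 79 (Day 2), 78 (Day 3)
The experiment was designed as follows: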
- On each day, a fresh solution of the metal ion was prepared and the extraction was performed with each of the chelating agents, taken in a random order.
- The chelating agent is a controlled factor (a treatment), since the agents are chosen by the experimenter.
- The day is an uncontrolled factor, which is dealt with by the blocking design, with each day acting as a block (see “Controlling what you can – randomisation and blocking in experimental design”).
ANOVA can be used either to test for a significant effect of a controlled factor (treatment) or to estimate the variance due to an uncontrolled factor.
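A minimal sketch of the second use, assuming that the day is treated as a random blocking factor: the expected day mean square is then σ² + aσ²_Day, where a = 4 is the number of chelating agents tested each day, so σ²_Day can be estimated as (MS_Day - MS_Residual)/a. The variable names below (a, ms_day, ms_resid, var_day) are illustrative.

```r
# Data from the chelating-agent experiment
PercentageEfficiency <- c(84, 80, 83, 79, 79, 77, 80, 79, 83, 78, 80, 78)
Day <- rep(c("Day1", "Day2", "Day3"), each = 4)
ChelatingAgent <- rep(c("A", "B", "C", "D"), times = 3)

tab <- anova(lm(PercentageEfficiency ~ ChelatingAgent + Day))

# Assumption: Day is a random blocking factor, so E[MS_Day] = sigma^2 + a * sigma_Day^2,
# where a is the number of measurements per day (one per chelating agent)
a <- length(unique(ChelatingAgent))
ms_day <- tab["Day", "Mean Sq"]
ms_resid <- tab["Residuals", "Mean Sq"]
var_day <- (ms_day - ms_resid) / a   # ANOVA (method-of-moments) estimate
var_day
```

For these data the estimate is about 1.53 (in units of %²). If MS_Day were smaller than MS_Residual, this estimate would be negative, and it is then conventionally taken as zero.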
The code to conduct the analysis is:
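# Percentage efficiency for agents A, B, C, D on Day 1, then Day 2, then Day 3
PercentageEfficiency <- c(84, 80, 83, 79, 79, 77, 80, 79, 83, 78, 80, 78)
# Day on which each measurement was made (four measurements per day)
Day <- rep(c("Day1", "Day2", "Day3"), each = 4)
# Chelating agent used for each measurement (same order every day)
ChelatingAgent <- rep(c("A", "B", "C", "D"), times = 3)
# One response column plus one column per factor
LectureExample.df <- data.frame(PercentageEfficiency, Day, ChelatingAgent)
# Fit the linear model with both factors, then run the ANOVA
ChelatingExample <- lm(PercentageEfficiency ~ ChelatingAgent + Day, data = LectureExample.df)
anova(ChelatingExample)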
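The output is an ANOVA table with one row for each factor (ChelatingAgent and Day) and a row for the residuals. For each factor, it gives the degrees of freedom (Df), sum of squares (Sum Sq), mean square (Mean Sq), F value and p-value (Pr(>F)).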
From this output, we can conclude:
- There is a significant difference between the efficiencies of the chelating agents, i.e. the chelating agent does affect the metal ion extraction efficiency (p-value < 0.05).
- There is no significant difference between the results from different days, i.e. there is no evidence that the day on which the test was carried out affects the extraction efficiency (p-value > 0.05).
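To show where the F values behind these conclusions come from, the sketch below recomputes them by hand for this balanced design, which has one measurement per agent per day. The variable names are illustrative; anova() performs the equivalent calculation internally.

```r
# Data from the chelating-agent experiment
PercentageEfficiency <- c(84, 80, 83, 79, 79, 77, 80, 79, 83, 78, 80, 78)
Day <- rep(c("Day1", "Day2", "Day3"), each = 4)
ChelatingAgent <- rep(c("A", "B", "C", "D"), times = 3)

grand_mean  <- mean(PercentageEfficiency)
agent_means <- tapply(PercentageEfficiency, ChelatingAgent, mean)
day_means   <- tapply(PercentageEfficiency, Day, mean)

# Sums of squares (each agent mean is based on 3 days, each day mean on 4 agents)
ss_agent <- 3 * sum((agent_means - grand_mean)^2)
ss_day   <- 4 * sum((day_means - grand_mean)^2)
ss_total <- sum((PercentageEfficiency - grand_mean)^2)
ss_resid <- ss_total - ss_agent - ss_day

# Degrees of freedom: agents 4 - 1 = 3, days 3 - 1 = 2, residual 3 * 2 = 6
ms_resid <- ss_resid / 6
F_agent  <- (ss_agent / 3) / ms_resid
F_day    <- (ss_day / 2) / ms_resid

# Upper-tail p-values from the F distribution
p_agent <- pf(F_agent, df1 = 3, df2 = 6, lower.tail = FALSE)
p_day   <- pf(F_day, df1 = 2, df2 = 6, lower.tail = FALSE)

c(F_agent = F_agent, p_agent = p_agent, F_day = F_day, p_day = p_day)
```

F_agent comes out at about 5.83 and F_day at about 4.73. These should match the F values printed by anova(ChelatingExample), and only the chelating-agent p-value falls below 0.05.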
In addition to the individual effects of the factors (often called their main effects), ANOVA can also be used to explore interactions between factors; see Section 2C Part III for more details.
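In R's model formulas, Factor1 * Factor2 expands to Factor1 + Factor2 + Factor1:Factor2, where the last term is the interaction. The sketch below uses made-up data (generated with rnorm) with three replicate measurements per combination of factor levels. The data frame name ReplicatedExample.df is hypothetical.

```r
# Made-up data: 2 x 2 design with 3 replicate measurements in each combination
set.seed(2)
ReplicatedExample.df <- expand.grid(
  Replicate = 1:3,
  Factor1 = c("Low", "High"),
  Factor2 = c("X", "Y")
)
ReplicatedExample.df$measurements <- round(rnorm(nrow(ReplicatedExample.df), mean = 20, sd = 1), 1)

# Factor1 * Factor2 fits both main effects and their interaction
interaction_model <- lm(measurements ~ Factor1 * Factor2, data = ReplicatedExample.df)
anova(interaction_model)
```

Replication matters here. With only one measurement per chelating agent per day, as in the example above, adding a ChelatingAgent:Day term would leave no residual degrees of freedom, so the interaction cannot be tested in that experiment.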
If you do observe significant differences for main effects (as for the chelating agents in the example above), a useful way to see the nature of these differences is to draw a side-by-side dot plot or box plot (see Section 2B Part II).
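For the chelating-agent data, a minimal sketch using base R graphics might look like the following. The axis labels are illustrative choices. With only three measurements per agent, the dot plot that shows each individual point is often the more informative of the two.

```r
# Data from the chelating-agent experiment (only the factor of interest is needed)
PercentageEfficiency <- c(84, 80, 83, 79, 79, 77, 80, 79, 83, 78, 80, 78)
ChelatingAgent <- rep(c("A", "B", "C", "D"), times = 3)
LectureExample.df <- data.frame(PercentageEfficiency, ChelatingAgent)

# Side-by-side box plot: one box per chelating agent
boxplot(PercentageEfficiency ~ ChelatingAgent, data = LectureExample.df,
        xlab = "Chelating agent", ylab = "Percentage efficiency")

# Side-by-side dot plot: every individual measurement is drawn
stripchart(PercentageEfficiency ~ ChelatingAgent, data = LectureExample.df,
           vertical = TRUE, pch = 19,
           xlab = "Chelating agent", ylab = "Percentage efficiency")
```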