Controlling what you can – randomisation and blocking in experimental design

Lisa Pilkington

14 Jul 2024


Designing experiments is an essential part of many scientists' work. Unfortunately, even the simplest experiment may be affected by additional factors that cannot easily be controlled. Variation in such an uncontrolled factor may produce a trend in the results, introducing a systematic error that can distort or invalidate the conclusions drawn from the experiment.

This is because a trend violates one of the fundamental assumptions of ANOVA, the technique used to identify significant factors in experimental designs. ANOVA assumes that the observations in the trial experiments are independent of each other, and this is clearly not true if there is a trend in the conditions.

But don’t worry: there are two methods in experimental design that minimise the effect of these factors, namely randomisation and blocking. These approaches (and how we implement them in R) can be shown through an example scenario:

Say you want to analyse a drug and its metabolites in a urine extract using reversed-phase HPLC.

Suppose we wish to study the effects of four different solvent compositions (A-D) on the resolution of the signals we see as output from the HPLC.
The use of each solvent is often referred to as a treatment.
To estimate random measurement errors, each solvent (i.e. each treatment) is run three times, giving 12 experiments in total. If three experiments with one solvent are done first, then three with the second solvent, and so on through the third and fourth, we run the risk that any genuine effect of changing the solvent will be confounded with a drift in the experimental conditions.
The problem is avoided by assigning labels (1 to 12) to each experiment, then using R to randomise the order.

To generate a randomised list of numbers, the code is:

sample(1:12, 12, replace = FALSE)

The term “sample()” asks R to draw a sample from a set of numbers and return it.
The term “1:12” (in general, “1:n”) tells R the numbers to sample from: this should run from 1 to the total number of experiments, n (in this case 12).
The second “12” (in general, “n”) tells R how many numbers to sample: this should equal the total number of experiments.
The term “replace = FALSE” tells R that no number may be chosen more than once. This should always be FALSE here.

The output in the console will be a randomised list of n numbers that can then be used to give the new order of experiments.
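
As a minimal sketch of how the shuffled numbers map back onto the solvents, the snippet below builds a table of the 12 runs and reorders it. The column names (label, solvent, run_order) and the seed value are illustrative assumptions rather than part of the procedure above; set.seed() is included only so that the same order can be reproduced later.

```r
# Hypothetical run table: 4 solvents (A-D), 3 replicates each = 12 experiments
runs <- data.frame(
  label   = 1:12,
  solvent = rep(c("A", "B", "C", "D"), each = 3)
)

set.seed(2024)  # arbitrary seed, assumed here only for reproducibility

# Shuffle the labels 1..12 without replacement
shuffled <- sample(1:12, 12, replace = FALSE)

# Reorder the table so that row i is the i-th experiment to be run
randomised <- runs[shuffled, ]
randomised$run_order <- 1:12
rownames(randomised) <- NULL

print(randomised)
```

Reading down the run_order column then gives the sequence in which to perform the experiments.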

Although the experiments will be performed in a random order, the outcome may still not be ideal. If, for example, the experiments were performed at the rate of 3 per day over 4 days, the replicates of a given treatment could still, by chance, end up grouped together on one or two days, so some time-dependent uncontrolled factors could still affect the results. In other words, complete randomisation may by chance leave some partial correlation, although this is far preferable to not randomising at all!
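
One way to see this in practice is to tabulate which solvents fall on which day under a completely randomised order. The sketch below assumes the 3-runs-per-day schedule mentioned above; the variable names and the seed are hypothetical, and the exact table will differ from seed to seed.

```r
# Hypothetical schedule: 12 runs in a completely randomised order, 3 per day
solvent <- rep(c("A", "B", "C", "D"), each = 3)

set.seed(1)  # arbitrary seed, for reproducibility only

run_sequence <- sample(solvent)   # shuffle all 12 runs
day <- rep(1:4, each = 3)         # runs 1-3 on day 1, runs 4-6 on day 2, ...

# Cross-tabulate solvent against day to see how the replicates are spread
day_table <- table(solvent = run_sequence, day = day)
print(day_table)
```

A count of 2 or 3 in any cell means that replicates of one solvent have clustered on a single day, which is the situation that blocking is designed to prevent.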

This difficulty can be overcome, for known uncontrolled factors, such as time, by using the technique of blocking. Blocking is the deliberate dividing of experiments into groups that are (for example) performed on different days – each day is a block.

The easiest way to do this is to have equal representation of each treatment in each block. Then, to allow for uncontrolled variation within a block, the order of the treatments within each block is randomised (using the above procedure), giving a randomised block design.
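
A randomised block design for the HPLC example might be sketched as follows. It assumes, purely for illustration, that the 12 runs are arranged as three daily blocks of four runs, so that each solvent appears exactly once per day; the names n_blocks, blocks and design, and the seed, are hypothetical.

```r
solvents <- c("A", "B", "C", "D")
n_blocks <- 3  # assumed: one block per day, over three days

set.seed(42)  # arbitrary seed, for reproducibility only

# For each block, shuffle the four solvents independently
blocks <- lapply(1:n_blocks, function(b) {
  data.frame(
    block    = b,
    position = seq_along(solvents),  # run order within the block
    solvent  = sample(solvents)
  )
})

# Stack the blocks into a single design table
design <- do.call(rbind, blocks)
print(design)
```

Because sample() is called separately for each block, the within-day order differs from day to day while every day still contains all four treatments.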

The next step after designing your experiment is to carry out the experiments, and then it's on to analysing the data. See how to analyse the results using ANOVA in the upcoming blog post “Two-way ANOVA – Twice the Fun!”