One of the first and most important steps of any data analysis is gaining an understanding of your data. Heatmaps are an excellent tool for this: values are depicted by the colour of each square (tile), which makes it easy to visualise complex data and get an overview at a glance. This makes them well suited to exploratory analysis.
There are a few options to think about when producing your heatmaps, including:
- Plotting the raw values (A) or normalising values by column (B).
  - Depending on the nature of the data, sometimes plotting the raw data is most relevant.
  - Normalising is often recommended because columns can have very different scales, which makes differences within columns of lower values difficult to see.
- Ordering the rows and/or columns by performing hierarchical cluster analysis (C) (see RModule Section 3A II for more information about HCA).
  - HCA is performed separately for the rows (samples) and the columns (variables).
  - By default, R performs the HCA using the Euclidean distance.
  - Heatmaps are an excellent way to graphically show the results of HCA.
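The two options above can be sketched by hand to see what they do to the numbers. This is a minimal illustration on made-up data, not a reproduction of what heatmap() does internally; the objects toy.mat, toy.scaled, row.hc and col.hc are hypothetical names used only here.

```r
# Hypothetical toy data: 4 samples (rows) x 3 variables (columns) on very different scales
toy.mat <- matrix(
  c(1, 2, 3, 4,
    100, 250, 150, 300,
    0.1, 0.4, 0.2, 0.3),
  nrow = 4,
  dimnames = list(paste0("Sample", 1:4), c("VarA", "VarB", "VarC"))
)

# Normalise by column: subtract each column's mean, then divide by its standard deviation
toy.scaled <- scale(toy.mat)

# After scaling, every column has mean 0 and standard deviation 1
round(colMeans(toy.scaled), 10)
apply(toy.scaled, 2, sd)

# HCA on the rows (samples): Euclidean distance, then hierarchical clustering
row.hc <- hclust(dist(toy.scaled, method = "euclidean"))

# HCA on the columns (variables): transpose so that the variables become rows
col.hc <- hclust(dist(t(toy.scaled), method = "euclidean"))

# Order in which the samples would appear along the dendrogram
rownames(toy.scaled)[row.hc$order]
```

Calling plot(row.hc) or plot(col.hc) draws the corresponding dendrogram on its own.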
To generate heatmaps, once you have imported the data frame, the code is:
Example.mat<-as.matrix(ExampleDataframe.df)
heatmap(Example.mat)
The term "as.matrix()" asks R to convert the data frame to a matrix, which is required for producing a heatmap.
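One thing to check at this step is that the matrix is numeric, because heatmap() only accepts a numeric matrix and a data frame containing a text column is converted to a character matrix. The sketch below assumes a hypothetical data frame, ExampleRaw.df, whose first column holds the sample names; the column names are invented for illustration.

```r
# Hypothetical data frame: sample names in the first column, measurements in the rest
ExampleRaw.df <- data.frame(
  Sample = c("S1", "S2", "S3"),
  Height = c(1.2, 3.4, 2.2),
  Weight = c(10, 30, 20)
)

# Keep only the numeric columns and move the sample names into the row names
ExampleDataframe.df <- ExampleRaw.df[, c("Height", "Weight")]
rownames(ExampleDataframe.df) <- ExampleRaw.df$Sample

# Convert to a matrix and confirm that it is numeric before plotting
Example.mat <- as.matrix(ExampleDataframe.df)
is.numeric(Example.mat)
dim(Example.mat)
```

Keeping the sample names as row names means they are used as the row labels of the heatmap.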
The term "heatmap(...)" asks R to plot the heatmap of the matrix (in this case Example.mat). This command will produce the heatmap in the plot window.
Inside the "heatmap(...)" brackets, after the name of the matrix, you can include the following term(s):
- “Colv = NA, Rowv = NA”: this stops R from performing HCA and from including dendrograms in the heatmap.
- “scale = "column"”: this asks R to normalise the values by column and then use the normalised values to produce the heatmap.
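To show you this in practice, the three different heatmaps below are produced by the following code. Note that heatmap() scales the values by row unless told otherwise, so the first line includes scale = "none" to plot the raw values:
heatmap(Example.mat, Colv = NA, Rowv = NA, scale = "none")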
heatmap(Example.mat, Colv = NA, Rowv = NA, scale = "column")
heatmap(Example.mat, scale = "column")
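If you would like to try these calls without importing anything, the sketch below repeats them on mtcars, a dataset that ships with R. It is only a stand-in for Example.mat; the object name Cars.mat and the plot titles are illustrative choices, not part of the original example.

```r
# Built-in dataset used purely as a stand-in for your own data
Cars.mat <- as.matrix(mtcars)

# (A) Raw values, no clustering.
# scale = "none" keeps the raw values (heatmap() scales by row by default).
heatmap(Cars.mat, Colv = NA, Rowv = NA, scale = "none",
        main = "A: raw values")

# (B) Values normalised by column, no clustering
heatmap(Cars.mat, Colv = NA, Rowv = NA, scale = "column",
        main = "B: scaled by column")

# (C) Values normalised by column, rows and columns ordered by HCA
heatmap(Cars.mat, scale = "column",
        main = "C: scaled by column, with HCA")
```

In (A) the columns with the largest raw values, such as disp and hp, are likely to dominate the colour range, which is the problem that column normalisation in (B) is meant to address.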
Once you generate them, heatmaps can be analysed and interpreted to understand many features in the data:
- Colours of the cells can show patterns and trends in the data.
  - Look for patterns or blocks in the colours and identify what these might mean.
  - Look for extremely high or low values and consider what they might mean or represent.
- If produced, dendrograms are shown for both the samples and the variables.
  - Dendrograms for the rows show clustering of the samples.
  - Dendrograms for the columns show similarity of the measured variables.
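When reading the dendrograms, it can help to pull the row and column order out of the object that heatmap() returns invisibly, and to cut the row clustering into groups. The sketch below again assumes the built-in mtcars data as a stand-in for your own matrix; the object names and the choice of three groups are arbitrary illustrations.

```r
# Built-in dataset used purely as a stand-in for your own data
Cars.mat <- as.matrix(mtcars)

# heatmap() invisibly returns the row and column order used in the plot
hm <- heatmap(Cars.mat, scale = "column")

# Sample and variable names listed in dendrogram order
rownames(Cars.mat)[hm$rowInd]
colnames(Cars.mat)[hm$colInd]

# Repeat the row clustering with the distance and clustering functions
# that heatmap() uses by default (dist() and hclust())
row.hc <- hclust(dist(Cars.mat))

# Cut the tree into a chosen number of groups (3 is arbitrary here)
groups <- cutree(row.hc, k = 3)
table(groups)
```

Samples that share a group number sit together on a branch of the row dendrogram, which gives a starting point for asking what the members of each block have in common.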
Heatmaps are a great way to visualise multivariate data. They allow you to quickly gauge overall patterns and trends that you can then explore further and, further down the track, they are a useful tool for communicating your findings to others.