A heatmap is a method for a visualizing of data, in which a table of individual values is represented as a grid of colored cells. In clustered heatmaps, the rows and columns of the grid are ordered to show patterns and cen be accompanied by dendrograms. In molecular biology, heatmaps are often use to visualize the level of gene expression across a number of samples.
To create heapmaps we are going to use a subset of breast cancer proteome profiling dataset.
First of all we need to load the required packages and open the dataset. The dataset is stored in .data/proteomes.csv
. Tumor subtypes for individual samples are stored in .data/tumors.csv
.
library(tidyverse)
library(viridis)
library(RColorBrewer)
prot <- read_csv("./data/proteomes.csv")
tumors <- read_csv("./data/tumors.csv")
GeneSymbol | RefSeqProteinID | Species | Gene Name | AO-A12D | C8-A131 | AO-A12B | BH-A18Q | C8-A130 | C8-A138 | E2-A154 | C8-A12L | A2-A0EX | AO-A12D1 | AN-A04A | BH-A0AV | C8-A12T | A8-A06Z | A2-A0CM | BH-A18U | A2-A0EQ | AR-A0U4 | AO-A0J9 | AR-A1AP | AN-A0FK | AO-A0J6 | A7-A13F | BH-A0E1 | A7-A0CE | A2-A0YC | AO-A0JC | A8-A08Z | AR-A0TX | A8-A076 | AO-A126 | BH-A0C1 | A2-A0EY | AR-A1AW | AR-A1AV | C8-A135 | A2-A0EV | AN-A0AM | D8-A142 | AN-A0FL | BH-A0DG | AR-A0TV | C8-A12Z | AO-A0JJ | AO-A0JE | AN-A0AJ | A7-A0CJ | AO-A12F | A8-A079 | A2-A0T3 | A2-A0YD | AR-A0TR | AO-A03O | AO-A12E | A8-A06N | A2-A0YG | BH-A18N | AN-A0AL | A2-A0T6 | E2-A158 | E2-A15A | AO-A0JM | C8-A12V | A2-A0D2 | C8-A12U | AR-A1AS | A8-A09G | C8-A1311 | C8-A134 | A2-A0YF | BH-A0DD | BH-A0E9 | AR-A0TT | AO-A12B1 | A2-A0SW | AO-A0JL | BH-A0BV | A2-A0YM | BH-A0C7 | A2-A0SX |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ACTR3B | NP_065178 | Homo sapiens | ARP3 actin-related protein 3 homolog B (yeast) | 0.2313070 | 2.2901604 | -0.3189368 | 0.5164973 | 0.2111122 | 0.5232544 | 0.2266471 | 0.4726967 | 1.7253762 | 0.5289704 | -0.3576252 | 0.4180143 | -1.4204014 | -0.8050804 | 1.906685 | 0.5583354 | 1.9614776 | 1.0545540 | 1.4491893 | 1.4540377 | -0.3056243 | 0.3568475 | 0.3229966 | 0.3020104 | -0.2046938 | 0.4596641 | 1.2980129 | 0.6763732 | -0.327754 | 0.5136592 | -0.1115624 | 0.4770089 | 1.737007 | 1.9720995 | 0.9884810 | 1.3190469 | 1.158737 | 0.8146905 | 0.813316 | 0.4256959 | 0.4689096 | -0.6346328 | -1.0216032 | 0.9128162 | 2.1569350 | -2.4680148 | 0.1885128 | -0.1008306 | -0.7779610 | 0.5033701 | -0.3382081 | -0.4643813 | -0.1177570 | -0.1095418 | -0.4123545 | 0.2193463 | -0.2403768 | 0.0290786 | -0.3506201 | 0.7599120 | -0.1505928 | 0.2055265 | 1.1923548 | 1.1985548 | -0.0459722 | -0.8341064 | -0.4910574 | 2.0351112 | 0.8430517 | 0.5611634 | -0.8260504 | 1.1894404 | 1.2277305 | -0.7669136 | 1.0270340 | 0.9985043 | 1.4956453 | 1.5753071 | -2.0739585 | 2.261875 |
MLPH | NP_001035932 | Homo sapiens | melanophilin | 0.4217969 | -1.2234003 | 0.2991019 | -4.1068204 | -1.5665767 | 0.3294260 | -0.7104360 | -1.7753864 | 0.4272324 | 0.4446787 | 3.5975451 | -3.2055893 | 1.4908091 | 0.8894621 | -4.029719 | 0.5516413 | -1.2418004 | -2.2298064 | -0.7489333 | -0.5153134 | 1.4234317 | -1.0885903 | 2.7616782 | 0.6448659 | -4.3678354 | -0.9059333 | 1.2543742 | -1.0432841 | -1.453810 | 1.1938205 | 3.5853931 | -4.4719863 | 3.773154 | -3.4390885 | 0.4610447 | 0.1140835 | -2.608071 | -3.5436479 | -5.530553 | -1.1446787 | 1.7945820 | -1.1977180 | -0.8184494 | 1.8939498 | -1.6689845 | -1.1331595 | -0.1968048 | -3.4228993 | 2.0613501 | -1.7215170 | -0.2406672 | 0.2226642 | -0.8911497 | 1.2931051 | 2.2786025 | 1.2201899 | 0.6889866 | -1.7417357 | 2.6543758 | -2.8298921 | -0.5308439 | -0.0850662 | -4.4180157 | -5.5673719 | -0.9098075 | 0.8938731 | 1.4611936 | -0.9420468 | -5.4156693 | 4.0968378 | -0.3356901 | 1.0879215 | -1.2400342 | 0.3978990 | 0.6250153 | -2.9889504 | 0.2126428 | -2.4120655 | -1.4603511 | -4.235783 |
MLPH | NP_077006 | Homo sapiens | melanophilin | 0.4179871 | -1.2638791 | 0.3018734 | -4.1068204 | -1.5665767 | 0.3294260 | -0.7104360 | -1.7726449 | 0.4159768 | 0.4446787 | 3.5975451 | -3.2055893 | 1.4908091 | 0.8894621 | -4.029719 | 0.5516413 | -1.2418004 | -2.2298064 | -0.7489333 | -0.5153134 | 1.4234317 | -1.0885903 | 2.7616782 | 0.6448659 | -4.3204911 | -0.8704120 | 1.2824276 | -1.0468667 | -1.432051 | 1.1938205 | 3.5853931 | -4.4719863 | 3.773154 | -3.4390885 | 0.4610447 | 0.1140835 | -2.608071 | -3.5436479 | -5.530553 | -1.0810456 | 1.8086351 | -1.1977180 | -0.8184494 | 1.8939498 | -1.6689845 | -1.1331595 | -0.1968048 | -3.4228993 | 2.0613501 | -1.7215170 | -0.2406672 | 0.2226642 | -0.8911497 | 1.2931051 | 2.2786025 | 1.1762340 | 0.7064558 | -1.7417357 | 2.6543758 | -2.8298921 | -0.5308439 | -0.0850662 | -4.4180157 | -5.5673719 | -0.9098075 | 0.8938731 | 1.4611936 | -0.9420468 | -5.4156693 | 4.1044088 | -0.3169203 | 1.1191581 | -1.2803093 | 0.5178061 | 0.5381793 | -3.4222329 | 0.1430225 | -2.4120655 | -1.4603511 | -4.235783 |
ERBB2 | NP_004439 | Homo sapiens | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 | 9.6681771 | 0.1974059 | -1.4912702 | -5.1873788 | -2.9605210 | -0.5044018 | -3.8942603 | 0.0395294 | -1.6437948 | 9.4382367 | 0.0382217 | -1.5321187 | -1.0166569 | -1.6722998 | -4.419112 | 3.3229698 | -2.7956007 | -4.7779652 | -3.4060046 | -3.8623828 | -4.0375686 | -2.0625669 | -2.3004945 | -2.5666925 | -3.0990083 | -2.5872757 | -1.9312488 | -3.8269793 | 1.663244 | 3.8532872 | -0.3267698 | -2.8604262 | 5.800973 | -2.5294155 | -1.0382666 | 3.8864406 | -3.252209 | -4.2518779 | -3.130366 | -1.3969386 | -5.8877598 | -2.0577536 | 5.6355891 | -1.7057955 | 2.9954430 | -3.1124277 | -4.9186357 | -2.0364466 | -2.7664681 | -3.9371337 | -4.4319692 | -1.5375937 | -2.6721316 | -2.3043992 | -1.2097795 | 5.7679423 | -1.6064710 | -2.4070324 | -4.8045152 | -2.5318622 | -4.6822921 | 4.5678357 | -2.0183325 | -5.4210097 | -4.5628057 | -3.6858027 | 2.1030295 | -3.3837721 | -4.1406139 | -2.4218252 | 0.9922713 | -2.2582979 | -4.7732583 | -4.4040978 | -4.3085582 | -1.7707359 | -3.6231040 | -3.3223513 | -2.7790493 | -4.891209 |
ANLN | NP_061155 | Homo sapiens | anillin | 2.6314798 | 2.4196926 | -1.9929070 | -1.2331374 | 0.5063004 | 1.2729679 | -2.0614914 | 2.6467575 | -2.5217302 | 1.8116702 | -0.6281206 | 0.1379779 | -0.9401579 | 0.0978357 | 2.277710 | -0.8507580 | -0.8590910 | -0.1474077 | -2.3673312 | -2.5219002 | -2.9774460 | -2.4013413 | 0.9135834 | 1.0701804 | 1.7774540 | -0.9967100 | -3.2248238 | -1.2940675 | -5.724120 | -2.0054567 | -2.9438454 | 1.2995179 | -1.123591 | -3.8607395 | 0.4369486 | -2.0972983 | -4.632905 | -0.2078427 | -1.529036 | -0.0810965 | -1.4423071 | -0.0127178 | -3.5188316 | -1.4013057 | -0.9938222 | 0.4439563 | -1.7752551 | 1.1329399 | 1.3325606 | 0.8710945 | -3.6102612 | -2.1180287 | -0.4744980 | 0.4276421 | -2.0466530 | 2.6572120 | -2.7978729 | 3.3323922 | -3.5250578 | 0.1548888 | 1.2093644 | 0.2533888 | 1.3182641 | 2.9349431 | 0.8910694 | -0.3005176 | -1.9218166 | 2.0389086 | 3.1565180 | -0.6842722 | -0.4060767 | -2.5316181 | -0.7494104 | -2.9166779 | -0.1114830 | 2.3046312 | -2.4329284 | 3.1849308 | 1.7683400 | 3.480372 |
CEP55 | NP_060601 | Homo sapiens | centrosomal protein 55kDa | -0.3173039 | 0.6305292 | -0.0556467 | -1.1093583 | 1.4279436 | 0.5671400 | -1.8206347 | 0.9552121 | -0.8821670 | -0.7977076 | -0.7006925 | -0.8100972 | 0.4495732 | 0.2616204 | 3.126292 | -3.0196238 | 0.1627432 | 0.2492397 | -0.1692086 | -1.7606384 | -3.2786617 | -1.6673300 | 1.7854020 | 0.5016478 | 0.5938466 | 0.5425471 | -0.9369105 | -2.7306978 | -7.021533 | -0.5587644 | -2.3097522 | -0.0973534 | -1.602440 | 0.2035078 | -1.8307597 | -2.3437681 | -6.134027 | -1.2639016 | -3.867773 | 0.2984296 | -0.3976960 | 0.2786298 | -5.4347278 | -1.6584304 | 0.0733698 | -0.8594052 | -0.3218640 | 0.3547533 | 0.3762305 | -0.3464132 | -1.9757126 | -1.9332372 | -0.3710159 | NA | NA | -0.3791312 | -3.6783224 | 2.6008969 | -1.7441913 | 0.1817788 | 0.6233302 | 0.0516833 | 0.3850539 | -0.6043618 | 1.7109808 | -0.2369110 | -1.3013752 | 0.9072847 | 0.8819038 | -0.3700742 | 2.1888442 | -0.7901783 | -0.1672522 | -1.4578073 | 0.6539606 | 0.4521916 | -0.7023668 | 0.1697525 | 0.1774231 | -0.047013 |
prot_matrix <- as.matrix(prot[5:84])
rownames(prot_matrix) <- prot$GeneSymbol
heatmap(prot_matrix)
To remove clustering of samples, we can use the Rowv = NA
option (but we don’t really want to do it). If you don’t like the default color scheme, it’s convenient to use predefined palettes from RColorBrewer
or viridis
packages.
heatmap(prot_matrix, Rowv = NA, col=viridis(11, option = "inferno"))
By default, heatmap() use hierarchical clustering complete method, but it’s possible to call a different method. We can also distinguish tumor subtypes by color.
hclust_ward <- function(x) { hclust(x, method="ward.D2") }
tumors <- mutate(tumors, color = case_when(tumor == "Basal-like" ~ "yellow",
tumor == "HER2-enriched" ~ "green",
tumor == "Luminal A" ~ "blue",
tumor == "Luminal B" ~ "red"))
heatmap(prot_matrix,
hclustfun=hclust_ward,
col=viridis(11),
ColSideColors=tumors$color)
Instead setting colors for tumor subtypes manually, we can use RColorBrewer
.
cols <- palette(brewer.pal(4, name = "Set3"))[as.factor(tumors$tumor)]
heatmap(prot_matrix,
col=viridis(11, option = "plasma"),
ColSideColors=cols)
Gplots heatmap.2() function allows us to set more options.
library(gplots)
heatmap.2(prot_matrix,
col = bluered,
trace = "none",
cexRow=0.6, cexCol = 0.25, # label font size
srtCol=45 # label rotation
)
par(cex.main=0.75) # Shrink title fonts on plot
heatmap.2(prot_matrix,
col = colorRampPalette(brewer.pal(9, "GnBu"))(100), # RColorbrewer palette
scale = "column", # Z-score scaling within columns
trace = "none",
ColSideColors = cols,
cexRow=1, cexCol = 0.4,
reorderfun=function(d, w) reorder(d, w, agglo.FUN = mean) ## Reorder dendrogram by branch means rather than sums
)
Custom dendrograms (we can use them in the standard heatmap() function as well):
library(dendextend)
dend_col <- t(prot_matrix) %>%
dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
color_branches(k=3)
dend_row <- prot_matrix %>%
dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
color_branches(k=2)
heatmap.2(prot_matrix,
col = plasma(225),
trace = "none",
Colv = dend_col,
Rowv = dend_row,
ColSideColors = cols,
cexRow=1, cexCol = 0.4,
na.color = "grey", # change color of NA values
scale = "row",
density.info = "none" # show only the color key
)
Add legend:
heatmap.2(prot_matrix,
col = plasma(225),
trace = "none",
Colv = dend_col,
Rowv = dend_row,
dendrogram = "column", # show only dendrogram for columns
ColSideColors = cols,
cexRow=1, cexCol = 0.4,
na.color = "grey", # change color of NA values
scale = "row",
density.info = "none" # show only the color key
)
legend(y=0.8, x=-0.05, xpd=TRUE, # location of the legend on the heatmap plot
legend = unique(tumors$tumor),
col = unique(cols),
lty = 1,
lwd = 10,
cex = 0.6
)
library(pheatmap)
pheatmap(
mat = prot_matrix,
color = plasma(20),
fontsize = 7
)
To create annotation, we need to create a separate dataframe, which contain the information. This dataframe can have one or more columns. Let’s create another column with random values to see multiple annotation. I fe don’t like the default color scheme for annotations, we can assign our own colors.
anot_col <- data.frame(subtype = tumors$tumor,
treatment = factor(sample(c("treatment","control"), 80, replace=T)))
rownames(anot_col) <- tumors$ID
anot_colors <- list(subtype = brewer.pal(4, "Set3"), treatment = c(treatment = "#998ec3", control = "#f1a340" ))
names(anot_colors$subtype) <- unique(tumors$tumor)
pheatmap(
mat = prot_matrix,
color = plasma(20),
fontsize = 8,
main = "I like pheatmap!", # plot title
annotation_col = anot_col,
annotation_colors = anot_colors,
cutree_cols = 4, # cut the heatmap according to clusters
border_color = NA, # remove border,
clustering_method = "ward.D2", # different clustering method
fontsize_col = 4 # smaller font for column names
)
ComplexHeatmpat
is available on Bioconductor, we can install it by
source("http://bioconductor.org/biocLite.R")
biocLite("ComplexHeatmap")
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 1.17.1
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://bioconductor.org/packages/ComplexHeatmap/
##
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
## genomic data. Bioinformatics 2016.
## ========================================
Heatmap(prot_matrix,
column_title = "Sample", row_title = "Gene",
row_names_gp = gpar(fontsize = 7), # Text size for row names
column_names_gp = gpar(fontsize = 4),
name = "value", # legend title,
col = colorRampPalette(brewer.pal(9, "RdBu"))(100)
)
As in previous cases, we can add annotations to heatmaps.
col = list(subtype = brewer.pal(4, "Set3"))
names(col$subtype) <- unique(tumors$tumor)
ha_column = HeatmapAnnotation(df = data.frame(subtype = tumors$tumor),
col=col)
Heatmap(prot_matrix,
column_title = "Samples", row_title = "Genes",
row_names_gp = gpar(fontsize = 7), # Text size for row names
column_names_gp = gpar(fontsize = 4),
name = "Value", # legend title,
col = viridis(100, option = "magma"),
top_annotation = ha_column
)
We can also plot two heatmaps side by side:
h1 <- Heatmap(prot_matrix,
column_title = "Samples", row_title = "Genes",
row_names_gp = gpar(fontsize = 7),
column_names_gp = gpar(fontsize = 3),
name = "map1", # legend title,
col = viridis(100),
clustering_method_rows = "ward.D2",
clustering_method_columns = "ward.D2"
)
h2 <- Heatmap(prot_matrix,
column_title = "Samples", row_title = "Genes",
row_names_gp = gpar(fontsize = 7),
column_names_gp = gpar(fontsize = 3),
name = "map2",
col = rev(colorRampPalette(brewer.pal(9, "YlGnBu"))(100)),
clustering_method_rows = "centroid")
draw(h1+h2)
The d3heatmap package allows you to create an interactive heatmap widget, which allows the inspection of specific values by moving the mose over a cell.
library(d3heatmap)
d3heatmap(prot_matrix,
colors = "PRGn",
k_col = 3,
k_row = 3,
scale = "row",
cexRow = 0.3
)
D3heatmap is not actively maintained anymore, so you may consider using a different package.
Another package for creating of interactive heatmaps is heatmaply.
library(heatmaply)
# custom dendrograms
dend_col <- t(select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID)) %>%
dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
color_branches(k=3)
row_clst <- select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID) %>%
dist %>% hclust(method = "ward.D") # we are going to need row order to sort gene symbols
dend_row <- row_clst %>% as.dendrogram %>%
color_branches(k=2)
heatmaply(select(prot, -`Gene Name`, -Species, -RefSeqProteinID, -GeneSymbol),
col_side_colors = select(tumors, tumor),
Colv = dend_col, # custom dendrogram
Rowv = dend_row,
labRow = rev(prot[row_clst$order, ]$GeneSymbol),
showticklabels = c(F, T),
margins = c(50,150,NA,0),
plot_method = "plotly"
)
Making heatmaps in ggplot2
is not straightforward, there isn’t anything as geom_heatmap()
in ggplot2
. As ggplot relies on the tidy data idea, in whih rows are observations and columns are variables, which doesn’t work smoothly with data in a matrix form. Before plotting a heatmap, we have to tranform the data into a tidy dataframe.
If we want to cluster the data, we can use hclust()
to reorder the matrix-like data before transforming it into the tidy form. Unfortunately there isn’t an easy way to add denrograms to ggplot
plots. there is’t an easy way to add dendrograms to ggplot heatmap plots.
df <- prot[,5:84] #numerical columns
# reorder the data
row.order <- hclust(dist(df), method = "ward.D2")$order
col.order <- hclust(dist(t(df)), method = "ward.D2")$order
df_new <- df[row.order, col.order]
df_new$GeneSymbol <- prot$GeneSymbol[row.order]
df_new$ProteinID <- prot$RefSeqProteinID[row.order]
# transform the data into the tidy form
df_gather <- gather(df_new, key = "ID", value = "Expression", -GeneSymbol, -ProteinID)
# use factors to keep the order during plotting
df_gather$ProteinID <- factor(df_gather$ProteinID, as.character(unique(df_gather$ProteinID)))
df_gather$ID <- factor(df_gather$ID, as.character((unique(df_gather$ID))))
#We have to use RefSeq Protein ID, as GeneSymbol doesn't have unique values
ggplot(df_gather, aes(x = ID, y= ProteinID, fill=Expression))+
geom_raster() +
scale_y_discrete(breaks= df_new$ProteinID, labels=df_new$GeneSymbol) + # change y axis labels
scale_fill_viridis(option = "plasma") +
ylab("Gene") +
theme_minimal() +
theme(axis.text.x = element_text(angle=90, size = 4, hjust = 1),
legend.title = element_blank())