A heatmap is a method for a visualizing of data, in which a table of individual values is represented as a grid of colored cells. In clustered heatmaps, the rows and columns of the grid are ordered to show patterns and cen be accompanied by dendrograms. In molecular biology, heatmaps are often use to visualize the level of gene expression across a number of samples.

To create heapmaps we are going to use a subset of breast cancer proteome profiling dataset.

First of all we need to load the required packages and open the dataset. The dataset is stored in .data/proteomes.csv. Tumor subtypes for individual samples are stored in .data/tumors.csv.

library(tidyverse)
library(viridis)
library(RColorBrewer)

prot <- read_csv("./data/proteomes.csv")
tumors <- read_csv("./data/tumors.csv")
GeneSymbol RefSeqProteinID Species Gene Name AO-A12D C8-A131 AO-A12B BH-A18Q C8-A130 C8-A138 E2-A154 C8-A12L A2-A0EX AO-A12D1 AN-A04A BH-A0AV C8-A12T A8-A06Z A2-A0CM BH-A18U A2-A0EQ AR-A0U4 AO-A0J9 AR-A1AP AN-A0FK AO-A0J6 A7-A13F BH-A0E1 A7-A0CE A2-A0YC AO-A0JC A8-A08Z AR-A0TX A8-A076 AO-A126 BH-A0C1 A2-A0EY AR-A1AW AR-A1AV C8-A135 A2-A0EV AN-A0AM D8-A142 AN-A0FL BH-A0DG AR-A0TV C8-A12Z AO-A0JJ AO-A0JE AN-A0AJ A7-A0CJ AO-A12F A8-A079 A2-A0T3 A2-A0YD AR-A0TR AO-A03O AO-A12E A8-A06N A2-A0YG BH-A18N AN-A0AL A2-A0T6 E2-A158 E2-A15A AO-A0JM C8-A12V A2-A0D2 C8-A12U AR-A1AS A8-A09G C8-A1311 C8-A134 A2-A0YF BH-A0DD BH-A0E9 AR-A0TT AO-A12B1 A2-A0SW AO-A0JL BH-A0BV A2-A0YM BH-A0C7 A2-A0SX
ACTR3B NP_065178 Homo sapiens ARP3 actin-related protein 3 homolog B (yeast) 0.2313070 2.2901604 -0.3189368 0.5164973 0.2111122 0.5232544 0.2266471 0.4726967 1.7253762 0.5289704 -0.3576252 0.4180143 -1.4204014 -0.8050804 1.906685 0.5583354 1.9614776 1.0545540 1.4491893 1.4540377 -0.3056243 0.3568475 0.3229966 0.3020104 -0.2046938 0.4596641 1.2980129 0.6763732 -0.327754 0.5136592 -0.1115624 0.4770089 1.737007 1.9720995 0.9884810 1.3190469 1.158737 0.8146905 0.813316 0.4256959 0.4689096 -0.6346328 -1.0216032 0.9128162 2.1569350 -2.4680148 0.1885128 -0.1008306 -0.7779610 0.5033701 -0.3382081 -0.4643813 -0.1177570 -0.1095418 -0.4123545 0.2193463 -0.2403768 0.0290786 -0.3506201 0.7599120 -0.1505928 0.2055265 1.1923548 1.1985548 -0.0459722 -0.8341064 -0.4910574 2.0351112 0.8430517 0.5611634 -0.8260504 1.1894404 1.2277305 -0.7669136 1.0270340 0.9985043 1.4956453 1.5753071 -2.0739585 2.261875
MLPH NP_001035932 Homo sapiens melanophilin 0.4217969 -1.2234003 0.2991019 -4.1068204 -1.5665767 0.3294260 -0.7104360 -1.7753864 0.4272324 0.4446787 3.5975451 -3.2055893 1.4908091 0.8894621 -4.029719 0.5516413 -1.2418004 -2.2298064 -0.7489333 -0.5153134 1.4234317 -1.0885903 2.7616782 0.6448659 -4.3678354 -0.9059333 1.2543742 -1.0432841 -1.453810 1.1938205 3.5853931 -4.4719863 3.773154 -3.4390885 0.4610447 0.1140835 -2.608071 -3.5436479 -5.530553 -1.1446787 1.7945820 -1.1977180 -0.8184494 1.8939498 -1.6689845 -1.1331595 -0.1968048 -3.4228993 2.0613501 -1.7215170 -0.2406672 0.2226642 -0.8911497 1.2931051 2.2786025 1.2201899 0.6889866 -1.7417357 2.6543758 -2.8298921 -0.5308439 -0.0850662 -4.4180157 -5.5673719 -0.9098075 0.8938731 1.4611936 -0.9420468 -5.4156693 4.0968378 -0.3356901 1.0879215 -1.2400342 0.3978990 0.6250153 -2.9889504 0.2126428 -2.4120655 -1.4603511 -4.235783
MLPH NP_077006 Homo sapiens melanophilin 0.4179871 -1.2638791 0.3018734 -4.1068204 -1.5665767 0.3294260 -0.7104360 -1.7726449 0.4159768 0.4446787 3.5975451 -3.2055893 1.4908091 0.8894621 -4.029719 0.5516413 -1.2418004 -2.2298064 -0.7489333 -0.5153134 1.4234317 -1.0885903 2.7616782 0.6448659 -4.3204911 -0.8704120 1.2824276 -1.0468667 -1.432051 1.1938205 3.5853931 -4.4719863 3.773154 -3.4390885 0.4610447 0.1140835 -2.608071 -3.5436479 -5.530553 -1.0810456 1.8086351 -1.1977180 -0.8184494 1.8939498 -1.6689845 -1.1331595 -0.1968048 -3.4228993 2.0613501 -1.7215170 -0.2406672 0.2226642 -0.8911497 1.2931051 2.2786025 1.1762340 0.7064558 -1.7417357 2.6543758 -2.8298921 -0.5308439 -0.0850662 -4.4180157 -5.5673719 -0.9098075 0.8938731 1.4611936 -0.9420468 -5.4156693 4.1044088 -0.3169203 1.1191581 -1.2803093 0.5178061 0.5381793 -3.4222329 0.1430225 -2.4120655 -1.4603511 -4.235783
ERBB2 NP_004439 Homo sapiens v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 9.6681771 0.1974059 -1.4912702 -5.1873788 -2.9605210 -0.5044018 -3.8942603 0.0395294 -1.6437948 9.4382367 0.0382217 -1.5321187 -1.0166569 -1.6722998 -4.419112 3.3229698 -2.7956007 -4.7779652 -3.4060046 -3.8623828 -4.0375686 -2.0625669 -2.3004945 -2.5666925 -3.0990083 -2.5872757 -1.9312488 -3.8269793 1.663244 3.8532872 -0.3267698 -2.8604262 5.800973 -2.5294155 -1.0382666 3.8864406 -3.252209 -4.2518779 -3.130366 -1.3969386 -5.8877598 -2.0577536 5.6355891 -1.7057955 2.9954430 -3.1124277 -4.9186357 -2.0364466 -2.7664681 -3.9371337 -4.4319692 -1.5375937 -2.6721316 -2.3043992 -1.2097795 5.7679423 -1.6064710 -2.4070324 -4.8045152 -2.5318622 -4.6822921 4.5678357 -2.0183325 -5.4210097 -4.5628057 -3.6858027 2.1030295 -3.3837721 -4.1406139 -2.4218252 0.9922713 -2.2582979 -4.7732583 -4.4040978 -4.3085582 -1.7707359 -3.6231040 -3.3223513 -2.7790493 -4.891209
ANLN NP_061155 Homo sapiens anillin 2.6314798 2.4196926 -1.9929070 -1.2331374 0.5063004 1.2729679 -2.0614914 2.6467575 -2.5217302 1.8116702 -0.6281206 0.1379779 -0.9401579 0.0978357 2.277710 -0.8507580 -0.8590910 -0.1474077 -2.3673312 -2.5219002 -2.9774460 -2.4013413 0.9135834 1.0701804 1.7774540 -0.9967100 -3.2248238 -1.2940675 -5.724120 -2.0054567 -2.9438454 1.2995179 -1.123591 -3.8607395 0.4369486 -2.0972983 -4.632905 -0.2078427 -1.529036 -0.0810965 -1.4423071 -0.0127178 -3.5188316 -1.4013057 -0.9938222 0.4439563 -1.7752551 1.1329399 1.3325606 0.8710945 -3.6102612 -2.1180287 -0.4744980 0.4276421 -2.0466530 2.6572120 -2.7978729 3.3323922 -3.5250578 0.1548888 1.2093644 0.2533888 1.3182641 2.9349431 0.8910694 -0.3005176 -1.9218166 2.0389086 3.1565180 -0.6842722 -0.4060767 -2.5316181 -0.7494104 -2.9166779 -0.1114830 2.3046312 -2.4329284 3.1849308 1.7683400 3.480372
CEP55 NP_060601 Homo sapiens centrosomal protein 55kDa -0.3173039 0.6305292 -0.0556467 -1.1093583 1.4279436 0.5671400 -1.8206347 0.9552121 -0.8821670 -0.7977076 -0.7006925 -0.8100972 0.4495732 0.2616204 3.126292 -3.0196238 0.1627432 0.2492397 -0.1692086 -1.7606384 -3.2786617 -1.6673300 1.7854020 0.5016478 0.5938466 0.5425471 -0.9369105 -2.7306978 -7.021533 -0.5587644 -2.3097522 -0.0973534 -1.602440 0.2035078 -1.8307597 -2.3437681 -6.134027 -1.2639016 -3.867773 0.2984296 -0.3976960 0.2786298 -5.4347278 -1.6584304 0.0733698 -0.8594052 -0.3218640 0.3547533 0.3762305 -0.3464132 -1.9757126 -1.9332372 -0.3710159 NA NA -0.3791312 -3.6783224 2.6008969 -1.7441913 0.1817788 0.6233302 0.0516833 0.3850539 -0.6043618 1.7109808 -0.2369110 -1.3013752 0.9072847 0.8819038 -0.3700742 2.1888442 -0.7901783 -0.1672522 -1.4578073 0.6539606 0.4521916 -0.7023668 0.1697525 0.1774231 -0.047013

Default heatmap function

prot_matrix <- as.matrix(prot[5:84])
rownames(prot_matrix) <- prot$GeneSymbol
heatmap(prot_matrix)

To remove clustering of samples, we can use the Rowv = NA option (but we don’t really want to do it). If you don’t like the default color scheme, it’s convenient to use predefined palettes from RColorBrewer or viridis packages.

heatmap(prot_matrix, Rowv = NA, col=viridis(11, option = "inferno"))

By default, heatmap() use hierarchical clustering complete method, but it’s possible to call a different method. We can also distinguish tumor subtypes by color.

hclust_ward <- function(x) { hclust(x, method="ward.D2") }


tumors <- mutate(tumors, color = case_when(tumor == "Basal-like" ~ "yellow", 
                             tumor == "HER2-enriched" ~ "green",
                             tumor == "Luminal A" ~ "blue",
                             tumor == "Luminal B" ~ "red"))

heatmap(prot_matrix,
        hclustfun=hclust_ward,
        col=viridis(11),
        ColSideColors=tumors$color)

Instead setting colors for tumor subtypes manually, we can use RColorBrewer.

cols <- palette(brewer.pal(4, name = "Set3"))[as.factor(tumors$tumor)]

heatmap(prot_matrix,
        col=viridis(11, option = "plasma"),
        ColSideColors=cols)

Gplots heatmap function

Gplots heatmap.2() function allows us to set more options.

library(gplots)

heatmap.2(prot_matrix,
          col = bluered,
          trace = "none",
          cexRow=0.6, cexCol = 0.25, # label font size
          srtCol=45 # label rotation
          ) 

par(cex.main=0.75)  # Shrink title fonts on plot
heatmap.2(prot_matrix,
          col = colorRampPalette(brewer.pal(9, "GnBu"))(100), # RColorbrewer palette
          scale = "column", # Z-score scaling within columns
          trace = "none",
          ColSideColors = cols,
          cexRow=1, cexCol = 0.4,
          reorderfun=function(d, w) reorder(d, w, agglo.FUN = mean) ## Reorder dendrogram by branch means rather than sums
) 

Custom dendrograms (we can use them in the standard heatmap() function as well):

library(dendextend)

dend_col <- t(prot_matrix) %>% 
  dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
  color_branches(k=3)

dend_row <- prot_matrix %>% 
  dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
  color_branches(k=2)

heatmap.2(prot_matrix,
          col = plasma(225),
          trace = "none",
          Colv = dend_col,
          Rowv = dend_row,
          ColSideColors = cols,
          cexRow=1, cexCol = 0.4,
          na.color = "grey", # change color of NA values
          scale = "row",
          density.info = "none" # show only the color key
)

Add legend:

heatmap.2(prot_matrix,
          col = plasma(225),
          trace = "none",
          Colv = dend_col,
          Rowv = dend_row,
          dendrogram = "column", # show only dendrogram for columns
          ColSideColors = cols,
          cexRow=1, cexCol = 0.4,
          na.color = "grey", # change color of NA values
          scale = "row",
          density.info = "none" # show only the color key
)

legend(y=0.8, x=-0.05, xpd=TRUE,    # location of the legend on the heatmap plot
       legend = unique(tumors$tumor),
       col = unique(cols),  
       lty = 1,
       lwd = 10,
       cex = 0.6
       )


Pheatmap

library(pheatmap)

pheatmap(
  mat = prot_matrix,
  color = plasma(20),
  fontsize = 7
)

To create annotation, we need to create a separate dataframe, which contain the information. This dataframe can have one or more columns. Let’s create another column with random values to see multiple annotation. I fe don’t like the default color scheme for annotations, we can assign our own colors.

anot_col <- data.frame(subtype = tumors$tumor,
                       treatment = factor(sample(c("treatment","control"), 80, replace=T)))
rownames(anot_col) <- tumors$ID

anot_colors <- list(subtype = brewer.pal(4, "Set3"), treatment = c(treatment = "#998ec3", control = "#f1a340" ))
names(anot_colors$subtype) <- unique(tumors$tumor)

pheatmap(
  mat               = prot_matrix,
  color             = plasma(20),
  fontsize          = 8,
  main =  "I like pheatmap!", # plot title
  annotation_col = anot_col, 
  annotation_colors = anot_colors,
  cutree_cols = 4, # cut the heatmap according to clusters
  border_color = NA, # remove border,
  clustering_method = "ward.D2", # different clustering method
  fontsize_col = 4 # smaller font for column names
)


Complexheatmap

ComplexHeatmpat is available on Bioconductor, we can install it by

source("http://bioconductor.org/biocLite.R")
biocLite("ComplexHeatmap")
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 1.17.1
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://bioconductor.org/packages/ComplexHeatmap/
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## ========================================
Heatmap(prot_matrix, 
        column_title = "Sample", row_title = "Gene",
        row_names_gp = gpar(fontsize = 7), # Text size for row names
        column_names_gp = gpar(fontsize = 4),
        name = "value", # legend title,
        col = colorRampPalette(brewer.pal(9, "RdBu"))(100)
)

As in previous cases, we can add annotations to heatmaps.

col = list(subtype = brewer.pal(4, "Set3"))
names(col$subtype) <- unique(tumors$tumor)

ha_column = HeatmapAnnotation(df = data.frame(subtype = tumors$tumor),
                              col=col) 
                             
Heatmap(prot_matrix, 
        column_title = "Samples", row_title = "Genes",
        row_names_gp = gpar(fontsize = 7), # Text size for row names
        column_names_gp = gpar(fontsize = 4),
        name = "Value", # legend title,
        col = viridis(100, option = "magma"),
        top_annotation = ha_column
)

We can also plot two heatmaps side by side:

h1 <- Heatmap(prot_matrix, 
        column_title = "Samples", row_title = "Genes",
        row_names_gp = gpar(fontsize = 7),
        column_names_gp = gpar(fontsize = 3),
        name = "map1", # legend title,
        col = viridis(100),
        clustering_method_rows = "ward.D2",
        clustering_method_columns = "ward.D2"
)

h2 <- Heatmap(prot_matrix, 
              column_title = "Samples", row_title = "Genes",
              row_names_gp = gpar(fontsize = 7), 
              column_names_gp = gpar(fontsize = 3),
              name = "map2", 
              col = rev(colorRampPalette(brewer.pal(9, "YlGnBu"))(100)),
              clustering_method_rows = "centroid")

draw(h1+h2)


D3heatmap

The d3heatmap package allows you to create an interactive heatmap widget, which allows the inspection of specific values by moving the mose over a cell.

library(d3heatmap)

d3heatmap(prot_matrix, 
          colors = "PRGn", 
          k_col = 3, 
          k_row = 3,
          scale = "row", 
          cexRow =  0.3
)

D3heatmap is not actively maintained anymore, so you may consider using a different package.


Heatmaply

Another package for creating of interactive heatmaps is heatmaply.

library(heatmaply)

# custom dendrograms
dend_col <- t(select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID)) %>% 
  dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
  color_branches(k=3)

row_clst <-  select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID) %>% 
  dist %>% hclust(method = "ward.D") # we are going to need row order to sort gene symbols
dend_row <- row_clst %>% as.dendrogram %>%
  color_branches(k=2)

heatmaply(select(prot, -`Gene Name`, -Species, -RefSeqProteinID, -GeneSymbol),
          col_side_colors = select(tumors, tumor),
          Colv = dend_col, # custom dendrogram
          Rowv = dend_row,
          labRow = rev(prot[row_clst$order, ]$GeneSymbol),
          showticklabels = c(F, T),
          margins = c(50,150,NA,0),
          plot_method = "plotly"
)

Heatmap in ggplot2

Making heatmaps in ggplot2 is not straightforward, there isn’t anything as geom_heatmap() in ggplot2. As ggplot relies on the tidy data idea, in whih rows are observations and columns are variables, which doesn’t work smoothly with data in a matrix form. Before plotting a heatmap, we have to tranform the data into a tidy dataframe.

If we want to cluster the data, we can use hclust() to reorder the matrix-like data before transforming it into the tidy form. Unfortunately there isn’t an easy way to add denrograms to ggplot plots. there is’t an easy way to add dendrograms to ggplot heatmap plots.

df <- prot[,5:84] #numerical columns

# reorder the data
row.order <- hclust(dist(df), method = "ward.D2")$order 
col.order <- hclust(dist(t(df)), method = "ward.D2")$order

df_new <- df[row.order, col.order]
df_new$GeneSymbol <- prot$GeneSymbol[row.order]
df_new$ProteinID <- prot$RefSeqProteinID[row.order]

# transform the data into the tidy form
df_gather <- gather(df_new, key = "ID", value = "Expression", -GeneSymbol, -ProteinID)

# use factors to keep the order during plotting
df_gather$ProteinID <- factor(df_gather$ProteinID, as.character(unique(df_gather$ProteinID)))
df_gather$ID <- factor(df_gather$ID, as.character((unique(df_gather$ID))))

#We have to use RefSeq Protein ID, as GeneSymbol doesn't have unique values
ggplot(df_gather, aes(x = ID, y= ProteinID, fill=Expression))+ 
  geom_raster() +
  scale_y_discrete(breaks= df_new$ProteinID, labels=df_new$GeneSymbol) + # change y axis labels
  scale_fill_viridis(option = "plasma") +
  ylab("Gene") + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle=90, size = 4, hjust = 1),
        legend.title = element_blank())