Visualization Module

A heatmap is a method for a visualizing of data, in which a table of individual values is represented as a grid of colored cells. In clustered heatmaps, the rows and columns of the grid are ordered to show patterns and cen be accompanied by dendrograms. In molecular biology, heatmaps are often use to visualize the level of gene expression across a number of samples.

To create heapmaps we are going to use a subset of breast cancer proteome profiling dataset.

First of all we need to load the required packages and open the dataset. The dataset is stored in .data/proteomes.csv. Tumor subtypes for individual samples are stored in .data/tumors.csv.

library(tidyverse)
library(viridis)
library(RColorBrewer)

prot <- read_csv("./data/proteomes.csv")
tumors <- read_csv("./data/tumors.csv")

GeneSymbol	RefSeqProteinID	Species	Gene Name	AO-A12D	C8-A131	AO-A12B	BH-A18Q	C8-A130	C8-A138	E2-A154	C8-A12L	A2-A0EX	AO-A12D1	AN-A04A	BH-A0AV	C8-A12T	A8-A06Z	A2-A0CM	BH-A18U	A2-A0EQ	AR-A0U4	AO-A0J9	AR-A1AP	AN-A0FK	AO-A0J6	A7-A13F	BH-A0E1	A7-A0CE	A2-A0YC	AO-A0JC	A8-A08Z	AR-A0TX	A8-A076	AO-A126	BH-A0C1	A2-A0EY	AR-A1AW	AR-A1AV	C8-A135	A2-A0EV	AN-A0AM	D8-A142	AN-A0FL	BH-A0DG	AR-A0TV	C8-A12Z	AO-A0JJ	AO-A0JE	AN-A0AJ	A7-A0CJ	AO-A12F	A8-A079	A2-A0T3	A2-A0YD	AR-A0TR	AO-A03O	AO-A12E	A8-A06N	A2-A0YG	BH-A18N	AN-A0AL	A2-A0T6	E2-A158	E2-A15A	AO-A0JM	C8-A12V	A2-A0D2	C8-A12U	AR-A1AS	A8-A09G	C8-A1311	C8-A134	A2-A0YF	BH-A0DD	BH-A0E9	AR-A0TT	AO-A12B1	A2-A0SW	AO-A0JL	BH-A0BV	A2-A0YM	BH-A0C7	A2-A0SX
ACTR3B	NP_065178	Homo sapiens	ARP3 actin-related protein 3 homolog B (yeast)	0.2313070	2.2901604	-0.3189368	0.5164973	0.2111122	0.5232544	0.2266471	0.4726967	1.7253762	0.5289704	-0.3576252	0.4180143	-1.4204014	-0.8050804	1.906685	0.5583354	1.9614776	1.0545540	1.4491893	1.4540377	-0.3056243	0.3568475	0.3229966	0.3020104	-0.2046938	0.4596641	1.2980129	0.6763732	-0.327754	0.5136592	-0.1115624	0.4770089	1.737007	1.9720995	0.9884810	1.3190469	1.158737	0.8146905	0.813316	0.4256959	0.4689096	-0.6346328	-1.0216032	0.9128162	2.1569350	-2.4680148	0.1885128	-0.1008306	-0.7779610	0.5033701	-0.3382081	-0.4643813	-0.1177570	-0.1095418	-0.4123545	0.2193463	-0.2403768	0.0290786	-0.3506201	0.7599120	-0.1505928	0.2055265	1.1923548	1.1985548	-0.0459722	-0.8341064	-0.4910574	2.0351112	0.8430517	0.5611634	-0.8260504	1.1894404	1.2277305	-0.7669136	1.0270340	0.9985043	1.4956453	1.5753071	-2.0739585	2.261875
MLPH	NP_001035932	Homo sapiens	melanophilin	0.4217969	-1.2234003	0.2991019	-4.1068204	-1.5665767	0.3294260	-0.7104360	-1.7753864	0.4272324	0.4446787	3.5975451	-3.2055893	1.4908091	0.8894621	-4.029719	0.5516413	-1.2418004	-2.2298064	-0.7489333	-0.5153134	1.4234317	-1.0885903	2.7616782	0.6448659	-4.3678354	-0.9059333	1.2543742	-1.0432841	-1.453810	1.1938205	3.5853931	-4.4719863	3.773154	-3.4390885	0.4610447	0.1140835	-2.608071	-3.5436479	-5.530553	-1.1446787	1.7945820	-1.1977180	-0.8184494	1.8939498	-1.6689845	-1.1331595	-0.1968048	-3.4228993	2.0613501	-1.7215170	-0.2406672	0.2226642	-0.8911497	1.2931051	2.2786025	1.2201899	0.6889866	-1.7417357	2.6543758	-2.8298921	-0.5308439	-0.0850662	-4.4180157	-5.5673719	-0.9098075	0.8938731	1.4611936	-0.9420468	-5.4156693	4.0968378	-0.3356901	1.0879215	-1.2400342	0.3978990	0.6250153	-2.9889504	0.2126428	-2.4120655	-1.4603511	-4.235783
MLPH	NP_077006	Homo sapiens	melanophilin	0.4179871	-1.2638791	0.3018734	-4.1068204	-1.5665767	0.3294260	-0.7104360	-1.7726449	0.4159768	0.4446787	3.5975451	-3.2055893	1.4908091	0.8894621	-4.029719	0.5516413	-1.2418004	-2.2298064	-0.7489333	-0.5153134	1.4234317	-1.0885903	2.7616782	0.6448659	-4.3204911	-0.8704120	1.2824276	-1.0468667	-1.432051	1.1938205	3.5853931	-4.4719863	3.773154	-3.4390885	0.4610447	0.1140835	-2.608071	-3.5436479	-5.530553	-1.0810456	1.8086351	-1.1977180	-0.8184494	1.8939498	-1.6689845	-1.1331595	-0.1968048	-3.4228993	2.0613501	-1.7215170	-0.2406672	0.2226642	-0.8911497	1.2931051	2.2786025	1.1762340	0.7064558	-1.7417357	2.6543758	-2.8298921	-0.5308439	-0.0850662	-4.4180157	-5.5673719	-0.9098075	0.8938731	1.4611936	-0.9420468	-5.4156693	4.1044088	-0.3169203	1.1191581	-1.2803093	0.5178061	0.5381793	-3.4222329	0.1430225	-2.4120655	-1.4603511	-4.235783
ERBB2	NP_004439	Homo sapiens	v-erb-b2 erythroblastic leukemia viral oncogene homolog 2	9.6681771	0.1974059	-1.4912702	-5.1873788	-2.9605210	-0.5044018	-3.8942603	0.0395294	-1.6437948	9.4382367	0.0382217	-1.5321187	-1.0166569	-1.6722998	-4.419112	3.3229698	-2.7956007	-4.7779652	-3.4060046	-3.8623828	-4.0375686	-2.0625669	-2.3004945	-2.5666925	-3.0990083	-2.5872757	-1.9312488	-3.8269793	1.663244	3.8532872	-0.3267698	-2.8604262	5.800973	-2.5294155	-1.0382666	3.8864406	-3.252209	-4.2518779	-3.130366	-1.3969386	-5.8877598	-2.0577536	5.6355891	-1.7057955	2.9954430	-3.1124277	-4.9186357	-2.0364466	-2.7664681	-3.9371337	-4.4319692	-1.5375937	-2.6721316	-2.3043992	-1.2097795	5.7679423	-1.6064710	-2.4070324	-4.8045152	-2.5318622	-4.6822921	4.5678357	-2.0183325	-5.4210097	-4.5628057	-3.6858027	2.1030295	-3.3837721	-4.1406139	-2.4218252	0.9922713	-2.2582979	-4.7732583	-4.4040978	-4.3085582	-1.7707359	-3.6231040	-3.3223513	-2.7790493	-4.891209
ANLN	NP_061155	Homo sapiens	anillin	2.6314798	2.4196926	-1.9929070	-1.2331374	0.5063004	1.2729679	-2.0614914	2.6467575	-2.5217302	1.8116702	-0.6281206	0.1379779	-0.9401579	0.0978357	2.277710	-0.8507580	-0.8590910	-0.1474077	-2.3673312	-2.5219002	-2.9774460	-2.4013413	0.9135834	1.0701804	1.7774540	-0.9967100	-3.2248238	-1.2940675	-5.724120	-2.0054567	-2.9438454	1.2995179	-1.123591	-3.8607395	0.4369486	-2.0972983	-4.632905	-0.2078427	-1.529036	-0.0810965	-1.4423071	-0.0127178	-3.5188316	-1.4013057	-0.9938222	0.4439563	-1.7752551	1.1329399	1.3325606	0.8710945	-3.6102612	-2.1180287	-0.4744980	0.4276421	-2.0466530	2.6572120	-2.7978729	3.3323922	-3.5250578	0.1548888	1.2093644	0.2533888	1.3182641	2.9349431	0.8910694	-0.3005176	-1.9218166	2.0389086	3.1565180	-0.6842722	-0.4060767	-2.5316181	-0.7494104	-2.9166779	-0.1114830	2.3046312	-2.4329284	3.1849308	1.7683400	3.480372
CEP55	NP_060601	Homo sapiens	centrosomal protein 55kDa	-0.3173039	0.6305292	-0.0556467	-1.1093583	1.4279436	0.5671400	-1.8206347	0.9552121	-0.8821670	-0.7977076	-0.7006925	-0.8100972	0.4495732	0.2616204	3.126292	-3.0196238	0.1627432	0.2492397	-0.1692086	-1.7606384	-3.2786617	-1.6673300	1.7854020	0.5016478	0.5938466	0.5425471	-0.9369105	-2.7306978	-7.021533	-0.5587644	-2.3097522	-0.0973534	-1.602440	0.2035078	-1.8307597	-2.3437681	-6.134027	-1.2639016	-3.867773	0.2984296	-0.3976960	0.2786298	-5.4347278	-1.6584304	0.0733698	-0.8594052	-0.3218640	0.3547533	0.3762305	-0.3464132	-1.9757126	-1.9332372	-0.3710159	NA	NA	-0.3791312	-3.6783224	2.6008969	-1.7441913	0.1817788	0.6233302	0.0516833	0.3850539	-0.6043618	1.7109808	-0.2369110	-1.3013752	0.9072847	0.8819038	-0.3700742	2.1888442	-0.7901783	-0.1672522	-1.4578073	0.6539606	0.4521916	-0.7023668	0.1697525	0.1774231	-0.047013

Default heatmap function

prot_matrix <- as.matrix(prot[5:84])
rownames(prot_matrix) <- prot$GeneSymbol
heatmap(prot_matrix)

To remove clustering of samples, we can use the Rowv = NA option (but we don’t really want to do it). If you don’t like the default color scheme, it’s convenient to use predefined palettes from RColorBrewer or viridis packages.

heatmap(prot_matrix, Rowv = NA, col=viridis(11, option = "inferno"))

By default, heatmap() use hierarchical clustering complete method, but it’s possible to call a different method. We can also distinguish tumor subtypes by color.

hclust_ward <- function(x) { hclust(x, method="ward.D2") }


tumors <- mutate(tumors, color = case_when(tumor == "Basal-like" ~ "yellow", 
                             tumor == "HER2-enriched" ~ "green",
                             tumor == "Luminal A" ~ "blue",
                             tumor == "Luminal B" ~ "red"))

heatmap(prot_matrix,
        hclustfun=hclust_ward,
        col=viridis(11),
        ColSideColors=tumors$color)

Instead setting colors for tumor subtypes manually, we can use RColorBrewer.

cols <- palette(brewer.pal(4, name = "Set3"))[as.factor(tumors$tumor)]

heatmap(prot_matrix,
        col=viridis(11, option = "plasma"),
        ColSideColors=cols)

Gplots heatmap function

Gplots heatmap.2() function allows us to set more options.

library(gplots)

heatmap.2(prot_matrix,
          col = bluered,
          trace = "none",
          cexRow=0.6, cexCol = 0.25, # label font size
          srtCol=45 # label rotation
          )

par(cex.main=0.75)  # Shrink title fonts on plot
heatmap.2(prot_matrix,
          col = colorRampPalette(brewer.pal(9, "GnBu"))(100), # RColorbrewer palette
          scale = "column", # Z-score scaling within columns
          trace = "none",
          ColSideColors = cols,
          cexRow=1, cexCol = 0.4,
          reorderfun=function(d, w) reorder(d, w, agglo.FUN = mean) ## Reorder dendrogram by branch means rather than sums
)

Custom dendrograms (we can use them in the standard heatmap() function as well):

library(dendextend)

dend_col <- t(prot_matrix) %>% 
  dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
  color_branches(k=3)

dend_row <- prot_matrix %>% 
  dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
  color_branches(k=2)

heatmap.2(prot_matrix,
          col = plasma(225),
          trace = "none",
          Colv = dend_col,
          Rowv = dend_row,
          ColSideColors = cols,
          cexRow=1, cexCol = 0.4,
          na.color = "grey", # change color of NA values
          scale = "row",
          density.info = "none" # show only the color key
)

Add legend:

heatmap.2(prot_matrix,
          col = plasma(225),
          trace = "none",
          Colv = dend_col,
          Rowv = dend_row,
          dendrogram = "column", # show only dendrogram for columns
          ColSideColors = cols,
          cexRow=1, cexCol = 0.4,
          na.color = "grey", # change color of NA values
          scale = "row",
          density.info = "none" # show only the color key
)

legend(y=0.8, x=-0.05, xpd=TRUE,    # location of the legend on the heatmap plot
       legend = unique(tumors$tumor),
       col = unique(cols),  
       lty = 1,
       lwd = 10,
       cex = 0.6
       )

Pheatmap

library(pheatmap)

pheatmap(
  mat = prot_matrix,
  color = plasma(20),
  fontsize = 7
)

To create annotation, we need to create a separate dataframe, which contain the information. This dataframe can have one or more columns. Let’s create another column with random values to see multiple annotation. I fe don’t like the default color scheme for annotations, we can assign our own colors.

anot_col <- data.frame(subtype = tumors$tumor,
                       treatment = factor(sample(c("treatment","control"), 80, replace=T)))
rownames(anot_col) <- tumors$ID

anot_colors <- list(subtype = brewer.pal(4, "Set3"), treatment = c(treatment = "#998ec3", control = "#f1a340" ))
names(anot_colors$subtype) <- unique(tumors$tumor)

pheatmap(
  mat               = prot_matrix,
  color             = plasma(20),
  fontsize          = 8,
  main =  "I like pheatmap!", # plot title
  annotation_col = anot_col, 
  annotation_colors = anot_colors,
  cutree_cols = 4, # cut the heatmap according to clusters
  border_color = NA, # remove border,
  clustering_method = "ward.D2", # different clustering method
  fontsize_col = 4 # smaller font for column names
)

Complexheatmap

ComplexHeatmpat is available on Bioconductor, we can install it by

source("http://bioconductor.org/biocLite.R")
biocLite("ComplexHeatmap")

library(ComplexHeatmap)

## Loading required package: grid

## ========================================
## ComplexHeatmap version 1.17.1
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://bioconductor.org/packages/ComplexHeatmap/
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## ========================================

Heatmap(prot_matrix, 
        column_title = "Sample", row_title = "Gene",
        row_names_gp = gpar(fontsize = 7), # Text size for row names
        column_names_gp = gpar(fontsize = 4),
        name = "value", # legend title,
        col = colorRampPalette(brewer.pal(9, "RdBu"))(100)
)

As in previous cases, we can add annotations to heatmaps.

col = list(subtype = brewer.pal(4, "Set3"))
names(col$subtype) <- unique(tumors$tumor)

ha_column = HeatmapAnnotation(df = data.frame(subtype = tumors$tumor),
                              col=col) 
                             
Heatmap(prot_matrix, 
        column_title = "Samples", row_title = "Genes",
        row_names_gp = gpar(fontsize = 7), # Text size for row names
        column_names_gp = gpar(fontsize = 4),
        name = "Value", # legend title,
        col = viridis(100, option = "magma"),
        top_annotation = ha_column
)

We can also plot two heatmaps side by side:

h1 <- Heatmap(prot_matrix, 
        column_title = "Samples", row_title = "Genes",
        row_names_gp = gpar(fontsize = 7),
        column_names_gp = gpar(fontsize = 3),
        name = "map1", # legend title,
        col = viridis(100),
        clustering_method_rows = "ward.D2",
        clustering_method_columns = "ward.D2"
)

h2 <- Heatmap(prot_matrix, 
              column_title = "Samples", row_title = "Genes",
              row_names_gp = gpar(fontsize = 7), 
              column_names_gp = gpar(fontsize = 3),
              name = "map2", 
              col = rev(colorRampPalette(brewer.pal(9, "YlGnBu"))(100)),
              clustering_method_rows = "centroid")

draw(h1+h2)

D3heatmap

The d3heatmap package allows you to create an interactive heatmap widget, which allows the inspection of specific values by moving the mose over a cell.

library(d3heatmap)

d3heatmap(prot_matrix, 
          colors = "PRGn", 
          k_col = 3, 
          k_row = 3,
          scale = "row", 
          cexRow =  0.3
)

D3heatmap is not actively maintained anymore, so you may consider using a different package.

Heatmaply

Another package for creating of interactive heatmaps is heatmaply.

library(heatmaply)

# custom dendrograms
dend_col <- t(select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID)) %>% 
  dist %>% hclust(method = "ward.D") %>% as.dendrogram %>%
  color_branches(k=3)

row_clst <-  select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID) %>% 
  dist %>% hclust(method = "ward.D") # we are going to need row order to sort gene symbols
dend_row <- row_clst %>% as.dendrogram %>%
  color_branches(k=2)

heatmaply(select(prot, -`Gene Name`, -Species, -RefSeqProteinID, -GeneSymbol),
          col_side_colors = select(tumors, tumor),
          Colv = dend_col, # custom dendrogram
          Rowv = dend_row,
          labRow = rev(prot[row_clst$order, ]$GeneSymbol),
          showticklabels = c(F, T),
          margins = c(50,150,NA,0),
          plot_method = "plotly"
)

Heatmap in ggplot2

Making heatmaps in ggplot2 is not straightforward, there isn’t anything as geom_heatmap() in ggplot2. As ggplot relies on the tidy data idea, in whih rows are observations and columns are variables, which doesn’t work smoothly with data in a matrix form. Before plotting a heatmap, we have to tranform the data into a tidy dataframe.

If we want to cluster the data, we can use hclust() to reorder the matrix-like data before transforming it into the tidy form. Unfortunately there isn’t an easy way to add denrograms to ggplot plots. there is’t an easy way to add dendrograms to ggplot heatmap plots.

df <- prot[,5:84] #numerical columns

# reorder the data
row.order <- hclust(dist(df), method = "ward.D2")$order 
col.order <- hclust(dist(t(df)), method = "ward.D2")$order

df_new <- df[row.order, col.order]
df_new$GeneSymbol <- prot$GeneSymbol[row.order]
df_new$ProteinID <- prot$RefSeqProteinID[row.order]

# transform the data into the tidy form
df_gather <- gather(df_new, key = "ID", value = "Expression", -GeneSymbol, -ProteinID)

# use factors to keep the order during plotting
df_gather$ProteinID <- factor(df_gather$ProteinID, as.character(unique(df_gather$ProteinID)))
df_gather$ID <- factor(df_gather$ID, as.character((unique(df_gather$ID))))

#We have to use RefSeq Protein ID, as GeneSymbol doesn't have unique values
ggplot(df_gather, aes(x = ID, y= ProteinID, fill=Expression))+ 
  geom_raster() +
  scale_y_discrete(breaks= df_new$ProteinID, labels=df_new$GeneSymbol) + # change y axis labels
  scale_fill_viridis(option = "plasma") +
  ylab("Gene") + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle=90, size = 4, hjust = 1),
        legend.title = element_blank())

Visualization Module - Heatmaps

CCP - Bioinformatics Unit

January 17, 2017