--- title: "Visualization Module - Heatmaps" author: "CCP - Bioinformatics Unit" date: "January 17, 2017" output: html_document: toc: true toc_float: true --- A heatmap is a method for a visualizing of data, in which a table of individual values is represented as a grid of colored cells. In clustered heatmaps, the rows and columns of the grid are ordered to show patterns and cen be accompanied by dendrograms. In molecular biology, heatmaps are often use to visualize the level of gene expression across a number of samples. To create heapmaps we are going to use a subset of breast cancer proteome profiling [dataset](https://www.kaggle.com/piotrgrabo/breastcancerproteomes). First of all we need to load the required packages and open the dataset. The dataset is stored in `.data/proteomes.csv`. Tumor subtypes for individual samples are stored in `.data/tumors.csv`. ```{r setup, message = FALSE, warning = FALSE} library(tidyverse) library(viridis) library(RColorBrewer) prot <- read_csv("./data/proteomes.csv") tumors <- read_csv("./data/tumors.csv") ``` ```{r message = FALSE, warning = FALSE, echo=FALSE, results='asis'} library(knitr) library(kableExtra) kable(head(prot), "html") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>% scroll_box(width = "750px", height = "450px") ``` ----- ## Default heatmap function ```{r heatmap, message = FALSE, warning = FALSE} prot_matrix <- as.matrix(prot[5:84]) rownames(prot_matrix) <- prot$GeneSymbol heatmap(prot_matrix) ``` To remove clustering of samples, we can use the `Rowv = NA` option (but we don't really want to do it). If you don't like the default color scheme, it's convenient to use predefined palettes from `RColorBrewer` or `viridis` packages. ```{r heatmap_2, message = FALSE, warning = FALSE} heatmap(prot_matrix, Rowv = NA, col=viridis(11, option = "inferno")) ``` By default, heatmap() use hierarchical clustering complete method, but it's possible to call a different method. We can also distinguish tumor subtypes by color. ```{r heatmap_3, message = FALSE, warning = FALSE} hclust_ward <- function(x) { hclust(x, method="ward.D2") } tumors <- mutate(tumors, color = case_when(tumor == "Basal-like" ~ "yellow", tumor == "HER2-enriched" ~ "green", tumor == "Luminal A" ~ "blue", tumor == "Luminal B" ~ "red")) heatmap(prot_matrix, hclustfun=hclust_ward, col=viridis(11), ColSideColors=tumors$color) ``` Instead setting colors for tumor subtypes manually, we can use `RColorBrewer`. ```{r, message=FALSE, warning = FALSE} cols <- palette(brewer.pal(4, name = "Set3"))[as.factor(tumors$tumor)] heatmap(prot_matrix, col=viridis(11, option = "plasma"), ColSideColors=cols) ``` ----- ## Gplots heatmap function Gplots heatmap.2() function allows us to set more options. ```{r, message=FALSE, warning = FALSE} library(gplots) heatmap.2(prot_matrix, col = bluered, trace = "none", cexRow=0.6, cexCol = 0.25, # label font size srtCol=45 # label rotation ) ``` ```{r, message=FALSE, warning = FALSE} par(cex.main=0.75) # Shrink title fonts on plot heatmap.2(prot_matrix, col = colorRampPalette(brewer.pal(9, "GnBu"))(100), # RColorbrewer palette scale = "column", # Z-score scaling within columns trace = "none", ColSideColors = cols, cexRow=1, cexCol = 0.4, reorderfun=function(d, w) reorder(d, w, agglo.FUN = mean) ## Reorder dendrogram by branch means rather than sums ) ``` Custom dendrograms (we can use them in the standard heatmap() function as well): ```{r, message=FALSE, warning = FALSE} library(dendextend) dend_col <- t(prot_matrix) %>% dist %>% hclust(method = "ward.D") %>% as.dendrogram %>% color_branches(k=3) dend_row <- prot_matrix %>% dist %>% hclust(method = "ward.D") %>% as.dendrogram %>% color_branches(k=2) heatmap.2(prot_matrix, col = plasma(225), trace = "none", Colv = dend_col, Rowv = dend_row, ColSideColors = cols, cexRow=1, cexCol = 0.4, na.color = "grey", # change color of NA values scale = "row", density.info = "none" # show only the color key ) ``` Add legend: ```{r, message=FALSE, warning = FALSE} heatmap.2(prot_matrix, col = plasma(225), trace = "none", Colv = dend_col, Rowv = dend_row, dendrogram = "column", # show only dendrogram for columns ColSideColors = cols, cexRow=1, cexCol = 0.4, na.color = "grey", # change color of NA values scale = "row", density.info = "none" # show only the color key ) legend(y=0.8, x=-0.05, xpd=TRUE, # location of the legend on the heatmap plot legend = unique(tumors$tumor), col = unique(cols), lty = 1, lwd = 10, cex = 0.6 ) ``` ----- ## Pheatmap ```{r pheatmap, message=FALSE, warning = FALSE} library(pheatmap) pheatmap( mat = prot_matrix, color = plasma(20), fontsize = 7 ) ``` To create annotation, we need to create a separate dataframe, which contain the information. This dataframe can have one or more columns. Let's create another column with random values to see multiple annotation. I fe don't like the default color scheme for annotations, we can assign our own colors. ```{r, message=FALSE, warning = FALSE} anot_col <- data.frame(subtype = tumors$tumor, treatment = factor(sample(c("treatment","control"), 80, replace=T))) rownames(anot_col) <- tumors$ID anot_colors <- list(subtype = brewer.pal(4, "Set3"), treatment = c(treatment = "#998ec3", control = "#f1a340" )) names(anot_colors$subtype) <- unique(tumors$tumor) pheatmap( mat = prot_matrix, color = plasma(20), fontsize = 8, main = "I like pheatmap!", # plot title annotation_col = anot_col, annotation_colors = anot_colors, cutree_cols = 4, # cut the heatmap according to clusters border_color = NA, # remove border, clustering_method = "ward.D2", # different clustering method fontsize_col = 4 # smaller font for column names ) ``` ----- ## Complexheatmap `ComplexHeatmpat` is available on Bioconductor, we can install it by ```{r MESSAGE=FALSE, WARNING=FALSE, eval= FALSE} source("http://bioconductor.org/biocLite.R") biocLite("ComplexHeatmap") ``` ```{r MESSAGE=FALSE, WARNING=FALSE} library(ComplexHeatmap) Heatmap(prot_matrix, column_title = "Sample", row_title = "Gene", row_names_gp = gpar(fontsize = 7), # Text size for row names column_names_gp = gpar(fontsize = 4), name = "value", # legend title, col = colorRampPalette(brewer.pal(9, "RdBu"))(100) ) ``` As in previous cases, we can add annotations to heatmaps. ```{r MESSAGE = FALSE, WARNING = FALSE} col = list(subtype = brewer.pal(4, "Set3")) names(col$subtype) <- unique(tumors$tumor) ha_column = HeatmapAnnotation(df = data.frame(subtype = tumors$tumor), col=col) Heatmap(prot_matrix, column_title = "Samples", row_title = "Genes", row_names_gp = gpar(fontsize = 7), # Text size for row names column_names_gp = gpar(fontsize = 4), name = "Value", # legend title, col = viridis(100, option = "magma"), top_annotation = ha_column ) ``` We can also plot two heatmaps side by side: ```{r message=FALSE, warning = FALSE} h1 <- Heatmap(prot_matrix, column_title = "Samples", row_title = "Genes", row_names_gp = gpar(fontsize = 7), column_names_gp = gpar(fontsize = 3), name = "map1", # legend title, col = viridis(100), clustering_method_rows = "ward.D2", clustering_method_columns = "ward.D2" ) h2 <- Heatmap(prot_matrix, column_title = "Samples", row_title = "Genes", row_names_gp = gpar(fontsize = 7), column_names_gp = gpar(fontsize = 3), name = "map2", col = rev(colorRampPalette(brewer.pal(9, "YlGnBu"))(100)), clustering_method_rows = "centroid") draw(h1+h2) ``` ----- ## D3heatmap The d3heatmap package allows you to create an interactive heatmap widget, which allows the inspection of specific values by moving the mose over a cell. ```{r, message=FALSE, warning = FALSE} library(d3heatmap) d3heatmap(prot_matrix, colors = "PRGn", k_col = 3, k_row = 3, scale = "row", cexRow = 0.3 ) ``` D3heatmap is not actively maintained anymore, so you may consider using a different package. ----- ## Heatmaply Another package for creating of interactive heatmaps is heatmaply. ```{r heatmaply, message=FALSE, warning = FALSE} library(heatmaply) # custom dendrograms dend_col <- t(select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID)) %>% dist %>% hclust(method = "ward.D") %>% as.dendrogram %>% color_branches(k=3) row_clst <- select(prot, -GeneSymbol, -`Gene Name`, -Species, -RefSeqProteinID) %>% dist %>% hclust(method = "ward.D") # we are going to need row order to sort gene symbols dend_row <- row_clst %>% as.dendrogram %>% color_branches(k=2) heatmaply(select(prot, -`Gene Name`, -Species, -RefSeqProteinID, -GeneSymbol), col_side_colors = select(tumors, tumor), Colv = dend_col, # custom dendrogram Rowv = dend_row, labRow = rev(prot[row_clst$order, ]$GeneSymbol), showticklabels = c(F, T), margins = c(50,150,NA,0), plot_method = "plotly" ) ``` ----- ## Heatmap in ggplot2 Making heatmaps in `ggplot2` is not straightforward, there isn't anything as `geom_heatmap()` in `ggplot2`. As ggplot relies on the tidy data idea, in whih rows are observations and columns are variables, which doesn't work smoothly with data in a matrix form. Before plotting a heatmap, we have to tranform the data into a tidy dataframe. If we want to cluster the data, we can use `hclust()` to reorder the matrix-like data before transforming it into the tidy form. Unfortunately there isn't an easy way to add denrograms to `ggplot` plots. there is't an easy way to add dendrograms to ggplot heatmap plots. ```{r, MESSAGE = FALSE, WARNING = FALSE} df <- prot[,5:84] #numerical columns # reorder the data row.order <- hclust(dist(df), method = "ward.D2")$order col.order <- hclust(dist(t(df)), method = "ward.D2")$order df_new <- df[row.order, col.order] df_new$GeneSymbol <- prot$GeneSymbol[row.order] df_new$ProteinID <- prot$RefSeqProteinID[row.order] # transform the data into the tidy form df_gather <- gather(df_new, key = "ID", value = "Expression", -GeneSymbol, -ProteinID) # use factors to keep the order during plotting df_gather$ProteinID <- factor(df_gather$ProteinID, as.character(unique(df_gather$ProteinID))) df_gather$ID <- factor(df_gather$ID, as.character((unique(df_gather$ID)))) #We have to use RefSeq Protein ID, as GeneSymbol doesn't have unique values ggplot(df_gather, aes(x = ID, y= ProteinID, fill=Expression))+ geom_raster() + scale_y_discrete(breaks= df_new$ProteinID, labels=df_new$GeneSymbol) + # change y axis labels scale_fill_viridis(option = "plasma") + ylab("Gene") + theme_minimal() + theme(axis.text.x = element_text(angle=90, size = 4, hjust = 1), legend.title = element_blank()) ```