R Plotting Basics (4): Heatmaps

panhoy 2014-11-03

Reposted from 糗世界

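After analyzing differential expression data, we often want to produce an intuitive figure: a heatmap. This section uses microarray data as an example to show how to produce high-quality heatmaps.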

For example:

A heatmap with a steel-blue-to-white color scheme

Let's start with the simplest heatmap.

> library(ggplot2)
> library(ALL) # data package; install with BiocManager::install("ALL") (older Bioconductor releases used biocLite("ALL"))
> data("ALL")
> library(limma)
> eset<-ALL[,ALL$mol.biol %in% c("BCR/ABL","ALL1/AF4")]
> f<-factor(as.character(eset$mol.biol))
> design<-model.matrix(~f)
> fit<-eBayes(lmFit(eset,design)) # analyze the microarray data to obtain the differential expression statistics
> selected  <- p.adjust(fit$p.value[, 2]) <0.001 
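> esetSel <- eset[selected,] # select a subset of probes for the heatmap
> dim(esetSel) # judging by the dimensions, there are neither very many nor very few genes. If there are too many genes, the heatmap can be split into two plots.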
Features  Samples 
      84       47 
> library(hgu95av2.db)
> data<-exprs(esetSel)
> probes<-rownames(data)
> symbol<-mget(probes,hgu95av2SYMBOL,ifnotfound=NA)
> symbol<-do.call(rbind,symbol)
> symbol[is.na(symbol[,1]),1]<-rownames(symbol)[is.na(symbol[,1])]
> rownames(data)<-symbol[probes,1] # replace the probe IDs with gene symbols as row names so that the heatmap shows gene names directly
> heatmap(data,cexRow=0.5)

Heatmap generated with the default colors of the heatmap function

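The figure has three parts: the sample dendrogram, the gene dendrogram, and the heatmap itself. The samples are ordered by cluster analysis because they are not treated as grouped in this example; if they were grouped, sample clustering could be turned off. The genes are clustered mainly so that similar color blocks sit together and the figure looks good; you could also leave them unordered, or order them by a GO-based clustering. This kind of heatmap is simple and convenient, and the result is quite good.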
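To make the point about switching off sample clustering concrete, here is a minimal sketch on a small random matrix that stands in for the expression data (the matrix `m` and its labels are made up for illustration). Passing `Colv = NA` to base R's `heatmap` keeps the columns in their original order and omits the column dendrogram, while the rows are still clustered.

```r
# Toy stand-in for the expression matrix: 10 "genes" x 6 "samples" (made-up data)
set.seed(1)
m <- matrix(rnorm(60), nrow = 10,
            dimnames = list(paste0("gene", 1:10), paste0("sample", 1:6)))

pdf(NULL)  # draw to a null device so the sketch also runs non-interactively

# Colv = NA: no column reordering and no column dendrogram; rows are still clustered
res <- heatmap(m, Colv = NA, cexRow = 0.5)

dev.off()

# heatmap() invisibly returns the row and column order it used
res$colInd
```

In an interactive session, drop the `pdf(NULL)` and `dev.off()` lines to see the plot on screen.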

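Next, suppose the samples are already grouped and we want to mark the sample groups with different colors. The ColSideColors argument does this. To change the gradient fill of the heatmap as well, use the col argument.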

> color.map <- function(mol.biol) { if (mol.biol=="ALL1/AF4") "#FF0000" else "#0000FF" }
> patientcolors <- unlist(lapply(esetSel$mol.biol, color.map))
> heatmap(data, col=topo.colors(100), ColSideColors=patientcolors, cexRow=0.5)

Heatmap generated with the heatmap function and a topo.colors fill
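The per-sample `color.map` call above can also be written as a vectorized lookup. The sketch below is one possible alternative, not the approach used in this post; the label vector is made up and the name `group.colors` is hypothetical. Note the `as.character()` call: in the real data the group labels are a factor, and indexing by a factor would use its integer codes rather than its labels.

```r
# Made-up group labels standing in for esetSel$mol.biol (a factor in the real data)
mol.biol <- factor(c("BCR/ABL", "ALL1/AF4", "BCR/ABL", "ALL1/AF4", "BCR/ABL"))

# Named vector used as a lookup table: group label -> color (hypothetical name)
group.colors <- c("ALL1/AF4" = "#FF0000", "BCR/ABL" = "#0000FF")

# Look up by label; unname() leaves a plain character vector for ColSideColors
patientcolors <- unname(group.colors[as.character(mol.biol)])
patientcolors
```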

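The heatmap function supports only one sample grouping. What if the samples are grouped in several ways? heatmap.plus addresses this. The two functions take the same arguments except for ColSideColors and RowSideColors: heatmap expects a one-dimensional vector for these, whereas heatmap.plus expects a character matrix.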

> library(heatmap.plus)
> hc<-hclust(dist(t(data)))
> dd.col<-as.dendrogram(hc)
> groups <- cutree(hc,k=5)
> color.map <- function(mol.biol) { if (mol.biol=="ALL1/AF4") 1 else 2 }
> patientcolors <- unlist(lapply(esetSel$mol.biol, color.map))
> col.patientcol<-rbind(groups,patientcolors)
> mode(col.patientcol)<-"character"
> heatmap.plus(data,ColSideColors=t(col.patientcol),cexRow=0.5)

Heatmap drawn with heatmap.plus
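The difference between the vector form and the matrix form of the side colors may be easier to see on toy data. The sketch below only builds the character matrix (one row per sample, one column per annotation), which is the shape the heatmap.plus call above receives after `t()`. The data, the color choices, and the names `cluster.col`, `subtype.col`, and `side.colors` are all made up for illustration.

```r
# Toy data: 5 "genes" x 8 "samples" (made-up values)
set.seed(2)
m <- matrix(rnorm(40), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))

# First annotation: cluster membership from hierarchical clustering of the samples
cluster.id <- cutree(hclust(dist(t(m))), k = 3)
cluster.col <- c("orange", "purple", "darkgreen")[cluster.id]

# Second annotation: a made-up subtype label per sample
subtype <- rep(c("ALL1/AF4", "BCR/ABL"), each = 4)
subtype.col <- ifelse(subtype == "ALL1/AF4", "#FF0000", "#0000FF")

# Character matrix: one row per sample, one column per annotation track
side.colors <- cbind(cluster = cluster.col, subtype = subtype.col)
rownames(side.colors) <- colnames(m)
dim(side.colors)
```

A single column of this matrix is what the plain heatmap function would take as ColSideColors.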

A drawback of these plots is that they have no color key. heatmap.2 in gplots solves this problem, and it also brings more preset fill palettes. Here are a few examples.

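> library("gplots")
> # heatmap.2 requires ColSideColors to be a character vector, so rebuild the hex color vector used in the heatmap example
> color.map <- function(mol.biol) { if (mol.biol=="ALL1/AF4") "#FF0000" else "#0000FF" }
> patientcolors <- unlist(lapply(esetSel$mol.biol, color.map))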
> heatmap.2(data, col=redgreen(75), scale="row", ColSideColors=patientcolors,
+            key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)

Heatmap generated with the heatmap.2 function and a redgreen gradient fill

> heatmap.2(data, col=heat.colors(100), scale="row", ColSideColors=patientcolors,
+            key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)
> heatmap.2(data, col=terrain.colors(100), scale="row", ColSideColors=patientcolors,
+            key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)
> heatmap.2(data, col=cm.colors(100), scale="row", ColSideColors=patientcolors,
+            key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)
> heatmap.2(data, col=redblue(100), scale="row", ColSideColors=patientcolors,
+            key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)
> heatmap.2(data, col=colorpanel(100,low="white",high="steelblue"), scale="row", ColSideColors=patientcolors,
+            key=TRUE, keysize=1, symkey=FALSE, density.info="none", trace="none", cexRow=0.5)

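Heatmap generated with the heatmap.2 function and a heat.colors gradient fill

Heatmap generated with the heatmap.2 function and a terrain.colors gradient fill

Heatmap generated with the heatmap.2 function and a cm.colors gradient fill

Heatmap generated with the heatmap.2 function and a redblue gradient fill

Heatmap generated with the heatmap.2 function and a colorpanel gradient fill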
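All of the heatmap.2 calls above pass scale="row", which centers and scales each gene before the colors are assigned. As a rough illustration of what that amounts to, the sketch below computes row z-scores by hand on a made-up 2 x 3 matrix; it is meant to show the idea, not to reproduce heatmap.2's internals line by line.

```r
# Made-up matrix: two "genes" on very different scales
m <- matrix(c( 1,  2,  3,
              10, 20, 60), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# scale() works column-wise, so transpose, scale, and transpose back
z <- t(scale(t(m)))

# After row scaling every gene has mean 0 and standard deviation 1
round(rowMeans(z), 10)
apply(z, 1, sd)
```

Without this step a gene with large absolute values, such as geneB here, would dominate the color range.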

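However, although heatmap and heatmap.2 are simple and convenient, give good results, and make it easy to set a gradient fill with colorpanel, their layout cannot be changed, and the resulting figure looks somewhat rigid and cluttered. So here is how to use geom_tile from ggplot2 to draw a better heatmap for microarray data.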

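> library(ggplot2)
> library(scales) # provides rescale() in current versions
> library(reshape) # provides melt(); for a matrix it names the index columns X1 and X2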
> hc<-hclust(dist(data))
> rowInd<-hc$order
> hc<-hclust(dist(t(data)))
> colInd<-hc$order
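> data.m<-data[rowInd,colInd] # clustering brings similar color blocks together for a better display; skip this step if the samples are already grouped and the genes already ordered
> data.m<-apply(data.m,1,rescale) # transform the data row by row so that every row lies in [0,1]; scale, rescale, or another transformation can be used as needed
> data.m<-t(data.m) # apply() returns the result transposed, so transpose it back
> coln<-colnames(data.m)
> rown<-rownames(data.m) # save the sample and gene names; geom_tile would reorder them along the axes, so numeric indices are used to fix their order
> colnames(data.m)<-1:ncol(data.m)
> rownames(data.m)<-1:nrow(data.m)
> data.m<-melt(data.m) # convert the data into the long form that geom_tile needs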
> head(data.m)
  X1 X2     value
1  1  1 0.1898007
2  2  1 0.6627467
3  3  1 0.5417057
4  4  1 0.4877054
5  5  1 0.5096474
6  6  1 0.2626248
> base_size<-12 # base font size; adjust slightly depending on the number of samples or genes
> (p <- ggplot(data.m, aes(X2, X1)) + geom_tile(aes(fill = value), # x axis: the former columns; y axis: the former rows; fill: the rescaled values
+      colour = "white") + scale_fill_gradient(low = "white", # the gradient runs from white (low) to steel blue (high)
+      high = "steelblue"))
> p + theme_grey(base_size = base_size) + labs(x = "", # leave the x and y axis labels empty
+      y = "") + scale_x_continuous(expand = c(0, 0),labels=coln,breaks=1:length(coln)) + # no padding on x; tick labels are the saved sample names
+      scale_y_continuous(expand = c(0, 0),labels=rown,breaks=1:length(rown)) + theme( # no padding on y; tick labels are the saved gene names; theme() replaces opts(), which current ggplot2 no longer provides
+      axis.ticks = element_blank(), axis.text.x = element_text(size = base_size * # axis text at 0.8x the base size, set against the axis, x labels rotated 90 degrees, medium gray
+      0.8, angle = 90, hjust = 0, colour = "grey50"), axis.text.y = element_text(
+      size = base_size * 0.8, hjust=1, colour="grey50"))

Heatmap drawn with geom_tile from ggplot2, steel-blue-to-white color scheme
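The data preparation for geom_tile has two steps that are easy to get wrong: rescaling each row to [0,1], and reshaping the matrix into one row per tile. The sketch below reproduces both in base R on a made-up matrix, so the intermediate shapes can be inspected without extra packages. The helper `rescale01` is hypothetical, written here as a stand-in for `rescale`, and it assumes that no row is constant.

```r
# Hypothetical helper: map a numeric vector linearly onto [0, 1]
# (assumes the vector is not constant, otherwise it divides by zero)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Made-up matrix: 2 "genes" x 3 "samples"
m <- matrix(c( 1,  2,  3,
              10, 20, 40), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# apply() over rows returns the result transposed, hence the t()
scaled <- t(apply(m, 1, rescale01))

# Long format: one row per tile, with numeric row (X1) and column (X2) indices
long <- data.frame(X1 = as.vector(row(scaled)),
                   X2 = as.vector(col(scaled)),
                   value = as.vector(scaled))
long
```

The column names X1, X2, and value were chosen to match the `head(data.m)` output shown earlier.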

A traditional gradient fill, such as yellow to red, is just as easy to produce.

> (p <- ggplot(data.m, aes(X2, X1)) + geom_tile(aes(fill = value),
+      colour = "white") + scale_fill_gradient(low = "yellow",
+      high = "red"))
> p + theme_grey(base_size = base_size) + labs(x = "",
+      y = "") + scale_x_continuous(expand = c(0, 0),labels=coln,breaks=1:length(coln)) +
+      scale_y_continuous(expand = c(0, 0),labels=rown,breaks=1:length(rown)) + theme(
+      axis.ticks = element_blank(), axis.text.x = element_text(size = base_size *
+      0.8, angle = 90, hjust = 0, colour = "grey50"), axis.text.y = element_text(
+      size = base_size * 0.8, hjust=1, colour="grey50"))

Heatmap drawn with geom_tile from ggplot2, yellow-to-red gradient fill

A green-to-red gradient fill:

> (p <- ggplot(data.m, aes(X2, X1)) + geom_tile(aes(fill = value),
+      colour = "white") + scale_fill_gradient(low = "green",
+      high = "red"))
> p + theme_grey(base_size = base_size) + labs(x = "",
+      y = "") + scale_x_continuous(expand = c(0, 0),labels=coln,breaks=1:length(coln)) +
+      scale_y_continuous(expand = c(0, 0),labels=rown,breaks=1:length(rown)) + theme(
+      axis.ticks = element_blank(), axis.text.x = element_text(size = base_size *
+      0.8, angle = 90, hjust = 0, colour = "grey50"), axis.text.y = element_text(
+      size = base_size * 0.8, hjust=1, colour="grey50"))

Heatmap drawn with geom_tile from ggplot2, green-to-red gradient fill

A green-to-white gradient fill:

> (p <- ggplot(data.m, aes(X2, X1)) + geom_tile(aes(fill = value),
+      colour = "white") + scale_fill_gradient(low = "seagreen",
+      high = "white"))
> p + theme_grey(base_size = base_size) + labs(x = "",
+      y = "") + scale_x_continuous(expand = c(0, 0),labels=coln,breaks=1:length(coln)) +
+      scale_y_continuous(expand = c(0, 0),labels=rown,breaks=1:length(rown)) + theme(
+      axis.ticks = element_blank(), axis.text.x = element_text(size = base_size *
+      0.8, angle = 90, hjust = 0, colour = "grey50"), axis.text.y = element_text(
+      size = base_size * 0.8, hjust=1, colour="grey50"))

Heatmap drawn with geom_tile from ggplot2, green-to-white gradient fill

A white-to-brown gradient fill:

> (p <- ggplot(data.m, aes(X2, X1)) + geom_tile(aes(fill = value),
+      colour = "white") + scale_fill_gradient(low = "white",
+      high = "sienna4"))
> p + theme_grey(base_size = base_size) + labs(x = "",
+      y = "") + scale_x_continuous(expand = c(0, 0),labels=coln,breaks=1:length(coln)) +
+      scale_y_continuous(expand = c(0, 0),labels=rown,breaks=1:length(rown)) + theme(
+      axis.ticks = element_blank(), axis.text.x = element_text(size = base_size *
+      0.8, angle = 90, hjust = 0, colour = "grey50"), axis.text.y = element_text(
+      size = base_size * 0.8, hjust=1, colour="grey50"))

Heatmap drawn with geom_tile from ggplot2, white-to-brown gradient fill

A grayscale fill:

> (p <- ggplot(data.m, aes(X2, X1)) + geom_tile(aes(fill = value),
+      colour = "white") + scale_fill_gradient(low = "black",
+      high = "gray85"))
> p + theme_grey(base_size = base_size) + labs(x = "",
+      y = "") + scale_x_continuous(expand = c(0, 0),labels=coln,breaks=1:length(coln)) +
+      scale_y_continuous(expand = c(0, 0),labels=rown,breaks=1:length(rown)) + theme(
+      axis.ticks = element_blank(), axis.text.x = element_text(size = base_size *
+      0.8, angle = 90, hjust = 0, colour = "grey50"), axis.text.y = element_text(
+      size = base_size * 0.8, hjust=1, colour="grey50"))

Heatmap drawn with geom_tile from ggplot2, gray gradient fill

Besides ggplot2, lattice is also a good choice. I use only one fill palette here and generate two plots as examples.

> hc<-hclust(dist(data))
> dd.row<-as.dendrogram(hc)
> row.ord<-order.dendrogram(dd.row) # another way to obtain the ordering
> hc<-hclust(dist(t(data)))
> dd.col<-as.dendrogram(hc)
> col.ord<-order.dendrogram(dd.col)
> data.m<-data[row.ord,col.ord]
> library(scales)
> data.m<-apply(data.m,1,rescale) # rescale() lives in the scales package in current versions (the original post loaded it through ggplot2)
> library(lattice)
> levelplot(data.m,
+           aspect = "fill",xlab="",ylab="",
+           scales = list(x = list(rot = 90, cex=0.8),y=list(cex=0.5)),
+           colorkey = list(space = "left"),col.regions = heat.colors)
> library(latticeExtra)
> levelplot(data.m,
+           aspect = "fill",xlab="",ylab="",
+           scales = list(x = list(rot = 90, cex=0.5),y=list(cex=0.4)),
+           colorkey = list(space = "left"),col.regions = heat.colors,
+           legend =
+           list(right =
+                list(fun = dendrogramGrob, # dendrogramGrob is a latticeExtra function that draws dendrograms
+                     args =
+                     list(x = dd.row, ord = row.ord,
+                          side = "right",
+                          size = 5)),
+                top =
+                list(fun = dendrogramGrob,
+                     args =
+                     list(x = dd.col, 
+                          side = "top",
+                          type = "triangle")))) # draw the dendrogram in the triangle style

Heatmap drawn with the levelplot function from lattice and a heat.colors fill

Heatmap drawn with the levelplot function from lattice and a heat.colors fill, with dendrograms drawn by dendrogramGrob
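The lattice example obtains the leaf order with `order.dendrogram`, whereas the ggplot2 example read it from `hc$order`. The short check below, on made-up data, suggests that the two routes give the same ordering for a tree built by `hclust`, so either can be used to reorder the matrix.

```r
# Made-up data: 10 observations with 5 variables each
set.seed(3)
m <- matrix(rnorm(50), nrow = 10)

hc <- hclust(dist(m))

# Route 1: the order stored in the hclust object
ord.hclust <- hc$order

# Route 2: convert to a dendrogram and read the leaf order from it
ord.dend <- order.dendrogram(as.dendrogram(hc))

# TRUE when both routes agree on every position
all(ord.hclust == ord.dend)
```

The dendrogram route has the advantage that the same `dd.row` and `dd.col` objects can be handed to dendrogramGrob afterwards.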

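But is drawing a good-looking heatmap really this hard? So many parameters, such complicated settings, and you even have to specify the colors yourself. Is there a function that is outrageously simple? Yes: pheatmap, short for "pretty heatmaps".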

> library(pheatmap)
> pheatmap(data,fontsize=9, fontsize_row=6) # the simplest call, straight to a plot
> pheatmap(data, scale = "row", clustering_distance_rows = "correlation", fontsize=9, fontsize_row=6) # change the distance used to cluster and order the rows
> pheatmap(data, color = colorRampPalette(c("navy", "white", "firebrick3"))(50), fontsize=9, fontsize_row=6) # custom colors
> pheatmap(data, cluster_rows=FALSE, fontsize=9, fontsize_row=6) # turn off row clustering
> pheatmap(data, legend = FALSE, fontsize=9, fontsize_row=6) # turn off the legend
> pheatmap(data, cellwidth = 6, cellheight = 5, fontsize=9, fontsize_row=6) # set the cell size
> color.map <- function(mol.biol) { if (mol.biol=="ALL1/AF4") 1 else 2 }
> patientcolors <- unlist(lapply(esetSel$mol.biol, color.map))
> hc<-hclust(dist(t(data)))
> dd.col<-as.dendrogram(hc)
> groups <- cutree(hc,k=7)
> annotation<-data.frame(Var1=factor(patientcolors,labels=c("class1","class2")),Var2=groups)
> pheatmap(data, annotation=annotation, fontsize=9, fontsize_row=6) # annotate the sample groups
> Var1 = c("navy", "skyblue")
> Var2 = c("snow", "steelblue")
> names(Var1) = c("class1", "class2")
> ann_colors = list(Var1 = Var1, Var2 = Var2)
> pheatmap(data, annotation=annotation, annotation_colors = ann_colors, fontsize=9, fontsize_row=6) # set the colors for the sample groups
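The custom palette passed to pheatmap through the color argument above is built with base R's `colorRampPalette`. The sketch below shows on its own what that expression produces: `colorRampPalette` returns a function, and calling that function with 50 yields a character vector of 50 interpolated hex colors. The variable name `pal` is made up for illustration.

```r
# colorRampPalette() returns a function; calling it with n gives n interpolated colors
pal <- colorRampPalette(c("navy", "white", "firebrick3"))(50)

length(pal)        # number of colors generated
pal[c(1, 50)]      # the two ends of the gradient, as hex strings
```

The same vector can be passed to the col argument of heatmap or heatmap.2.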