More Practical Examples
```r
# Uptake data: substrate concentration and uptake rate
conc <- c(2.856829, 5.005303, 7.519473, 22.101664,
          27.769976, 39.198025, 45.483269, 203.784238)
rate <- c(14.58342, 24.74123, 31.34551, 72.96985,
          77.50099, 96.08794, 96.96624, 108.88374)
L.minor <- data.frame(conc, rate)

# Fit the Michaelis-Menten kinetic equation
L.minor.m1 <- nls(rate ~ Vm * conc / (K + conc), data = L.minor,
                  start = list(K = 20, Vm = 120),  # starting values: K = 20, Vm = 120
                  trace = TRUE)                    # print the fitting iterations

# Determine the x-axis range and build a data set for the fitted curve
x.min <- range(L.minor$conc)[1]
x.max <- range(L.minor$conc)[2]
line.data <- data.frame(conc = seq(x.min, x.max, length.out = 1000))
# Add the model predictions to that data set
line.data$p.predict <- predict(L.minor.m1, newdata = line.data)

library(ggplot2)
M_Mfunction <- ggplot() +
  geom_point(aes(x = conc, y = rate), data = L.minor,
             alpha = 0.5, size = 5, color = "red") +
  geom_line(aes(x = conc, y = p.predict), data = line.data,
            size = 1, color = "blue") +
  scale_x_continuous(
    # expression() renders mathematical notation in the axis title
    name = expression(Substrate ~~ concentration(mmol ~~ m^3)),
    breaks = seq(0, 200, by = 25)) +
  scale_y_continuous(
    name = "Uptake rate (weight/h)",
    breaks = seq(0, 120, by = 10)) +
  geom_text(aes(x = 100, y = 60),
            label = "bolditalic(f(list(x, (list(K, V[m])))) == frac(V[m]%.%x, K+x))",
            # In geom_text() a plotmath label must be given as a string
            # with parse = TRUE; expression() cannot be used here
            parse = TRUE,
            size = 5, family = "times") +
  theme_bw() +
  theme(
    axis.title.x = element_text(size = 16),
    axis.title.y = element_text(size = 16),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12))

M_Mfunction  # draw the plot
```
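The model fitted above is the Michaelis-Menten curve, rate = Vm * conc / (K + conc). As a minimal sketch, the estimated parameters can be pulled out of the `nls` object with base R's `coef()` and `summary()`. The object names `mm.fit`, `est` and `half.rate` are chosen here for illustration only.

```r
# Same data as in the example above
conc <- c(2.856829, 5.005303, 7.519473, 22.101664,
          27.769976, 39.198025, 45.483269, 203.784238)
rate <- c(14.58342, 24.74123, 31.34551, 72.96985,
          77.50099, 96.08794, 96.96624, 108.88374)
L.minor <- data.frame(conc, rate)

# Refit without tracing (mm.fit is an illustrative name)
mm.fit <- nls(rate ~ Vm * conc / (K + conc), data = L.minor,
              start = list(K = 20, Vm = 120))

est <- coef(mm.fit)   # named vector with the estimates of K and Vm
print(est)
print(summary(mm.fit)$coefficients)  # estimates with standard errors

# By construction of the model, the predicted rate at conc = K is Vm / 2
half.rate <- predict(mm.fit, newdata = data.frame(conc = est[["K"]]))
```

This is only a way to inspect the fit; the plotting code does not depend on it.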
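A heatmap is an excellent way to visualize data: it clearly shows the associations and differences within multidimensional data. The 糗世界 blog has already demonstrated the three commonly used ways of drawing heatmaps in R: heatmap, ggplot2 and lattice. As R has developed, several packages have come to offer richer and simpler heatmap functions, for example heatmap.2 in gplots, pheatmap and heatmap.plus. Drawing heatmaps with ggplot2 is also very convenient. The key to a heatmap is clustering, and two workable approaches are to sort the data by the clustering result, or to convert the clustering result to a factor so that the order is fixed; both are easy with the plyr package. Here a set of WHO country data is used to draw the heatmap. The countries are sorted by index D (the population of each country), Country is converted to a factor so that this order is kept, the values are standardized, and the heatmap is drawn with geom_tile(). In ggplot2, scale_fill_gradient2 provides a gradient across three base colors.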
```r
WHO <- read.csv("WHO.csv", header = TRUE)
library(plyr)
# Sort the data by total population
WHO <- arrange(WHO, desc(D))
# Convert the country names to a factor to fix the sorted order of Country;
# the same can be done with the order given by a clustering result
WHO <- transform(WHO, Country = factor(Country, levels = unique(Country)))

library(reshape2)
library(ggplot2)
library(scales)
library(grid)
# Melt the data into long format
m.WHO <- melt(WHO)
# Rescale: map each variable onto [0, 1] by its minimum and maximum
m.WHO <- ddply(m.WHO, .(variable), transform, rescale = rescale(value))
# Standardize each variable (z-scores); as.numeric() drops the matrix returned by scale()
s.WHO <- ddply(m.WHO, .(variable), transform, rescale = as.numeric(scale(value)))

p <- ggplot(s.WHO, aes(variable, Country)) +
  # Draw the heatmap with tiles
  geom_tile(aes(fill = rescale)) +
  scale_fill_gradient2(mid = "black", high = "red", low = "green", name = "Intensity") +
  # variable is on the x axis and Country on the y axis
  labs(x = "Index", y = "Country") +
  theme_bw() +
  theme(
    axis.title.x = element_text(size = 16),
    axis.title.y = element_text(size = 16),
    axis.text.x = element_text(size = 12, colour = "grey50"),
    axis.text.y = element_text(size = 12, colour = "grey50"),
    legend.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    legend.key.size = unit(0.8, "cm"))  # unit() comes from grid and sets the legend key size

p  # draw the plot
```
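The text mentions that rows can also be ordered by a clustering result. A minimal sketch of that idea follows, using a small made-up matrix instead of the WHO data (the names `mat`, `hc`, `row.order` and `long` are illustrative): the rows are clustered with `hclust()`, and the resulting order becomes the factor levels of `Country`.

```r
library(ggplot2)

# Made-up data: 8 "countries" by 5 indices (not the WHO data set)
set.seed(42)
mat <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("Country", 1:8), paste0("Index", 1:5)))

# Hierarchical clustering of the standardized rows
z <- scale(mat)
hc <- hclust(dist(z))
row.order <- rownames(mat)[hc$order]   # row names in dendrogram order

# Long format; as.numeric() reads the matrix column by column,
# which matches the rep() calls below
long <- data.frame(
  Country  = factor(rep(rownames(mat), times = ncol(mat)), levels = row.order),
  variable = rep(colnames(mat), each = nrow(mat)),
  value    = as.numeric(z)
)

p.clust <- ggplot(long, aes(variable, Country)) +
  geom_tile(aes(fill = value)) +
  scale_fill_gradient2(low = "green", mid = "black", high = "red",
                       name = "Intensity") +
  theme_bw()
print(p.clust)
```

Because the order lives in the factor levels, ggplot2 keeps it on the y axis without any further sorting.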
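Many people have been amazed by the animated scatter plots that Hans Rosling presented at TED and on the BBC. They display multidimensional data and successfully add time as a further dimension. Various experts have reproduced them by different means; notable examples use d3.js and googleVis, which is based on the Google Charts API. 魔王 of 統計之都 (Capital of Statistics) also drew one with ggplot2 combined with the animation package and ffmpeg. Animation with ggplot2 is fairly crude, however: the basic principle is to output a series of images ordered by year, and then to combine those images in sequence into a video or an animated image. In a dataguru course I used a loop together with paste to output the frames of the animation one after another; the detailed code is posted separately.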
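The loop-and-paste approach described above can be sketched as follows. This is not the original course code: the data frame `demo`, its columns and the output directory are made up for illustration. Each pass through the loop draws one year and saves it under a file name built with `paste()`. Combining the frames into a video or GIF (for example with ffmpeg) is a separate step and is not shown.

```r
library(ggplot2)

# Made-up data: three "countries" observed over five years
set.seed(1)
demo <- expand.grid(country = c("A", "B", "C"), year = 2001:2005)
demo$income <- runif(nrow(demo), 1, 10)
demo$life   <- 50 + 3 * demo$income + rnorm(nrow(demo))

# Frames go to a temporary directory (an arbitrary choice for this sketch)
out.dir <- file.path(tempdir(), "frames")
dir.create(out.dir, showWarnings = FALSE)

frame.files <- character(0)
for (yr in sort(unique(demo$year))) {
  p <- ggplot(subset(demo, year == yr),
              aes(x = income, y = life, colour = country)) +
    geom_point(size = 5, alpha = 0.6) +
    # Fix the axes so that every frame uses the same scale
    xlim(range(demo$income)) +
    ylim(range(demo$life)) +
    labs(title = paste("Year", yr)) +
    theme_bw()
  # Build the file name for this frame with paste()
  f <- file.path(out.dir, paste("frame_", yr, ".png", sep = ""))
  ggsave(f, p, width = 4, height = 3, dpi = 100)
  frame.files <- c(frame.files, f)
}
```

Keeping the axis limits fixed across frames matters: otherwise each image is rescaled to its own data and the points appear to jump.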
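A volcano plot is a type of scatter plot that makes it quick to identify changes in large data sets composed of replicate data. For a fuller introduction, see the wiki and Colin Gillespie's blog. The code and figure below show an implementation with ggplot2.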
```r
library(ggplot2)
## Change the theme
old_theme <- theme_update(
  axis.ticks = element_line(colour = "black"),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background = element_blank(),
  axis.line = element_line(size = 0.5)
)

## Highlight genes with an absolute fold change > 2 (|log2(FC)| > 1) and P.Value < 0.05
a <- read.table("flu.txt", header = TRUE, sep = "\t")
P.Value <- c(a$P.Value)
FC <- c(a$FC)
df <- data.frame(P.Value, FC)

# Note the space in "< -1": this is a comparison with -1, not an assignment
df.G <- subset(df, log2(FC) < -1 & P.Value < 0.05)  # green: down-regulated
df.G <- cbind(df.G, Color = rep(1, nrow(df.G)))
df.B <- subset(df, (log2(FC) >= -1 & log2(FC) <= 1) | P.Value >= 0.05)  # black: not highlighted
df.B <- cbind(df.B, Color = rep(2, nrow(df.B)))
df.R <- subset(df, log2(FC) > 1 & P.Value < 0.05)  # red: up-regulated
df.R <- cbind(df.R, Color = rep(3, nrow(df.R)))
df.t <- rbind(df.G, df.B, df.R)
df.t$Color <- as.factor(df.t$Color)

## Construct the plot object
ggplot(data = df.t, aes(x = log2(FC), y = -log10(P.Value), color = Color)) +
  geom_point(alpha = 0.5, size = 1.75) +
  theme(legend.position = "none") +
  xlim(c(-5, 5)) + ylim(c(0, 20)) +
  # Named values keep the colors correct even if one group is empty
  scale_color_manual(values = c("1" = "green", "2" = "black", "3" = "red")) +
  labs(x = expression(log[2](FC)), y = expression(-log[10](P.Value))) +
  theme(axis.title.x = element_text(size = 20),
        axis.text.x = element_text(size = 15)) +
  theme(axis.title.y = element_text(size = 20),
        axis.text.y = element_text(size = 15))
```
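The comment in the code mentions a Bonferroni cut-off, while the code itself uses a fixed threshold of 0.05. As a hedged sketch of what a Bonferroni threshold could look like, the example below uses simulated values (the data frame `sim` is made up and is not the flu data): the cut-off is 0.05 divided by the number of tests, which corresponds to comparing `p.adjust(..., method = "bonferroni")` against 0.05.

```r
# Simulated fold changes and p-values (illustrative only)
set.seed(7)
sim <- data.frame(FC = 2^rnorm(500), P.Value = runif(500)^3)

# Bonferroni cut-off: the significance level divided by the number of tests
bonf.cut <- 0.05 / nrow(sim)

# Classify each point; black is the default
sim$Color <- "black"
sim$Color[log2(sim$FC) >  1 & sim$P.Value < bonf.cut] <- "red"
sim$Color[log2(sim$FC) < -1 & sim$P.Value < bonf.cut] <- "green"

# The same correction expressed as adjusted p-values
sim$P.adj <- p.adjust(sim$P.Value, method = "bonferroni")

table(sim$Color)
```

With a threshold this strict, far fewer points are highlighted than with 0.05, so whether to use it depends on how conservative the selection should be.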
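Once a plot is finished, the last step is to export it. A high-quality figure is a pleasure to look at, whereas exporting it the wrong way, or simply taking a screenshot of the graphics device, usually gives a poor image. A high-quality figure should have controlled dimensions and font sizes, and its vector graphics should be rendered with anti-aliasing. Through its support for the Cairo vector graphics library, R can create high-quality vector graphics (PDF, PostScript, SVG) and bitmaps (PNG, JPEG, TIFF), and it can also render them at high quality in background processes. For ggplot2, the output format I recommend is PDF produced through the Cairo package: PDF files are small and can be converted to any other format. The PDF can then be saved as EPS and opened in Photoshop for the final adjustments, such as proportions, color space and dpi (journals and publishers generally require 300 dpi or more). One more point to watch is font size in ggplot2. Cookbook for R points out that in most cases size in ggplot2 is measured in mm (see also the discussion on Stack Overflow), whereas the size of element_text() inside theme() is given in points (pts).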
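The mm versus points distinction can be handled with the constant `.pt`, which ggplot2 exports as the number of points per millimetre (72.27 / 25.4). A minimal sketch follows; the helper names `pt_to_mm` and `mm_to_pt` are made up for this example and are not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helpers built on ggplot2's exported constant .pt
pt_to_mm <- function(pt) pt / .pt
mm_to_pt <- function(mm) mm * .pt

# A geom_text() size (in mm) that matches 32 pt theme text
size.mm <- pt_to_mm(32)

p.size <- ggplot() +
  geom_text(aes(x = 1, y = 1), label = "ABC", size = size.mm) +  # 32 pt, given in mm
  labs(x = "x axis", y = "y axis") +
  theme(axis.title = element_text(size = 32))                    # 32 pt, given in points
print(p.size)
```

With this constant, 32 pt corresponds to about 11.25 mm. Converting at 72 points per inch instead gives about 11.29 mm; the difference is too small to see in print.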
The example below uses three common ways of exporting ggplot2 figures to output a plot with a 20 pt main title, 15 pt axis titles, and a size of 80 mm (3.15 in) by 60 mm (2.36 in).
```r
library(ggplot2)
library(Cairo)

# geom_text() sizes are in mm; element_text() sizes are in points
ggplot() +
  geom_text(aes(x = 16, y = 16), label = "ABC", size = 11.28) +  # 11.28 mm, i.e. 32 pt
  geom_text(aes(x = 16, y = 14.5), label = "ABC", size = 32) +   # 32 mm
  labs(x = "x axis", y = "y axis") +
  ylim(c(14, 16.5)) +
  xlim(c(15.75, 16.25)) +
  theme(
    axis.title.x = element_text(size = 32),  # 32 pt
    axis.title.y = element_text(size = 32))  # 32 pt

x <- seq(-4, 4, length.out = 1000)
y <- dnorm(x)
data <- data.frame(x, y)

# 1. Output with the Cairo package
CairoPDF("plot1.pdf", 3.15, 3.15)  # width and height in inches
print(ggplot(data, aes(x = x, y = y)) + geom_line(size = 1) +
        theme_bw())
dev.off()  # close the graphics device, which also writes the file

# 2. Output with ggsave(); the device is chosen from the file extension
#    (pass device = cairo_pdf to request Cairo rendering explicitly)
plot2 <- ggplot(data, aes(x = x, y = y)) + geom_line(size = 1) +
  theme_bw()
ggsave("plot2.pdf", plot2, width = 3.15, height = 3.15)

# 3. Export from RStudio
```
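For the target stated above (20 pt main title, 15 pt axis titles, 80 mm by 60 mm), a minimal sketch with `ggsave()` might look like this. The title text, the object names and the temporary output path are assumptions made for illustration. `ggsave()` accepts `units = "mm"`, so the dimensions do not need to be converted to inches by hand.

```r
library(ggplot2)

x <- seq(-4, 4, length.out = 1000)
curve.data <- data.frame(x = x, y = dnorm(x))

plot3 <- ggplot(curve.data, aes(x = x, y = y)) +
  geom_line() +
  labs(title = "Normal density", x = "x axis", y = "y axis") +  # illustrative labels
  theme_bw() +
  theme(
    plot.title = element_text(size = 20),  # main title, 20 pt
    axis.title = element_text(size = 15))  # both axis titles, 15 pt

# Write to a temporary file; replace with a real path as needed
out.file <- file.path(tempdir(), "plot3.pdf")
ggsave(out.file, plot3, width = 80, height = 60, units = "mm")
```

At 80 mm wide, a 20 pt title takes up a large share of the figure, so it is worth opening the saved file to check the proportions.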
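Changing the default font, or exporting figures that contain Chinese text, is a real nuisance. Fortunately, various extension packages and the powerful RStudio can help.
The extrafont package can use font files directly and then embeds the fonts into the generated PDF through Ghostscript (which must be installed separately). For the code, see the package author's instructions.
邱怡軒 wrote a fun package called showtext, and it really is fun to use.
The simplest and most practical method is still to export from RStudio: it uses the system fonts directly (I am on Windows 7 and have not tried this on Mac or Linux), so the plot only needs to be exported.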
```r
# showtext
library(showtext)
library(ggplot2)
library(Cairo)

# Register font files under family names
# (font_add() and showtext_begin() are the current names of
#  font.add() and showtext.begin())
font_add("BlackoakStd", "C://Windows//Fonts//BlackoakStd.otf")
font_add("BrushScriptStd", "C://Windows//Fonts//BrushScriptStd.otf")
font_add("times", "C://Windows//Fonts//times.ttf")
font_add("STHUPO", "C://Windows//Fonts//STHUPO.ttf")

CairoPDF("showtext_output", 8, 8)
showtext_begin()
print(ggplot() +
  geom_text(aes(x = 16, y = 16.25), label = "Blackoak Std", size = 8,
            family = "BlackoakStd") +
  geom_text(aes(x = 16, y = 16), label = "Brush Script Std", size = 16,
            family = "BrushScriptStd") +
  geom_text(aes(x = 16, y = 15.75), label = "Times New Roman", size = 16,
            family = "times") +
  geom_text(aes(x = 16, y = 15.50), label = "華文琥珀", size = 16,
            family = "STHUPO") +
  ylim(c(15.25, 16.50)) +
  labs(x = " ", y = " ") +
  theme_bw())
showtext_end()
dev.off()  # close the device and write the file

# The plot can then also be exported from RStudio
```
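For machines without the Windows font files used above, a hedged alternative is to rely on the CJK font that, to my knowledge, showtext loads under the family name `wqy-microhei` (WenQuanYi Micro Hei), together with `showtext_auto()`, which switches showtext on for graphics devices opened afterwards. The label text, object names and temporary output path below are illustrative.

```r
library(showtext)
library(ggplot2)

# Use showtext for text rendering on devices opened from now on
showtext_auto()

p.font <- ggplot() +
  geom_text(aes(x = 1, y = 1), label = "中文 ABC", size = 10,
            family = "wqy-microhei") +  # assumed to be registered by showtext
  labs(x = " ", y = " ") +
  theme_bw()

# Write to a temporary file; replace with a real path as needed
font.file <- file.path(tempdir(), "showtext_auto_demo.pdf")
ggsave(font.file, p.font, width = 4, height = 3)

# Switch automatic showtext rendering off again
showtext_auto(FALSE)
```

`font_families()` lists the family names that are currently registered, which is a quick way to check that `wqy-microhei` is available before plotting.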