Thanks to Li Haimin and Shen Wei of the "宏基因組0" group for recommending this package. It draws the bands that connect the components of stacked bar charts, which highlights how species abundance changes between groups.

An alluvial diagram is a type of flow diagram, originally developed to represent changes in network structure over time.

Example 1. Neuroscience coalesced from other related disciplines to form its own field. From PLoS ONE 5(1): e8694 (2010).

Example 2. The Hadza gut microbiome on the cover of Science. Panels C and D of Figure 1 use three alluvial diagrams. For details, see the post "What separates a 3-point paper from a 30-point paper?" (3分和30分文章差距在哪里?).

ggalluvial is an extension package built on ggplot2 and dedicated to drawing alluvial diagrams quickly. Some people also call them Sankey diagrams, but the two differ slightly; we will cover the difference in a future post.

Source code on GitHub: https://github.com/corybrunson/ggalluvial
Official vignette on CRAN: https://cran./web/packages/ggalluvial/vignettes/ggalluvial.html

Installation

Choose one of the following three installation methods:

    # Users in China: the Tsinghua mirror is recommended
    site = 'https://mirrors.tuna./CRAN'
    # Install the stable release (recommended)
    install.packages('ggalluvial', repos = site)
    # Install the development version (the connection to GitHub is unstable
    # and the download sometimes fails; retrying a few times usually works)
    devtools::install_github('corybrunson/ggalluvial', build_vignettes = TRUE)
    # Install the optimization branch, which carries the newest features
    devtools::install_github('corybrunson/ggalluvial', ref = 'optimization')

Viewing the help documentation

Use vignette to open the tutorial:

    # Open the tutorial
    vignette(topic = 'ggalluvial', package = 'ggalluvial')

The demonstrations below all follow this official vignette; my main contribution is the translation and the code comments.

Alluvial diagrams in ggplot2

Original author: Jason Cory Brunson. Last updated: 2018-02-11.

1. The simplest example

Using the passenger statistics from the Titanic disaster, plot how sex relates to class and age.

    # Load the package
    library(ggalluvial)
    # Convert the built-in dataset to a data frame (wide format)
    titanic_wide <- data.frame(Titanic)
    # Show the data format
    head(titanic_wide)
    #>   Class    Sex   Age Survived Freq
    #> 1   1st   Male Child       No    0
    #> 2   2nd   Male Child       No    0
    #> 3   3rd   Male Child       No   35
    #> 4  Crew   Male Child       No    0
    #> 5   1st Female Child       No    0
    #> 6   2nd Female Child       No    0
    # Plot the relationship between sex, class and age
    ggplot(data = titanic_wide,
           aes(axis1 = Class, axis2 = Sex, axis3 = Age, weight = Freq)) +
      scale_x_discrete(limits = c('Class', 'Sex', 'Age'), expand = c(.1, .05)) +
      geom_alluvium(aes(fill = Survived)) +
      geom_stratum() +
      geom_text(stat = 'stratum', label.strata = TRUE) +
      theme_minimal() +
      ggtitle('passengers on the maiden voyage of the Titanic',
              'stratified by demographics and survival')

Notes on the call: data sets the data source; axis1, axis2, axis3 choose the columns shown as bars; weight is the numeric value each row carries; geom_alluvium() draws the bands that connect the bars, filled here by survival; geom_stratum() draws the stacked bar at each axis; geom_text() prints the labels inside the bars; theme_minimal() is one of the built-in themes; ggtitle() sets the title and subtitle.

Figure 1. Relationship between class, sex and age, with the proportion of survivors.

Note that the figure above was drawn from wide-format data, whereas ggplot2 normally works with long-format data. How do we convert?

Converting to long format with to_lodes

    # Long format: to_lodes stacks the selected axes and generates the
    # alluvium and stratum columns. The axis name goes into the column named by key.
    titanic_long <- to_lodes(data.frame(Titanic),
                             key = 'Demographic',
                             axes = 1:3)
    head(titanic_long)
    ggplot(data = titanic_long,
           aes(x = Demographic, stratum = stratum, alluvium = alluvium,
               weight = Freq, label = stratum)) +
      geom_alluvium(aes(fill = Survived)) +
      geom_stratum() +
      geom_text(stat = 'stratum') +
      theme_minimal() +
      ggtitle('passengers on the maiden voyage of the Titanic',
              'stratified by demographics and survival')

This produces the same figure as above; only the format of the data source differs.

2. Input data format

Defining an alluvial wide table

    # Show the data format
    head(as.data.frame(UCBAdmissions), n = 12)
    ##       Admit Gender Dept Freq
    ## 1  Admitted   Male    A  512
    ## 2  Rejected   Male    A  313
    ## 3  Admitted Female    A   89
    ## 4  Rejected Female    A   19
    ## 5  Admitted   Male    B  353
    ## 6  Rejected   Male    B  207
    ## 7  Admitted Female    B   17
    ## 8  Rejected Female    B    8
    ## 9  Admitted   Male    C  120
    ## 10 Rejected   Male    C  205
    ## 11 Admitted Female    C  202
    ## 12 Rejected Female    C  391
    # Check the data format
    is_alluvial(as.data.frame(UCBAdmissions), logical = FALSE, silent = TRUE)
    ## [1] "alluvia"

View the relationship between gender and department, grouped by admission outcome:

    ggplot(as.data.frame(UCBAdmissions),
           aes(weight = Freq, axis1 = Gender, axis2 = Dept)) +
      geom_alluvium(aes(fill = Admit), width = 1/12) +
      geom_stratum(width = 1/12, fill = 'black', color = 'grey') +
      geom_label(stat = 'stratum', label.strata = TRUE) +
      scale_x_continuous(breaks = 1:2, labels = c('Gender', 'Dept')) +
      scale_fill_brewer(type = 'qual', palette = 'Set1') +
      ggtitle('UC Berkeley admissions and rejections, by sex and department')

3. Relationships among three variables, colored by the variable of interest

View the Titanic data by survival, sex and class, with the fill color mapped to class.

    ggplot(as.data.frame(Titanic),
           aes(weight = Freq, axis1 = Survived, axis2 = Sex, axis3 = Class)) +
      geom_alluvium(aes(fill = Class),
                    width = 0, knot.pos = 0, reverse = FALSE) +
      guides(fill = 'none') +
      geom_stratum(width = 1/8, reverse = FALSE) +
      geom_text(stat = 'stratum', label.strata = TRUE, reverse = FALSE) +
      scale_x_continuous(breaks = 1:3, labels = c('Survived', 'Sex', 'Class')) +
      coord_flip() +
      ggtitle('Titanic survival by class and sex')

4. Long-format data

    # Convert to long format with to_lodes
    UCB_lodes <- to_lodes(as.data.frame(UCBAdmissions), axes = 1:3)
    head(UCB_lodes, n = 12)
    ##    Freq alluvium     x  stratum
    ## 1   512        1 Admit Admitted
    ## 2   313        2 Admit Rejected
    ## 3    89        3 Admit Admitted
    ## 4    19        4 Admit Rejected
    ## 5   353        5 Admit Admitted
    ## 6   207        6 Admit Rejected
    ## 7    17        7 Admit Admitted
    ## 8     8        8 Admit Rejected
    ## 9   120        9 Admit Admitted
    ## 10  205       10 Admit Rejected
    ## 11  202       11 Admit Admitted
    ## 12  391       12 Admit Rejected
    # Check that the data meet the format requirements
    is_alluvial(UCB_lodes, logical = FALSE, silent = TRUE)

Main columns:
x: the axis a row belongs to, that is, the name of the original wide-format column;
stratum: the category the row takes on that axis;
alluvium: the index of the original wide-format row, which ties together the pieces of one band across axes;
Freq: the value carried by that band.
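The roles of the long-format columns are easier to see on a very small table. The following is a minimal base R sketch that rebuilds the wide-to-long step by hand. It is not ggalluvial's own implementation: the function name wide_to_lodes and the four-row table (the Dept A counts from the UCBAdmissions output above) are assumptions made purely for illustration.

```r
# Hypothetical four-row wide table: one row per band (alluvium).
wide <- data.frame(
  Admit  = c("Admitted", "Rejected", "Admitted", "Rejected"),
  Gender = c("Male", "Male", "Female", "Female"),
  Freq   = c(512, 313, 89, 19),
  stringsAsFactors = FALSE
)

# Hand-written illustration of the reshaping (assumed helper, not a package function).
wide_to_lodes <- function(data, axes) {
  data$alluvium <- seq_len(nrow(data))      # one id per wide-format row
  keep <- setdiff(names(data), axes)        # columns repeated for every axis
  pieces <- lapply(axes, function(axis) {
    data.frame(
      data[, keep, drop = FALSE],
      x = axis,                             # which axis this piece sits on
      stratum = as.character(data[[axis]]), # category taken on that axis
      stringsAsFactors = FALSE
    )
  })
  long <- do.call(rbind, pieces)            # stack the axes on top of each other
  long$x <- factor(long$x, levels = axes)   # preserve the axis order
  rownames(long) <- NULL
  long
}

lodes <- wide_to_lodes(wide, axes = c("Admit", "Gender"))
print(lodes)
```

Each wide row appears once per axis in the result, so the alluvium id is what lets a plotting layer reconnect the pieces of one band, while x and stratum say where each piece sits.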
5. Alluvial diagrams with bars of unequal height

Using refugee data from several countries, observe how the number of refugees from each country changes over time.

    # The Refugees dataset ships with the alluvial package
    data(Refugees, package = 'alluvial')
    country_regions <- c(
      Afghanistan = 'Middle East',
      Burundi = 'Central Africa',
      `Congo DRC` = 'Central Africa',
      Iraq = 'Middle East',
      Myanmar = 'Southeast Asia',
      Palestine = 'Middle East',
      Somalia = 'Horn of Africa',
      Sudan = 'Central Africa',
      Syria = 'Middle East',
      Vietnam = 'Southeast Asia'
    )
    # Look up the region by country name (as.character guards against factor indexing)
    Refugees$region <- country_regions[as.character(Refugees$country)]
    ggplot(data = Refugees,
           aes(x = year, weight = refugees, alluvium = country)) +
      geom_alluvium(aes(fill = country, colour = country),
                    alpha = .75, decreasing = FALSE) +
      scale_x_continuous(breaks = seq(2003, 2013, 2)) +
      theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
      scale_fill_brewer(type = 'qual', palette = 'Set3') +
      scale_color_brewer(type = 'qual', palette = 'Set3') +
      facet_wrap(~ region, scales = 'fixed') +
      ggtitle('refugee volume by country and region of origin')

6. Bars of equal height with unequal flows

Changes in the curricula that students follow across semesters.

    data(majors)
    majors$curriculum <- as.factor(majors$curriculum)
    ggplot(majors,
           aes(x = semester, stratum = curriculum, alluvium = student,
               fill = curriculum, label = curriculum)) +
      scale_fill_brewer(type = 'qual', palette = 'Set2') +
      geom_flow(stat = 'alluvium', lode.guidance = 'rightleft',
                color = 'darkgray') +
      geom_stratum() +
      theme(legend.position = 'bottom') +
      ggtitle('student curricula across several semesters')

7. Status changes over time

    data(vaccinations)
    # Reverse the order of the response levels
    levels(vaccinations$response) <- rev(levels(vaccinations$response))
    ggplot(vaccinations,
           aes(x = survey, stratum = response, alluvium = subject,
               weight = freq,
               fill = response, label = response)) +
      geom_flow() +
      geom_stratum(alpha = .5) +
      geom_text(stat = 'stratum', size = 3) +
      theme(legend.position = 'none') +
      ggtitle('vaccination survey responses at three points in time')

8. Hands-on: relative abundance at the phylum level

    # Hands-on 1. Abundance changes between groups
    # Build test data
    df = data.frame(
      Phylum = c('Ruminococcaceae', 'Bacteroidaceae', 'Eubacteriaceae', 'Lachnospiraceae', 'Porphyromonadaceae'),
      GroupA = c(37.7397, 31.34317, 222.08827, 5.08956, 3.7393),
      GroupB = c(113.2191, 94.02951, 66.26481, 15.26868, 11.2179),
      GroupC = c(123.2191, 94.02951, 46.26481, 35.26868, 1.2179),
      GroupD = c(37.7397, 31.34317, 222.08827, 5.08956, 3.7393))
    # Convert the data to long format (Phylum is the id column)
    library(reshape2)
    melt_df = melt(df, id.vars = 'Phylum')
    # Plot the taxa against their groups, somewhat like a Circos plot
    ggplot(data = melt_df,
           aes(axis1 = Phylum, axis2 = variable, weight = value)) +
      scale_x_discrete(limits = c('Phylum', 'variable'), expand = c(.1, .05)) +
      geom_alluvium(aes(fill = Phylum)) +
      geom_stratum() +
      geom_text(stat = 'stratum', label.strata = TRUE) +
      theme_minimal() +
      ggtitle('Phylum abundance in each group')

Figure: taxa plotted against their groups, somewhat like a Circos plot.

    # Abundance changes of each taxon across groups
    ggplot(data = melt_df,
           aes(x = variable, weight = value, alluvium = Phylum)) +
      geom_alluvium(aes(fill = Phylum, colour = Phylum),
                    alpha = .75, decreasing = FALSE) +
      theme_minimal() +
      theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
      ggtitle('Phylum change among groups')

Figure: abundance changes of each taxon across groups. The effect is even better when the groups are time points.

Reference

    # How to cite
    citation('ggalluvial')

Jason Cory Brunson (2017). ggalluvial: Alluvial Diagrams in 'ggplot2'. R package version 0.5.0.
https://en./wiki/Alluvial_diagram
ggalluvial package source: http://corybrunson./ggalluvial/index.html
Official examples, Alluvial Diagrams in ggplot2: https://cran./web/packages/ggalluvial/vignettes/ggalluvial.html