ROGUE (an R package from Academician Zhang's team): an entropy-based metric for assessing the purity of single-cell populations

TS的美夢 2025-04-28



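This again comes from the Cancer Cell paper covered earlier. Readers were interested in some of its analyses that I had not mentioned before, so here is one of them. The results section of the paper says: "Notably, based on RNA-seq ontology graphic user environment (ROGUE)......our clusters demonstrated high internal homogeneity......". It mentions a method called ROGUE that can assess the homogeneity of cells, in other words the purity of a cell population.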

(Reference: Cross-tissue human fibroblast atlas reveals myofibroblast subtypes with distinct roles in immune modulation)

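I looked it up: the package has been out for a while (2020) and comes from Academician Zemin Zhang's team, who have contributed a great many methods to the single-cell field. The package paper was published in Nature Communications: An entropy-based metric for assessing the purity of single cell populations. The package accurately quantifies the purity of cell clusters identified in single-cell RNA sequencing (scRNA-seq) data. The authors show that ROGUE is broadly applicable and gives accurate, sensitive, and robust estimates of cluster purity on a range of simulated and real datasets. They also show that ROGUE can identify additional cell subtypes and helps detect precise biological signals in specific subpopulations. That sounds like saying a lot without saying much: what is it actually good for? Rather than repeat the official wording, I will give my own understanding; corrections are welcome. There are two main applications: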

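First: the ROGUE value can be used to identify cell subtypes of high purity. Most people run into this problem: after extracting a major cell type for subclustering, how many clusters is the right number? Many just leave it to fate. The ROGUE value gives us a reference point. The higher the ROGUE value, that is the closer it is to 1, the purer the cell population. A lower value means the population is more heterogeneous and can be subdivided further, which lets us separate out additional subpopulations.
Second: ROGUE can be used to assess the impact of batch effects. This mainly applies when integrating multiple datasets or data from different sources. For example, we have seen a paper that integrated data from several public databases. Suppose a reviewer asks how you know that the batch effects between datasets were removed; a UMAP plot alone may not convince them. You can then bring in the ROGUE algorithm and compute the purity of each cell type across sources or samples. If ROGUE is high, the cell populations are pure and the batch effect is weak!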

The analysis is very simple and takes only a few steps. First, install the package:

# GitHub link: https://github.com/PaulingLiu/ROGUE
# Tutorial link: https://htmlpreview.github.io/?https://github.com/PaulingLiu/ROGUE/blob/master/vignettes/ROGUE_Tutorials.html

# install.packages("tidyverse")  # a dependency; install it first if you do not have it
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("PaulingLiu/ROGUE")
# If the download fails you can also install locally; the package archive is at https://codeload.github.com/PaulingLiu/ROGUE/legacy.tar.gz/HEAD

Load the data (our demo data is an Epi subpopulation after preliminary clustering), extract the count matrix and the metadata, then filter out low-quality cells and lowly expressed genes.


setwd('D:\\KS項目\\公眾號文章\\ROUGE單細胞純度分析')
# Install the package
# if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# devtools::install_github("PaulingLiu/ROGUE")

library(Seurat)
library(ROGUE)
library(tidyverse)
library(ggplot2)
library(ggrastr)
#-------------------------------------------------------------------------------
# Epi is a Seurat object that is assumed to be loaded already
expr <- GetAssayData(Epi, assay = 'RNA', layer = 'counts') %>% as.matrix()
meta <- Epi@meta.data

# filter genes and cells
expr <- matr.filter(expr, min.cells = 10, min.genes = 10)
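To make the filtering step concrete, here is a minimal base-R sketch of what a filter of this kind does on a toy genes-by-cells count matrix. It reflects my reading of the `min.cells` and `min.genes` arguments and is not the package's source code; `filter_counts` and the toy matrix are hypothetical names and made-up values.

```r
# Toy genes-by-cells count matrix (rows = genes, columns = cells); values are made up.
toy <- matrix(
  c(5, 0, 3, 2,
    0, 0, 0, 1,
    4, 1, 0, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Hypothetical helper: keep genes detected in at least `min.cells` cells
# and cells with at least `min.genes` detected genes.
filter_counts <- function(expr, min.cells = 10, min.genes = 10) {
  cells_per_gene <- rowSums(expr > 0)  # number of cells in which each gene is detected
  genes_per_cell <- colSums(expr > 0)  # number of genes detected in each cell
  expr[cells_per_gene >= min.cells, genes_per_cell >= min.genes, drop = FALSE]
}

filtered <- filter_counts(toy, min.cells = 2, min.genes = 2)
print(filtered)
```

With these toy thresholds, geneB (detected in one cell) and the two cells with a single detected gene are dropped. On real data the thresholds of 10 used above are far more sensible than 2.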

Fit the expression entropy model, which is the basis for everything that follows:

ent.res <- SE_fun(expr)
SEplot(ent.res)
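The intuition behind the entropy model can be shown with two toy genes. As I understand the paper, a gene's expression entropy is the mean of log(count + 1) across cells, and it is compared with the level expected from the gene's mean expression; genes that fall well below that level are the ones driving heterogeneity. The sketch below is a simplification: the package fits a smooth curve across all genes, whereas here I use log(mean + 1) directly as the reference. All function names are hypothetical.

```r
# Two toy genes with the same mean count across 6 cells (values are made up).
uniform_gene <- c(4, 4, 4, 4, 4, 4)  # expressed evenly in every cell
bimodal_gene <- c(0, 0, 0, 8, 8, 8)  # on in half of the cells, off in the rest

# Assumed definitions: entropy = mean of log(count + 1),
# reference level = log(mean count + 1).
expr_entropy <- function(x) mean(log(x + 1))
log_mean     <- function(x) log(mean(x) + 1)

# Entropy reduction: how far a gene falls below the level its mean would imply.
entropy_reduction <- function(x) log_mean(x) - expr_entropy(x)

print(entropy_reduction(uniform_gene))  # essentially zero: no heterogeneity
print(entropy_reduction(bimodal_gene))  # clearly positive: expression is uneven
```

Because the logarithm is concave, the reduction can never be negative under these definitions, and it grows as expression becomes more uneven across cells.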

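ROGUE calculation. This is computed for the whole Epi population, and the final value is about 0.34, which is low and indicates that the Epi population is highly heterogeneous. That makes sense: apart from the fact that Epi can be split into subclusters, the Epi cells in this demo data come from both healthy donors and tumor patients, so the heterogeneity is naturally even greater.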

rogue.value <- CalculateRogue(ent.res, platform = "UMI")
#[1] 0.339205
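To see why a heterogeneous population scores low, it helps to look at how the index is built. As I recall from the paper, ROGUE = 1 - sum(ds) / (sum(ds) + K), where ds are the entropy reductions of the significantly variable genes and K is a platform-dependent constant (45 for UMI data by default, to my knowledge). The sketch below only illustrates that formula; `rogue_index` is a hypothetical helper, not a package function, and the ds values are made up.

```r
# Hypothetical helper implementing the assumed formula
# ROGUE = 1 - sum(ds) / (sum(ds) + K) over the significant genes.
rogue_index <- function(ds, k = 45) {
  total <- sum(abs(ds))
  1 - total / (total + k)
}

pure_cluster  <- c(0.2, 0.1, 0.3)  # few significant genes, small entropy reductions
mixed_cluster <- rep(1.5, 60)      # many significant genes, large entropy reductions

print(rogue_index(pure_cluster))   # close to 1
print(rogue_index(mixed_cluster))  # much lower
```

A population with no significantly variable genes gets exactly 1, and the value drops toward 0 as more genes with larger entropy reductions accumulate.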

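To obtain an accurate purity estimate for each cluster, compute the ROGUE value of each cluster within each sample, then visualize the results with a box plot!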

rogue.res <- rogue(expr, labels = meta$seurat_clusters, samples = meta$orig.ident, platform = "UMI", span = 0.6)

# This palette comes from the Cancer Cell paper; worth bookmarking
myColor <- c("#E41B1B", "#4376AC", "#48A75A", "#87638F", "#D87F32", "#737690", "#D690C6",
             "#B17A7D", "#847A74", "#4285BF", "#204B75", "#588257", "#B6DB7B", "#E3BC06",
             "#FA9B93", "#E9358B", "#A0094E", "#999999", "#6FCDDC", "#BD5E95")

# Convert the wide table to long format for plotting with ggplot
plotData <- rogue.res %>%
  tidyr::gather(key = clusters, value = ROGUE) %>%
  filter(!is.na(ROGUE))

# Box plot with jittered points
ggplot(data = plotData, aes(clusters, ROGUE, color = clusters)) +
  geom_boxplot(outlier.shape = NA) +  # add the boxes
  geom_jitter_rast(shape = 16, position = position_jitter(0.2)) +  # add the jittered points
  scale_color_manual(values = myColor) +
  theme_classic() +
  theme(
    axis.text = element_text(size = 12, colour = "black"),
    axis.title = element_text(size = 13, colour = "black")
  ) +
  labs(x = "", y = "ROGUE index") +
  ylim(0, 1)
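Besides the plot, a quick numeric summary can help decide which clusters deserve another round of subclustering. The sketch below assumes the result is a wide table with one row per sample and one column per cluster, which is what the `gather` call above relies on. The toy values and the 0.8 cutoff are made up for illustration and are not recommended thresholds.

```r
# Toy stand-in for a rogue() result: rows = samples, columns = clusters (made-up values).
toy_res <- data.frame(
  c0 = c(0.92, 0.90, 0.95),
  c1 = c(0.71, NA,   0.65),
  c2 = c(0.88, 0.85, 0.91),
  row.names = c("S1", "S2", "S3")
)

# Median ROGUE per cluster, ignoring samples where the value is missing (NA).
cluster_median <- sapply(toy_res, median, na.rm = TRUE)

# Clusters below an arbitrary, illustrative cutoff may be worth subdividing.
cutoff <- 0.8
candidates <- names(cluster_median)[cluster_median < cutoff]

print(sort(cluster_median))
print(candidates)
```

Here only `c1` falls below the cutoff, so it would be the first cluster to inspect more closely.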

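Above we used a subclustering dataset; some of the examples may not be ideal, but they can still serve as a reference. We also demonstrate a second dataset, this time about batch effects. It is a large dataset that merges single-cell data of the same tissue from several databases and sources, and here we use ROGUE to check it. The biggest worry when merging public datasets is that batch effects or differences between datasets lead to wrong results. The analysis is the same as before and very simple. For this dataset we computed cluster purity per sample and per database, and found that the ROGUE values were acceptable, which suggests that the batch effect is small.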

library(Seurat)
library(ROGUE)
library(tidyverse)
library(ggplot2)
library(ggrastr)

# Rce <- subset(sce, sequencing=='scRNA_seq')  # single-cell Seurat object
#-------------------------------------------------------------------------------
expr <- GetAssayData(Rce, assay = 'RNA', layer = 'counts') %>% as.matrix()
meta <- Rce@meta.data

# filter genes and cells
expr <- matr.filter(expr, min.cells = 15, min.genes = 15)
#-------------------------------------------------------------------------------
ent.res <- SE_fun(expr)
SEplot(ent.res)
#-------------------------------------------------------------------------------
# ROGUE calculation
rogue.value <- CalculateRogue(ent.res, platform = "UMI")

# ROGUE of each cell type within each sample, and within each source database
rogue.res.sample <- rogue(expr, labels = meta$celltype, samples = meta$orig.ident, platform = "UMI", span = 0.6)
rogue.res.database <- rogue(expr, labels = meta$celltype, samples = meta$database, platform = "UMI", span = 0.6)

write.csv(rogue.res.sample, file = 'rogue.res.sample.csv')
write.csv(rogue.res.database, file = 'rogue.res.database.csv')

Visualization:

rogue_sample <- read.csv('rogue.res.sample.csv', header = T, row.names = 1)
myColor <- c("#E41B1B", "#4376AC", "#48A75A", "#87638F", "#D87F32", "#737690", "#D690C6",
             "#B17A7D", "#847A74", "#4285BF", "#204B75", "#588257", "#B6DB7B", "#E3BC06",
             "#FA9B93", "#E9358B", "#A0094E", "#999999", "#6FCDDC", "#BD5E95")

plot_rogue_sample <- rogue_sample %>%
  tidyr::gather(key = clusters, value = ROGUE) %>%
  filter(!is.na(ROGUE))

ggplot(data = plot_rogue_sample, aes(clusters, ROGUE, color = clusters)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter_rast(shape = 16, position = position_jitter(0.2)) +
  scale_color_manual(values = myColor) +
  theme_classic() +
  theme(
    axis.text = element_text(size = 12, colour = "black"),
    axis.title = element_text(size = 13, colour = "black")
  ) +
  labs(x = "", y = "ROGUE index") +
  ylim(0, 1)

rogue_database <- read.csv('rogue.res.database.csv', header = T, row.names = 1)

plot_rogue_database <- rogue_database %>%
  tidyr::gather(key = clusters, value = ROGUE) %>%
  filter(!is.na(ROGUE))

ggplot(data = plot_rogue_database, aes(clusters, ROGUE, color = clusters)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter_rast(shape = 16, position = position_jitter(0.2)) +
  scale_color_manual(values = myColor) +
  theme_classic() +
  theme(
    axis.text = element_text(size = 12, colour = "black"),
    axis.title = element_text(size = 13, colour = "black")
  ) +
  labs(x = "", y = "ROGUE index") +
  ylim(0, 1)