CellChat Study Notes (1): Building the Communication Network

健明, posted 2022-11-18 from Guangdong

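Cell-cell communication refers to the ways cells relate to one another. Much like people, the cells of a multicellular organism interact and form a kind of "cell society" in which signals and information pass between cells, so establishing lines of communication is essential. Growth and development, differentiation, the formation of tissues and organs, tissue maintenance, and the coordination of physiological activity all depend on it. The classic example is the release and reception of neurotransmitters between neurons.

Cells communicate in three ways:

Direct contact between cells;
Contact mediated by the extracellular matrix between them;
Interactions between molecules.

The ligand-receptor concept is central to this process. A ligand is a signaling molecule, and a receptor is the protein that recognizes and binds it. Binding activates the receptor, which then triggers the downstream signaling steps. In single-cell analysis, communication between cell types can matter a great deal for a biological process, which is why inferring cell-cell communication from single-cell data is a major topic in advanced single-cell analysis.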

CellChat

CellChat is a tool for cell-cell communication analysis of single-cell data, published in Nature Communications in 2021.

CellChat upstream analysis

Installing CellChat

devtools::install_github("sqjin/CellChat")

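Cell-cell communication analysis

Here we use the public pbmc3k dataset as an example to walk through cell-cell communication analysis with CellChat.

Data input

CellChat needs the following inputs:

The gene expression matrix of the cells (already normalized)

Genes are the row names and cell IDs are the column names, which matches the layout of a Seurat object.

The cell metadata

Cell-level information, such as the annotations from upstream analysis. The metadata of the SeuratObject can be extracted and used directly.

The inputs can therefore be prepared like this: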

library(CellChat)
library(Seurat)
library(SeuratData)

data("pbmc3k.final")

data <- GetAssayData(object = pbmc3k.final, slot = 'data')
meta <- pbmc3k.final@meta.data

A small aside on what each slot of the RNA assay in a SeuratObject stores:

counts: the raw per-gene counts (UMI counts for 10X Genomics data, where they are the output of cellranger);
data: the result of running NormalizeData() on the quality-controlled raw matrix. NormalizeData() corrects for differences in sequencing depth between cells and log-transforms the result. This is the data CellChat expects.
scale.data: the result of ScaleData(), which centers and scales each gene so that its expression has mean 0 and unit variance across cells. This keeps highly expressed genes from dominating downstream analyses.
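
To make the difference between the three slots concrete, here is a minimal base-R sketch on a made-up 3 x 4 count matrix. It assumes the usual log-normalization recipe (scale each cell to 10,000 total counts, then log1p) and a plain per-gene z-score for scaling. The matrix and variable names are invented for illustration, and Seurat's own functions offer more options than this sketch reproduces.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up)
counts <- matrix(c(1, 0, 3,
                   4, 2, 0,
                   0, 5, 5,
                   2, 2, 6),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2", "cell3", "cell4")))

# "data"-like values: scale each cell to 10,000 total counts, then log1p
norm_data <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# "scale.data"-like values: centre and scale each gene across cells (z-score)
scaled_data <- t(scale(t(norm_data)))

print(round(norm_data, 2))
print(round(scaled_data, 2))
```

The normalized values are never negative, while the scaled values are centred on zero and therefore include negative numbers, which is one reason the scaled matrix is not a suitable input for CellChat.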

Because CellChat is normally run after upstream processing is complete, I will not cover the functions CellChat provides for normalizing raw counts.

Creating the CellChat object

A CellChat object resembles a Seurat object and can be created like this:

cellchat <- createCellChat(object = data,
                           meta = meta,
                           group.by = 'seurat_annotations')

## The cell groups used for CellChat analysis are  Naive CD4 T Memory CD4 T CD14+ Mono B CD8 T FCGR3A+ Mono NK DC Platelet

The group.by argument is needed here. It names a column of the metadata, usually the cell type annotation, and defines the cell groups between which communication is inferred.
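
A mismatch between the cell IDs of the expression matrix and the metadata is easy to miss, so a quick check before creating the object can help. The helper below is hypothetical (it is not part of CellChat) and the toy inputs are made up; it only sketches the kind of checks one might run.

```r
# Hypothetical helper (not part of CellChat): basic sanity checks on the inputs
check_cellchat_inputs <- function(data, meta, group.by) {
  stopifnot(!is.null(rownames(data)), !is.null(colnames(data)))
  if (!group.by %in% colnames(meta)) {
    stop("column '", group.by, "' not found in meta")
  }
  if (!identical(colnames(data), rownames(meta))) {
    stop("cell IDs in data and meta do not match (or are in a different order)")
  }
  if (min(data) < 0) {
    stop("expression matrix contains negative values; use normalized, unscaled data")
  }
  if (anyNA(meta[[group.by]])) {
    stop("some cells have no group label")
  }
  invisible(TRUE)
}

# Toy inputs: 2 genes x 3 cells, with one annotation column
toy_data <- matrix(c(0, 1.2, 0.5, 0, 2.1, 0.3), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("c1", "c2", "c3")))
toy_meta <- data.frame(celltype = c("T", "B", "T"),
                       row.names = c("c1", "c2", "c3"))

check_cellchat_inputs(toy_data, toy_meta, group.by = "celltype")
```

With the real objects from this post, the call would be check_cellchat_inputs(data, meta, group.by = "seurat_annotations").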

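createCellChat() accepts not only an expression matrix you extracted yourself but also a SeuratObject directly. One caveat: for a SeuratObject that integrates several samples, the integrated assay cannot be used as input because it contains negative values. The following code achieves the same result as the code above: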

cellchat <- createCellChat(object = pbmc3k.final,
                           meta = meta,
                           group.by = 'seurat_annotations',
                           assay = 'RNA')

There is also a do.sparse argument, which is best left at its default of TRUE: storing single-cell data as a sparse matrix saves memory and time.

A quick look at the structure of the CellChat object:

str(cellchat)

## Formal class 'CellChat' [package "CellChat"] with 14 slots
##   ..@ data.raw      : num[0 , 0 ] 
##   ..@ data          :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. ..@ i       : int [1:2238732] 29 73 80 148 163 184 186 227 229 230 ...
##   .. .. ..@ p       : int [1:2639] 0 779 2131 3260 4220 4741 5522 6304 7094 7626 ...
##   .. .. ..@ Dim     : int [1:2] 13714 2638
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : chr [1:13714] "AL627309.1" "AP006222.2" "RP11-206L10.2" "RP11-206L10.9" ...
##   .. .. .. ..$ : chr [1:2638] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" ...
##   .. .. ..@ x       : num [1:2238732] 1.64 1.64 2.23 1.64 1.64 ...
##   .. .. ..@ factors : list()
##   ..@ data.signaling: num[0 , 0 ] 
##   ..@ data.scale    : num[0 , 0 ] 
##   ..@ data.project  : num[0 , 0 ] 
##   ..@ net           : list()
##   ..@ netP          : list()
##   ..@ meta          :'data.frame': 2638 obs. of  7 variables:
##   .. ..$ orig.ident        : Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ nCount_RNA        : num [1:2638] 2419 4903 3147 2639 980 ...
##   .. ..$ nFeature_RNA      : int [1:2638] 779 1352 1129 960 521 781 782 790 532 550 ...
##   .. ..$ seurat_annotations: Factor w/ 9 levels "Naive CD4 T",..: 2 4 2 3 7 2 5 5 1 6 ...
##   .. ..$ percent.mt        : num [1:2638] 3.02 3.79 0.89 1.74 1.22 ...
##   .. ..$ RNA_snn_res.0.5   : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
##   .. ..$ seurat_clusters   : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
##   ..@ idents        : Factor w/ 9 levels "Naive CD4 T",..: 2 4 2 3 7 2 5 5 1 6 ...
##   ..@ DB            : list()
##   ..@ LR            : list()
##   ..@ var.features  : list()
##   ..@ dr            : list()
##   ..@ options       :List of 1
##   .. ..$ mode: chr "single"

This is the structure of our CellChat object so far. Note that:

The input data are stored in the sparse matrix data, while data.signaling and the other slots will be filled in by later steps;
The meta slot deserves attention. It is a data frame, so cell information can still be edited after the CellChat object is created. The object currently groups cells by seurat_annotations, as set earlier. To switch to another metadata column, for example one named labels, run cellchat <- setIdent(cellchat, ident.use = "labels"). levels(cellchat@idents) shows the current cell groups.

Choosing the ligand-receptor database

CellChat ships with a dedicated database, CellChatDB, a set of ligand-receptor pairs that the CellChat authors curated by hand from the literature. Versions exist for human, mouse, and zebrafish: CellChatDB.human, CellChatDB.mouse, and CellChatDB.zebrafish. The composition of each database can be inspected with showDatabaseCategory(), so it is not detailed here.

The CellChat databases

mouse

2,021 validated molecular interactions, including 60% of secreted autocrine/paracrine signaling interactions, 21% of extracellular matrix (ECM)-receptor interactions and 19% of cell-cell contact interactions.

human

1,939 validated molecular interactions, including 61.8% of paracrine/autocrine signaling interactions, 21.7% of extracellular matrix (ECM)-receptor interactions and 16.5% of cell-cell contact interactions.

Load the database:

CellChatDB <- CellChatDB.human

Take a look at the structure of the database (optional):

head(CellChatDB$interaction)

##                        interaction_name pathway_name ligand      receptor
## TGFB1_TGFBR1_TGFBR2 TGFB1_TGFBR1_TGFBR2         TGFb  TGFB1     TGFbR1_R2
## TGFB2_TGFBR1_TGFBR2 TGFB2_TGFBR1_TGFBR2         TGFb  TGFB2     TGFbR1_R2
## TGFB3_TGFBR1_TGFBR2 TGFB3_TGFBR1_TGFBR2         TGFb  TGFB3     TGFbR1_R2
## TGFB1_ACVR1B_TGFBR2 TGFB1_ACVR1B_TGFBR2         TGFb  TGFB1 ACVR1B_TGFbR2
## TGFB1_ACVR1C_TGFBR2 TGFB1_ACVR1C_TGFBR2         TGFb  TGFB1 ACVR1C_TGFbR2
## TGFB2_ACVR1B_TGFBR2 TGFB2_ACVR1B_TGFBR2         TGFb  TGFB2 ACVR1B_TGFbR2
##                          agonist      antagonist co_A_receptor
## TGFB1_TGFBR1_TGFBR2 TGFb agonist TGFb antagonist              
## TGFB2_TGFBR1_TGFBR2 TGFb agonist TGFb antagonist              
## TGFB3_TGFBR1_TGFBR2 TGFb agonist TGFb antagonist              
## TGFB1_ACVR1B_TGFBR2 TGFb agonist TGFb antagonist              
## TGFB1_ACVR1C_TGFBR2 TGFb agonist TGFb antagonist              
## TGFB2_ACVR1B_TGFBR2 TGFb agonist TGFb antagonist              
##                                co_I_receptor       evidence         annotation
## TGFB1_TGFBR1_TGFBR2 TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling
## TGFB2_TGFBR1_TGFBR2 TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling
## TGFB3_TGFBR1_TGFBR2 TGFb inhibition receptor KEGG: hsa04350 Secreted Signaling
## TGFB1_ACVR1B_TGFBR2 TGFb inhibition receptor PMID: 27449815 Secreted Signaling
## TGFB1_ACVR1C_TGFBR2 TGFb inhibition receptor PMID: 27449815 Secreted Signaling
## TGFB2_ACVR1B_TGFBR2 TGFb inhibition receptor PMID: 27449815 Secreted Signaling
##                          interaction_name_2
## TGFB1_TGFBR1_TGFBR2 TGFB1 - (TGFBR1+TGFBR2)
## TGFB2_TGFBR1_TGFBR2 TGFB2 - (TGFBR1+TGFBR2)
## TGFB3_TGFBR1_TGFBR2 TGFB3 - (TGFBR1+TGFBR2)
## TGFB1_ACVR1B_TGFBR2 TGFB1 - (ACVR1B+TGFBR2)
## TGFB1_ACVR1C_TGFBR2 TGFB1 - (ACVR1C+TGFBR2)
## TGFB2_ACVR1B_TGFBR2 TGFB2 - (ACVR1B+TGFBR2)

Finally, do not forget to store the database in the CellChat object, otherwise later steps will fail:

cellchat@DB <- CellChatDB

Sometimes we do not want to analyze every kind of communication. We may care only about Secreted Signaling, or trust only interactions whose evidence comes from KEGG. To save memory and time, the subsetDB() function takes a subset of the database. First, a quick look at the columns available for filtering:

dplyr::glimpse(CellChatDB$interaction)

## Rows: 1,939
## Columns: 11
## $ interaction_name   <chr> "TGFB1_TGFBR1_TGFBR2", "TGFB2_TGFBR1_TGFBR2", "TGFB…
## $ pathway_name       <chr> "TGFb", "TGFb", "TGFb", "TGFb", "TGFb", "TGFb", "TG…
## $ ligand             <chr> "TGFB1", "TGFB2", "TGFB3", "TGFB1", "TGFB1", "TGFB2…
## $ receptor           <chr> "TGFbR1_R2", "TGFbR1_R2", "TGFbR1_R2", "ACVR1B_TGFb…
## $ agonist            <chr> "TGFb agonist", "TGFb agonist", "TGFb agonist", "TG…
## $ antagonist         <chr> "TGFb antagonist", "TGFb antagonist", "TGFb antagon…
## $ co_A_receptor      <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ co_I_receptor      <chr> "TGFb inhibition receptor", "TGFb inhibition recept…
## $ evidence           <chr> "KEGG: hsa04350", "KEGG: hsa04350", "KEGG: hsa04350…
## $ annotation         <chr> "Secreted Signaling", "Secreted Signaling", "Secret…
## $ interaction_name_2 <chr> "TGFB1 - (TGFBR1+TGFBR2)", "TGFB2 - (TGFBR1+TGFBR2)…

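To keep only the interactions annotated as Secreted Signaling, subset the database and keep the result. Storing the subset in cellchat@DB restricts the rest of the analysis to it:

CellChatDB.use <- subsetDB(CellChatDB, search = 'Secreted Signaling')
# cellchat@DB <- CellChatDB.use  # uncomment to analyze only this subset

The key argument of subsetDB() selects which column to filter on (it defaults to annotation), so other criteria work too. For example, to keep interactions whose evidence is a KEGG entry, the call below passes '^KEGG', which is meant as a regular expression. Whether search is matched as a pattern or as an exact value may depend on the CellChat version, so check that the result is not empty:

CellChatDB.kegg <- subsetDB(CellChatDB, search = '^KEGG', key = 'evidence')
nrow(CellChatDB.kegg$interaction)  # 0 rows means the pattern was not matched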
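
If pattern matching is needed regardless of how subsetDB() treats its search argument, the interaction table can be filtered directly with base R. The sketch below uses a small made-up stand-in for CellChatDB$interaction (the third row, including its names, is invented) and shows both a regular-expression filter and an exact-value filter.

```r
# Toy stand-in for CellChatDB$interaction (the last row is invented)
interaction <- data.frame(
  interaction_name = c("TGFB1_TGFBR1_TGFBR2", "TGFB1_ACVR1B_TGFBR2", "LIG_X_REC_Y"),
  evidence         = c("KEGG: hsa04350", "PMID: 27449815", "KEGG: hsa04060"),
  annotation       = c("Secreted Signaling", "Secreted Signaling", "Cell-Cell Contact"),
  stringsAsFactors = FALSE
)

# Regular-expression filter: evidence starts with "KEGG"
kegg_only <- interaction[grepl("^KEGG", interaction$evidence), ]

# Exact-value filter: annotation equals "Secreted Signaling"
secreted <- interaction[interaction$annotation %in% "Secreted Signaling", ]

print(kegg_only$interaction_name)
print(secreted$interaction_name)
```

With the real database, the same grepl() filter could be applied to a copy of CellChatDB$interaction and the filtered table written back into that copy before it is stored in cellchat@DB.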

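Identifying cell-cell communication

To reduce memory use and run time, this part first restricts the expression matrix to the genes involved in signaling. The workflow is easy to follow: identify the over-expressed ligand- and receptor-encoding genes in each cell group, then build ligand-receptor interactions between groups from that expression profile. As in other single-cell analyses, the future package can be used for parallel computation to speed things up.

library(future)
# 'multiprocess' is deprecated in recent versions of future; use 'multisession' instead
plan(strategy = 'multisession', workers = 4)
# subset the expression data to signaling genes
cellchat <- subsetData(cellchat)

Why subset here? To save time and memory. After this step the analysis considers only the genes related to cell-cell communication (taken from the database), so the expression matrix is subset by gene.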
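
Conceptually, this subsetting amounts to keeping the rows of the expression matrix whose gene names occur in the database. The base-R sketch below illustrates the idea on made-up data. The gene list is written out by hand here, whereas the real database also has to resolve receptor complexes into their subunit genes, so this is an illustration of the idea and not of CellChat's implementation.

```r
set.seed(42)

# Toy expression matrix: genes in rows, cells in columns (random values)
expr <- matrix(runif(20), nrow = 5,
               dimnames = list(c("TGFB1", "TGFBR1", "TGFBR2", "ACTB", "GAPDH"),
                               paste0("cell", 1:4)))

# Genes that take part in signaling according to a (toy) database
signaling_genes <- c("TGFB1", "TGFBR1", "TGFBR2")

# Keep only the signaling genes that are present in the matrix
expr_signaling <- expr[intersect(rownames(expr), signaling_genes), , drop = FALSE]

print(dim(expr))
print(dim(expr_signaling))
```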

cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat)

These two functions identify, respectively, the signaling genes that are over-expressed in each cell group (identifyOverExpressedGenes()) and the ligand-receptor interactions supported by those over-expressed genes (identifyOverExpressedInteractions()).

cellchat <- computeCommunProb(cellchat)
cellchat <- filterCommunication(cellchat, min.cells = 10)

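To obtain more reliable interactions that are easier to validate later, a communication probability is first computed for every interaction, and then communications involving cell groups that contain fewer than min.cells cells are filtered out.

Note that the number of inferred interactions depends on how the average expression of each gene per cell group is computed, so this choice matters. By default, computeCommunProb() uses the trimean method, under which a gene's average expression in a group is 0 if fewer than 25% of the cells in that group express it. You can set the threshold yourself with type = "truncatedMean" and trim =; for example, trim = 0.1 corresponds to a 10% threshold. The default trimean is the stricter choice and yields fewer but more reliable interactions. In addition, groups with more cells tend to produce stronger communication signals overall. Setting population.size = TRUE makes the probability calculation take the size of each cell group into account, which is worth trying when group sizes differ widely. The average expression of signaling genes of interest can be extracted with computeAveExpr(cellchat, features = c("CXCL12","CXCR4"), type = "truncatedMean", trim = 0.1).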
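
The effect of the two averaging methods is easy to see on a toy vector. The sketch below defines the trimean by its textbook formula, (Q1 + 2 * median + Q3) / 4, and compares it with a 10% truncated mean for a made-up group of 10 cells in which only 2 cells express the gene. The function name tri_mean is my own, and the example is not taken from CellChat's source.

```r
# Trimean: (Q1 + 2 * median + Q3) / 4
tri_mean <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  (q[1] + 2 * q[2] + q[3]) / 4
}

# Toy group of 10 cells in which only 2 cells (20%) express the gene
expr <- c(0, 0, 0, 0, 0, 0, 0, 0, 2, 4)

print(tri_mean(expr))          # all three quartiles are 0, so the trimean is 0
print(mean(expr, trim = 0.1))  # drops the lowest and highest 10% before averaging
print(mean(expr))              # ordinary mean, for comparison
```

Because fewer than 25% of the cells express the gene, the upper quartile is 0 and the trimean treats the gene as not expressed in this group, while the 10% truncated mean still returns a positive value.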

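Linking cell-cell communication to signaling pathways

Communication between cells is often an important part of a signaling pathway, and viewing interactions at the pathway level can make the underlying biology easier to understand.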

cellchat <- computeCommunProbPathway(cellchat)

The communication network for each ligand-receptor pair is stored in the net slot, and the communication network for each signaling pathway is stored in the netP slot.
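
The relationship between the pair-level and pathway-level networks can be sketched with a toy array. The example below assumes that a pathway's network is the sum of the probabilities of the ligand-receptor pairs that belong to it. The array values are random, and the third pair and the pathway name PATH_X are invented for illustration.

```r
# Toy probability array: sender x receiver x ligand-receptor pair (random values)
groups <- c("T", "B", "Mono")
pairs  <- c("TGFB1_TGFBR1_TGFBR2", "TGFB2_TGFBR1_TGFBR2", "LIG_X_REC_Y")
set.seed(1)
prob <- array(runif(27, 0, 0.1), dim = c(3, 3, 3),
              dimnames = list(groups, groups, pairs))

# Pathway that each pair belongs to ("PATH_X" is a made-up pathway name)
pathway_of_pair <- c("TGFb", "TGFb", "PATH_X")
pathways <- unique(pathway_of_pair)

# Pathway-level network: sum the pair-level probabilities within each pathway
prob_pathway <- array(0, dim = c(3, 3, length(pathways)),
                      dimnames = list(groups, groups, pathways))
for (p in pathways) {
  in_pathway <- pathway_of_pair == p
  prob_pathway[, , p] <- apply(prob[, , in_pathway, drop = FALSE], c(1, 2), sum)
}

print(dim(prob))          # 3 senders x 3 receivers x 3 pairs
print(dim(prob_pathway))  # 3 senders x 3 receivers x 2 pathways
```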

Building the cell-cell communication network

Going one step further, aggregating the communications between cells yields the overall cell-cell communication network.

cellchat <- aggregateNet(cellchat)

You may care only about communication between some of the cell groups. In that case, specify the sending cell groups and the receiving cell groups for a more targeted analysis:

?aggregateNet

aggregateNet(
  object,
  sources.use = NULL,
  targets.use = NULL,
  signaling = NULL,
  pairLR.use = NULL,
  remove.isolate = TRUE,
  thresh = 0.05,
  return.object = TRUE
)

These are the sources.use and targets.use arguments. What values do they take? This is where the metadata comes in: they refer to the cell groups defined by the group.by column chosen earlier.
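
As a rough picture of what aggregation and the choice of senders and receivers mean, the base-R sketch below collapses a toy probability array over its ligand-receptor pairs and then picks out selected sender and receiver groups by name. All values and names are made up, and the code illustrates the idea only; it does not reproduce aggregateNet() itself.

```r
groups <- c("T", "B", "Mono")
set.seed(2)

# Toy probability array: source x target x ligand-receptor pair (random values)
prob <- array(runif(27, 0, 0.1), dim = c(3, 3, 3),
              dimnames = list(source = groups, target = groups,
                              pair = c("pair1", "pair2", "pair3")))

# Pretend the weakest interactions were not significant and set them to zero
prob[prob < 0.03] <- 0

# Aggregate over pairs: number of interactions and total strength per group pair
count  <- apply(prob > 0, c(1, 2), sum)
weight <- apply(prob, c(1, 2), sum)

# Restrict to chosen sender and receiver groups by name
sources_use <- c("Mono")
targets_use <- c("T", "B")
print(count[sources_use, targets_use, drop = FALSE])
print(weight[sources_use, targets_use, drop = FALSE])
```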

This completes the upstream analysis. Later posts will cover how to interpret these results and visualize them.