[Original] "Nebula"-style single-cell UMAP density plots

健明 · Published 2025-04-30 from Guangdong


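Preface

The #單細胞實戰100次 collection (100 hands-on single-cell analyses) has recently started a series of posts working through a spatial transcriptomics and single-cell RNA-seq study of gastric adenocarcinoma. The study comes from the same team as the earlier gastric cancer single-cell atlas.

While reading the paper, I noticed that it is fond of "nebula"-style density plots for showing how cell proportions differ between groups. They look rather nice, so I wanted to try reproducing one.

The posts I mainly drew on:

老俊俊的生信筆記: "Single-cell 2D density plots, all in one post!"
生信技能樹: "Drawing the same novel 'Galaxy' UMAP plot (Nat Immunol, IF 27.8)"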


Loading the data and random downsampling

The data (GEO accession GSE183904) are available at: https://www.ncbi.nlm./geo/query/acc.cgi?acc=GSE183904

First-level dimensionality reduction, clustering, and visualization. A new take on the UMAP plot: circling the major cell classes after the first round of clustering

Figure to reproduce, with its legend: Density plot of UMAP representation comparing normal and gastric tumor samples after random downsampling to approximately 30,000 cells each to allow statistical equivalence. Each dot represents a single cell. Dashed lines highlight higher proportions of epithelial cells in normal samples and myeloid cells in tumor samples.

So we need to take the annotated data and randomly downsample it by group (Normal and Tumor).

# Clear the environment and load the required R packages
rm(list = ls())
library(Seurat)
library(tidyverse)
source('scRNA_scripts/lib.R')

# Load the data and the annotation results
load(file = "phe.Rdata")
sce.all <- readRDS("2-harmony/sce.all_int.rds")

sce.all@meta.data <- phe

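# Set the active identity to the sample group, then downsample
# (downsample = 30000 keeps at most 30,000 randomly chosen cells per identity class)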
sce.all.int <- SetIdent(sce.all, value = "group")

sce.all.int <- subset(sce.all.int,downsample=30000)

sce.all.int <- SetIdent(sce.all.int, value = "cell_class")
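
To make the downsampling step more concrete: with the active identity set to `group`, `subset(..., downsample = 30000)` keeps at most that many randomly chosen cells per identity class. The sketch below reproduces the same idea in base R on a made-up metadata table. The data frame `meta`, the group sizes, and the cap of 200 are illustrative assumptions, not the real data.

```r
# Toy metadata: 300 "Normal" and 700 "Tumor" cells (hypothetical numbers).
set.seed(1)
meta <- data.frame(
  cell  = paste0("cell_", seq_len(1000)),
  group = rep(c("Normal", "Tumor"), times = c(300, 700)),
  stringsAsFactors = FALSE
)

cap <- 200  # stand-in for downsample = 30000

# For each group, keep at most `cap` randomly chosen cells.
keep <- unlist(lapply(split(meta$cell, meta$group), function(ids) {
  if (length(ids) > cap) sample(ids, cap) else ids
}), use.names = FALSE)

meta_ds <- meta[meta$cell %in% keep, ]
table(meta_ds$group)
```

In this sketch a group smaller than the cap would be kept whole, so the groups only end up equal in size when both exceed the cap.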

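With the data in hand, we can try to reproduce the density nebula plot.

Density plot by cell population

The post from 老俊俊的生信筆記, "Single-cell 2D density plots, all in one post!", uses ggSCvis to draw a density plot of cell populations, and the result looks a lot like the figure in the paper.

Applying it to our own data: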

library(ggSCvis)

p1 <- ggscplot(object = sce.all.int) +
  stat_density2d(geom = "raster", aes(fill = after_stat(density)),
                 contour = FALSE, show.legend = TRUE) +
  geom_scPoint(color = "white", size = 0.00001) +
  facet_wrap(~group, ncol = 2) +
  theme_bw() +
  theme(panel.grid = element_blank(),
        axis.ticks = element_blank(),
        strip.background = element_blank(),
        strip.text = element_blank(),
        axis.text = element_blank()) +
  scale_fill_viridis_c(option = "magma", direction = 1) +
  coord_cartesian(expand = FALSE)
p1

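This uses ggSCvis (a package for visualizing single-cell data) to build a 2D scatter plot that combines a density layer with the individual points:

Initialize a plot object p1 with ggscplot, where the object argument is the single-cell object to visualize, sce.all.int.
Add a 2D density layer with stat_density2d, using the raster geom to display the density.
Add a point layer with geom_scPoint, with white points of size 0.00001.
Facet by the group variable with facet_wrap, showing 2 panels per row.

The result still differs considerably from the figure in the paper: for example, much of the density coloring is covered by the points, and the panels are not shown by group. So let's try another approach.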

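Cell density plot with ggpointdensity

The post from 生信技能樹, "Drawing the same novel 'Galaxy' UMAP plot (Nat Immunol, IF 27.8)", produces a density plot with annotated outlines.

Extract the UMAP coordinates together with the annotation and grouping information, then draw the density plot: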

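# ggpointdensity
if (!requireNamespace("ggpointdensity", quietly = TRUE)) {
  install.packages("ggpointdensity")  # install only if missing
}
library(ggpointdensity)

# Extract the UMAP coordinates plus the grouping information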
umap_df <- Embeddings(sce.all.int, reduction = "umap") %>%
  as.data.frame()

# Make sure the columns are named UMAP_1 and UMAP_2
colnames(umap_df) <- c("UMAP_1", "UMAP_2")

# Add metadata (group, cell_class)
umap_df$group <- sce.all.int$group
umap_df$cell_class <- sce.all.int$cell_class
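
Since the figure is meant to convey how cell-class proportions differ between the groups, it can help to tabulate those proportions before plotting. Here is a small sketch on simulated data. The data frame `toy_df`, the class name "Lymphoid", and the sampling probabilities are invented for illustration; with the real data the same `table()` call could be run on `umap_df$group` and `umap_df$cell_class`.

```r
set.seed(42)
n <- 500
classes <- c("Epithelial", "Myeloid", "Lymphoid")

# Simulated annotations: each group gets its own (made-up) class mix.
toy_df <- data.frame(
  group = rep(c("Normal", "Tumor"), each = n),
  cell_class = c(
    sample(classes, n, replace = TRUE, prob = c(0.6, 0.1, 0.3)),
    sample(classes, n, replace = TRUE, prob = c(0.2, 0.4, 0.4))
  )
)

# Row-wise proportions: each group's row sums to 1.
prop <- prop.table(table(toy_df$group, toy_df$cell_class), margin = 1)
round(prop, 3)
```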

Plot each group separately, then combine the panels:

# Split into two data frames
df_normal <- umap_df %>% filter(group == "Normal")
df_tumor  <- umap_df %>% filter(group == "Tumor")

# Helper that draws one nebula-style density panel
plot_density_map <- function(data, title) {
  ggplot(data, aes(x = UMAP_1, y = UMAP_2)) +
    geom_pointdensity(size = 0.5) +
    scale_color_viridis_c(option = "plasma") +
    theme_void(base_size = 12) +
    ggtitle(title) +
    theme(
      legend.position = "none",
      plot.title = element_text(hjust = 0.5, face = "bold", size = 16)
    )
}


p1 <- plot_density_map(df_normal, "Normal")
p2 <- plot_density_map(df_tumor, "Tumor")

library(patchwork)  # provides `+` for placing ggplot objects side by side
p1 + p2

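In this result the two groups are shown separately and the points are colored by local cell density, but it still looks somewhat different from the nebula plot.

So I wondered whether adding a point layer, as geom_scPoint did in the ggSCvis example, would improve the look. In other words, combine the approaches from the two posts.

Density plot plus a point layer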
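
For intuition about the coloring: `geom_pointdensity()` colors each point by how many other points lie close to it. The sketch below mimics that idea in base R with a fixed radius on simulated coordinates. The radius `r` and the two simulated clusters are assumptions for illustration, and the package's own neighborhood definition and defaults may differ.

```r
set.seed(7)

# Two simulated clusters: one tight (dense) and one diffuse (sparse).
xy <- rbind(
  cbind(rnorm(200, 0, 0.3), rnorm(200, 0, 0.3)),
  cbind(rnorm(200, 5, 1.5), rnorm(200, 5, 1.5))
)
r <- 0.5  # hypothetical neighborhood radius

# Pairwise Euclidean distances, then count neighbors within r (excluding self).
d <- as.matrix(dist(xy))
n_neighbors <- rowSums(d <= r) - 1

# Average neighbor count per simulated cluster.
tapply(n_neighbors, rep(c("tight", "diffuse"), each = 200), mean)
```

The full distance matrix is only practical for toy sizes like this one; it is not how one would handle tens of thousands of cells.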

p <- ggplot(umap_df, aes(x = UMAP_1, y = UMAP_2)) +
  stat_density_2d(
    geom = "raster", aes(fill = after_stat(density)),
    contour = FALSE, n = 200
  ) +
  geom_point(color = "white", size = 0.0001, alpha = 0.5) +
  facet_wrap(~group, ncol = 2) +
  scale_fill_viridis_c(option = "magma", direction = 1) +  # ggplot2's built-in scale; the viridis package is only loaded further down
  coord_cartesian(expand = FALSE) +
  theme_void() +
  theme(
    strip.text = element_text(face = "bold", size = 14),
    legend.position = "none"
  );p

This is getting closer, but the white points mask some of the density coloring. We can make the points fainter, then add the major cell class labels and the outlines.

# Load the required R packages
library(ggplot2)
library(ggrepel)
library(ggforce)
library(viridis)

# Compute the center (median UMAP position) of each cell class within each group
label_centroids <- umap_df %>%
  group_by(group, cell_class) %>%
  summarize(
    UMAP_1 = median(UMAP_1),
    UMAP_2 = median(UMAP_2),
    .groups = 'drop'
  )

# Final plot
p <- ggplot(umap_df, aes(x = UMAP_1, y = UMAP_2)) +
  stat_density_2d(
    geom = "raster",
    aes(fill = after_stat(density)),
    contour = FALSE, n = 200
  ) +
  geom_point(color = "#FFFFFFAA", size = 0.005, alpha = 0.2) +
  geom_text_repel(
    data = label_centroids,
    aes(label = cell_class),
    color = "white", size = 4, fontface = "bold",
    box.padding = 0.5, max.overlaps = Inf
  ) +
  facet_wrap(~group, ncol = 2) +
  scale_fill_viridis(option = "magma", direction = 1) +
  coord_cartesian(expand = FALSE) +
  theme_void() +
  theme(
    strip.text = element_text(face = "bold", size = 14),
    legend.position = "none"
  ) + 
  stat_ellipse(aes(group = cell_class), color = "white", linetype = "dashed", alpha = 0.5);p

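A brief walkthrough of the code:

label_centroids holds the center of each cell class within each group and is used to place the labels.
The "nebula" look comes from the density heatmap rendered by stat_density_2d().
geom_point(color = "#FFFFFFAA", size = 0.005, alpha = 0.2) draws translucent white points that create a "scattered stars" effect and make the structure boundaries look finer.
geom_text_repel's repulsion keeps the labels from overlapping each other.
facet_wrap(~group, ncol = 2) splits the plot into two columns (Normal / Tumor).
viridis::magma is a high-contrast palette that works well on dark backgrounds and gives a strong nebula feel; theme_void() removes axes, background grid, and other visual clutter.
stat_ellipse(aes(group = cell_class), color = "white", linetype = "dashed", alpha = 0.5) adds an outline for each cell_class. With the default settings this is a 95% data ellipse based on a multivariate t-distribution (type = "t"); type = "norm" would give the Gaussian fit instead.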
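
As a rough illustration of the density layer: `stat_density_2d()` estimates a 2D kernel density on a regular grid (ggplot2 relies on `MASS::kde2d()` for this), and `n = 200` sets the grid resolution along each axis. Here is a sketch on simulated coordinates, which are an assumption standing in for the real UMAP embedding:

```r
set.seed(123)

# Simulated 2D embedding: a tight cluster near x = -2 and a diffuse one near x = 2.
x <- c(rnorm(300, -2, 0.5), rnorm(300, 2, 1))
y <- c(rnorm(300,  0, 0.5), rnorm(300, 1, 1))

# Kernel density estimate on a 200 x 200 grid, matching n = 200 in the plotting code.
dens <- MASS::kde2d(x, y, n = 200)

dim(dens$z)  # one density value per raster cell

# Grid coordinates of the densest cell (the brightest spot in the raster).
peak   <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_x <- dens$x[peak[["row"]]]
peak_y <- dens$y[peak[["col"]]]
c(peak_x, peak_y)
```

Each value in `dens$z` corresponds to one raster tile, which the fill scale then maps to a color.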

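Shortcomings compared with the original figure

The outlines do not follow the shape of the UMAP clusters very well, and because of where the labels land, some cell class names do not stand out enough.
The paper probably also filtered the cell populations, whereas this reproduction only downsampled by group. Adapt the display to your own needs.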


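We will keep learning and sharing single-cell content in 2025, and have set up a discussion group, 承包你2025全部的單細胞轉錄組降維聚類分群 (covering all of your single-cell transcriptome dimensionality reduction, clustering, and annotation for 2025). You are welcome to join the discussion!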