[Original] Tips for Plotting Calibration Curves

生信修煉手冊, published 2022-06-15 in Guangdong


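In an earlier article on nomograms, we showed how to use a nomogram to visualize a prognostic model and mentioned several ways to evaluate model performance. Calibration, together with the calibration curve, is one of them.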

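Calibration describes how accurately a model predicts the probability that an individual will experience a clinical outcome. In practice it is usually represented by a calibration curve, which shows the deviation between the model's predicted values and the observed values. A typical calibration curve looks like this: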

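The x-axis shows the probability of the clinical outcome predicted by the model, and the y-axis shows the probability actually observed among patients, drawn as a point estimate with an error bar. A line with slope 1 is drawn as the ideal reference: the closer the actual curve lies to it, the smaller the deviation between predicted and observed results, and the better the model performs.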
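The comparison described above can be sketched with simulated data. This is a minimal illustration and not the article's survival example: the names `pred`, `event`, and `calib` are hypothetical, and the outcome is a simple binary event drawn from the predicted probabilities, so the "model" is well calibrated by construction.

```r
# Minimal sketch with simulated data (not from the article): compare grouped
# predicted probabilities with the observed event rates.
set.seed(42)
n <- 4000
pred  <- runif(n, 0.05, 0.95)   # hypothetical predicted probabilities
event <- rbinom(n, 1, pred)     # outcomes drawn from those probabilities

# Split the predictions into 4 equally sized groups by quartile
grp <- cut(pred, quantile(pred, 0:4 / 4), include.lowest = TRUE)

calib <- data.frame(
  mean.predicted = as.numeric(tapply(pred, grp, mean)),   # x-axis value
  observed       = as.numeric(tapply(event, grp, mean))   # y-axis value
)
# Distance of each point from the ideal slope-1 line
calib$deviation <- calib$observed - calib$mean.predicted
print(calib)
```

For a well calibrated model the `deviation` column stays close to zero, which corresponds to points lying near the ideal line.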

In data analysis, a calibration curve can be created with the calibrate function from the rms package. Let's first run the official example:

> library(rms)       # provides cph() and calibrate()
> library(survival)  # provides Surv()
> set.seed(1)
> n <- 200
> d.time <- rexp(n)
> x1 <- runif(n)
> x2 <- factor(sample(c('a', 'b', 'c'), n, TRUE))
> f <- cph(Surv(d.time) ~ pol(x1,2) * x2, x=TRUE, y=TRUE, surv=TRUE, time.inc=1.5)
> cal <- calibrate(f, u=1.5, cmethod='KM', m=50, B=20)
> plot(cal)

The resulting plot is shown below.

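The parameter u specifies the time point to analyze. m specifies the number of samples per group and therefore determines the number of error bars in the plot: the example data contain 200 samples, so m = 50 gives 4 groups. The function evaluates model performance by resampling with replacement (bootstrap), with B setting the number of resamples. Printing the return value shows the data behind the plot: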
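To make the relation between m and the number of groups concrete, here is a small sketch. It only illustrates the arithmetic (200 samples in blocks of 50) and is not the grouping code that rms uses internally; `pred` is a hypothetical stand-in for the predicted survival probability at time u.

```r
# Sketch: how m (samples per group) determines the number of groups.
set.seed(1)
n <- 200
m <- 50
pred <- runif(n)   # stand-in for predicted survival at time u

# Rank the predictions and cut them into consecutive blocks of m samples
grp <- ceiling(rank(pred, ties.method = "first") / m)

group_sizes <- table(grp)
n_groups <- length(group_sizes)
print(group_sizes)
```

Each group then contributes one point and one error bar to the calibration plot.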

> cal
calibrate.cph(fit = f, cmethod = "KM", u = 1.5, m = 50, B = 20)

n=200  B=20  u=1.5 Day

      index.orig     training         test mean.optimism mean.corrected  n
[1,] -0.02180909 -0.006492867  0.053098128   -0.05959099     0.03778191 20
[2,]  0.01161824  0.013463692  0.031802035   -0.01833834     0.02995658 20
[3,]  0.07007320 -0.064043654 -0.007650977   -0.05639268     0.12646588 14
[4,] -0.07103626 -0.015150576 -0.055302350    0.04015177    -0.11118804 20
     mean.predicted   KM KM.corrected   std.err
[1,]      0.1418091 0.12    0.1795910 0.3829708
[2,]      0.1883818 0.20    0.2183383 0.2828427
[3,]      0.2299268 0.30    0.3563927 0.2160247
[4,]      0.3110363 0.24    0.1998482 0.2516611

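Here, the mean.predicted column gives the x-coordinates of the 4 error bars in the plot. The KM column gives the y-coordinates of the black round points, and the KM.corrected column gives the y-coordinates of the X-shaped points. The lower and upper limits of each error bar are computed with the following formulas: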
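The columns of the printed output are linked by simple arithmetic. The sketch below copies the numbers from the output above and checks the relationships they are consistent with. These relationships are inferred from the printed values themselves, so treat them as a reading of this particular output and not as a statement of the package's documented internals.

```r
# Values copied from the printed calibrate() output above
index.orig     <- c(-0.02180909,  0.01161824,  0.07007320, -0.07103626)
training       <- c(-0.006492867, 0.013463692, -0.064043654, -0.015150576)
test           <- c( 0.053098128, 0.031802035, -0.007650977, -0.055302350)
mean.optimism  <- c(-0.05959099, -0.01833834, -0.05639268,  0.04015177)
mean.corrected <- c( 0.03778191,  0.02995658,  0.12646588, -0.11118804)
mean.predicted <- c(0.1418091, 0.1883818, 0.2299268, 0.3110363)
KM             <- c(0.12, 0.20, 0.30, 0.24)
KM.corrected   <- c(0.1795910, 0.2183383, 0.3563927, 0.1998482)

tol <- 1e-6   # the printed values are rounded, so compare with a tolerance
close_to <- function(a, b) all(abs(a - b) < tol)

# Calibration error in the original sample: observed minus predicted
ok_index     <- close_to(index.orig, KM - mean.predicted)
# Optimism: error on the bootstrap sample minus error on the original sample
ok_optimism  <- close_to(mean.optimism, training - test)
# Optimism-corrected error and optimism-corrected KM estimate
ok_corrected <- close_to(mean.corrected, index.orig - mean.optimism)
ok_km_corr   <- close_to(KM.corrected, KM - mean.optimism)

print(c(ok_index, ok_optimism, ok_corrected, ok_km_corr))
```

In other words, the X-shaped points are the KM estimates shifted by the bootstrap estimate of optimism.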

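# x is the matrix returned by calibrate(), i.e. the cal object above
x <- cal
km <- x[, "KM"]       # Kaplan-Meier estimate per group
se <- x[, "std.err"]  # standard error per group
# Limits are formed by multiplying or dividing the estimate by exp(d);
# the upper limit is capped at 1
ciupper <- function(surv, d) ifelse(surv == 0, 0, pmin(1, surv * exp(d)))
cilower <- function(surv, d) ifelse(surv == 0, 0, surv * exp(-d))
cilower(km, 1.959964 * se)
ciupper(km, 1.959964 * se)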
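As a worked example, the interval can be computed by hand from the KM and std.err values printed earlier. The sketch also checks an observation about this particular output: the printed std.err values match sqrt((1 - S) / (m * S)) with m = 50, which is what the standard error of log survival reduces to when a group has no censoring. That reading is an assumption based on the simulated data (every subject has an event), not something stated in the article.

```r
# Values copied from the printed calibrate() output
KM      <- c(0.12, 0.20, 0.30, 0.24)
std.err <- c(0.3829708, 0.2828427, 0.2160247, 0.2516611)
m <- 50                       # samples per group in the example

# Assumption: with no censoring, the standard error of log(S) is
# sqrt((1 - S) / (m * S)); compare it with the printed std.err
se_check <- sqrt((1 - KM) / (m * KM))

z <- qnorm(0.975)             # the 1.959964 multiplier used in the formulas
lower <- KM * exp(-z * std.err)
upper <- pmin(1, KM * exp(z * std.err))

print(round(cbind(KM, std.err, se_check, lower, upper), 4))
```

Because the limits are built on the log scale, the error bars are asymmetric around each KM estimate, with the upper arm longer than the lower one.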

Since the error bars are computed from the KM and std.err columns, we can extract the data and draw the plot ourselves:

> x <- cal
> # errbar() (from Hmisc, loaded with rms) starts a new plot by default, so no separate plot() call is needed
> errbar(x[,"mean.predicted"], x[,"KM"], cilower(x[,"KM"], 1.959964 * x[,"std.err"]), ciupper(x[,"KM"], 1.959964 * x[,"std.err"]), xlab = "", ylab = "")
> points(x = x[,"mean.predicted"], y = x[,"KM.corrected"], pch = 4)
> lines(x = x[,"mean.predicted"], y = x[,"KM"])