
Paper: "Greedy Function Approximation: A Gradient Boosting Machine", translation and commentary (the origin of PDP)

處女座的程序猿, published 2022-07-25 in Shanghai


Source URL

https:///download/pdf_1/euclid.aos/1013203451

Author

Jerome H. Friedman, Stanford University

The Annals of Statistics

1999 REITZ LECTURE

Publication date

2001, Vol. 29, No. 5, 1189–1232

Abstract

Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.


8. Interpretation

In many applications it is useful to be able to interpret the derived approximation F(x). This involves gaining an understanding of those particular input variables that are most influential in contributing to its variation, and the nature of the dependence of F(x) on those influential inputs. To the extent that F(x) at least qualitatively reflects the nature of the target function F*(x) (1), such tools can provide information concerning the underlying relationship between the inputs x and the output variable y. In this section, several tools are presented for interpreting TreeBoost approximations. Although they can be used for interpreting single decision trees, they tend to be more effective in the context of boosting (especially small) trees. These interpretative tools are illustrated on real data examples in Section 9.


8.1. Relative importance of input variables

Relative importance of input variables. Among the most useful descriptions of an approximation F(x) are the relative influences I_j of the individual inputs x_j on the variation of F(x) over the joint input variable distribution. One such measure is
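The formula that this sentence introduces is missing from this copy of the post. Assuming it is the measure numbered (43) in the published paper, it can be written in LaTeX roughly as follows. The hat on F marks the fitted approximation, and both the equation number and the notation are taken from the paper, not from this page.

```latex
% Assumed reconstruction of the influence measure from the published paper;
% \hat{F} is the fitted approximation, x_j the j-th input variable.
I_j = \left( E_x\!\left[ \frac{\partial \hat{F}(x)}{\partial x_j} \right]^{2}
      \cdot \operatorname{var}_x\!\left[ x_j \right] \right)^{1/2}
```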


8.2. Partial dependence plots

Partial dependence plots. Visualization is one of the most powerful interpretational tools. Graphical renderings of the value of F(x) as a function of its arguments provide a comprehensive summary of its dependence on the joint values of the input variables. Unfortunately, such visualization is limited to low-dimensional arguments. Functions of a single real-valued variable x, F(x), can be plotted as a graph of the values of F(x) against each corresponding value of x. Functions of a single categorical variable can be represented by a bar plot, each bar representing one of its values, and the bar height the value of the function. Functions of two real-valued variables can be pictured using contour or perspective mesh plots. Functions of a categorical variable and another variable (real or categorical) are best summarized by a sequence of (“trellis”) plots, each one showing the dependence of F(x) on the second variable, conditioned on the respective values of the first variable [Becker and Cleveland (1996)].

Viewing functions of higher-dimensional arguments is more difficult. It is therefore useful to be able to view the partial dependence of the approximation F(x) on selected small subsets of the input variables. Although a collection of such plots can seldom provide a comprehensive depiction of the approximation, it can often produce helpful clues, especially when F(x) is dominated by low-order interactions (Section 7).
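As a rough illustration of viewing the dependence of F(x) on a single selected input, the sketch below fixes that input at each value on a grid and averages the model output over the observed values of the remaining inputs. This excerpt does not give a formula, so the averaging estimate is an assumption here, and `partial_dependence`, `toy_model`, and the random data are hypothetical names and values chosen only for the example.

```python
import random


def partial_dependence(predict, rows, feature, grid):
    """Average predict() over the data with column `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the data set is not mutated
            modified[feature] = value  # overwrite only the selected input
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(x):
    # Hypothetical stand-in for a fitted F(x): additive in x[0],
    # plus an interaction between x[1] and x[2].
    return 2.0 * x[0] + x[1] * x[2]


random.seed(0)
data = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
grid = [-1.0, 0.0, 1.0]

# One averaged value per grid point; plotting grid against pd_x0 gives the curve.
pd_x0 = partial_dependence(toy_model, data, feature=0, grid=grid)
```

Because the toy function is additive in its first input, the resulting curve has slope 2 along the grid, shifted by a constant equal to the average of the interaction term over the data. With a real fitted model, `predict` would be replaced by that model's prediction function.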

