
LLMs / PEFT: Translation and Interpretation of 《Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey》

處女座的程序猿 · Published in Shanghai, 2024-12-10


Overview: This paper is a survey of parameter-efficient fine-tuning (PEFT) for large models. It systematically summarizes the latest advances in the PEFT field, covering algorithm design, computational efficiency, application scenarios, system implementation, and more.

>> Background and pain points:

● High computational cost of large models: Large models such as large language models (LLMs) have enormous parameter counts (billions or even hundreds of billions), so fine-tuning them directly for a specific downstream task demands massive computational resources, which is especially difficult on hardware platforms with limited compute.

● Limitations of full-model fine-tuning: Fully fine-tuning a large model is computationally expensive and can hurt the model's generalization ability.

>> Proposed solution: The core idea of PEFT is to adapt the parameters of a pre-trained large model to a specific task or domain while adding as few extra parameters and as little extra computation as possible. The paper groups PEFT methods into four categories:

Additive PEFT: Add a small number of trainable modules (e.g., Adapters, soft prompts) to the model architecture and update only the parameters of these new modules. The many Adapter variants (serial, parallel, multi-task adaptation, etc.) and soft-prompt variants (Prefix-tuning, P-Tuning, etc.) fall into this category, as do other additive methods such as (IA)³ and SSF, which achieve parameter-efficient fine-tuning by scaling and shifting activations (a minimal Adapter sketch follows this list).

Selective PEFT: Fine-tune only a subset of the model's existing parameters and keep the rest frozen, typically via structured or unstructured masks. For example, BitFit fine-tunes only the bias terms, while other methods select which parameters to tune based on importance measures such as Fisher information.

Reparameterized PEFT: Train a low-rank reparameterization of the original model parameters, then fold the trained parameters back into the original weights for inference so that inference speed is preserved. LoRA is the most representative method: it updates a weight matrix through two low-rank matrices (see the LoRA sketch after this list). Other reparameterization methods include DyLoRA, AdaLoRA, SoRA, and Compacter, which improve on rank selection, parameter-update strategies, and so on.

Hybrid PEFT: Combine the strengths of several PEFT methods; for example, UniPELT integrates LoRA, Prefix-tuning, and Adapters. Some works also use neural architecture search (NAS) to find the best combination of PEFT methods.
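To make the additive category concrete, here is a minimal PyTorch sketch of a serial bottleneck adapter in the spirit of Adapter tuning. The class name, bottleneck size, and near-zero initialization of the up-projection are illustrative choices rather than the survey's reference implementation; during fine-tuning, only modules like this one (inserted after attention or feed-forward sublayers) are trained while the backbone stays frozen.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Illustrative serial adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Zero-initialize the up-projection so the adapter starts as an identity mapping
        # and the frozen backbone's behavior is unchanged at the start of training.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```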
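Similarly, a minimal sketch of the reparameterization idea behind LoRA, wrapping an existing nn.Linear. The rank, scaling, and initialization below follow common practice and are assumptions made for illustration; the merge step shows why inference cost is unchanged once the low-rank update is folded back into the frozen weight.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = W x + (alpha / r) * B A x, with W frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weight (and bias)
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    @torch.no_grad()
    def merge(self) -> None:
        """Fold the low-rank update into the frozen weight so inference speed is preserved."""
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
```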

>> Core workflow: a PEFT method roughly follows the steps below (a minimal end-to-end sketch follows the list):

● Choose a PEFT method: Pick a PEFT method that suits the specific task and model.

● Add or select parameters: Depending on the chosen method, add new trainable parameters or select a subset of the existing ones.

● Fine-tune the model: Update only the added or selected parameters and keep all other parameters frozen.

● Evaluate performance: Evaluate the fine-tuned model on the target task.
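A minimal end-to-end sketch of these four steps, assuming the Hugging Face transformers and peft libraries with a LoRA configuration; the base model name, hyperparameters, and target module names are placeholders, and the training and evaluation loops are elided.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Steps 1-2: choose a PEFT method (LoRA here) and attach its trainable parameters.
model_name = "facebook/opt-350m"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                         lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters

# Step 3: fine-tune -- only parameters with requires_grad=True (the LoRA weights) are updated.
optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
# ... standard training loop over the downstream dataset goes here ...

# Step 4: evaluate the adapted model on the target task (evaluation code omitted).
```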

>> Advantages:

● Significantly lower computational cost: PEFT methods greatly reduce the compute and resource cost of fine-tuning large models (see the back-of-the-envelope calculation after this list).

● Higher training efficiency: Compared with full-model fine-tuning, PEFT methods can converge faster.

● Better generalization: In some cases, PEFT methods improve the model's generalization ability.
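As a back-of-the-envelope illustration of the cost reduction (the dimensions and rank below are assumed values, chosen only to make the arithmetic concrete): for a single 4096 x 4096 projection matrix, rank-8 LoRA trains 2 x 4096 x 8 = 65,536 parameters instead of 4096² ≈ 16.8M, roughly 0.4% of that matrix.

```python
# Rough trainable-parameter count for rank-r LoRA on a d x d weight matrix (illustrative numbers).
d, r = 4096, 8
full_ft = d * d          # 16,777,216 parameters updated by full fine-tuning of this matrix
lora = 2 * d * r         # 65,536 trainable parameters added by LoRA (A: r x d, B: d x r)
print(f"LoRA trains {lora / full_ft:.2%} of this matrix's parameters")  # ~0.39%
```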

>> Conclusions and viewpoints of the paper:

● PEFT is an effective way to efficiently adapt large models to downstream tasks.

● The paper gives a comprehensive taxonomy and summary of PEFT methods.

● It discusses PEFT applications across different model architectures (LLMs, ViT, vision-language alignment models, diffusion models) and downstream tasks.

● It analyzes the challenges of PEFT system design, including centralized PEFT query serving, distributed PEFT training, and concurrent PEFT tuning.

● It proposes future research directions, including simplifying hyperparameter tuning, establishing unified benchmarks, improving training efficiency, exploring scaling laws, serving more models and tasks, enhancing data privacy, and combining PEFT with model compression.

In summary, this paper offers a comprehensive survey of parameter-efficient fine-tuning (PEFT), summarizes the latest progress in the field, and points out future research directions. It is a valuable reference for researchers who want to understand and apply PEFT.


Translation and Interpretation of 《Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey》

Paper: https://arxiv.org/abs/2403.14608

Date: 2024-08-29

Affiliations: Northeastern University; University of California, Riverside; Arizona State University; New York University

Abstract

Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, particularly over the hardware platforms constrained by computational capabilities.

Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting the large models over the various downstream tasks. In particular, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design.

In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as a valuable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed insights into recent advancements and practical applications.

Index Terms: Large Language Model, Parameter-Efficient Fine-tuning, Computer System, Distributed System.

1. Introduction

Large Models (LMs) have recently captured considerable public interest. Their ability to understand context and nuances enables them to proficiently handle diverse tasks across multiple domains, including natural language processing (NLP), computer vision (CV), etc. In the field of NLP, Large Language Models (LLMs) have achieved significant advancements across various tasks including text generation [1, 2], translation [3, 4], personalized chat-bots [5, 6, 7], and summarization [8], demonstrating remarkable proficiency.

Earlier studies [1] have suggested that LLMs exhibit high levels of generalization, enabling them to apply their acquired knowledge to new tasks not included in their original training. This capability is commonly known as zero-shot learning. Nevertheless, fine-tuning remains essential to further enhance LLMs for optimal performance on new user datasets and tasks.

Due to its scale, a widely adopted strategy for fine-tuning LLMs involves adjusting a limited number of LLM parameters while keeping the remainder unchanged. This technique, termed Parameter-Efficient-Fine-Tuning (PEFT), involves selectively adjusting a small proportion of their parameters while keeping the rest unaltered. Furthermore, the application of PEFT extends beyond the realm of NLP and quickly attracts interest in the CV community for handling fine-tuning vision models with large parameters, such as Vision Transformers (ViT) and diffusion models, as well as disciplinary models such as vision-language models.

In this survey, we systematically review and categorize recent advancements in PEFT algorithms as well as the system implementation costs associated with various PEFT algorithms across diverse scenarios. Figure 1 presents the overview content for this survey. In section II, we present some fundamental concepts for LLM and PEFT, including computational flow for LLM, basic knowledge of PEFT, commonly used datasets and tasks, and evaluation benchmarks.

We categorize all types of PEFT algorithms in Section III according to their computational flow. In Section III-A, we detail additive algorithms that either introduce new weight parameters or modify activations. Algorithms that only require fine-tuning of existing parameters are categorized as selective approaches, which are introduced in Section III-B. In Section III-C, we explore reparameterized PEFT, which constructs a (low-dimensional) reparameterization of original model parameters for training while transforming the weights back to maintain the inference speed. Additionally, there exist algorithms that combine the above techniques, and we have classified these as hybrid approaches, elaborating on them in Section III-D. We also investigate strategies for further reducing the computational complexity of different PEFT algorithms, including KV-cache management, pruning, quantization, and memory optimization, in Section IV.

In Section V, we expand the scope of this survey beyond the computational perspective to involve various potential application scenarios. Specifically, we explore innovations that applying PEFT techniques to different model architecture, including LLMs (Section V-A), Vision Transformer (Section V-B), Vision-Language alignment models (Section V-C), and Diffusion models (Section V-D), for varied downstream tasks, underscoring PEFT’s versatility and applicability in a range of scenarios. After that, in Section VI, we explore the system design challenge for PEFT methods. The discussion includes three advanced system solutions for practical PEFT deployment: PEFT query serving (Section VI-B), distributed tuning (Section VI-C), and concurrent PEFT tuning (Section VI-D). Finally, in Section VII, we summarize our survey and propose several potential future directions from both algorithmic and systemic perspectives, aiming to offer valuable insights for further research and development in the field.

Figure 1: A content overview covered in the survey.

VII. Conclusion and Future Directions

In the current era dominated by large models and large datasets, PEFT stands out as a highly attractive method for efficiently adapting models to downstream tasks. This technique gains its appeal by addressing the significant challenges posed by traditional full-model fine-tuning, which often places substantial computational and data demands. This survey offers a comprehensive examination of the most recent advancements in PEFT, including algorithmic design, computational efficiency, application scenarios, and system implementation for PEFT. It offers a comprehensive taxonomy and explanation that serves as an excellent guidance and knowledge base, which enables readers of various levels and disciplines to swiftly grasp the core concepts of PEFT.

For further research on PEFT, we propose a series of possible directions from both algorithm and system perspectives, hoping to inspire more researchers to engage in further studies in these areas.
