Xiaohan Xu1 Ming Li2 Chongyang Tao3 Tao Shen4 Reynold Cheng1 Jinyang Li1 Can Xu5 Dacheng Tao6 Tianyi Zhou2
1 The University of Hong Kong 2 University of Maryland 3 Microsoft 4 University of Technology Sydney 5 Peking University 6 The University of Sydney
A collection of papers related to knowledge distillation of large language models (LLMs). If you want to use LLMs to improve the training of your own smaller models, take a look at this collection.
- 2024-02-20: 📃 We released the survey paper "A Survey on Knowledge Distillation of Large Language Models". You are welcome to read and cite it. We look forward to your feedback and suggestions.
Feel free to open an issue/PR or e-mail [email protected], [email protected], [email protected] and [email protected] if you find any missing taxonomies or papers. We will keep updating this collection and survey.
This survey offers an in-depth exploration of knowledge distillation (KD) techniques within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in transferring sophisticated capabilities from proprietary giants such as GPT-4 and Claude to accessible, open-source models like LLaMA and Mistral. Amidst the evolving AI landscape, this work elucidates the critical disparities between proprietary and open-source LLMs, demonstrating how KD serves as an essential conduit for imbuing the latter with the former's advanced functionalities and nuanced understandings. Our analysis is meticulously structured around three foundational pillars: algorithm, skill, and verticalization -- providing a comprehensive examination of KD mechanisms, the enhancement of specific cognitive abilities, and their practical implications across diverse fields. Crucially, the survey navigates the intricate interplay between data augmentation (DA) and KD, illustrating how DA emerges as a powerful paradigm within the KD framework to bolster LLMs' performance. By leveraging DA to generate context-rich, skill-specific training data, KD transcends traditional boundaries, enabling open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights characteristic of their proprietary counterparts. This work aims to provide an insightful guide for researchers and practitioners, offering a detailed overview of current methodologies in knowledge distillation and proposing future research directions. By bridging the gap between proprietary and open-source LLMs, this survey underscores the potential for more accessible, efficient, and sustainable AI solutions, fostering a more inclusive and equitable landscape in AI advancements.
KD Algorithms: We divide KD algorithms into two principal steps: "Knowledge Elicitation", which focuses on eliciting knowledge from teacher LLMs, and "Distillation Algorithms", which centers on injecting this knowledge into student models.
Skill Distillation: We delve into the enhancement of specific cognitive abilities, such as context following, alignment, agent, NLP task specialization, and multi-modality.
Verticalization Distillation: We explore the practical implications of KD across diverse fields, including law, medical & healthcare, finance, science, and miscellaneous domains.
Note that both Skill Distillation and Verticalization Distillation rely on the Knowledge Elicitation and Distillation Algorithms described under KD Algorithms, so some papers appear in more than one category. This overlap also offers different perspectives on the same papers.
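To make the two-step view of KD Algorithms more concrete, here is a minimal sketch in plain Python. It is an illustration under strong assumptions, not the method of any paper listed here: the "teacher" is a hypothetical fixed function standing in for a teacher LLM, the "student" is a two-parameter logistic model, and soft-label cross-entropy is only one possible distillation objective. All names (`teacher_predict`, `student_predict`, `elicit_knowledge`, `distill`) are made up for this example.

```python
import math
import random


def teacher_predict(x: float) -> float:
    """Hypothetical teacher: returns P(label=1 | x). Stands in for a teacher LLM."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.0)))


def student_predict(x: float, w: float, b: float) -> float:
    """Toy student: a logistic model with two learnable parameters."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def elicit_knowledge(seed_inputs):
    """Step 1 (Knowledge Elicitation): query the teacher on seed inputs."""
    return [(x, teacher_predict(x)) for x in seed_inputs]


def distill(dataset, lr=0.5, epochs=2000):
    """Step 2 (Distillation Algorithm): fit the student to the teacher's soft labels
    with full-batch gradient descent on soft-label cross-entropy."""
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, p_teacher in dataset:
            # Gradient of the cross-entropy w.r.t. the student's logit
            err = student_predict(x, w, b) - p_teacher
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


random.seed(0)
seeds = [random.uniform(-2.0, 2.0) for _ in range(200)]  # assumed seed inputs
kd_data = elicit_knowledge(seeds)
w, b = distill(kd_data)
print(f"student parameters: w={w:.2f}, b={b:.2f}")
```

With a real LLM, step 1 would correspond to prompting the teacher for outputs and step 2 to training the student on them (for example with supervised fine-tuning); the two-stage structure stays the same.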
Due to the large number of works applying supervised fine-tuning, we only list the most representative ones here.
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Self-Rewarding Language Models | arXiv | 2024-01 | Github | |
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | arXiv | 2024-01 | Github | Data |
Zephyr: Direct Distillation of Language Model Alignment | arXiv | 2023-10 | Github | Data |
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment | arXiv | 2023-10 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Zephyr: Direct Distillation of LM Alignment | arXiv | 2023-10 | Github | Data |
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | ICLR | 2023-09 | Github | Data |
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations | arXiv | 2023-05 | Github | Data |
Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | EMNLP | 2023-04 | Github | Data |
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | - | 2023-03 | Github | Data |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | NeurIPS | 2023-10 | Github | Data |
SAIL: Search-Augmented Instruction Learning | arXiv | 2023-05 | Github | Data |
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | NeurIPS | 2023-05 | Github | Data |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | arXiv | 2024-02 | Github | Data |
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements | arXiv | 2024-02 | Github | Data |
Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering | arXiv | 2023-11 | Github | |
Orca 2: Teaching Small Language Models How to Reason | arXiv | 2023-11 | ||
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning | NeurIPS Workshop | 2023-10 | Github | Data |
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | arXiv | 2023-06 | ||
SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation | arXiv | 2023-05 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
UltraFeedback: Boosting Language Models with High-quality Feedback | arXiv | 2023-10 | Github | Data |
Zephyr: Direct Distillation of LM Alignment | arXiv | 2023-10 | Github | Data |
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | arXiv | 2023-09 | | |
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | ICLR | 2023-09 | Github | Data |
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment | arXiv | 2023-07 | Github | |
Aligning Large Language Models through Synthetic Feedbacks | EMNLP | 2023-05 | Github | Data |
Reward Design with Language Models | ICLR | 2023-03 | Github | |
Training Language Models with Language Feedback at Scale | arXiv | 2023-03 | ||
Constitutional AI: Harmlessness from AI Feedback | arXiv | 2022-12 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
UltraFeedback: Boosting Language Models with High-quality Feedback | arXiv | 2023-10 | Github | Data |
RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment | arXiv | 2023-07 | Github | |
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | NeurIPS | 2023-05 | Github | Data |
Training Socially Aligned Language Models on Simulated Social Interactions | arXiv | 2023-05 | ||
Constitutional AI: Harmlessness from AI Feedback | arXiv | 2022-12 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Mixed Distillation Helps Smaller Language Model Better Reasoning | arXiv | 2023-12 | ||
Targeted Data Generation: Finding and Fixing Model Weaknesses | ACL | 2023-05 | Github | |
Distilling ChatGPT for Explainable Automated Student Answer Assessment | arXiv | 2023-05 | Github | |
ChatGPT outperforms crowd workers for text-annotation tasks | arXiv | 2023-03 | ||
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators | arXiv | 2023-03 | | |
AugGPT: Leveraging ChatGPT for Text Data Augmentation | arXiv | 2023-02 | Github | |
Is GPT-3 a Good Data Annotator? | ACL | 2022-12 | Github | |
SunGen: Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning | ICLR | 2022-05 | Github | |
ZeroGen: Efficient Zero-shot Learning via Dataset Generation | EMNLP | 2022-02 | Github | |
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding | NeurIPS | 2022-02 | Github | |
Towards Zero-Label Language Learning | arXiv | 2021-09 | ||
Generate, Annotate, and Learn: NLP with Synthetic Text | TACL | 2021-06 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Tailoring Self-Rationalizers with Multi-Reward Distillation | arXiv | 2023-11 | Github | Data |
RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation | arXiv | 2023-10 | Github | |
Neural Machine Translation Data Generation and Augmentation using ChatGPT | arXiv | 2023-07 | ||
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | ICLR | 2023-06 | ||
Can LLMs generate high-quality synthetic note-oriented doctor-patient conversations? | arXiv | 2023-06 | Github | Data |
InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT | EMNLP | 2023-05 | ||
Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing | arXiv | 2023-05 | Github | |
Data Augmentation for Radiology Report Simplification | Findings of EACL | 2023-04 | Github | |
Want To Reduce Labeling Cost? GPT-3 Can Help | Findings of EMNLP | 2021-08 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Large Language Model Augmented Narrative Driven Recommendations | arXiv | 2023-06 | ||
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach | arXiv | 2023-05 | ||
ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models | WSDM | 2023-05 | Github | Data |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | ICLR | 2023-10 | Github | Data |
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks | arXiv | 2023-10 | Github | Data |
Generative Judge for Evaluating Alignment | ICLR | 2023-10 | Github | Data |
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization | arXiv | 2023-06 | Github | Data |
INSTRUCTSCORE: Explainable Text Generation Evaluation with Fine-grained Feedback | EMNLP | 2023-05 | Github | Data |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Magicoder: Source Code Is All You Need | arXiv | 2023-12 | Github | Data Data |
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation | arXiv | 2023-12 | ||
Instruction Fusion: Advancing Prompt Evolution through Hybridization | arXiv | 2023-12 | ||
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | arXiv | 2023-11 | Github | Data Data |
LLM-Assisted Code Cleaning For Training Accurate Code Generators | arXiv | 2023-11 | ||
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | EMNLP | 2023-10 | Github | |
Code Llama: Open Foundation Models for Code | arXiv | 2023-08 | Github | |
Distilled GPT for Source Code Summarization | arXiv | 2023-08 | Github | Data |
Textbooks Are All You Need: A Large-Scale Instructional Text Data Set for Language Models | arXiv | 2023-06 | ||
Code Alpaca: An Instruction-following LLaMA model for code generation | - | 2023-03 | Github | Data |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
Fuzi | - | 2023-08 | Github | |
ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases | arXiv | 2023-06 | Github | |
Lawyer LLaMA Technical Report | arXiv | 2023-05 | Github | Data |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters | CIKM | 2023-05 | | |
Title | Venue | Date | Code | Data |
---|---|---|---|---|
OWL: A Large Language Model for IT Operations | arXiv | 2023-09 | Github | Data |
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education | arXiv | 2023-08 | Github | Data |
If you find this repository helpful, please consider citing the following paper:
@misc{xu2024survey,
title={A Survey on Knowledge Distillation of Large Language Models},
author={Xiaohan Xu and Ming Li and Chongyang Tao and Tao Shen and Reynold Cheng and Jinyang Li and Can Xu and Dacheng Tao and Tianyi Zhou},
year={2024},
eprint={2402.13116},
archivePrefix={arXiv},
primaryClass={cs.CL}
}