rucaibox / llmsurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Home Page: https://arxiv.org/abs/2303.18223

Python 91.63% Shell 4.73% C++ 0.11% JavaScript 2.85% Rust 0.42% Scheme 0.26%
chain-of-thought chatgpt in-context-learning instruction-tuning large-language-models llm llms natural-language-processing pre-trained-language-models pre-training

llmsurvey's Introduction

LLMSurvey

A collection of papers and resources related to Large Language Models.

The organization of papers refers to our survey "A Survey of Large Language Models". Paper page

Please let us know by e-mail if you find a mistake or have any suggestions: [email protected]

(We suggest also cc'ing [email protected] in case of delivery failure.)

If you find our survey useful for your research, please cite the following paper:

@article{LLMSurvey,
    title={A Survey of Large Language Models},
    author={Zhao, Wayne Xin and Zhou, Kun and Li, Junyi and Tang, Tianyi and Wang, Xiaolei and Hou, Yupeng and Min, Yingqian and Zhang, Beichen and Zhang, Junjie and Dong, Zican and Du, Yifan and Yang, Chen and Chen, Yushuo and Chen, Zhipeng and Jiang, Jinhao and Ren, Ruiyang and Li, Yifan and Tang, Xinyu and Liu, Zikang and Liu, Peiyu and Nie, Jian-Yun and Wen, Ji-Rong},
    year={2023},
    journal={arXiv preprint arXiv:2303.18223},
    url={http://arxiv.org/abs/2303.18223}
}

Chinese Version

To facilitate the reading of our (English-version) survey, we also provide a Chinese translation of it. We will continue to update the Chinese version.

chinese_version

🚀(New) The trends of the number of papers related to LLMs on arXiv

Here are the trends of the cumulative numbers of arXiv papers that contain the keyphrases “language model” (since June 2018) and “large language model” (since October 2019), respectively.

arxiv_llms

The statistics are calculated using exact match by querying the keyphrases in the title or abstract by month. We set different x-axis ranges for the two keyphrases because “language models” were explored earlier. We label the points corresponding to important landmarks in the research progress of LLMs. A sharp increase occurs after the release of ChatGPT: the average number of published arXiv papers that contain “large language model” in the title or abstract goes from 0.40 per day to 8.58 per day.
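As a rough illustration of how such a trend could be computed, here is a minimal sketch that counts papers whose title or abstract contains a keyphrase and accumulates the counts by month. It is not the script used for the figure: the record fields (`title`, `abstract`, `month`), the function name, the case-insensitive matching, and the toy records are all assumptions made for this example.

```python
from collections import Counter
from itertools import accumulate


def cumulative_monthly_counts(papers, keyphrase):
    """Return [(month, cumulative_count)] for papers matching the keyphrase.

    `papers` is an iterable of dicts with hypothetical keys
    'title', 'abstract', and 'month' (formatted as YYYY-MM).
    Matching is a case-insensitive substring test (an assumption).
    """
    phrase = keyphrase.lower()
    per_month = Counter(
        p["month"]
        for p in papers
        if phrase in p["title"].lower() or phrase in p["abstract"].lower()
    )
    months = sorted(per_month)  # YYYY-MM strings sort chronologically
    totals = accumulate(per_month[m] for m in months)
    return list(zip(months, totals))


# Toy records invented for illustration only (not real arXiv data).
papers = [
    {"title": "A language model for code", "abstract": "", "month": "2018-06"},
    {"title": "Scaling", "abstract": "We train a large language model.", "month": "2019-10"},
    {"title": "Vision transformers", "abstract": "Image classification.", "month": "2019-10"},
    {"title": "Prompting", "abstract": "A Large Language Model study.", "month": "2019-11"},
]

lm_trend = cumulative_monthly_counts(papers, "language model")
llm_trend = cumulative_monthly_counts(papers, "large language model")
print(lm_trend)
print(llm_trend)
```

Months with no matching papers are simply absent from the result; a plotting script would need to fill them in to draw a continuous curve.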

🚀(New) Technical Evolution of GPT-series Models

A brief illustration of the technical evolution of GPT-series models. We plot this figure mainly based on the papers, blog articles and official APIs from OpenAI. Here, solid lines denote that there is explicit evidence (e.g., an official statement that a new model is developed based on a base model) for the evolution path between two models, while dashed lines denote a relatively weaker evolution relation.

gpt-series

🚀(New) Evolutionary Graph of LLaMA Family

An evolutionary graph of the research work conducted on LLaMA. Because of the huge number of LLaMA variants, we cannot include all of them in this figure, even though much of the omitted work is excellent.

LLaMA_family

To support incremental updates, we share the source file of this figure and welcome readers to add the desired models by submitting pull requests on our GitHub page. If you're interested, please request by application.

🚀(New) Prompts

We collect some useful tips for designing prompts, drawn from online notes and the experiences of our authors, and we also show the related ingredients and principles (introduced in Section 8.1).

prompt examples

Please click here to view more detailed information.

Everyone is welcome to contribute more relevant tips in the form of issues. After selection, we will regularly update them on GitHub and indicate the source.

🚀(New) Experiments

Instruction Tuning Experiments

We will explore the effect of different types of instructions in fine-tuning LLMs (i.e., 7B LLaMA), as well as examine the usefulness of several instruction improvement strategies.

instruction_tuning_table

Please click here to view more detailed information.

Ability Evaluation Experiments

We conduct a fine-grained evaluation on the abilities discussed in Section 7.1 and Section 7.2. For each kind of ability, we select representative tasks and datasets for conducting evaluation experiments to examine the corresponding performance of LLMs.

ability_main

Please click here to view more detailed information.

We also call for support of computing power for conducting more comprehensive experiments.

Table of Contents

Timeline of LLMs

LLMs_timeline

List of LLMs

Category Model Release Time Size (B) Link
Publicly Accessible
T5 2019/10 11 Paper
mT5 2021/03 13 Paper
PanGu-α 2021/05 13 Paper
CPM-2 2021/05 198 Paper
T0 2021/10 11 Paper
GPT-NeoX-20B 2022/02 20 Paper
CodeGen 2022/03 16 Paper
Tk-Instruct 2022/04 11 Paper
UL2 2022/02 20 Paper
OPT 2022/05 175 Paper
YaLM 2022/06 100 GitHub
NLLB 2022/07 55 Paper
BLOOM 2022/07 176 Paper
GLM 2022/08 130 Paper
Flan-T5 2022/10 11 Paper
mT0 2022/11 13 Paper
Galactica 2022/11 120 Paper
BLOOMZ 2022/11 176 Paper
OPT-IML 2022/12 175 Paper
Pythia 2023/01 12 Paper
LLaMA 2023/02 65 Paper
Vicuna 2023/03 13 Blog
ChatGLM 2023/03 6 GitHub
CodeGeeX 2023/03 13 Paper
Alpaca 2023/03 7 Blog
Koala 2023/04 13 Blog
Mistral 2023/09 7 Blog
Closed Source
GShard 2020/01 600 Paper
GPT-3 2020/05 175 Paper
LaMDA 2021/05 137 Paper
HyperCLOVA 2021/06 82 Paper
Codex 2021/07 12 Paper
ERNIE 3.0 2021/07 10 Paper
Jurassic-1 2021/08 178 Paper
FLAN 2021/10 137 Paper
MT-NLG 2021/10 530 Paper
Yuan 1.0 2021/10 245 Paper
Anthropic 2021/12 52 Paper
WebGPT 2021/12 175 Paper
Gopher 2021/12 280 Paper
ERNIE 3.0 Titan 2021/12 260 Paper
GLaM 2021/12 1200 Paper
InstructGPT 2022/01 175 Paper
AlphaCode 2022/02 41 Paper
Chinchilla 2022/03 70 Paper
PaLM 2022/04 540 Paper
Cohere 2022/06 54 Homepage
AlexaTM 2022/08 20 Paper
Luminous 2022/09 70 Docs
Sparrow 2022/09 70 Paper
WeLM 2022/09 10 Paper
U-PaLM 2022/10 540 Paper
Flan-PaLM 2022/10 540 Paper
Flan-U-PaLM 2022/10 540 Paper
GPT-4 2023/03 - Paper
PanGu-Σ 2023/03 1085 Paper

Paper List

Resources of LLMs

Publicly Available Models

  1. T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2019. [Paper] [Checkpoint]
  2. mT5: "mT5: A massively multilingual pre-trained text-to-text transformer". Linting Xue et al. NAACL 2021. [Paper] [Checkpoint]
  3. PanGu-α: "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". Wei Zeng et al. arXiv 2021. [Paper] [Checkpoint]
  4. CPM-2: "CPM-2: Large-scale Cost-effective Pre-trained Language Models". Zhengyan Zhang et al. arXiv 2021. [Paper] [Checkpoint]
  5. T0: "Multitask Prompted Training Enables Zero-Shot Task Generalization". Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  6. GPT-NeoX-20B: "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". Sid Black et al. arXiv 2022. [Paper] [Checkpoint]
  7. CodeGen: "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". Erik Nijkamp et al. arXiv 2022. [Paper] [Checkpoint]
  8. Tk-Instruct: "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". Yizhong Wang et al. EMNLP 2022. [Paper] [Checkpoint]
  9. UL2: "UL2: Unifying Language Learning Paradigms". Yi Tay et al. arXiv 2022. [Paper] [Checkpoint]
  10. OPT: "OPT: Open Pre-trained Transformer Language Models". Susan Zhang et al. arXiv 2022. [Paper] [Checkpoint]
  11. NLLB: "No Language Left Behind: Scaling Human-Centered Machine Translation". NLLB Team. arXiv 2022. [Paper] [Checkpoint]
  12. BLOOM: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". BigScience Workshop. arXiv 2022. [Paper] [Checkpoint]
  13. GLM: "GLM-130B: An Open Bilingual Pre-trained Model". Aohan Zeng et al. arXiv 2022. [Paper] [Checkpoint]
  14. Flan-T5: "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper] [Checkpoint]
  15. mT0 && BLOOMZ: "Crosslingual Generalization through Multitask Finetuning". Niklas Muennighoff et al. arXiv 2022. [Paper] [Checkpoint]
  16. Galactica: "Galactica: A Large Language Model for Science". Ross Taylor et al. arXiv 2022. [Paper] [Checkpoint]
  17. OPT-IML: "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  18. CodeGeeX: "CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X". Qinkai Zheng et al. arXiv 2023. [Paper] [Checkpoint]
  19. Pythia: "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling". Stella Biderman et al. arXiv 2023. [Paper] [Checkpoint]
  20. LLaMA: "LLaMA: Open and Efficient Foundation Language Models". Hugo Touvron et al. arXiv 2023. [Paper] [Checkpoint]

Closed-source Models

  1. GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". Dmitry Lepikhin et al. ICLR 2021. [Paper]
  2. GPT-3: "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [Paper]
  3. LaMDA: "LaMDA: Language Models for Dialog Applications". Romal Thoppilan et al. arXiv 2021. [Paper]
  4. HyperCLOVA: "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers". Boseop Kim et al. EMNLP 2021. [Paper]
  5. Codex: "Evaluating Large Language Models Trained on Code". Mark Chen et al. arXiv 2021. [Paper]
  6. ERNIE 3.0: "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". Yu Sun et al. arXiv 2021. [Paper]
  7. Jurassic-1: "Jurassic-1: Technical details and evaluation". Opher Lieber et al. 2021. [Paper]
  8. FLAN: "Finetuned Language Models Are Zero-Shot Learners". Jason Wei et al. ICLR 2022. [Paper]
  9. MT-NLG: "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". Shaden Smith et al. arXiv 2021. [Paper]
  10. Yuan 1.0: "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". Shaohua Wu et al. arXiv 2021. [Paper]
  11. Anthropic: "A General Language Assistant as a Laboratory for Alignment". Amanda Askell et al. arXiv 2021. [Paper]
  12. WebGPT: "WebGPT: Browser-assisted question-answering with human feedback". Reiichiro Nakano et al. arXiv 2021. [Paper]
  13. Gopher: "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". Jack W. Rae et al. arXiv 2021. [Paper]
  14. ERNIE 3.0 Titan: "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". Shuohuan Wang et al. arXiv 2021. [Paper]
  15. GLaM: "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". Nan Du et al. ICML 2022. [Paper]
  16. InstructGPT: "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  17. AlphaCode: "Competition-Level Code Generation with AlphaCode". Yujia Li et al. arXiv 2022. [Paper]
  18. Chinchilla: "Training Compute-Optimal Large Language Models". Jordan Hoffmann et al. arXiv 2022. [Paper]
  19. PaLM: "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [Paper]
  20. AlexaTM: "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". Saleh Soltan et al. arXiv 2022. [Paper]
  21. Sparrow: "Improving alignment of dialogue agents via targeted human judgements". Amelia Glaese et al. arXiv 2022. [Paper]
  22. WeLM: "WeLM: A Well-Read Pre-trained Language Model for Chinese". Hui Su et al. arXiv 2022. [Paper]
  23. U-PaLM: "Transcending Scaling Laws with 0.1% Extra Compute". Yi Tay et al. arXiv 2022. [Paper]
  24. Flan-PaLM && Flan-U-PaLM: "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper]
  25. GPT-4: "GPT-4 Technical Report". OpenAI. arXiv 2023. [Paper]
  26. PanGu-Σ: "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". Xiaozhe Ren et al. arXiv 2023. [Paper]

Commonly Used Corpora

  1. BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". Yukun Zhu et al. ICCV 2015. [Paper] [Source]
  2. Gutenberg: [Source]
  3. CommonCrawl: [Source]
  4. C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2019. [Paper] [Source]
  5. CC-stories-R: "A Simple Method for Commonsense Reasoning". Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
  6. CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Yinhan Liu et al. arXiv 2019. [Paper] [Source]
  7. RealNews: "Defending Against Neural Fake News". Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
  8. OpenWebText: [Source]
  9. Pushshift.io: "The Pushshift Reddit Dataset". Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
  10. Wikipedia: [Source]
  11. BigQuery: [Source]
  12. The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". Leo Gao et al. arXiv 2021. [Paper] [Source]
  13. ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]

Library Resource

  1. Transformers: "Transformers: State-of-the-Art Natural Language Processing". Thomas Wolf et al. EMNLP 2020. [Paper] [Source]
  2. DeepSpeed: "Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters". Rasley et al. KDD 2020. [Paper] [Source]
  3. Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". Mohammad Shoeybi et al. arXiv 2019. [Paper] [Source]
  4. JAX: [Source]
  5. Colossal-AI: "Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training". Zhengda Bian et al. arXiv 2021. [Paper] [Source]
  6. BMTrain: [Source]
  7. FastMoE: "FastMoE: A Fast Mixture-of-Expert Training System". Jiaao He et al. arXiv 2021. [Paper] [Source]

Deep Learning Frameworks

  1. PyTorch: "PyTorch: An Imperative Style, High-Performance Deep Learning Library". Adam Paszke et al. NeurIPS 2019. [Paper] [Source]
  2. TensorFlow: "TensorFlow: A system for large-scale machine learning". Martín Abadi et al. OSDI 2016. [Paper] [Source]
  3. MXNet: "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems". Tianqi Chen et al. arXiv 2015. [Paper] [Source]
  4. PaddlePaddle: "PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice". Yanjun Ma et al. Frontiers of Data and Computing 2019. [Paper] [Source]
  5. MindSpore: "Huawei MindSpore AI Development Framework". Huawei Technologies Co., Ltd. Artificial Intelligence Technology 2022. [Paper] [Source]
  6. OneFlow: "OneFlow: Redesign the Distributed Deep Learning Framework from Scratch". Jinhui Yuan et al. arXiv 2021. [Paper] [Source]

Pre-training

Data Collection

  1. "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset". Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]
  2. "Deduplicating Training Data Makes Language Models Better". Katherine Lee et al. ACL 2022. [paper]
  3. "Deduplicating Training Data Mitigates Privacy Risks in Language Models". Nikhil Kandpal et al. ICML 2022. [paper]
  4. "Scaling Laws and Interpretability of Learning from Repeated Data". Danny Hernandez et al. arXiv 2022. [paper]
  5. "A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity". Shayne Longpre et al. arXiv 2023. [paper]

Architecture

Mainstream Architectures

Causal Decoder

  1. "Language Models are Few-Shot Learners". Tom B. Brown et al. NeurIPS 2020. [paper]
  2. "OPT: Open Pre-trained Transformer Language Models". Susan Zhang et al. arXiv 2022. [paper]
  3. "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". Teven Le Scao et al. arXiv 2022. [paper]
  4. "Training Compute-Optimal Large Language Models". Jordan Hoffmann et al. arXiv 2022. [paper]
  5. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". Jack W. Rae et al. arXiv 2021. [paper]
  6. "Galactica: A Large Language Model for Science". Ross Taylor et al. arXiv 2022. [paper]
  7. "PaLM: Scaling Language Modeling with Pathways". Aakanksha Chowdhery et al. arXiv 2022. [paper]
  8. "Jurassic-1: Technical Details and Evaluation". Opher Lieber et al. AI21 Labs. [paper]
  9. "LaMDA: Language Models for Dialog Applications". Romal Thoppilan et al. arXiv 2022. [paper]

Prefix Decoder

  1. "GLM-130B: An Open Bilingual Pre-trained Model". Aohan Zeng et al. arXiv 2022. [paper]
  2. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling". Zhengxiao Du et al. ACL 2022. [paper]
  3. "Transcending Scaling Laws with 0.1% Extra Compute". Yi Tay et al. arXiv 2022. [paper]

MoE

  1. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". William Fedus et al. JMLR. [paper]
  2. "Unified Scaling Laws for Routed Language Models". Aidan Clark et al. ICML 2022. [paper]

SSM

  1. "Pretraining Without Attention". Junxiong Wang et al. arXiv 2022. [paper]
  2. "Efficiently Modeling Long Sequences with Structured State Spaces". Albert Gu et al. ICLR 2022. [paper]
  3. "Long Range Language Modeling via Gated State Spaces". Harsh Mehta et al. arXiv 2022. [paper]
  4. "Hungry Hungry Hippos: Towards Language Modeling with State Space Models". Daniel Y. Fu et al. ICLR 2023. [paper]

Detailed Configuration

Layer Normalization

  1. RMSNorm: "Root Mean Square Layer Normalization". Biao Zhang et al. NeurIPS 2019. [paper]
  2. DeepNorm: "DeepNet: Scaling Transformers to 1,000 Layers". Hongyu Wang et al. arXiv 2022. [paper]
  3. Sandwich-LN: "CogView: Mastering Text-to-Image Generation via Transformers". Ming Ding et al. NeurIPS 2021. [paper]

Position Encoding

  1. T5 bias: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2019. [paper]
  2. ALiBi: "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation". Ofir Press et al. ICLR 2022. [paper]
  3. RoPE: "RoFormer: Enhanced Transformer with Rotary Position Embedding". Jianlin Su et al. arXiv 2021. [paper]
  4. xPos: "A Length-Extrapolatable Transformer". Yutao Sun et al. arXiv 2022. [paper]

Attention

  1. Multi-query attention: "Fast Transformer Decoding: One Write-Head is All You Need". Noam Shazeer. arXiv 2019. [paper]
  2. FlashAttention: "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness". Tri Dao et al. NeurIPS 2022. [paper]
  3. PagedAttention: "vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention". Woosuk Kwon et al. 2023. paper (Stay Tuned) [Official Website]

Analysis

  1. "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?". Thomas Wang et al. ICML 2022. [paper]
  2. "What Language Model to Train if You Have One Million GPU Hours?". Teven Le Scao et al. Findings of EMNLP 2022. [paper]
  3. "Examining Scaling and Transfer of Language Model Architectures for Machine Translation". Biao Zhang et al. ICML 2022. [paper]
  4. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?". Yi Tay et al. arXiv 2022. [paper]
  5. "Do Transformer Modifications Transfer Across Implementations and Applications?". Sharan Narang et al. EMNLP 2021. [paper]

Training Algorithms

  1. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". Mohammad Shoeybi et al. arXiv 2019. [paper]
  2. "An Efficient 2D Method for Training Super-Large Deep Learning Models". Qifan Xu et al. arXiv 2021. [paper]
  3. "Tesseract: Parallelize the Tensor Parallelism Efficiently". Boxiang Wang et al. ICPP 2022. [paper]
  4. "Maximizing Parallelism in Distributed Training for Huge Neural Networks". Zhengda Bian et al. arXiv 2021. [paper]
  5. "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism". Yanping Huang et al. NeurIPS 2019. [paper]
  6. "PipeDream: Fast and Efficient Pipeline Parallel DNN Training". Aaron Harlap et al. arXiv 2018. [paper]
  7. "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models". Samyam Rajbhandari et al. SC 2020. [paper]
  8. "ZeRO-Offload: Democratizing Billion-Scale Model Training". Jie Ren et al. USENIX 2021. [paper]

Pre-training on Code

LLMs for Program Synthesis

  1. "Evaluating Large Language Models Trained on Code". Mark Chen et al. arXiv 2021. [paper]
  2. "Program Synthesis with Large Language Models". Jacob Austin et al. arXiv 2021. [paper]
  3. "Show Your Work: Scratchpads for Intermediate Computation with Language Models". Maxwell Nye et al. arXiv 2021. [paper]
  4. "A Systematic Evaluation of Large Language Models of Code". Frank F. Xu et al. arXiv 2022. [paper]
  5. "Competition-Level Code Generation with AlphaCode". Yujia Li et al. Science. [paper]
  6. "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". Erik Nijkamp et al. ICLR 2023. [paper]
  7. "InCoder: A Generative Model for Code Infilling and Synthesis". Daniel Fried et al. ICLR 2023. [paper]
  8. "CodeT: Code Generation with Generated Tests". Bei Chen et al. ICLR 2023. [paper]
  9. "StarCoder: may the source be with you!". Raymond Li et al. arXiv 2023. [paper]

NLP Tasks Formatted as Code

  1. "Language Models of Code are Few-Shot Commonsense Learners". Aman Madaan et al. EMNLP 2022. [paper]
  2. "Autoformalization with Large Language Models". Yuhuai Wu et al. NeurIPS 2022. [paper]

Adaptation Tuning

Instruction Tuning

  1. "Multi-Task Deep Neural Networks for Natural Language Understanding". Xiaodong Liu et al. ACL 2019. [Paper] [Homepage]
  2. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  3. "Muppet: Massive Multi-task Representations with Pre-Finetuning". Armen Aghajanyan et al. EMNLP 2021. [Paper] [Checkpoint]
  4. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions". Swaroop Mishra et al. ACL 2022. [Paper] [Collection]
  5. "Finetuned Language Models Are Zero-Shot Learners". Jason Wei et al. ICLR 2022. [Paper] [Homepage]
  6. "Multitask Prompted Training Enables Zero-Shot Task Generalization". Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  7. "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts". Stephen H. Bach et al. ACL 2022. [Paper] [Collection]
  8. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  9. "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". Yizhong Wang et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  10. "MVP: Multi-task Supervised Pre-training for Natural Language Generation". Tianyi Tang et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  11. "Crosslingual Generalization through Multitask Finetuning". Niklas Muennighoff et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  12. "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper] [Homepage]
  13. "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor". Or Honovich et al. arXiv 2022. [Paper] [Homepage]
  14. "Self-Instruct: Aligning Language Model with Self Generated Instructions". Yizhong Wang et al. arXiv 2022. [Paper] [Homepage]
  15. "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  16. "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning". Shayne Longpre et al. arXiv 2023. [Paper] [Homepage]
  17. "Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning". Renze Lou et al. arXiv 2023. [Paper]
  18. "Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning". Hao Chen et al. arXiv 2023. [Paper]
  19. "LIMA: Less Is More for Alignment". Chunting Zhou et al. arXiv 2023. [Paper]

Alignment Tuning

  1. "TAMER: Training an Agent Manually via Evaluative Reinforcement". W. Bradley Knox et al. ICDL 2008. [Paper]
  2. "Interactive Learning from Policy-Dependent Human Feedback". James MacGlashan et al. ICML 2017. [Paper]
  3. "Deep Reinforcement Learning from Human Preferences". Paul Christiano et al. NIPS 2017. [Paper]
  4. "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces". Garrett Warnell et al. AAAI 2018. [Paper]
  5. "Fine-Tuning Language Models from Human Preferences". Daniel M. Ziegler et al. arXiv 2019. [Paper]
  6. "Learning to summarize from human feedback". Nisan Stiennon et al. NeurIPS 2020. [Paper]
  7. "Alignment of Language Agents". Zachary Kenton et al. arXiv 2021. [Paper]
  8. "Recursively Summarizing Books with Human Feedback". Jeff Wu et al. arXiv 2021. [Paper]
  9. "A General Language Assistant as a Laboratory for Alignment". Amanda Askell et al. arXiv 2021. [Paper]
  10. "WebGPT: Browser-assisted question-answering with human feedback". Reiichiro Nakano et al. arXiv 2021. [Paper]
  11. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  12. "Teaching language models to support answers with verified quotes". Jacob Menick et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning". Deborah Cohen et al. arXiv 2022. [Paper]
  15. "Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned". Deep Ganguli et al. arXiv 2022. [Paper]
  16. "Improving alignment of dialogue agents via targeted human judgements". Amelia Glaese et al. arXiv 2022. [Paper]
  17. "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization". Rajkumar Ramamurthy et al. arXiv 2022. [Paper]
  18. "Scaling Laws for Reward Model Overoptimization". Leo Gao et al. arXiv 2022. [Paper]
  19. "The Wisdom of Hindsight Makes Language Models Better Instruction Followers". Tianjun Zhang et al. arXiv 2023. [Paper]
  20. "RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment". Hanze Dong et al. arXiv 2023. [Paper]
  21. "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment". Rishabh Bhardwaj et al. arXiv 2023. [Paper]

Parameter-Efficient Model Adaptation

  1. "Parameter-Efficient Transfer Learning for NLP". Neil Houlsby et al. ICML 2019. [Paper] [GitHub]
  2. "MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer". Jonas Pfeiffer et al. EMNLP 2020. [Paper] [GitHub]
  3. "AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts". Taylor Shin et al. EMNLP 2020. [Paper] [GitHub]
  4. "Prefix-Tuning: Optimizing Continuous Prompts for Generation". Xiang Lisa Li et al. ACL 2021. [Paper] [GitHub]
  5. "GPT Understands, Too". Xiao Liu et al. arXiv 2021. [Paper] [GitHub]
  6. "The Power of Scale for Parameter-Efficient Prompt Tuning". Brian Lester et al. EMNLP 2021. [Paper]
  7. "LoRA: Low-Rank Adaptation of Large Language Models". Edward J. Hu et al. arXiv 2021. [Paper] [GitHub]
  8. "Towards a Unified View of Parameter-Efficient Transfer Learning". Junxian He et al. ICLR 2022. [Paper] [GitHub]
  9. "P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks". Xiao Liu et al. ACL 2022. [Paper] [GitHub]
  10. "DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation". Mojtaba Valipour et al. EACL 2023. [Paper] [GitHub]
  11. "Parameter-efficient fine-tuning of large-scale pre-trained language models". Ning Ding et al. Nat Mach Intell. [Paper] [GitHub]
  12. "Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning". Qingru Zhang et al. arXiv 2023. [Paper] [GitHub]
  13. "LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention". Renrui Zhang et al. arXiv 2023. [Paper] [GitHub]
  14. "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models". Zhiqiang Hu et al. arXiv 2023. [Paper] [GitHub]

Memory-Efficient Model Adaptation

  1. "A Survey of Quantization Methods for Efficient Neural Network Inference". Amir Gholami et al. arXiv 2021. [Paper]
  2. "8-bit Optimizers via Block-wise Quantization". Tim Dettmers et al. arXiv 2021. [Paper]
  3. "Compression of Generative Pre-trained Language Models via Quantization". Chaofan Tao et al. ACL 2022. [Paper]
  4. "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers". Zhewei Yao et al. NeurIPS 2022. [Paper] [GitHub]
  5. "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale". Tim Dettmers et al. arXiv 2022. [Paper] [GitHub]
  6. "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers". Elias Frantar et al. ICLR 2023. [Paper] [GitHub]
  7. "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models". Guangxuan Xiao et al. arXiv 2022. [Paper] [GitHub]
  8. "The case for 4-bit precision: k-bit Inference Scaling Laws". Tim Dettmers et al. arXiv 2022. [Paper]
  9. "ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation". Zhewei Yao et al. arXiv 2023. [Paper]
  10. "QLoRA: Efficient Finetuning of Quantized LLMs". Tim Dettmers et al. arXiv 2023. [Paper] [GitHub]
  11. "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models". Zechun Liu et al. arXiv 2023. [Paper]
  12. "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration". Ji Lin et al. arXiv 2023. [Paper] [GitHub]

Utilization

In-Context Learning (ICL)

  1. "An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels". Taylor Sorensen et al. ACL 2022. [Paper]
  2. "What Makes Good In-Context Examples for GPT-3?". Jiachang Liu et al. ACL 2022. [Paper]
  3. "Learning to retrieve prompts for in-context learning". Ohad Rubin et al. NAACL 2022. [Paper]
  4. "Diverse demonstrations improve in-context compositional generalization". Itay Levy et al. arXiv 2022. [Paper]
  5. "Demystifying Prompts in Language Models via Perplexity Estimation". Hila Gonen et al. arXiv 2022. [Paper]
  6. "Active Example Selection for In-Context Learning". Yiming Zhang et al. EMNLP 2022. [Paper]
  7. "Self-adaptive In-context Learning". Zhiyong Wu et al. arXiv 2022. [Paper]
  8. "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Yao Lu et al. ACL 2022. [Paper]
  9. "Structured Prompting: Scaling In-Context Learning to 1,000 Examples". Yaru Hao et al. arXiv 2022. [Paper]
  10. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning". Xi Ye et al. arXiv 2022. [Paper]
  11. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions". Swaroop Mishra et al. ACL 2022. [Paper]
  12. "Prompt-Augmented Linear Probing: Scaling Beyond the Limit of Few-shot In-Context Learner". Hyunsoo Cho et al. arXiv 2022. [Paper]
  13. "An Explanation of In-context Learning as Implicit Bayesian Inference". Sang Michael Xie et al. ICLR 2022. [Paper]
  14. "Calibrate Before Use: Improving Few-Shot Performance of Language Models". Zihao Zhao et al. ICML 2021. [Paper]
  15. "Data distributional properties drive emergent in-context learning in transformers". Stephanie C. Y. Chan et al. arXiv 2022. [Paper]
  16. "In-context Learning and Induction Heads". Catherine Olsson et al. arXiv 2022. [Paper]
  17. "On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model". Seongjin Shin et al. NAACL 2022. [Paper]
  18. "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?". Sewon Min et al. EMNLP 2022. [Paper]
  19. "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale". Hritik Bansal et al. arXiv 2022. [Paper]
  20. "Transformers as algorithms: Generalization and implicit model selection in in-context learning". Yingcong Li et al. arXiv 2023. [Paper]
  21. "Transformers learn in-context by gradient descent". Johannes von Oswald et al. arXiv 2022. [Paper]
  22. "What learning algorithm is in-context learning? investigations with linear models". Ekin Akyürek et al. arXiv 2022. [Paper]
  23. "A Survey for In-context Learning". Qingxiu Dong et al. arXiv 2023. [Paper]
  24. What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. Jane Pan et al. arXiv 2023. [Paper]
  25. The Learnability of In-Context Learning. Noam Wies et al. arXiv 2023. [Paper]
  26. Do Prompt-Based Models Really Understand the Meaning of Their Prompts? Albert Webson et al. NAACL 2022. [Paper]
  27. Larger language models do in-context learning differently. Jerry Wei et al. arXiv 2023. [Paper]
  28. Meta-in-context learning in large language models. Julian Coda-Forno et al. arXiv 2023. [Paper]
  29. Symbol tuning improves in-context learning in language models. Jerry Wei et al. arXiv 2023. [Paper]

Chain-of-Thought Reasoning (CoT)

  1. "Automatic Chain of Thought Prompting in Large Language Models". Zhuosheng Zhang et al. arXiv 2022. [Paper]
  2. "Chain of Thought Prompting Elicits Reasoning in Large Language Models". Jason Wei et al. arXiv 2022. [Paper]
  3. "STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning". Zelikman et al. arXiv 2022. [Paper]
  4. "Large language models are zero-shot reasoners". Takeshi Kojima et al. arXiv 2022. [Paper]
  5. "Automatic Chain of Thought Prompting in Large Language Models". Zhuosheng Zhang et al. arXiv 2022. [Paper]
  6. "Complexity-Based Prompting for Multi-Step Reasoning". Yao Fu et al. arXiv 2022. [Paper]
  7. "Language Models are Multilingual Chain-of-Thought Reasoners". Freda Shi et al. arXiv 2022. [Paper]
  8. "Rationale-Augmented Ensembles in Language Models". Xuezhi Wang et al. arXiv 2022. [Paper]
  9. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models". Denny Zhou et al. arXiv 2022. [Paper]
  10. "Multimodal Chain-of-Thought Reasoning in Language Models". Zhuosheng Zhang et al. arXiv 2023. [Paper]
  11. "Self-Consistency Improves Chain of Thought Reasoning in Language Models". Xuezhi Wang et al. arXiv 2022. [Paper]
  12. "Large Language Models Can Self-Improve". Jiaxin Huang et al. arXiv 2022. [Paper]
  13. "Training Verifiers to Solve Math Word Problems". Karl Cobbe et al. arXiv 2021. [Paper]
  14. "On the Advance of Making Language Models Better Reasoners". Yifei Li et al. arXiv 2022. [Paper]
  15. "Large Language Models are reasoners with Self-Verification". Yixuan Weng et al. arXiv 2022. [Paper]
  16. "Teaching small language models to reason". Lucie Charlotte Magister et al. arXiv 2022. [Paper]
  17. "Large language models are reasoning teachers". Namgyu Ho et al. arXiv 2022. [Paper]
  18. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning". Xi Ye et al. arXiv 2022. [Paper]
  19. "Scaling Instruction-Finetuned Language Models". Hyung Won Chung et al. arXiv 2022. [Paper]
  20. "Solving Quantitative Reasoning Problems with Language Models". Aitor Lewkowycz et al. arXiv 2022. [Paper]
  21. "Text and patterns: For effective chain of thought, it takes two to tango". Aman Madaan et al. arXiv 2022. [Paper]
  22. "Challenging BIG-Bench tasks and whether chain-of-thought can solve them". Mirac Suzgun et al. arXiv 2022. [Paper]
  23. "Reasoning with Language Model Prompting: A Survey". Shuofei Qiao et al. arXiv 2022. [Paper]
  24. "Towards Reasoning in Large Language Models: A Survey". Jie Huang et al. arXiv 2022. [Paper]

Planning for Complex Task Solving

  1. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. Denny Zhou et al. ICLR 2023. [Paper]
  2. PAL: Program-aided Language Models. Luyu Gao et al. ICML 2023. [Paper]
  3. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. Lei Wang et al. ACL 2023. [Paper]
  4. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. Ishika Singh et al. ICRA 2022. [Paper]
  5. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Shunyu Yao et al. arXiv 2023. [Paper]
  6. Voyager: An Open-Ended Embodied Agent with Large Language Models. Guanzhi Wang et al. arXiv 2023. [Paper]
  7. Reflexion: Language Agents with Verbal Reinforcement Learning. Noah Shinn et al. arXiv 2023. [Paper]
  8. Multimodal Procedural Planning via Dual Text-Image Prompting. Yujie Lu et al. arXiv 2023. [Paper]
  9. Self-planning Code Generation with Large Language Model. Xue Jiang et al. arXiv 2023. [Paper]
  10. Decomposed Prompting: A Modular Approach for Solving Complex Tasks. Tushar Khot et al. ICLR 2023 [Paper]
  11. Toolformer: Language Models Can Teach Themselves to Use Tools. Timo Schick et al. arXiv 2023. [Paper]
  12. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. Yongliang Shen et al. arXiv 2023. [Paper]
  13. Faithful Chain-of-Thought Reasoning. Qing Lyu et al. arXiv 2023. [Paper]
  14. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. Bo Liu et al. arXiv 2023. [Paper]
  15. Reasoning with Language Model is Planning with World Model. Shibo Hao et al. arXiv 2023. [Paper]
  16. Generative Agents: Interactive Simulacra of Human Behavior. Joon Sung Park et al. arXiv 2023. [Paper]
  17. ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao et al. ICLR 2023. [Paper]
  18. ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models. Zhipeng Chen et al. arXiv 2023. [Paper]
  19. Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents. Zihao Wang et al. arXiv 2023. [Paper]
  20. AdaPlanner: Adaptive Planning from Feedback with Language Models. Haotian Sun et al. arXiv 2023. [Paper]

Capacity Evaluation

  1. "Measuring Massive Multitask Language Understanding". Dan Hendrycks et al. ICLR 2021. [Paper]
  2. "Persistent Anti-Muslim Bias in Large Language Models". Abubakar Abid et al. AIES 2021. [Paper]
  3. "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models". Alex Tamkin et al. arXiv 2021. [Paper]
  4. "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments". Sanjana Srivastava et al. CoRL 2021. [Paper]
  5. "Program Synthesis with Large Language Models". Jacob Austin et al. arXiv 2021. [Paper]
  6. "Training Verifiers to Solve Math Word Problems". Karl Cobbe et al. arXiv 2021. [Paper]
  7. "Show Your Work: Scratchpads for Intermediate Computation with Language Models". Maxwell I. Nye et al. arXiv 2021. [Paper]
  8. "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents". Wenlong Huang et al. ICML 2022. [Paper]
  9. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". Jason Wei et al. NeurIPS 2022. [Paper]
  10. "Training language models to follow instructions with human feedback". Long Ouyang et al. arXiv 2022. [Paper]
  11. "Competition-Level Code Generation with AlphaCode". Yujia Li et al. Science 2022. [Paper]
  12. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances". Michael Ahn et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Autoformalization with Large Language Models". Yuhuai Wu et al. NeurIPS 2022. [Paper]
  15. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". Aarohi Srivastava et al. arXiv 2022. [Paper]
  16. "Exploring Length Generalization in Large Language Models". Cem Anil et al. NeurIPS 2022. [Paper]
  17. "Few-shot Learning with Retrieval Augmented Language Models". Gautier Izacard et al. arXiv 2022. [Paper]
  18. "Limitations of Language Models in Arithmetic and Symbolic Induction". Jing Qian et al. arXiv 2022. [Paper]
  19. "Code as Policies: Language Model Programs for Embodied Control". Jacky Liang et al. arXiv 2022. [Paper]
  20. "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models". Ishika Singh et al. arXiv 2022. [Paper]
  21. "Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans". John J. Nay et al. arXiv 2022. [Paper]
  22. "Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought". Abulhair Saparov et al. ICLR 2023. [Paper]
  23. "Language Models are Multilingual Chain-of-Thought Reasoners". Freda Shi et al. ICLR 2023. [Paper]
  24. "Re3: Generating Longer Stories With Recursive Reprompting and Revision". Kevin Yang et al. EMNLP 2022. [Paper]
  25. "Language Models of Code are Few-Shot Commonsense Learners". Aman Madaan et al. EMNLP 2022. [Paper]
  26. "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them". Mirac Suzgun et al. arXiv 2022. [Paper]
  27. "Large Language Models Can Self-Improve". Jiaxin Huang et al. arXiv 2022. [Paper]
  28. "Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs". Albert Q. Jiang et al. ICLR 2023. [Paper]
  29. "Holistic Evaluation of Language Models". Percy Liang et al. arXiv 2022. [Paper]
  30. "PAL: Program-aided Language Models". Luyu Gao et al. arXiv 2022. [Paper]
  31. "Legal Prompt Engineering for Multilingual Legal Judgement Prediction". Dietrich Trautmann et al. arXiv 2022. [Paper]
  32. "How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment". Aidan Gilson et al. medRxiv 2022. [Paper]
  33. "ChatGPT: The End of Online Exam Integrity?". Teo Susnjak et al. arXiv 2022. [Paper]
  34. "Large Language Models are reasoners with Self-Verification". Yixuan Weng et al. arXiv 2022. [Paper]
  35. "Self-Instruct: Aligning Language Model with Self Generated Instructions". Yizhong Wang et al. arXiv 2022. [Paper]
  36. "ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports". Katharina Jeblick et al. arXiv 2022. [Paper]
  37. "The End of Programming". Matt Welsh et al. ACM 2023. [Paper]
  38. "ChatGPT Goes to Law School". Jonathan H. Choi et al. SSRN 2023. [Paper]
  39. "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Biyang Guo et al. arXiv 2023. [Paper]
  40. "Is ChatGPT A Good Translator? A Preliminary Study". Wenxiang Jiao et al. arXiv 2023. [Paper]
  41. "Could an Artificial-Intelligence agent pass an introductory physics course?". Gerd Kortemeyer et al. arXiv 2023. [Paper]
  42. "Mathematical Capabilities of ChatGPT". Simon Frieder et al. arXiv 2023. [Paper]
  43. "Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models". Zhihong Shao et al. arXiv 2023. [Paper]
  44. "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning". Thomas Carta et al. arXiv 2023. [Paper]
  45. "Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making". Arya Yao et al. medRxiv 2023. [Paper]
  46. "Theory of Mind May Have Spontaneously Emerged in Large Language Models". Michal Kosinski et al. arXiv 2023. [Paper]
  47. "A Categorical Archive of ChatGPT Failures". Ali Borji et al. arXiv 2023. [Paper]
  48. "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity". Yejin Bang et al. arXiv 2023. [Paper]
  49. "Toolformer: Language Models Can Teach Themselves to Use Tools". Timo Schick et al. arXiv 2023. [Paper]
  50. "Is ChatGPT a General-Purpose Natural Language Processing Task Solver?". Chengwei Qin et al. arXiv 2023. [Paper]
  51. "How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation". Amr Hendy et al. arXiv 2023. [Paper]
  52. "Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT". Qihuang Zhong et al. arXiv 2023. [Paper]
  53. "Zero-Shot Information Extraction via Chatting with ChatGPT". Xiang Wei et al. arXiv 2023. [Paper]
  54. "ChatGPT: Jack of all trades, master of none". Jan Kocon et al. arXiv 2023. [Paper]
  55. "On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective". Jindong Wang et al. arXiv 2023. [Paper]
  56. "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback". Baolin Peng et al. arXiv 2023. [Paper]
  57. "An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)". Paulo Shakarian et al. arXiv 2023. [Paper]
  58. "How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks". Xuanting Chen et al. arXiv 2023. [Paper]
  59. "The utility of ChatGPT for cancer treatment information". Shen Chen et al. medRxiv 2023. [Paper]
  60. "Can ChatGPT Assess Human Personalities? A General Evaluation Framework". Haocong Rao et al. arXiv 2023. [Paper]
  61. "Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT.". Mostafa M. Amin et al. arXiv 2023. [Paper]
  62. "Exploring the Feasibility of ChatGPT for Event Extraction.". Jun Gao et al. arXiv 2023. [Paper]
  63. "Does Synthetic Data Generation of LLMs Help Clinical Text Mining?". Ruixiang Tang et al. arXiv 2023. [Paper]
  64. "Consistency Analysis of ChatGPT". Myeongjun Jang et al. arXiv 2023. [Paper]
  65. "Planning with Large Language Models for Code Generation". Shun Zhang et al. ICLR 2023. [Paper]
  66. "Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions". Yiming Tan et al. arXiv 2023. [Paper]
  67. "GPT-4 Technical Report". OpenAI et al. OpenAI 2023. [Paper]
  68. "A Short Survey of Viewing Large Language Models in Legal Aspect". Zhongxiang Sun et al. arXiv 2023. [Paper]
  69. "ChatGPT Participates in a Computer Science Exam". Sebastian Bordt et al. arXiv 2023. [Paper]
  70. "A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models". Junjie Ye et al. arXiv 2023. [Paper]
  71. "On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree?". Kamil Malinka et al. arXiv 2023. [Paper]
  72. "Sparks of Artificial General Intelligence: Early experiments with GPT-4". Sébastien Bubeck et al. arXiv 2023. [Paper]
  73. "Is ChatGPT A Good Keyphrase Generator? A Preliminary Study". Mingyang Song et al. arXiv 2023. [Paper]
  74. "Capabilities of GPT-4 on Medical Challenge Problems". Harsha Nori et al. arXiv 2023. [Paper]
  75. "Can we trust the evaluation on ChatGPT?". Rachith Aiyappa et al. arXiv 2023. [Paper]
  76. "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks". Fabrizio Gilardi et al. arXiv 2023. [Paper]
  77. "Evaluation of ChatGPT for NLP-based Mental Health Applications". Bishal Lamichhane et al. arXiv 2023. [Paper]
  78. "ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models". Ning Bian et al. arXiv 2023. [Paper]
  79. "Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams". Desnes Nunes et al. arXiv 2023. [Paper]
  80. "Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure". Philipp Koralus et al. arXiv 2023. [Paper]
  81. "Yes but.. Can ChatGPT Identify Entities in Historical Documents?". Carlos-Emiliano González-Gallardo et al. arXiv 2023. [Paper]
  82. "Uncovering ChatGPT's Capabilities in Recommender Systems". Sunhao Dai et al. arXiv 2023. [Paper]
  83. "Editing Large Language Models: Problems, Methods, and Opportunities". Yunzhi Yao et al. arXiv 2023. [Paper]
  84. "Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity". Terry Yue Zhuo et al. arXiv 2023. [Paper]
  85. "On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex". Terry Yue Zhuo et al. EACL 2023. [Paper]
  86. "A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets". Laskar et al. ACL 2023. [Paper]
  87. "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment". Rishabh Bhardwaj et al. arXiv 2023. [Paper]

The Team

Here is the list of our student contributors in each section.

Section Student Contributors
The whole paper Kun Zhou, Junyi Li
Overview & Resources of LLMs Yingqian Min (Lead), Chen Yang
Pretraining Yupeng Hou (Lead), Junjie Zhang, Zican Dong, Yushuo Chen
Adaptation Tuning Tianyi Tang (Lead), Jinhao Jiang, Ruiyang Ren, Zikang Liu, Peiyu Liu
Utilization Xiaolei Wang (Lead), Yifan Du, Xinyu Tang
Capacity Evaluation Beichen Zhang (Lead), Zhipeng Chen, Yifan Li

Acknowledgments

The authors would like to thank Yankai Lin and Yutao Zhu for proofreading this paper. Since the first release of this paper, we have received a number of valuable comments from the readers. We sincerely thank the readers who have written to us with constructive suggestions and comments: Tyler Suard, Damai Dai, Liang Ding, Stella Biderman, Kevin Gray, Jay Alammar and Yubo Feng.

Update Log

Version Time Update Content
V1 2023/03/31 The initial version.
V2 2023/04/09 Add the affiliation information. Revise Figure 1 and Table 1 and clarify the corresponding selection criterion for LLMs. Improve the writing. Correct some minor errors.
V3 2023/04/11 Correct the errors for library resources.
V4 2023/04/12 Revise Figure 1 and Table 1 and clarify the release date of LLMs.
V5 2023/04/16 Add a new Section 2.2 about the technical evolution of GPT-series models.
V6 2023/04/24 Add some new models in Table 1 and Figure 1. Add the discussion about scaling laws. Add some explanations about the model sizes for emergent abilities (Section 2.1). Add an illustrative figure for the attention patterns for different architectures in Figure 4. Add the detailed formulas in Table 4.
V7 2023/04/25 Revise some copy errors in figures and tables.
V8 2023/04/27 Add efficient tuning in Section 5.3.
V9 2023/04/28 Revise Section 5.3.
V10 2023/05/07 Revise Table 1, Table 2, and some minor points.
V11 (major revision) 2023/06/29
– Section 1: add Figure 1 for the trends of published LLM papers in arXiv;
– Section 2: add Figure 3 for GPT’s evolution and the corresponding discussion;
– Section 3: add Figure 4 for LLaMA family and the corresponding discussion;
– Section 5: add latest discussion about the synthetic data formatting of instruction tuning in Section 5.1.1, the empirical analysis for instruction tuning in Section 5.1.4, parameter-efficient model adaptation in Section 5.3 and memory-efficient adaptation in Section 5.4;
– Section 6: add latest discussion about the underlying mechanism of ICL in Section 6.1.3, planning for complex task solving in Section 6.3;
– Section 7: add Table 10 for representative datasets for evaluating advanced abilities of LLMs, and empirical ability evaluation in Section 7.3.2;
– Section 8: add prompt design;
– Section 9: add the discussions on applications of LLMs in finance and scientific research domains;

llmsurvey's People

Contributors

bhardwaj-rishabh, dbz0825, deeptecher, eliverq, eltociear, hyp1231, lancelot39, merrymercy, steventang1998, terryyz, toheartzhang, wxl1999, zxlzr

llmsurvey's Issues

How to get the Ratios of various data sources in the pre-training data?

How did you obtain the ratios of various data sources in the pre-training data for existing LLMs in Fig. 2?
The data in Fig. 2 differs from the papers I have read.
For example, the GPT-3 paper (https://arxiv.org/abs/2005.14165) does not mention conversation or code data, but Fig. 2 shows GPT-3 using conversation and code data for pre-training.
For PaLM, the data proportions in Table 2 (https://arxiv.org/pdf/2204.02311.pdf) also differ from your ratios.

Suggestion: Incorporation of Hugging Face's OpenLLM Leaderboard and Results in Future Revision

Dear Authors,

I recently had the pleasure of reading your "A Survey of Large Language Models" paper. The content is insightful, comprehensive, and provides a remarkable reference point for those who are interested in the world of large language models.

As an extension to the wealth of knowledge you've presented, I believe there's an additional resource that could be of great value for your readers. Hugging Face's open LLM leaderboard offers an up-to-date view on the performance and capabilities of the most current language models. The leaderboard can be found here: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.

By including these results in your paper, I believe your readers will have a more immediate understanding of the present landscape of large language models. This addition could especially benefit readers who are keen on understanding the latest advancements and top-performing models in the field.

As a suggestion, you might also consider highlighting a few of the top models from the leaderboard. This could provide a snapshot of the current state-of-the-art, helping readers to appreciate how these models are advancing the field in real-time.

In conclusion, I'd like to reiterate how impressed I was with your work and how much I enjoyed reading your paper. I firmly believe that the inclusion of the Hugging Face leaderboard and its results would provide a significant addition to your paper's already rich content.

Thank you for considering my suggestion. I am very much looking forward to the next revision of your paper!

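A question: how large is the performance gap between efficient fine-tuning methods such as LoRA and QLoRA and full-parameter fine-tuning?

Currently, a 7B base model can be fine-tuned with methods such as QLoRA on a single 3090 GPU or on two 3090 GPUs, but many practitioners do not have GPUs large enough for full-parameter fine-tuning. Could a future version of the survey add experiments that quantify the performance difference between LoRA, QLoRA, and similar methods and full-parameter fine-tuning on the same base model? I would also like to know how efficient fine-tuning of a large model compares with full-parameter fine-tuning of a small model.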

Some errors in Table 1

I think there might be some errors in Table 1.

  1. The training resources(Tokens, Hardware and Training Time) for Baidu's ERNIE 3.0 are incorrect. The data is only estimated according to GPT-3 in the paper, not for ERNIE 3.0 Titan, as shown below.

image

  2. The training tokens for LaMDA are not 2.81T; that number represents the entire dataset. In actual training, 768B tokens were used, based on the paper's mention of 256K tokens per batch and a total of 3M steps, as shown below.
    image
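
The LaMDA token count above can be checked with a quick calculation. This is a minimal sketch that assumes "256K" means 256,000 tokens per batch and "3M" means 3,000,000 steps, as quoted in the issue; the variable names are illustrative and do not come from the paper.

```python
# Assumption: "256K" = 256,000 tokens per batch and "3M" = 3,000,000 training steps.
tokens_per_batch = 256_000
training_steps = 3_000_000

# Total tokens seen during training = tokens per batch * number of steps.
total_tokens = tokens_per_batch * training_steps

print(f"{total_tokens / 1e9:.0f}B tokens")  # 768B tokens
```

If "256K" were instead read as 2^18 = 262,144 tokens, the product would be somewhat larger, so the 768B figure depends on the decimal reading.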

Comments on paper

First - nicely done. This must have been a herculean effort to review all of these papers. Here are some ideas:

  1. It would be nice to include more information about Falcon when their paper is released (still "coming soon" per HF). In particular, it seems that the creators of Falcon made a decision to use multi-query attention with an eye toward inference speed. It might be nice to provide a little more detail about how different architecture choices (e.g., attention mechanisms) affect tokens generated per second, which is what engineers and the open-source community are very focused on (along with quality of generation, of course). Tokens per second really affects the user experience, and I would love to see how people are thinking about truly enormous context sizes.
  2. This is a small point and feel free to disregard it but the word "besides" has a certain usage pattern among native English speakers. It's commonly used as follows: make a claim about something in your first sentence, then say, "besides", and then make an even stronger claim that basically says, feel free to disregard the first claim because here's an even stronger claim. Here's an example: Tom would never survive life in the army; he's not tough enough. Besides, he's too old to be accepted. The point here is that every time you use "besides" in the paper, you undermine the strength of the sentence before "besides", which is not what you're trying to do. One final note is that "besides" is pretty colloquial and is seldom used in professional writing. What you're really looking for here are the following three linking phrases: also, in addition, and furthermore.

Again, thank you for your work here. There's so much happening in the LLM space so up-to-date reviews like this are really helpful.

samples error?

in 4.3.1 Batch Training
Batch Training. For language model pre-training, existing work generally sets the batch size to a large number (e.g., 8,196 examples or 1.6M tokens)

maybe there are 8196*2048=16785408=16M tokens?
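
The arithmetic in this issue can be reproduced as follows. This is a minimal sketch: the 2,048-token sequence length is the issue author's assumption, not a figure stated in the quoted sentence, and the variable names are illustrative.

```python
# Figures quoted in the issue; the sequence length of 2,048 tokens is the
# issue author's assumption about the context window.
batch_size_examples = 8_196
sequence_length = 2_048

# Tokens per batch = examples per batch * tokens per example.
tokens_per_batch = batch_size_examples * sequence_length
print(tokens_per_batch)  # 16785408

# Ratio between the computed value and the "1.6M tokens" quoted in the survey.
quoted_tokens = 1_600_000
print(round(tokens_per_batch / quoted_tokens, 1))  # 10.5
```

Under that assumption the batch holds roughly 16.8M tokens, about ten times the quoted 1.6M, which is consistent with the issue's suggestion that a digit may have been dropped.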

Typos

Version V11 (06/29), page 34
Subsection "Quantized LLMs"
nubmer --> number

Welcome more tips for designing prompts

We welcome everyone to provide us with more relevant tips in the form of issues. After selection, we will regularly update them on GitHub and indicate the source.

If you are interested, please open new issues just like #32.

Thank you for your contribution!

An issue example of new prompt designing tips.

Ingredient : Task Description

Prompt content: Make your prompt as detailed as possible, e.g., "Summarize the article into a short paragraph within 50 words. The major storyline and conclusion should be included, and the unimportant details can be omitted."

Principles: expressing the task goal clearly

Typos

  1. Wikipeida: [Source]

should be:

wikipedia

should contain prefix-tuning in chapter 5

In Chapter 5, besides instruction tuning and alignment tuning, there are also other ways of fine-tuning LLMs to solve different tasks, such as prefix-tuning.

I suggest adding it to this review because I think the current arrangement is quite confusing for beginners. More techniques could be covered, as in the blog. The actual trend is "auto prompt, prefix-tuning, P-tuning, prompt tuning (instruction tuning)".

Letter from Colossal-AI Team

Dear authors of arXiv:2303.18223 Team:

Thank you for mentioning Colossal-AI in your latest paper, “A Survey of Large Language Models.”

However, there are serious factual errors in the introduction to Colossal-AI at the end of page 7:

  1. Colossal-AI is developed by HPC-AI Tech (https://www.hpc-ai.tech/) rather than EleutherAI.
  2. Colossal-AI is developed based on PyTorch rather than JAX.
  3. Colossal-AI provides comprehensive and efficient distributed optimization for large models, rather than mixed-precision training.
    It not only includes all the DeepSpeed and Megatron-LM optimization techniques mentioned above, but also further provides more parallel strategies, automatic parallelism, and better heterogeneous memory optimization to improve the efficiency of large model training and inference, and reduce the cost of AI large model applications.
  4. Colossal-AI is applicable to diverse large models, not just LLMs, such as Biomedicine (AlphaFold), AIGC (Stable Diffusion), etc.

These can be verified in our open source repo: https://github.com/hpcaitech/ColossalAI

We have sent this email to [email protected] on April 3rd but have never heard back. We would appreciate it if you could fix these issues on the paper in time. Thank you very much.

Sincerely,
Colossal-AI Team

Comments and tips for the prompt

Hi,

Very solid and useful work.

In this repository they suggest an approach similar to tree-of-thoughts but which should be done in one prompt

an example of this type of prompt:

Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is...
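
The single-prompt approach above could be wrapped in a small helper so that the question is filled in programmatically. This is a minimal sketch: the names `EXPERT_PANEL_TEMPLATE` and `build_expert_panel_prompt` are hypothetical, and the trailing "The question is..." is replaced by a `{question}` placeholder for illustration.

```python
# Hypothetical template built from the prompt text quoted above; the
# "{question}" placeholder replaces the trailing ellipsis.
EXPERT_PANEL_TEMPLATE = (
    "Imagine three different experts are answering this question. "
    "All experts will write down 1 step of their thinking, then share it with the group. "
    "Then all experts will go on to the next step, etc. "
    "If any expert realises they're wrong at any point then they leave. "
    "The question is: {question}"
)


def build_expert_panel_prompt(question: str) -> str:
    """Return the full prompt for one question (illustrative helper)."""
    return EXPERT_PANEL_TEMPLATE.format(question=question.strip())


if __name__ == "__main__":
    print(build_expert_panel_prompt("What is 17 * 24?"))
```

The resulting string would then be sent to a chat model as a single user message; which model or API to use is left open here.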

Another interesting approach is described in the paper "Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models", for which the authors collected an impressive dataset of college-level questions. They tested different techniques such as self-critique, chain of thought, and few-shot prompting to see how each affects performance. In addition, they tested a new approach that they call expert prompting. In short, they ask the model to nominate experts for a question and to state what the response of each expert would be. Finally, the model makes a collective decision based on these responses.

example of expert prompting:

# from the official repository: https://github.com/idrori/MITQ/blob/main/code/experts.py
# department, course_name, and question are assumed to be defined by the caller;
# the variable name `prompt` is illustrative and may differ from the repository.
generic_expert = f"an MIT Professor of {department} teaching the {course_name} course"
prompt = "You are " + generic_expert + f". Give an educated guess of who are three experts most capable of solving the following question.\n Question: {question}.\n Return a comma-separated list of three names."

#example from the article

E = You are an MIT Professor of Computer Science and Mathematics teaching Calculus I.
P3 = Give an educated guess of the three experts most capable of solving this question.
System: You are E.
User: Solve Q

About CoT: Anthropic has just published an interesting article, "Measuring Faithfulness in Chain-of-Thought Reasoning", that could be worth including in the review.

At what model size does LLM begin to exhibit emergent abilities?

At what model size does LLM begin to exhibit emergent abilities?

In "2.1 Background for LLMs", the paper says "some abilities are unpredictable according to the scaling law, which can be observed only when the model size exceeds a certain level (as discussed below)."

But when I read the discussion below, I could not find the specific level that separates emergent from non-emergent behavior.

Typos

Version: arXiv:2303.18223v11 [cs.CL] 29 Jun 2023

section 2.2 decoder-onlly -> decoder-only

And another issue:

Figure 3 does not seem to be cited in the text. Other figures may have the same problem.

An explanation of the model selection rule in Figure 1

As we mention in the survey, we only include LLMs (larger than 10B) with publicly reported evaluation results in Figure 1. Excluding models with papers (because formal evaluation results are generally included in papers), models without papers contain Cohere, YaLM, Luminous, ChatGPT, Bard, and Vicuna. Among these models:

  • Cohere, YaLM, Luminous, and ChatGPT are evaluated by HELM.
  • Vicuna reports its results in comparison with other models here.
  • Bard is evaluated by paper 1, paper 2, and paper 3.

While some models do not comply with the criteria, they have played an important role in the development of large language models. We add them to the list and provide corresponding links for those who need them.
We will continue collecting related models but will not be adding them until May 2023. Please let us know if you come across any models that meet the inclusion criteria. Thank you to everyone who provided suggestions for our paper.

What are the limitations of LLMs besides "hallucinations"?

Hi all,

Thank you for your efforts in producing such a wonderful survey paper!
I have followed your fabulous work from version 5 until now.
Here is the link to my previous issues

I have a new question: what are the other limitations of LLMs?

Nowadays, an overwhelming number of papers focus on the "hallucination" of LLMs, but I wonder whether LLMs have any other drawbacks.

When I searched for "hallucination" in your paper or explored the outlines, there was no section discussing the limitations of LLMs.

I believe that adding a "Limitations" section will enhance the paper!

Best wishes!

good

What kind people! The authors' compilation is very comprehensive!

About the grammar of "while"

There are more than two dozen appearances of the conjunction "while" followed by a comma at the beginning of the sentence in the paper ("While,"). Every time I came upon the word, I expected an interrupter to follow, only to find none. I believe this usage is grammatically incorrect, and it becomes too painful to overlook as I read, since there are just too many of them. Please see this link for a detailed explanation.

A possible substitution for most of them, if not all, would be the word "however", which is often followed by a comma at the beginning of a sentence. It would be great if it could be corrected.

Besides, the word "sublayer" for Post Norm in Table 4 is misspelled as "su-layer-b".

Data error in KM scaling law

Regarding the KM scaling law in Chapter 2.1 of your paper, the model size range should be 768~1.5B, not 7.68B~1.5B, according to Figure 1(c) in the original OpenAI paper.
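
For context, the parameter-count form of the KM power law can be evaluated at both ends of the corrected range. This is a minimal sketch, assuming the commonly quoted fitted constants for this law (alpha_N of about 0.076 and N_c of about 8.8e13 non-embedding parameters); treat both values as assumptions and check them against the original OpenAI paper before relying on them.

```python
# Assumed constants for the KM power law L(N) = (N_c / N) ** alpha_N.
# Both values are quoted from memory of the published fit and should be verified.
ALPHA_N = 0.076
N_C = 8.8e13  # non-embedding parameters


def km_loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss (in nats) as a function of model size only."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (N_C / n_params) ** ALPHA_N


# Evaluate the law at the two ends of the range discussed above.
for n in (768, 1.5e9):
    print(f"N = {n:.3g} parameters -> predicted loss {km_loss_from_params(n):.3f}")
```

The predicted loss decreases monotonically with model size, so the lower bound of the range matters: starting at 768 parameters rather than 7.68B makes the fitted range span several more orders of magnitude.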

New Chinese version and GitHub page

Hi,
Thanks for your awesome work!
It seems that you have submitted v12 to arXiv. Could you please update the GitHub page and the Chinese version accordingly?

Many thanks!

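Knowledge enhancement for large models, e.g. retrieval augmentation

Hello! I have recently seen some domain-specific large models that reduce hallucinations by using retrieval augmentation over an external knowledge base or knowledge graph, and they have achieved good results. Do you plan to add content on this topic in the future?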

Typos

Version: arXiv:2303.18223v11 [cs.CL] 29 Jun 2023

section 5.3.1 The illustration of these four methods are shown in Figure 12.
Maybe they are shown in Figure 10.

Request to add a new paper on a medical LLM

Hi, thanks for your excellent survey.

We recently proposed a new open-source medical LLM (based on Ziya-LLaMA), which has also gained a certain degree of influence.

Please consider adding our work to your repository and paper:
Title: Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue
Github: https://github.com/SupritYoung/Zhongjing
Paper: https://arxiv.org/abs/2308.03549

Thanks. 😊

Add a new Paper about Alignment Tuning

Hi, thanks for your excellent survey.

We recently proposed a new alignment fine-tuning paradigm to enhance the reasoning ability of large language models.

Please consider adding our work to your repository and paper:
Title: Making Large Language Models Better Reasoners with Alignment
Link: https://arxiv.org/pdf/2309.02144.pdf

Thanks. 😊

Typo in the Chinese version of the survey

Page 15, Table 4: in the Post Norm formula, "sublayer" is misspelled as "sulayerb".

A question about the evaluation of CrowS-Pairs

Hello! I am a newcomer to the field of LLMs. I am reading your code and I have a question about the evaluation of CrowS-Pairs. In

acc = int(sent_more_ppl_score < sent_less_ppl_score)

why is it '<' instead of '>'? I think the model prefers the sentence with the smaller perplexity: the smaller the perplexity, the more likely the model is to output the sentence. So I think it would be correct for acc to be 1 when sent_more_ppl_score > sent_less_ppl_score. I don't know if I'm right. Could you explain it to me? Thank you very much!

By the way, I am a prospective graduate student of RUC and I am going to enter Gaoling next year!
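
One possible reading of the comparison asked about above, offered as an assumption rather than the authors' confirmed intent: in CrowS-Pairs each pair contains a more stereotypical sentence (sent_more) and a less stereotypical one (sent_less), and the benchmark counts how often the model prefers the more stereotypical one. Under that reading, '<' marks the pairs where sent_more has the lower perplexity, so the value is a stereotype-preference rate rather than an accuracy in the usual sense. The sketch below uses hypothetical helper names and takes per-token log-probabilities as input.

```python
import math
from typing import Iterable, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prefers_more_stereotypical(
    more_logprobs: Sequence[float], less_logprobs: Sequence[float]
) -> int:
    """Return 1 if the model assigns lower perplexity to the more stereotypical sentence."""
    return int(perplexity(more_logprobs) < perplexity(less_logprobs))


def stereotype_rate(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Fraction of (sent_more, sent_less) pairs where sent_more is preferred."""
    scores = [prefers_more_stereotypical(more, less) for more, less in pairs]
    if not scores:
        raise ValueError("pairs must not be empty")
    return sum(scores) / len(scores)
```

With this convention, a rate near 0.5 would indicate no systematic preference between the two sentences in a pair, while a rate well above 0.5 would indicate a preference for the stereotypical sentence.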
