Awesome Large Model (LM) System

This repo collects papers, repos, and tools for large model systems, covering training, inference, serving, and compression.

Papers

Training

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|
| 2023 | | Training Diffusion Models with Reinforcement Learning | |
| 2023 | | Extracting Training Data from Diffusion Models | |
| 2023 | | QLoRA: Efficient Finetuning of Quantized LLMs | |
| 2023 | | Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases | DeepSpeed |
| 2023 | ICLR | DySR: Adaptive Super-Resolution via Algorithm and System Co-design | DeepSpeed |
| 2023 | | Scaling Vision-Language Models with Sparse Mixture of Experts | DeepSpeed |
| 2023 | IPDPS | MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | DeepSpeed |
| 2023 | ICS | A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | DeepSpeed |
| 2023 | OSDI | AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving | Alpa |
| 2023 | MLSys | On Optimizing the Communication of Model Parallelism | Alpa |
| 2023 | | Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models | ColossalAI |
| 2022 | CVPR | Perception Prioritized Training of Diffusion Models | |
| 2022 | | Reducing Activation Recomputation in Large Transformer Models | Megatron-LM |
| 2022 | HiPC | 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed | DeepSpeed |
| 2022 | NeurIPS | The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | DeepSpeed |
| 2022 | | Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam | DeepSpeed |
| 2022 | ICML | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | DeepSpeed |
| 2022 | | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | DeepSpeed |
| 2022 | NeurIPS | Extreme Compression for Pre-trained Transformers Made Simple and Efficient | DeepSpeed |
| 2022 | NeurIPS | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | DeepSpeed |
| 2022 | | Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers | DeepSpeed |
| 2022 | | DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing | DeepSpeed |
| 2022 | OSDI | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Alpa |
| 2022 | ICPP | Tesseract: Parallelize the Tensor Parallelism Efficiently | ColossalAI |
| 2022 | | A Frequency-aware Software Cache for Large Recommendation System Embeddings | ColossalAI |
| 2022 | TPDS | Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management | ColossalAI |
| 2021 | | Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Megatron-LM |
| 2021 | | LoRA: Low-Rank Adaptation of Large Language Models | |
| 2021 | SC | ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning | DeepSpeed |
| 2021 | ICML | 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed | DeepSpeed |
| 2021 | ATC | ZeRO-Offload: Democratizing Billion-Scale Model Training | DeepSpeed |
| 2021 | PPoPP | DAPPLE: A Pipelined Data Parallel Approach for Training Large Models | |
| 2021 | ICML | TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | TeraPipe |
| 2021 | ICML | Memory-Efficient Pipeline-Parallel DNN Training | PipeDream |
| 2021 | | An Efficient 2D Method for Training Super-Large Deep Learning Models | ColossalAI |
| 2021 | | Maximizing Parallelism in Distributed Training for Huge Neural Networks | ColossalAI |
| 2021 | | Sequence Parallelism: Long Sequence Training from System Perspective | ColossalAI |
| 2021 | | Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | ColossalAI |
| 2020 | KDD Tutorial | DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters | DeepSpeed |
| 2020 | SC | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | DeepSpeed |
| 2020 | NeurIPS | Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping | DeepSpeed |
| 2020 | | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Megatron-LM |
| 2020 | | torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models | TorchGpipe |
| 2019 | NeurIPS | GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism | TorchGpipe |
| 2019 | SOSP | PipeDream: Generalized Pipeline Parallelism for DNN Training | PipeDream |

Compression

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|
| 2023 | | On Architectural Compression of Text-to-Image Diffusion Models | |

Inference

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|
| 2023 | | Fast Inference in Denoising Diffusion Models via MMD Finetuning | |
| 2023 | | EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models | EnergonAI |
| 2023 | | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | |
| 2023 | | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | |
| 2023 | | SqueezeLLM: Dense-and-Sparse Quantization | |
| 2023 | | A Simple and Effective Pruning Approach for Large Language Models | |
| 2023 | ICML | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | |
| 2023 | | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | |
| 2023 | | OWQ: Lessons Learned from Activation Outliers for Weight Quantization in Large Language Models | |
| 2023 | ICLR | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | |
| 2023 | ISCA | OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization | |
| 2023 | | Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing | |
| 2023 | | ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation | |
| 2023 | ICML | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | |
| 2023 | | Outlier Suppression+: Accurate Quantization of Large Language Models by Equivalent and Optimal Shifting and Scaling | |
| 2022 | ICML | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | DeepSpeed |
| 2022 | SC | DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | DeepSpeed |
| 2022 | NeurIPS | Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models | |
| 2022 | NeurIPS | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | |
| 2022 | NeurIPS | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | |
| 2021 | EMNLP | Understanding and Overcoming the Challenges of Efficient Transformer Quantization | |

Benchmark

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|

Survey

| Year | Publisher | Title | Framework |
|------|-----------|-------|-----------|

Frameworks

| Year | Name | Training | Inference | Serving | Comments |
|------|------|----------|-----------|---------|----------|
| 2023 | EnergonAI | | | | |
| 2022 | Alpa | | | | Compilation-based mixed parallelism |
| 2021 | Megatron-DeepSpeed | | | | Adds MoE model training, curriculum learning, and 3D parallelism from DeepSpeed to Megatron |
| 2021 | TeraPipe | | | | |
| 2021 | ColossalAI | | | | |
| 2021 | FasterTransformer | | | | |
| 2020 | DeepSpeed | | | | General support for Transformers and MoE with 3D parallelism |
| 2019 | Megatron-LM | | | | |
| 2019 | PipeDream | | | | |
| 2019 | TorchGpipe | | | | torchgpipe was merged into PyTorch in 2020 |
