
transpeeder's People

Contributors

huanglk, jy-ren

transpeeder's Issues

Model split into 4 shards: how do I make each GPU load only one shard?

After splitting into a pipeline-parallel (pp) model, the current loading method in train_llama_deepspeed.sh has every GPU load a full copy, which is no different from not splitting at all. For a model split into 4 shards, how do I configure it so that GPUs 0,1,2,3 load one model and GPUs 4,5,6,7 load another, instead of loading 8 full copies?
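One way to express the intended mapping is through DeepSpeed's process topology (the same class appears in a later issue). The snippet below is a hedged sketch of that idea, not the repo's launch code; in this repo it presumably corresponds to launching on 8 GPUs with --pipe_parallel_size 4.

from deepspeed.runtime.pipe.topology import PipeModelDataParallelTopology

# Hedged sketch: 8 GPUs, a 4-stage pipeline, and therefore 2 data-parallel replicas.
# Ranks 0-3 hold the 4 shards of one replica and ranks 4-7 hold the other replica,
# instead of every rank loading a full copy of the model.
topo = PipeModelDataParallelTopology(num_pp=4, num_mp=1, num_dp=2)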

flash_attn_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol

I have already reinstalled flash_attn.

(gh_llama-deepspeed) r730ub20@r730ub20-M0:/llm_dev/llama-deepspeed$ python3 scripts/convert2ckpt.py --model_name_or_path /data-ssd-1t/hf_model/llama-7b-hf/ --output_dir llama-7b-init-ckpt/
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/r730ub20/llm_dev/llama-deepspeed/scripts/convert2ckpt.py:11 in <module>
│
│    8 import torch
│    9 import transformers
│   10
│ ❱ 11 from models.patching import (
│   12 │   smart_tokenizer_and_embedding_resize,
│   13 )
│   14 from feeder import (
│
│ /home/r730ub20/llm_dev/llama-deepspeed/./models/patching.py:11 in <module>
│
│    8 from transformers.models.llama.modeling_llama import apply_rotary_pos_emb
│    9
│   10 from einops import rearrange
│ ❱ 11 from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
│   12 from flash_attn.bert_padding import unpad_input, pad_input
│   13
│   14
│
│ /home/r730ub20/.local/lib/python3.8/site-packages/flash_attn/flash_attn_interface.py:5 in <module>
│
│
│    2 import torch.nn as nn
│    3 import torch.nn.functional as F
│    4
│ ❱  5 import flash_attn_cuda
│    6
│    7
│    8 def _get_block_size(device, head_dim, is_dropout):
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: /home/r730ub20/.local/lib/python3.8/site-packages/flash_attn_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl8GPUTrace13gpuTraceStateE
(gh_llama-deepspeed) r730ub20@r730ub20-M0:/llm_dev/llama-deepspeed$

ImportError: cannot import name 'flash_attn_unpadded_qkvpacked_func' from 'flash_attn.flash_attn_interface'

Hello, when I run python convert2ckpt.py --mp_world_size 4 --model_name_or_path /path/to/llama-7b-hf --output_dir /path/to/llama-7b-init-ckpt I get the following error:

ImportError: cannot import name 'flash_attn_unpadded_qkvpacked_func' from 'flash_attn.flash_attn_interface'

I checked the flash_attn.flash_attn_interface module and it indeed has no flash_attn_unpadded_qkvpacked_func function. My environment is PyTorch 1.13, Python 3.10, flash-attn 2.0.8. Could you share your environment, or a solution?
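For reference, flash-attn 2.x renamed the unpadded_* interface functions to varlen_*. Assuming that is the cause here, a compatibility import like the hedged sketch below may work; the call signature should still be checked against the installed version.

# Hedged compatibility shim, not code from this repo: flash-attn 2.x exposes
# flash_attn_varlen_qkvpacked_func in place of flash_attn_unpadded_qkvpacked_func.
try:
    from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func
except ImportError:
    from flash_attn.flash_attn_interface import (
        flash_attn_varlen_qkvpacked_func as flash_attn_unpadded_qkvpacked_func,
    )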

Training startup is very slow when fully fine-tuning 7B LLaMA on four 3090s

The wandb initialization stage in particular hung for ten-odd minutes. Have you run into slow startup with multi-GPU training, and is there any possible way to improve it? @HuangLK
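If wandb initialization really is the bottleneck, one hedged workaround (an assumption, not something from this repo) is to disable wandb entirely before launching training:

import os

# wandb honors this environment variable; set it in the launcher environment
# (or run `wandb disabled` once) so wandb.init() returns immediately.
os.environ["WANDB_MODE"] = "disabled"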

Flash attention integration failed

Hello,

when I try to use flash attention, I have encountered the following problem:

│ /export/home2/fangkai/merit-v2/trainer_base_ds_mp.py:346 in main
│
│   343 │   │   │   logger.info("Resuming training from the latest checkpoint:
│   344 │   │   │   continue_from_global_step = int(checkpoint.split('-')[-1])
│   345 │   │
│ ❱ 346 │   │   global_step, tr_loss = train(cfg, model_pipe, tokenizer, conti
│   347 │   │   logger.info(" global_step = %s, average loss = %s", global_ste
│   348
│   349
│
│ /export/home2/fangkai/merit-v2/trainer_base_ds_mp.py:236 in train
│
│   233 │   │   │   │   │   continue
│   234 │   │   │   │
│   235 │   │   │   │   model.train()
│ ❱ 236 │   │   │   │   loss = model.train_batch(data_iter=sub_train_dataloade
│   237 │   │   │   │   global_step += 1
│   238 │   │   │   │
│   239 │   │   │   │   tr_loss += loss.item()
│
│ /export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/de
│ epspeed/runtime/pipe/engine.py:336 in train_batch
│
│    333 │   │   sched = schedule.TrainSchedule(micro_batches=self.micro_batch
│    334 │   │   │   │   │   │   │   │   │      stages=self.num_stages,
│    335 │   │   │   │   │   │   │   │   │      stage_id=self.stage_id)
│ ❱  336 │   │   self._exec_schedule(sched)
│    337 │   │   self.agg_train_loss = self._aggregate_total_loss()
│    338 │   │
│    339 │   │   self.timers('train_batch').stop()
│
│ /export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/de
│ epspeed/runtime/pipe/engine.py:1307 in _exec_schedule
│
│   1304 │   │   │   │
│   1305 │   │   │   │   # Equivalent to: self._exec_forward_pass(buffer_id=0)
│   1306 │   │   │   │   self._exec_instr = MethodType(self._INSTRUCTION_MAP[t
│ ❱ 1307 │   │   │   │   self._exec_instr(**cmd.kwargs)
│   1308
│
│ /export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/de
│ epspeed/runtime/pipe/engine.py:996 in _exec_send_grads
│
│    993 │   │   │   │   │   if not buffer.is_floating_point():
│    994 │   │   │   │   │   │   assert buffer.grad is None
│    995 │   │   │   │   │   │   continue
│ ❱  996 │   │   │   │   │   assert buffer.grad is not None
│    997 │   │   │   │   │   p2p.send(buffer.grad, self.prev_stage)
│    998 │   │
│    999 │   │   # We can free up the input buffer now
╰──────────────────────────────────────────────────────────────────────────────╯
AssertionError

I also tested it using torch.nn.functional.scaled_dot_product_attention, which implements flash attention in torch 2.0, but I hit the same problem. May I know if you have encountered this problem?

Thanks for your help very much!

Best,
Fangkai
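For context, the assertion above fires in DeepSpeed's _exec_send_grads, which (as the quoted engine code shows) only sends gradients for floating-point buffers and expects every floating-point stage output to carry a grad. The snippet below is a hedged illustration of that constraint, not a fix taken from this repo:

import torch

# Hedged sketch: tensors passed between pipeline stages that should not carry
# gradients (e.g. an attention mask) are safer as bool/int tensors, which the
# is_floating_point() check skips; activations stay float and require grad.
hidden_states = torch.randn(2, 16, 64, requires_grad=True)   # gradients flow back
attention_mask = torch.ones(2, 16, dtype=torch.bool)         # skipped by the grad send
stage_outputs = (hidden_states, attention_mask)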

Question about batch size

Hello, in config.json, does train_micro_batch_size_per_gpu mean the chunk (micro-batch) size under the pipeline mechanism? train_batch_size is the total batch size.
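For reference, a hedged sketch of how the standard DeepSpeed config keys relate; the key names are standard DeepSpeed, and the values are illustrative rather than taken from this repo's config.json:

# train_batch_size = train_micro_batch_size_per_gpu
#                    * gradient_accumulation_steps
#                    * data-parallel world size
# Under pipeline parallelism the micro batch is the per-chunk size, and
# gradient_accumulation_steps is the number of chunks pushed through the pipe per step.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # one pipeline chunk
    "gradient_accumulation_steps": 8,      # chunks per optimizer step
    "train_batch_size": 4 * 8 * 1,         # assuming data-parallel size 1
}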

Model loading

Hello, I see that you added self.activation_checkpointing = activation_checkpointing in ParallelTransformerLayerPipe, but this parameter does not exist in the LLaMA model. Won't loading the LLaMA model fail?
I also see that in the updated code, the HF format is first converted to the DeepSpeed format and then loaded with engine.load_checkpoint(model_args.init_ckpt, load_module_only=True). During that loading step, is this parameter simply not loaded by default?

how can I run it with a 24 GB GPU card like the 3090

I got GPU OOM

(gh_llama-deepspeed) amd00@asus00:/llm_dev/llama-deepspeed$ deepspeed --include localhost:0 --master_port 22384 train.py --output_dir out_dir --init_ckpt llama-7b-init-ckpt/ --data_path ./data/alpaca_data_sample_oneline_format.json --max_seq_len 8 --train_steps 1000 --eval_steps 10 --save_steps 200 --log_steps 1 --pipe_parallel_size 1 --model_parallel_size 1 --use_flash_attn false --deepspeed_config ./configs/ds_config_zero1.json
[2023-05-31 17:15:04,883] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-05-31 17:15:04,892] [INFO] [runner.py:541:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=22384 --enable_each_rank_log=None train.py --output_dir out_dir --init_ckpt llama-7b-init-ckpt/ --data_path ./data/alpaca_data_sample_oneline_format.json --max_seq_len 8 --train_steps 1000 --eval_steps 10 --save_steps 200 --log_steps 1 --pipe_parallel_size 1 --model_parallel_size 1 --use_flash_attn false --deepspeed_config ./configs/ds_config_zero1.json
[2023-05-31 17:15:06,134] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-05-31 17:15:06,134] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-05-31 17:15:06,134] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-05-31 17:15:06,134] [INFO] [launch.py:247:main] dist_world_size=1
[2023-05-31 17:15:06,134] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-05-31 17:15:07,635] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 1/1 [00:00<00:00, 3358.13it/s]
total samples num: 50
Traceback (most recent call last):
File "train.py", line 130, in
main()
File "train.py", line 99, in main
model = get_model(model_config, ds_args, activation_checkpointing_config)
File "/home/amd00/llm_dev/llama-deepspeed/models/llama_pipeline_model.py", line 167, in get_model
print("pp is %d, mp is %d, world_size is:", pp, mp, args.world_size)
UnboundLocalError: local variable 'pp' referenced before assignment
[2023-05-31 17:15:08,142] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 26374
[2023-05-31 17:15:08,143] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--output_dir', 'out_dir', '--init_ckpt', 'llama-7b-init-ckpt/', '--data_path', './data/alpaca_data_sample_oneline_format.json', '--max_seq_len', '8', '--train_steps', '1000', '--eval_steps', '10', '--save_steps', '200', '--log_steps', '1', '--pipe_parallel_size', '1', '--model_parallel_size', '1', '--use_flash_attn', 'false', '--deepspeed_config', './configs/ds_config_zero1.json'] exits with return code = 1
(gh_llama-deepspeed) amd00@asus00:/llm_dev/llama-deepspeed$ vim train.py
(gh_llama-deepspeed) amd00@asus00:/llm_dev/llama-deepspeed$ vim models/llama_pipeline_model.py
(gh_llama-deepspeed) amd00@asus00:/llm_dev/llama-deepspeed$ deepspeed --include localhost:0 --master_port 22384 train.py --output_dir out_dir --init_ckpt llama-7b-init-ckpt/ --data_path ./data/alpaca_data_sample_oneline_format.json --max_seq_len 8 --train_steps 1000 --eval_steps 10 --save_steps 200 --log_steps 1 --pipe_parallel_size 1 --model_parallel_size 1 --use_flash_attn false --deepspeed_config ./configs/ds_config_zero1.json
[2023-05-31 17:16:32,333] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-05-31 17:16:32,342] [INFO] [runner.py:541:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=22384 --enable_each_rank_log=None train.py --output_dir out_dir --init_ckpt llama-7b-init-ckpt/ --data_path ./data/alpaca_data_sample_oneline_format.json --max_seq_len 8 --train_steps 1000 --eval_steps 10 --save_steps 200 --log_steps 1 --pipe_parallel_size 1 --model_parallel_size 1 --use_flash_attn false --deepspeed_config ./configs/ds_config_zero1.json
[2023-05-31 17:16:33,582] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-05-31 17:16:33,582] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-05-31 17:16:33,582] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-05-31 17:16:33,582] [INFO] [launch.py:247:main] dist_world_size=1
[2023-05-31 17:16:33,582] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-05-31 17:16:35,093] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 1/1 [00:00<00:00, 3368.92it/s]
total samples num: 50
pp is %d, mp is %d, world_size is: 1 1 1
SEED_LAYERS=False BASE_SEED=42 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0}
[2023-05-31 17:16:35,204] [INFO] [module.py:358:_partition_layers] Partitioning pipeline stages with method parameters
stage=0 layers=35
0: EmbeddingPipe
1: ParallelTransformerLayerPipe
2: ParallelTransformerLayerPipe
3: ParallelTransformerLayerPipe
4: ParallelTransformerLayerPipe
5: ParallelTransformerLayerPipe
6: ParallelTransformerLayerPipe
7: ParallelTransformerLayerPipe
8: ParallelTransformerLayerPipe
9: ParallelTransformerLayerPipe
10: ParallelTransformerLayerPipe
11: ParallelTransformerLayerPipe
12: ParallelTransformerLayerPipe
13: ParallelTransformerLayerPipe
14: ParallelTransformerLayerPipe
15: ParallelTransformerLayerPipe
16: ParallelTransformerLayerPipe
17: ParallelTransformerLayerPipe
18: ParallelTransformerLayerPipe
19: ParallelTransformerLayerPipe
20: ParallelTransformerLayerPipe
21: ParallelTransformerLayerPipe
22: ParallelTransformerLayerPipe
23: ParallelTransformerLayerPipe
24: ParallelTransformerLayerPipe
25: ParallelTransformerLayerPipe
26: ParallelTransformerLayerPipe
27: ParallelTransformerLayerPipe
28: ParallelTransformerLayerPipe
29: ParallelTransformerLayerPipe
30: ParallelTransformerLayerPipe
31: ParallelTransformerLayerPipe
32: ParallelTransformerLayerPipe
33: LayerNormPipe
34: LMLayerPipe
loss: loss_fn
Traceback (most recent call last):
File "train.py", line 130, in
main()
File "train.py", line 99, in main
model = get_model(model_config, ds_args, activation_checkpointing_config)
File "/home/amd00/llm_dev/llama-deepspeed/models/llama_pipeline_model.py", line 182, in get_model
return GPT2ModelPipe(model_config,
File "/home/amd00/llm_dev/llama-deepspeed/models/llama_pipeline_model.py", line 157, in init
super().init(
File "/home/amd00/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/module.py", line 200, in init
self.to(get_accelerator().device_name(self.local_rank))
File "/home/amd00/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 987, in to
return self._apply(convert)
File "/home/amd00/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 639, in _apply
module._apply(fn)
File "/home/amd00/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 639, in _apply
module._apply(fn)
File "/home/amd00/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 639, in _apply
module._apply(fn)
File "/home/amd00/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 662, in _apply
param_applied = fn(param)
File "/home/amd00/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 985, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.70 GiB total capacity; 22.83 GiB already allocated; 97.88 MiB free; 22.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2023-05-31 17:17:30,649] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 26532
[2023-05-31 17:17:30,650] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--output_dir', 'out_dir', '--init_ckpt', 'llama-7b-init-ckpt/', '--data_path', './data/alpaca_data_sample_oneline_format.json', '--max_seq_len', '8', '--train_steps', '1000', '--eval_steps', '10', '--save_steps', '200', '--log_steps', '1', '--pipe_parallel_size', '1', '--model_parallel_size', '1', '--use_flash_attn', 'false', '--deepspeed_config', './configs/ds_config_zero1.json'] exits with return code = 1
(gh_llama-deepspeed) amd00@asus00:/llm_dev/llama-deepspeed$
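For context on the OOM itself: a 7B model in fp16 is roughly 13 GB of weights before any optimizer state, and full fine-tuning with Adam adds several times that again, so it cannot fit on a single 24 GB card as-is. Below is a hedged sketch of one direction to try; the keys are standard DeepSpeed ZeRO config options, and the values are assumptions rather than this repo's configs.

ds_config = {
    "train_batch_size": 1,
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                                          # shard optimizer state and grads
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
# passed via --deepspeed_config, or directly as deepspeed.initialize(config=ds_config, ...)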

Error when retraining 7B LLaMA on four GPUs after clearing the cache

I used the default ds_config.json, only changing the wandb section to false (because it is slow). GPU memory got allocated but training never started (it hung at "Using /root/.cache/torch_extensions as PyTorch extensions root...").
So I cleared /root/.cache and trained again, and now it errors out. The error message is as follows:

Using /root/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py38_cu116/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu116/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] =/usr/local/cuda-11.6/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem =/usr/local/cuda-11.6/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -std=c++14 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
=/usr/local/cuda-11.6/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem =/usr/local/cuda-11.6/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -std=c++14 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
/bin/sh: 1: =/usr/local/cuda-11.6/bin/nvcc: not found
Using /root/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
[2023-04-21 17:47:56,170] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-04-21 17:47:56,315] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem =/usr/local/cuda-11.6/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train.py", line 143, in
main()
File "train.py", line 109, in main
engine, _, _, _ = deepspeed.initialize(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/init.py", line 180, in initialize
engine = PipelineEngine(args=args,
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/pipe/engine.py", line 53, in init
super().init(*super_args, **super_kwargs)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 308, in init
self._configure_optimizer(optimizer, model_parameters)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1156, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1222, in _configure_basic_optimizer
optimizer = FusedAdam(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/fused_adam.py", line 71, in init
fused_adam_cuda = FusedAdamBuilder().load()
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 445, in load
return self.jit_load(verbose)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
op_module = load(name=self.name,
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1202, in load
return _jit_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
Loading extension module fused_adam...
Traceback (most recent call last):
File "train.py", line 143, in
main()
File "train.py", line 109, in main
engine, _, _, _ = deepspeed.initialize(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/init.py", line 180, in initialize
engine = PipelineEngine(args=args,
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/pipe/engine.py", line 53, in init
super().init(*super_args, **super_kwargs)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 308, in init
self._configure_optimizer(optimizer, model_parameters)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1156, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1222, in _configure_basic_optimizer
optimizer = FusedAdam(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/adam/fused_adam.py", line 71, in init
fused_adam_cuda = FusedAdamBuilder().load()
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 445, in load
return self.jit_load(verbose)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
op_module = load(name=self.name,
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1202, in load
return _jit_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1450, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1844, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 556, in module_from_spec
File "", line 1166, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /root/.cache/torch_extensions/py38_cu116/fused_adam/fused_adam.so: cannot open shared object file: No such file or directory
[2023-04-21 17:48:12,493] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 105683
[2023-04-21 17:48:12,710] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 105684
[2023-04-21 17:48:12,710] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 105685
[2023-04-21 17:48:12,845] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 105686
[2023-04-21 17:48:12,847] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'train.py', '--local_rank=3', '--output_dir', '/root/nas-private/output', '--init_ckpt', '/root/nas-private/llama-7B-init-ckpt', '--data_path', './data/alpaca_data_sample_oneline_format.json', '--max_seq_len', '1024', '--train_steps', '1000', '--eval_steps', '10', '--save_steps', '200', '--log_steps', '1', '--pipe_parallel_size', '4', '--model_parallel_size', '1', '--use_flash_attn', 'true', '--deepspeed_config', './configs/ds_config.json'] exits with return code = 1

error when using zero1

Traceback (most recent call last):
  File "train.py", line 131, in <module>
    main()
  File "train.py", line 109, in main
    engine.load_checkpoint(model_args.init_ckpt,load_module_only=True)#load_module_only=True
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2769, in load_checkpoint
    success = self._load_zero_checkpoint(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2948, in _load_zero_checkpoint
    zero_sd_list = self._get_all_zero_checkpoints(load_dir, tag)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 3042, in _get_all_zero_checkpoints
    return self._get_all_zero_checkpoint_state_dicts(zero_ckpt_names)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 3014, in _get_all_zero_checkpoint_state_dicts
    _state = self.checkpoint_engine.load(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/checkpoint_engine/torch_checkpoint_engine.py", line 22, in load
    partition = torch.load(path, map_location=map_location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 699, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 231, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 212, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './llama-7B-init-ckpt/global_step001/zero_pp_rank_0_mp_rank_01_optim_states.pt'
[2023-08-13 20:35:08,552] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./llama-7B-init-ckpt/global_step001/zero_pp_rank_0_mp_rank_02_optim_states.pt...
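A hedged note: the checkpoint written by convert2ckpt.py apparently contains only module weights, so the ZeRO path fails when it looks for zero_pp_rank_*_optim_states.pt files. The keyword arguments below are standard DeepSpeed load_checkpoint parameters; whether skipping optimizer state resolves this particular setup is an assumption, and the helper name is made up for illustration.

def load_init_ckpt(engine, init_ckpt_dir):
    # Sketch only, not the repo's train.py: skip optimizer / LR-scheduler state when
    # loading an init checkpoint that contains module weights only.
    load_path, client_state = engine.load_checkpoint(
        init_ckpt_dir,
        load_module_only=True,
        load_optimizer_states=False,
        load_lr_scheduler_states=False,
    )
    return load_path, client_state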

Some questions about the pipeline model

Hi, this is a great project. I have a few questions from studying it and hope you can spare some time to answer them:

  1. About the order of building the pipeline model vs. loading the pretrained model: is it because the ckpt only stores the weights that the model must be defined first and the weights loaded afterwards?
  2. Does engine.load_checkpoint here have to load a DeepSpeed ckpt? Why can't the HF-format checkpoint be used directly?
    # pipeline model
    model = get_model(model_config, ds_args, activation_checkpointing_config)

    engine, _, _, _ = deepspeed.initialize(
        ds_args,
        model=model,
        model_parameters=[p for p in model.parameters() if p.requires_grad]
    )

    # use `convert2ckpt.py`
    engine.load_checkpoint(model_args.init_ckpt, load_module_only=True)

Loading the checkpoint in train.py seems to have no effect

Line 108 of train.py:

engine.load_checkpoint(model_args.init_ckpt, load_module_only=True)

With or without this line, the initial training loss is the same. It seems the model parameters are not actually being loaded.
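One hedged way to check whether the load actually changes anything (a debugging sketch, not code from the repo; the helper name is made up):

import torch

def check_checkpoint_applied(engine, init_ckpt_dir):
    # Snapshot one parameter before the load and compare it afterwards.
    before = next(engine.module.parameters()).detach().cpu().clone()
    engine.load_checkpoint(init_ckpt_dir, load_module_only=True)
    after = next(engine.module.parameters()).detach().cpu()
    return not torch.equal(before, after)   # True if the load changed the weights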

Running 7B succeeded. Next: 30B


Thank you for your implementation of pipeline parallelism for LLaMA model training.
I had encountered a hang when running 7B training on a 4xA40 machine.
Can you provide a Dockerfile that is known to run on such a machine?

TypeError: 'NoneType' object is not subscriptable

deling_llama.py:134 in apply_rotary_pos_emb
│
│   131
│   132
│   133 def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
│ ❱ 134 │   gather_indices = position_ids[:, None, :, None]  # [bs, 1, seq_len
│   135 │   gather_indices = gather_indices.repeat(1, cos.shape[1], 1, cos.sha
│   136 │   cos = torch.gather(cos.repeat(gather_indices.shape[0], 1, 1, 1), 2
│   137 │   sin = torch.gather(sin.repeat(gather_indices.shape[0], 1, 1, 1), 2
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: 'NoneType' object is not subscriptable
[2023-04-13 11:32:44,508] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 197
[2023-04-13 11:32:44,508] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 198
[2023-04-13 11:32:47,255] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 199
[2023-04-13 11:32:49,894] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 200
@HuangLK do you know how this happened and how to solve it?
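The error indicates that apply_rotary_pos_emb is being called with position_ids=None. Below is a hedged sketch of one common workaround (an assumption, not the author's fix): rebuild position_ids in the attention patch when the pipeline does not pass them along.

import torch

def ensure_position_ids(position_ids, q):
    # q is expected to be [batch, heads, seq_len, head_dim]; if the pipeline stage
    # did not forward position_ids, rebuild the default 0..seq_len-1 positions.
    if position_ids is None:
        seq_len = q.shape[-2]
        position_ids = torch.arange(seq_len, device=q.device).unsqueeze(0)
    return position_ids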

size mismatch

Hello, after converting with convert2ckpt.py, the model's embedding size is increased by 1, right? Do I need to change vocab_size in the corresponding config.json accordingly?
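For context, a hedged sketch of what smart_tokenizer_and_embedding_resize presumably does during conversion; the paths, the pad token, and the 32000 -> 32001 numbers are assumptions for a standard LLaMA tokenizer, not values taken from this repo. Adding a pad token grows the vocabulary by one, so the converted checkpoint's config must use the enlarged vocab_size.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-7b-hf", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("/path/to/llama-7b-hf")

num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # typically 1
model.resize_token_embeddings(len(tokenizer))
# e.g. 32000 -> 32001; config.json's vocab_size must match this value.
print(model.get_input_embeddings().weight.shape[0])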

Loss quickly drops to 0


During training, the loss quickly drops to 0 with the same configuration. Many thanks :D

RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn

Hi, wonderful work!

I didn't use your code directly, but I followed it to implement my own LLaMA pipeline parallelism, and I'm encountering the following problem. May I know if you have run into similar problems? I have no idea how to solve it.

Thanks for your help very much!

The error message:

Traceback (most recent call last)
  File "/home/fangkai/merit-v2/trainer_base_ds_mp.py", line 418, in <module>
    main() 
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/main.py", line 90, in decorated_main
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app( 
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/_internal/utils.py", line 453, in <lambda>                                                 
    lambda: hydra.run(
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run                                                                          
    _ = ret.return_value 
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value                                                                      
    raise self._return_value
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job                                                                           
    ret.return_value = task_function(task_cfg)
  File "/home/fangkai/merit-v2/trainer_base_ds_mp.py", line 352, in main                                                                                                                    
    global_step, tr_loss = train(cfg, model, tokenizer, continue_from_global_step)                                                                                                          
  File "/home/fangkai/merit-v2/trainer_base_ds_mp.py", line 212, in train                                                                                                                   
    loss = model.train_batch(sub_train_dataloader)                                                                                                                                          
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/deepspeed/runtime/pipe/engine.py", line 336, in train_batch 
    self._exec_schedule(sched) 
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/deepspeed/runtime/pipe/engine.py", line 1307, in _exec_schedule 
    self._exec_instr(**cmd.kwargs)
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/deepspeed/runtime/pipe/engine.py", line 733, in _exec_backward_pass
    torch.autograd.backward(tensors=out_tensors, grad_tensors=grad_tensors)                                                                                                                 
  File "/home/fangkai/anaconda3/envs/py3.9/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward                                                                   
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass                                                                                          
RuntimeError: element 1 of tensors does not require grad and does not have a grad_fn

Here is a toy dataset:

from torch.utils.data import Dataset

class TestDataset(Dataset):
    def __init__(self, file_path, tokenizer):
        super().__init__()
        self.data = ["My name is Jiao Fangkai."]

    def __len__(self):
        return 100000000

    def __getitem__(self, index):
        return {"flan": {
            "inputs": self.data[0],
            "targets": self.data[0],
        }}

Here is the collator:

import torch
from transformers import AutoTokenizer, PreTrainedTokenizer

def vanilla_seq2seq_convertor(examples, tokenizer: PreTrainedTokenizer, max_seq_length, decoder_only: bool = False):
    inputs = []
    outputs = []
    for exp in examples:
        inputs.append(exp["inputs"])
        if decoder_only:
            outputs.append(exp["inputs"] + " " + exp["targets"] + tokenizer.eos_token)
        else:
            outputs.append(exp["targets"])

    model_inputs = tokenizer(inputs, text_target=outputs, max_length=max_seq_length, padding="longest",
                             truncation=True, return_tensors="pt")
    if decoder_only:
        input_lens = model_inputs["input_ids"].ne(tokenizer.pad_token_id).sum(dim=1)
        model_inputs = tokenizer(outputs, max_length=max_seq_length, padding="longest",
                                 truncation=True, return_tensors="pt")
        new_input_lens = model_inputs["input_ids"].ne(tokenizer.pad_token_id).sum(dim=1)
        input_lens = input_lens - input_lens.eq(new_input_lens).to(input_lens.dtype) * (input_lens // 2)
        input_lens = input_lens.to(torch.long)
        model_inputs["input_lens"] = input_lens

    return model_inputs

def get_lm_labels(input_lens, input_ids, pad_token_id):
    labels = input_ids.clone()

    label_mask = labels.ne(pad_token_id)
    lens_mask = torch.arange(labels.size(1))[None, :] >= input_lens[:, None]
    label_mask = label_mask & lens_mask

    labels = labels.masked_fill(~label_mask, -100).contiguous()

    return labels

class FlanCollatorOverCollator:
    def __init__(self, tokenizer: str, max_seq_length: int, decoder_only: bool = False):
        self.tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained(tokenizer, use_fast=False)
        expand_special_tokenizer(self.tokenizer)
        self.max_seq_length = max_seq_length
        self.decoder_only = decoder_only

    def __call__(self, batch):
        flan_batch = []
        for item in batch:
            flan_batch.append(item.pop("flan"))

        model_inputs = vanilla_seq2seq_convertor(flan_batch, self.tokenizer, self.max_seq_length, self.decoder_only)

        
        # Add suffix `input_ids` to tackle the deepspeed logic.
        seq_length = model_inputs["input_ids"].size(1)
        position_ids = torch.arange(0, seq_length, dtype=torch.long)
        position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
        return (
                (
                    model_inputs["input_ids"],
                    model_inputs["attention_mask"],
                    # position_ids,
                    # model_inputs["input_lens"],
                    # model_inputs["input_ids"].detach().clone()
                ),
                # model_inputs["input_ids"].detach().clone()
                get_lm_labels(model_inputs["input_lens"], model_inputs["input_ids"], self.tokenizer.pad_token_id)
        )

        return model_inputs

And the initialization:

from deepspeed.pipe import PipelineModule
from deepspeed.runtime.pipe.topology import PipeModelDataParallelTopology

topo = PipeModelDataParallelTopology(num_pp=4, num_mp=1, num_dp=1)
model = PipelineModule(layers=layers,
                           # num_stages=cfg.num_stages,
                           topology=topo,
                           loss_fn=models.llama_ds_mp_wrap.loss_fn,
                           activation_checkpoint_interval=getattr(cfg, "activation_checkpoint_interval", 0))

hidden_states becomes a bool variable

Hi! After I hit an error while running (see screenshot),
I simply changed it to attention_mask=None, and then the following error appeared (see screenshot).
Printing the variable shows it is a bool, which causes the failure. Do you know what the cause might be?

attention mask

In feeder.py the model is only given a causal mask, not a padding mask; this seems like something that needs improving.
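A hedged sketch of combining the two masks (not feeder.py's actual code; the helper name is made up):

import torch

def build_attention_mask(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # Combine a causal mask with a padding mask so padded key positions are
    # never attended to, in addition to future positions being hidden.
    bsz, seq_len = input_ids.shape
    causal = torch.tril(torch.ones(seq_len, seq_len)).bool()        # [seq, seq]
    padding = input_ids.ne(pad_token_id)                            # [bsz, seq]
    return causal[None, None, :, :] & padding[:, None, None, :]     # [bsz, 1, seq, seq]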

File not found error

Hi Huang, nice work!

when I tried to train with a 13B model, I got the error:
[Errno 2] No such file or directory: 'llama_13b_pp/global_step001/zero_pp_rank_0_mp_rank_03_optim_states.pt'

Any ideas on this? The 'convert2ckpt.py' script does not generate files with prefix 'zero_pp_....'

Output is not getting saved

I tried fine-tuning LLaMA 30B on A100s (2 GPUs with 80 GB each). The script finished running in 5 minutes and no output was generated. I couldn't find any error either.


The command used to run the script:

deepspeed --include A1:0,1 --master_port 22384 train.py --output_dir output --init_ckpt /root/llama-30b-init-ckpt/ --data_path /root/alpaca_deepspeed.json --max_seq_len 1024 --train_steps 1000 --eval_steps 10 --save_steps 200 --log_steps 1 --pipe_parallel_size 2 --model_parallel_size 1 --use_flash_attn true --deepspeed_config ./configs/ds_config_zero1.json
