yandex / yalm-100b
Pretrained language model with 100B parameters
License: Apache License 2.0
Sorry if this is a stupid question. I recently found your project and want to integrate it into social media accounts, but I don't understand how to use it out of the box (via Docker). As the instructions say, I need a powerful PC with GPUs (which are quite expensive for me), and I wonder if there is a way to simply send an input text prompt/variables and get a response in the console / via an API.
Can you please comment on this, @artnitolog?
For reference, this is how https://porfirevich.ru/ works.
I have 10 video cards with more than 200 GB of video memory in total.
If I connect them via PCIe x1, how much will performance decrease? Does PCIe x1 vs. PCIe x16 matter in this particular case?
It's great that you support the community. Thank you.
Please add an online demo so that the model can be tested without downloading 200 GB to your computer.
It would be really useful to have a pruned version of the model (like Balaboba) to launch on weaker video card setups.
Thanks for the very interesting model release.
If possible, could some information about the dataset used for training be provided (e.g. language split percentages)?
Hi! I wanted to ask for an official BibTeX citation one can use when referring to the model in a paper.
I see an attempt to use the host's ssh-agent; that's a security risk.
What's the [NL] token appearing in generation?
Is it an artifact or a special token?
This model looks amazing, thank you! We have a machine with 8×3090 (192 GB total). I tried to run the examples, but I get:
building GPT2 model ...
RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 3; 23.70 GiB total capacity; 22.48 GiB already allocated; 70.56 MiB free; 22.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
For someone who is not an expert with PyTorch etc., perhaps you have a suggestion?
We would like to build a conversation partner for language learning (adding TTS, translation, NLP, etc.) for our project: https://dev.languagereactor.com/
Regards, David :)
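Not an official fix, but the fragmentation hint in the error message above can be acted on directly. A minimal sketch, assuming the environment variable is exported before the launch script initializes CUDA (the commented-out launch line stands in for whatever script you were running):

```shell
# Cap the allocator's split blocks at 128 MB to reduce fragmentation,
# as suggested by the "max_split_size_mb" hint in the OOM message.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"
# then re-run the failing script, e.g.:
# bash ./generate_interactive.sh
```

Note this only helps with fragmentation; with 8×24 GB the model sits right at the memory limit, so it may still not fit.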
Hello, Thank you for sharing the YaLM-100B checkpoints.
I am downloading those checkpoints now; to make sure the files I have downloaded are good, can you share the md5sum values for the checkpoint files?
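Until official checksums are published, you can at least guard against corrupted re-downloads yourself. A sketch of the flow, demonstrated on a stand-in file (the demo directory and dummy shard here are made up; for the real checkpoint you would run md5sum over the downloaded layer_*-model_00-model_states.pt shards instead):

```shell
# Create a stand-in shard so the flow is runnable without the 200 GB download.
mkdir -p demo_weights
echo "dummy shard" > demo_weights/layer_00-model_00-model_states.pt

# Record checksums once, right after downloading...
( cd demo_weights && md5sum layer_*-model_00-model_states.pt > checksums.md5 )

# ...and verify later; prints "layer_00-model_00-model_states.pt: OK" on success.
( cd demo_weights && md5sum -c checksums.md5 )
```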
Dear Yandex Team,
I hope this message finds you well. I am writing to express my admiration for your work on the YaLM-100B model, which has demonstrated exceptional performance in generating and processing text in both English and Russian languages. Your dedication to providing this model for free use by developers and researchers worldwide is commendable.
As a researcher in the field of natural language processing, I am particularly interested in the dataset you have used to train the YaLM-100B model, specifically the 75% of the dataset consisting of Russian texts. I would like to respectfully request that you consider making this dataset, which I propose to call the "Russian Pile," openly available to the broader research community. Below are some strong arguments in favor of opening the dataset:
In conclusion, I believe that making the Russian Pile dataset openly available will bring about numerous benefits for the global research community, promote language diversity, and contribute to the development of more inclusive and responsible AI systems. Your willingness to share the YaLM-100B model is already a significant contribution to the field, and opening the Russian Pile dataset would further solidify your commitment to openness and collaboration in AI research.
Thank you for considering this request. I am looking forward to your response and the potential positive impact that opening the Russian Pile dataset will have on the research community and beyond.
Sincerely,
Mikhail Grankin
I pulled the docker image and downloaded the checkpoint. When running generate_interactive.sh, I encountered the following error:
(identical tracebacks interleaved from the other ranks are omitted; one representative copy follows)
Traceback (most recent call last):
File "megatron_lm/tools/generate_samples_gpt2.py", line 104, in <module>
main()
File "megatron_lm/tools/generate_samples_gpt2.py", line 89, in main
_ = load_checkpoint(model, None, None)
File "/workspace/YaLM-100B/megatron_lm/megatron/checkpointing.py", line 183, in load_checkpoint
load_checkpoint_new(model, optimizer, lr_scheduler)
File "/workspace/YaLM-100B/megatron_lm/megatron/checkpointing.py", line 373, in load_checkpoint_new
torch.distributed.barrier()
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 2709, in barrier
work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 554) of binary: /opt/conda/bin/python3
Traceback (most recent call last):
File "/opt/conda/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.8.0a0+17f8c32', 'console_scripts', 'torchrun')())
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/run.py", line 719, in main
run(args)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
megatron_lm/tools/generate_samples_gpt2.py FAILED
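For what it's worth, ncclInvalidUsage is usually a symptom of an earlier failure (wrong world size, mismatched CUDA/NCCL builds, GPUs not visible to all ranks). A sketch of how to get more detail out of NCCL before the barrier() call fails; the env vars are standard NCCL/PyTorch knobs, and the commented-out launch line stands in for the repo's script:

```shell
# Verbose NCCL logging: prints topology discovery and the first real error.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,GRAPH
# Make ranks fail fast with a useful error instead of hanging on a bad collective.
export NCCL_ASYNC_ERROR_HANDLING=1
echo "NCCL_DEBUG=$NCCL_DEBUG"
# bash ./generate_interactive.sh   # re-run the failing script with logging on
```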
Hello!
I managed to run the model on 8×A100; unfortunately, AFAIK GCP doesn't offer the 80 GB model, so these are the 40 GB ones.
It failed on 4×A100 40 GB with an out-of-memory error.
I was wondering whether it would somehow be possible to run it on less hardware, perhaps a single A100 40 GB or 2×V100 32 GB, which would reduce running costs significantly.
I am under the impression that this might be possible by tweaking some of the runtime configuration, or even by modifying some code, but I'm not sure which parameters I should modify.
When I ran the model on 8×A100 it used ~27 GB of GPU memory on each device and ~50 GB of host memory in total. Would it be possible to trade more CPU threads / RAM / disk for less GPU memory?
Hello and thanks for open-sourcing the model!
As there don't seem to be any ready-to-use GGUF or MLX formats (for llama.cpp and macOS respectively), is there any chance you could give a hint on how to convert YaLM to them?
It would be a real help to run the model on non-NVIDIA hardware, like any modern PC or mobile device.
Thanks in advance!
Thank you for making your work publicly available!
I am trying to test your model on 8×RTX 6000 cards, and I'm getting a timeout error:
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
building GPT2 model ...
[E ProcessGroupNCCL.cpp:587] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1805074 milliseconds before timing out.
(the same watchdog timeout is reported for every rank, 0 through 7)
>> Loading layer_00-model_00-model_states.pt on CPU [mp 06 / 8]
>> Loading layer_00-model_00-model_states.pt on CPU [mp 05 / 8]
>> Loading layer_00-model_00-model_states.pt on CPU [mp 03 / 8]
>> Loading layer_00-model_00-model_states.pt on CPU [mp 04 / 8]
> Start loading from release checkpoint from folder yalm100b_checkpoint/weights
[E ProcessGroupNCCL.cpp:341] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data. To avoid this inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1805074 milliseconds before timing out.
(the same abort message and std::runtime_error are printed by each rank)
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1824 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1827 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1828 closing signal SIGTERM
What causes this error, and how could I overcome it?
Firstly, thank you for making the weights open to all!
Now, how does one do inference/fine-tuning with ZeRO 3 NVMe/CPU offload? Since this project is based on Megatron, that shouldn't be too difficult to implement, right?
The generate_*.sh scripts appear to create an fp16 model and load the weights into it, but the weights are bfloat16.
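The mismatch is easy to demonstrate. A small sketch; the shard here is a stand-in for a real layer_XX-model_00-model_states.pt file, which you would torch.load the same way:

```python
import torch

# Stand-in for a real checkpoint shard saved in bfloat16.
torch.save({"w": torch.zeros(4, dtype=torch.bfloat16)}, "demo_shard.pt")

state = torch.load("demo_shard.pt", map_location="cpu")
for name, t in state.items():
    print(name, t.dtype)  # torch.bfloat16

# Why loading bf16 weights into an fp16 model is risky: bf16 keeps fp32's
# 8-bit exponent, while fp16 has only 5, so large bf16 values overflow.
big = torch.tensor([70000.0], dtype=torch.bfloat16)
print(big.to(torch.float16))  # fp16 max is 65504, so this overflows to inf
```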
Hi,
Thank you for releasing the model weights 😃!
I've just downloaded the data and done a quick spot check on the sizes of the layers. Most layers have a size of 2.4 GB; however, layer 44 is 817 MB. Is there an issue with this layer's data?
Full layers sizes:
501M layer_00-model_00-model_states.pt
9.0K layer_01-model_00-model_states.pt
2.4G layer_03-model_00-model_states.pt
2.4G layer_04-model_00-model_states.pt
2.4G layer_05-model_00-model_states.pt
2.4G layer_06-model_00-model_states.pt
2.4G layer_07-model_00-model_states.pt
2.4G layer_08-model_00-model_states.pt
2.4G layer_09-model_00-model_states.pt
2.4G layer_10-model_00-model_states.pt
2.4G layer_11-model_00-model_states.pt
2.4G layer_12-model_00-model_states.pt
2.4G layer_13-model_00-model_states.pt
2.4G layer_14-model_00-model_states.pt
2.4G layer_15-model_00-model_states.pt
2.4G layer_16-model_00-model_states.pt
2.4G layer_17-model_00-model_states.pt
2.4G layer_18-model_00-model_states.pt
2.4G layer_19-model_00-model_states.pt
2.4G layer_20-model_00-model_states.pt
2.4G layer_21-model_00-model_states.pt
2.4G layer_22-model_00-model_states.pt
2.4G layer_23-model_00-model_states.pt
2.4G layer_24-model_00-model_states.pt
2.4G layer_25-model_00-model_states.pt
2.4G layer_26-model_00-model_states.pt
2.4G layer_27-model_00-model_states.pt
2.4G layer_28-model_00-model_states.pt
2.4G layer_29-model_00-model_states.pt
2.4G layer_30-model_00-model_states.pt
2.4G layer_31-model_00-model_states.pt
2.4G layer_32-model_00-model_states.pt
2.4G layer_33-model_00-model_states.pt
2.4G layer_34-model_00-model_states.pt
2.4G layer_35-model_00-model_states.pt
2.4G layer_36-model_00-model_states.pt
2.4G layer_37-model_00-model_states.pt
2.4G layer_38-model_00-model_states.pt
2.4G layer_39-model_00-model_states.pt
2.4G layer_40-model_00-model_states.pt
2.4G layer_41-model_00-model_states.pt
2.4G layer_42-model_00-model_states.pt
2.4G layer_43-model_00-model_states.pt
817M layer_44-model_00-model_states.pt
1.8G layer_45-model_00-model_states.pt
2.4G layer_46-model_00-model_states.pt
2.4G layer_47-model_00-model_states.pt
2.4G layer_48-model_00-model_states.pt
2.4G layer_49-model_00-model_states.pt
2.4G layer_50-model_00-model_states.pt
2.4G layer_51-model_00-model_states.pt
2.4G layer_52-model_00-model_states.pt
2.4G layer_53-model_00-model_states.pt
2.4G layer_54-model_00-model_states.pt
2.4G layer_55-model_00-model_states.pt
2.4G layer_56-model_00-model_states.pt
2.4G layer_57-model_00-model_states.pt
2.4G layer_58-model_00-model_states.pt
2.4G layer_59-model_00-model_states.pt
2.4G layer_60-model_00-model_states.pt
2.4G layer_61-model_00-model_states.pt
2.4G layer_62-model_00-model_states.pt
2.4G layer_63-model_00-model_states.pt
2.4G layer_64-model_00-model_states.pt
2.4G layer_65-model_00-model_states.pt
2.4G layer_66-model_00-model_states.pt
2.4G layer_67-model_00-model_states.pt
2.4G layer_68-model_00-model_states.pt
2.4G layer_69-model_00-model_states.pt
2.4G layer_70-model_00-model_states.pt
2.4G layer_71-model_00-model_states.pt
2.4G layer_72-model_00-model_states.pt
2.4G layer_73-model_00-model_states.pt
2.4G layer_74-model_00-model_states.pt
2.4G layer_75-model_00-model_states.pt
2.4G layer_76-model_00-model_states.pt
2.4G layer_77-model_00-model_states.pt
2.4G layer_78-model_00-model_states.pt
2.4G layer_79-model_00-model_states.pt
2.4G layer_80-model_00-model_states.pt
2.4G layer_81-model_00-model_states.pt
2.4G layer_82-model_00-model_states.pt
41M layer_84-model_00-model_states.pt
Thanks for the awesome work! (and especially for choosing to make it freely available)
If you have time, please also consider running the evaluation benchmarks from lm-eval-harness
https://github.com/EleutherAI/lm-evaluation-harness
[Despite it having a ton of different benchmarks, you only need to implement one interface, and it runs all the benchmarks for you.]
It is a more-or-less standard tool for benchmarking how well your model performs on a range of tasks (generation, common sense, math, etc.).
There is a huge number of tasks, so if you want to choose an initial set, consider taking the ones that GPT-J reports here: https://huggingface.co/EleutherAI/gpt-j-6B#evaluation-results
Thanks for open-sourcing this! Because the GPU ram requirements are so high, it's hard to rent a large enough single node from any of the major cloud providers. How can you run it in inference mode networked between multiple physical machines?
Thanks!
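No official multi-node recipe ships with the repo, but since the scripts launch via torchrun, a two-node tensor-parallel launch would look roughly like the sketch below. Everything here is a placeholder (addresses, ports, GPU counts, and the elided script arguments), and the interconnect between the machines matters a lot, since tensor parallelism is communication-heavy:

```shell
# On node 0 (also acts as the rendezvous host):
torchrun --nnodes=2 --node_rank=0 --nproc_per_node=4 \
    --master_addr=10.0.0.1 --master_port=29500 \
    megatron_lm/tools/generate_samples_gpt2.py ...  # same args as single-node

# On node 1:
torchrun --nnodes=2 --node_rank=1 --nproc_per_node=4 \
    --master_addr=10.0.0.1 --master_port=29500 \
    megatron_lm/tools/generate_samples_gpt2.py ...
```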
Is there a way to run the model on AWS?
Thanks for this great project.
In your blog post (https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6), you mention using the LAMB optimizer and ZeRO offload, but doesn't ZeRO CPU offload have to use DeepSpeedCPUAdam for good performance?
Also, I did not find the LAMB optimizer code in this project.
Hello, I'm trying to use YaLM to generate text with the pretrained models, but when I run generation I get an error:
RuntimeError: CUDA out of memory. Tried to allocate 76.00 MiB (GPU 0; 5.80 GiB total capacity; 62.50 MiB already allocated; 20.81 MiB free; 64.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My GPU is a 1660 with 6 GB of VRAM. Is there anything I can do about it, or have I wasted a few weeks?
For example, if I can't use the transformers pipeline with it, how should I use the model so that it connects to LangChain?