Comments (9)
Running torchrun --nnodes=1 --nproc_per_node=2 train.py --config configs/training/train_stage_1.yaml on an A100-80G results in OOM.
The bug is in the ReferenceNetAttention class: some CUDA tensors are not released, which causes memory usage to increase at each step (see the sketch after this exchange).
Thanks for identifying the issue in the ReferenceNetAttention class. Could you create a pull request with your fix?
I fixed the bug, thank you!
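The leak pattern described above is typical of reference-only attention implementations: each forward pass appends hidden states to a feature "bank" that is never cleared, so activations from every step stay alive on the GPU. A minimal sketch of the pattern and the fix, with hypothetical names (the actual class in this repo may differ):

import torch

class ReferenceBank:
    # Hypothetical stand-in for the hidden-state bank inside a
    # reference-only attention module; names are illustrative.
    def __init__(self):
        self.bank = []

    def write(self, hidden_states: torch.Tensor):
        # Appending the live tensor keeps its autograd graph (and all
        # upstream activations) alive across steps -- this is the leak.
        self.bank.append(hidden_states)

    def clear(self):
        # Fix: drop the stored tensors after every training step so the
        # CUDA caching allocator can reuse the memory.
        self.bank.clear()

Calling clear() once per optimizer step, after the reference features have been consumed, keeps per-step memory flat instead of growing.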
I was able to train on an 80G machine. If you want to train on a 40G machine, I would recommend lowering the batch size and increasing gradient accumulation (sketched below); if it still OOMs, you can use DeepSpeed (I'll be integrating DeepSpeed training in the near future if there's enough training data).
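For reference, here is a generic PyTorch sketch of the batch-size / gradient-accumulation trade (not this repo's train.py; the toy model, optimizer, and loader are placeholders):

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 1).to(device)          # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [torch.randn(2, 8, device=device) for _ in range(8)]  # micro-batches of 2

accum_steps = 4  # effective batch = micro-batch size * accum_steps
optimizer.zero_grad()
for step, batch in enumerate(loader):
    loss = model(batch).pow(2).mean() / accum_steps  # scale so gradients average
    loss.backward()                                  # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Halving the micro-batch and doubling accum_steps keeps the effective batch size the same while roughly halving activation memory.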
Can you provide your environment and training logs?
1. NVIDIA-SMI
470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM... On | 00000000:C9:00.0 Off | 0 |
| N/A 32C P0 65W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM... On | 00000000:CF:00.0 Off | 0 |
| N/A 30C P0 68W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
2. pip list
accelerate 0.25.0
aiohttp 3.9.1
aiosignal 1.3.1
altair 5.2.0
antlr4-python3-runtime 4.9.3
appdirs 1.4.4
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.1.0
black 23.7.0
blinker 1.7.0
braceexpand 0.1.7
cachetools 5.3.2
certifi 2023.11.17
chardet 5.1.0
charset-normalizer 3.3.2
click 8.1.7
clip 1.0
cmake 3.27.9
contourpy 1.2.0
cycler 0.12.1
decorator 5.1.1
decord 0.6.0
diffusers 0.24.0
docker-pycreds 0.4.0
einops 0.7.0
exceptiongroup 1.2.0
executing 2.0.1
fairscale 0.4.13
filelock 3.13.1
fire 0.5.0
fonttools 4.46.0
frozenlist 1.4.0
fsspec 2023.12.1
ftfy 6.1.3
gitdb 4.0.11
GitPython 3.1.40
huggingface-hub 0.19.4
idna 3.6
imageio 2.33.1
importlib-metadata 6.11.0
invisible-watermark 0.2.0
ipython 8.18.1
jedi 0.19.1
Jinja2 3.1.2
jsonschema 4.20.0
jsonschema-specifications 2023.11.2
kiwisolver 1.4.5
kornia 0.6.9
lightning-utilities 0.10.0
lit 17.0.6
loralib 0.1.2
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.2
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
mypy-extensions 1.0.0
natsort 8.4.0
networkx 3.2.1
ninja 1.11.1.1
numpy 1.26.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
omegaconf 2.3.0
open-clip-torch 2.23.0
opencv-python 4.6.0.66
packaging 23.2
pandas 2.1.3
parso 0.8.3
pathspec 0.11.2
pexpect 4.9.0
Pillow 10.1.0
pip 23.3.1
platformdirs 4.1.0
prompt-toolkit 3.0.41
protobuf 3.20.3
psutil 5.9.6
ptyprocess 0.7.0
pudb 2023.1
pure-eval 0.2.2
pyarrow 14.0.1
pydeck 0.8.1b0
Pygments 2.17.2
pyparsing 3.1.1
python-dateutil 2.8.2
pytorch-lightning 2.0.1
pytz 2023.3.post1
PyWavelets 1.5.0
PyYAML 6.0.1
referencing 0.31.1
regex 2023.10.3
requests 2.31.0
rich 13.7.0
rpds-py 0.13.2
safetensors 0.4.1
scipy 1.11.4
sentencepiece 0.1.99
sentry-sdk 1.38.0
setproctitle 1.3.3
setuptools 68.0.0
six 1.16.0
smmap 5.0.1
stack-data 0.6.3
streamlit 1.29.0
sympy 1.12
tenacity 8.2.3
tensorboardX 2.6
termcolor 2.4.0
timm 0.9.12
tokenizers 0.12.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.0.1
torchaudio 2.0.2
torchdata 0.6.1
torchmetrics 1.2.1
torchvision 0.15.2
tornado 6.4
tqdm 4.66.1
traitlets 5.14.0
transformers 4.32.0
triton 2.0.0
typing_extensions 4.8.0
tzdata 2023.3
tzlocal 5.2
urllib3 1.26.18
urwid 2.3.4
urwid-readline 0.13
validators 0.22.0
wandb 0.16.1
watchdog 3.0.0
wcwidth 0.2.12
webdataset 0.2.83
wheel 0.41.2
xformers 0.0.22
yarl 1.9.3
zipp 3.17.0
3. cmd
CUDA_VISIBLE_DEVICES=6,7 torchrun --nnodes=1 --nproc_per_node=2 train.py --config configs/training/train_stage_1.yaml
4. logs
Steps: 0%| | 27/30000 [00:24<7:22:00, 1.13it/s, lr=0.0001, step_loss=0.0587]Traceback (most recent call last):
File "AnimateAnyone-unofficial/train.py", line 574, in
main(name=name, launcher=args.launcher, use_wandb=args.wandb, **config)
File "AnimateAnyone-unofficial/train.py", line 468, in main
scaler.step(optimizer)
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
return wrapped(*args, **kwargs)
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
ret = func(self, *args, **kwargs)
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/optim/adamw.py", line 171, in step
adamw(
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/optim/adamw.py", line 321, in adamw
func(
File "anaconda/envs/generative-models/lib/python3.10/site-packages/torch/optim/adamw.py", line 566, in _multi_tensor_adamw
denom = torch._foreach_add(exp_avg_sq_sqrt, eps)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 79.35 GiB total capacity; 77.04 GiB already allocated; 5.19 MiB free; 77.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
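A side note on the allocator hint in this error: reserved memory (77.36 GiB) barely exceeds allocated memory (77.04 GiB), so fragmentation is probably not the root cause here, but the suggested setting can still be tried by exporting the variable before launch (128 is an arbitrary starting value):
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 CUDA_VISIBLE_DEVICES=6,7 torchrun --nnodes=1 --nproc_per_node=2 train.py --config configs/training/train_stage_1.yaml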
You can contact me at [email protected], and I will check this issue carefully when I have time.
I will reuse the environment of [magic-animate] first, and then check the code.
Related Issues (20)
- 📜 Stage 1 training guide
- AnimateAnyone is tested with 10 TikTok-style videos; where can these 10 videos be downloaded? HOT 1
- Problem with training with H100 HOT 1
- Where can the animation_stage_2_hack.yaml file be downloaded? HOT 1
- Noisy background with training 1024x1024 HOT 3
- Invalid data found when processing input HOT 2
- Don't know how to use git HOT 2
- Question about parameter
- Stage 1 inference steps HOT 1
- Query on Motion Module Version HOT 2
- training image size HOT 1
- fix triton & animation_stage_2_hack.yaml & sd15_mmv1.ckpt HOT 1
- request link download pretrained model HOT 1
- Why encoder_hidden_state is used in the motion module?
- how to extract pose from video? HOT 2
- hack_poseguider HOT 1
- how long it takes to train the first stage and second stage? HOT 1
- How is referencenet(latents_ref_img, ref_timesteps, encoder_hidden_states) in Line 492 in train.py connected to the loss? HOT 1
- inference code for original animate anyone architecture
- What is `attn_weight` for? I don't see `attn_weight` being used anywhere else