I'm trying to run the code on a cloud machine with an NVIDIA A10 GPU (24 GB VRAM) and I'm getting a CUDA out-of-memory error. Do you have any suggestions for running this with less GPU memory usage?
Here is the full error:
Traceback (most recent call last):
  File "/app/paint-it/paint_it.py", line 320, in <module>
    main(args, guidance)
  File "/app/paint-it/paint_it.py", line 226, in main
    sd_loss = guidance.batch_train_step(text_embedding, obj_image,
  File "/app/paint-it/sd.py", line 135, in batch_train_step
    noise_pred = self.unet(latent_model_input, tt, encoder_hidden_states=text_embeddings).sample
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unets/unet_2d_condition.py", line 1121, in forward
    sample, res_samples = downsample_block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unets/unet_2d_blocks.py", line 1199, in forward
    hidden_states = attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/transformers/transformer_2d.py", line 391, in forward
    hidden_states = block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 400, in forward
    ff_output = self.ff(norm_hidden_states, scale=lora_scale)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 672, in forward
    hidden_states = module(hidden_states, scale)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/activations.py", line 103, in forward
    return hidden_states * self.gelu(gate)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 23.73 GiB total capacity; 20.89 GiB already allocated; 53.62 MiB free; 21.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
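The error message itself suggests setting max_split_size_mb to reduce fragmentation. If I understand the PYTORCH_CUDA_ALLOC_CONF docs correctly, that would look something like the following before launching the script (the 128 value is just my own guess, not a recommended setting):

```shell
# Cap the caching allocator's split blocks at 128 MiB to reduce
# fragmentation, per the hint in the OOM message. The exact value
# is my guess; see the PyTorch Memory Management docs the error
# points to.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then launch as before, e.g.:
# python paint_it.py <usual args>
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Since only ~0.3 GiB of the 23.73 GiB is unallocated here, though, I suspect fragmentation alone isn't the whole story and I'd still need to cut actual memory use.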