Comments (13)
@materialvision ohh not yet, but I can look into adding one, perhaps with a flag like --generate-interpolation?
from stylegan2-pytorch.
I upgraded Google Drive storage (100GB for $1.99/mo), went in on Colab Pro ($9.99/mo), and it appears to be able to train with the defaults at --image_size=256, getting 1-2s iterations! Looks like ~54 hours required for 100k iterations.
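As a sanity check on that estimate (my own arithmetic, not part of the library), total wall-clock time is just iterations times seconds per iteration:

```python
def training_hours(iterations, sec_per_iter):
    """Rough wall-clock estimate: total seconds / 3600."""
    return iterations * sec_per_iter / 3600

# 100k iterations at ~2 s/iteration on Colab Pro
print(round(training_hours(100_000, 2.0), 1))  # 55.6 hours; ~54h at ~1.95 s/iter
```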
I tried image_size=1024 and image_size=512 but couldn't get either to fit in the 16GB GPU Colab Pro offers. This is the "high memory" runtime GPU option. I've not tried TPU.
Logging my trials to make it fit at 1024 or 512 below for anyone who's interested:
1024 defaults
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024
Runtime error during the first iteration:
tcmalloc: large alloc 2415919104 bytes == 0x7088000 @ 0x7fc0e5fedb6b ...
tcmalloc: large alloc 1207959552 bytes == 0x99098000 @ 0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbfd0000000 @ 0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbf40000000 @ 0x7fc0e5fedb6b ...
Traceback (most recent call last):
File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
fire.Fire(train_from_folder)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/bin/stylegan2_pytorch", line 61, in train_from_folder
retry_call(model.train, tries=3, exceptions=NanException)
File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 101, in retry_call
return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 527, in train
generated_images = self.GAN.G(w_styles, noise)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 337, in forward
x, rgb = block(x, rgb, style, input_noise)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 274, in forward
x = self.conv2(x, style2)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 228, in forward
weights = w2 * (w1 + 1)
# RuntimeError: CUDA out of memory. Tried to allocate 6.75 GiB (GPU 0; 15.90 GiB total capacity; 14.78 GiB already allocated; 403.88 MiB free; 14.80 GiB reserved in total by PyTorch)
1024 and 512 via "Memory considerations" recommendations
I tried the following settings from the Memory considerations section of the readme for 1024 and 512, but experienced a similar error where there is just not enough memory.
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024 --batch-size=3 --gradient-accumulate-every=5 --network-capacity=16
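For context on why these flags help, --batch-size and --gradient-accumulate-every trade peak memory for wall-clock time: gradients from several small batches are summed before each optimizer step, so the effective batch is batch_size × gradient_accumulate_every while only one small batch's activations live on the GPU at a time. A minimal sketch of the pattern (plain Python standing in for the PyTorch loop; names are illustrative, not the library's):

```python
def accumulated_step(micro_batch_grads, accumulate_every):
    """Average gradients over `accumulate_every` micro-batches, then 'step'.
    Stand-in for: (loss / accumulate_every).backward() per micro-batch,
    followed by a single optimizer.step() and optimizer.zero_grad()."""
    assert len(micro_batch_grads) == accumulate_every
    return sum(g / accumulate_every for g in micro_batch_grads)

# --batch-size=3 --gradient-accumulate-every=5 => effective batch of 15,
# but only 3 images' activations are resident on the GPU at once.
print(accumulated_step([1.0, 2.0, 3.0, 4.0, 5.0], 5))  # 3.0 (mean gradient)
```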
fp16, Apex not available
Tried --fp16 at 1024, but hit this "Apex not available" error:
Traceback (most recent call last):
File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
fire.Fire(train_from_folder)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/bin/stylegan2_pytorch", line 42, in train_from_folder
fp16 = fp16
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 458, in __init__
assert not fp16 or fp16 and APEX_AVAILABLE, 'Apex is not available for you to use mixed precision training'
AssertionError: Apex is not available for you to use mixed precision training
512 defaults
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512
Similar runtime error:
# RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 15.90 GiB total capacity; 12.73 GiB already allocated; 1.49 GiB free; 13.71 GiB reserved in total by PyTorch)
So close to fitting, but just ~200MB over.
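That "~200MB" figure follows directly from the OOM message above (my own arithmetic): the allocator wanted 1.69 GiB with only 1.49 GiB free.

```python
# Figures from the CUDA OOM message above (GiB)
requested, free = 1.69, 1.49
shortfall_mb = (requested - free) * 1024
print(round(shortfall_mb))  # ~205 MB short
```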
512 batch=1
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=5 --network-capacity=16
RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 15.90 GiB total capacity; 13.94 GiB already allocated; 29.88 MiB free; 15.17 GiB reserved in total by PyTorch)
Smaller network capacity
I was able to get it to start training on 512px images by lowering batch-size and gradient-accumulate-every to 1 and setting network-capacity to 8 (1s/iteration):
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=1 --network-capacity=8
Conclusion
Not sure it's worth waiting two days with these settings if the result will be low quality, so I'm kicking off a default 256 run now. Thanks again for the library!
Thank you. Of course then 1024px is difficult to get to work on my 8GB GPU... I will try more settings, but it seems like 512 is the largest possible.
@dancrew32 Hello Dan, unfortunately not at the moment. I think the cheapest route is to use the official stylegan2 repository from Nvidia, and to train on Colab for free. You can checkpoint every so often to your google drive, and resume for a couple days. I will eventually get around to making this library compatible with Microsoft's Deepspeed for accelerated distributed training
Thanks for sharing your trials. I had been planning to test 512 on Colab, but it seems you hit problems there too. The only thing I would try is raising gradient-accumulate-every; as stated in the readme, it should be higher as you go down on batch-size. I manage to run on my RTX 2070 with --image-size 512 --batch-size 1 --gradient-accumulate-every 20 --network-capacity 7, but results stop improving after some epochs... Let us know if you get some good Colab results!
@materialvision hello! There is a setting --num_image_tiles that you can set to 1 if you desire single images.
Thanks! What about size? Right now the images are only 128x128 px; the --image_size setting makes no difference...
@materialvision you can only generate at the image size that you trained on
One more newbie question... Will it be possible to generate interpolations as series of images (for animation)?
That would be amazing, thanks!
Say that you trained at image_size 128px and you spent a little while (and $) training it on a p2.xlarge (https://aws.amazon.com/ec2/instance-types/p2/). Say the results are great, but now you want to increase resolution. Is there a strategy for "upgrading" the model to 1024px or larger (2048px? 7680 × 4320 8K?) without having to redo all of the 100k num_train_steps at --image_size 1024?
Also any suggestions for training this thing faster/cheaper? Recommendations for GPUs also welcome. Thanks for sharing your implementations/setups, everyone.
Thanks for the suggestion @materialvision, those settings are working on Colab getting ~5s/iteration (140 hours to 100k iterations, so 6 days haha). I guess you can run two notebooks with the high ram GPU, so I'm doing a shootout of 256 default vs. 512 with your settings.
Hello, total noob here. Is there a way to generate similar images to a specific image we like? And can we ask for more than one image with --generate? (like --generate --num-image-tiles 1 --num-images 5)
Upon some more reading, I guess what I want is to be able to explore the latent space.
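Exploring the latent space usually comes down to interpolating between latent vectors and generating an image for each step. A generic sketch of the loop (plain Python; this is not the library's API, just the shape of the computation):

```python
def lerp(z1, z2, t):
    """Pointwise linear interpolation between two latent vectors."""
    return [a + (b - a) * t for a, b in zip(z1, z2)]

def interpolation_frames(z1, z2, steps):
    """Latents for a smooth z1 -> z2 animation; feed each one to the generator."""
    return [lerp(z1, z2, i / (steps - 1)) for i in range(steps)]

# Tiny 2-D latents for illustration; real StyleGAN2 latents are 512-D.
frames = interpolation_frames([0.0, 0.0], [1.0, 2.0], steps=5)
print(frames[2])  # midpoint: [0.5, 1.0]
```

In practice spherical interpolation (slerp) in z, or lerp in the mapped w space, often looks smoother than raw lerp in z, but the frame loop is the same.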