Comments (13)
@materialvision ohh not yet, but I can look into adding one, perhaps with a flag like --generate-interpolation?
from stylegan2-pytorch.
I upgraded Google Drive storage (100GB for $1.99/mo), went in on Colab Pro ($9.99/mo), and it appears to be able to train with the defaults at --image_size=256, getting 1-2s iterations! Looks like ~54 hours required for 100k iterations.
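As a sanity check on that estimate (my own arithmetic, not part of the library), total wall-clock time is just iterations times seconds per iteration:

```python
def training_hours(iterations, sec_per_iter):
    """Rough wall-clock estimate: total seconds / 3600."""
    return iterations * sec_per_iter / 3600

# 100k iterations at ~2 s/iteration on Colab Pro
print(round(training_hours(100_000, 2.0), 1))  # 55.6 hours; ~54h at ~1.95 s/iter
```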
I tried image_size=1024 and image_size=512 but couldn't get either to fit in the 16GB GPU Colab Pro offers. This is the "high memory" runtime GPU option. I've not tried TPU.
Logging my trials to make it fit at 1024 or 512 below for anyone who's interested:
1024 defaults
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024
Runtime error during the first iteration:
tcmalloc: large alloc 2415919104 bytes == 0x7088000 @ 0x7fc0e5fedb6b ...
tcmalloc: large alloc 1207959552 bytes == 0x99098000 @ 0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbfd0000000 @ 0x7fc0e5fedb6b ...
tcmalloc: large alloc 2415919104 bytes == 0x7fbf40000000 @ 0x7fc0e5fedb6b ...
Traceback (most recent call last):
File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
fire.Fire(train_from_folder)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/bin/stylegan2_pytorch", line 61, in train_from_folder
retry_call(model.train, tries=3, exceptions=NanException)
File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 101, in retry_call
return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 527, in train
generated_images = self.GAN.G(w_styles, noise)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 337, in forward
x, rgb = block(x, rgb, style, input_noise)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 274, in forward
x = self.conv2(x, style2)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 228, in forward
weights = w2 * (w1 + 1)
# RuntimeError: CUDA out of memory. Tried to allocate 6.75 GiB (GPU 0; 15.90 GiB total capacity; 14.78 GiB already allocated; 403.88 MiB free; 14.80 GiB reserved in total by PyTorch)
1024 and 512 via "Memory considerations" recommendations
I tried the following settings from the Memory considerations section of the readme for 1024 and 512, but experienced a similar error where there is just not enough memory.
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=1024 --batch-size=3 --gradient-accumulate-every=5 --network-capacity=16
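For context on why these flags help, --batch-size and --gradient-accumulate-every trade peak memory for wall-clock time: gradients from several small batches are summed before each optimizer step, so the effective batch is batch_size × gradient_accumulate_every while only one small batch's activations live on the GPU at a time. A minimal sketch of the pattern (plain Python standing in for the PyTorch loop; names are illustrative, not the library's):

```python
def accumulated_step(micro_batch_grads, accumulate_every):
    """Average gradients over `accumulate_every` micro-batches, then 'step'.
    Stand-in for: (loss / accumulate_every).backward() per micro-batch,
    followed by a single optimizer.step() and optimizer.zero_grad()."""
    assert len(micro_batch_grads) == accumulate_every
    return sum(g / accumulate_every for g in micro_batch_grads)

# --batch-size=3 --gradient-accumulate-every=5 => effective batch of 15,
# but only 3 images' activations are resident on the GPU at once.
print(accumulated_step([1.0, 2.0, 3.0, 4.0, 5.0], 5))  # 3.0 (mean gradient)
```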
fp16, Apex not available
Tried --fp16 at 1024, but hit this "Apex not available" error:
Traceback (most recent call last):
File "/usr/local/bin/stylegan2_pytorch", line 66, in <module>
fire.Fire(train_from_folder)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/bin/stylegan2_pytorch", line 42, in train_from_folder
fp16 = fp16
File "/usr/local/lib/python3.6/dist-packages/stylegan2_pytorch/stylegan2_pytorch.py", line 458, in __init__
assert not fp16 or fp16 and APEX_AVAILABLE, 'Apex is not available for you to use mixed precision training'
AssertionError: Apex is not available for you to use mixed precision training
512 defaults
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512
Similar runtime error:
# RuntimeError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 15.90 GiB total capacity; 12.73 GiB already allocated; 1.49 GiB free; 13.71 GiB reserved in total by PyTorch)
So close to fitting, but just ~200MB over.
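That "~200MB" figure follows directly from the OOM message above (my own arithmetic): the allocator wanted 1.69 GiB with only 1.49 GiB free.

```python
# Figures from the CUDA OOM message above (GiB)
requested, free = 1.69, 1.49
shortfall_mb = (requested - free) * 1024
print(round(shortfall_mb))  # ~205 MB short
```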
512 batch=1
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=5 --network-capacity=16
RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 15.90 GiB total capacity; 13.94 GiB already allocated; 29.88 MiB free; 15.17 GiB reserved in total by PyTorch)
Smaller network capacity
I was able to get it to start training on 512px images by lowering batch-size and gradient-accumulate-every to 1 and setting network-capacity to 8 (1s/iteration):
stylegan2_pytorch --data='/content/drive/My Drive/path/to/images' --image_size=512 --batch-size=1 --gradient-accumulate-every=1 --network-capacity=8
Conclusion
Not sure it's worth waiting two days with these settings if the result will be low quality, so I'm kicking off a default 256 run now. Thanks again for the library!
Thank you. Of course then 1024px is difficult to get to work on my 8GB GPU... I will try more settings, but it seems like 512 is the largest possible.
@dancrew32 Hello Dan, unfortunately not at the moment. I think the cheapest route is to use the official stylegan2 repository from Nvidia, and to train on Colab for free. You can checkpoint every so often to your google drive, and resume for a couple days. I will eventually get around to making this library compatible with Microsoft's Deepspeed for accelerated distributed training
Thanks for sharing your trials. I had been planning to test 512 on Colab, but it seems you hit problems there too. The only thing I would try is raising gradient-accumulate-every; as stated in the readme, it should be higher as you go down on batch-size. I manage to run on my RTX 2070 with --image-size 512 --batch-size 1 --gradient-accumulate-every 20 --network-capacity 7, but results stop improving after some epochs... Let us know if you get some good Colab results!
@materialvision hello! There is a setting --num_image_tiles that you can set to 1 if you desire single images.
Thanks! What about size? Right now the images are only 128x128 px; the --image_size setting makes no difference...
@materialvision you can only generate at the image size that you trained on
One more newbie question... Will it be possible to generate interpolations as series of images (for animation)?
That would be amazing, thanks!
Say that you trained at image_size 128px and you spent a little while (and $) training it on a p2.xlarge (https://aws.amazon.com/ec2/instance-types/p2/). Say the results are great, but now you want to increase resolution. Is there a strategy for "upgrading" the model to 1024px or larger (2048px? 7680 × 4320 8K?) without having to redo all of the 100k num_train_steps at --image_size 1024?
Also any suggestions for training this thing faster/cheaper? Recommendations for GPUs also welcome. Thanks for sharing your implementations/setups, everyone.
Thanks for the suggestion @materialvision, those settings are working on Colab getting ~5s/iteration (140 hours to 100k iterations, so 6 days haha). I guess you can run two notebooks with the high ram GPU, so I'm doing a shootout of 256 default vs. 512 with your settings.
Hello, total noob here. Is there a way to generate similar images to a specific image we like? And can we ask for more than one image with --generate? (like --generate --num-image-tiles 1 --num-images 5)
Upon some more reading, I guess what I want is to be able to explore the latent space.
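Exploring the latent space usually comes down to interpolating between latent vectors and generating an image for each step. A generic sketch of the loop (plain Python; this is not the library's API, just the shape of the computation):

```python
def lerp(z1, z2, t):
    """Pointwise linear interpolation between two latent vectors."""
    return [a + (b - a) * t for a, b in zip(z1, z2)]

def interpolation_frames(z1, z2, steps):
    """Latents for a smooth z1 -> z2 animation; feed each one to the generator."""
    return [lerp(z1, z2, i / (steps - 1)) for i in range(steps)]

# Tiny 2-D latents for illustration; real StyleGAN2 latents are 512-D.
frames = interpolation_frames([0.0, 0.0], [1.0, 2.0], steps=5)
print(frames[2])  # midpoint: [0.5, 1.0]
```

In practice spherical interpolation (slerp) in z, or lerp in the mapped w space, often looks smoother than raw lerp in z, but the frame loop is the same.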