Comments (27)
@oliverguhr it is because, in the latest version, I introduced a hack used in BigGAN and StyleGAN called truncation. It brings the intermediate style vector closer to its average, cutting off the outlier distributions. This generally results in better image quality.
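For readers unfamiliar with the trick, here is a minimal NumPy sketch of truncation (illustrative names and shapes; the actual implementation tracks a running average of the intermediate style vectors during training):

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    """Truncation trick: pull a style vector toward the running average.

    psi = 1.0 leaves w unchanged; psi = 0.0 collapses every sample
    to the average style, trading diversity for fidelity.
    """
    return w_avg + psi * (w - w_avg)

# toy example: 512-dim style vectors
rng = np.random.default_rng(0)
w_avg = np.zeros(512)             # running mean of intermediate styles
w = rng.normal(size=512)          # a sampled (possibly outlier) style
w_trunc = truncate(w, w_avg, psi=0.7)

# the truncated vector is strictly closer to the average
assert np.linalg.norm(w_trunc - w_avg) < np.linalg.norm(w - w_avg)
```

Values of psi around 0.5-0.7 are a common middle ground between sample quality and variety.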
from stylegan2-pytorch.
For me, there was no noticeable difference in the results between a batch size of 3 with a network capacity of 32 and a batch size of 7 with a network capacity of 24. But there is a difference with the newest 0.4.23 version: the model picks up the structure of the faces much quicker and produces better results with fewer iterations. Here is a preview of my current training results from 0 to 160,000 iterations in 10,000-iteration steps.
Could anyone provide us with a pre-trained PyTorch model? I assume most people won't bother training their own models, and you'd also help save the planet by not having everybody train a model for a week on 1313432 V100 GPUs.
In the original implementation (as in https://github.com/nvlabs/stylegan2), the default is to train for 25,000 kimg, i.e. 25,000,000 real images shown to the discriminator. I believe this is simply a lack of training. After all, the paper claims to have trained on 8 V100s for as long as a week to yield superior results.
I trained the model a while longer (200k iterations), with the best results at about 160k iterations.
After that, however, it only got worse, and there are still some artefacts in it.
Since I want to know which parameter leads to the improvement, I am currently running a second try with the default batch size of 3 and network capacity of 32. I will post an update on that tomorrow, and then retrain with your latest patches.
Thank you very much! I downloaded the full-resolution images and started the training with your updates. I'll post the results as soon as it's done (+33h).
First of all, thank you for all the work you put into this!
I played a bit with the parameters and started a new run with a batch size of 7 and a network capacity of 24. This is the maximum that fits in my 11 GB of VRAM. It's way slower and still running, but the losses are much more stable and the result looks better. Here is the result after 123k iterations.
I trained the previous model with the default batch size of 3. I think these small batches could be the reason why the model had problems; I have had issues in the past where small batches led to unstable loss gradients. But since I also changed the network capacity, I am not sure which parameter led to the improvement.
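When VRAM caps the batch size, one common workaround for small-batch instability is to accumulate gradients over several micro-batches before each optimizer step. A generic plain-NumPy sketch of the idea (illustrative names, not this repo's API):

```python
import numpy as np

def accumulated_grad(micro_grads):
    """Average gradients from several micro-batches before one update,
    mimicking a larger effective batch size on limited VRAM."""
    return np.mean(micro_grads, axis=0)

# three micro-batches of noisy gradient estimates for one parameter vector
micro_grads = [np.array([1.0, 2.0]),
               np.array([3.0, 0.0]),
               np.array([2.0, 4.0])]

lr = 0.1
param = np.zeros(2)
param -= lr * accumulated_grad(micro_grads)  # one update per 3 micro-batches
print(param)  # -> [-0.2 -0.2]
```

The effective batch size becomes micro-batch size times accumulation steps, at the cost of proportionally fewer optimizer steps per epoch.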
@oliverguhr oh my, that looks great! I have made some further changes and fully removed the ratio data augmentation in the newest version. Yes, the network capacity corresponds linearly with the number of parameters, and as you know, with deep learning, the bigger the better lol
I will need to look into setting some new defaults for batch size. I agree they are probably not as big as they should be.
Hi @yuanlunxi, here you can read more about FP16.
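In short, FP16 stores tensors in half-precision floats, halving memory at the cost of range and precision, which is why mixed-precision training needs tricks like loss scaling. A quick NumPy illustration (not the repo's implementation, which relied on Apex at the time):

```python
import numpy as np

x32 = np.ones(1_000_000, dtype=np.float32)
x16 = x32.astype(np.float16)

# half the memory footprint for the same number of elements
print(x32.nbytes, x16.nbytes)   # 4000000 2000000

# but FP16 overflows much sooner (max finite value is about 65504) ...
print(np.float16(70000.0))      # inf

# ... which is why mixed-precision training keeps gradients scaled
# and falls back to FP32 for numerically sensitive ops.
```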
I did not share my model because the results are not perfect yet. I don't know what to expect, but the results did not look as good as what Nvidia published.
Hi Oliver,
Thanks for trying this replication. I made a small change to the data augmentation that may give you better results: f00ba1f. You should try it at a larger resolution if you can; the amount of data should be sufficient.
The results look a bit strange. Here is the model output after 195 epochs:
Left: the model trained on version 0.4.15 with the small images;
right: the model trained on version 0.4.16 with the full-resolution images.
Also, the model output didn't change much from epoch 50 to 195.
@oliverguhr Indeed, I tried doing a couple runs on my own dataset (which took an entire day) and confirmed that it stopped learning - perhaps the relaxed data augmentation made it too easy for the discriminator. I brought back the random aspect ratios for now, but a bit closer to a ratio of 1, so it should be less distorted than before. I am doing another training run at the moment to verify that learning is restored. Sorry about that, and I will continue to look into this issue to see how I can fully remove the random aspect ratios.
Hi, what does the argument 'fp16' mean, and how do I use it?
Could you share a pretrained model for faces?
@oliverguhr Hi, if you have some better results, please share them here. Which version of the code did you try? Only 0.4.23?
Did you try the newest version, e.g. 0.14.1?
@Johnson-yue I started a new training run with the latest version of the code and it looks promising. I am using two attention layers and a resolution of 128x128.
This is a sample after 472,000 iterations. There is a long way to go until 25 million iterations.
Unfortunately, I was not able to start training with FP16. Apex is running, but at some point the script fails with a null exception.
@oliverguhr good result!!
I don't know what happened, but up to iteration 682k the results only got worse:
One(!) iteration later, the image looked like this:
And after some more iterations, the images went completely dark.
@lucidrains Do you have any idea what happened here? I can provide the models and results if this helps.
Sorry for the late response. Here is a list of trained models (and some sample results) that you can download:
model_203.pt
model_300.pt
model_400.pt
model_500.pt
model_550.pt
model_600.pt
model_650.pt
model_700.pt
model_757.pt
@oliverguhr which commit were you using to train? I'm trying to load the model you provided, but I'm not able to load it into the GAN; some keys are missing when loading the module: "..._blocks.1.1.fn.fn.2.weight", "D_aug.D.attn_blocks.1.1.fn.fn.2.bias", "D_aug.D.final_conv.weight", "D_aug.D.final_conv.bias", "D_aug.D.to_logit.weight", "D_aug.D.to_logit.bias" ..."
@jomach Version 1.2.3
I wonder if this should be part of the config.json.
I think this comes from saving only the dictionary instead of the full model...
My bad. Never mind.
Excuse me, what dataset are you using for training? I use FFHQ, but the training is really too slow...
It is in the first post:
"Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces, ..."
Are you using multiple GPUs? How long do I need to train to achieve a good result? I ran 1,000 pictures from FFHQ on Colab, and it took 120 hours for 150,000 iterations. Is this normal?
You can find expected training times for StyleGAN2 here: https://github.com/NVlabs/stylegan2-ada-pytorch#expected-training-time
For 128x128 resolution, with only 1 GPU, you should expect 13 seconds per kimg of training.
For full training with the recommended 25000 kimg, that is about 4 days of training (with 24h/day, which you cannot have on Colab).
Moreover, you won't have the same GPU every time on Colab. So if you end up with a bad one, that is more training time.
Finally, it is hard to judge your 150,000 iterations, because you don't mention the batch size or the kimg per iteration. If your parameters are similar to the ones mentioned in this post, I would expect similar results: #33 (comment)
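The arithmetic behind that estimate, with the 13 s/kimg figure for 128x128 on a single GPU taken from NVIDIA's table as an assumption:

```python
SEC_PER_KIMG = 13          # 128x128, 1 GPU (from NVIDIA's expected-time table)
TARGET_KIMG = 25_000       # recommended full training length

total_sec = SEC_PER_KIMG * TARGET_KIMG
print(total_sec / 86_400)  # about 3.76 days of uninterrupted training
```

Colab sessions are time-limited and GPU models vary, so wall-clock time in practice will be longer than this lower bound.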
Do you mean that 1,000 pictures equal 1 kimg? As I understood it, 1 kimg would be 1,000 fake-image iterations; is this true?
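For clarity: 1 kimg counts 1,000 real images shown to the discriminator, not 1,000 iterations, so converting an iteration count to kimg requires the batch size. An illustrative calculation (using the default batch size of 3 mentioned earlier in this thread as an assumption):

```python
def kimg_seen(iterations, batch_size):
    """1 kimg = 1,000 real images shown to the discriminator,
    so the iteration count alone does not determine kimg."""
    return iterations * batch_size / 1000

# 150,000 iterations at the default batch size of 3
print(kimg_seen(150_000, batch_size=3))  # -> 450.0 kimg (of the 25,000 target)
```

At that rate, the 150k-iteration run above has covered well under 2% of the recommended 25,000 kimg.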