scaling-up-stylegan2's Issues

Attention Layer

Hey l4rz,
thanks for the extensive research on this topic.

Have you considered adding attention layer(s) instead of increasing capacity to achieve higher quality?
For example, lucidrains (https://github.com/lucidrains/stylegan2-pytorch) claims this greatly improves results.

However, adding attention to the XXL model will probably yield OOMs. It would be interesting to see which improves results more: more convolutional filters or attention.
We also know that on harder tasks (more classes and poses), e.g. ImageNet, StyleGANs fail at modeling larger structures.
We also know that BigGAN, which employs attention at the 64² scale (correct me if I'm wrong), does a far better job here.

I would just like to know what your experiences and thoughts are here.
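
For reference, the kind of self-attention block I have in mind is roughly the following (a SAGAN-style sketch in PyTorch, not lucidrains' exact code). The B x HW x HW attention map is the memory hog, which is why I expect OOMs at higher resolutions:

# SAGAN-style self-attention block (a sketch, not lucidrains' exact implementation).
# At 64x64 the attention map is already 4096 x 4096 per sample.
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # B x HW x C//8
        k = self.key(x).view(b, -1, h * w)                     # B x C//8 x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)          # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                   # B x C x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                            # residual connection

# e.g. inserted once at the 64x64 feature maps, as BigGAN does:
# x = SelfAttention(512)(x)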

Too many artifacts on generated images - will try your recommendation

This is my original StyleGAN2-ADA setup; results remain poor even at 24,000 kimg (24 million images shown to the network).

  • 47,000 total images (256x256) with 19 conditional classes
  • 4x V100, batch size 64
  • RAM/CPU more than sufficient
  • Ubuntu 18 / PyTorch 1.8 / CUDA 11.1
  • Used paper256, R1 gamma = 10, mit-han-lab augmentation with all three policies (see the sketch after this list)
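
By "all three" I mean the color, translation and cutout policies from mit-han-lab/data-efficient-gans (DiffAugment). A minimal sketch of how it is applied, with both real and generated images augmented before the discriminator sees them:

# Applying DiffAugment with all three policies; run_D is an illustrative helper.
from DiffAugment_pytorch import DiffAugment

policy = 'color,translation,cutout'

def run_D(D, img, c):
    # c: conditioning labels (19 classes in this dataset)
    return D(DiffAugment(img, policy=policy), c)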

REAL
[image attachment: real training samples]

FAKE
[image attachment: generated samples (fakes000000)]

Not good AT ALL!
I am now trying your setup; any further hints for this dataset are welcome. My planned three-stage schedule (chained as sketched below):
'paper256_STEVE_START': dict(ref_gpus=8, kimg=10000, mb=64, mbstd=8, fmaps=0.5, lrate=2e-3, gamma=10, ema=20, ramp=None, map=8),
'paper256_STEVE_MID_10000': dict(ref_gpus=8, kimg=10000, mb=64, mbstd=8, fmaps=0.5, lrate=2e-4, gamma=10, ema=20, ramp=None, map=8),
'paper256_STEVE_END_20000': dict(ref_gpus=8, kimg=50000, mb=64, mbstd=8, fmaps=0.5, lrate=1e-4, gamma=5, ema=20, ramp=None, map=8),
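
The intent is to chain these stages, each run resuming from the previous stage's last snapshot. Roughly, with hypothetical helper names (launch_run and latest_snapshot are illustrative wrappers around train.py, not real APIs):

# Hypothetical driver for the three-stage schedule above.
stages = ['paper256_STEVE_START', 'paper256_STEVE_MID_10000', 'paper256_STEVE_END_20000']
resume = None
for cfg in stages:
    run_dir = launch_run(cfg=cfg, resume_pkl=resume)  # wraps train.py --cfg=... --resume=...
    resume = latest_snapshot(run_dir)                 # e.g. network-snapshot-*.pkl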

I am also setting D_lrate = 2x G_lrate:
args.G_opt_kwargs = dnnlib.EasyDict(class_name='torch.optim.Adam', lr=spec.lrate, betas=[0,0.99], eps=1e-8)    # generator LR
args.D_opt_kwargs = dnnlib.EasyDict(class_name='torch.optim.Adam', lr=spec.lrate*2, betas=[0,0.99], eps=1e-8)  # discriminator LR = 2x generator (TTUR-style)

So at each kimg milestone I will try the settings above. My dataset seems somewhat CIFAR-like, so I am also disabling style mixing and path length (pl_weight) regularization. Should I?
args.loss_kwargs.pl_weight = 0 # disable path length regularization
args.loss_kwargs.style_mixing_prob = 0 # disable style mixing
args.D_kwargs.architecture = 'orig' # disable residual skip connections

I am not re-training from scratch but resuming from the last pre-trained snapshot (which used style mixing, gamma=10, etc.). Is that a problem, or should I re-train completely?
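
From what I can tell, resuming in stylegan2-ada-pytorch only copies the network weights from the pickle; optimizer state and the old loss settings (gamma, style mixing probability, etc.) are not restored, so the new config takes effect immediately. The pattern in training_loop.py looks roughly like:

# Paraphrased resume pattern from stylegan2-ada-pytorch's training_loop.py:
# only parameters/buffers are copied, not optimizer state or loss settings.
import dnnlib
import legacy
from torch_utils import misc

if resume_pkl is not None:
    with dnnlib.util.open_url(resume_pkl) as f:
        resume_data = legacy.load_network_pkl(f)
    for name, module in [('G', G), ('D', D), ('G_ema', G_ema)]:
        misc.copy_params_and_buffers(resume_data[name], module, require_all=False)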

Thanks, mate, any help is appreciated.
Steve

Great article, would love to see if you could try a diffusion model?

Great articles, from "Practical aspects of StyleGAN2 training" to "Scaling up StyleGAN2"; a lot of the insights are really helpful.
Just wondering if you could try something like a diffusion model, or another state-of-the-art model, for better synthetic image quality?

Thanks.
