
stylegan2-pytorch's Introduction

StyleGAN 2 in PyTorch

Implementation of Analyzing and Improving the Image Quality of StyleGAN (https://arxiv.org/abs/1912.04958) in PyTorch

Notice

I have tried to match the official implementation as closely as possible, but there may be some details I missed, so please use this implementation with care.

Requirements

I have tested on:

  • PyTorch 1.3.1
  • CUDA 10.1/10.2

Usage

First create lmdb datasets:

python prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH

This will convert images to JPEG and pre-resize them. This implementation does not use progressive growing, but you can create multiple-resolution datasets by passing a comma-separated list of sizes, in case you want to try other resolutions later.
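
For example, to build 128/256/512px versions of a dataset from images in ~/images (the paths here are hypothetical):

python prepare_data.py --out data/ffhq.lmdb --n_worker 8 --size 128,256,512 ~/images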

Then you can train the model in a distributed setting:

python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --batch BATCH_SIZE LMDB_PATH

train.py supports Weights & Biases logging. If you want to use it, add the --wandb argument to the script.
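
For example, a 2-GPU run with logging enabled (port and batch size are placeholder values):

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29500 train.py --batch 16 --wandb LMDB_PATH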

SWAGAN

This implementation experimentally supports SWAGAN: A Style-based Wavelet-driven Generative Model (https://arxiv.org/abs/2102.06108). You can train SWAGAN with:

python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --arch swagan --batch BATCH_SIZE LMDB_PATH

As noted in the paper, SWAGAN trains much faster (about 2x at 256px).

Convert weights from official checkpoints

You need to clone the official repository (https://github.com/NVlabs/stylegan2), as it is required to load the official checkpoints.

For example, if you cloned the repository into ~/stylegan2 and downloaded stylegan2-ffhq-config-f.pkl, you can convert it like this:

python convert_weight.py --repo ~/stylegan2 stylegan2-ffhq-config-f.pkl

This will create a converted stylegan2-ffhq-config-f.pt file.

Generate samples

python generate.py --sample N_FACES --pics N_PICS --ckpt PATH_CHECKPOINT

You should specify the size (--size 256, for example) if you trained at a different resolution.
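
For example, to sample from a 256px checkpoint (the checkpoint path is hypothetical):

python generate.py --sample 1 --pics 20 --size 256 --ckpt checkpoint/550000.pt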

Project images to latent spaces

python projector.py --ckpt [CHECKPOINT] --size [GENERATOR_OUTPUT_SIZE] FILE1 FILE2 ...
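
For example, with a converted 1024px FFHQ checkpoint (file names are hypothetical):

python projector.py --ckpt stylegan2-ffhq-config-f.pt --size 1024 face1.png face2.png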

Closed-Form Factorization (https://arxiv.org/abs/2007.06600)

You can use closed_form_factorization.py and apply_factor.py to discover meaningful latent semantic factors or directions in an unsupervised manner.

First, you need to extract eigenvectors of the weight matrices using closed_form_factorization.py:

python closed_form_factorization.py [CHECKPOINT]

This will create a factor file that contains the eigenvectors (default: factor.pt). You can then use apply_factor.py to test the meaning of the extracted directions:

python apply_factor.py -i [INDEX_OF_EIGENVECTOR] -d [DEGREE_OF_MOVE] -n [NUMBER_OF_SAMPLES] --ckpt [CHECKPOINT] [FACTOR_FILE]

For example,

python apply_factor.py -i 19 -d 5 -n 10 --ckpt [CHECKPOINT] factor.pt

This will generate 10 random samples, along with samples generated from latents moved along the 19th eigenvector with degree ±5.

Sample of closed form factorization

Pretrained Checkpoints

Link

I have trained the 256px model on FFHQ for 550k iterations and got an FID of about 4.5. Differences in data preprocessing, resolution, or the training loop could account for this, but currently I don't know the exact reason for the FID difference.

Samples

Sample with truncation

Sample from FFHQ at 110,000 iterations (trained on 3.52M images).

MetFaces sample with non-leaking augmentations

Sample from MetFaces with non-leaking augmentations, at 150,000 iterations (trained on 4.8M images).

Samples from converted weights

Sample from FFHQ (1024px)

Sample from LSUN Church (256px)

License

Model details and custom CUDA kernel code are from the official repository: https://github.com/NVlabs/stylegan2

Code for Learned Perceptual Image Patch Similarity (LPIPS) comes from https://github.com/richzhang/PerceptualSimilarity

To match FID scores more closely with the official TensorFlow implementation, I have used the FID Inception V3 implementation from https://github.com/mseitzer/pytorch-fid

stylegan2-pytorch's People

Contributors

cclauss, jackerz312, levindabhi, matanby, nivha, onion-liu, rosinality, terrybroad, woctezuma


stylegan2-pytorch's Issues

PPL z space

Hi, I tried the following for calculating the PPL score in 'z' space. ppl.py contains the case for 'w' but not 'z'.

Before:

            inputs = torch.randn([batch * 2, latent_dim], device=device)
            lerp_t = torch.rand(batch, device=device)

            if args.space == 'w':
                latent = g.get_latent(inputs)
                latent_t0, latent_t1 = latent[::2], latent[1::2]
                latent_e0 = lerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = lerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
                latent_e = torch.stack([latent_e0, latent_e1], 1).view(*latent.shape)

            image, _ = g([latent_e], input_is_latent=True, noise=noise)

After:

            inputs = torch.randn([batch * 2, latent_dim], device=device)
            lerp_t = torch.rand(batch, device=device)

            latent = g.get_latent(inputs)
            latent_t0, latent_t1 = latent[::2], latent[1::2]
            if args.space == 'w':
                latent_e0 = lerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = lerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
                latent_e = torch.stack([latent_e0, latent_e1], 1).view(*latent.shape)
            else:
                latent_e0 = slerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = slerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
                latent_e = torch.stack([latent_e0, latent_e1], 1).view(*latent.shape)

            image, _ = g([latent_e], input_is_latent=True, noise=noise)

but I received a PPL score in the single digits (far too low to be reasonable). Is this the right way to implement PPL for 'z'?
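
One thing worth double-checking (my reading of the official metric, not a confirmed fix): for 'z' space, the TF implementation slerps the z vectors before the mapping network, rather than slerping the already-mapped w latents. A sketch under that assumption, reusing the slerp from above:

            inputs = torch.randn([batch * 2, latent_dim], device=device)
            lerp_t = torch.rand(batch, device=device)

            if args.space == 'w':
                latent = g.get_latent(inputs)
                latent_t0, latent_t1 = latent[::2], latent[1::2]
                latent_e0 = lerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = lerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
            else:  # 'z': interpolate the inputs first, then map them to w
                z_t0, z_t1 = inputs[::2], inputs[1::2]
                z_e0 = slerp(z_t0, z_t1, lerp_t[:, None])
                z_e1 = slerp(z_t0, z_t1, lerp_t[:, None] + args.eps)
                latent_e0 = g.get_latent(z_e0)
                latent_e1 = g.get_latent(z_e1)

            latent_e = torch.stack([latent_e0, latent_e1], 1).view(batch * 2, -1)

            image, _ = g([latent_e], input_is_latent=True, noise=noise)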

Generate faces

Hi,
I have a checkpoint model, but how can I generate faces from it? The stylegan repo has a generate.py, but here I don't know how to run the checkpoint to produce samples.

Training 1024 × 1024 resolution images with default settings

Hi:
According to the generated samples, I think the results are very good, so I want to train on 1024 × 1024 resolution images. Could I know your default settings? Another problem that bothers me is how to evaluate the trained model; could you give me some suggestions? Thank you.

torch version

Some errors occurred while compiling the code. Can you tell us the version of torch, and the rest of the software environment, such as CUDA, cuDNN, gcc, ninja, and re2c? Thank you!

Errors Running on Colab

When running the command !python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 train.py --batch 1 /content/data/preprocessed on Google Colab, it first complains that 'Ninja is required to load C++ extensions', which can be fixed by running pip install ninja.

Unfortunately, running the command again after restarting the runtime returns the following error, which I can't seem to get around:

0% 0/800000 [00:00<?, ?it/s]Traceback (most recent call last):
 File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
   "__main__", mod_spec)
 File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
   exec(code, run_globals)
 File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
   main()
 File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
   cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--batch', '1', '/content/data/preprocessed']' died with <Signals.SIGFPE: 8>.

Any idea on how to get this to work?

No module named 'fused' error

Hi, I prepared the dataset and ran train.py as described in the readme, but ran into a problem regarding the C++ extension.

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    from model import Generator, Discriminator
  File "/data/llh/projects/stylegan2/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/data/llh/projects/stylegan2/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/data/llh/projects/stylegan2/op/fused_act.py", line 6, in <module>
    fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
    is_python_module)
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 824, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 967, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'fused'

Any idea?

Generating from 18x512 latents

First, thanks for making this library.

I have some 18x512 latents that I optimized using the official version. What parameters should I use to generate the same image from those latents as the official version? I'm using the same weights but getting different results.

floating point exception when batch size is 1

When the batch size is set to 1 and path_batch_shrink is unchanged, the program crashes with a floating point exception, because args.batch // args.path_batch_shrink == 0 in
noise = mixing_noise(
args.batch // args.path_batch_shrink, args.latent, args.mixing, device
)
Probably worth pointing out.
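
A minimal guard against this (a sketch, not a tested patch) would clamp the regularization batch to at least one sample:

path_batch_size = max(args.batch // args.path_batch_shrink, 1)
noise = mixing_noise(path_batch_size, args.latent, args.mixing, device)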

Also, I assume the implementation follows config-f in the paper?

Pretrained Discriminator

Thanks for your repo!
Can you add functions to convert the pretrained TF discriminators?

E.g.
In convert_weight.py, you have
_, _, g_ema = pickle.load(f)

Can you update the conversion scripts to allow:
_, D, g_ema = pickle.load(f)

Overall batch size and batch size per gpu

It looks like in NVIDIA's StyleGAN2 they split the batch between GPUs such that 4 samples are sent to each GPU (minibatch_gpu_base = 4). Based on the default settings in this repo and this comment, it seems as though you do not require 4 samples per GPU. Why is that?

Ask for Projector

Hi, thank you for this repo.
Can you add a projector that finds the closest matching latent vector for an input image, as in the original TensorFlow implementation?

Error of pickle.load() in convert_weight.py

Thank you very much for sharing this amazing code!!

I failed to run the 'convert_weight.py' file.
After setting up the TensorFlow plugin successfully, the conversion process broke down when trying to load the "stylegan2-ffhq-config-f.pkl" file. It seems the '.pkl' file can't be loaded, even though it was downloaded directly from the official Google Drive. Do you have any idea about my problem?


Questions about g_path_regularize function

Hi,
Thank you so much for sharing this. I am also studying how to implement StyleGAN2 in PyTorch. At present the overall framework is basically complete, but there are still some details to implement, such as how to do lazy regularization on a single machine with multiple GPUs. I am a little confused about path[0].backward(): in my understanding, path[0] has no grad_fn and cannot be backpropagated, and since the gradient grad inside g_path_regularize also appears to have no grad_fn, it is not clear to me why path_penalty alone can compute gradients via backpropagation. Hope to get an answer, thank you.
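
For what it's worth, the key seems to be create_graph=True in the autograd.grad call inside g_path_regularize (my reading of the code): it records the gradient computation itself in the graph, so the returned grad does have a grad_fn and the penalty built from it is differentiable. A minimal, self-contained illustration:

    import torch

    x = torch.randn(4, requires_grad=True)
    y = (x ** 3).sum()

    # create_graph=True makes the returned gradient itself differentiable
    grad, = torch.autograd.grad(y, x, create_graph=True)
    print(grad.grad_fn)  # not None

    penalty = grad.pow(2).sum()
    penalty.backward()   # second-order gradients flow back into x.grad
    print(x.grad)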

error importing fused operation in model.py with pytorch 1.4 version

When I try to convert weights from the official TensorFlow checkpoint, there is an error importing the fused operations in model.py. As in the error message at the bottom, a file called 'fused' is missing from the /tmp/torch_extensions folder.
I'm trying PyTorch 1.4, since 1.3 seems to have other problems with various library versions.

Do you have any clue regarding this error?


Windows10 Error building extension

Windows10
VS 2017
PyTorch 1.3.1
CUDA 10.1

Any idea? Thanks

[3/3] "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" fused_bias_act.o fused_bias_act_kernel.cuda.o /nologo /DLL c10.lib torch.lib torch_python.lib _C.lib "/LIBPATH:C:\Program Files\Python37\libs" "/LIBPATH:C:\Program Files\Python37\lib\site-packages\torch\lib" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib/x64" cudart.lib /out:fused.pyd

FAILED: fused.pyd 

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" fused_bias_act.o fused_bias_act_kernel.cuda.o /nologo /DLL c10.lib torch.lib torch_python.lib _C.lib "/LIBPATH:C:\Program Files\Python37\libs" "/LIBPATH:C:\Program Files\Python37\lib\site-packages\torch\lib" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib/x64" cudart.lib /out:fused.pyd

   Creating library fused.lib and object fused.exp

fused_bias_act_kernel.cuda.o : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl c10::cuda::CUDAStream::operator struct CUstream_st *(void)const " (__imp_??BCUDAStream@cuda@c10@@QEBAPEAUCUstream_st@@XZ) referenced in function "class at::Tensor __cdecl fused_bias_act_op(class at::Tensor const &,class at::Tensor const &,class at::Tensor const &,int,int,float,float)" (?fused_bias_act_op@@YA?AVTensor@at@@AEBV12@00HHMM@Z)

fused_bias_act_kernel.cuda.o : error LNK2019: unresolved external symbol "__declspec(dllimport) class c10::cuda::CUDAStream __cdecl c10::cuda::getCurrentCUDAStream(short)" (__imp_?getCurrentCUDAStream@cuda@c10@@YA?AVCUDAStream@12@F@Z) referenced in function "class at::Tensor __cdecl fused_bias_act_op(class at::Tensor const &,class at::Tensor const &,class at::Tensor const &,int,int,float,float)" (?fused_bias_act_op@@YA?AVTensor@at@@AEBV12@00HHMM@Z)

fused.pyd : fatal error LNK1120: 2 unresolved externals

ninja: build stopped: subcommand failed.

Details of perceptual path regularization

Hello, thank you for this amazing repo! I had a couple of questions related to the perceptual path regularization.

  1. Could you give an explanation of the arguments "path_regularize" and "path_batch_shrink" and why they are both set to a default of 2?

  2. Would you mind explaining the purpose of the bolded summing operation on line 227 in train.py?

weighted_path_loss = args.path_regularize * args.g_reg_every * path_loss
if args.path_batch_shrink:
     weighted_path_loss += 0 * fake_img[0, 0, 0, 0]
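
A plausible reading of that term (my interpretation, not confirmed here): multiplying a single pixel of fake_img by zero adds nothing to the loss value, but it ties fake_img into the autograd graph, so the generator output stays connected to the backward pass even when path_batch_shrink produces a smaller regularization batch:

weighted_path_loss += 0 * fake_img[0, 0, 0, 0]  # value is 0, but grad_fn links back to fake_img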

CUDA error: an illegal memory access was encountered

Thanks for your great work!
When I run train.py on my dataset, I get this error. I built my dataset with:
python prepare_data.py --out '/data/kmaeii/dataset/stylegan2/bag_texture_mdb' --n_worker 16 --size 128,256 '/data/kmaeii/dataset/stylegan2'

error when running train.py

line 135 in train.py:
fake, latent = generator.module([test_in], return_latents=True)

when running train.py, it fails with "AttributeError: 'Generator' object has no attribute 'module'" at generator.module.
My PyTorch version is 1.3.1; do you have any idea about that? Thank you so much!

returned non-zero exit status 1

I am working in Jupyter with the following setup: PyTorch 1.1.0, CUDA 10.1, gcc 7.4.0

And I always get the following error:

(screenshot of the error attached)

Do you know where it comes from? Thank you for your help!

How to load the already compiled C++/cuda extension?

First, thank you so much for sharing your great work! (I hate tf so much :()

I modified the code a little and I'm trying to debug it in PyCharm with a single GPU by switching nn.parallel.DistributedDataParallel to torch.nn.DataParallel.
The code runs fine when executed from the shell command line and the extension libs are correctly generated, but when I run it in PyCharm, I get:

'''
Traceback (most recent call last):
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
check=True)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/train.py", line 26, in <module>
from model import Generator, Discriminator
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/model.py", line 11, in <module>
from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_act.py", line 6, in <module>
fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 661, in load
is_python_module)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
with_cuda=with_cuda)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/3] /home/xxxxx/share/cuda-10.0 /bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/home/xxxxx/share/cuda-10.0 /bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/bin/sh: 1: /home/xxxxx/share/cuda-10.0: Permission denied
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
c++: error: /include: No such file or directory
ninja: build stopped: subcommand failed.
'''

How can I just load the pre-compiled extension library? That would skip the permission problem and save a lot of time.
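
If it helps, torch.utils.cpp_extension.load accepts a build_directory argument, so a sketch (the directory path is hypothetical, and must already contain the built artifacts) would be:

    from torch.utils.cpp_extension import load

    # Point the JIT loader at a directory that already holds the built
    # extension, so ninja has nothing to rebuild.
    fused = load(
        'fused',
        sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'],
        build_directory='/path/to/prebuilt/fused',
    )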

Error on loading checkpoint when training

Loading a model checkpoint as so:

!python train.py --ckpt '/content/stylegan2-pytorch/checkpoint/010000.pt' --batch 128 --size 16 --iter 10001 /content/

results in the following error:

load model: /content/stylegan2-pytorch/checkpoint/010000.pt
Traceback (most recent call last):
  File "train.py", line 393, in <module>
    generator.load_state_dict(ckpt['g'])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
	Missing key(s) in state_dict: "convs.2.conv.weight", "convs.2.conv.blur.kernel", "convs.2.conv.modulation.weight", "convs.2.conv.modulation.bias", "convs.2.noise.weight", "convs.2.activate.bias", "convs.3.conv.weight", "convs.3.conv.modulation.weight", "convs.3.conv.modulation.bias", "convs.3.noise.weight", "convs.3.activate.bias", "to_rgbs.1.bias", "to_rgbs.1.upsample.kernel", "to_rgbs.1.conv.weight", "to_rgbs.1.conv.modulation.weight", "to_rgbs.1.conv.modulation.bias".

I will debug and get back to you as soon as I find a workaround!

Edit 1: This might be related to the checkpoint being trained on a different image size (e.g. loading an 8x8 checkpoint for images of size 16x16 a la progressive growing)

ffhq-e cannot be converted from the official model, but ffhq-f works

Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
Traceback (most recent call last):
  File "convert_weight.py", line 214, in <module>
    state_dict = fill_statedict(state_dict, g_ema.vars, size)
  File "convert_weight.py", line 148, in fill_statedict
    convert_torgb(vars, f'G_synthesis/{reso}x{reso}/ToRGB', f'to_rgbs.{i}'),
  File "convert_weight.py", line 101, in update
    raise ValueError(f'Shape mismatch: {v.shape} vs {state_dict[k].shape}')
ValueError: Shape mismatch: torch.Size([1, 3, 256, 1, 1]) vs torch.Size([1, 3, 512, 1, 1])

Build Custom Extensions on Windows

So I've been toying around with the custom C++/CUDA extensions, and was initially able to build and use them successfully in a dev environment within a Google Colab notebook, which I know is hosted on Linux servers. My goal is to use the extensions in a task I plan to run locally on a Windows machine. After setting everything up in exactly the same way and building and installing the extensions as I did in the Linux environment, but now on Windows, I'm met with NVIDIA and Visual Studio errors when I try to import or use them. All packages and requirements are the same version, with the only differences being Windows instead of Linux, and compiling the C++ with VS command-line support instead of gcc/g++. Are either of those factors truly required to use the extensions, as far as you know? Can you think of any workaround?

Several errors occur, each of the following format:

c:\program files\nvidia gpu computing toolkit\cuda\v10.1\include\sm_32_intrinsics.hpp(135): error: asm operand type size(8) does not match type/size implied by constraint 'r'

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include\vcruntime_new.h(194): error: first parameter of allocation function must be of type "size_t"

Continue training from converted official weights?

Is it possible to continue training using the converted weights? I noticed that the converted weights only contain the state_dict for g_ema; however, train.py assumes that the checkpoint has state_dicts for g, d, g_ema, g_optim, and d_optim.

Can convert_weight.py be modified to fix this? Or would it be easier to just train from scratch?
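
One possible workaround, as a sketch only: it assumes the conversion can produce both generator and discriminator state dicts (all the *_state arguments below are hypothetical), and fresh optimizers would still have to be created at resume time:

    import torch

    def pack_resume_checkpoint(g_state, d_state, g_ema_state,
                               g_optim_state, d_optim_state, path):
        # Package converted weights under the keys train.py expects.
        torch.save(
            {'g': g_state, 'd': d_state, 'g_ema': g_ema_state,
             'g_optim': g_optim_state, 'd_optim': d_optim_state},
            path,
        )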

error occurred when run train.py

When I run train.py, this error occurs: 'TORCH_CHECK' was not declared in this scope. I am confused and have no idea how to solve it.
Thank you!

Multiple GPUs: Illegal Memory Access

Hi, thanks for the great work. This is the same problem as #13. When initializing either the generator or the discriminator on cuda:1 while having 2+ GPUs available, an illegal memory access is triggered.

When using torch.distributed on a multi-GPU machine, this does not occur, since the current cuda device is manually set to a single device, so device -1 always resolves to that GPU. However, my CUDA skills are not great and I do not know how this can be solved, but it looks like an easy fix to me.
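
A plausible workaround (my assumption, mirroring what the distributed code path does) is to pin the current CUDA device before constructing the models, so the custom kernels launch on the same device as their tensors:

    import torch
    from model import Generator  # this repo's model.py

    device = 'cuda:1'
    torch.cuda.set_device(device)  # assumption: aligns the kernels' implicit current device
    generator = Generator(256, 512, 8).to(device)  # size, style_dim, n_mlp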

RuntimeError: Error building extension 'fused'

Hi,
Thanks for your great work
Currently I am trying to tweak latents using InterFaceGAN, which requires a PyTorch pickle file.
I want to apply this to the StyleGAN2 architecture, for which your stylegan2-pytorch is useful for creating the .pth file.
I created a folder src in the stylegan2-pytorch directory and copied all the files into it.

I also wrote a shell script for converting the weights,
convert_weight.sh (inside the src directory):
python convert_weight.py --repo ./stylegan2 stylegan2-ffhq-config-f.pkl

Then, outside the src directory, I created a Dockerfile like the one below:
########################################################################
FROM tensorflow/tensorflow:1.15.0-gpu-py3
######## Installing conda
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 \
    git mercurial subversion
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc
## INSTALL PYTORCH AND TORCHVISION
RUN conda install pytorch==1.3.1 torchvision==0.4.2 cudatoolkit=10.1 -c pytorch
### Install pillow with version 6.2.1
RUN conda install Pillow==6.2.1
ADD src/ /
RUN chmod +x ./convert_weight.sh
CMD ./convert_weight.sh
#########################################################################

The installations I did are:
/Miniconda3
python3.7
pytorch==1.3.1
torchvision==0.4.2
cudatoolkit=10.1
Pillow==6.2.1

While running the Docker image, model.py fails while importing Generator; building the extension from os.path.join(module_path, 'fused_bias_act_kernel.cu') errors out with:
RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /op/fused_bias_act_kernel.cu:11:0:
/opt/conda/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:12:10: fatal error: cusparse.h: No such file or directory
#include <cusparse.h>
^~~~~~~~~~~~
compilation terminated.
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.

I have installed everything mentioned in the requirements.
Can anyone help with how to solve this?

If you could provide a Dockerfile for running the conversion, the installation steps would help me a lot.
Thanks in advance.

Regards,
SandhyaLaxmi Kanna

Errors regarding compilation of FusedLeakyRelu cuda kernels

Hi, I get the following errors while trying to use the fused activation. Does anyone have an idea why?

Traceback (most recent call last):
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 960, in _build_extension_module
check=True)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 17, in <module>
from model import StyledGenerator, Discriminator, TextureSpaceDiscriminator
File "/is/cluster/work/pghosh/gif1.0/model.py", line 19, in <module>
from my_utils.stylegan2_model import StyledConv
File "/is/cluster/work/pghosh/gif1.0/my_utils/stylegan2_model.py", line 11, in <module>
from my_utils.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/is/cluster/work/pghosh/gif1.0/my_utils/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_act.py", line 14, in <module>
os.path.join(module_path, 'fused_bias_act_kernel.cu'),
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 658, in load
is_python_module)
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 827, in _jit_compile
with_cuda=with_cuda)
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 880, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 973, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/2] /is/software/nvidia/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/TH -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/THC -isystem /is/software/nvidia/cuda-10.0/include -isystem /is/ps2/pghosh/.virtualenvs/gif/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/is/software/nvidia/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/TH -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/THC -isystem /is/software/nvidia/cuda-10.0/include -isystem /is/ps2/pghosh/.virtualenvs/gif/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu(79): error: a pointer to a bound function may only be used to call the function

/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu(79): error: type name is not allowed

/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu(79): error: expected an expression

[the same three errors at fused_bias_act_kernel.cu(79) repeat 12 times]

36 errors detected in the compilation of "/tmp/tmpxft_00004c5b_00000000-6_fused_bias_act_kernel.cpp1.ii".
ninja: build stopped: subcommand failed.

Ask for Software environment

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda/bin/nvcc
ninja: build stopped: subcommand failed.

I spent a day, but I can't train it.

Train.py hanging when running on a single GPU

I am having issues using your reimplementation to train a model on my data. When I run the code on my desktop, I get the error: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.76 GiB total capacity; 7.05 GiB already allocated; 55.69 MiB free; 166.59 MiB cached) 0%| | 0/800000 [00:00<?, ?it/s]

I also have access to a GPU cluster, and I tried to run the script there using CUDA_VISIBLE_DEVICES=7 python train.py --batch 4 ./Maps_512/. Here, I don't get any output after launching the command, and from nvidia-smi it looks like the GPU is never used. Do you have any suggestions on why that is?

Error on running train.py

Hello,

I've been trying to get the training script to run but have been running into some issues. The current issue I'm facing seems somewhat similar to another post by cyrilzakka from this week, which occurs when running !python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 /content/stylegan2-pytorch/train.py --batch 5 /content/; however, his fix of changing the batch size doesn't solve my issue. The bulk of the error message I receive is very similar to theirs, noting similar lines and faults, but it also states that the file 'op/fused_bias_act.cpp' is missing, even though the repo has been cloned. The error is:

Traceback (most recent call last):
  File "/content/stylegan2-pytorch/train.py", line 21, in <module>
    from model import Generator, Discriminator
  File "/content/stylegan2-pytorch/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/content/stylegan2-pytorch/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/content/stylegan2-pytorch/op/fused_act.py", line 6, in <module>
    fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 809, in _jit_compile
    with_cuda=with_cuda
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/_cpp_extension_versioner.py", line 44, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/_cpp_extension_versioner.py", line 16, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'op/fused_bias_act.cpp'
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', '/content/stylegan2-pytorch/train.py', '--local_rank=0', '--batch', '5', '/content/']' returned non-zero exit status 1.

PILLOW_VERSION doesn't exist anymore

From Pillow version 7.0.0, PILLOW_VERSION doesn't exist anymore; it has been replaced by __version__.

# VERSION was removed in Pillow 6.0.0.
# PILLOW_VERSION was removed in Pillow 7.0.0.
# Use __version__ instead.
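
A small compatibility shim (a sketch, assuming the failing import comes from an older dependency such as torchvision rather than this repo itself) is to restore the attribute before that import runs:

    import PIL

    # Pillow >= 7 removed PIL.PILLOW_VERSION; re-create it for old callers.
    if not hasattr(PIL, 'PILLOW_VERSION'):
        PIL.PILLOW_VERSION = PIL.__version__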

Transfer learning

Did you try transfer learning for generator training, i.e. using the FFHQ generator weights as the initial state for training on the LSUN datasets?

Why use nn.Distributed.DataParallel vs. nn.DataParallel

Hello! I wanted to ask 2 questions.

  1. Why did you choose to use Distributed DataParallel instead of normal DataParallel?

  2. I tried to modify your code with normal DataParallel, but found that it errors out on the autograd.grad() call in the path regularization loss. In particular, it indicates that there are unused inputs and asks that the "allow_unused" flag be set to True. However, when that flag is set to True, autograd.grad() returns "None" as an output, so we cannot perform the necessary calculations. This does not occur when running in the Distributed DataParallel configuration, but I don't quite understand why.

Thank you again for your great work, and any help would be appreciated!

Small bug in projector.py causes --w_plus option to be ignored

I just noticed that in the current version of projector.py, the --w_plus option, which should allow using the [18, 512] latent vectors instead of [1, 512], gets ignored.

The current code is:

if args.w_plus:
    latent_in.unsqueeze(1).repeat(1, g_ema.n_latent, 1) 

But it should probably be:

if args.w_plus:
    latent_in = latent_in.unsqueeze(1).repeat(1, g_ema.n_latent, 1) 

GPU memory usage

Thanks for your great work reproducing StyleGAN and its v2.
I am running the StyleGAN2 code on my 2-GPU (2080 Ti) workstation. I found that if I set the image resolution to 256x256, the maximum batch size I can achieve is 2. Yet in the official implementation, they list that config-f at 256 resolution takes only 6.4GB of GPU memory. I would like to know the reason for the difference, since it seems that I cannot run 1024 resolution on my workstation (or even a larger GPU server). BTW, have you tried how many GPUs it takes to handle 1024 resolution?
Thanks again!

Error reading images when running train.py on custom dataset

When running !python train.py --batch 32 /content/data/preprocessed/ on a custom dataset that was first run through !python prepare_data.py --out /content/data/preprocessed/ --n_worker 2 --size 32 /content/, an error is returned because Image is unable to read the image from the buffer:

0% 0/800000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 434, in <module>
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device)
  File "train.py", line 179, in train
    real_img = next(loader)
  File "train.py", line 58, in sample_data
    for batch in loader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/stylegan2-pytorch/dataset.py", line 37, in __getitem__
    img = Image.open(buffer)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2818, in open
    raise IOError("cannot identify image file %r" % (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f2714825af0>

I thought this would be a buffer issue, but I am still unable to isolate the problem. I'll report back as soon as I do.
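
A quick way to sanity-check the LMDB contents (a sketch; it assumes the 'length' key and the '{size}-{index:05d}' image keys that prepare_data.py appears to write):

    import lmdb
    from io import BytesIO
    from PIL import Image

    env = lmdb.open('/content/data/preprocessed', readonly=True, lock=False)
    with env.begin(write=False) as txn:
        length = int(txn.get('length'.encode('utf-8')).decode('utf-8'))
        print('entries:', length)
        img_bytes = txn.get('32-00000'.encode('utf-8'))  # size 32, first image
        if img_bytes is None:
            print('key missing: was a different --size used at prepare time?')
        else:
            img = Image.open(BytesIO(img_bytes))
            print(img.size, img.mode)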

CUDNN_STATUS_NOT_SUPPORTED

Hi,

When I try to run the code on multiple GPUs (or even a single GPU), I get the following error:

  File "./stylegan2-pytorch/model.py", line 256, in forward
    out = F.conv_transpose2d(input, weight, padding=0, stride=2, groups=batch)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Any idea why this might be happening and how to resolve it?

My conda env uses python 3.7, cudatoolkit 10.1, cuda drivers 10.1 and pytorch 1.3. Thanks a lot!
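
One workaround worth trying (an assumption based on the error message, not a confirmed fix) is to make the input contiguous before the grouped transposed convolution in model.py:

    # hypothetical patch around the failing call in model.py
    out = F.conv_transpose2d(input.contiguous(), weight, padding=0, stride=2, groups=batch)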

Taking up memory on the primary GPU loading from checkpoint

Hi,
I'm wondering what is taking up memory on the main GPU when I resume training from a checkpoint. As a result, I cannot train with a larger batch size, due to the memory limitation.
(screenshot of GPU memory usage attached)

I think this is caused by the distributed training setup, and I wonder if there is any way to either avoid the memory cost or distribute it evenly across the other GPUs?
Many thanks!
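
One common cause (my assumption here, not verified against this exact setup) is that torch.load restores checkpoint tensors to the GPU they were saved from, putting every process's copy on GPU 0. Loading onto the CPU first avoids that:

    # Sketch: map all checkpoint tensors to CPU; the models are moved to their
    # own devices afterwards when load_state_dict is called.
    ckpt = torch.load(args.ckpt, map_location=lambda storage, loc: storage)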

Training from checkpoint fails when using multiple GPUs

Hi,

Thanks for implementing the paper in PyTorch. I am having problems training a model starting from a specific checkpoint when using multiple GPUs.

When using a single GPU I can run train.py using the following command:
python -m torch.distributed.launch --nproc_per_node=1 train.py --batch 16 --iter 150000 --ckpt checkpoint/start_ckpt.pt dataset
The first sample images look like they are from when I stopped training the network previously.

When I use 4 GPUs, I run into a CUDA out of memory issue, using the command:
python -m torch.distributed.launch --nproc_per_node=4 train.py --batch 16 --iter 150000 --ckpt checkpoint/start_ckpt.pt dataset

I then get the following CUDA out of memory error:

load model: checkpoint/start_ckpt.pt
load model: checkpoint/start_ckpt.pt
load model: checkpoint/start_ckpt.pt
load model: checkpoint/start_ckpt.pt
Traceback (most recent call last):
  File "train.py" [output from multiple processes interleaved here]
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device)
  File "train.py", line 189, in train
    real_pred = discriminator(real_img)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jameswhennessey/stylegan2-pytorch/model.py", line 647, in forward
    out = self.convs(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jameswhennessey/stylegan2-pytorch/model.py", line 598, in forward
    out = self.conv2(out)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 82, in forward
    return fused_leaky_relu(input, self.bias, self.negative_slope, self.scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 86, in fused_leaky_relu
    return FusedLeakyReLUFunction.apply(input, bias, negative_slope, scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 55, in forward
    out = fused.fused_bias_act(input, bias, empty, 3, 0, negative_slope, scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 86, in fused_leaky_relu
    return FusedLeakyReLUFunction.apply(input, bias, negative_slope, scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 55, in forward
    out = fused.fused_bias_act(input, bias, empty, 3, 0, negative_slope, scale)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.75 GiB total capacity; 8.03 GiB already allocated; 101.38 MiB free; 565.07 MiB cached) (malloc at /opt/conda/conda-bld/pytorch_1573049306803/work/c10/cuda/CUDACachingAllocator.cpp:267)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f397a01f687 in /opt/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
......

Do you have any suggestions on how to resolve this issue?

Thanks in advance.
