
stylegan2-pytorch's Introduction

StyleGAN 2 in PyTorch

Implementation of Analyzing and Improving the Image Quality of StyleGAN (https://arxiv.org/abs/1912.04958) in PyTorch

Notice

I have tried to match the official implementation as closely as possible, but there may be some details I missed, so please use this implementation with care.

Requirements

I have tested on:

  • PyTorch 1.3.1
  • CUDA 10.1/10.2

Usage

First create lmdb datasets:

python prepare_data.py --out LMDB_PATH --n_worker N_WORKER --size SIZE1,SIZE2,SIZE3,... DATASET_PATH

This will convert images to JPEG and pre-resize them. This implementation does not use progressive growing, but you can create multiple-resolution datasets by passing a comma-separated list of sizes, in case you want to try other resolutions later.
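
For example, to build 128/256/512px versions of a dataset from images in ~/images (the paths here are hypothetical):

python prepare_data.py --out data/ffhq.lmdb --n_worker 8 --size 128,256,512 ~/images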

Then you can train the model in a distributed setting:

python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --batch BATCH_SIZE LMDB_PATH

train.py supports Weights & Biases logging. If you want to use it, add the --wandb argument to the script.
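
For example, a 2-GPU run with logging enabled (port and batch size are placeholder values):

python -m torch.distributed.launch --nproc_per_node=2 --master_port=29500 train.py --batch 16 --wandb LMDB_PATH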

SWAGAN

This implementation experimentally supports SWAGAN: A Style-based Wavelet-driven Generative Model (https://arxiv.org/abs/2102.06108). You can train SWAGAN with:

python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --arch swagan --batch BATCH_SIZE LMDB_PATH

As noted in the paper, SWAGAN trains much faster (about 2x at 256px).

Convert weights from official checkpoints

You need to clone the official repository (https://github.com/NVlabs/stylegan2), as it is required to load the official checkpoints.

For example, if you cloned the repository into ~/stylegan2 and downloaded stylegan2-ffhq-config-f.pkl, you can convert it like this:

python convert_weight.py --repo ~/stylegan2 stylegan2-ffhq-config-f.pkl

This will create a converted stylegan2-ffhq-config-f.pt file.

Generate samples

python generate.py --sample N_FACES --pics N_PICS --ckpt PATH_CHECKPOINT

You should specify the size (--size 256, for example) if you trained at a different resolution.
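
For example, to sample from a 256px checkpoint (the checkpoint path is hypothetical):

python generate.py --sample 1 --pics 20 --size 256 --ckpt checkpoint/550000.pt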

Project images to latent spaces

python projector.py --ckpt [CHECKPOINT] --size [GENERATOR_OUTPUT_SIZE] FILE1 FILE2 ...
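
For example, with a converted 1024px FFHQ checkpoint (file names are hypothetical):

python projector.py --ckpt stylegan2-ffhq-config-f.pt --size 1024 face1.png face2.png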

Closed-Form Factorization (https://arxiv.org/abs/2007.06600)

You can use closed_form_factorization.py and apply_factor.py to discover meaningful latent semantic factors or directions in an unsupervised manner.

First, you need to extract eigenvectors of the weight matrices using closed_form_factorization.py:

python closed_form_factorization.py [CHECKPOINT]

This will create a factor file that contains the eigenvectors (default: factor.pt). You can then use apply_factor.py to test the meaning of the extracted directions:

python apply_factor.py -i [INDEX_OF_EIGENVECTOR] -d [DEGREE_OF_MOVE] -n [NUMBER_OF_SAMPLES] --ckpt [CHECKPOINT] [FACTOR_FILE]

For example,

python apply_factor.py -i 19 -d 5 -n 10 --ckpt [CHECKPOINT] factor.pt

This will generate 10 random samples, along with samples generated from latents moved along the 19th eigenvector with degree ±5.

Sample of closed form factorization

Pretrained Checkpoints

Link

I have trained the 256px model on FFHQ for 550k iterations and got an FID of about 4.5. Differences in data preprocessing, resolution, or the training loop could account for this, but currently I don't know the exact reason for the FID difference.

Samples

Sample with truncation

Sample from FFHQ at 110,000 iterations (trained on 3.52M images).

MetFaces sample with non-leaking augmentations

Sample from MetFaces with non-leaking augmentations, at 150,000 iterations (trained on 4.8M images).

Samples from converted weights

Sample from FFHQ (1024px)

Sample from LSUN Church (256px)

License

Model details and custom CUDA kernel code are from the official repository: https://github.com/NVlabs/stylegan2

Code for Learned Perceptual Image Patch Similarity (LPIPS) comes from https://github.com/richzhang/PerceptualSimilarity

To match FID scores more closely with the official TensorFlow implementation, I have used the FID Inception V3 implementation from https://github.com/mseitzer/pytorch-fid

stylegan2-pytorch's People

Contributors

cclauss, jackerz312, levindabhi, matanby, nivha, onion-liu, rosinality, terrybroad, woctezuma


stylegan2-pytorch's Issues

PPL z space

Hi, I tried the following for calculating the PPL score in 'z' space. ppl.py contains the case for 'w' but not 'z'.

Before:

            inputs = torch.randn([batch * 2, latent_dim], device=device)
            lerp_t = torch.rand(batch, device=device)

            if args.space == 'w':
                latent = g.get_latent(inputs)
                latent_t0, latent_t1 = latent[::2], latent[1::2]
                latent_e0 = lerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = lerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
                latent_e = torch.stack([latent_e0, latent_e1], 1).view(*latent.shape)

            image, _ = g([latent_e], input_is_latent=True, noise=noise)

After:

            inputs = torch.randn([batch * 2, latent_dim], device=device)
            lerp_t = torch.rand(batch, device=device)

            latent = g.get_latent(inputs)
            latent_t0, latent_t1 = latent[::2], latent[1::2]
            if args.space == 'w':
                latent_e0 = lerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = lerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
                latent_e = torch.stack([latent_e0, latent_e1], 1).view(*latent.shape)
            else:
                latent_e0 = slerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = slerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
                latent_e = torch.stack([latent_e0, latent_e1], 1).view(*latent.shape)

            image, _ = g([latent_e], input_is_latent=True, noise=noise)

but I received a PPL score in the single digits (far too low to be reasonable). Is this the right way to implement PPL for 'z'?
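
One thing worth double-checking (my reading of the official metric, not a confirmed fix): for 'z' space, the TF implementation slerps the z vectors before the mapping network, rather than slerping the already-mapped w latents. A sketch under that assumption, reusing the slerp from above:

            inputs = torch.randn([batch * 2, latent_dim], device=device)
            lerp_t = torch.rand(batch, device=device)

            if args.space == 'w':
                latent = g.get_latent(inputs)
                latent_t0, latent_t1 = latent[::2], latent[1::2]
                latent_e0 = lerp(latent_t0, latent_t1, lerp_t[:, None])
                latent_e1 = lerp(latent_t0, latent_t1, lerp_t[:, None] + args.eps)
            else:  # 'z': interpolate the inputs first, then map them to w
                z_t0, z_t1 = inputs[::2], inputs[1::2]
                z_e0 = slerp(z_t0, z_t1, lerp_t[:, None])
                z_e1 = slerp(z_t0, z_t1, lerp_t[:, None] + args.eps)
                latent_e0 = g.get_latent(z_e0)
                latent_e1 = g.get_latent(z_e1)

            latent_e = torch.stack([latent_e0, latent_e1], 1).view(batch * 2, -1)

            image, _ = g([latent_e], input_is_latent=True, noise=noise)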

Generate faces

Hi,
I have a checkpoint model, but how can I generate faces from it? The stylegan repo has a generate.py, but here I don't know how to run the checkpoint to produce samples.

Training 1024 × 1024 resolution images with default settings

Hi:
According to the generated samples, I think the results are very good, so I want to train on 1024 × 1024 resolution images. Could I know your default settings? Another problem that bothers me is how to evaluate the trained model; could you give me some suggestions? Thank you.

torch version

Some errors occurred while compiling the code. Can you tell us the version of torch, and the rest of the software environment, such as CUDA, cuDNN, gcc, ninja, and re2c? Thank you!

Errors Running on Colab

When running the command !python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 train.py --batch 1 /content/data/preprocessed on Google Colab, it first complains that 'Ninja is required to load C++ extensions', which can be fixed by running pip install ninja.

Unfortunately, running the command again after restarting the runtime returns the following error, which I can't seem to get around:

0% 0/800000 [00:00<?, ?it/s]Traceback (most recent call last):
 File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
   "__main__", mod_spec)
 File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
   exec(code, run_globals)
 File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
   main()
 File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
   cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'train.py', '--local_rank=0', '--batch', '1', '/content/data/preprocessed']' died with <Signals.SIGFPE: 8>.

Any idea on how to get this to work?

No module named 'fused' error

Hi, I prepared the dataset and ran train.py as described in the readme, but ran into a problem regarding the C++ extension.

Traceback (most recent call last):
  File "train.py", line 20, in <module>
    from model import Generator, Discriminator
  File "/data/llh/projects/stylegan2/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/data/llh/projects/stylegan2/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/data/llh/projects/stylegan2/op/fused_act.py", line 6, in <module>
    fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
    is_python_module)
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 824, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 967, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/home/llh/miniconda3/envs/py3pt11/lib/python3.6/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'fused'

Any idea?

Generating from 18x512 latents

First, thanks for making this library.

I have some 18x512 latents that I optimized using the official version. What parameters should I use to generate the same image from those latents as the official version? I'm using the same weights but getting different results.

floating point exception when batch size is 1

When the batch size is set to 1 and path_batch_shrink is unchanged, the program crashes with a floating point exception, because args.batch // args.path_batch_shrink == 0 in
noise = mixing_noise(
args.batch // args.path_batch_shrink, args.latent, args.mixing, device
)
Probably worth pointing out.
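
A minimal guard against this (a sketch, not a tested patch) would clamp the regularization batch to at least one sample:

path_batch_size = max(args.batch // args.path_batch_shrink, 1)
noise = mixing_noise(path_batch_size, args.latent, args.mixing, device)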

Also, I assume the implementation follows config-f in the paper?

Pretrained Discriminator

Thanks for your repo!
Can you add functions to convert the pretrained TF discriminators?

E.g.
In convert_weight.py, you have
_, _, g_ema = pickle.load(f)

Can you update the conversion scripts to allow:
_, D, g_ema = pickle.load(f)

Overall batch size and batch size per gpu

It looks like in NVIDIA's StyleGAN2 they split the batch between GPUs such that 4 samples are sent to each GPU (minibatch_gpu_base = 4). Based on the default settings in this repo and this comment, it seems as though you do not require 4 samples per GPU. Why is that?

Ask for Projector

Hi, thank you for this repo.
Can you add a projector that finds the closest matching latent vector for an input image, as in the original TensorFlow implementation?

Error of pickle.load() in convert_weight.py

Thank you very much for sharing this amazing code!!

I failed to run the 'convert_weight.py' file.
After setting up the TensorFlow plugin successfully, the conversion process broke down when trying to load the "stylegan2-ffhq-config-f.pkl" file. It seems the '.pkl' file can't be loaded, even though it was downloaded directly from the official Google Drive. Do you have any idea about my problem?


Questions about g_path_regularize function

Hi,
Thank you so much for sharing this. I am also studying how to implement StyleGAN2 in PyTorch. At present the overall framework is basically complete, but there are still some details to implement, such as how to do lazy regularization on a single machine with multiple GPUs. I am a little confused about path[0].backward(): in my understanding, path[0] has no grad_fn and cannot be backpropagated, and since the gradient grad inside g_path_regularize also appears to have no grad_fn, it is not clear to me why path_penalty alone can compute gradients via backpropagation. Hope to get an answer, thank you.
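
For what it's worth, the key seems to be create_graph=True in the autograd.grad call inside g_path_regularize (my reading of the code): it records the gradient computation itself in the graph, so the returned grad does have a grad_fn and the penalty built from it is differentiable. A minimal, self-contained illustration:

    import torch

    x = torch.randn(4, requires_grad=True)
    y = (x ** 3).sum()

    # create_graph=True makes the returned gradient itself differentiable
    grad, = torch.autograd.grad(y, x, create_graph=True)
    print(grad.grad_fn)  # not None

    penalty = grad.pow(2).sum()
    penalty.backward()   # second-order gradients flow back into x.grad
    print(x.grad)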

error importing fused operation in model.py with pytorch 1.4 version

When I try to convert weights from the official TensorFlow checkpoint, there is an error importing the fused operations in model.py. As in the error message at the bottom, a file called 'fused' is missing from the /tmp/torch_extensions folder.
I'm trying PyTorch 1.4, since 1.3 seems to have other problems with various library versions.

Do you have any clue regarding this error?


Windows10 Error building extension

Windows10
VS 2017
PyTorch 1.3.1
CUDA 10.1

Any idea? Thanks

[3/3] "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" fused_bias_act.o fused_bias_act_kernel.cuda.o /nologo /DLL c10.lib torch.lib torch_python.lib _C.lib "/LIBPATH:C:\Program Files\Python37\libs" "/LIBPATH:C:\Program Files\Python37\lib\site-packages\torch\lib" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib/x64" cudart.lib /out:fused.pyd

FAILED: fused.pyd 

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64/link.exe" fused_bias_act.o fused_bias_act_kernel.cuda.o /nologo /DLL c10.lib torch.lib torch_python.lib _C.lib "/LIBPATH:C:\Program Files\Python37\libs" "/LIBPATH:C:\Program Files\Python37\lib\site-packages\torch\lib" "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib/x64" cudart.lib /out:fused.pyd

   Creating library fused.lib and object fused.exp

fused_bias_act_kernel.cuda.o : error LNK2019: unresolved external symbol "__declspec(dllimport) public: __cdecl c10::cuda::CUDAStream::operator struct CUstream_st *(void)const " (__imp_??BCUDAStream@cuda@c10@@QEBAPEAUCUstream_st@@XZ) referenced in function "class at::Tensor __cdecl fused_bias_act_op(class at::Tensor const &,class at::Tensor const &,class at::Tensor const &,int,int,float,float)" (?fused_bias_act_op@@YA?AVTensor@at@@AEBV12@00HHMM@Z)

fused_bias_act_kernel.cuda.o : error LNK2019: unresolved external symbol "__declspec(dllimport) class c10::cuda::CUDAStream __cdecl c10::cuda::getCurrentCUDAStream(short)" (__imp_?getCurrentCUDAStream@cuda@c10@@YA?AVCUDAStream@12@F@Z) referenced in function "class at::Tensor __cdecl fused_bias_act_op(class at::Tensor const &,class at::Tensor const &,class at::Tensor const &,int,int,float,float)" (?fused_bias_act_op@@YA?AVTensor@at@@AEBV12@00HHMM@Z)

fused.pyd : fatal error LNK1120: 2 unresolved externals

ninja: build stopped: subcommand failed.

Details of perceptual path regularization

Hello, thank you for this amazing repo! I had a couple of questions related to the perceptual path regularization.

  1. Could you give an explanation of the arguments "path_regularize" and "path_batch_shrink" and why they are both set to a default of 2?

  2. Would you mind explaining the purpose of the bolded summing operation on line 227 in train.py?

weighted_path_loss = args.path_regularize * args.g_reg_every * path_loss
if args.path_batch_shrink:
     weighted_path_loss += 0 * fake_img[0, 0, 0, 0]
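
A plausible reading of that term (my interpretation, not confirmed here): multiplying a single pixel of fake_img by zero adds nothing to the loss value, but it ties fake_img into the autograd graph, so the generator output stays connected to the backward pass even when path_batch_shrink produces a smaller regularization batch:

weighted_path_loss += 0 * fake_img[0, 0, 0, 0]  # value is 0, but grad_fn links back to fake_img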

CUDA error: an illegal memory access was encountered

Thanks for your great work!
When I run train.py on my dataset, I get this error. I built my dataset with:
python prepare_data.py --out '/data/kmaeii/dataset/stylegan2/bag_texture_mdb' --n_worker 16 --size 128,256 '/data/kmaeii/dataset/stylegan2'

error when running train.py

line 135 in train.py:
fake, latent = generator.module([test_in], return_latents=True)

when running train.py, it fails with "AttributeError: 'Generator' object has no attribute 'module'" at generator.module.
My PyTorch version is 1.3.1; do you have any idea about that? Thank you so much!

returned non-zero exit status 1

I am working in Jupyter with the following setup: PyTorch 1.1.0, CUDA 10.1, gcc 7.4.0

And I always get the following error:

(screenshot of the error attached)

Do you know where it comes from? Thank you for your help!

How to load the already compiled C++/cuda extension?

First, thank you so much for sharing your great work! (I hate tf so much :()

I modified the code a little and I'm trying to debug it in PyCharm with a single GPU by switching nn.parallel.DistributedDataParallel to torch.nn.DataParallel.
The code runs fine when executed from the shell command line and the extension libs are correctly generated, but when I run it in PyCharm, I get:

'''
Traceback (most recent call last):
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
check=True)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/train.py", line 26, in <module>
from model import Generator, Discriminator
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/model.py", line 11, in <module>
from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_act.py", line 6, in <module>
fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 661, in load
is_python_module)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
with_cuda=with_cuda)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/3] /home/xxxxx/share/cuda-10.0 /bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/home/xxxxx/share/cuda-10.0 /bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/bin/sh: 1: /home/xxxxx/share/cuda-10.0: Permission denied
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
FAILED: fused_bias_act.o
c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/TH -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/lib/python3.6/site-packages/torch/include/THC -isystem /home/xxxxx/share/cuda-10.0 /include -isystem /home/xxxxx/miniconda3/envs/py3.6_tensoflow1.14_n/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/xxxxx/xxxxx/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
c++: error: /include: No such file or directory
ninja: build stopped: subcommand failed.
'''

How can I just load the pre-compiled extension library? That would skip the permission problem and save a lot of time.
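
If it helps, torch.utils.cpp_extension.load accepts a build_directory argument, so a sketch (the directory path is hypothetical, and must already contain the built artifacts) would be:

    from torch.utils.cpp_extension import load

    # Point the JIT loader at a directory that already holds the built
    # extension, so ninja has nothing to rebuild.
    fused = load(
        'fused',
        sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'],
        build_directory='/path/to/prebuilt/fused',
    )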

Error on loading checkpoint when training

Loading a model checkpoint as so:

!python train.py --ckpt '/content/stylegan2-pytorch/checkpoint/010000.pt' --batch 128 --size 16 --iter 10001 /content/

results in the following error:

load model: /content/stylegan2-pytorch/checkpoint/010000.pt
Traceback (most recent call last):
  File "train.py", line 393, in <module>
    generator.load_state_dict(ckpt['g'])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
	Missing key(s) in state_dict: "convs.2.conv.weight", "convs.2.conv.blur.kernel", "convs.2.conv.modulation.weight", "convs.2.conv.modulation.bias", "convs.2.noise.weight", "convs.2.activate.bias", "convs.3.conv.weight", "convs.3.conv.modulation.weight", "convs.3.conv.modulation.bias", "convs.3.noise.weight", "convs.3.activate.bias", "to_rgbs.1.bias", "to_rgbs.1.upsample.kernel", "to_rgbs.1.conv.weight", "to_rgbs.1.conv.modulation.weight", "to_rgbs.1.conv.modulation.bias".

I will debug and get back to you as soon as I find a workaround!

Edit 1: This might be related to the checkpoint being trained on a different image size (e.g. loading an 8x8 checkpoint for images of size 16x16 a la progressive growing)

ffhq-e cannot be converted from the official model, but ffhq-f works

Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.
Traceback (most recent call last):
  File "convert_weight.py", line 214, in <module>
    state_dict = fill_statedict(state_dict, g_ema.vars, size)
  File "convert_weight.py", line 148, in fill_statedict
    convert_torgb(vars, f'G_synthesis/{reso}x{reso}/ToRGB', f'to_rgbs.{i}'),
  File "convert_weight.py", line 101, in update
    raise ValueError(f'Shape mismatch: {v.shape} vs {state_dict[k].shape}')
ValueError: Shape mismatch: torch.Size([1, 3, 256, 1, 1]) vs torch.Size([1, 3, 512, 1, 1])

Build Custom Extensions on Windows

So I've been toying around with the custom C++/CUDA extensions, and was initially able to build and use them successfully in a dev environment within a Google Colab notebook, which I know is hosted on Linux servers. My goal is to use the extensions in a task I plan to run locally on a Windows machine. After setting everything up in exactly the same way and building and installing the extensions as I did in the Linux environment, but now on Windows, I'm met with NVIDIA and Visual Studio errors when I try to import or use them. All packages and requirements are the same version, with the only differences being Windows instead of Linux, and compiling the C++ with VS command-line support instead of gcc/g++. Are either of those factors truly required to use the extensions, as far as you know? Can you think of any workaround?

Several errors occur, each of the following format:

c:\program files\nvidia gpu computing toolkit\cuda\v10.1\include\sm_32_intrinsics.hpp(135): error: asm operand type size(8) does not match type/size implied by constraint 'r'

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\include\vcruntime_new.h(194): error: first parameter of allocation function must be of type "size_t"

Continue training from converted official weights?

Is it possible to continue training using the converted weights? I noticed that the converted weights only contain the state_dict for g_ema; however, train.py assumes that the checkpoint has state_dicts for g, d, g_ema, g_optim, and d_optim.

Can convert_weight.py be modified to fix this? Or would it be easier to just train from scratch?
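
One possible workaround, as a sketch only: it assumes the conversion can produce both generator and discriminator state dicts (all the *_state arguments below are hypothetical), and fresh optimizers would still have to be created at resume time:

    import torch

    def pack_resume_checkpoint(g_state, d_state, g_ema_state,
                               g_optim_state, d_optim_state, path):
        # Package converted weights under the keys train.py expects.
        torch.save(
            {'g': g_state, 'd': d_state, 'g_ema': g_ema_state,
             'g_optim': g_optim_state, 'd_optim': d_optim_state},
            path,
        )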

error occurred when run train.py

When I run train.py, this error occurs: 'TORCH_CHECK' was not declared in this scope. I am confused and have no idea how to solve it.
Thank you!

Multiple GPUs: Illegal Memory Access

Hi, thanks for the great work. This is the same problem as #13. When initializing either the generator or the discriminator on cuda:1 while having 2+ GPUs available, an illegal memory access is triggered.

When using torch.distributed on a multi-GPU machine, this does not occur, since the current cuda device is manually set to a single device, so device -1 always resolves to that GPU. However, my CUDA skills are not great and I do not know how this can be solved, but it looks like an easy fix to me.
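
A plausible workaround (my assumption, mirroring what the distributed code path does) is to pin the current CUDA device before constructing the models, so the custom kernels launch on the same device as their tensors:

    import torch
    from model import Generator  # this repo's model.py

    device = 'cuda:1'
    torch.cuda.set_device(device)  # assumption: aligns the kernels' implicit current device
    generator = Generator(256, 512, 8).to(device)  # size, style_dim, n_mlp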

RuntimeError: Error building extension 'fused'

Hi,
Thanks for your great work
Currently I am trying to tweak latents using InterFaceGAN, which requires a PyTorch pickle file.
I want to apply this to the StyleGAN2 architecture, for which your stylegan2-pytorch is useful for creating the .pth file.
I created a folder src in the stylegan2-pytorch directory and copied all the files into it.

I also wrote a shell script for converting the weights,
convert_weight.sh (inside the src directory):
python convert_weight.py --repo ./stylegan2 stylegan2-ffhq-config-f.pkl

Then, outside the src directory, I created a Dockerfile like the one below:
########################################################################
FROM tensorflow/tensorflow:1.15.0-gpu-py3
######## Installing conda
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
    libglib2.0-0 libxext6 libsm6 libxrender1 \
    git mercurial subversion
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc
## INSTALL PYTORCH AND TORCHVISION
RUN conda install pytorch==1.3.1 torchvision==0.4.2 cudatoolkit=10.1 -c pytorch
### Install pillow with version 6.2.1
RUN conda install Pillow==6.2.1
ADD src/ /
RUN chmod +x ./convert_weight.sh
CMD ./convert_weight.sh
#########################################################################

The installations I did are:
/Miniconda3
python3.7
pytorch==1.3.1
torchvision==0.4.2
cudatoolkit=10.1
Pillow==6.2.1

While running the Docker image, model.py fails while importing Generator; building the extension from os.path.join(module_path, 'fused_bias_act_kernel.cu') errors out with:
RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /op/fused_bias_act_kernel.cu:11:0:
/opt/conda/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:12:10: fatal error: cusparse.h: No such file or directory
#include <cusparse.h>
^~~~~~~~~~~~
compilation terminated.
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/lib/python3.7/site-packages/torch/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.7/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.

I have installed everything mentioned in the requirements.
Can anyone help with how to solve this?

If you could provide a Dockerfile for running the conversion, the installation steps would help me a lot.
Thanks in advance.

Regards,
SandhyaLaxmi Kanna

Errors regarding compilation of FusedLeakyRelu cuda kernels

Hi, I get the following errors while trying to use the fused activation. Does anyone have an idea why?

Traceback (most recent call last):
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 960, in _build_extension_module
check=True)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 17, in <module>
from model import StyledGenerator, Discriminator, TextureSpaceDiscriminator
File "/is/cluster/work/pghosh/gif1.0/model.py", line 19, in <module>
from my_utils.stylegan2_model import StyledConv
File "/is/cluster/work/pghosh/gif1.0/my_utils/stylegan2_model.py", line 11, in <module>
from my_utils.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
File "/is/cluster/work/pghosh/gif1.0/my_utils/op/__init__.py", line 1, in <module>
from .fused_act import FusedLeakyReLU, fused_leaky_relu
File "/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_act.py", line 14, in <module>
os.path.join(module_path, 'fused_bias_act_kernel.cu'),
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 658, in load
is_python_module)
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 827, in _jit_compile
with_cuda=with_cuda)
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 880, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 973, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'fused': [1/2] /is/software/nvidia/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/TH -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/THC -isystem /is/software/nvidia/cuda-10.0/include -isystem /is/ps2/pghosh/.virtualenvs/gif/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/is/software/nvidia/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/TH -isystem /is/ps2/pghosh/.virtualenvs/gif/lib/python3.6/site-packages/torch/include/THC -isystem /is/software/nvidia/cuda-10.0/include -isystem /is/ps2/pghosh/.virtualenvs/gif/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu(79): error: a pointer to a bound function may only be used to call the function

/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu(79): error: type name is not allowed

/is/cluster/work/pghosh/gif1.0/my_utils/op/fused_bias_act_kernel.cu(79): error: expected an expression

[the same three errors at fused_bias_act_kernel.cu(79) repeat 12 times]

36 errors detected in the compilation of "/tmp/tmpxft_00004c5b_00000000-6_fused_bias_act_kernel.cpp1.ii".
ninja: build stopped: subcommand failed.

Ask for Software environment

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
RuntimeError: Error building extension 'fused': [1/3] /usr/local/cuda/bin/nvcc
ninja: build stopped: subcommand failed.

I spent a day, but I can't train it.

Train.py hanging when running on a single GPU

I am having issues using your reimplementation to train a model on my data. When I run the code on my desktop, I get the error: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.76 GiB total capacity; 7.05 GiB already allocated; 55.69 MiB free; 166.59 MiB cached) 0%| | 0/800000 [00:00<?, ?it/s]

I also have access to a GPU cluster, and I tried to run the script there using CUDA_VISIBLE_DEVICES=7 python train.py --batch 4 ./Maps_512/. Here, I don't get any output after launching the command, and from nvidia-smi it looks like the GPU is never used. Do you have any suggestions on why that is?

Error on running train.py

Hello,

I've been trying to get the training script to run but have been running into some issues. The current issue I'm facing seems somewhat similar to another post by cyrilzakka from this week, which occurs when running !python -m torch.distributed.launch --nproc_per_node=1 --master_port=1234 /content/stylegan2-pytorch/train.py --batch 5 /content/; however, his fix of changing the batch size doesn't solve my issue. The bulk of the error message I receive is very similar to theirs, noting similar lines and faults, but it also states that the file 'op/fused_bias_act.cpp' is missing, even though the repo has been cloned. The error is:

Traceback (most recent call last):
  File "/content/stylegan2-pytorch/train.py", line 21, in <module>
    from model import Generator, Discriminator
  File "/content/stylegan2-pytorch/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d
  File "/content/stylegan2-pytorch/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/content/stylegan2-pytorch/op/fused_act.py", line 6, in <module>
    fused = load('fused', sources=['op/fused_bias_act.cpp', 'op/fused_bias_act_kernel.cu'])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 809, in _jit_compile
    with_cuda=with_cuda
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/_cpp_extension_versioner.py", line 44, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/_cpp_extension_versioner.py", line 16, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'op/fused_bias_act.cpp'
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 253, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 249, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', '/content/stylegan2-pytorch/train.py', '--local_rank=0', '--batch', '5', '/content/']' returned non-zero exit status 1.

PILLOW_VERSION doesn't exist anymore

From Pillow version 7.0.0, PILLOW_VERSION doesn't exist anymore; it has been replaced by __version__.

# VERSION was removed in Pillow 6.0.0.
# PILLOW_VERSION was removed in Pillow 7.0.0.
# Use __version__ instead.
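
A small compatibility shim (a sketch, assuming the failing import comes from an older dependency such as torchvision rather than this repo itself) is to restore the attribute before that import runs:

    import PIL

    # Pillow >= 7 removed PIL.PILLOW_VERSION; re-create it for old callers.
    if not hasattr(PIL, 'PILLOW_VERSION'):
        PIL.PILLOW_VERSION = PIL.__version__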

Transfer learning

Did you try transfer learning for generator training, i.e. using the FFHQ generator weights as the initial state for training on the LSUN datasets?

Why use nn.Distributed.DataParallel vs. nn.DataParallel

Hello! I wanted to ask 2 questions.

  1. Why did you choose to use Distributed DataParallel instead of normal DataParallel?

  2. I tried to modify your code with normal DataParallel, but found that it errors out on the autograd.grad() call in the path regularization loss. In particular, it indicates that there are unused inputs and asks that the "allow_unused" flag be set to True. However, when that flag is set to True, autograd.grad() returns "None" as an output, so we cannot perform the necessary calculations. This does not occur when running in the Distributed DataParallel configuration, but I don't quite understand why.

Thank you again for your great work, and any help would be appreciated!

Small bug in projector.py causes --w_plus option to be ignored

I just noticed that in the current version of projector.py, the --w_plus option, which should allow using the [18, 512] latent vectors instead of [1, 512], gets ignored.

The current code is:

if args.w_plus:
    latent_in.unsqueeze(1).repeat(1, g_ema.n_latent, 1) 

But it should probably be:

if args.w_plus:
    latent_in = latent_in.unsqueeze(1).repeat(1, g_ema.n_latent, 1) 

GPU memory usage

Thanks for your great work reproducing StyleGAN and its v2.
I am running the StyleGAN2 code on my 2-GPU (2080 Ti) workstation. I found that if I set the image resolution to 256x256, the maximum batch size I can achieve is 2. Yet in the official implementation, they list that config-f at 256 resolution takes only 6.4GB of GPU memory. I would like to know the reason for the difference, since it seems that I cannot run 1024 resolution on my workstation (or even a larger GPU server). BTW, have you tried how many GPUs it takes to handle 1024 resolution?
Thanks again!

Error reading images when running train.py on custom dataset

When running !python train.py --batch 32 /content/data/preprocessed/ on a custom dataset that was first run through !python prepare_data.py --out /content/data/preprocessed/ --n_worker 2 --size 32 /content/, an error is returned because Image is unable to read the image from the buffer:

0% 0/800000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 434, in <module>
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device)
  File "train.py", line 179, in train
    real_img = next(loader)
  File "train.py", line 58, in sample_data
    for batch in loader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/stylegan2-pytorch/dataset.py", line 37, in __getitem__
    img = Image.open(buffer)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2818, in open
    raise IOError("cannot identify image file %r" % (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f2714825af0>

I thought this would be a buffer issue, but I am still unable to isolate the problem. I'll report back as soon as I do.
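
A quick way to sanity-check the LMDB contents (a sketch; it assumes the 'length' key and the '{size}-{index:05d}' image keys that prepare_data.py appears to write):

    import lmdb
    from io import BytesIO
    from PIL import Image

    env = lmdb.open('/content/data/preprocessed', readonly=True, lock=False)
    with env.begin(write=False) as txn:
        length = int(txn.get('length'.encode('utf-8')).decode('utf-8'))
        print('entries:', length)
        img_bytes = txn.get('32-00000'.encode('utf-8'))  # size 32, first image
        if img_bytes is None:
            print('key missing: was a different --size used at prepare time?')
        else:
            img = Image.open(BytesIO(img_bytes))
            print(img.size, img.mode)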

CUDNN_STATUS_NOT_SUPPORTED

Hi,

When I try to run the code on multiple GPUs (or even a single GPU), I get the following error:

  File "./stylegan2-pytorch/model.py", line 256, in forward
    out = F.conv_transpose2d(input, weight, padding=0, stride=2, groups=batch)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

Any idea why this might be happening and how to resolve it?

My conda env uses python 3.7, cudatoolkit 10.1, cuda drivers 10.1 and pytorch 1.3. Thanks a lot!
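
One workaround worth trying (an assumption based on the error message, not a confirmed fix) is to make the input contiguous before the grouped transposed convolution in model.py:

    # hypothetical patch around the failing call in model.py
    out = F.conv_transpose2d(input.contiguous(), weight, padding=0, stride=2, groups=batch)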

Taking up memory on the primary GPU loading from checkpoint

Hi,
I'm wondering what is taking up memory on the main GPU when I resume training from a checkpoint. As a result, I cannot train with a larger batch size, due to the memory limitation.
(screenshot of GPU memory usage attached)

I think this is caused by the distributed training setup, and I wonder if there is any way to either avoid the memory cost or distribute it evenly across the other GPUs?
Many thanks!
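
One common cause (my assumption here, not verified against this exact setup) is that torch.load restores checkpoint tensors to the GPU they were saved from, putting every process's copy on GPU 0. Loading onto the CPU first avoids that:

    # Sketch: map all checkpoint tensors to CPU; the models are moved to their
    # own devices afterwards when load_state_dict is called.
    ckpt = torch.load(args.ckpt, map_location=lambda storage, loc: storage)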

Training from checkpoint fails when using multiple GPUs

Hi,

Thanks for implementing the paper in PyTorch. I am having problems training a model starting from a specific checkpoint when using multiple GPUs.

When using a single GPU I can run train.py using the following command:
python -m torch.distributed.launch --nproc_per_node=1 train.py --batch 16 --iter 150000 --ckpt checkpoint/start_ckpt.pt dataset
The first sample images look like they are from when I stopped training the network previously.

When I use 4 GPUs, I run into a CUDA out of memory issue, using the command:
python -m torch.distributed.launch --nproc_per_node=4 train.py --batch 16 --iter 150000 --ckpt checkpoint/start_ckpt.pt dataset

I then get the following CUDA out of memory error:

load model: checkpoint/start_ckpt.pt
load model: checkpoint/start_ckpt.pt
load model: checkpoint/start_ckpt.pt
load model: checkpoint/start_ckpt.pt
Traceback (most recent call last):
  File "train.py" [output from multiple processes interleaved here]
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device)
  File "train.py", line 189, in train
    real_pred = discriminator(real_img)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jameswhennessey/stylegan2-pytorch/model.py", line 647, in forward
    out = self.convs(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jameswhennessey/stylegan2-pytorch/model.py", line 598, in forward
    out = self.conv2(out)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 82, in forward
    return fused_leaky_relu(input, self.bias, self.negative_slope, self.scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 86, in fused_leaky_relu
    return FusedLeakyReLUFunction.apply(input, bias, negative_slope, scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 55, in forward
    out = fused.fused_bias_act(input, bias, empty, 3, 0, negative_slope, scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 86, in fused_leaky_relu
    return FusedLeakyReLUFunction.apply(input, bias, negative_slope, scale)
  File "/home/jameswhennessey/stylegan2-pytorch/op/fused_act.py", line 55, in forward
    out = fused.fused_bias_act(input, bias, empty, 3, 0, negative_slope, scale)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.75 GiB total capacity; 8.03 GiB already allocated; 101.38 MiB free; 565.07 MiB cached) (malloc at /opt/conda/conda-bld/pytorch_1573049306803/work/c10/cuda/CUDACachingAllocator.cpp:267)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f397a01f687 in /opt/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
......

Do you have any suggestions on how to resolve this issue?

Thanks in advance.
