
pi-gan's People

Contributors

marcoamonteiro, matthew-a-chan


pi-gan's Issues

Puzzled by Head Position

Hi, @ericryanchan,

I am really curious about how you solved the head-position problem. I see that real images are not paired with ground-truth head positions, so the network must learn head position in an unsupervised way.

After checking your code, I found that in every iteration you sample a head position and have the discriminator predict the head position of the rendered faces. The discriminator's output is then corrected against the sampled head position in a self-supervised way.

I am puzzled by this mechanism and cannot figure out why it works. Can you help me?
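For readers landing on this issue, here is a minimal sketch of the mechanism being asked about, with hypothetical model interfaces (the repo's exact signatures and loss weights differ):

import torch

mse = torch.nn.MSELoss()

def generator_pose_penalty(generator, discriminator, z, pitch, yaw, pos_lambda):
    # Render each face at a randomly sampled head pose.
    gen_imgs = generator(z, pitch=pitch, yaw=yaw)
    # The discriminator predicts realism, the latent code, and the head pose.
    pred_real, pred_latent, pred_position = discriminator(gen_imgs)
    # Penalize G when the predicted pose disagrees with the pose it was given:
    # G can only satisfy both the adversarial term and this term by rendering
    # a head that actually faces the sampled direction, so pose becomes
    # controllable without any ground-truth pose labels.
    return pos_lambda * mse(pred_position, torch.cat([pitch, yaw], dim=-1))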

TypeError: can't pickle weakref objects

I fixed it by adding .state_dict():
torch.save(ema.state_dict(), os.path.join(opt.output_dir, now + 'ema.pth'))
torch.save(ema2.state_dict(), os.path.join(opt.output_dir, now + 'ema2.pth'))
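If you save only the state dicts as above, note that loading then requires reconstructing the EMA objects first. A sketch, assuming the repo's EMA comes from torch_ema and that the installed version provides load_state_dict:

import torch
from torch_ema import ExponentialMovingAverage

# Rebuild the EMA wrapper around a freshly constructed generator, then
# restore the saved state dict (the decay value must match training).
ema = ExponentialMovingAverage(generator.parameters(), decay=0.999)
ema.load_state_dict(torch.load('ema.pth', map_location='cpu'))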

Dataset cannot be found

Hi,

I tried to run the code and found that the dataset is not specified. I then tried to download the dataset, but it did not work at all. I think the link is broken!

Is there any way to resolve this and run the code properly?

Kind regards,
Atikul Islam Sajib

train.py only works under distributed settings

Marco, thanks for the work.

I had to fix a few things to get this to run in a single-GPU setting. If it helps, I added a new file train_local.py that makes single-GPU training functional on a fork: https://github.com/xvdp/pi-GAN. I did not merge the two to avoid excess if statements. You may want to refactor everything so the local and distributed paths are called from a single code path, but I don't know if it is worth it.

I noticed some double definitions, so I linted the whole project; I also saw missing arguments in some functions, and so on.
I also noticed that your local dataset_path entries for cats and carla are reversed.
If you want, I can open a pull request.

xvdp

Inverse rendering script

Hi, great work!
I wonder if the inverse rendering (GAN inversion) script is currently available, and how to use it?

RuntimeError: [/opt/conda/conda-bld/pytorch_1603729006826/work/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:84] Timed out waiting 1800000ms for recv operation to complete

Hi,
Thanks for sharing your work!
I got this problem when training CelebA with pi-GAN, and I don't know how to solve it. It was run on one V100-32GB GPU with PyTorch 1.7.0 and CUDA 10.1.

Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/mnt/lustre/gaosicheng/anaconda3/envs/pigan/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/mnt/lustre/gaosicheng/codes/pi-GAN-master/train.py", line 249, in train
scaler.scale(d_loss).backward()
File "/mnt/lustre/gaosicheng/anaconda3/envs/pigan/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/mnt/lustre/gaosicheng/anaconda3/envs/pigan/lib/python3.7/site-packages/torch/autograd/init.py", line 132, in
allow_unreachable=True) # allow_unreachable flag
RuntimeError: [/opt/conda/conda-bld/pytorch_1603729006826/work/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:84] Timed out waiting 1800000ms for recv operation to complete

Inverse rendering script assumes strictly oriented input image

I've been trying to get the inverse rendering script to work and keep getting really janky results. From what I can tell, it seems that the script assumes the input image is perfectly oriented with respect to the default position, orientation and camera distance of the learned "canonical head". Consequently, I'm experiencing result quality that is heavily influenced by how well I can get the input image to align with the canonical head, which makes itself evident in clearly visible artifacts.

Can the script be modified to learn the orientation and distance of the camera for the input image? Or if I'm just missing something or doing something wrong, could you maybe add a usage example demonstrating how to apply the inverse rendering script to a natural (but otherwise reasonably well framed) portrait image?
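One possible direction, sketched below: treat the camera angles as free parameters and optimize them together with the frequency/phase offsets. The offset and option names follow inverse_render.py loosely, yaw_offset/pitch_offset are hypothetical, and whether gradients actually reach the pose parameters depends on how the camera sampling is implemented:

import math
import torch

# Learnable camera offsets around the canonical pose (h_mean = v_mean = pi/2).
yaw_offset = torch.zeros(1, requires_grad=True)
pitch_offset = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam(
    [w_frequency_offsets, w_phase_shift_offsets, yaw_offset, pitch_offset], lr=1e-2)

for step in range(700):
    options['h_mean'] = math.pi / 2 + yaw_offset
    options['v_mean'] = math.pi / 2 + pitch_offset
    frame, _ = generator.forward_with_frequencies(
        w_frequencies + w_frequency_offsets,
        w_phase_shifts + w_phase_shift_offsets, **options)
    loss = torch.nn.functional.mse_loss(frame, target_image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()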

Multi-GPU training error while training with the `CARLA` curriculum

Hi,

First of all, thanks for sharing this great work. I am currently trying to train a model on images of full human bodies and am using a curriculum based on the CARLA curriculum. While training on multiple GPUs, I encountered this error:

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 52 with name fromRGB.5.model.0.weight has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.

I noticed that fromRGB.5.model.0.weight is a variable in ProgressiveEncoderDiscriminator. Since I have been able to reproduce results with the CelebA curriculum, and the CARLA curriculum uses a different discriminator class, I think the problem can be narrowed down to the ProgressiveEncoderDiscriminator class. But I currently have no other clues. Could you please take a look?
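As a stopgap while waiting for a real fix, the workaround suggested by the error message itself can be applied when wrapping the model; a sketch, untested against this repo:

from torch.nn.parallel import DistributedDataParallel as DDP

# _set_static_graph() is a private API around PyTorch 1.9; newer releases
# expose static_graph=True as a DDP constructor argument instead.
discriminator_ddp = DDP(discriminator, device_ids=[rank])
discriminator_ddp._set_static_graph()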

Many thanks and best!

Error when Rendering Videos

I used the command python render_video.py ckpt/CelebA/generator.pth --curriculum CelebA --seeds 0 1 2 3 from the README; it seems the pipe is broken?

I tried re-installing scikit-video, but it doesn't work.

[error screenshot omitted]

Is there anything I am doing wrong? Please help me out.

Why not freezing discriminator while training generator

Hi,

Many thanks for the great work and releasing the code.
If I'm not mistaken, you are not freezing the discriminator while training the generator:

pi-GAN/train.py

Line 264 in 0800af7

g_preds, g_pred_latent, g_pred_position = discriminator_ddp(gen_imgs, alpha, **metadata)

I'm wondering if there is a reason for this?
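For reference, the freezing pattern the question alludes to would look roughly like the sketch below, reusing the names from the quoted line. Note that even without freezing, the generator step cannot change D's weights, since the generator's optimizer only holds G's parameters; freezing mainly avoids allocating gradient buffers for D:

# Freeze D while computing the generator loss (sketch, hypothetical optimizer name).
for p in discriminator_ddp.parameters():
    p.requires_grad_(False)

g_preds, g_pred_latent, g_pred_position = discriminator_ddp(gen_imgs, alpha, **metadata)
g_loss = torch.nn.functional.softplus(-g_preds).mean()  # non-saturating GAN loss
g_loss.backward()
optimizer_G.step()

# Unfreeze D before its own update.
for p in discriminator_ddp.parameters():
    p.requires_grad_(True)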

Many thanks.

Can't pickle weakref objects

I am building a project in which data is fetched from my database for a specific Project_id and a model is then trained using an LSTM. The epochs run fine, but afterwards it shows an Internal Server Error.


admin.py

    def build(self, request, queryset):
        # Django admin action: train and store a model for each selected project.
        count = 0
        for p in queryset:
            if build_id(p.project_management.id):
                count += 1
            else:
                messages.warning(request, f"Could not build model for {p}")
        messages.success(
            request, f"Successfully built models for {count} projects")

    build.short_description = "Build models for selected Projects"

build.py
Here the model is built for a specific Project_id. Only model.pkl is written, and it is incomplete; the other files, scaler_in and scaler_out, are not saved to the folder at all.

def build_id(project_id):
    # get directory path to store models in
    path = fetch_model_path(project_id, True)

    # train model
    model, scaler_in, scaler_out = train_project_models(project_id)

    # ensure model was trained
    if model is None:
        return False

    # store models
    store_model(f'{path}/model.pkl', model)
    store_model(f'{path}/scaler_in.pkl', scaler_in)
    store_model(f'{path}/scaler_out.pkl', scaler_out)

    # clear current loaded model from memory
    keras_clear()

    return True

utils.py

def store_model(path, model):
    with open(path, 'wb') as f:
        model_file = File(f)  # django.core.files.File wrapper around the handle
        pickle.dump(model, model_file)

When I comment out the pickle.dump(model, model_file) call, then model.pkl, scaler_in.pkl, and scaler_out.pkl are saved as 0 KB files. If the .pkl files already exist with data, they are removed and the project builds successfully. When I debug this code, the Django debug toolbar shows that the page has temporarily moved.
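For what it's worth, Keras/TensorFlow models generally cannot be pickled directly (the underlying graph holds weakrefs), which matches this error. A sketch of the usual workaround, assuming model is a Keras model and the scalers are plain scikit-learn objects:

import pickle

# Save the Keras model with its own serializer instead of pickle.
model.save(f'{path}/model.h5')

# Plain scikit-learn scalers pickle fine.
with open(f'{path}/scaler_in.pkl', 'wb') as f:
    pickle.dump(scaler_in, f)
with open(f'{path}/scaler_out.pkl', 'wb') as f:
    pickle.dump(scaler_out, f)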


output

Epoch 1/4
11/11 [==============================] - 9s 302ms/step - loss: 0.4594 - val_loss: 0.2777
Epoch 2/4
11/11 [==============================] - 2s 177ms/step - loss: 0.1039 - val_loss: 0.0395
Epoch 3/4
11/11 [==============================] - 2s 170ms/step - loss: 0.0545 - val_loss: 0.0361
Epoch 4/4
11/11 [==============================] - 2s 169ms/step - loss: 0.0414 - val_loss: 0.0551
Internal Server Error: /turboai/turboAI/jaaiparameters/
Traceback (most recent call last):
  File "E:\.Space\project\venv\lib\site-packages\django\core\handlers\exception.py", line 47, in inner
    response = get_response(request)
  File "E:\.Space\project\venv\lib\site-packages\django\core\handlers\base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\contrib\admin\options.py", line 616, in wrapper
    return self.admin_site.admin_view(view)(*args, **kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\utils\decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\views\decorators\cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\contrib\admin\sites.py", line 232, in inner
    return view(request, *args, **kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\utils\decorators.py", line 43, in _wrapper
    return bound_method(*args, **kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\utils\decorators.py", line 130, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "E:\.Space\project\venv\lib\site-packages\django\contrib\admin\options.py", line 1723, in changelist_view
    response = self.response_action(request, queryset=cl.get_queryset(request))
  File "E:\.Space\project\venv\lib\site-packages\django\contrib\admin\options.py", line 1408, in response_action
    response = func(self, request, queryset)
  File "E:\.Space\project\TurboAnchor\turboAI\admin.py", line 125, in build
    if build_id(p.project_management.id):
  File "E:\.Space\project\TurboAnchor\turboAI\build.py", line 48, in build_id
    store_model(f'{path}/model.pkl', model)
  File "E:\.Space\project\TurboAnchor\turboAI\utils.py", line 154, in store_model
    pickle.dump(model, model_file)
TypeError: can't pickle weakref objects
[29/Oct/2021 17:50:31] "POST /turboai/turboAI/jaaiparameters/ HTTP/1.1" 500 126722

Errors during training

Hi,

Thank you for sharing the code for this interesting work.
I set up a virtualenv and installed the dependencies using "pip install -r requirements.txt". Then I tried to run the training script as follows: CUDA_VISIBLE_DEVICES=0 python train.py --curriculum CARLA --output_dir carla_output. But I get the following errors with Python 3.6, 3.7, and 3.8 (I am using CUDA 10.2). Which version of Python should we be using?

  1. With Python 3.6, installation of the requirements fails because there is no matching gdown version 3.12.2.

  2. With Python 3.7, installation of the requirements succeeds, but at runtime there are some errors due to torchvision.

  3. With Python 3.8, installation of the requirements succeeds, but I get the following error at runtime:

    -- Process 0 terminated with the following error:
    Traceback (most recent call last):
    File "/venv_pi_gan/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
    File "pi-GAN/train.py", line 181, in train
    torch.save(ema, os.path.join(opt.output_dir, now + 'ema.pth'))
    File "/venv_pi_gan/lib/python3.8/site-packages/torch/serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
    File "/venv_pi_gan/lib/python3.8/site-packages/torch/serialization.py", line 484, in _save
    pickler.dump(obj)
    TypeError: cannot pickle 'weakref' object

Thanks

License?

Apologies if this is mentioned elsewhere, but I was wondering if you had released this code under a particular license.

Inverse code can't render fine detail as in demo

Hi,

I just ran inverse_render.py with Biden's portrait as shown in your demo, but I fail to get the same reconstruction quality as in your video, as shown below:

[comparison screenshots omitted]

Is there anything I can do to refine the result? If anyone knows how to solve this problem, please leave a comment! Thanks.

inverse_render.py CUDA out of memory

I use a K80 GPU. When I run:
!python /content/pi-GAN/inverse_render.py /content/CelebA/generator.pth /content/face.png

/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
Traceback (most recent call last):
File "/content/pi-GAN/inverse_render.py", line 108, in <module>
frame, _ = generator.forward_with_frequencies(w_frequencies + noise_w_frequencies + w_frequency_offsets, w_phase_shifts + noise_w_phase_shifts + w_phase_shift_offsets, **options)
File "/content/pi-GAN/generators/generators.py", line 306, in forward_with_frequencies
coarse_output = self.siren.forward_with_frequencies_phase_shifts(transformed_points, frequencies, phase_shifts, ray_directions=transformed_ray_directions_expanded).reshape(batch_size, img_size * img_size, num_steps, 4)
File "/content/pi-GAN/siren/siren.py", line 212, in forward_with_frequencies_phase_shifts
rbg = self.color_layer_sine(torch.cat([ray_directions, x], dim=-1), frequencies[..., -self.hidden_dim:], phase_shifts[..., -self.hidden_dim:])
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/pi-GAN/siren/siren.py", line 94, in forward
return torch.sin(freq * x + phase_shift)
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 11.17 GiB total capacity; 10.36 GiB already allocated; 377.81 MiB free; 10.37 GiB reserved in total by PyTorch)
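A K80 has about 11 GB, which matches the report, so the render may simply not fit at full settings. A sketch of knobs worth trying, assuming inverse_render.py reads its render settings from an options dict with the same keys as the training curricula:

# Hypothetical memory-saving tweaks before the optimization loop:
options['img_size'] = 64    # optimize against a lower-resolution render
options['num_steps'] = 24   # fewer depth samples per ray

# Render the final full-resolution frame only once, without autograd state:
with torch.no_grad():
    options['img_size'] = 128
    frame, _ = generator.forward_with_frequencies(
        w_frequencies + w_frequency_offsets,
        w_phase_shifts + w_phase_shift_offsets,
        **options)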

[No positional encoding for position and direction]

Dear Marco,
Thanks for open-sourcing the code! I am curious why positional encoding is not used in pi-GAN's implementation.
As seen in https://github.com/marcoamonteiro/pi-GAN/blob/master/generators/generators.py#L49, could you kindly explain the reason for not applying positional encoding to transformed_points (xyz) and ray_directions (d)?
Thank you for your time!
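For background: pi-GAN's MLP is a SIREN, i.e. it uses FiLM-conditioned sinusoidal activations rather than ReLU, and the sine nonlinearity itself supplies the high-frequency basis that positional encoding provides in the original NeRF. A minimal sketch of such a layer, simplified from the shape handling in siren.py:

import torch
import torch.nn as nn

class FiLMSirenLayer(nn.Module):
    """Linear layer followed by a frequency/phase-modulated sine."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, freq, phase_shift):
        # sin(freq * (Wx + b) + phase): the sine supplies high-frequency
        # detail directly, so no explicit positional encoding of xyz/d is needed.
        return torch.sin(freq * self.linear(x) + phase_shift)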

To confirm the curriculum for reproduction

Hi @marcoamonteiro and @ericryanchan, congrats on this great work, and thanks a lot for open-sourcing the code!

I just checked the curricula for training on CelebA and CATS, and it seems the curriculum only provides hyperparameters for training at a resolution of 64x64.

pi-GAN/curriculums.py

Lines 86 to 87 in 0800af7

0: {'batch_size': 28 * 2, 'num_steps': 12, 'img_size': 64, 'batch_split': 2, 'gen_lr': 6e-5, 'disc_lr': 2e-4},
int(200e3): {},

However, Sec. C in the paper states the following:

π-GAN trained for 10 hours at 32×32, 10 hours at 64×64, and 36 hours at 128×128

I hypothesize the curriculum for CARLA is what has been stated in the paper and we can use that to reproduce the results. Is this correct?

Meanwhile, I guess the gen_lr in the following might be 5e-5 instead of 4e-5 (following Sec. 3.4 in the paper)?

0: {'batch_size': 30, 'num_steps': 48, 'img_size': 32, 'batch_split': 1, 'gen_lr': 4e-5, 'disc_lr': 4e-4},
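Until the authors confirm, a hypothetical progressive CelebA curriculum in the style of curriculums.py, following the 32 -> 64 -> 128 schedule from Sec. C, might look like the sketch below. The stage boundaries and batch sizes here are illustrative guesses, not values from the paper or the repo:

CelebA_progressive = {
    0:          {'batch_size': 120, 'num_steps': 12, 'img_size': 32,
                 'batch_split': 2, 'gen_lr': 5e-5, 'disc_lr': 4e-4},
    int(20e3):  {'batch_size': 56,  'num_steps': 12, 'img_size': 64,
                 'batch_split': 2, 'gen_lr': 5e-5, 'disc_lr': 4e-4},
    int(60e3):  {'batch_size': 20,  'num_steps': 12, 'img_size': 128,
                 'batch_split': 4, 'gen_lr': 1e-5, 'disc_lr': 1e-4},
    int(200e3): {},
}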

Thanks a lot in advance.

generator identity penalty loss

Hi,

Thank you very much for the well-organized repository and inspiring work! I find the codebase very clean and convenient since it has all the necessary visualization code ready.

I'm confused about one thing though: In line 278 of train.py, there is an identity penalty loss for the generator. But as far as I understand, the gradient for identity penalty does not flow through the generator because the sampling of z and pitch yaw values is not learnable. Is that right?
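One reading that may resolve the confusion (mine, not the authors'): although the sampling of z and the pitch/yaw values is indeed not learnable, the penalty is still differentiable with respect to the generator, because the discriminator's predictions depend on the generated images. A sketch with hypothetical names:

# Gradients reach G through gen_imgs, not through the samples themselves.
gen_imgs = generator(z, pitch=pitch, yaw=yaw)          # depends on G's weights
_, g_pred_latent, g_pred_position = discriminator(gen_imgs)
identity_penalty = (mse(g_pred_latent, z) +
                    mse(g_pred_position, torch.cat([pitch, yaw], dim=-1)))
# d(identity_penalty)/d(theta_G) is nonzero: the predictions vary with
# gen_imgs, which vary with G's parameters, even though z, pitch and yaw
# are fixed samples.
identity_penalty.backward()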

Cheers,
Yufeng

Unbalanced GPU memory usage

Thanks for the great work!
I noticed that the GPU load is unbalanced: there are 7 additional processes on GPU 0, each requiring roughly 500+ MB of GPU memory. These additional processes are triggered by self._distributed_broadcast_coalesced() in torch.nn.parallel.DistributedDataParallel when instantiating a DDP model.
Do you have any idea about balancing the memory requirement on each GPU?
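A common cause of this pattern (an assumption about this repo, not a verified diagnosis) is that every rank initializes a CUDA context on device 0, e.g. via default-device tensor creation or checkpoint loading. A sketch of the usual mitigation at the top of each worker:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_worker(rank, model, checkpoint_path=None):
    # Pin this process to its own GPU before any CUDA work happens,
    # so stray allocations don't land on GPU 0.
    torch.cuda.set_device(rank)
    device = torch.device('cuda', rank)
    model = model.to(device)
    if checkpoint_path is not None:
        # map_location keeps deserialized tensors off GPU 0.
        model.load_state_dict(torch.load(checkpoint_path, map_location=device))
    return DDP(model, device_ids=[rank], output_device=rank)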
Thank you.

Cuda OOM error during training

Hi,

I tried to train the code on the CARLA dataset. But I am getting Cuda out of memory error. These are the things I have tried so far:

  1. I have tried running on a single as well as multiple 2080 Ti GPUs (specified using CUDA_VISIBLE_DEVICES), each with 11 GB of memory, but it still produces the OOM error.
  2. I tried a 3090 GPU, but the code produces errors on the 3090 that are unrelated to the CUDA OOM error.
  3. I have also tried reducing the batch size for the CARLA dataset in curriculums.py from 30 to 10, as shown below. But I still get the OOM error on single or multiple 2080 Ti GPUs.

CARLA = {
    0: {'batch_size': 10, 'num_steps': 48, 'img_size': 32, 'batch_split': 1, 'gen_lr': 4e-5, 'disc_lr': 4e-4},
    int(10e3): {'batch_size': 14, 'num_steps': 48, 'img_size': 64, 'batch_split': 2, 'gen_lr': 2e-5, 'disc_lr': 2e-4},
    int(55e3): {'batch_size': 10, 'num_steps': 48, 'img_size': 128, 'batch_split': 5, 'gen_lr': 10e-6, 'disc_lr': 10e-5},
    int(200e3): {},
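One more knob that may help, sketched below: batch_split performs gradient accumulation, splitting each batch into batch_split sequential sub-batches, so raising it cuts peak memory without shrinking the effective batch size (the values here are illustrative):

# Roughly halve per-pass memory at an unchanged effective batch size by
# raising batch_split instead of (or in addition to) lowering batch_size.
CARLA[0]['batch_split'] = 2          # was 1: two sub-batches of 5 images each
CARLA[int(10e3)]['batch_split'] = 4  # was 2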

Is there anything else I can do to fix the OOM error?

thanks

Training strategy on CelebA

Hi, thanks for your great work! The training strategy for CelebA shown in curriculums.py seems inconsistent with what is stated in the paper: the paper says a progressive training strategy was used, while in the code CelebA is trained only at 64 x 64 resolution. Could you update curriculums.py to make reproducing the results more convenient?

About the lr for G and D

Dear friends,
I am wondering why the learning rates for G and D are 5e-5 and 4e-4, respectively. Do you have any insight?
I ask because in GRAF and GIRAFFE the learning rates for G and D are 5e-4 and 1e-4, respectively.
Does this mean that in pi-GAN the G is "more powerful" than the D?
Looking forward to hearing from you, thanks!

How to decide fov and ray_start and ray_end for customer datasets

Hi, thanks for the interesting work. When applying this approach to a new dataset, how should one decide the fov, ray_start, and ray_end? Since you seem to sample rays directly in NDC space, I am not sure whether I should use the ground-truth FOV and radius. Could you please share some ideas on how you selected these hyperparameters? Thanks!
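A rough rule of thumb (my assumption, not the authors' recipe): pick the near/far bounds so they tightly bracket the object as seen from the camera orbit, and pick the FOV so the object fills most of the frame. In code, for hypothetical scene scales:

import math

# Hypothetical scene scale: camera orbit radius r, object bounding radius s.
r, s = 1.0, 0.3

# FOV (degrees) such that a sphere of radius s roughly fills the image.
fov = 2 * math.degrees(math.atan(s / r))

# Sampling bounds that tightly bracket the object along each ray.
ray_start, ray_end = r - s, r + s
print(fov, ray_start, ray_end)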

Not able to pickle weakref objects.

Hi.

I'm using the celeb_a dataset from Kaggle because I want to test pi-GAN and, if I get it working, try it with a custom dataset. After changing the curriculum's dataset_path, I ran this command:

CUDA_VISIBLE_DEVICES=0 python3 pi-GAN/train.py --curriculum CelebA --output_dir celebAOutputDir

Unfortunately, I got hit with this error:

Namespace(curriculum='CelebA', eval_freq=5000, load_dir='', model_save_interval=5000, n_epochs=3000, output_dir='celebAOutputDir', port='12355', sample_interval=200, set_step=None)
Total progress:   0% 0/3000 [00:00<?, ?it/s]
0it [00:00, ?it/s]/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py:281: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  "Argument interpolation should be of type InterpolationMode instead of int. "

  0% 0/200000 [00:00<?, ?it/s]
Total progress:   0% 1/3000 [00:07<5:56:04,  7.12s/it]
Progress to next stage:   0% 0/200000 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "pi-GAN/train.py", line 400, in <module>
    mp.spawn(train, args=(num_gpus, opt), nprocs=num_gpus, join=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/content/pi-GAN/train.py", line 181, in train
    torch.save(ema, os.path.join(opt.output_dir, now + 'ema.pth'))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 484, in _save
    pickler.dump(obj)
TypeError: can't pickle weakref objects

Can you share the pretrained model for the discriminator?

Hey, thank you for your splendid work!
After some tests and diving into your example code, I am impressed by your model's ability to quickly generate consistent and faithful faces from different views. A neural radiance field representing human faces in diverse situations has been achieved. I therefore want to write an encoder that projects real faces into the w space quickly and faithfully.
During training, however, I found that the pi-GAN discriminator may help the encoder perform better, since it not only outputs whether faces are real or fake, which can be used as an additional adversarial loss term, but also outputs pitch and yaw, which can be used as rendering parameters. Tilted faces from the real dataset seem to fail to encode well into w space when h_mean and v_mean are set to pi / 2.
Of course, I could train a pi-GAN discriminator myself on top of the pretrained generators, and I am currently doing so, but given the potential instability of initialization and the time cost, I would really appreciate it if you could share the pretrained discriminator.
Thank you!

3D shape extraction

Hi,
According to the Readme, the command for shape extraction is:

python3 shape_extraction.py path/to/generator.pth --curriculum CelebA --seed 0

However, there is no shape_extraction.py file. Instead, please correct this as follows:

python extract_shapes.py --seeds 0 --output_dir path/to/output path/to/generator.pth

Also, the script extracts the shape as a voxel grid in .mrc format. Could you please provide some guidance on how to extract the 3D shape as a mesh in .obj format?
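In the meantime, a sketch of one way to do this conversion, assuming the mrcfile, scikit-image, and trimesh packages; the filename is hypothetical, and the iso-level is a threshold you will need to tune for pi-GAN's density values:

import mrcfile
import trimesh
from skimage import measure

# Load the voxel grid written by extract_shapes.py (hypothetical filename).
with mrcfile.open('0.mrc') as mrc:
    voxels = mrc.data.copy()

# Run marching cubes at a hand-tuned density threshold to get a surface.
verts, faces, normals, _ = measure.marching_cubes(voxels, level=10.0)

# Export the triangle mesh as .obj.
mesh = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
mesh.export('shape.obj')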

thanks

generator.losses cannot be visualized using TensorBoard

Hi,
I tried to use tensorboard --logdir=path to visualize the loss curves, where path is the results folder containing generator.losses, discriminator.losses, and the other .pth files, but TensorBoard shows "No dashboards are active for the current data set." Do you have any idea why? How can I visualize the loss curves?
Thanks!
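TensorBoard cannot read these files because they are not event logs; they are Python lists written with torch.save by train.py. A sketch that loads them and re-emits them as TensorBoard scalars:

import torch
from torch.utils.tensorboard import SummaryWriter

# The .losses files are plain Python lists serialized with torch.save.
losses = torch.load('output_dir/generator.losses')

writer = SummaryWriter('runs/pi-gan')
for step, loss in enumerate(losses):
    writer.add_scalar('loss/generator', loss, step)
writer.close()
# Then: tensorboard --logdir runs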

Use the same z for training G and D in each iteration.

In other GAN papers, we usually train D for k steps and G for one step in each GAN training iteration. In that case, the z used for training G and D are obviously different. But in this paper, G and D are trained simultaneously, i.e., each for one step per training iteration. I'm wondering whether I could use the same z to train both networks in each iteration to reduce the computational cost, as in PyTorch's official GAN tutorial (https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html#training):

# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):

        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1

This way, we would only need to call G's forward pass once per iteration. Would it affect model performance?

generated images colours

Hello, thank you for sharing the code. I tried the training code with the CelebA dataset for 20 epochs (28 hours on an A6000). It produces bluish images, and this does not seem to be due to a channel-order mixup. Will it improve and look better with longer training, or is it caused by something else (the NeRF component, etc.)?

[sample images omitted: 71500_fixed_ema, 71500_tilted_ema]
Thank you.

3d mesh reconstruction quality

Hi,

On the project website I found a pretty good 3D shape rendered from the CARLA dataset, so I am trying to extract a similar mesh using the file provided in this repository. However, no matter how I change the parameters, I am unable to generate a 3D mesh of similar quality using the checkpoint provided by the authors. Here is an example using a slightly larger cube length and 512 resolution:
[screenshot omitted]
The car itself seems fine, but I am not sure why the big mass below the car is generated. I believe this is partly because the CARLA dataset does not include images viewed from the bottom or from low angles. Can you provide any hint on how to generate a fine 3D mesh like the one on the project website?

Thanks

Loss curve visualization

Hello, thanks for sharing this interesting work!
At lines 360 and 361 of train.py, there are the following lines:

                torch.save(generator_losses, os.path.join(opt.output_dir, 'generator.losses'))
                torch.save(discriminator_losses, os.path.join(opt.output_dir, 'discriminator.losses'))

Could you please tell me how to load the saved generator.losses and discriminator.losses files?
I have tried torch.load and pickle.load, but neither works.
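Since the files are written with torch.save, torch.load is the natural reader; if it fails, likely culprits are a partially written file or loading in an environment missing the original module paths. A sketch assuming the files contain plain lists of floats:

import torch
import matplotlib.pyplot as plt

# Load on CPU so the training GPUs are not required.
g_losses = torch.load('output_dir/generator.losses', map_location='cpu')
d_losses = torch.load('output_dir/discriminator.losses', map_location='cpu')

plt.plot(g_losses, label='generator')
plt.plot(d_losses, label='discriminator')
plt.xlabel('logging step')
plt.ylabel('loss')
plt.legend()
plt.savefig('loss_curves.png')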

Query about Training Time

Hi,

Thanks for the code. I was trying to run training on the CelebA dataset with 4 RTX 6000 GPUs and see the log below:

[Experiment: celebAOutputDir] [GPU: 0,1,2,3] [Epoch: 0/3000] [D loss: 341.0094299316406] [G loss: 342.2994079589844] [Step: 10] [Alpha: 0.00] [Img Size: 64] [Batch Size: 54] [TopK: 9] [Scale: 256.0]        
Total progress:   0%|                                            | 1/3000 [02:28<123:51:11, 148.67s/it]
Progress to next stage:   0%|                                  | 14/200000 [03:01<620:16:24, 11.17s/it]

I notice that the training time is very high; could you confirm whether training took a similar amount of time on your end?

Appreciate your help!

-- Yash

TypeError: __init__() missing 1 required positional argument: 'dataset_path'

When I train with the script, this problem occurs while running fid_evaluation.setup_evaluation (Progress to next stage: 2%|▊ | 5000/200000):

Traceback (most recent call last):
  File "/home/ubuntu541/anaconda3/envs/pigan/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/ubuntu541/yhj_lsp/nerf/pi-GAN-master(1)/pi-GAN-master/train.py", line 362, in train
    fid_evaluation.setup_evaluation(metadata['dataset'], generated_dir, target_size=128)
  File "/home/ubuntu541/yhj_lsp/nerf/pi-GAN-master(1)/pi-GAN-master/fid_evaluation.py", line 37, in setup_evaluation
    dataloader, CHANNELS = datasets.get_dataset(dataset_name, img_size=target_size)
  File "/home/ubuntu541/yhj_lsp/nerf/pi-GAN-master(1)/pi-GAN-master/datasets.py", line 79, in get_dataset
    dataset = globals()[name](**kwargs)
TypeError: __init__() missing 1 required positional argument: 'dataset_path'

Does the pi-GAN discriminator receive the camera pose during training?

This is a question about the method/paper, not so much the implementation.

During training, do you provide the corresponding camera pose (denoted ξ in the paper) to the discriminator? It appears the answer is no. If so, why doesn't the generator simply ignore the camera pose altogether and learn to generate images from a random angle each time? In my mind, the discriminator wouldn't be able to tell the difference. Or perhaps you train on multiple samples with the same z per batch, enforcing that different ξ give reasonable results for the same z?
