
Synthesizing and manipulating 2048x1024 images with conditional GANs

Home Page: https://tcwang0509.github.io/pix2pixHD/

License: Other

Topics: gan, deep-learning, deep-neural-networks, pytorch, pix2pix, image-to-image-translation, generative-adversarial-network, computer-vision, computer-graphics

pix2pixHD

PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic image-to-image translation. It can be used to turn semantic label maps into photo-realistic images or to synthesize portraits from face label maps.

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Ting-Chun Wang1, Ming-Yu Liu1, Jun-Yan Zhu2, Andrew Tao1, Jan Kautz1, Bryan Catanzaro1
1NVIDIA Corporation, 2UC Berkeley
In CVPR 2018.

Image-to-image translation at 2k/1k resolution

  • Our label-to-streetview results

  • Interactive editing results

  • Additional streetview results

  • Label-to-face and interactive editing results

  • Our editing interface

Prerequisites

  • Linux or macOS
  • Python 2 or 3
  • NVIDIA GPU (11G memory or larger) + CUDA and cuDNN

Getting Started

Installation

  • Install PyTorch and dependencies from http://pytorch.org
  • Install the dominate Python library:
pip install dominate
  • Clone this repo:
git clone https://github.com/NVIDIA/pix2pixHD
cd pix2pixHD
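
Optionally (not part of the original instructions), a quick sanity check that the installed PyTorch build can see your GPU before running anything heavy:

# Optional sanity check: confirm PyTorch is installed and CUDA is visible.
import torch
print(torch.__version__, torch.cuda.is_available())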

Testing

  • A few example Cityscapes test images are included in the datasets folder.
  • Please download the pre-trained Cityscapes model from here (google drive link), and put it under ./checkpoints/label2city_1024p/
  • Test the model (bash ./scripts/test_1024p.sh):
#!./scripts/test_1024p.sh
python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none

The test results will be saved to an HTML file here: ./results/label2city_1024p/test_latest/index.html.

More example scripts can be found in the scripts directory.

Dataset

  • We use the Cityscapes dataset. To train a model on the full dataset, please download it from the official website (registration required). After downloading, please put it under the datasets folder in the same way the example images are provided.

Training

  • Train a model at 1024 x 512 resolution (bash ./scripts/train_512p.sh):
#!./scripts/train_512p.sh
python train.py --name label2city_512p
  • To view training results, please check out the intermediate results in ./checkpoints/label2city_512p/web/index.html. If you have TensorFlow installed, you can see TensorBoard logs in ./checkpoints/label2city_512p/logs by adding --tf_log to the training scripts.

Multi-GPU training

  • Train a model using multiple GPUs (bash ./scripts/train_512p_multigpu.sh):
#!./scripts/train_512p_multigpu.sh
python train.py --name label2city_512p --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7

Note: this is not tested, as we trained our models on a single GPU only. Please use at your own discretion.

Training with Automatic Mixed Precision (AMP) for faster speed

  • To train with mixed precision support, please first install apex from: https://github.com/NVIDIA/apex
  • You can then train the model by adding --fp16. For example,
#!./scripts/train_512p_fp16.sh
python -m torch.distributed.launch train.py --name label2city_512p --fp16

In our test case, it trains about 80% faster with AMP on a Volta machine.

Training at full resolution

  • Training at full resolution (2048 x 1024) requires a GPU with 24G memory (bash ./scripts/train_1024p_24G.sh), or 16G memory if using mixed precision (AMP).
  • If only GPUs with 12G memory are available, please use the 12G script (bash ./scripts/train_1024p_12G.sh), which will crop the images during training. Performance is not guaranteed using this script.

Training with your own dataset

  • If you want to train with your own dataset, please generate one-channel label maps whose pixel values correspond to the object labels (i.e. 0,1,...,N-1, where N is the number of labels). This is because we need to generate one-hot vectors from the label maps. Please also specify --label_nc N during both training and testing (see the sketch after this list).
  • If your input is not a label map, please just specify --label_nc 0, which will directly use the RGB colors as input. The folders should then be named train_A, train_B instead of train_label, train_img, where the goal is to translate images from A to B.
  • If you don't have instance maps or don't want to use them, please specify --no_instance.
  • The default preprocessing setting is scale_width, which scales the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it with the --resize_or_crop option. For example, scale_width_and_crop first resizes the image to width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize). crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none, which does nothing other than making sure the image is divisible by 32.
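
A minimal sketch of writing such a one-channel label map (array contents, sizes, and paths here are hypothetical stand-ins for your own annotations):

import numpy as np
from PIL import Image

# Hypothetical example: an (H, W) array of class indices 0..N-1.
N = 10  # would correspond to --label_nc 10
labels = np.zeros((512, 1024), dtype=np.uint8)
labels[256:, :] = 3  # pretend the bottom half is class 3

# Save as a single-channel 8-bit PNG (mode 'L'); PNG is lossless,
# so the class indices survive the round trip.
Image.fromarray(labels, mode='L').save('example_0001.png')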

More Training/Test Details

  • Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags.
  • Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag --no_instance. Internally, each instance map is reduced to an object boundary map before being fed to the network (see the sketch below).
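
For reference, a NumPy sketch of reducing an instance map to the object boundary map described in the paper (an illustrative re-implementation of the idea behind the get_edges helper in models/pix2pixHD_model.py, not the repo's code):

import numpy as np

def instance_edges(inst):
    # A pixel is a boundary pixel if any 4-neighbour carries a
    # different instance id.
    edge = np.zeros(inst.shape, dtype=bool)
    diff_h = inst[:, 1:] != inst[:, :-1]
    diff_v = inst[1:, :] != inst[:-1, :]
    edge[:, 1:] |= diff_h   # differs from left neighbour
    edge[:, :-1] |= diff_h  # differs from right neighbour
    edge[1:, :] |= diff_v   # differs from top neighbour
    edge[:-1, :] |= diff_v  # differs from bottom neighbour
    return edge.astype(np.float32)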

Citation

If you find this useful for your research, please cite our paper using the following BibTeX entry.

@inproceedings{wang2018pix2pixHD,
  title={High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs},
  author={Ting-Chun Wang and Ming-Yu Liu and Jun-Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro},  
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2018}
}

Acknowledgments

This code borrows heavily from pytorch-CycleGAN-and-pix2pix.

pix2pixhd's People

Contributors

borisfom, elfprince13, javl, junyanz, mingyuliutw, tcwang0509, ufoym


pix2pixhd's Issues

netD checkpoint

Thank you so much for providing the code!
Can you also provide the checkpoint for netD (label2city_1024p)?

pix2pixHD Windows 7 error

When I run train.py on Windows 7, I get the runtime error below.
Does anyone have any idea how to adapt the Python code so that it is executable on Windows?
PS: I run the .sh files from BusyBox.

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
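
A hedged sketch of the usual fix: on Windows, DataLoader workers are spawned rather than forked, so train.py's top-level code needs the standard main-module guard (a generic restructuring suggestion, not a patch from this repo):

def main():
    # The existing top-level body of train.py (option parsing, data
    # loading, training loop) would move in here.
    ...

if __name__ == '__main__':
    main()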

Continue train doesn't run from the same learning rate

Hey,

There is a problem while resuming training.
When passing --continue_train, although I start from the latest saved model, it doesn't resume with the same learning rate. Any idea how that can be fixed? Thanks in advance.
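
A hedged sketch of one possible workaround, assuming the default schedule (constant learning rate for opt.niter epochs, then linear decay over opt.niter_decay epochs): recompute the decayed rate for the resumed epoch before continuing.

def lr_for_epoch(opt, epoch):
    # Reproduce the linear-decay schedule for an arbitrary epoch so a
    # resumed run can start from the right learning rate instead of opt.lr.
    if epoch <= opt.niter:
        return opt.lr
    decay_frac = (epoch - opt.niter) / float(opt.niter_decay)
    return opt.lr * max(0.0, 1.0 - decay_frac)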

Training on 700 X 1100 results in error for train_512p.sh

Hey,

I am training a pix2pixHD model on images of size 700 X 1100.
However, it results in a concatenation error. Can it be trained on 700 x 1100 images?
For the moment, I have adapted it to train on 512 x 512 crops.
I am also wondering whether the code only supports validation on square images (or fixed 1024 x 2048 images), or does it also support arbitrary rectangular sizes?
Thanks in advance.

What does this assertion error mean and how do I resolve it?

Traceback (most recent call last):
File "train.py", line 69, in
loss_G.backward()
File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 156, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/usr/local/lib/python3.5/dist-packages/torch/autograd/init.py", line 98, in backward
variables, grad_variables, retain_graph)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py", line 25, in backward
return comm.reduce_add_coalesced(grad_outputs, self.input_device)
File "/usr/local/lib/python3.5/dist-packages/torch/cuda/comm.py", line 122, in reduce_add_coalesced
result = reduce_add(flattened, destination)
File "/usr/local/lib/python3.5/dist-packages/torch/cuda/comm.py", line 92, in reduce_add
nccl.reduce(inputs, outputs, root=destination)
File "/usr/local/lib/python3.5/dist-packages/torch/cuda/nccl.py", line 161, in reduce
assert(root >= 0 and root < len(inputs))
AssertionError

I'm using 224x224 pixel images for training and my shell command is
python3 test.py --name label2mycityscapes_224p --dataroot ./datasets/city_scapes_224/ --gpu_ids 0,1,2 --loadSize 224 --fineSize 224 --label_nc 3 --no_instance

Why zero padding in the GlobalGenerator's conv2d layers?

After having trained a lot of models with pix2pixHD and seeing the same type of erroneous artifacts emerge, I am starting to believe that one potential source of those errors might be the default conv2d zero padding in the early layers of the GlobalGenerator. Looking at the code, I wonder what the motivation was to use ReflectionPadding in the ResNet layers but not in the earlier ones? Did you make any comparisons between the different padding types?

After a visual analysis of the activations in the various layers, it appears to me that the model relies quite a lot on the information provided by the zero padding. In fact, when I replace the padding in the GlobalGenerator with ReflectionPadding and manually copy the weights of a pre-trained model into the corresponding layers, the model produces only visual garbage; I interpret this as the model using this outer-edge information as a vital part of its computation. When training the "ReflectionPadding everywhere" model from scratch, I no longer seem to get those artifacts.
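
For concreteness, a small illustrative sketch of the two padding variants being compared (hypothetical layer sizes, not the repo's exact architecture):

import torch.nn as nn

# Default-style layer: implicit zero padding inside the conv.
zero_padded = nn.Conv2d(3, 64, kernel_size=7, padding=3)

# ReflectionPadding variant: pad explicitly, then convolve with no
# implicit padding (the pattern the ResNet blocks already use).
reflection_padded = nn.Sequential(
    nn.ReflectionPad2d(3),
    nn.Conv2d(3, 64, kernel_size=7, padding=0),
)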

About Super Resolution with pix2pixHD

Hi, I'm trying to do a super-resolution job with 1024*1024 images.
I'm supposed to transform low-resolution images into high-resolution ones, but I don't know how to name my inputs. Are there any naming rules? And in which folders should I put my low-resolution and high-resolution images?

How much RAM is recommended to run Test?

How much RAM is recommended to run testing? Will testing run on an NVIDIA TX1 (4GB RAM)?

python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none

1080 Ti runs out of memory when testing the 1024p pretrained model

pytorch_pix2pixHD |     result = self.forward(*input, **kwargs)
pytorch_pix2pixHD |   File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
pytorch_pix2pixHD |     self.padding, self.dilation, self.groups)
pytorch_pix2pixHD | RuntimeError: cuda runtime error (2) : out of memory at /tmp/pip-z3dlenmr-build/aten/src/THC/generic/THCStorage.cu:58

So could you offer a 512p pretrained model for testing?

Issue about instance feature

I think there is a problem when you train with instance-wise features and a batch size > 1. For semantic classes like road, sky, and building across different images in a batch, models/networks.py:Encoder mixes them together.

CUDA out of memory even when training 512p images?

Hi, I am trying to train on 512p images using my own data, but when I run the "python train.py --name label2city_512" command, it shows CUDA out of memory. My setup: 12G GPU, CUDA 8.0, PyTorch 0.3. I wonder if anyone else has this problem when training with the examples given on GitHub?

multi-GPU issues around --no_vgg_loss

This code:

loss_G_GAN_Feat = 0
if not self.opt.no_ganFeat_loss:
    feat_weights = 4.0 / (self.opt.n_layers_D + 1)
    D_weights = 1.0 / self.opt.num_D
    for i in range(self.opt.num_D):
        for j in range(len(pred_fake[i])-1):
            loss_G_GAN_Feat += D_weights * feat_weights * \
                self.criterionFeat(pred_fake[i][j], pred_real[i][j].detach()) * self.opt.lambda_feat
# VGG feature matching loss
loss_G_VGG = 0
if not self.opt.no_vgg_loss:
    loss_G_VGG = self.criterionVGG(fake_image, real_image) * self.opt.lambda_feat

Causes trouble when using --no_vgg_loss with multiple GPUs (and I suspect it would also cause trouble for --no_ganFeat_loss), because the value 0 is not compatible with the scatter/gather APIs used by torch.nn.DataParallel. I suspect it needs to be a Variable containing a 0-dimensional tensor, but I haven't quite figured out how to make it work.
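
A hedged sketch of the kind of fix being suggested, written against modern PyTorch (the code at the time used Variables): initialize the disabled losses as tensors rather than the Python int 0, so DataParallel's gather can stack them across replicas.

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Zero-valued tensors gather cleanly across DataParallel replicas,
# unlike the plain Python int 0.
loss_G_GAN_Feat = torch.zeros(1, device=device)
loss_G_VGG = torch.zeros(1, device=device)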

Problem training on multiple GPUs

I'm successfully training the net on my data without specifying GPU ids.
But as soon as I do so, I encounter this error:

Traceback (most recent call last):
  File "train.py", line 56, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 68, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 78, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 4 required positional arguments: 'label', 'inst', 'image', and 'feat'

The whole code is running inside a docker environment which can access all available GPUs

>>> torch.cuda.device_count()
8

my run command looks like this:

python train.py --name liver_TEST --gpu_ids 0,1,2,3,4,5,6,7 --label_nc 3 --netG local --ngf 32 --num_D 3 --niter_fix_global 20 --fineSize 1024 --no_instance --dataroot ./datasets/liver/

Any tips what could be the issue here?
Thanks in advance and this is really awesome work!!

Run inference over custom pre-trained model

I have a model that was trained with no label maps (--label_nc 0) and no instance maps (--no_instance).

What's the best way to run inference over that model with a single image?

thanks!

ValueError: NestedIOFunction doesn't know how to process an input object of type torch.FloatTensor

I'm attempting to test and export a .onnx version of a trained model with the following command:

python test.py --name xxxx --dataroot ./datasets/xxxx --label_nc 0 --no_instance --which_epoch 30 --resize_or_crop scale_width --export_onnx ./xxxx-001.onnx

and I'm getting the following error:

Exporting to ONNX: ./xxxx-001.onnx
Traceback (most recent call last):
  File "test.py", line 53, in <module>
    torch.onnx.export(model, [tuple(data['label']), tuple(data['inst'])], opt.export_onnx, verbose=True)
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/onnx/__init__.py", line 75, in export
    _export(model, args, f, export_params, verbose, training)
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/onnx/__init__.py", line 116, in _export
    trace, torch_out = torch.jit.trace(model, args)
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/jit/__init__.py", line 241, in trace
    return TracedModule(f, nderivs=nderivs)(*args, **kwargs)
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/jit/__init__.py", line 266, in forward
    in_vars, in_struct = _flatten((args, tuple(kw_items)), self.state_dict(keep_vars=True).values())
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/jit/__init__.py", line 568, in _flatten
    obj_vars = tuple(itertools.chain(function._iter_variables(obj), params))
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/function.py", line 277, in _iter
    for var in _iter(o):
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/function.py", line 277, in _iter
    for var in _iter(o):
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/function.py", line 277, in _iter
    for var in _iter(o):
  File "/home/matthewjarviswall/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/autograd/function.py", line 281, in _iter
    "an input object of type " + torch.typename(obj))
ValueError: NestedIOFunction doesn't know how to process an input object of type torch.FloatTensor

CUDA Version 9.2.88
Tensorflow Version 1.8.0
Conda environment
CuDNN Version 7.0.5

How to get more realistic images?

Hi, can you suggest how I can improve my results? Which training parameters should I change, and how?

These are my goals:

1 - I would like to obtain more realistic images.
2 - I don't care about the similarity (difference) between a single output image and its corresponding target image.

For example, in the day-to-night scene transformation, I don't care if the network adds a car that doesn't exist, changes the color of a house, or anything else, but the result must look realistic to a human eye. So I would like a more "imaginative" network, one that can be wrong in recreating a scene but maintains a realistic appeal.

pix2pixHD and tensorflow

Hi, I am looking for a TensorFlow port of pix2pixHD.
Does it exist, or is someone working on it?
I can collaborate.

emoji dataset

Hi, I want to transform real face images into emoji. Does anyone know whether there is a dataset for this?
Thank you!

Perceptual Loss Issues

Hi: @mingyuliutw @junyanz
I have read your paper carefully and noticed you use a perceptual loss with hyperparameters λ=10 and weights 1/Mi, where Mi is the number of elements of the i-th layer in VGG19. However, I checked the code you released, where the loss weights are [1.0/32, 1.0/16, 1.0/8, 1.0/4, 1.0]. I am confused by these numbers 32, 16, ... What are the elements you're referring to?
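
For context, a paraphrased sketch of how such per-level weights would be applied over VGG19 features (an illustration of the weighting being asked about, not a verbatim copy of the repo's VGGLoss):

import torch.nn as nn

criterion = nn.L1Loss()
weights = [1.0/32, 1.0/16, 1.0/8, 1.0/4, 1.0]  # one weight per VGG19 feature level

def weighted_vgg_loss(feats_fake, feats_real):
    # feats_* are lists of five feature tensors from a VGG19 extractor;
    # each level's L1 distance is scaled by its fixed weight.
    loss = 0.0
    for w, f_fake, f_real in zip(weights, feats_fake, feats_real):
        loss = loss + w * criterion(f_fake, f_real.detach())
    return loss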

Do you train the global and local generators separately or together?

Hey,

While the paper mentions that the global and local generators are trained separately before being fine-tuned together, in the default settings of the code I see both of them trained together.

Can you clarify a bit on that? E.g., if I launch train_512.sh, would they be trained together from the start?

Thanks in advance

error trying to train 1024p net on 12G with --no_instance --label_nc 0

First, amazing work!
I have successfully trained a 512p network using --no_instance --label_nc 0 and it works great.
I am now trying to train it up to 1024p following your example in train_1024p_12G.sh. I modify the command by again adding --no_instance and --label_nc 0 (which I seem to need to get the tensor sizes to match), but when I start training I get this error:

---------- Networks initialized -------------
Pretrained network G has fewer layers; The following are not initialized:
Traceback (most recent call last):
  File "/pix2pixHD/models/base_model.py", line 63, in load_network
    network.load_state_dict(torch.load(save_path))
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 490, in load_state_dict
    .format(name))
KeyError: 'unexpected key "model.38.weight" in state_dict'

A couple of other errors follow after that ("During handling of the above exception, another exception occurred:"), but the above seems to be the failure point.

Any ideas on how to get this training?

Thanks!

Light spot in training

Hi, I have a problem in my training. It works well when I train 256x256 images using train_A to train_B, but when I train 512x1024 images using label-to-image without an instance map, there is a light spot in every generated image. What's wrong?

[image 1]

The red rectangle shows the light spot position.

[image 2]

Does anyone have the same problem?

Editing interface

The README and paper both mention an editing interface, but I cannot find it anywhere in the project.

Can't use my own dataset for training

Traceback (most recent call last):
  File "train.py", line 61, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/201683105/pix2pixHD/models/pix2pixHD_model.py", line 163, in forward
    fake_image = self.netG.forward(input_concat)
  File "/home/201683105/pix2pixHD/models/networks.py", line 213, in forward
    return self.model(input)
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/201683105/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 36, 7, 7], expected input[1, 38, 582, 1030] to have 36 channels, but got 38 channels instead
terminate called after throwing an instance of 'at::Error'
what(): CUDA error (59): device-side assert triggered (check_status at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: at::detail::CUDAStream_free(CUDAStreamInternals&) + 0x50 (0x7f6d3a831c90 in /home/201683105/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THCStream_free + 0x13 (0x7f6d177f00a3 in /home/201683105/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: + 0xc382cd (0x7f6d3cbaf2cd in /home/201683105/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: __libc_start_main + 0xf0 (0x7f6d50de0830 in /lib/x86_64-linux-gnu/libc.so.6)

CUDA assertion error binary_cross_entropy loss

A CUDA assertion error pops up when setting --no_lsgan. It seems to be because negative values are fed into nn.BCELoss(). It gets fixed by applying nn.BCEWithLogitsLoss() instead.

(...)
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [16,0,0], thread: [31,0,0] Assertion `input >= 0. && input <= 1.` failed.
CUDA error after cudaEventDestroy in future dtor: device-side assert triggeredTraceback (most recent call last):
  File "train.py", line 56, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/blanca/project/wip/pix2pixHD-master/models/pix2pixHD_model.py", line 158, in forward
    loss_D_fake = self.criterionGAN(pred_fake_pool, False)
  File "/blanca/project/wip/pix2pixHD-master/models/networks.py", line 110, in __call__
    loss += self.loss(pred, target_tensor)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 372, in forward
    size_average=self.size_average)
  File "/root/miniconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1179, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, size_average)
RuntimeError: cudaEventSynchronize in future::wait: device-side assert triggered
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.c:184
Aborted (core dumped)
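
A hedged sketch of the swap the reporter suggests (illustrative; the exact integration into models/networks.py would differ):

import torch
import torch.nn as nn

# BCEWithLogitsLoss applies the sigmoid internally, so raw (possibly
# negative) discriminator outputs no longer trip the
# `input >= 0. && input <= 1.` assertion that plain BCELoss enforces.
criterion = nn.BCEWithLogitsLoss()  # instead of nn.BCELoss()
logits = torch.randn(4, 1)          # stand-in discriminator outputs
target = torch.ones(4, 1)
loss = criterion(logits, target)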

Losses explanation

Could someone provide a brief explanation of the different losses implemented?
In particular:
'G_GAN', 'G_GAN_Feat', 'G_VGG', 'D_real', 'D_fake'

Thanks

Thermal to Color

Hi,

Between this GitHub repo and the one based on Coupled GAN, which is recommended for thermal-to-visible (RGB) image translation?
Thanks!

Pretrained network E

Thank you so much for providing the code!
It's nice that you provide the label2city_1024p/latest_net_G.pth checkpoint. Can you also provide the checkpoint for netE? I would like to play around with the features. Thank you!

about --label_nc?

When I set --label_nc to 18 (for my own dataset), I get error messages. But when I set it to the default (35, i.e., Cityscapes), it works properly.

Clarification question: Instance mapping appears to be missing

Hi,

I'm trying to understand how instance boundary maps are used by your code to improve the synthesized output of the ign-G.

This excerpt is from section 3.3 of your paper and is very clear. I agree with it as well.

"Instead, we argue that the most important information
the instance map provides, which is not available in the
semantic label map, is the object boundary. For example,
when a number of same-class objects are next to one another,
looking at the semantic label map alone cannot tell
them apart
. This is especially true for the street scene since
many parked cars or walking pedestrians are often next to
one another, as shown in Fig. 3a. However, with the instance
map, separating these objects becomes an easier task.
"

I was able to successfully run your code and synthesize output as expected. However, I am confused when I look deeper into the inputs of the examples provided to the ign-G.

The instance boundary maps (found in ./datasets/cityscapes/test_inst) don't appear to provide boundary information. For example, frankfurt_000001_047552_gtFine_instanceIds.png (below) doesn't define boundaries of the vehicles parked on either side of the street.

[image: frankfurt_000001_047552_gtFine_instanceIds.png]

In other examples, boundary mapping appears but only for a small part of the image (frankfurt_000001_054640_gtFine_InstanceIds.png). In the below image, the red box shows boundary mapping but not consistently throughout the image.

[image: frankfurt_000001_054640_gtFine_InstanceIds.png]

I used GIMP to inspect the hex color codes to make sure there is no tiny variation that my eyes cannot detect. I used this technique to inspect the label map which contains overt and subtle color labeling distinctions.

Is this because that file is not an instance boundary map? If so, is this file the concatenation of the one hot vector representation of the semantic label map and the boundary map? If not, is it the channel wise concat of the instance boundary map, semantic label map, and the real image?

They are "_labelIds" and "_instanceIds" and this is why I am confused.

Please help me clear up my confusion, because otherwise wouldn't the ign-G consider these groups of cars and people to be a single object during synthesis?

Thank you for sharing your hard work. I really am enjoying experimenting with it.

RuntimeError: dimension specified as 0 but tensor has no dimensions

I tried the newest code (updated 6.28), and test_1024p.sh still hits the out-of-memory problem.
train_512p.sh works fine on a single GPU, but when using multiple GPUs I always get

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f95756ddf90>> ignored
Traceback (most recent call last):
  File "train.py", line 61, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
    return self.gather(outputs, self.output_device)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

I also tried modifying the GPUs with --gpu_ids=1,2 or 1,2,3; the same error occurred.

When using train_1024p.sh, I get:
Traceback (most recent call last):
  File "train.py", line 38, in <module>
    model = create_model(opt)
  File "/media/f214/workspace/gan/pix2pixHD/models/models.py", line 15, in create_model
    model.initialize(opt)
  File "/media/f214/workspace/gan/pix2pixHD/models/pix2pixHD_model.py", line 60, in initialize
    self.load_network(self.netG, 'G', opt.which_epoch, pretrained_path)
  File "/media/f214/workspace/gan/pix2pixHD/models/base_model.py", line 60, in load_network
    raise('Generator must exist!')
TypeError: exceptions must be old-style classes or derived from BaseException, not str

I tried the code on two servers, one with 4x 1080 Ti and one with 3x Titan X.
TensorRT 4.0
Conda environment
CUDA 9.0 and cuDNN 7.1.3

device-side error

Whether training or testing, the code can't run successfully; this error occurs:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:18

and it's difficult to locate where it comes from...

Cityscapes Test mean IOU

Does your Cityscapes test dataset have labels? I downloaded the dataset from the official website, but the test set comes without labels, so I can't compute the test mean IoU.

Looking for GAN loss parameters

In the pix2pix TensorFlow implementation I can set the following parameters:
--l1_weight: weight on L1 term for generator gradient
--gan_weight: weight on GAN term for generator gradient

They are used to create the GAN loss in this way (line 455 of pix2pix.py):
gen_loss = gen_loss_GAN * a.gan_weight + gen_loss_L1 * a.l1_weight

Now I am moving to pix2pixHD, but I can't find these parameters in this implementation. How can I modify it? I also didn't find the equivalent equation in this code.

No module named 'tensorrt'

I got an error running the new version of test.py. To do the testing, I had to replace this file with an old one (from the May commit).

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    from run_engine import run_trt_engine, run_onnx
  File "/home/ubuntu/pix2pixHD/run_engine.py", line 5, in <module>
    import tensorrt
ModuleNotFoundError: No module named 'tensorrt'

How can I resolve this?

System Overview:
Amazon EC2 gpu instance
Deep Learning AMI 10.0 (Ubuntu)
PyTorch with Python3 (CUDA 9.0 and Intel MKL)
Ubuntu 16.04.4 LTS

TensorRT problem

I installed TensorRT (3.0, for CUDA 9.0 and cuDNN 7.0.5) following the instructions in the official docs, and added the path to ~/.bashrc.
Both the system Python and the conda Python can import tensorrt correctly.
However, when I try the example 'python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop none', I get:

ERROR: failed to import module (cannot import name onnxparser)
Please make sure you have the TensorRT Library installed
and accessible in your LD_LIBRARY_PATH

How should I add the path in conda?
Or any other solutions?

RuntimeError: invalid argument 0

Could you help check the following error? Thanks. The images have the same width and height, but the error appears to be related to dimension W?

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 1184 and 1181 in dimension 2 at /pytorch/torch/lib/THC/generic/THCTensorMath.cu:111

Output image format

I noticed that though the input images are PNGs, the outputs of the network (images saved in ./results/) are compressed JPGs with lower quality. Is there a way to avoid compression and obtain PNGs as output?
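
A hedged workaround sketch: PIL chooses the encoder from the file extension, so routing the save path to .png (e.g. in util/util.py's save_image) keeps the output lossless. Illustrative code, not the repo's exact save path:

import numpy as np
from PIL import Image

def save_image_png(image_numpy, image_path):
    # Force a lossless PNG regardless of the extension the caller passed.
    png_path = image_path.rsplit('.', 1)[0] + '.png'
    Image.fromarray(image_numpy).save(png_path)

# Stand-in for a network output converted to a uint8 HxWx3 array.
save_image_png(np.zeros((256, 256, 3), dtype=np.uint8), 'output.jpg')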

Out of memory when I use my own dataset

I use a Titan X with 12G memory.
But when I train the model on my own dataset, it gives this message:
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

My dataset only has RGB images and corresponding labels. I have added --no_instance, and there are 23 label classes, so I added --label_nc 23.

Training Problem?

(epoch: 13, iters: 1878, time: 3.216) G_GAN_Feat: 9.109 G_VGG: 7.598 G_GAN: 0.842 D_fake: 0.309 D_real: 0.296
(epoch: 13, iters: 1879, time: 3.255) G_GAN_Feat: 10.054 G_VGG: 8.721 G_GAN: 1.329 D_fake: 0.114 D_real: 0.057
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Exception socket.error: error(111, 'Connection refused') in <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7fc64fe3c190>> ignored
Traceback (most recent call last):
  File "train.py", line 92, in <module>
    visualizer.display_current_results(visuals, epoch, total_steps)
  File "/data2/dx/th/pix2pixHD/util/visualizer.py", line 66, in display_current_results
    util.save_image(image_numpy, img_path)
  File "/data2/dx/th/pix2pixHD/util/util.py", line 39, in save_image
    image_pil.save(image_path)
  File "/home/dx/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 1893, in save
    save_handler(self, fp, filename)
  File "/home/dx/anaconda2/lib/python2.7/site-packages/PIL/JpegImagePlugin.py", line 739, in _save
    ImageFile._save(im, fp, [("jpeg", (0, 0)+im.size, 0, rawmode)], bufsize)
  File "/home/dx/anaconda2/lib/python2.7/site-packages/PIL/ImageFile.py", line 496, in _save
    s = e.encode_to_file(fh, bufsize)
  File "/home/dx/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 175, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 16100) is killed by signal: Bus error.

Can you help me fix this bug? @NVIDIA
