
jiahuiyu / generative_inpainting

3.2K stars · 777 forks · 13.76 MB

DeepFill v1/v2 with Contextual Attention and Gated Convolution, CVPR 2018, and ICCV 2019 Oral

Home Page: http://jiahuiyu.com/deepfill/

License: Other

Python 100.00%
attention-model deep-neural-networks deepfill generative-adversarial-network image-inpainting tensorflow

generative_inpainting's People

Contributors

jiahuiyu

generative_inpainting's Issues

loss does not converge

I'm training the network on the Stanford Cars dataset, and it is now on its 10th epoch.
But the loss shown in the terminal just fluctuates between 0.14xx and 0.19xx.
In TensorBoard there are many losses, but all of them are just oscillating rather than converging.
Should I modify some of the parameters in inpaint.yml to solve this problem?

requirements.txt

I think it would be better if this repo had a requirements.txt file and a corresponding note in the README :D
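
A hypothetical sketch of what such a file could contain, inferred from the imports the code uses; the exact pins are assumptions, not the author's tested versions:

tensorflow-gpu==1.7.0
opencv-python
pyyaml
git+https://github.com/JiahuiYu/neuralgym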

Question about the training speed

Thanks for your great work on inpainting! I am trying to train the inpainting network following the guide in the README, and training seems to run fine.
However, it is very slow. Below is the training log:

|####################| 100.00%, 109326/0 sec. train epoch 1, iter 10000/10000, loss 0.089837, 0.09 batches/sec.
[2018-04-20 02:47:04 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-10000.
|####################| 100.00%, 108851/0 sec. train epoch 2, iter 10000/10000, loss 0.030459, 0.09 batches/sec.
[2018-04-21 09:01:15 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-20000.
|####################| 100.00%, 108510/0 sec. train epoch 3, iter 10000/10000, loss 0.030714, 0.09 batches/sec.
[2018-04-22 15:09:45 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-30000.
|####################| 100.00%, 108276/0 sec. train epoch 4, iter 10000/10000, loss 0.038624, 0.09 batches/sec.
[2018-04-23 21:14:21 @logger.py:43] Trigger callback: Trigger ModelSaver: Save model to model_logs/20180418202435371480_node1_imagenet_NORMAL_wgan_gp_full_model_image_256/snap-40000.
|############--------| 61.50%, 66691/41605 sec. train epoch 5, iter 6150/10000, loss 0.038147, 0.09 batches/sec.

I use a single K80 to train the network.
How can I speed up the training? Thanks a lot!

Different input size during testing stage

During the testing stage (test.py), the network has to be built for a fixed image size, as in:


# the image size is determined here
input_image = np.concatenate([image, mask], axis=2)

sess_config = tf.ConfigProto()
sess_config.gpu_options.allow_growth = True
with tf.Session(config=sess_config) as sess:
    input_image = tf.constant(input_image, dtype=tf.float32)
    # the inference can now only accept a pre-defined image size
    output = model.build_server_graph(input_image)

I just wonder: can I build an inference network that accepts various input sizes (i.e., with no need to pre-define the image size)?

Or, to be more specific, can I build an inference network that achieves

output = model.build_server_graph(input_image)

where input_image is a tensor of (batch_size, height, width, channels) = (1, None, None, 3)?
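
One possible sketch, assuming the underlying ops tolerate dynamic shapes (the contextual attention layer may still require static sizes in practice): swap the tf.constant for a placeholder whose spatial dimensions are left as None.

import tensorflow as tf

# placeholder with unknown height/width instead of a fixed tf.constant
input_ph = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='input_image')
output = model.build_server_graph(input_ph)

sess_config = tf.ConfigProto()
sess_config.gpu_options.allow_growth = True
with tf.Session(config=sess_config) as sess:
    # restore weights, then feed arrays of any size at run time
    result = sess.run(output, feed_dict={input_ph: input_image})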

model size

Hi Jiahui,

Nice work!
I have a question about the model size. Your provided trained model is pretty small (around 14 MB), while the output of running train.py is far larger (126 MB). I just wonder: did you apply any post-processing to the model? Or is there a parameter in your code that tunes the model size? (Is there any trade-off for such compression?)

[Released model]

 69 May 22 16:15 checkpoint*

14M May 22 16:15 snap-0.data-00000-of-00001*
3.6K May 22 16:15 snap-0.index*
14M May 22 16:15 snap-0.meta*

[output of train.py]

77 May 23 15:50 checkpoint
84M May 23 20:01 events.out.tfevents.1527040937.dgx1-server2
126M May 23 15:50 snap-10000.data-00000-of-00001
12K May 23 15:50 snap-10000.index
14M May 23 15:50 snap-10000.meta
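
A plausible explanation (an assumption, not confirmed by the author): the training checkpoint also stores the discriminator weights and the Adam optimizer's moment tensors, while a release checkpoint only needs the generator. A minimal sketch of saving such a slimmed checkpoint; the 'inpaint_net' scope name is taken from the tracebacks elsewhere in these issues:

import tensorflow as tf

# keep only generator variables; discriminator weights and Adam slots are dropped
gen_vars = [v for v in tf.global_variables() if v.name.startswith('inpaint_net')]
saver = tf.train.Saver(gen_vars)
saver.save(sess, 'model_logs/release/snap-0')  # sess holds the trained graph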

File preparation

Hi,

Great work! I would like to play with your code, and I was wondering the following:

  1. How do you prepare the data in the flist file?
  2. I want to create my own masked images; what steps should I follow? (See the sketch at the end of this issue.)

Thanks,
Alex
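
For question 2, a minimal sketch, assuming masks follow the convention of the bundled examples (white pixels mark the hole region on a black background, as in center_mask_256.png):

import cv2
import numpy as np

# build a 256x256 mask with a 64x64 white square in the center
mask = np.zeros((256, 256, 3), dtype=np.uint8)
mask[96:160, 96:160] = 255  # white marks the region to inpaint
cv2.imwrite('my_center_mask.png', mask)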

Questions about the results on faces

I saw you uploaded more results on faces that look good. How long did you train? I trained a model on the celeba_hq dataset for 8 days with the same hyper-parameters as in your inpaint.yml, on a Titan X (Pascal), but the test result is terrible.
[attached result image: b16_i240000_01]
Can you give me some advice? Thank you!

About inpaint_ops.py#L283

Hi, JiahuiYu

Thank you for your contribution and for releasing this code!
I have a question about inpaint_ops.py#L283:
according to your paper, shouldn't xi also be normed?
By the way, your work is very interesting!

Thanks!

Adding new training images with pretrained model

When resuming training from a pretrained (restored) model with new images, do I have to train on both the new images and the previously used training images, or only on the new images?
In the former case, I wonder if it can cause overfitting to the previously trained images as the training iterations increase.

padding type for generator

In the paper, Section 3 states that "we use mirror padding for all convolution layers". However, the code seems to use 'SAME' padding (i.e. zero padding) for the generator, since the 'PADDING' field of the .yml file is set to 'SAME'. Which type of padding did you use for the pretrained models?

Thank you for your help!

Issue with tf.arg_max in inpaint_ops.py

I tried to run the test script on the pretrained ImageNet models, on CPU, on Mac OS X Sierra with TensorFlow 1.7.0, and got an error that there is no output_type argument for tf.arg_max in inpaint_ops.py on line 302. I was able to fix the bug with this line:

offset = tf.cast(tf.argmax(yi, axis=3), tf.int32)

Just bringing this to notice in case anyone faces a similar bug in the future; casting explicitly prevents the error.

Couldn't find 'checkpoint' file or checkpoints

Hello,
Thanks for sharing the codebase publicly. After cloning the repo, I downloaded the pretrained model for HQ CelebA and tried running test.py (TensorFlow version: 1.6.0). I created a folder model_logs and placed all the files of the downloaded pretrained model in that folder. However, I get this error:
Couldn't find 'checkpoint' file or checkpoints in given directory model_logs/

I run using the command:
python test.py --image examples/celeba/celebahr_patches_164036_input.png --mask examples/center_mask_256.png --output examples/output.png --checkpoint_dir model_logs/

And the contents of model_logs are:

checkpoint.txt
snap-0.data-00000-of-00001
snap-0.index
snap-0.meta
train_shuffled.flist
val_shuffled.flist

Can you please let me know the correct way to run your pretrained models?

Thanks,
Avisek
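
A likely cause, judging from the directory listing above: TensorFlow looks for a file named exactly checkpoint (no extension), so checkpoint.txt is invisible to it. Renaming the file should be enough; its contents are plain text lines such as:

model_checkpoint_path: "snap-0"
all_model_checkpoint_paths: "snap-0"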

about the wgan-gp

I am sorry, I am a little confused about the training of WGAN-GP. Does it train the discriminator five times and the generator once, alternately?

snap file

Hello. I have another question.
In each epoch, a snap file is created and a line is added to the checkpoint file:

model_checkpoint_path: "snap-30000"
all_model_checkpoint_paths: "snap-10000"
all_model_checkpoint_paths: "snap-20000"
all_model_checkpoint_paths: "snap-30000"

What does this mean? Each epoch has a different loss value, and in your pretrained model there is only one snap model.
Should I choose the single snap with the lowest loss value, or is it a structure that uses all snap files together?

Regards, JiyoungAn

question about inpaint_ops.py

In my opinion, the kernel size used when extracting patches to generate the deconvolution filters (raw_w) should be the same as the kernel size used to generate the convolution filters (w). So why does raw_w use kernel=2*rate while w uses ksize?

Questions about running train.py

Hi, I am a beginner in TensorFlow and I'm very interested in your paper.
I'll be very grateful if you could answer my questions.

  1. Running on CPU only
    I am trying to run train.py using the CPU only, so I changed NUM_GPUS to 0 (inpaint.yml).
    However, a GPU-related error still occurs, and it stops after the line below.
    [2018-05-23 19:02:54 @weights_viewer.py:60] Total size of trainable weights: 0G 10M 184K 136B (Assuming 32-bit data type.)
    Are there any settings that need to be changed, or is it impossible to run on CPU only?

  2. Adding training images to an existing model
    I'd like to add some training images (about 800, of shape (255, 255, 3)) to your Places2 model, like this:
    1. download the Places2 model you've built
    2. place the model into the model_logs folder
    3. train
    In this way, will my images be added to the existing model?

Regards, jiyoungAn

error when running the program

When I run the program, I get the following error:
[2018-05-20 20:13:24 @logger.py:43] Trigger callback: Total counts of trainable weights: 10582152.
[2018-05-20 20:13:24 @weights_viewer.py:60] Total size of trainable weights: 0G 10M 94K 136B (Assuming 32-bit data type.)
|##############################---------------------| 58.00%, 36937/26403 sec. train epoch 1, iter 2900/5000, loss 0.891349, 0.08 batches/sec.
Traceback (most recent call last):
File "train.py", line 112, in
trainer.train()
File "/home/anaconda3/lib/python3.6/site-packages/neuralgym/train/trainer.py", line 133, in train
cb.run(sess, step)
File "/home/anaconda3/lib/python3.6/site-packages/neuralgym/callbacks/secondary_trainer.py", line 26, in run
self.train()
File "/home/anaconda3/lib/python3.6/site-packages/neuralgym/train/trainer.py", line 143, in train
assert not np.isnan(loss_value)
AssertionError

I can't solve it. Is it a machine problem? I hope you can give me some ideas. Thanks.

custom data setting

I wanted to use a custom dataset, so I edited 'inpaint.yml' as follows:

DATASET: 'buildings'
DATA_FLIST: {
   buildings: [
     'data/buildings/train_shuffled.flist'
   ]
}

and also added the absolute file paths of the training images to 'data/buildings/train_shuffled.flist', as in this example:

'/Users/jeongjoonhyun/Desktop/generative_inpainting/data/buildings/img1.png'
'/Users/jeongjoonhyun/Desktop/generative_inpainting/data/buildings/img2.png'
....

but in neuralgym's data_from_fnames.py, the function 'read_img' gives:

 img = cv2.imread(filename)
 # result: None

Why is img None?
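
A likely culprit, assuming the flist is written exactly as quoted above: the surrounding quote characters become part of each path, so cv2.imread receives a nonexistent filename and returns None. A small sketch to verify every entry (stripping the quotes is the hypothetical fix):

import cv2

with open('data/buildings/train_shuffled.flist') as f:
    for line in f:
        path = line.strip().strip('\'"')  # drop stray quotes around the path
        if cv2.imread(path) is None:
            print('unreadable:', path)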

Question about your paper

Great work!
I have a question about the STN-based attention for image inpainting (Fig. 8 in the paper). What is the loss function of the STN-based attention network: L1, or WGAN-GP + L1?
Thank you.

SyntaxError: invalid syntax

When I try to run inference with test.py, using the given command for the CelebA 256 HQ images from the examples, I get the following error:

File "/home/tbalaji/.local/lib/python2.7/site-packages/neuralgym/utils/config.py", line 21
class Loader(yaml.Loader, metaclass=LoaderMeta):
^
SyntaxError: invalid syntax

Please guide me in fixing this.
I am using Python 2.7 with TensorFlow 1.3 in a virtualenv. Thanks.

error when running train.py

When I feed a picture of shape (63, 44, 3) and run train.py, it shows errors like this:
Traceback (most recent call last):
File "train.py", line 47, in
images, config=config)
File "/opencv/generative_inpainting/inpaint_model.py", line 153, in build_graph_with_losses
padding=config.PADDING)
File opencv/generative_inpainting/inpaint_model.py", line 97, in build_inpaint_net
x, offset_flow = contextual_attention(x, x, mask_s, 3, 1, rate=2)
File "/opencv/generative_inpainting/inpaint_ops.py", line 307, in contextual_attention
yi = tf.nn.conv2d_transpose(yi, wi_center, tf.concat([[1], raw_fs[1:]], axis=0), strides=[1,rate,rate,1]) / 4.
File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1187, in conv2d_transpose
filter.get_shape()[3]))
ValueError: input channels does not match filter's input channels, 40 != 48
I don't know how to fix it. Can you give me any suggestions? Thanks.

No enough gpus for dedicated usage.

Hello, I have a question.
I want to test the 'canyon_input.png' image in the 'examples' folder, and I also want to train on the Places2 dataset.
Following your GitHub instructions, I saved the required files into 'model_logs', but the 'neuralgym' error below occurred.
I look forward to your answer.

Error:
Traceback (most recent call last):
File "test.py", line 23, in
ng.get_gpus(1)
File "/home/pmi/Documents/Songhee/songhee/lib/python3.5/site-packages/neuralgym/utils/gpus.py", line 70, in get_gpus
' [(gpu id: num of processes)]: {}'.format(sorted_gpus))
SystemError: No enough gpus for dedicated usage. [(gpu id: num of processes)]: [(0, 3)]
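
For reference, the message [(0, 3)] means GPU 0 already hosts three processes, and neuralgym's get_gpus asks for a dedicated (idle) device. A hedged workaround, if sharing the GPU is acceptable: pin the device yourself before TensorFlow initializes, and remove the ng.get_gpus(1) call from test.py.

import os

# select GPU 0 manually instead of requesting a dedicated GPU via neuralgym
os.environ['CUDA_VISIBLE_DEVICES'] = '0'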

training with my own dataset

Hello there,
First of all, congratulations on this awesome project!

I need help regarding the training process.
I learned from #15 that we need a training and a validation dataset, but I still have some confusion.

Is the training dataset just a set of pictures with white rectangles covering some parts, and the validation dataset the same pictures without any occlusion?

How to run multiple images with multiple masks respectively

Thank you for your contribution,

At the moment, I have checked the test file and it can only run on one image/mask pair. I tried putting the code in a for loop, but I get an error at this line: output = model.build_server_graph(input_image)
output = model.build_server_graph(input_image)
File "/home/ubuntu/trinh/generative_inpainting/inpaint_model.py", line 307, in build_server_graph
config=None)
File "/home/ubuntu/trinh/generative_inpainting/inpaint_model.py", line 50, in build_inpaint_net
x = gen_conv(x, cnum, 5, 1, name='conv1')
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/home/ubuntu/trinh/generative_inpainting/inpaint_ops.py", line 45, in gen_conv
activation=activation, padding=padding, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/convolutional.py", line 608, in conv2d
return layer.apply(inputs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 671, in apply
return self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 559, in call
self.build(input_shapes[0])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/convolutional.py", line 143, in build
dtype=self.dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 458, in add_variable
trainable=trainable and self.trainable)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 742, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable inpaint_net/conv1/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

File "/home/ubuntu/trinh/generative_inpainting/inpaint_ops.py", line 45, in gen_conv
activation=activation, padding=padding, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/home/ubuntu/trinh/generative_inpainting/inpaint_model.py", line 50, in build_inpaint_net
x = gen_conv(x, cnum, 5, 1, name='conv1')

Do you have any idea what I did wrong?
Thank you.
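
The traceback shows the graph being rebuilt for every image, so inpaint_net/conv1/kernel gets defined twice. A minimal sketch of one workaround, assuming fixed-size inputs (pairs and preprocess are hypothetical helpers): build the graph once on a placeholder, then run it repeatedly. Alternatively, reuse=tf.AUTO_REUSE on the variable scope, as the error message suggests, also avoids the clash.

import tensorflow as tf

# build the graph once; the placeholder shape must match the array test.py feeds
input_ph = tf.placeholder(tf.float32, shape=(1, 256, 512, 3))
output = model.build_server_graph(input_ph)

sess_config = tf.ConfigProto()
sess_config.gpu_options.allow_growth = True
with tf.Session(config=sess_config) as sess:
    # restore the pretrained weights once here
    for image, mask in pairs:
        result = sess.run(output, feed_dict={input_ph: preprocess(image, mask)})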

about the contextual attention

In the code of contextual attention, the foreground is x and the background is also x. I'm confused about what it means to match x against itself. Can you help me?

neuralgym get_gpus error

gpu pid type sm mem enc dec command
Idx C/G % % % % name
0 14891 C 90 67 0 0 python3

Traceback (most recent call last):
File "train.py", line 37, in
ng.get_gpus(config.NUM_GPUS)
File "/home/gpuguest/Downloads/generative_inpainting/neuralgym/utils/gpus.py", line 70, in get_gpus
' [(gpu id: num of processes)]: {}'.format(sorted_gpus))
SystemError: No enough gpus for dedicated usage. [(gpu id: num of processes)]: [(0, 1)]

Why is this error raised?

What should the filelist look like?

First, I want to say congratulations on the amazing job you did!

I did not find any example of what the "Prepare training images filelist" step consists of. How must the images be listed? What is the format? One file name per line, like this?

pic-1.jpg
pic-2.jpg

how to create a flist file?

Recently I read your paper and want to run your code, but I don't know how to create a .flist file. Can you explain it in detail? Thank you.
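
A minimal sketch of one way to build such a file, assuming the expected format is one absolute image path per line with no quotes (the directory and extensions here are placeholders):

import os
import random

img_dir = 'data/places2/train'  # hypothetical directory of training images
paths = [os.path.abspath(os.path.join(img_dir, name))
         for name in os.listdir(img_dir)
         if name.lower().endswith(('.png', '.jpg', '.jpeg'))]
random.shuffle(paths)
with open('train_shuffled.flist', 'w') as out:
    out.write('\n'.join(paths))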

About the training of two stages

When you train the model, do you optimize the parameters of stage 1 and stage 2 jointly?
Or do you first pre-train stage 1, then train stage 2, and finally optimize them jointly?

about loss

First of all, thank you for your good research.

When I run the code, it looks like g_loss is the only loss printed. Is that correct?

Also, d_loss is sometimes negative. Is that OK?


Questions about multi-gpu training

Great work! Does this code support multi-GPU training? I've tried altering NUM_GPUS and GPU_ID, but the code seems to select just one GPU for training. Is there any clue about this? Thanks.

'Image is None' Problem

Hello, JiahuiYu.
Thank you for answering me.
However, I got this error again..
[2018-05-25 17:43:37 @data_from_fnames.py:153] image is None, sleep this thread for 0.1s.
Exception in thread Thread-16:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/neuralgym/data/feeding_queue_runner.py", line 194, in _run
data = func()
File "/usr/local/lib/python3.5/dist-packages/neuralgym/data/data_from_fnames.py", line 143, in <lambda>
feed_dict_op=[lambda: self.next_batch()],
File "/usr/local/lib/python3.5/dist-packages/neuralgym/data/data_from_fnames.py", line 182, in next_batch
img = cv2.resize(img, tuple(self.shapes[i][:-1]))
cv2.error: /io/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215) ssize.width > 0 && ssize.height > 0 in function resize

It seems like there is no image.
I used the code in #15 to create the flist; the files are all in the "data_flist" folder.
In inpaint.yml, I added DATA_FLIST: city: ['data_flist/city/train_shuffled.flist', 'data_flist/city/validation_shuffled.flist'] and set the dataset to city.

Below is the log about my dataset:

[2018-05-25 17:42:59 @dataset.py:26] --------------------------------- Dataset Info ---------------------------------
[2018-05-25 17:42:59 @dataset.py:36] fn_preprocess: None
[2018-05-25 17:42:59 @dataset.py:36] file_length: 726
[2018-05-25 17:42:59 @dataset.py:36] return_fnames: False
[2018-05-25 17:42:59 @dataset.py:36] nthreads: 16
[2018-05-25 17:42:59 @dataset.py:36] batch_phs: [<tf.Tensor 'Placeholder:0' shape=(?, 256, 256, 3) dtype=float32>]
[2018-05-25 17:42:59 @dataset.py:36] dtypes: [tf.float32]
[2018-05-25 17:42:59 @dataset.py:36] index: 0
[2018-05-25 17:42:59 @dataset.py:36] queue_size: 256
[2018-05-25 17:42:59 @dataset.py:36] filetype: image
[2018-05-25 17:42:59 @dataset.py:36] enqueue_size: 32
[2018-05-25 17:42:59 @dataset.py:36] random_crop: False
[2018-05-25 17:42:59 @dataset.py:36] random: False
[2018-05-25 17:42:59 @dataset.py:36] shapes: [[256, 256, 3]]
[2018-05-25 17:42:59 @dataset.py:37] --------------------------------------------------------------------------------

What should I do?
Thank you.

OOM issue

inpaint.yml setting:

IMG_SHAPES: [256,256,3]
BATCH_SIZE: 16
MAX_ITERS : 1000000
NUM_GPUS: 1  # NVIDIA GTX 960 4GB

During training, I hit this OOM error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,257,257,32]
	 [[Node: inpaint_net_2/xconv2_downsample/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](inpaint_net_2/xconv1/Elu, inpaint_net/xconv2_downsample/kernel/read)]]
	 [[Node: inpaint_net_2/Reshape_72/_249 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1739_inpaint_net_2/Reshape_72", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Any possible suggestions? (e.g. reduce BATCH_SIZE or IMG_SHAPES; if I reduce IMG_SHAPES, I wonder whether it degrades output quality.)

Format of 'flist'

I want to train the network on a new dataset.
I tried to modify the inpaint.yml file, but I'm not sure how to set the dataset path.
It seems that I have to add the new dataset as an 'flist' file under DATA_FLIST, but I cannot find out how to make an appropriate 'flist' file.
Is there any reference for the flist file format, or some examples?

Attribute Error during training with neuralgym

Hi,

Thank you so much for such impressive work and for sharing it.

I would like to train this approach on a new dataset. However, during training I get the following error:
Exception in thread Thread-13:

Traceback (most recent call last):

File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/neuralgym/data/feeding_queue_runner.py", line 194, in _run
data = func()
File "/usr/local/lib/python3.5/dist-packages/neuralgym/data/data_from_fnames.py", line 143, in
feed_dict_op=[lambda: self.next_batch()],
File "/usr/local/lib/python3.5/dist-packages/neuralgym/data/data_from_fnames.py", line 180, in next_batch
random_h, random_w, align=False) # use last rand
File "/usr/local/lib/python3.5/dist-packages/neuralgym/ops/image_ops.py", line 50, in np_random_crop
image = np_scale_to_shape(image, shape, align=align)
File "/usr/local/lib/python3.5/dist-packages/neuralgym/ops/image_ops.py", line 23, in np_scale_to_shape
imgh, imgw = image.shape[0:2]
AttributeError: 'NoneType' object has no attribute 'shape'
[2018-03-29 10:02:11 @data_from_fnames.py:153] image is None, sleep this thread for 0.1s.

Do you have any idea what might be causing this?

During training, the model is saved in the model_logs directory; however, I cannot see any generated/inpainted images, since no images are saved during training. Is that right?

Sorry for the many questions!

Thank you so much for your help!

Regards,
Ecem

Batch Inpainting

Thank you for sharing your code!

I am trying to do batch infilling, but it crashes in the function 'contextual_attention' in inpaint_ops.py. I run test.py with the batch size set to 5 and get this error:

File "/home/zzzace2000/Googledrive/Lab/nl_markov_network/exp/vbd_imagenet/generative_inpainting/inpaint_ops.py", line 308, in contextual_attention
strides=[1,rate,rate,1]) / 4.
File "/home/zzzace2000/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1187, in conv2d_transpose
filter.get_shape()[3]))
ValueError: input channels does not match filter's input channels, 5120 != 1024

If I set the batch size to 1, the code runs flawlessly.
From the code, it looks like it should handle batch sizes greater than 1.
Can you let me know where I might have gone wrong?

Thank you!

how to set hyper-parameters

Although you have released pretrained models, I want to train a model myself on the celeba_hq dataset. With images of size 1024×1024, I set these hyper-parameters:

IMG_SHAPES: [256, 256, 3]
HEIGHT: 128
WIDTH: 128
MAX_DELTA_HEIGHT: 64
MAX_DELTA_WIDTH: 64
RANDOM_CROP: False

with the other hyper-parameters the same as yours in inpaint.yml, but I can't get a model as good as yours.
Can you give me some advice on the hyper-parameters?
