jwsoh / mzsr

Meta-Transfer Learning for Zero-Shot Super-Resolution (CVPR, 2020)

Python 100.00%

super-resolution meta-transfer-learning meta-learning zero-shot

mzsr's Issues

I do not understand how the loss weights are calculated

Could you kindly explain how these loss weights are calculated?

def get_loss_weights(self):
    # Start from uniform weights over the TASK_ITER inner-loop steps.
    loss_weights = tf.ones(shape=[self.TASK_ITER]) * (1.0 / self.TASK_ITER)
    decay_rate = 1.0 / self.TASK_ITER / (10000 / 3)
    min_value = 0.03 / self.TASK_ITER

    # All steps except the last: decay linearly with global_step, floored at min_value.
    loss_weights_pre = tf.maximum(loss_weights[:-1] - (tf.multiply(tf.to_float(self.global_step), decay_rate)), min_value)

    # Last step: grows by the total amount removed from the others, capped so the weights sum to 1.
    loss_weight_cur = tf.minimum(loss_weights[-1] + (tf.multiply(tf.to_float(self.global_step), (self.TASK_ITER - 1) * decay_rate)), 1.0 - ((self.TASK_ITER - 1) * min_value))
    loss_weights = tf.concat([[loss_weights_pre], [[loss_weight_cur]]], axis=1)
    return loss_weights
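
To see how these weights evolve, the same arithmetic can be reproduced outside the TensorFlow graph. The following is only a NumPy sketch of the formulas in the function above, not code from the repository; the function name loss_weights_numpy and its parameter names are made up for illustration, and the constants (10000 / 3 and 0.03) are copied from the snippet.

```python
import numpy as np

def loss_weights_numpy(global_step, task_iter, anneal_steps=10000 / 3, min_frac=0.03):
    """NumPy re-statement of get_loss_weights (hypothetical helper, for illustration)."""
    uniform = 1.0 / task_iter              # initial weight of every inner-loop step
    decay_rate = uniform / anneal_steps    # same as 1.0 / TASK_ITER / (10000 / 3)
    min_value = min_frac / task_iter       # floor for all steps except the last

    # Steps 0 .. task_iter-2: linear decay with global_step, clipped from below.
    pre = np.maximum(np.full(task_iter - 1, uniform) - global_step * decay_rate, min_value)

    # Last step: receives what the others lose, clipped from above.
    cur = min(uniform + global_step * (task_iter - 1) * decay_rate,
              1.0 - (task_iter - 1) * min_value)

    return np.concatenate([pre, [cur]])

if __name__ == "__main__":
    for step in (0, 1000, 10**6):
        print(step, loss_weights_numpy(step, task_iter=5))
```

With task_iter = 5 (a value chosen only for illustration), the weights start uniform at 0.2 each and, once the clipping is reached, settle at 0.006 for the first four steps and 0.976 for the last one. The weights sum to 1 throughout, so the loss gradually shifts from all inner-loop steps toward the final step.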

Unable to create event file

While trying to set up TensorBoard, I am unable to create the event log file. There seems to be an issue with tf.summary.FileWriter(). The issue only happens on one particular computer. I could not find any related solution online. Could you suggest how to fix it?

Error message
T:\src\github\tensorflow\tensorflow\core\util\events_writer.cc:104] Write failed because file could not be opened.

When I run the large-scale training code, I get an error. Could you help me?

(base) wit@wit:/media/wit/Data1/WH/MZSR-master-new/Large-Scale_Training$ python3 main.py --gpu 0 --trial 2 --step 0
Initialize Training
Build Model MODEL
Initialize weights MODEL
Setting Train Configuration
Model Params: 225 K
========== Reading Checkpoints ============
============= Failed to find a checkpoint =============
========== No model to load ===========
Training Starts
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image (data type: string) is required but could not be found.
[[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_STRING, DT_STRING], dense_keys=["image", "label"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const)]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?,16,16,3]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 49, in <module>
main()
File "main.py", line 46, in main
Trainer.run()
File "/media/wit/Data1/WH/MZSR-master-new/Large-Scale_Training/train.py", line 88, in run
label_train_, input_train_ = sess.run([label_train, input_train])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image (data type: string) is required but could not be found.
[[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_STRING, DT_STRING], dense_keys=["image", "label"], dense_shapes=[[], []], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const, ParseSingleExample/Const)]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?,16,16,3]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

about the training code

Hello, this method is really great. Can you tell me when the training code will be released?

The outputs for arbitrary images look bad. What datasets can be used to train the network for new images?

My question is whether training the model on a dataset more specific to the use case would be a better choice than general datasets such as DIV2K.

distributed training

Is it possible to set up distributed training (running on multiple GPUs)?
When I set parser.add_argument('--gpu', type=str, dest='gpu', default='0,1') it still runs on one GPU
Thank you

train the model

Thank you for your outstanding work. When I run the training steps, the X2 and X4 models are not saved. Can you help me?
I ran:
python main.py --train --gpu 0 --trial 1 --step 0
The only result is a log1 file. Where are the models?

How is the TFRecord generated?

Hi,

On your GitHub, you said to "Remove 'label' key in 'write_to_tfrecord()' function" and to save the ground-truth patches. However, the label key corresponds to the ground truth. Is this wrong?

I followed "Remove all contents regarding low-resolution images in the 'generate_TFRecord()' function" as you said, and saved the label key.

Then I got the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image (data type: string) is required but could not be found.
[[Node: ParseSingleExample/ParseSingleExample = ParseSingleExample[Tdense=[DT_STRING], dense_keys=["image"], dense_shapes=[[]], num_sparse=0, sparse_keys=[], sparse_types=[]](arg0, ParseSingleExample/Const)]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Does anyone have any ideas?

Where should I insert the checkpoint path of the trained large-scale training model?

Dear Sir,
Amazing work! Congratulations!
I have a question. Could you kindly tell me the full path where I should insert the checkpoint of the trained large-scale training model, so that I can use it as the pre-trained model for meta-transfer training?
I'm waiting for your reply.
Thanks in advance

When I test with my dataset, can I use the image at different resolutions?

Thanks for your great job!
I want to use my own dataset for testing, but I get errors due to the different image resolutions. How should I change the model parameters to change the input image resolution?

Hi, I used my datasets to test, but the result is not good.

Reproduction with the given model

Thanks for your work. I found a problem when reproducing the results in the paper using the pre-trained models you provided. The result for the g^d_2.0 setting matches the paper (using kernel.mat and the demo test command line). But when I simply replaced kernel.mat with the kernel for g^d_0.2, the performance dropped a lot: the PSNR was only 30.185 even with 10 iterations, whereas the paper reports 33.74 (MZSR(10)). What is wrong with my test process? Could it be that the model tested for g^d_2.0 and the one for g^d_0.2 are not the same model?

How to obtain X3 experimental results

How to get kernel for the real image dataset?

Thanks for the nice work and the very good documentation.

I have some real low-resolution images without any ground truth, and I want to test your state-of-the-art MZSR model on them. I noticed that for testing, as mentioned in the README file, I need "Ready for the input data (low-resolution) and corresponding kernel (kernel.mat file.)". I couldn't find any information in either the paper or the repo on how to get the kernel file. Could you please let me know what code to use for that? Did you use another paper's method to compute the kernel? Thank you.

about the high_resolution image

Hello, there is a problem when I run your code, could you please help me to solve it?

[*] Training Starts
Traceback (most recent call last):
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[{{node IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:/class/papper/MZSR-master/main.py", line 67, in <module>
main()
File "G:/class/papper/MZSR-master/main.py", line 26, in main
Trainer()
File "G:\class\papper\MZSR-master\train.py", line 170, in __call__
inputa, labela, inputb, labelb = self.data_generator.make_data_tensor(sess, self.scale_list, noise_std=0.0)
File "G:\class\papper\MZSR-master\dataGenerator.py", line 21, in make_data_tensor
label_train_=sess.run(self.label_train)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[node IteratorGetNext (defined at G:\class\papper\MZSR-master\dataGenerator.py:82) ]]

Errors may have originated from an input operation.
Input Source operations connected to node IteratorGetNext:
OneShotIterator (defined at G:\class\papper\MZSR-master\dataGenerator.py:80)

Original stack trace for 'IteratorGetNext':
File "G:/class/papper/MZSR-master/main.py", line 67, in <module>
main()
File "G:/class/papper/MZSR-master/main.py", line 20, in main
task_batch_size=TASK_BATCH_SIZE,tfrecord_path=TFRECORD_PATH)
File "G:\class\papper\MZSR-master\dataGenerator.py", line 16, in __init__
self.label_train = self.load_tfrecord()
File "G:\class\papper\MZSR-master\dataGenerator.py", line 82, in load_tfrecord
label_train = iterator.get_next()
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 426, in get_next
output_shapes=self._structure._flat_shapes, name=name)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 1973, in iterator_get_next
output_shapes=output_shapes, name=name)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
op_def=op_def)
File "D:\learningtool\python\1\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()

Model results

I ran your code and trained a model, but its results on the Set5 dataset differ from those in the paper. What is the reason for this?
Operating parameters:
python main.py --gpu 0 --inputpath Input/g20/Set5/ --gtpath GT/Set5/ --savepath results/Set5 --kernelpath Input/g20/kernel.mat --model 0
The result after I iterate 100,000 times:
[] Average PSNR ** Initial: 14.9049, Final : 34.2266
Your result:
[] Average PSNR ** Initial: 15.6594, Final : 35.1986

Training code

Hi, nice job. Do you plan to share your training code?

AlreadyExistsError during Meta-training

About the range of lambda of isotropic Gaussian blur kernel

Thanks for your great job!
I have read the paper, but I cannot find the range of lambda used when you synthesize the isotropic Gaussian blur kernel for training.
I set the sigma to np.asarray([[lamda, 0],[0, lamda]]) to synthesize the isotropic Gaussian blur kernel. Is that right?
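
To make the question concrete, here is a minimal NumPy sketch of how a blur kernel could be built from such a 2x2 matrix. It assumes the matrix is used as the covariance of a zero-mean 2D Gaussian (so lamda plays the role of a variance, not a standard deviation); the function name gaussian_kernel and the kernel size of 15 are hypothetical and not taken from the repository.

```python
import numpy as np

def gaussian_kernel(cov, size=15):
    """Sample a zero-mean 2D Gaussian with covariance `cov` on a size x size grid.

    Hypothetical helper for illustration; `cov` is assumed to be a 2x2
    symmetric positive-definite covariance matrix.
    """
    half = (size - 1) / 2.0
    ax = np.arange(size) - half                  # pixel coordinates centered at 0
    xx, yy = np.meshgrid(ax, ax)
    pts = np.stack([xx, yy], axis=-1)            # shape (size, size, 2)

    inv_cov = np.linalg.inv(cov)
    # Quadratic form p^T * inv_cov * p evaluated at every grid point.
    quad = np.einsum('...i,ij,...j->...', pts, inv_cov, pts)

    kernel = np.exp(-0.5 * quad)
    return kernel / kernel.sum()                 # normalize so the kernel sums to 1

if __name__ == "__main__":
    lamda = 2.0
    sigma = np.asarray([[lamda, 0], [0, lamda]])
    k = gaussian_kernel(sigma)
    print(k.shape, k.sum())
```

With equal diagonal entries and zero off-diagonal entries, the resulting kernel is rotationally symmetric, which is what "isotropic" means here; unequal diagonal entries or a rotated covariance would give an anisotropic kernel instead.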

Why is the downsampling operator implemented by a model rather than an algorithm in the meta-test step?

Thank you for your admirable and meaningful work!
My question is why the downsampling operator is implemented by a model rather than an algorithm in the meta-test step. I only found the models for bicubic x2, direct x2, direct x4 and multi-scale, but I want to know how they are implemented. Could you please upload the code? Thank you very much!

How can you get the kernel?

Nice job solving the runtime problem of ZSSR!
I notice that you take a kernel as input. The kernel is important for the SR problem, but you do not provide any code to obtain it. How do you get the kernel?

Hi, I have some questions for you.

Does the experimental environment have to be "CUDA 9.0 & cuDNN 7.1"? I only have a "CUDA 10.0 & cuDNN 7.4" environment.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 86: invalid continuation byte

Thank you for sharing the code. I have the following problem. Do you know how to solve it? Thanks.

==================== PRETRAINED MODEL Loading Succeeded ====================
==================== Reading Checkpoints ====================
=================== Fail to find a Checkpoint ====================
==================== No model to load ======================================
[*] Training Starts
Traceback (most recent call last):
File "D:/2020/ReferenceCode/MZSR-master/main.py", line 71, in <module>
main()
File "D:/2020/ReferenceCode/MZSR-master/main.py", line 30, in main
Trainer()
File "D:\2020\ReferenceCode\MZSR-master\train.py", line 171, in __call__
inputa, labela, inputb, labelb = self.data_generator.make_data_tensor(sess, self.scale_list, noise_std=0.0)
File "D:\2020\ReferenceCode\MZSR-master\dataGenerator.py", line 19, in make_data_tensor
label_train_=sess.run(self.label_train)
File "G:\Anaconda\envs\tensorflow2.3-gpu\lib\site-packages\tensorflow\python\client\session.py", line 958, in run
run_metadata_ptr)
File "G:\Anaconda\envs\tensorflow2.3-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "G:\Anaconda\envs\tensorflow2.3-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1359, in _do_run
run_metadata)
File "G:\Anaconda\envs\tensorflow2.3-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1365, in _do_call
return fn(*args)
File "G:\Anaconda\envs\tensorflow2.3-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _run_fn
target_list, run_metadata)
File "G:\Anaconda\envs\tensorflow2.3-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1443, in _call_tf_sessionrun
run_metadata)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 86: invalid continuation byte

Process finished with exit code 1

What is the usage of SECOND_ORDER_GRAD_ITER=0 and self.total_loss1?

What is the usage of SECOND_ORDER_GRAD_ITER=0 and self.total_loss1?
As for SECOND_ORDER_GRAD_ITER=0:

if step == SECOND_ORDER_GRAD_ITER:
    second_grad = sess.run(self.second_grad_on)

If we have finished pre-training on large scale datasets, I think it is useless in this meta-transfer learning step.
As for self.total_loss1:

self.total_loss1 = tf.reduce_sum(self.lossesa) / tf.to_float(self.META_BATCH_SIZE)
self.pretrain_op = tf.train.AdamOptimizer(self.META_LR).minimize(self.total_loss1)
        
self.gvs = self.opt.compute_gradients(self.weighted_total_losses2)
self.metatrain_op= self.opt.apply_gradients(self.gvs)

sess.run(self.metatrain_op, feed_dict=feed_dict)

In this meta-transfer learning step, total_loss1 is never used by the optimizer that is actually run. Is that correct?

Sir, I have a problem when training

Dear Sir,
Amazing work! Congratulations!
I have a problem when training, as follows:
========== Reading Checkpoints ============
============= Failed to find a checkpoint =============
========== No model to load ===========

Did I do something wrong with Generate TFRecord Dataset?

Which model will I get if I run python main.py --train?

Hi, thanks for your meaningful work.
I want to ask which model I will have if I do the following operations:

I create the training set using MainSR to get train_MZSR.tfrecord (7.09 GB);
I run python main.py --train --gpu 0 --trial 0 --step 0.

I trained for 27522 iters, but when I test the trained model on Input/g20/Set5/, I get PSNR=33.6947 and SSIM=0.9265.

Can you give me some suggestions?

Use MZSR without CUDA?

Hello. Is there any possible method of utilizing MZSR without CUDA and cuDNN?

Thanks.

Error during large scale training

Hello, thank you very much for sharing your codes.

During large-scale training, I came across this error. Could you kindly let me know where I went wrong?
The error code is below.

========== Reading Checkpoints ============
============= Failed to find a checkpoint =============
========== No model to load ===========
WARNING:tensorflow:From /content/MZSR/Large-Scale_Training/train.py:74: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Training Starts
Traceback (most recent call last):
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map_Train._parse_function_673}} Feature: image (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]]
[[IteratorGetNext]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 51, in <module>
main()
File "main.py", line 48, in main
Trainer.run()
File "/content/MZSR/Large-Scale_Training/train.py", line 87, in run
label_train_, input_train_ = sess.run([label_train, input_train])
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]]
[[IteratorGetNext]]

Can Zero Shot learning methods be used when there is no GT available?

My question is that I have images where I need to perform super resolution, but do not have the high-resolution ground truth for them. How can I perform inference on these images?

During reproducing “bicubic” downsampling scenario...

It seems that the PSNR and SSIM results of “bicubic” downsampling scenario in the paper cannot be obtained using your current released model. Could you please upload the LR images for “bicubic” downsampling test and code for bicubic downsampling?
In addition, when training the bicubicx2 model, the PSNR/SSIM results obtained by using the existing training code are quite different from those mentioned in the paper. Should I adjust some parameters corresponding to the bicubic downsampling scenario?

Variable dimensions are incompatible while calculating l1_loss (during Large-Scale_Training)

First, error message:
Traceback (most recent call last):
File ".../Large-Scale_Training/main.py", line 49, in <module>
main()
File ".../Large-Scale_Training/main.py", line 46, in main
Trainer.run()
File "...\Large-Scale_Training\train.py", line 39, in run
self.calc_loss()
File "...\Large-Scale_Training\train.py", line 33, in calc_loss
self.loss=tf.losses.absolute_difference(self.MODEL.output , self.label)
File "...\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py", line 271, in absolute_difference
predictions.get_shape().assert_is_compatible_with(labels.get_shape())
File "...\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 844, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (?, 96, 96, 3) and (?, 48, 48, 3) are incompatible

The cause of the error is that the model, with its own parameter settings, does not bring the patches read from the dataset to the size required by the scale; their size remains unchanged, which leads to a dimension mismatch when computing the l1_loss between the model output and the ground truth.
So I am curious: to get the experimental results reported in the paper, how should the data be preprocessed in Large-Scale_Training?

Error during Large Scale Training

Hi, thank you for sharing your wonderful work.
When running large-scale training, I encountered this error:

Traceback (most recent call last):
File "main.py", line 49, in <module>
main()
File "main.py", line 45, in main
scale=SCALE,num_of_data=NUM_OF_DATA, conf=conf)
TypeError: __init__() missing 1 required positional argument: 'model_num'

It will be very helpful for me if you could help me out with this.

Thank you

About blur kernel generate for meta learning

Hi, thanks for your work.

I noticed that in the paper you mention using both isotropic and anisotropic Gaussian kernels as blur kernels, while in the code I found that only random anisotropic Gaussian kernels are generated.

Maybe this is my misunderstanding. Can you give me some guidance on this?

Thanks.

What is the difference between [inputa, labela] and [inputb, labelb]?

Hi, thanks for your work.

I wonder what is the difference between [inputa, labela] and [inputb, labelb]?

Problem when loading the pretrained model, specifically when it reads the checkpoint

I'm facing a problem when I load the pretrained model, specifically when it reads the checkpoint.
This is the error. Could you kindly tell me how to solve it?

NotFoundError (see above for traceback): Key MODEL/conv7/kernel/Adam_3 not found in checkpoint
[[Node: save/RestoreV2_69 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_69/tensor_names, save/RestoreV2_69/shape_and_slices)]]

Will there be a Pytorch version?:)

Where is the code for large-scale training?

Hi, I only found the pre-trained model, but I would like to see this part of the code.
