
yinboc / liif


Learning Continuous Image Representation with Local Implicit Image Function, in CVPR 2021 (Oral)

Home Page: https://yinboc.github.io/liif/

License: BSD 3-Clause "New" or "Revised" License

Python 94.96% Shell 5.04%
machine-learning super-resolution pytorch implicit-neural-representation

liif's People

Contributors

yinboc


liif's Issues

Regarding local ensemble

Hello,

During local_ensemble, the LIIF model sets two variables vx and vy and lets them take the values -1 and 1 to shift the query position, so that nearest-neighbor interpolation falls into different latent-code regions. However, I feel that running the code this way does not realize the local ensemble shown in Figure 2 of the paper; instead it produces a neighborhood shaped like the "five circles" mahjong tile, because vx and vy never take the value 0. Am I misunderstanding this, or do the paper and the code not match? Looking forward to your reply.
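For reference, here is a minimal, hedged sketch of the offset enumeration being asked about, paraphrased from models/liif.py; feat, coord and eps_shift below are dummy stand-ins rather than the real pipeline values.

import torch

# Sketch of the local-ensemble offset enumeration (paraphrased from models/liif.py).
feat = torch.randn(1, 64, 48, 48)        # latent feature map (dummy)
coord = torch.rand(1, 100, 2) * 2 - 1    # query coordinates in [-1, 1] (dummy)
eps_shift = 1e-6

vx_lst, vy_lst = [-1, 1], [-1, 1]        # note: 0 never appears in either list
rx = 2 / feat.shape[-2] / 2              # half a feature cell along the height axis
ry = 2 / feat.shape[-1] / 2              # half a feature cell along the width axis

shifted_queries = []
for vx in vx_lst:
    for vy in vy_lst:
        coord_ = coord.clone()
        coord_[:, :, 0] += vx * rx + eps_shift
        coord_[:, :, 1] += vy * ry + eps_shift
        shifted_queries.append(coord_)   # four diagonally shifted copies of the query

Each shifted copy is then passed to nearest-neighbor grid sampling, so the un-shifted query is never evaluated directly; whether this reproduces Figure 2 exactly is precisely the question raised above.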

Make this repo easy to use on custom data

Hey, it would be amazing if you had a simple script that allowed running the models on a set of custom input images.
I was able to run test.py on the example datasets, but the dataloaders require a very specific folder structure with LR_bicubic subfolders, etc. It's hard to figure out how to simply apply a pretrained model to custom images.

Also, it's not super clear what the best settings are for running this on regular images (of resolution, say, 250-1000 pixels per side) to obtain HD super-resolution images: should you run the model on the full image, or is it better to run it on several crops of the image and tile the result?

Why div2k and benchmarks have different calc_psnr?

if dataset == 'benchmark':
    shave = scale
    if diff.size(1) > 1:
        gray_coeffs = [65.738, 129.057, 25.064]
        convert = diff.new_tensor(gray_coeffs).view(1, 3, 1, 1) / 256
        diff = diff.mul(convert).sum(dim=1)
elif dataset == 'div2k':
    shave = scale + 6

In the benchmark case, shave = scale, and in the div2k case, shave = scale + 6.
My questions are:

  1. Why do these two cases need shave?
  2. Why is the shave size related to scale?
  3. Why is the div2k shave equal to scale + 6?

Looking forward to your reply! Thanks in advance!
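For context, a hedged sketch of how shave is used downstream in calc_psnr (following utils.py): a border of width shave is cropped from the difference map before the MSE/PSNR computation, so pixels near the image border do not affect the metric. diff and the scale value below are stand-ins.

import torch

diff = torch.randn(1, 1, 64, 64)               # stand-in for (sr - hr) / rgb_range
shave = 4 + 6                                  # e.g. div2k at scale 4
valid = diff[..., shave:-shave, shave:-shave]  # drop a `shave`-pixel border
mse = valid.pow(2).mean()
psnr = -10 * torch.log10(mse)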

Something about Quick Start

I used the same 32x32 LR image as in your paper and then ran the quick start to get a 20x SR image, but it looks quite blurred. Here is my result:
[input image]
[output image]


Running command: python demo.py --input input.png --model rdn-liif.pth --resolution 640,640 --output output.png --gpu 2

True number of epochs (200 or 1000)?

Hi, very cool work!
In your paper you write that you train for 200 epochs, but in the config files (included in this repo) you have 1000 epochs.
Should there be a big difference between the two options? In terms of runtime it matters a lot: 25 vs. 5 hours of training time. I wonder if the final quality also changes.
Thanks!

Bugs of demo

Traceback (most recent call last):
  File "demo.py", line 26, in <module>
    model = models.make(torch.load(args.model)['model'], load_sd=True).cuda()
  File "/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "lib/python3.6/site-packages/torch/serialization.py", line 599, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: rdn-liif.pth is a zip archive (did you mean to use torch.jit.load()?)
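A hedged note on this traceback: it usually means the checkpoint was saved with PyTorch >= 1.6 (zip-based serialization) but is being loaded with an older PyTorch. Upgrading PyTorch is the simplest fix; alternatively, the checkpoint can be re-saved in the legacy format from a >= 1.6 environment, e.g.:

import torch

# Re-save the checkpoint in the legacy (non-zip) format so that older PyTorch
# versions can load it. Run this in an environment with PyTorch >= 1.6.
ckpt = torch.load('rdn-liif.pth')
torch.save(ckpt, 'rdn-liif-legacy.pth', _use_new_zipfile_serialization=False)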

Met with cuda out of memory when I am testing the model

I ran bash scripts/test-div2k.sh save/_train_edsr-baseline-liif/epoch-last.pth 0
and then met RuntimeError: CUDA out of memory. Tried to allocate 676.00 MiB (GPU 0; 10.92 GiB total capacity; 827.98 MiB already allocated; 478.50 MiB free; 830.00 MiB reserved in total by PyTorch).
How can I solve this problem?

About channel

Hi,
Your code is based on RGB images, but I want to use it on single-channel images. Can I directly change the datasets and train with your code as-is, or should I change the channel-related parts of the code to fit my needs?

Question about the code

Thanks for your excellent idea and code; they really enlightened me a lot.
There are some parts of the code I don't understand; could you please give me some guidance?


rel_coord = coord - q_coord
rel_coord[:, :, 0] *= feat.shape[-2]
rel_coord[:, :, 1] *= feat.shape[-1]

What does 'rel_coord' refer to? I think 'coord' and 'q_coord' refer to x_q and v*_t in Eq. (4) of your paper, so what does 'rel_coord' correspond to? I have the same question about 'rel_cell'.
Looking forward to your response, thank you!
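A hedged illustration (an editor's reading, not the author's answer): rel_coord appears to be the offset from the query coordinate to its nearest latent-code position, re-scaled from the global [-1, 1] range into feature-map cell units, so the decoder sees inputs on a roughly unit scale regardless of the feature-map resolution. All values below are made-up examples.

import torch

H_feat, W_feat = 48, 48                        # feature map size (example)
coord = torch.tensor([[[0.0130, -0.0200]]])    # query position in [-1, 1]
q_coord = torch.tensor([[[0.0208, -0.0208]]])  # nearest feature center (example)
rel_coord = coord - q_coord
rel_coord[:, :, 0] *= H_feat                   # offset in feature-cell units (height)
rel_coord[:, :, 1] *= W_feat                   # offset in feature-cell units (width)
print(rel_coord)                               # roughly tensor([[[-0.3744, 0.0384]]])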

KeyError: 'image-folder'

The datasets registry does not have a key named 'image-folder' when I run train_liif.py, and the code shows that datasets = {}. Could anyone help me get train_liif.py running? Thanks a lot!

What is the difference between the proposed local ensemble and bilinear interpolation?

In Fig. 2, the authors propose a local ensemble approach to predict the RGB value of the target position from its four nearest neighbors. However, the calculation process of the local ensemble is very similar to bilinear interpolation. So my question is: why not directly use bilinear interpolation in the F.grid_sample function? Looking forward to your early reply.

Regards

about the code

Hello yinbo,
I have a question about the meaning of data_norm in the training configuration file. I guess inp and gt mean input and ground truth, respectively, but what is the meaning of sub: [0.5] and div: [0.5]?
data_norm:
inp: {sub: [0.5], div: [0.5]}
gt: {sub: [0.5], div: [0.5]}
Could you help me? Thank you!
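A hedged reading of these entries, based on how train_liif.py appears to apply them: each tensor is normalized as (x - sub) / div, so sub: [0.5] and div: [0.5] map pixel values from [0, 1] to [-1, 1], and predictions are mapped back with x * div + sub. A minimal sketch:

import torch

inp = torch.rand(1, 3, 48, 48)                 # pixel values in [0, 1]
sub = torch.tensor([0.5]).view(1, -1, 1, 1)
div = torch.tensor([0.5]).view(1, -1, 1, 1)
inp_normed = (inp - sub) / div                 # now in [-1, 1]
restored = inp_normed * div + sub              # back to [0, 1]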

Questions about coordinate conversion

Hi Yinbo,
Thank you for your impressive work. I'm confused about the coordinate conversion in https://github.com/yinboc/liif/blob/main/models/liif.py#L81 when you use them for the feature grid-sampling.
The coord here denotes the normalized index of the HR image, and q_coord seems to be the coordinate of the nearest feature-map entry, derived from the real feature-map index. I guess these follow the assumption that each pixel is located at its grid center. The following line is rel_coord = coord - q_coord. What is the meaning of this rel_coord? I couldn't understand these conversions.
Later you multiply rel_coord by the feature-map size before prediction. Is the range of rel_coord then no longer [-1, 1]?
Hope for your reply and thank you for your attention again.

About code

liif/test.py, line 16 (at commit 7f0ec6b):

def batched_predict(model, inp, coord, cell, bsize):

Hi, I don't quite understand the meaning of 'bsize'; could you give me some guidance?

Looking forward to your response, thank you!
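For context, the body of batched_predict in test.py is roughly the following: it splits the flattened query coordinates into chunks along dimension 1 and decodes at most bsize query pixels per forward pass, which bounds the memory used by the MLP decoder.

import torch

def batched_predict(model, inp, coord, cell, bsize):
    # Decode the query coordinates in chunks of at most `bsize` points.
    with torch.no_grad():
        model.gen_feat(inp)
        n = coord.shape[1]
        preds = []
        ql = 0
        while ql < n:
            qr = min(ql + bsize, n)
            preds.append(model.query_rgb(coord[:, ql:qr, :], cell[:, ql:qr, :]))
            ql = qr
        pred = torch.cat(preds, dim=1)
    return pred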

Question about load model

Hi, I want to use your model to enlarge my image data as an intermediate result, and then use the enlarged images to generate optical flow. In my .py file I load the model the same way as shown in demo.py:
SRmodel = models.make(th.load(args.model)['model'], load_sd=True).cuda()
but I get the error KeyError: 'liif' (see the attached screenshot).
It has bothered me for almost a day and I cannot find out why; can you help me with it?
Thank you very much, looking forward to your reply.

Quick Start

I looked for a blurred picture and then ran the quick start, but the output picture shows no obvious change. Is there something wrong?

Fail cases

Hi!
Thanks for this wonderful piece of work. I've tried running the released EDSR-baseline pretrained LIIF model, and it seems to work well even on some fairly random out-of-domain images I tried. This was surprising, since the model was trained on only 800 images from DIV2K.

So do you have any counter-examples you found where the super-resolution fails significantly?

Error when running demo.py

I put the pretrained model in the project folder and used a command of the following form:
python demo.py --input 11.png --model [rdn-liif.pth] --resolution [HEIGHT],[WIDTH] --output 11_sr.png --gpu 0
but it keeps raising:
FileNotFoundError: [Errno 2] No such file or directory: '[rdn-liif.pth]'
So I wondered whether I should put a path in MODEL_PATH instead, but I couldn't find that either. How should I handle this problem?

bsize?

My GPU memory is 32 GB and the image is 3456 x 4608. I want to upscale it to 6912 x 9216 with LIIF, but I get a 'CUDA out of memory' error. I reduced bsize gradually down to 1, but the 'CUDA out of memory' error persists. Why?
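A hedged back-of-the-envelope estimate of why lowering bsize alone may not help: the encoder runs on the full 3456 x 4608 input before any querying, so its activations dominate memory regardless of bsize. The channel count below (64) is an assumption based on the EDSR-baseline/RDN configs.

h_in, w_in, channels, bytes_fp32 = 3456, 4608, 64, 4
feat_bytes = h_in * w_in * channels * bytes_fp32
print(feat_bytes / 2 ** 30)   # ~3.8 GiB for a single feature map, before the
                              # encoder's intermediate layers are even counted

If that estimate is in the right ballpark, tiling the input (running the encoder on crops and stitching the results) may be needed in addition to lowering bsize.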

A question about the dataset download.

Thanks for your excellent idea and code; they really enlightened me a lot.
But when I download the celebAHQ 1024x1024.zip dataset from Google Drive, the download keeps getting interrupted... and the way suggested in the celebAHQ repo downloads too slowly. Could you upload celebAHQ 1024x1024.zip to Dropbox? It would help me a lot.
Thank you very much ^_^

Unable to download DIV2K dataset

Hi,
The link provided for the home page of the DIV2K dataset works, but the download link is not working. Kindly let me know how to resolve this issue.

L1 loss

Hi, thanks for the great work!

I am just curious: is there any specific or special reason for using L1 loss instead of L2 loss during training? I feel like L2 loss would match the PSNR metric better.

Thanks!

Quick start

Hello, I am quite confused and don't know what's wrong: can the quick start really not be run with two GPUs?

Regarding the code in liif.py

Hi, I have found it difficult to understand the purpose of rx and ry in the code below. Kindly explain them. Also, why should we perform the flip operation (coord_.flip(-1)) on the coord_ variable when we have already received the actual coordinates? Thank you.
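A hedged note on the flip, under the assumption that coordinates are produced by make_coord in (row, column) order: F.grid_sample expects the last dimension of its sampling grid in (x, y) = (width, height) order, so coord_.flip(-1) swaps the two axes before sampling. A minimal sketch with dummy tensors:

import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 48, 48)          # latent feature map (dummy)
coord_ = torch.rand(1, 100, 2) * 2 - 1     # (row, col)-ordered coords in [-1, 1]
sampled = F.grid_sample(
    feat, coord_.flip(-1).unsqueeze(1),    # flip to (x, y) order for grid_sample
    mode='nearest', align_corners=False)[:, :, 0, :]   # -> shape (1, 64, 100)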

about the code

Thanks for sharing the work. I want to know why the query coordinates need to subtract the field radius of the feature maps.

Is it wrong with torch.arange()?

Hi!
I tried to test make_coord() in utils.py:

def make_coord(shape, ranges=None, flatten=True):
    """ Make coordinates at grid centers. """
    coord_seqs = []
    for i, n in enumerate(shape):
        if ranges is None:
            v0, v1 = -1, 1
        else:
            v0, v1 = ranges[i]
        r = (v1 - v0) / (2 * n)
        seq = v0 + r + (2 * r) * torch.arange(n)
        coord_seqs.append(seq)
    ret = torch.stack(torch.meshgrid(*coord_seqs), dim=-1)
    if flatten:
        ret = ret.view(-1, ret.shape[-1])
    return ret

When I test 0.5 * torch.arange(8), I get

tensor([0, 0, 0, 0, 0, 0, 0, 0])

What is wrong?
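A hedged guess at the cause: in older PyTorch versions, multiplying a Python float by an integer (Long) tensor kept the integer dtype, so 0.5 * torch.arange(8) truncated every element to zero; recent versions promote to float instead. Forcing a float dtype avoids the issue either way:

import torch

seq = 0.5 * torch.arange(8, dtype=torch.float32)
# alternatively: 0.5 * torch.arange(8).float()
print(seq)   # tensor([0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000])

Note that the seq line quoted above multiplies torch.arange(n) by a Python float in the same way, so on such an old PyTorch version the coordinates would collapse as well; checking the installed PyTorch version is probably the first step.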

About the experiments in the paper

Hi, I have a small question about Table 2 in your paper, where you only tested x6 and x8 as out-of-distribution settings. I wonder whether any scale larger than x8 is possible?

psnr

During training, at what scale is the PSNR obtained from the per-epoch validation computed? Is it a fixed scale? Looking forward to your answer!

Extension to RGBA as well as RGB

Hi there, I've been using LIIF on emoji glyphs and got some great results; however, I'd like to recover the transparency, which I had to remove* by simple alpha compositing (i.e. flattening the image) before passing the PNG inputs to LIIF.

* flattening onto a grayscale background after calculating the grayscale tone not present in any semitransparent pixels, with greatest Euclidean distance from the median of the pixel mean in the image

I tried to "supervise" the estimation of transparency, but it was only a rough estimate, and the results it gives are not satisfactory (despite the high quality obtained from LIIF).

This subsection of an emoji glyph was flattened against a black background then run through LIIF.
The bottom right plot shows the recovered transparency (RGBA image) flattened against a different background colour (white).

I was wondering whether you think the code could be modified in some way for this, to supervise an estimate of the alpha channel?

It seems like it should be possible, but it's unclear to me how I might implement it; any advice would be appreciated.

Not the result I expected

While trying super-resolution on faces from CelebA-HQ, I get poor results (using both pretrained models).
See the 1Kx1K output below.
[output image]

About the loss function in the code

Thanks a lot for your excellent work, but I am confused by one thing in the code. The loss function in your work is L1 loss, but I cannot find a separate loss for the EDSR baseline. Is a single L1 loss used to train your MLP layers and the EDSR-baseline encoder together, as one whole network? Looking forward to your reply, thank you.

Device and training time

Dear Author,
I am a little bit curious about the training time and hardware. Can you give some details on these? Thanks!

crash in demo

stiv2@gaidar:~/liif$ python3 demo.py --input jap.png --model ./rdn-liif.pth --resolution 450,600 --output output.png --gpu 0
Traceback (most recent call last):
  File "demo.py", line 34, in <module>
    coord.unsqueeze(0), cell.unsqueeze(0), bsize=30000)[0]
  File "/home/stiv2/liif/test.py", line 18, in batched_predict
    model.gen_feat(inp)
  File "/home/stiv2/liif/models/liif.py", line 34, in gen_feat
    self.feat = self.encoder(inp)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/stiv2/liif/models/rdn.py", line 99, in forward
    f__1 = self.SFENet1(x)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 416, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 4, 45, 60] to have 3 channels, but got 4 channels instead

How can I fix this?
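A hedged suggestion based on the error message: the input PNG has 4 channels (RGBA) while the encoder's first convolution expects 3 (RGB), so converting the image to RGB before running demo.py should avoid the mismatch. File names below are just examples.

from PIL import Image

# Drop the alpha channel so the encoder receives a 3-channel input.
Image.open('jap.png').convert('RGB').save('jap_rgb.png')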

How to input more channels

I want to input a video sequence (for example, three video frames) instead of one image; how can I do this?

Regarding the code at line 86-90 in liif.py

"rel_cell = cell.clone()
rel_cell[:, :, 0] *= feat.shape[-2]
rel_cell[:, :, 1] *= feat.shape[-1]"

What does this code mean? Before that, rel_cell stores the ratio of 2 to crop_hr.shape, so what does the result of this multiplication mean?
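A hedged illustration, analogous to rel_coord above (an editor's reading, not the author's answer): cell holds the height and width of one query (HR) pixel in the global [-1, 1] coordinate system, i.e. roughly 2/H_hr and 2/W_hr, and multiplying by the feature-map dimensions re-expresses that pixel size in feature-cell units, which is what the decoder receives as its cell input. All values below are made-up examples.

import torch

H_hr, W_hr = 960, 960            # target (HR) resolution (example)
H_feat, W_feat = 48, 48          # feature map resolution (example)
cell = torch.tensor([[[2.0 / H_hr, 2.0 / W_hr]]])
rel_cell = cell.clone()
rel_cell[:, :, 0] *= H_feat      # HR pixel height as a fraction of a feature cell
rel_cell[:, :, 1] *= W_feat      # HR pixel width as a fraction of a feature cell
print(rel_cell)                  # tensor([[[0.1000, 0.1000]]])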
