
yinboc / liif


Learning Continuous Image Representation with Local Implicit Image Function, in CVPR 2021 (Oral)

Home Page: https://yinboc.github.io/liif/

License: BSD 3-Clause "New" or "Revised" License

Python 94.96% Shell 5.04%
machine-learning super-resolution pytorch implicit-neural-representation

liif's People

Contributors

yinboc


liif's Issues

Regarding local ensemble

Hello,

During local_ensemble, the LIIF model sets two variables vx and vy and lets them take the values -1 and 1 to shift the query position, so that nearest-neighbor interpolation falls into different latent-code regions. However, I feel that running the code this way does not realize the local ensemble shown in Figure 2 of the paper; instead it produces a neighborhood shaped like the "five circles" mahjong tile, because vx and vy never take the value 0. Am I misunderstanding this, or do the paper and the code not match? Looking forward to your reply.
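For reference, here is a minimal, hedged sketch of the offset enumeration being asked about, paraphrased from models/liif.py; feat, coord and eps_shift below are dummy stand-ins rather than the real pipeline values.

import torch

# Sketch of the local-ensemble offset enumeration (paraphrased from models/liif.py).
feat = torch.randn(1, 64, 48, 48)        # latent feature map (dummy)
coord = torch.rand(1, 100, 2) * 2 - 1    # query coordinates in [-1, 1] (dummy)
eps_shift = 1e-6

vx_lst, vy_lst = [-1, 1], [-1, 1]        # note: 0 never appears in either list
rx = 2 / feat.shape[-2] / 2              # half a feature cell along the height axis
ry = 2 / feat.shape[-1] / 2              # half a feature cell along the width axis

shifted_queries = []
for vx in vx_lst:
    for vy in vy_lst:
        coord_ = coord.clone()
        coord_[:, :, 0] += vx * rx + eps_shift
        coord_[:, :, 1] += vy * ry + eps_shift
        shifted_queries.append(coord_)   # four diagonally shifted copies of the query

Each shifted copy is then passed to nearest-neighbor grid sampling, so the un-shifted query is never evaluated directly; whether this reproduces Figure 2 exactly is precisely the question raised above.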

Make this repo easy to use on custom data

Hey, it would be amazing if you had a simple script that allowed running the models on a set of custom input images.
I was able to run test.py on the example datasets, but the dataloaders require a very specific folder structure with LR_bicubic subfolders, etc. It's hard to figure out how to simply apply a pretrained model to custom images.

Also, it's not super clear what the best settings are for running this on regular images (of resolution, say, 250-1000 pixels per side) to obtain HD super-resolution images: should you run the model on the full image, or is it better to run it on several crops of the image and tile the result?

Why div2k and benchmarks have different calc_psnr?

if dataset == 'benchmark':
    shave = scale
    if diff.size(1) > 1:
        gray_coeffs = [65.738, 129.057, 25.064]
        convert = diff.new_tensor(gray_coeffs).view(1, 3, 1, 1) / 256
        diff = diff.mul(convert).sum(dim=1)
elif dataset == 'div2k':
    shave = scale + 6

In the benchmark case, shave = scale, and in the div2k case, shave = scale + 6.
My questions are:

  1. Why do these two cases need shave?
  2. Why is the shave size related to scale?
  3. Why is the div2k shave equal to scale + 6?

Looking forward to your reply! Thanks in advance!
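For context, a hedged sketch of how shave is used downstream in calc_psnr (following utils.py): a border of width shave is cropped from the difference map before the MSE/PSNR computation, so pixels near the image border do not affect the metric. diff and the scale value below are stand-ins.

import torch

diff = torch.randn(1, 1, 64, 64)               # stand-in for (sr - hr) / rgb_range
shave = 4 + 6                                  # e.g. div2k at scale 4
valid = diff[..., shave:-shave, shave:-shave]  # drop a `shave`-pixel border
mse = valid.pow(2).mean()
psnr = -10 * torch.log10(mse)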

Something about Quick Start

I used the same 32x32 LR image as in your paper and then ran the quick start to get a 20x SR image, but it looks quite blurred. Here is my result:
[input image]
[output image]


Running command: python demo.py --input input.png --model rdn-liif.pth --resolution 640,640 --output output.png --gpu 2

True number of epochs (200 or 1000)?

Hi, very cool work!
In your paper you write that you train for 200 epochs, but in the config files (included in this repo) you have 1000 epochs.
Should there be a big difference between the two options? In terms of runtime it matters a lot: 25 vs. 5 hours of training time. I wonder if the final quality also changes.
Thanks!

Bugs of demo

Traceback (most recent call last):
  File "demo.py", line 26, in <module>
    model = models.make(torch.load(args.model)['model'], load_sd=True).cuda()
  File "/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "lib/python3.6/site-packages/torch/serialization.py", line 599, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: rdn-liif.pth is a zip archive (did you mean to use torch.jit.load()?)
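A hedged note on this traceback: it usually means the checkpoint was saved with PyTorch >= 1.6 (zip-based serialization) but is being loaded with an older PyTorch. Upgrading PyTorch is the simplest fix; alternatively, the checkpoint can be re-saved in the legacy format from a >= 1.6 environment, e.g.:

import torch

# Re-save the checkpoint in the legacy (non-zip) format so that older PyTorch
# versions can load it. Run this in an environment with PyTorch >= 1.6.
ckpt = torch.load('rdn-liif.pth')
torch.save(ckpt, 'rdn-liif-legacy.pth', _use_new_zipfile_serialization=False)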

Met with cuda out of memory when I am testing the model

I ran bash scripts/test-div2k.sh save/_train_edsr-baseline-liif/epoch-last.pth 0
and then met RuntimeError: CUDA out of memory. Tried to allocate 676.00 MiB (GPU 0; 10.92 GiB total capacity; 827.98 MiB already allocated; 478.50 MiB free; 830.00 MiB reserved in total by PyTorch).
How can I solve this problem?

About channel

Hi,
Your code is based on RGB images, but I want to use it on single-channel images. Can I directly change the datasets and train with your code as-is, or should I change the channel-related parts of the code to fit my needs?

Question about the code

Thanks for your excellent idea and code; they really enlightened me a lot.
There are some parts of the code I don't understand; could you please give me some guidance?


rel_coord = coord - q_coord
rel_coord[:, :, 0] *= feat.shape[-2]
rel_coord[:, :, 1] *= feat.shape[-1]

What does 'rel_coord' refer to? I think 'coord' and 'q_coord' refer to x_q and v*_t in Eq. (4) of your paper, so what does 'rel_coord' correspond to? I have the same question about 'rel_cell'.
Looking forward to your response, thank you!
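A hedged illustration (an editor's reading, not the author's answer): rel_coord appears to be the offset from the query coordinate to its nearest latent-code position, re-scaled from the global [-1, 1] range into feature-map cell units, so the decoder sees inputs on a roughly unit scale regardless of the feature-map resolution. All values below are made-up examples.

import torch

H_feat, W_feat = 48, 48                        # feature map size (example)
coord = torch.tensor([[[0.0130, -0.0200]]])    # query position in [-1, 1]
q_coord = torch.tensor([[[0.0208, -0.0208]]])  # nearest feature center (example)
rel_coord = coord - q_coord
rel_coord[:, :, 0] *= H_feat                   # offset in feature-cell units (height)
rel_coord[:, :, 1] *= W_feat                   # offset in feature-cell units (width)
print(rel_coord)                               # roughly tensor([[[-0.3744, 0.0384]]])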

KeyError: 'image-folder'

The datasets registry does not have a key named 'image-folder' when I run train_liif.py, and the code shows that datasets = {}. Could anyone help me get train_liif.py running? Thanks a lot!

What is the difference between the proposed local ensemble and bilinear interpolation?

In Fig. 2, the authors propose a local ensemble approach to predict the RGB value of the target position from its four nearest neighbors. However, the calculation process of the local ensemble is very similar to bilinear interpolation. So my question is: why not directly use bilinear interpolation in the F.grid_sample function? Looking forward to your early reply.

Regards

about the code

Hello yinbo,
I have a question about the meaning of data_norm in the training configuration file. I guess inp and gt mean input and ground truth, respectively, but what is the meaning of sub: [0.5] and div: [0.5]?
data_norm:
inp: {sub: [0.5], div: [0.5]}
gt: {sub: [0.5], div: [0.5]}
Could you help me? Thank you!
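A hedged reading of these entries, based on how train_liif.py appears to apply them: each tensor is normalized as (x - sub) / div, so sub: [0.5] and div: [0.5] map pixel values from [0, 1] to [-1, 1], and predictions are mapped back with x * div + sub. A minimal sketch:

import torch

inp = torch.rand(1, 3, 48, 48)                 # pixel values in [0, 1]
sub = torch.tensor([0.5]).view(1, -1, 1, 1)
div = torch.tensor([0.5]).view(1, -1, 1, 1)
inp_normed = (inp - sub) / div                 # now in [-1, 1]
restored = inp_normed * div + sub              # back to [0, 1]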

Questions about coordinate conversion

Hi Yinbo,
Thank you for your impressive work. I'm confused about the coordinate conversion in https://github.com/yinboc/liif/blob/main/models/liif.py#L81 when you use them for the feature grid-sampling.
The coord here denotes the normalized index of the HR image, and q_coord seems to be the coordinate of the nearest feature-map entry, derived from the real feature-map index. I guess these follow the assumption that each pixel is located at its grid center. The following line is rel_coord = coord - q_coord. What is the meaning of this rel_coord? I couldn't understand these conversions.
Later you multiply rel_coord by the feature-map size before prediction. Is the range of rel_coord then no longer [-1, 1]?
Hope for your reply and thank you for your attention again.

About code

liif/test.py, line 16 (at commit 7f0ec6b):

def batched_predict(model, inp, coord, cell, bsize):

Hi, I don't quite understand the meaning of 'bsize'; could you give me some guidance?

Looking forward to your response, thank you!
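For context, the body of batched_predict in test.py is roughly the following: it splits the flattened query coordinates into chunks along dimension 1 and decodes at most bsize query pixels per forward pass, which bounds the memory used by the MLP decoder.

import torch

def batched_predict(model, inp, coord, cell, bsize):
    # Decode the query coordinates in chunks of at most `bsize` points.
    with torch.no_grad():
        model.gen_feat(inp)
        n = coord.shape[1]
        preds = []
        ql = 0
        while ql < n:
            qr = min(ql + bsize, n)
            preds.append(model.query_rgb(coord[:, ql:qr, :], cell[:, ql:qr, :]))
            ql = qr
        pred = torch.cat(preds, dim=1)
    return pred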

Question about load model

Hi, I want to use your model to enlarge my image data as an intermediate result, and then use the enlarged images to generate optical flow. In my .py file I load the model the same way as shown in demo.py:
SRmodel = models.make(th.load(args.model)['model'], load_sd=True).cuda()
but I get the error KeyError: 'liif' (see the attached screenshot).
It has bothered me for almost a day and I cannot find out why; can you help me with it?
Thank you very much, looking forward to your reply.

Quick Start

I looked for a blurred picture and then ran the quick start, but the output picture shows no obvious change. Is there something wrong?

Fail cases

Hi!
Thanks for this wonderful piece of work. I've tried running the released EDSR-baseline pretrained LIIF model, and it seems to work well even on some fairly random out-of-domain images I tried. This was surprising, since the model was trained on only 800 images from DIV2K.

So do you have any counter-examples you found where the super-resolution fails significantly?

Error when running demo.py

I put the pretrained model in the project folder and used a command of the following form:
python demo.py --input 11.png --model [rdn-liif.pth] --resolution [HEIGHT],[WIDTH] --output 11_sr.png --gpu 0
but it keeps raising:
FileNotFoundError: [Errno 2] No such file or directory: '[rdn-liif.pth]'
So I wondered whether I should put a path in MODEL_PATH instead, but I couldn't find that either. How should I handle this problem?

bsize?

My GPU memory is 32 GB and the image is 3456 x 4608. I want to upscale it to 6912 x 9216 with LIIF, but I get a 'CUDA out of memory' error. I reduced bsize gradually down to 1, but the 'CUDA out of memory' error persists. Why?
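A hedged back-of-the-envelope estimate of why lowering bsize alone may not help: the encoder runs on the full 3456 x 4608 input before any querying, so its activations dominate memory regardless of bsize. The channel count below (64) is an assumption based on the EDSR-baseline/RDN configs.

h_in, w_in, channels, bytes_fp32 = 3456, 4608, 64, 4
feat_bytes = h_in * w_in * channels * bytes_fp32
print(feat_bytes / 2 ** 30)   # ~3.8 GiB for a single feature map, before the
                              # encoder's intermediate layers are even counted

If that estimate is in the right ballpark, tiling the input (running the encoder on crops and stitching the results) may be needed in addition to lowering bsize.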

A question about the dataset download.

Thanks for your excellent idea and code; they really enlightened me a lot.
But when I download the celebAHQ 1024x1024.zip dataset from Google Drive, the download keeps getting interrupted... and the way suggested in the celebAHQ repo downloads too slowly. Could you upload celebAHQ 1024x1024.zip to Dropbox? It would help me a lot.
Thank you very much ^_^

Unable to download DIV2K dataset

Hi,
The link provided for the home page of the DIV2K dataset works, but the download link is not working. Kindly let me know how to resolve this issue.

L1 loss

Hi, thanks for the great work!

I am just curious: is there any specific or special reason for using L1 loss instead of L2 loss during training? I feel like L2 loss would match the PSNR metric better.

Thanks!

Quick start

Hello, I am quite confused and don't know what's wrong: can the quick start really not be run with two GPUs?

Regarding the code in liif.py

Hi, I have found it difficult to understand the purpose of rx and ry in the code below. Kindly explain them. Also, why should we perform the flip operation (coord_.flip(-1)) on the coord_ variable when we have already received the actual coordinates? Thank you.
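A hedged note on the flip, under the assumption that coordinates are produced by make_coord in (row, column) order: F.grid_sample expects the last dimension of its sampling grid in (x, y) = (width, height) order, so coord_.flip(-1) swaps the two axes before sampling. A minimal sketch with dummy tensors:

import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 48, 48)          # latent feature map (dummy)
coord_ = torch.rand(1, 100, 2) * 2 - 1     # (row, col)-ordered coords in [-1, 1]
sampled = F.grid_sample(
    feat, coord_.flip(-1).unsqueeze(1),    # flip to (x, y) order for grid_sample
    mode='nearest', align_corners=False)[:, :, 0, :]   # -> shape (1, 64, 100)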

about the code

Thanks for sharing the work. I want to know why the query coordinates need to subtract the field radius of the feature maps.

Is it wrong with torch.arange()?

Hi!
I tried to test make_coord() in utils.py:

def make_coord(shape, ranges=None, flatten=True):
    """ Make coordinates at grid centers. """
    coord_seqs = []
    for i, n in enumerate(shape):
        if ranges is None:
            v0, v1 = -1, 1
        else:
            v0, v1 = ranges[i]
        r = (v1 - v0) / (2 * n)
        seq = v0 + r + (2 * r) * torch.arange(n)
        coord_seqs.append(seq)
    ret = torch.stack(torch.meshgrid(*coord_seqs), dim=-1)
    if flatten:
        ret = ret.view(-1, ret.shape[-1])
    return ret

When I test 0.5 * torch.arange(8), I get

tensor([0, 0, 0, 0, 0, 0, 0, 0])

What is wrong?
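A hedged guess at the cause: in older PyTorch versions, multiplying a Python float by an integer (Long) tensor kept the integer dtype, so 0.5 * torch.arange(8) truncated every element to zero; recent versions promote to float instead. Forcing a float dtype avoids the issue either way:

import torch

seq = 0.5 * torch.arange(8, dtype=torch.float32)
# alternatively: 0.5 * torch.arange(8).float()
print(seq)   # tensor([0.0000, 0.5000, 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000])

Note that the seq line quoted above multiplies torch.arange(n) by a Python float in the same way, so on such an old PyTorch version the coordinates would collapse as well; checking the installed PyTorch version is probably the first step.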

About the experiments in the paper

Hi, I have a small question about Table 2 in your paper, where you only tested x6 and x8 as out-of-distribution settings. I wonder whether any scale larger than x8 is possible?

psnr

During training, at what scale is the PSNR obtained from the per-epoch validation computed? Is it a fixed scale? Looking forward to your answer!

Extension to RGBA as well as RGB

Hi there, I've been using LIIF on emoji glyphs and got some great results; however, I'd like to recover the transparency, which I had to remove* by simple alpha compositing (i.e. flattening the image) before passing the PNG inputs to LIIF.

* flattening onto a grayscale background after calculating the grayscale tone not present in any semitransparent pixels, with greatest Euclidean distance from the median of the pixel mean in the image

I tried to "supervise" the estimation of transparency, but it was only a rough estimate, and the results it gives are not satisfactory (despite the high quality obtained from LIIF).

This subsection of an emoji glyph was flattened against a black background then run through LIIF.
The bottom right plot shows the recovered transparency (RGBA image) flattened against a different background colour (white).

I was wondering whether you think the code could be modified in some way for this, to supervise an estimate of the alpha channel?

It seems like it should be possible, but it's unclear to me how I might implement it; any advice would be appreciated.

Not the result I expected

While trying super-resolution on faces from CelebA-HQ, I get poor results (using both pretrained models).
See the 1Kx1K output below.
[output image]

About the loss function in the code

Thanks a lot for your excellent work, but I am confused by one thing in the code. The loss function in your work is L1 loss, but I cannot find a separate loss for the EDSR baseline. Is a single L1 loss used to train your MLP layers and the EDSR-baseline encoder together, as one whole network? Looking forward to your reply, thank you.

Device and training time

Dear Author,
I am a little bit curious about the training time and hardware. Can you give some details on these? Thanks!

crash in demo

stiv2@gaidar:~/liif$ python3 demo.py --input jap.png --model ./rdn-liif.pth --resolution 450,600 --output output.png --gpu 0
Traceback (most recent call last):
  File "demo.py", line 34, in <module>
    coord.unsqueeze(0), cell.unsqueeze(0), bsize=30000)[0]
  File "/home/stiv2/liif/test.py", line 18, in batched_predict
    model.gen_feat(inp)
  File "/home/stiv2/liif/models/liif.py", line 34, in gen_feat
    self.feat = self.encoder(inp)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/stiv2/liif/models/rdn.py", line 99, in forward
    f__1 = self.SFENet1(x)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "/home/stiv2/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 416, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[1, 4, 45, 60] to have 3 channels, but got 4 channels instead

How can I fix this?
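A hedged suggestion based on the error message: the input PNG has 4 channels (RGBA) while the encoder's first convolution expects 3 (RGB), so converting the image to RGB before running demo.py should avoid the mismatch. File names below are just examples.

from PIL import Image

# Drop the alpha channel so the encoder receives a 3-channel input.
Image.open('jap.png').convert('RGB').save('jap_rgb.png')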

How to input more channels

I want to input a video sequence (for example, three video frames) instead of one image; how can I do this?

Regarding the code at line 86-90 in liif.py

"rel_cell = cell.clone()
rel_cell[:, :, 0] *= feat.shape[-2]
rel_cell[:, :, 1] *= feat.shape[-1]"

What does this code mean? Before that, rel_cell stores the ratio of 2 to crop_hr.shape, so what does the result of this multiplication mean?
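A hedged illustration, analogous to rel_coord above (an editor's reading, not the author's answer): cell holds the height and width of one query (HR) pixel in the global [-1, 1] coordinate system, i.e. roughly 2/H_hr and 2/W_hr, and multiplying by the feature-map dimensions re-expresses that pixel size in feature-cell units, which is what the decoder receives as its cell input. All values below are made-up examples.

import torch

H_hr, W_hr = 960, 960            # target (HR) resolution (example)
H_feat, W_feat = 48, 48          # feature map resolution (example)
cell = torch.tensor([[[2.0 / H_hr, 2.0 / W_hr]]])
rel_cell = cell.clone()
rel_cell[:, :, 0] *= H_feat      # HR pixel height as a fraction of a feature cell
rel_cell[:, :, 1] *= W_feat      # HR pixel width as a fraction of a feature cell
print(rel_cell)                  # tensor([[[0.1000, 0.1000]]])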
