xindongzhang / elan
[ECCV2022] Efficient Long-Range Attention Network for Image Super-resolution
License: Apache License 2.0
Hello author, when will you be able to open-source the code?
self.weight[0*g:1*g, 0, 1, 2] = 1.0 ## left
self.weight[1*g:2*g, 0, 1, 0] = 1.0 ## right
self.weight[2*g:3*g, 0, 2, 1] = 1.0 ## up
self.weight[3*g:4*g, 0, 0, 1] = 1.0 ## down
self.weight[4*g:, 0, 1, 1] = 1.0 ## identity
I think [1, 2] is down, [1, 0] is up, [2, 1] is right, and [0, 1] is left.
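For what it's worth, PyTorch's `Conv2d` computes cross-correlation (no kernel flip), so a hot weight at kernel position `[1, 2]` samples the neighbor to the right and therefore shifts the feature map to the left, matching the comments in the snippet; reading the kernel as a flipped convolution gives the opposite directions. A standalone probe sketch (here `g` is assumed to be the number of channels per shift group):

```python
import torch
import torch.nn.functional as F

g = 2  # channels per shift group (an assumption; the real value comes from the config)
weight = torch.zeros(5 * g, 1, 3, 3)
weight[0*g:1*g, 0, 1, 2] = 1.0  # hot spot at (row 1, col 2)
weight[1*g:2*g, 0, 1, 0] = 1.0  # (1, 0)
weight[2*g:3*g, 0, 2, 1] = 1.0  # (2, 1)
weight[3*g:4*g, 0, 0, 1] = 1.0  # (0, 1)
weight[4*g:,    0, 1, 1] = 1.0  # identity

# Probe with a single bright pixel at (2, 2) to see where each group moves it.
x = torch.zeros(1, 5 * g, 5, 5)
x[:, :, 2, 2] = 1.0
y = F.conv2d(x, weight, padding=1, groups=5 * g)

# Cross-correlation: a weight at col 2 reads x[i, j+1], so the pixel
# lands at (2, 1) -> the feature map shifted LEFT, as the comments say.
print(y[0, 0].argmax().item())  # 11, i.e. flat index of (row 2, col 1)
```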
Hello,
I am running the code, and in the second epoch it reports an input/output tensor size mismatch: the input is 3×256×256 and the output is 3×248×258. I tried to use a resize function. I have used the following configuration:
batch_size: 8
data_repeat: 80
data_augment: 1
epochs: 1000
lr: 0.00025
Please guide!
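A mismatch like this often comes from spatial dimensions that are not a multiple of the attention window size. A minimal workaround sketch (an assumption about the cause, not the repo's actual fix): pad the low-resolution input up to the next multiple before the forward pass, then crop the output back.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple):
    """Reflect-pad H and W up to the next multiple of `multiple`;
    also return the original size so the output can be cropped back."""
    _, _, h, w = x.shape
    ph = (multiple - h % multiple) % multiple
    pw = (multiple - w % multiple) % multiple
    # F.pad order for 4-D input: (left, right, top, bottom)
    return F.pad(x, (0, pw, 0, ph), mode="reflect"), (h, w)

x = torch.randn(1, 3, 250, 258)
xp, (h, w) = pad_to_multiple(x, 8)   # 8 is a hypothetical window size
# run the model on xp, then crop: sr = out[..., :h * scale, :w * scale]
```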
Is there code for computing the computational complexity? Our result differs from the one in the paper, and a reference would be appreciated.
Input:
input = torch.randn(1, 3, 320, 180)
The error is as described in the title.
How should I modify it?
Network configuration:
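One common source of complexity discrepancies is the MACs-vs-FLOPs convention (some papers count FLOPs as 2× multiply-accumulates). A minimal hook-based MAC counter for the convolutional layers, using a toy network as a stand-in (instantiate the real ELAN model from the repo's config instead):

```python
import torch
import torch.nn as nn

def conv_macs(module, out):
    """Multiply-accumulates of one Conv2d call: k_h*k_w*(C_in/groups)*C_out*H*W."""
    kh, kw = module.kernel_size
    cin = module.in_channels // module.groups
    _, cout, h, w = out.shape
    return kh * kw * cin * cout * h * w

# Toy two-layer network standing in for ELAN (an assumption for illustration).
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                    nn.Conv2d(16, 3, 3, padding=1))

macs = []
for m in net.modules():
    if isinstance(m, nn.Conv2d):
        m.register_forward_hook(lambda mod, inp, out: macs.append(conv_macs(mod, out)))

with torch.no_grad():
    net(torch.randn(1, 3, 320, 180))  # the input size from the question

print(f"{sum(macs) / 1e9:.4f} GMacs")  # multiply by 2 if reporting FLOPs
```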
When training the ELAN model for SR ×3 and ×4, do we use the pre-trained ×2 network to initialize the model parameters?
Thank you for your work, and congratulations on the acceptance to ECCV 2022.
I read your paper and have a question about another model's metrics.
In Table 1 of your paper, I think the reported LatticeNet metrics (PSNR/SSIM) are wrong.
Your reported numbers differ from those in both the LatticeNet and SwinIR papers.
So I wonder whether you ran the LatticeNet experiments yourself, or whether it is just a mistake.
Thanks for your work once more.
I am reproducing the scores in your paper, and everything looks good. Your framework is the neatest one I have seen. Anyway, thank you.
Hello,
The paper is very interesting to me, since SwinIR suffers from high memory consumption and slow convergence. I have a few questions about the proposed framework.
Firstly, two consecutive GMSAs can share the attention maps, yet the shifted window partitions different neighboring pixels together, which should produce different attention patterns. How is this addressed, or is an interleaved sharing mechanism adopted?
Secondly, the results in Table 3 show that the shifted mechanism reduces FLOPs and latency. How does this method reduce the computational footprint? Is it solely due to removing the masking and relative positional encoding used in SwinIR?
Finally, could you present the convergence of ELAN compared with SwinIR and other CNN-based models? That would provide a more comprehensive comparison and better show the advantages of ELAN.
Thanks a lot.
BTW, the neat model architecture is definitely appealing.
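The attention-sharing question above can be made concrete with a toy sketch: the first layer computes a full Q/K/V attention, while the second layer reuses the same attention map and only projects new values. This is a hypothetical simplification of the shared-attention idea, not the repo's implementation.

```python
import torch

def shared_attention_pair(x, wq, wk, wv1, wv2):
    """Compute one attention map and reuse it for a second layer that
    only projects new values (a sketch of attention-map sharing)."""
    q, k = x @ wq, x @ wk
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    y1 = attn @ (x @ wv1)   # first attention layer: full Q/K/V
    y2 = attn @ (y1 @ wv2)  # second layer reuses attn, skipping Q/K projections
    return y1, y2

d = 16                                    # hypothetical head dimension
x = torch.randn(1, 64, d)                 # 64 tokens of one window
w = [torch.randn(d, d) for _ in range(4)]
y1, y2 = shared_attention_pair(x, *w)
```

Skipping the second layer's Q/K projections and softmax is where the FLOPs and latency savings come from in this sketch.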
As described in the title, how can I get the visual results on Set5? I didn't find the corresponding code in the program.
What is div2k_cache for?
Hello,
Thanks for your great work; I think an efficient and neat transformer framework is essential for low-level vision.
Following your work, I tried discarding the attention mask and positional encoding in SwinIR. The training and inference speed improved substantially, and the attention mask has only a slight effect on performance. However, the performance dropped severely after removing the RPE from the original SwinIR.
Could you please give me some hints on how to discard the RPE (and attention mask) correctly? Is it enough to directly remove the code related to positional encoding, or does removing it require incorporating some other necessary elements?
Looking forward to your reply, thanks~
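Structurally, dropping the RPE is just omitting the bias term from the attention logits, as in the sketch below (a simplified single-head window attention, not SwinIR's actual code). One plausible reason for the performance drop: weights trained with the bias depend on it, so removing RPE at inference time from a pretrained checkpoint will hurt, and the model generally needs to be retrained without it.

```python
import torch

def window_attention(q, k, v, rel_bias=None):
    """Plain window self-attention; rel_bias stands in for a relative
    position bias added to the logits when present (a sketch)."""
    logits = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if rel_bias is not None:
        logits = logits + rel_bias   # dropping RPE == passing None here
    return torch.softmax(logits, dim=-1) @ v

n, d = 64, 16  # window tokens and head dim (hypothetical sizes)
q, k, v = (torch.randn(1, n, d) for _ in range(3))
out_no_rpe = window_attention(q, k, v)                  # RPE removed
out_zero = window_attention(q, k, v, torch.zeros(n, n)) # equivalent: zero bias
```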
Hello,
MSU Graphics & Media Lab Video Group has recently launched two new Super-Resolution Benchmarks.
If you are interested in participating, you can add your algorithm following the submission steps:
We would be grateful for your feedback on our work!
Hi,
I am unable to download the dataset because it requires a BaiduNetdisk account, which does not work outside China. Can you please share a drive link?
We converted the model to ONNX and found that its accuracy was not consistent with the original model.
Excellent work!
I hope I can get the pretrained model and research it further.
Thanks~
The code package has two utils.py files.