
Comments (4)

alihassanijr commented on July 27, 2024

Hello, and thank you for your interest. Congratulations on your work and its fascinating ImageNet performance.

We strictly follow the experiment settings of NAT, ConvNeXt, and Swin to provide a comprehensive and clear comparison between these methods. By doing so, we have to leave out comparisons to many other great works, including but not limited to PVT and MaxViT, because they have very different architectures and/or experiment settings.

We therefore found it difficult to compare our models to VAN, for a few reasons (a minimal sketch contrasting the mechanisms follows this list):

1. VAN seems to utilize a different hybrid architecture, with depth-wise convolutions built into the MLP block, along with a different configuration, which makes it hard to directly compare VAN variants to those of Swin, ConvNeXt, NAT, and DiNAT.
2. Your LKA module is not, to my understanding, a form of dot-product self attention. Our focus in this paper is a dot-product self attention (DPSA) pattern, as opposed to an alternative "attention" mechanism.
3. NA and DiNA are direct sliding-window DPSA modules, and do not utilize (dilated) convolutions to extract weights.
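
To make the distinction concrete, here is a minimal, shape-level sketch in PyTorch. This is illustrative only, not code from either repository: `lka_like` mimics an LKA-style block (depth-wise conv, dilated depth-wise conv, 1×1 conv), where the gating weights come from learned convolution filters, while `na_like` is a simplified 1-D stand-in for sliding-window DPSA, where weights come from query-key similarities.

```python
import torch
import torch.nn.functional as F

def lka_like(x, dw5, dw7_dil, pw):
    # Convolution-derived "attention" (LKA-style, illustrative shapes only):
    # the gating map is produced by learned filters, not query-key products.
    c = x.shape[1]
    a = F.conv2d(x, dw5, padding=2, groups=c)                  # 5x5 depth-wise
    a = F.conv2d(a, dw7_dil, padding=9, dilation=3, groups=c)  # 7x7 depth-wise, dilation 3
    a = F.conv2d(a, pw)                                        # 1x1 point-wise
    return x * a                                               # multiplicative gate

def na_like(q, k, v, window=3):
    # Sliding-window dot-product self attention (1-D toy version):
    # each query attends to a local window of keys; the weights are
    # data-dependent softmaxed similarities, not convolution filters.
    B, L, D = q.shape
    pad = window // 2
    k_win = F.pad(k, (0, 0, pad, pad)).unfold(1, window, 1)  # (B, L, D, window)
    v_win = F.pad(v, (0, 0, pad, pad)).unfold(1, window, 1)
    logits = torch.einsum("bld,bldw->blw", q, k_win) / D ** 0.5
    attn = logits.softmax(dim=-1)
    return torch.einsum("blw,bldw->bld", attn, v_win)

# Toy usage with illustrative shapes:
x = torch.randn(1, 8, 16, 16)
y = lka_like(x,
             dw5=torch.randn(8, 1, 5, 5),
             dw7_dil=torch.randn(8, 1, 7, 7),
             pw=torch.randn(8, 8, 1, 1))
q = k = v = torch.randn(1, 32, 8)
z = na_like(q, k, v)
```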

I hope this answers your question, but please let me know if that is not the case.


MenghaoGuo commented on July 27, 2024

Thanks for the detailed reply. I understand the differences between VAN and DiNAT, and agree with your viewpoint.

In my opinion, although they have some differences, the core idea, adopting dilation to enlarge the receptive field, is similar.
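
To illustrate the shared idea (a toy example, not code from either project): dilating a fixed-size window spreads the same number of taps over a larger span, enlarging the receptive field at no extra parameter cost.

```python
def neighbor_indices(pos, window=3, dilation=1):
    # Positions covered by a 1-D window of `window` taps around `pos`,
    # with the taps spaced `dilation` apart.
    half = window // 2
    return [pos + dilation * o for o in range(-half, half + 1)]

print(neighbor_indices(10, dilation=1))  # [9, 10, 11]  -> span 3
print(neighbor_indices(10, dilation=4))  # [6, 10, 14]  -> span 9 (= 4*(3-1)+1)
```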

I think the differences between DiNAT, VAN, and MaxViT should be discussed in the related-work section.


alihassanijr commented on July 27, 2024

Indeed, the idea of dilation (and the algorithme à trous) is not new; it has been explored in many earlier works going back decades. We included MaxViT in the background section, but did not know about VAN at the time. Could you remind us where it has been published? We would be happy to include more relevant works in the future.


alihassanijr commented on July 27, 2024

Closing this due to inactivity. If you still have questions, feel free to open it back up.

