
Comments (4)

alihassanijr commented on July 27, 2024

Hello, and thank you for your interest. Congratulations on your work and its fascinating ImageNet performance.

We strictly follow the experiment settings of NAT, ConvNeXt, and Swin to provide a comprehensive and clear comparison between these methods. By doing so, we have to leave out comparisons to many other great works, including but not limited to PVT and MaxViT, because they have very different architectures and/or experiment settings.

We therefore found it difficult to compare our models to VAN, for a few reasons (a minimal sketch contrasting the mechanisms follows this list):

1. VAN seems to utilize a different hybrid architecture, with depth-wise convolutions built into the MLP block, along with a different configuration, which makes it hard to directly compare VAN variants to those of Swin, ConvNeXt, NAT, and DiNAT.
2. Your LKA module is not, to my understanding, a form of dot-product self attention. Our focus in this paper is a dot-product self attention (DPSA) pattern, as opposed to an alternative "attention" mechanism.
3. NA and DiNA are direct sliding-window DPSA modules, and do not utilize (dilated) convolutions to extract weights.
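
To make the distinction concrete, here is a minimal, shape-level sketch in PyTorch. This is illustrative only, not code from either repository: `lka_like` mimics an LKA-style block (depth-wise conv, dilated depth-wise conv, 1×1 conv), where the gating weights come from learned convolution filters, while `na_like` is a simplified 1-D stand-in for sliding-window DPSA, where weights come from query-key similarities.

```python
import torch
import torch.nn.functional as F

def lka_like(x, dw5, dw7_dil, pw):
    # Convolution-derived "attention" (LKA-style, illustrative shapes only):
    # the gating map is produced by learned filters, not query-key products.
    c = x.shape[1]
    a = F.conv2d(x, dw5, padding=2, groups=c)                  # 5x5 depth-wise
    a = F.conv2d(a, dw7_dil, padding=9, dilation=3, groups=c)  # 7x7 depth-wise, dilation 3
    a = F.conv2d(a, pw)                                        # 1x1 point-wise
    return x * a                                               # multiplicative gate

def na_like(q, k, v, window=3):
    # Sliding-window dot-product self attention (1-D toy version):
    # each query attends to a local window of keys; the weights are
    # data-dependent softmaxed similarities, not convolution filters.
    B, L, D = q.shape
    pad = window // 2
    k_win = F.pad(k, (0, 0, pad, pad)).unfold(1, window, 1)  # (B, L, D, window)
    v_win = F.pad(v, (0, 0, pad, pad)).unfold(1, window, 1)
    logits = torch.einsum("bld,bldw->blw", q, k_win) / D ** 0.5
    attn = logits.softmax(dim=-1)
    return torch.einsum("blw,bldw->bld", attn, v_win)

# Toy usage with illustrative shapes:
x = torch.randn(1, 8, 16, 16)
y = lka_like(x,
             dw5=torch.randn(8, 1, 5, 5),
             dw7_dil=torch.randn(8, 1, 7, 7),
             pw=torch.randn(8, 8, 1, 1))
q = k = v = torch.randn(1, 32, 8)
z = na_like(q, k, v)
```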

I hope this answers your question, but please let me know if that is not the case.


MenghaoGuo commented on July 27, 2024

Thanks for the detailed reply. I understand the differences between VAN and DiNAT, and agree with your viewpoint.

In my opinion, although they have some differences, the core idea, adopting dilation to enlarge the receptive field, is similar.
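
To illustrate the shared idea (a toy example, not code from either project): dilating a fixed-size window spreads the same number of taps over a larger span, enlarging the receptive field at no extra parameter cost.

```python
def neighbor_indices(pos, window=3, dilation=1):
    # Positions covered by a 1-D window of `window` taps around `pos`,
    # with the taps spaced `dilation` apart.
    half = window // 2
    return [pos + dilation * o for o in range(-half, half + 1)]

print(neighbor_indices(10, dilation=1))  # [9, 10, 11]  -> span 3
print(neighbor_indices(10, dilation=4))  # [6, 10, 14]  -> span 9 (= 4*(3-1)+1)
```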

I think the differences between DiNAT, VAN, and MaxViT should be discussed in the related-work section.


alihassanijr commented on July 27, 2024

Indeed, the idea of dilation (and the algorithme à trous) is not new; it has been explored in many earlier works going back decades. We included MaxViT in the background section, but did not know about VAN at the time. Could you remind us where it has been published? We would be happy to include more relevant works in the future.


alihassanijr commented on July 27, 2024

Closing this due to inactivity. If you still have questions, feel free to open it back up.

