Comments (4)
Hello, thank you for your interest, and congratulations on your work and its impressive ImageNet performance.
We strictly follow the experimental setup of NAT, ConvNeXt, and Swin to provide a comprehensive and clear comparison among these methods. By doing so, we have to leave out comparisons to many other great works, including but not limited to PVT and MaxViT, because they have very different architectures and/or experimental settings.
We therefore found it difficult to compare our models to VAN, for a few reasons:
1. VAN appears to use a different hybrid architecture, with depth-wise convolutions built into the MLP block, along with a different configuration that makes it hard to directly compare VAN variants to those of Swin, ConvNeXt, NAT, and DiNAT.
2. Your LKA module is not, to my understanding, a form of dot-product self attention. Our focus in this paper is on a dot-product self attention (DPSA) pattern, as opposed to an alternative "attention" mechanism.
3. NA and DiNA are direct sliding-window DPSA modules; they do not use (dilated) convolutions to extract weights.
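Since the distinction can be subtle, here is a minimal toy sketch of the idea in plain NumPy (single head, no linear projections, simplified boundary handling; the function name is illustrative and is not part of NATTEN). The point it shows: in sliding-window DPSA, the mixing weights are softmaxed dot products between each query and its dilated neighbors, computed per position from the data, rather than fixed learned convolution kernels.

```python
import numpy as np

def neighborhood_attention_1d(x, kernel_size=3, dilation=1):
    """Toy 1D dilated neighborhood attention (sketch only, not NATTEN).

    Each query attends to `kernel_size` keys spaced `dilation` apart,
    centered on itself, with softmax-normalized dot-product weights.
    Note: real NAT shifts the window at sequence borders; here we just
    clamp indices for simplicity.
    """
    n, d = x.shape
    half = kernel_size // 2
    out = np.zeros_like(x)
    for i in range(n):
        # dilated neighborhood indices, clamped to stay in-bounds
        idx = [min(max(i + (j - half) * dilation, 0), n - 1)
               for j in range(kernel_size)]
        keys = x[idx]                      # (kernel_size, d)
        scores = keys @ x[i] / np.sqrt(d)  # data-dependent attention scores
        w = np.exp(scores - scores.max())
        w /= w.sum()                       # softmax weights
        out[i] = w @ keys                  # weighted neighborhood average
    return out
```

A dilated convolution over the same neighborhood would instead apply one fixed weight vector per offset, shared across all positions.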
I hope this answers your question, but please let me know if that is not the case.
from neighborhood-attention-transformer.
Thanks for the detailed reply. I understand the differences between VAN and DiNAT and agree with your viewpoint.
In my opinion, although they differ in some respects, the core idea of adopting dilation to enlarge the receptive field is similar.
I think the differences between DiNAT, VAN, and MaxViT should be discussed in the related work section.
Indeed, the idea of dilation (and the algorithme à trous) is not new; it has been explored in many earlier works going back decades. We included MaxViT in the background section, but did not know about VAN at the time. Could you remind us where it has been published? We would be happy to include more relevant works in the future.
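For intuition on why dilation is attractive, here is a small back-of-the-envelope helper (illustrative only; stride-1 layers assumed) showing how the receptive field of a stack of kernels grows when dilation increases, at no extra parameter cost:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 (dilated) layers.

    Each layer with kernel size k and dilation d extends the
    receptive field by (k - 1) * d.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# three kernel-3 layers: dense vs. exponentially dilated
receptive_field([3, 3, 3], [1, 1, 1])  # 7
receptive_field([3, 3, 3], [1, 2, 4])  # 15
```

The same accounting applies whether the sliding window is a convolution kernel or a neighborhood-attention window.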
Closing this due to inactivity. If you still have questions, feel free to reopen it.