Comments (10)
Hi @diaodeyi,
In the present code, the get_cls_model function is called through registry.py. You can use build_model in build.py to build the model. Alternatively, you can remove the registry and call get_cls_model directly. Both ways should work.
Good luck.
from cvt.
This is after conv_proj_q, conv_proj_k and conv_proj_v, but I'm not sure why the authors still use pointwise projections after the convolutional projections.
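The following minimal PyTorch sketch shows the query path under discussion. It assumes a depthwise `nn.Conv2d` followed by BatchNorm as the convolutional projection and an `nn.Linear` as the pointwise projection. The attribute names `conv_proj_q` and `proj_q` follow this thread. The class name `ConvProjQ`, the kernel size, the padding and the use of BatchNorm are assumptions, not a copy of the repository.

```python
import torch
import torch.nn as nn


class ConvProjQ(nn.Module):
    """Sketch: depthwise conv over the token map, then a pointwise nn.Linear."""

    def __init__(self, dim_in, dim_out, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise conv: groups=dim_in gives one k x k filter per channel,
        # so it mixes spatial neighbours but never mixes channels.
        self.conv_proj_q = nn.Sequential(
            nn.Conv2d(dim_in, dim_in, kernel_size, stride=stride,
                      padding=kernel_size // 2, groups=dim_in, bias=False),
            nn.BatchNorm2d(dim_in),
        )
        # Pointwise projection: mixes channels at each token position.
        self.proj_q = nn.Linear(dim_in, dim_out)

    def forward(self, x, h, w):
        b, _, c = x.shape                              # x: (B, H*W, C) tokens
        x = x.transpose(1, 2).reshape(b, c, h, w)      # tokens -> (B, C, H, W)
        x = self.conv_proj_q(x)                        # (B, C, H', W')
        x = x.flatten(2).transpose(1, 2)               # back to (B, H'*W', C)
        return self.proj_q(x)                          # (B, H'*W', dim_out)


tokens = torch.randn(2, 8 * 8, 16)                     # 2 images, 8x8 tokens, 16 channels
q = ConvProjQ(16, 32)(tokens, 8, 8)                    # stride 1 keeps 64 tokens
kv = ConvProjQ(16, 32, stride=2)(tokens, 8, 8)         # stride 2 leaves 16 tokens
print(q.shape, kv.shape)
```

With `stride=2` the same module shrinks the 8×8 token map to 4×4. This shows how a strided convolutional projection can reduce the number of key/value tokens before attention.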
@askerlee, I think it is part of a depthwise separable convolution: depthwise convolutions followed by pointwise projections.
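A small parameter count shows why this factorization is attractive. With biases disabled, a standard k×k convolution has k²·C_in·C_out weights. A depthwise convolution plus a pointwise convolution has k²·C_in + C_in·C_out. In this sketch the pointwise step is a 1×1 `nn.Conv2d`, and the channel sizes are arbitrary example values.

```python
import torch
import torch.nn as nn


def n_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


c_in, c_out, k = 64, 128, 3

# Full convolution: every output channel looks at every input channel.
standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)

# Depthwise separable: per-channel spatial filter, then 1x1 channel mixing.
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in, bias=False),  # depthwise
    nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),             # pointwise
)

x = torch.randn(1, c_in, 14, 14)
print(n_params(standard), n_params(separable))   # 73728 8768
print(standard(x).shape == separable(x).shape)   # True
```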
I'd like to know how to call the get_cls_model function in cls_cvt.py.
By the way, does self.proj = nn.Linear(dim_out, dim_out) mean the FFN is just a projection with the same input and output dimension?
@diaodeyi It's the single linear layer (with the same in/out dimension) right after the attention calculation. The FFN in this code is class MLP (line 53).
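This sketch places the two pieces side by side: a single same-size `nn.Linear` like `self.proj`, and a typical transformer FFN. The `Mlp` class, its layer names, the GELU activation and the 4× hidden expansion are common conventions assumed for illustration. Check class MLP in the repository for the actual definition. Residual connections and normalization layers are omitted.

```python
import torch
import torch.nn as nn


class Mlp(nn.Module):
    """Typical transformer FFN: expand, apply a nonlinearity, project back."""

    def __init__(self, dim, mlp_ratio=4.0):
        super().__init__()
        hidden = int(dim * mlp_ratio)
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


dim_out = 64
proj = nn.Linear(dim_out, dim_out)   # output projection right after attention
ffn = Mlp(dim_out)                   # separate feed-forward sub-block

attn_out = torch.randn(2, 10, dim_out)   # (batch, tokens, channels)
y = ffn(proj(attn_out))                  # residuals and norms omitted here
print(proj.weight.shape, ffn.fc1.weight.shape, y.shape)
```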
Thanks. There are so many linear projections that aren't mentioned in the paper.
@diaodeyi Yes. I think they left them out on the assumption that the reader already has a good understanding of the basic transformer architecture.
Quoting: "@askerlee, I think it is part of depthwise separable convolutions. Depthwise convolutions followed by pointwise projections."
No, I think proj_q/k/v are exactly the things the paper does not mention.
Hi, a depthwise separable convolution has two parts: a depthwise conv and a pointwise conv. The authors implemented the pointwise conv with a linear layer, perhaps because it is convenient for the ablation study. The only difference between the two is the bias term.
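The equivalence between a pointwise (1×1) convolution and a linear layer applied to tokens can be checked numerically. This sketch copies an `nn.Linear` weight into a 1×1 `nn.Conv2d` and compares the outputs. The sizes are arbitrary. Both layers keep their default bias here, and either one can drop it with `bias=False`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
c_in, c_out, h, w = 8, 16, 5, 5

linear = nn.Linear(c_in, c_out)                   # acts on (B, N, C) tokens
conv1x1 = nn.Conv2d(c_in, c_out, kernel_size=1)   # acts on (B, C, H, W) maps

# Reuse the linear weights: (c_out, c_in) -> (c_out, c_in, 1, 1).
with torch.no_grad():
    conv1x1.weight.copy_(linear.weight.view(c_out, c_in, 1, 1))
    conv1x1.bias.copy_(linear.bias)

feat = torch.randn(2, c_in, h, w)                 # feature map
tokens = feat.flatten(2).transpose(1, 2)          # (B, H*W, C), row-major order

out_linear = linear(tokens)                            # (B, H*W, c_out)
out_conv = conv1x1(feat).flatten(2).transpose(1, 2)    # (B, H*W, c_out)

print(torch.allclose(out_linear, out_conv, atol=1e-5))
```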
Related Issues (20)
- About Cls_cvt.py
- what should I change if I want to use a data set with images of 750* 184
- 22k model
- About the pretrained model
- Question about the class token
- 精度 (Accuracy)
- As for Cifar10 or Cifar100
- How to calculate the flops of the model?
- recommended torch version may be wrong
- modelzoo not available!!!!!
- Recommend change the code
- What's the accuracy of CvT-13 without pre-trained on CIFAR10
- model release
- How can I use CvT architecture to perform Semantic Segmentation?
- NAN loss
- bugs when eval
- The config of W24 of finetune on 1k
- pretrained models are not available.
- the flops computed by this code don't match that in the paper