leaplabthu / dat

Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

Home Page: https://arxiv.org/abs/2309.01430

License: Apache License 2.0

Python 99.23% Shell 0.77%
deep-learning deformable-attention image-classification pytorch vision-transformer

dat's People

Contributors: leaplabthu, panxuran, vladimir2506

dat's Issues

Changing model input size from 384 -> 1024

I'd like to know which layers to change if I wanted to input an image of size 1024 instead of 384. I'd also like to know whether there are any additional concerns about using this model with a much larger input size. Thanks.
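For a sense of what has to change, here is a quick divisibility check (my own arithmetic, assuming the usual 4x patch embedding and 2x per-stage downsampling of hierarchical ViTs such as DAT): window sizes and any resolution-dependent position-bias tables must match the new feature map sizes.

# Feature map sizes per stage for a 1024x1024 input (downsampling 4/8/16/32):
img_size = 1024
for stage, down in enumerate([4, 8, 16, 32]):
    fmap = img_size // down
    print(f"stage {stage + 1}: {fmap}x{fmap}, "
          f"divisible by window 7: {fmap % 7 == 0}, by 8: {fmap % 8 == 0}")
# 256/128/64/32 are not multiples of 7, so the default 7x7 windows (and the
# relative position tables built from them) would need to change, e.g. to 8.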

How to run the model

if name == "main":
x = torch.ones((2, 3, 224, 224))
model = DAT()
y = model(x)
print(y.shape)

I tried to run the model with the above code to learn its details, but the following error occurred.

File "Model\DAT\DAT.py", line 232, in
model = DAT()
File "Model\DAT\DAT.py", line 134, in init
use_dwc_mlps[i])
File "Model\DAT\DAT.py", line 59, in init
no_off, fixed_pe, stage_idx)
File "Model\DAT\DAT_Block.py", line 201, in init
nn.Conv2d(self.n_group_channels, self.n_group_channels, kk, stride, kk // 2, groups=self.n_group_channels),
File "env\lib\site-packages\torch\nn\modules\conv.py", line 446, in init
False, _pair(0), groups, bias, padding_mode, **factory_kwargs)
File "env\lib\site-packages\torch\nn\modules\conv.py", line 132, in init
(out_channels, in_channels // groups, *kernel_size), **factory_kwargs))
RuntimeError: Trying to create tensor with negative dimension -96: [-96, 1, 9, 9]

Process finished with exit code 1

Deformable Attention Journal Paper not referenced

Dear Authors
Way back in May 2021, I had already published a journal article on Deformable Attention:
https://pubmed.ncbi.nlm.nih.gov/34022421/

It was posted even earlier, in August 2020, on medRxiv:
https://www.medrxiv.org/content/10.1101/2020.08.25.20181834v1

I would have expected you to at least cite my paper in your journal version.

I am surprised that you did not do a thorough search of prior art and report all prior work in this space of Deformable Attention (and that the CVPR reviewers also did not thoroughly check whether a good prior-art search was done).

Can you please cite my paper in any further publications on Deformable Attention?

Best Regards
Kumar

some questions about the reference points and offset network

Really nice work! I have some questions about the code. I looked at your implementation of conv_offset and found that you use a stride of 1, so the reference points actually cover the whole map. But the paper says there is a stride of r. If there is no stride larger than 1, the complexity is the same as standard MHSA, or even larger! I think there may be something wrong here.
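For concreteness, here is a minimal sketch (my own illustration, not the repository's code) of how the stride of the first convolution in an offset network determines the number of sampled points:

import torch
import torch.nn as nn

# With stride r, the offset map (and hence the set of sampled keys) has
# (H/r) * (W/r) positions; with stride 1 it has all H*W positions.
C, H, W, k = 32, 56, 56, 5
for r in (1, 2, 4):
    conv_offset = nn.Conv2d(C, 2, k, stride=r, padding=k // 2)
    offsets = conv_offset(torch.randn(1, C, H, W))
    n_sample = offsets.shape[2] * offsets.shape[3]
    print(f"stride {r}: offset map {offsets.shape[2]}x{offsets.shape[3]}, n_sample = {n_sample}")
# stride 1 -> 3136 (= 56*56), stride 2 -> 784, stride 4 -> 196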

Face negative dimension issue when running on CIFAR10

Hi, I am Lukas Wang, a master's student from Columbia. I am planning to review cutting-edge ViT-based models on medium-size datasets and found your work really interesting! I was trying to run the code on the CIFAR-10 dataset for testing, but the following error came out:
RuntimeError: Trying to create tensor with negative dimension -96: [-96, 1, 9, 9]

I have noticed that the groups argument is set to groups=[-1, -1, 3, 6] by default in the DAT model, while the construction of DAttentionBaseline in dat_blocks.py computes a negative value for the first two stages. Could you please check out this issue? I'd really appreciate your help :)!
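A hypothetical workaround, based only on reading the error (not an official fix): the -1 entries appear to be placeholders for stages whose block spec never builds a deformable-attention module, since n_group_channels = 96 // -1 = -96 is exactly the negative dimension reported. If the first two stages are configured to use deformable attention, those placeholders have to be replaced with positive values:

from models.dat import DAT  # module path as referenced in this issue tracker

# The groups/strides argument names are taken from the defaults quoted in
# these issues; the concrete values below are illustrative, not recommended.
model = DAT(groups=[1, 2, 3, 6], strides=[8, 4, 1, 1])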

Why set the reference point coordinates like this


    def _get_ref_points(self, H_key, W_key, B, dtype, device):

        ref_y, ref_x = torch.meshgrid(
            torch.linspace(0.5, H_key - 0.5, H_key, dtype=dtype, device=device),
            torch.linspace(0.5, W_key - 0.5, W_key, dtype=dtype, device=device)
        )
        ref = torch.stack((ref_y, ref_x), -1)
        ref[..., 1].div_(W_key).mul_(2).sub_(1)
        ref[..., 0].div_(H_key).mul_(2).sub_(1)
        ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1)  # B * g H W 2

        return ref

I don't understand this line: ref[..., 1].div_(W_key).mul_(2).sub_(1).
Specifically, why use .mul_(2).sub_(1)?
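For what it's worth, dividing by the size maps the pixel-center coordinates 0.5, 1.5, ..., W-0.5 into (0, 1), and .mul_(2).sub_(1) then maps them into (-1, 1), which is the coordinate convention torch.nn.functional.grid_sample expects (a fact about PyTorch, not a claim about the authors' intent). A quick check:

import torch

W_key = 4
x = torch.linspace(0.5, W_key - 0.5, W_key)  # tensor([0.5, 1.5, 2.5, 3.5])
x = x / W_key                                # (0, 1): [0.125, 0.375, 0.625, 0.875]
x = x * 2 - 1                                # (-1, 1): [-0.75, -0.25, 0.25, 0.75]
print(x)  # grid_sample treats -1 and +1 as the image borders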

The computational cost of deformable attention

Hi, thanks for your excellent work.
I notice that the number of sampled keys/values is the same as the number of queries. Therefore, the computational cost of deformable attention is the same as global attention, is that right? So I'm curious why you don't use global self-attention in the last two stages.
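As a back-of-the-envelope comparison (my own arithmetic, not from the paper): attention between H*W queries and n_sample keys costs on the order of H*W * n_sample * C multiply-accumulates, so deformable attention is only cheaper than global attention when the offset network downsamples, i.e. n_sample = HW / r**2:

# Rough attention cost: queries x keys x channels (projections ignored).
H, W, C = 14, 14, 384          # e.g. a stage-3 feature map
hw = H * W
global_cost = hw * hw * C      # global self-attention
for r in (1, 2, 4):
    n_sample = hw // (r * r)   # deformable attention with offset stride r
    print(f"r={r}: {hw * n_sample * C:,} MACs vs global {global_cost:,}")
# r=1 matches global attention exactly, which is the point raised above.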

Fixed input image size

Hello, very impressive work. I noticed that if the position embeddings use the relative method, the test image size has to be fixed, while using the dwconv method may consume a very large amount of GPU memory. I wonder whether the authors have a way to handle dynamic input image sizes.
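One common workaround in Swin-style models (not necessarily what this repository implements) is to bicubically resize the relative position bias table when the test-time window or feature-map size changes. A minimal sketch, where the ((2S-1)*(2S-1), num_heads) table layout is an assumption:

import torch
import torch.nn.functional as F

def resize_rel_pos_bias(table, new_side):
    # table: ((2S-1)*(2S-1), num_heads) relative position bias, layout assumed.
    L, heads = table.shape
    side = int(L ** 0.5)  # 2S-1 entries along each axis
    t = table.permute(1, 0).reshape(1, heads, side, side)
    t = F.interpolate(t, size=(new_side, new_side), mode="bicubic", align_corners=False)
    return t.reshape(heads, new_side * new_side).permute(1, 0)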

Controlling the number of keys per query

In the appendix comparing DAT with D-DETR, you mentioned changing the number of keys in Stage 3 and Stage 4. I was wondering where in the code one can change that. Thank you.

About Low Accuracy

Hi, thanks for your excellent work. But when I use your DAT model and pretrained weights for image segmentation tasks, the results are not ideal. What I do is: take the features of each stage, then recover the image size through simple deconvolution upsampling and skip-connection operations. The code is as follows:
[screenshot of the decoder code, not reproduced]

If possible, please tell me where the error is. I hope you can publish the segmentation model as soon as possible, and I look forward to your reply. Thank you.
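For reference, here is a minimal sketch of the kind of decoder described above (deconvolution upsampling plus skip connections over four pyramid features). The channel widths 96/192/384/768 follow the tiny-model progression visible in the tracebacks in these issues (96 channels in stage 1); everything else is an assumption, not the authors' segmentation head:

import torch
import torch.nn as nn

class SimpleDeconvDecoder(nn.Module):
    def __init__(self, dims=(96, 192, 384, 768), num_classes=19):
        super().__init__()
        # One 2x deconv per pyramid level, fused with the skip feature below it.
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(dims[i + 1], dims[i], kernel_size=2, stride=2)
            for i in range(3)
        )
        self.head = nn.Conv2d(dims[0], num_classes, 1)

    def forward(self, feats):
        # feats: [stage1, stage2, stage3, stage4] at strides 4/8/16/32.
        x = feats[-1]
        for i in reversed(range(3)):
            x = self.up[i](x) + feats[i]  # upsample 2x, then skip connection
        return self.head(x)               # logits at 1/4 input resolution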

Error: Trying to create tensor with negative dimension -96: [-96, 1, 9, 9]

@Vladimir2506 @Panxuran @LeapLabTHU

I tried to use your basic DAT module, and I am getting the error below:

Trying to create tensor with negative dimension -96: [-96, 1, 9, 9].

It is because of

nn.Conv2d(self.n_group_channels, self.n_group_channels, kk, stride, kk//2, groups=self.n_group_channels),
which is because of
https://github.com/LeapLabTHU/DAT/issues/new?permalink=https%3A%2F%2Fgithub.com%2FLeapLabTHU%2FDAT%2Fblob%2F1029c76003b346ddcc80de6293ae9c7e2b6c3565%2Fmodels%2Fdat.py%23L98

Kindly help

Question about the number of sampling points

Hello, the code I am debugging is DAT, not DAT++.
In the forward function of the DAttentionBaseline class there is this section (dat_blocks.py, line 222):

q = self.proj_q(x)
q_off = einops.rearrange(q, 'b (g c) h w -> (b g) c h w', g=self.n_groups, c=self.n_group_channels)
offset = self.conv_offset(q_off) # B * g 2 Hg Wg
Hk, Wk = offset.size(2), offset.size(3)
n_sample = Hk * Wk

I have a question about this code. self.conv_offset is initialized like this:

self.conv_offset = nn.Sequential(
            nn.Conv2d(self.n_group_channels, self.n_group_channels, kk, stride, kk//2, groups=self.n_group_channels),
            LayerNormProxy(self.n_group_channels),
            nn.GELU(),
            nn.Conv2d(self.n_group_channels, 2, 1, 1, 0, bias=False)
        )

This means that the height and width of the offset map are affected only by stride and have nothing to do with offset_range_factor. Does this conflict with what is described in the paper?
Also, the stride here is 1 in the config file, which means the number of sampling points is HW rather than HW/r**2.

Am I overlooking something? I am really looking forward to your reply, thank you!

Training questions

Without using the command line, how can I modify the run configuration to debug the code? And can the model be transferred to other, smaller datasets for training?
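One generic way to debug a CLI-driven training script from an IDE (the flag names below are hypothetical; check the repository's main.py and its argparse setup for the real option names) is to inject the arguments before the entry point parses them:

import sys

# Hypothetical flags for illustration only.
sys.argv = [
    "main.py",
    "--cfg", "configs/dat_tiny.yaml",
    "--data-path", "/path/to/smaller/dataset",
    "--batch-size", "32",
]

# Then run the training entry point under the debugger; if it guards with
# `if __name__ == "__main__":`, call its main() function directly.

For smaller datasets, a standard fine-tuning recipe is to swap the classification head to the new number of classes and load the released weights with load_state_dict(..., strict=False).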

displacement problem

Hello sir,
why is the displacement calculated as below?
displacement = (q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups, n_sample, 2).unsqueeze(1)).mul(0.5)
and why not
displacement = (pos.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups, n_sample, 2).unsqueeze(1))
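One observation that may explain the .mul(0.5) (my reading, not the authors'): q_grid and pos are both normalized to [-1, 1], so their difference spans [-2, 2], and halving maps the displacement back into [-1, 1]. Note also that the proposed alternative subtracts pos from pos, which would drop the query positions entirely:

import torch

# Both coordinate sets live in [-1, 1]; their difference spans [-2, 2].
q = torch.tensor([-1.0, 0.0, 1.0])
p = torch.tensor([1.0, -1.0, -1.0])
d = q - p              # tensor([-2., 1., 2.])
print(d.mul(0.5))      # tensor([-1.0000, 0.5000, 1.0000]), back in [-1, 1]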

What does -1 mean in the strides and groups variables

Hi, thanks for your nice work. I'm very confused by the negative values (-1) in strides and groups, for example strides=[-1,-1,1,1] or groups=[-1,-1,3,6]. This also triggers errors when creating weight tensors, because negative dimensions are not allowed:

RuntimeError: Trying to create tensor with negative dimension -96: [-96, 1, 9, 9]

It would be appreciated if you could explain what -1 means here.

Thanks in advance!

About the range of offset

Thank you for your work!
I have a question: is the range of the offset related to the kernel size of conv_offset? For example, can a 3x3 conv_offset only account for an offset of one point around each pixel in that feature map layer?
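A small demonstration (my own illustration, not the repository's code) of why the kernel size bounds what the offset predictor can see, but not how large an offset it can output; bounding the offset magnitude takes an explicit squashing step, such as the offset_range_factor scaling mentioned elsewhere in these issues:

import torch
import torch.nn as nn

# A 3x3 conv limits the receptive field per output position, not the output value.
conv = nn.Conv2d(16, 2, kernel_size=3, padding=1)
nn.init.constant_(conv.bias, 50.0)        # force large outputs for the demo
off = conv(torch.zeros(1, 16, 8, 8))
print(off.abs().max())                    # 50.0, far beyond one pixel

# An explicit cap, e.g. tanh times a range factor, is what bounds the offsets:
offset_range_factor = 2.0
print(off.tanh().mul(offset_range_factor).abs().max())  # <= 2.0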

Dimension error during training

The following error occurred when using DAT. The parameters were set according to the config and are consistent with it; I tried every configuration variant and all of them raise this error. Please help take a look.
File "/root/BasicSR-master/basicsr/archs/discriminator_arch.py", line 202, in forward
x_total = einops.rearrange(x, 'b c (r1 h1) (r2 w1) -> b (r1 r2) (h1 w1) c', h1=self.window_size[0], w1=self.window_size[1]) # B x Nr x Ws x C
File "/root/miniconda3/lib/python3.8/site-packages/einops/einops.py", line 487, in rearrange
return reduce(tensor, pattern, reduction='rearrange', **axes_lengths)
File "/root/miniconda3/lib/python3.8/site-packages/einops/einops.py", line 418, in reduce
raise EinopsError(message + '\n {}'.format(e))
einops.EinopsError: Error while processing rearrange-reduction pattern "b c (r1 h1) (r2 w1) -> b (r1 r2) (h1 w1) c".
Input tensor shape: torch.Size([128, 96, 32, 32]). Additional info: {'h1': 7, 'w1': 7}.
Shape mismatch, can't divide axis of length 32 in chunks of 7
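The mismatch in this traceback is plain divisibility arithmetic (my reading of the error, not an official diagnosis): the rearrange tiles a 32x32 map into 7x7 windows, which requires both sides to be multiples of the window size:

# rearrange 'b c (r1 h1) (r2 w1) -> ...' needs H % h1 == 0 and W % w1 == 0.
H = W = 32
for ws in (7, 8):
    print(f"window {ws} divides {H}? {H % ws == 0}")
# window 7 divides 32? False -> the EinopsError above
# window 8 divides 32? True  -> either use window_size=8, or pick an input
# whose feature maps are multiples of 7 (e.g. 224 -> 56/28/14/7)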

Unused parameters in class TransformerStage: "ns_per_pt" and "sr_ratio"

Hello, I found two unused parameters, "ns_per_pt" and "sr_ratio". I was wondering what they are used for?
Thank you very much!

class TransformerStage(nn.Module):
def __init__(self, fmap_size, window_size, ns_per_pt,
dim_in, dim_embed, depths, stage_spec, n_groups,
use_pe, sr_ratio,
heads, stride, offset_range_factor, stage_idx,
dwc_pe, no_off, fixed_pe,
attn_drop, proj_drop, expansion, drop, drop_path_rate, use_dwc_mlp):

Misaligned classification results on ImageNet

Thanks for the contribution.
I trained the 224 x 224 ImageNet classification model, but there is an accuracy gap between my results and yours. I hope the pretrained model and related settings could be released.
Thanks.
