shi-labs / neighborhood-attention-transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.

License: MIT License


neighborhood-attention-transformer's Introduction

Neighborhood Attention Transformers


Powerful hierarchical vision transformers based on sliding window attention.

Neighborhood Attention (NA, local attention) was introduced in our original paper, NAT, and runs efficiently with our extension to PyTorch, NATTEN.

We recently introduced a new model, DiNAT, which extends NA by dilating neighborhoods (DiNA, sparse global attention, a.k.a. dilated local attention).

Combinations of NA/DiNA are capable of preserving locality, maintaining translational equivariance, expanding the receptive field exponentially, and capturing longer-range interdependencies, leading to significant performance boosts in downstream vision tasks, such as image generation with StyleNAT.
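As a back-of-envelope illustration of the receptive-field claim (our own arithmetic, not code from this repo): if each layer applies a window of size k with dilation d, it adds (k - 1) * d to the receptive field, so doubling the dilation per layer grows the receptive field geometrically.

k = 7          # window size
rf = 1         # receptive field of a single token
for dilation in (1, 2, 4, 8):   # dilation doubling per layer (illustrative)
    rf += (k - 1) * dilation
    print(dilation, rf)         # rf: 7, 19, 43, 91 -- roughly doubling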

News

March 25, 2023

  • Neighborhood Attention Transformer was accepted to CVPR 2023!

November 18, 2022

  • NAT and DiNAT are now available through HuggingFace's transformers.
    • NAT and DiNAT classification models are also available on the HuggingFace's Model Hub: NAT | DiNAT

November 11, 2022

October 8, 2022

  • NATTEN is now available as a pip package!
    • You can now install NATTEN with pre-compiled wheels, and start using it in seconds (see the usage sketch after this list).
    • NATTEN will be maintained and developed as a separate project to support broader usage of sliding window attention, even beyond computer vision.
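For illustration, after pip install natten, basic usage might look like the sketch below. The module and argument names follow recent NATTEN releases (NeighborhoodAttention2D with channels-last input) and may differ across versions, so treat this as a sketch rather than the definitive API:

import torch
from natten import NeighborhoodAttention2D

# 7x7 neighborhood attention with dilation 2; NATTEN's 2D modules take
# channels-last inputs of shape (batch, height, width, dim).
na2d = NeighborhoodAttention2D(dim=128, kernel_size=7, dilation=2, num_heads=4)
x = torch.randn(1, 32, 32, 128)
y = na2d(x)   # same shape as x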

September 29, 2022

Dilated Neighborhood Attention 🔥


A new hierarchical vision transformer based on Neighborhood Attention (local attention) and Dilated Neighborhood Attention (sparse global attention) that enjoys significant performance boosts in downstream tasks.

Check out the DiNAT README.

Neighborhood Attention Transformer


Our original paper introduced Neighborhood Attention Transformer (NAT), built on the first efficient sliding-window local attention mechanism.

How Neighborhood Attention works

Neighborhood Attention localizes the query token's (red) receptive field to its nearest neighboring tokens in the key-value pair (green). This is equivalent to dot-product self attention when the neighborhood size equals the image dimensions. Note that queries near the edges are special cases: their windows are clamped to stay within the feature map, so every query still attends to a full neighborhood.
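Below is a minimal single-head sketch of this behavior in plain PyTorch, including the clamped edge windows. It is written for readability rather than speed, and omits batching, multiple heads, and the relative positional bias, so it is an illustration of the mechanism, not the repo's CUDA implementation (assumes H, W >= kernel_size):

import torch

def neighborhood_attention_2d(q, k, v, kernel_size=3):
    # q, k, v: (H, W, C) single-head feature maps.
    H, W, C = q.shape
    r = kernel_size // 2
    out = torch.empty_like(q)
    for i in range(H):
        # clamp the window start so edge queries still see a full window
        i0 = min(max(i - r, 0), H - kernel_size)
        for j in range(W):
            j0 = min(max(j - r, 0), W - kernel_size)
            keys = k[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, C)
            vals = v[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, C)
            attn = (keys @ q[i, j]) * C ** -0.5      # (kernel_size**2,) logits
            out[i, j] = attn.softmax(dim=-1) @ vals  # (C,) weighted sum
    return out

x = torch.randn(5, 5, 8)
print(neighborhood_attention_2d(x, x, x).shape)  # torch.Size([5, 5, 8])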


Citation

@inproceedings{hassani2023neighborhood,
	title        = {Neighborhood Attention Transformer},
	author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
	booktitle    = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	month        = {June},
	year         = {2023},
	pages        = {6185-6194}
}
@article{hassani2022dilated,
	title        = {Dilated Neighborhood Attention Transformer},
	author       = {Ali Hassani and Humphrey Shi},
	year         = {2022},
	url          = {https://arxiv.org/abs/2209.15001},
	eprint       = {2209.15001},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}
@article{walton2022stylenat,
	title        = {StyleNAT: Giving Each Head a New Perspective},
	author       = {Steven Walton and Ali Hassani and Xingqian Xu and Zhangyang Wang and Humphrey Shi},
	year         = {2022},
	url          = {https://arxiv.org/abs/2211.05770},
	eprint       = {2211.05770},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}

neighborhood-attention-transformer's People

Contributors

adityakane2001, alexmehta, alihassanijr, honghuis, ozoooooh, stevenwalton


neighborhood-attention-transformer's Issues

The testing results of the whole dataset is empty

I integrated NA into mmdetection and it runs, but it keeps reporting errors: "The testing results of the whole dataset is empty." Following mmdetection's suggested solution, I modified the learning rate, but there are still no validation-set results.

how to debug cuda kernel?

Hello author, I want to modify nattenqkrpb_cuda_forward_kernel to achieve some desired functionality, but I don't know much about CUDA programming or how to debug a CUDA kernel. I am using Visual Studio and LibTorch on Windows 10. Although I can debug some parts of the .cu file, I can't step into the CUDA kernel itself. What tools and methods do you use to debug CUDA code? Please give me some suggestions!

Is it necessary to write a dedicated fp16 kernel?

Thanks for your great work. You provide a very good template to start from for building attention extensions.

I notice that you use dedicated fp16 kernels, in which instructions like __hfma2 are used, e.g.:

__global__ void natten1dqkrpb_cuda_forward_kernel_fp16(

So, if I directly reuse the kernel currently used for fp32 and dispatch it with AT_DISPATCH_FLOATING_TYPES_AND_HALF, as described in https://discuss.pytorch.org/t/how-can-i-write-the-cuda-code-to-support-fp16-calculation/107181, will it be significantly slower?

Question about the dimension of head

Hi, I notice that the head dimension is currently fixed at 32 because of a constraint in the CUDA kernel. I wonder what would happen if I changed the head dimension to 64, since some codebases set it to 64.

When will the code be released?

Hi, I think your work shows surprising performance, but parallelism is a concern of mine. When will you release your code?

Legacy Torch implementation for Dilated Neighborhood Attention (DiNAT)

Hello,

Congrats for your work!

Would it be possible to release a legacy torch-based implementation for Dilated Neighborhood Attention (DiNAT)? This would make experimentation much easier, without having to set up the CUDA-based part of the natten library.

I think your method has a great potential for long document processing in NLP applications!

Thanks!

problem about the dim of Q of transformers QKV

location: Neighborhood-Attention-Transformer/classification/cuda/nattenav_cuda_kernel.cu line 75
description: the last dimension of Q beyond KERNEL_SIZE^2 is not used in the Q*K product.

Maybe we could directly multiply the whole Q with its neighborhood of K; the complexity would only increase with the dimension of Q.

about the kernel size

Hi, I find that there is no explicit parameter specifying the kernel size in natten.py. How does the CUDA code get it? From the shape of rpb? Thanks!
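If the CUDA code does infer it from the shape of rpb, as guessed above, note that the bias table spans 2 * kernel_size - 1 relative positions per axis (consistent with rpb_size = 2 * kernel_size - 1 in the legacy code quoted elsewhere on this page), so the relationship would be the following consistency-check sketch, not repo code:

import torch

num_heads, kernel_size = 4, 7
# relative positional bias table: one entry per relative offset per head
rpb = torch.zeros(num_heads, 2 * kernel_size - 1, 2 * kernel_size - 1)
recovered_kernel_size = (rpb.shape[-1] + 1) // 2
assert recovered_kernel_size == kernel_size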

A question about the rpb in LegacyNeighborhoodAttention2D

My question

Why is the same relative position index used for several positions in the middle?

Information

def apply_pb(self, attn, height, width):
    """
    RPB implementation by @qwopqwop200
    https://github.com/qwopqwop200/Neighborhood-Attention-Transformer
    """
    num_repeat_h = torch.ones(self.kernel_size, dtype=torch.long)
    num_repeat_w = torch.ones(self.kernel_size, dtype=torch.long)
    num_repeat_h[self.kernel_size // 2] = height - (self.kernel_size - 1)
    num_repeat_w[self.kernel_size // 2] = width - (self.kernel_size - 1)
    bias_hw = (self.idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2 * self.kernel_size - 1)) + self.idx_w.repeat_interleave(num_repeat_w)
    bias_idx = bias_hw.unsqueeze(-1) + self.idx_k
    # Our RPB indexing in the kernel is in a different order, so we flip these
    # indices to ensure the weights match.
    bias_idx = torch.flip(bias_idx.reshape(-1, self.kernel_size ** 2), [0])
    return attn + self.rpb.flatten(1, 2)[:, bias_idx].reshape(self.num_heads, height * width, 1, self.kernel_size ** 2).transpose(0, 1)

A simple visualization:

rpb

The related code is copied from LegacyNeighborhoodAttention2D:

# %%
import matplotlib.pyplot as plt
import numpy as np
import torch

kernel_size = 3
height = width = 5
rpb_size = 2 * kernel_size - 1

# %%
fig, axes = plt.subplots(nrows=height, ncols=width, figsize=(8, 8))
shared_bg = np.zeros((height, width), dtype=np.uint8)

# %%
idx_h = torch.arange(0, kernel_size)
idx_w = torch.arange(0, kernel_size)
idx_k = ((idx_h.unsqueeze(-1) * rpb_size) + idx_w).reshape(-1)
print(idx_k.reshape(kernel_size, kernel_size))

# %%
num_repeat_h = torch.ones(kernel_size, dtype=torch.long)
num_repeat_w = torch.ones(kernel_size, dtype=torch.long)
num_repeat_h[kernel_size // 2] = height - (kernel_size - 1)
num_repeat_w[kernel_size // 2] = width - (kernel_size - 1)
bias_hw = (
    idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2 * kernel_size - 1)
) + idx_w.repeat_interleave(num_repeat_w)
bias_idx = (bias_hw.unsqueeze(-1) + idx_k).reshape(-1, kernel_size ** 2)
print(bias_idx)
'''
tensor([[ 0,  1,  2,  5,  6,  7, 10, 11, 12],
        [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
        [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
        [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
        [ 2,  3,  4,  7,  8,  9, 12, 13, 14],
        [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 7,  8,  9, 12, 13, 14, 17, 18, 19],
        [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 7,  8,  9, 12, 13, 14, 17, 18, 19],
        [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 7,  8,  9, 12, 13, 14, 17, 18, 19],
        [10, 11, 12, 15, 16, 17, 20, 21, 22],
        [11, 12, 13, 16, 17, 18, 21, 22, 23],
        [11, 12, 13, 16, 17, 18, 21, 22, 23],
        [11, 12, 13, 16, 17, 18, 21, 22, 23],
        [12, 13, 14, 17, 18, 19, 22, 23, 24]])
'''

# %%
for h in range(height):
    for w in range(width):
        new_bg = shared_bg.flatten().copy()
        # index by h * width + w (here height == width == 5, so the original
        # h * height + w happened to give the same result)
        new_bg[bias_idx[h * width + w]] = 255
        new_bg = new_bg.reshape(height, width)
        axes[h, w].imshow(new_bg)

# %%
plt.show()

nattenav_cuda.so: cannot open shared object file: No such file or directory

Can you help me?

module = importlib.util.module_from_spec(spec)

File "", line 583, in module_from_spec
File "", line 1043, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /home/zhulifu/.cache/torch_extensions/py37_cu113/nattenav_cuda/nattenav_cuda.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/media/DATA/zhulifu/bin/Neighborhood-Attention-Transformer-main/detection/cuda/natten.py", line 15, in
import nattenav_cuda
ModuleNotFoundError: No module named 'nattenav_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "cuda/gradcheck.py", line 1, in
from natten import NATTENAVFunction, NATTENQKRPBFunction
File "/media/DATA/zhulifu/bin/Neighborhood-Attention-Transformer-main/detection/cuda/natten.py", line 19, in
raise RuntimeError("Could not load NATTEN CUDA extension. " +
RuntimeError: Could not load NATTEN CUDA extension. Please make sure your device has CUDA, the CUDA toolkit for PyTorch is installed, and that you've compiled NATTEN correctly.

pytorch 1.12.0 CUDA 11.6 Win10 VS2019 build error

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

2 errors detected in the compilation of "C:/pytorch/NAT/natten/src/nattenav_cuda_kernel.cu".
nattenav_cuda_kernel.cu
ninja: build stopped: subcommand failed.

add web demo/model to Huggingface

Hi, would you be interested in adding Neighborhood-Attention-Transformer to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models/datasets/spaces (web demos) can be added to a user account or organization, similar to GitHub.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

PE added on query and key

Hi. I see that the current version only supports PE as a bias added to the attention map. I wonder if a future version will support adding PE to the query and key, which is another common form of PE. Thanks again for your work and prompt replies!
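For context, "PE on query and key" usually means injecting positional information into q and k before the dot product, instead of adding a bias to the attention map. A generic sketch of that style, not from this repo (names and shapes illustrative):

import torch

L, C = 196, 64
pos = torch.randn(L, C)    # positional embedding (learned in practice)
q, k, v = (torch.randn(L, C) for _ in range(3))
attn = ((q + pos) @ (k + pos).T) * C ** -0.5   # PE folded into q and k
out = attn.softmax(dim=-1) @ v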

Relation to visual attention network (VAN).

Dear authors:

Congratulations on your excellent results on DiNAT.

However, I think the idea of this paper is similar to VAN (code).

Both adopt dilation to enlarge the receptive field, giving the network both locality and global context. Besides, both apply dilation in a visual backbone and achieve great performance on downstream tasks such as semantic segmentation.

Why not compare with it?

Best,
Menghao

CUDA extension error

Thank you for your good work; however, there is an error when I build the CUDA extension.
torch = 1.11.0
python = 3.7
cuda = 10.1

Traceback (most recent call last):
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1746, in _run_ninja_build
    env=env)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/nattencuda.py", line 20, in <module>
    'nattenav_cuda', [f'{this_dir}/src/nattenav_cuda.cpp', f'{this_dir}/src/nattenav_cuda_kernel.cu'], verbose=False)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1156, in load
    keep_intermediates=keep_intermediates)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1367, in _jit_compile
    is_standalone=is_standalone)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1472, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nattenav_cuda': [1/3] /home/yckj3822/anaconda3/envs/unsup3d/bin/x86_64-conda_cos6-linux-gnu-c++ -MMD -MF nattenav_cuda.o.d -DTORCH_EXTENSION_NAME=nattenav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/TH -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/src/nattenav_cuda.cpp -o nattenav_cuda.o
[2/3] /usr/local/cuda-10.1/bin/nvcc  -ccbin /home/yckj3822/anaconda3/envs/unsup3d/bin/x86_64-conda_cos6-linux-gnu-cc -DTORCH_EXTENSION_NAME=nattenav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/TH -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/src/nattenav_cuda_kernel.cu -o nattenav_cuda_kernel.cuda.o
FAILED: nattenav_cuda_kernel.cuda.o
/usr/local/cuda-10.1/bin/nvcc  -ccbin /home/yckj3822/anaconda3/envs/unsup3d/bin/x86_64-conda_cos6-linux-gnu-cc -DTORCH_EXTENSION_NAME=nattenav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/TH -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/src/nattenav_cuda_kernel.cu -o nattenav_cuda_kernel.cuda.o
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc: In instantiation of 'static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]':
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:578:28:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5033:20:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5054:24:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:656:134:   required from 'std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:6676:95:   required from here
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:1067:16: error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]' without object
       __p->_M_set_sharable();
       ~~~~~~~~~^~
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc: In instantiation of 'static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]':
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:578:28:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5033:20:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5054:24:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:656:134:   required from 'std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:6681:95:   required from here
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:1067:16: error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]' without object
ninja: build stopped: subcommand failed.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/nattencuda.py", line 27, in <module>
    import nattenav_cuda
ModuleNotFoundError: No module named 'nattenav_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "natten/gradcheck.py", line 11, in <module>
    from nattencuda import NATTENAVFunction, NATTENQKRPBFunction
  File "/home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/nattencuda.py", line 30, in <module>
    raise RuntimeError("Could not load NATTEN CUDA extension. " +
RuntimeError: Could not load NATTEN CUDA extension. Please make sure your device has CUDA, the CUDA toolkit for PyTorch is installed, and that you've compiled NATTEN correctly.

Saliency map

Can you share the method or code you used to draw the saliency map? Thanks a lot!

how to build two NAT cuda

Hi, I want to build two NAT CUDA extensions with different head dimensions, but I find that the second build always overwrites the first. How can I modify setup.py to distinguish them? Is there other code that needs to change? Thanks!
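One plausible approach (a sketch, not the repo's actual setup.py; paths and names below are illustrative): register each build as a separate CUDAExtension with a unique name, since the build defines TORCH_EXTENSION_NAME from that name and the last build otherwise overwrites the first.

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='natten_variants',
    ext_modules=[
        # hypothetical source trees, one per head dimension
        CUDAExtension('nattenav_cuda_dim32',
                      ['src_dim32/nattenav_cuda.cpp', 'src_dim32/nattenav_cuda_kernel.cu']),
        CUDAExtension('nattenav_cuda_dim64',
                      ['src_dim64/nattenav_cuda.cpp', 'src_dim64/nattenav_cuda_kernel.cu']),
    ],
    cmdclass={'build_ext': BuildExtension},
)

Assuming the sources use PYBIND11_MODULE(TORCH_EXTENSION_NAME, m), as is standard for torch extensions (the build logs on this page show -DTORCH_EXTENSION_NAME being passed), renaming the extension in setup.py should suffice, and the two modules can then be imported under their distinct names.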

About license

Thanks for the great work. Please add an MIT license to the repo. Many thanks.

throughput of nat_tiny vs resnet50

hi~ I find that nat_tiny and resnet50, two models of the same size, have very different throughput on an NVIDIA GeForce 2080 Ti. How does the comparison look on your machine?

(screenshots of the timing results for nat_tiny and resnet50; please ignore the accuracy numbers in the images, as the input is not the ImageNet test set)

Comparison with zero-padding version.

Excellent work!
BTW, the paper claims that the proposed edge/corner neighborhood selection performs better than the zero-padding version; I wonder about the performance of the latter, which is not reported in the paper?

Running NAT requires removing visibility for old (sm_52 or earlier) GPUs

Thank you for the great code. Initially, I had this error:

/home/gauenk/Documents/packages/nat/natten/src/natten1dav_cuda_kernel.cu(58): error: identifier "__hfma2" is undefined
          detected during instantiation of "void natten1dav_cuda_forward_kernel_fp16<KS,NS,scalar_t>(at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, int, int, int, int, int) [with KS=5, NS=2, scalar_t=c10::Half]

After debugging, I discovered the problem. I have a newer GPU (sm_75) and an older GPU (sm_52). If an old GPU is visible, the ninja compiler will add its "sm" to the list. Notice the "sm_52" in the text below. This architecture predates half-precision hardware support.

[1/3] c++ -MMD -MF natten1dav_cuda.o.d -DTORCH_EXTENSION_NAME=natten1dav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/cuda/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/gauenk/Documents/packages/nat/natten/src/natten1dav_cuda.cpp -o natten1dav_cuda.o

The fix is easy: just allow only your "good" GPUs (sm_75 or newer) to be visible.

export CUDA_VISIBLE_DEVICES=0

I am leaving this comment in case this helps someone else.

abbreviation for rpb

Thanks for your awesome work.

Can you provide some clues about what rpb and apply_pb are? They do not appear in standard attention.

Thanks

how to get model object?

pip install git+https://github.com/rwightman/pytorch-image-models.git@9d6aad44f8fd32e89e5cca503efe3ada5071cc2a

got this:

ERROR: Command errored out with exit status 128: git clone --filter=blob:none -q https://github.com/rwightman/pytorch-image-models.git /tmp/pip-req-build-wldj6iv4 Check the logs for full command output.

and if I run :
pip install git+https://github.com/rwightman/pytorch-image-models.git

then run the training, I get this:
RuntimeError: Unknown model (nat_tiny)

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

 ** On entry to SGEMM  parameter number 10 had an illegal value
Traceback (most recent call last):
  File "check_flops.py", line 34, in <module>
    model(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 620, in forward
    x = self.forward_features(x)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 616, in forward_features
    return self.features(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 434, in forward
    new_features = layer(features)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 392, in forward
    bottleneck_output = self.bottleneck_fn(prev_features)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 349, in bottleneck_fn
    bottleneck_output = self.block(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 128, in forward
    x = self.attn(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/natten/nattencuda.py", line 121, in forward
    qkv = self.qkv(x).reshape(B, H, W, 3, self.num_heads, self.head_dim).permute(3, 0, 4, 1, 2, 5)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

My input shape is torch.Size([1, 256, 14, 14]). Why am I getting this error?
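For what it's worth, the qkv reshape in the traceback suggests the attention module expects channels-last input of shape (B, H, W, C), while torch.Size([1, 256, 14, 14]) looks channels-first. If that is the cause (a guess, not a confirmed diagnosis), a permute before the block would be the fix:

import torch

x = torch.randn(1, 256, 14, 14)          # channels-first (B, C, H, W)
x = x.permute(0, 2, 3, 1).contiguous()   # channels-last (B, H, W, C)
print(x.shape)                           # torch.Size([1, 14, 14, 256])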

Any plans to release the Neighborhood Attention code using only Pytorch?

Hi. Thank you for your good work.
I implemented Neighborhood Attention using only PyTorch. As far as I know, Neighborhood Attention was implemented in pure PyTorch at the beginning of this work. Personally, I would like to compare my implementation with the original code.
Do you have any plans to release a Neighborhood Attention implementation using only PyTorch?

Questions about the algorithm speed.

Hi, thanks for your good work.
I notice that the paper does not compare algorithm speed. I would like to see a speed comparison of NAT vs. Swin Transformer and CNN models.
Thanks!

Details of Training

Hi @alihassanijr, thanks for the great repository. For reproducing your results, how many nodes were used to train these models? I see that config files are provided for each model, but I wonder if any changes are needed when training on multiple nodes.

CUDA out of memory

Your work is very good, and we have improved our transformer model based on your ideas, but why does CUDA run out of memory at the same batch_size? In theory, the computation should be reduced and the batch_size should be able to be set larger.

Hello, I have already installed CUDA as required; why do I get a CUDA extension error?

Traceback (most recent call last):
File "E:\executable_code\Neighborhood-Attention-Transformer-main\detection\cuda\natten.py", line 10, in
'nattenav_cuda', ['cuda/nattenav_cuda.cpp', 'cuda/nattenav_cuda_kernel.cu'], verbose=False)
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils\cpp_extension.py", line 1156, in load
keep_intermediates=keep_intermediates)
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils\cpp_extension.py", line 1334, in _jit_compile
is_standalone=is_standalone,
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils_cpp_extension_versioner.py", line 45, in bump_version_if_changed
hash_value = hash_source_files(hash_value, source_files)
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils_cpp_extension_versioner.py", line 15, in hash_source_files
with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'cuda/nattenav_cuda.cpp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\executable_code\Neighborhood-Attention-Transformer-main\detection\cuda\natten.py", line 15, in
import nattenav_cuda
ModuleNotFoundError: No module named 'nattenav_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/executable_code/Neighborhood-Attention-Transformer-main/detection/cuda/gradcheck.py", line 1, in
from natten import NATTENAVFunction, NATTENQKRPBFunction
File "E:\executable_code\Neighborhood-Attention-Transformer-main\detection\cuda\natten.py", line 18, in
raise RuntimeError("Could not load NATTEN CUDA extension. " +
RuntimeError: Could not load NATTEN CUDA extension. Please make sure your device has CUDA, the CUDA toolkit for PyTorch is installed, and that you've compiled NATTEN correctly.
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3'

About the detail of NATTENQKRPBFunction and NATTENAVFunction

Hi, first, congratulations on your great work! I am new to Transformers. I have successfully installed the CUDA version of NA, but when reading your core NA code I could not understand the details of these two functions: NATTENQKRPBFunction and NATTENAVFunction. Could you explain? Thank you so much.
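At the risk of oversimplifying the CUDA kernels, the two functions correspond to the two matrix products of neighborhood attention. A conceptual per-token sketch in plain PyTorch (shapes illustrative, not the real interface):

import torch

heads, neighborhood, d = 4, 9, 32           # 9 = 3x3 neighborhood
q = torch.randn(heads, d)                   # one query token, per head
k_n = torch.randn(heads, neighborhood, d)   # its neighboring keys
v_n = torch.randn(heads, neighborhood, d)   # its neighboring values
rpb = torch.randn(heads, neighborhood)      # relative positional biases

# "QKRPB": attention logits = q . (neighboring k) + relative positional bias
attn = torch.einsum('hd,hnd->hn', q, k_n) + rpb
# "AV": weighted sum of the neighboring values
out = torch.einsum('hn,hnd->hd', attn.softmax(dim=-1), v_n)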

Valid Padded Behavior

First off, great paper! I've been looking for transformers with some of the same locality inductive biases as CNNs. I was wondering if you would be able to add support for a valid padding based alternative though. So rather than handling the edges with the altered behavior, just allow reduced output size like in valid padded convolution layers. This is very important in the domain we work in where we need to be equivariant to the specific crop of the image and the dimensions of the input at inference time.

Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.

Hi, thanks for your excellent work. I found this problem when applying it to other tasks.
System environment: 3090 Ti, torch 1.10, CUDA 11.3
EPOCH 1
learning rate = [0.0001]
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [21,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [22,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [23,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [24,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [25,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.

How can I implement a function like joint image filtering?

Hi, thanks for your great work and open-source code!

In this code, the input feature first computes weights via q and k and then aggregates itself within a local window; q, k, and v are different mappings of the same feature. I want to implement something like joint image filtering: I have a reference feature and a target feature, and I want to obtain weights from the reference feature and then use those weights to aggregate the target feature. Namely, q and k would be mappings of the reference feature, while v would be a mapping of the target feature.

How can I achieve this functionality using the existing code? Thanks!
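For what it's worth, in terms of the naive single-head sketch under "How Neighborhood Attention works" above, the idea would amount to wiring the inputs differently (a conceptual sketch, not an answer based on the repo's API):

import torch
# reusing neighborhood_attention_2d from the sketch earlier on this page
ref = torch.randn(5, 5, 8)   # reference feature: provides q and k
tgt = torch.randn(5, 5, 8)   # target feature: provides v
out = neighborhood_attention_2d(q=ref, k=ref, v=tgt, kernel_size=3)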
