shi-labs / neighborhood-attention-transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.

License: MIT License


neighborhood-attention-transformer's Introduction

Neighborhood Attention Transformers


Powerful hierarchical vision transformers based on sliding window attention.

Neighborhood Attention (NA, local attention) was introduced in our original paper, NAT, and runs efficiently with our extension to PyTorch, NATTEN.

We recently introduced a new model, DiNAT, which extends NA by dilating neighborhoods (DiNA, sparse global attention, a.k.a. dilated local attention).

Combinations of NA/DiNA are capable of preserving locality, maintaining translational equivariance, expanding the receptive field exponentially, and capturing longer-range interdependencies, leading to significant performance boosts in downstream vision tasks, such as image generation with StyleNAT.
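As a back-of-envelope illustration of the receptive-field claim (our own arithmetic, not code from this repo): if each layer applies a window of size k with dilation d, it adds (k - 1) * d to the receptive field, so doubling the dilation per layer grows the receptive field geometrically.

k = 7          # window size
rf = 1         # receptive field of a single token
for dilation in (1, 2, 4, 8):   # dilation doubling per layer (illustrative)
    rf += (k - 1) * dilation
    print(dilation, rf)         # rf: 7, 19, 43, 91 -- roughly doubling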

News

March 25, 2023

  • Neighborhood Attention Transformer was accepted to CVPR 2023!

November 18, 2022

  • NAT and DiNAT are now available through HuggingFace's transformers.
    • NAT and DiNAT classification models are also available on the HuggingFace's Model Hub: NAT | DiNAT

November 11, 2022

October 8, 2022

  • NATTEN is now available as a pip package!
    • You can now install NATTEN with pre-compiled wheels, and start using it in seconds (see the usage sketch after this list).
    • NATTEN will be maintained and developed as a separate project to support broader usage of sliding window attention, even beyond computer vision.
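For illustration, after pip install natten, basic usage might look like the sketch below. The module and argument names follow recent NATTEN releases (NeighborhoodAttention2D with channels-last input) and may differ across versions, so treat this as a sketch rather than the definitive API:

import torch
from natten import NeighborhoodAttention2D

# 7x7 neighborhood attention with dilation 2; NATTEN's 2D modules take
# channels-last inputs of shape (batch, height, width, dim).
na2d = NeighborhoodAttention2D(dim=128, kernel_size=7, dilation=2, num_heads=4)
x = torch.randn(1, 32, 32, 128)
y = na2d(x)   # same shape as x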

September 29, 2022

Dilated Neighborhood Attention 🔥


A new hierarchical vision transformer based on Neighborhood Attention (local attention) and Dilated Neighborhood Attention (sparse global attention) that enjoys significant performance boosts in downstream tasks.

Check out the DiNAT README.

Neighborhood Attention Transformer


Our original paper introduced Neighborhood Attention Transformer (NAT), built on the first efficient sliding-window local attention mechanism.

How Neighborhood Attention works

Neighborhood Attention localizes the query token's (red) receptive field to its nearest neighboring tokens in the key-value pair (green). This is equivalent to dot-product self attention when the neighborhood size equals the image dimensions. Note that queries near the edges are special cases: their windows are clamped to stay within the feature map, so every query still attends to a full neighborhood.
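Below is a minimal single-head sketch of this behavior in plain PyTorch, including the clamped edge windows. It is written for readability rather than speed, and omits batching, multiple heads, and the relative positional bias, so it is an illustration of the mechanism, not the repo's CUDA implementation (assumes H, W >= kernel_size):

import torch

def neighborhood_attention_2d(q, k, v, kernel_size=3):
    # q, k, v: (H, W, C) single-head feature maps.
    H, W, C = q.shape
    r = kernel_size // 2
    out = torch.empty_like(q)
    for i in range(H):
        # clamp the window start so edge queries still see a full window
        i0 = min(max(i - r, 0), H - kernel_size)
        for j in range(W):
            j0 = min(max(j - r, 0), W - kernel_size)
            keys = k[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, C)
            vals = v[i0:i0 + kernel_size, j0:j0 + kernel_size].reshape(-1, C)
            attn = (keys @ q[i, j]) * C ** -0.5      # (kernel_size**2,) logits
            out[i, j] = attn.softmax(dim=-1) @ vals  # (C,) weighted sum
    return out

x = torch.randn(5, 5, 8)
print(neighborhood_attention_2d(x, x, x).shape)  # torch.Size([5, 5, 8])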


Citation

@inproceedings{hassani2023neighborhood,
	title        = {Neighborhood Attention Transformer},
	author       = {Ali Hassani and Steven Walton and Jiachen Li and Shen Li and Humphrey Shi},
	booktitle    = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	month        = {June},
	year         = {2023},
	pages        = {6185-6194}
}
@article{hassani2022dilated,
	title        = {Dilated Neighborhood Attention Transformer},
	author       = {Ali Hassani and Humphrey Shi},
	year         = {2022},
	url          = {https://arxiv.org/abs/2209.15001},
	eprint       = {2209.15001},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}
@article{walton2022stylenat,
	title        = {StyleNAT: Giving Each Head a New Perspective},
	author       = {Steven Walton and Ali Hassani and Xingqian Xu and Zhangyang Wang and Humphrey Shi},
	year         = {2022},
	url          = {https://arxiv.org/abs/2211.05770},
	eprint       = {2211.05770},
	archiveprefix = {arXiv},
	primaryclass = {cs.CV}
}

neighborhood-attention-transformer's People

Contributors

adityakane2001, alexmehta, alihassanijr, honghuis, ozoooooh, stevenwalton


neighborhood-attention-transformer's Issues

The testing results of the whole dataset is empty

I integrated NA into mmdetection and it runs, but it keeps reporting errors: "The testing results of the whole dataset is empty." Following mmdetection's suggested solution, I modified the learning rate, but there are still no validation-set results.

how to debug cuda kernel?

Hello author, I want to modify nattenqkrpb_cuda_forward_kernel to achieve some desired functionality, but I don't know much about CUDA programming or how to debug a CUDA kernel. I am using Visual Studio and LibTorch on Windows 10. Although I can debug some parts of the .cu file, I can't step into the CUDA kernel itself. What tools and methods do you use to debug CUDA code? Please give me some suggestions!

Is it necessary to write a dedicated fp16 kernel?

Thanks for your great work. You provide a very good template to start from for building attention extensions.

I notice that you use dedicated fp16 kernels, in which instructions like __hfma2 are used, e.g.:

__global__ void natten1dqkrpb_cuda_forward_kernel_fp16(

So, if I directly reuse the kernel currently used for fp32 and dispatch it with AT_DISPATCH_FLOATING_TYPES_AND_HALF, as described in https://discuss.pytorch.org/t/how-can-i-write-the-cuda-code-to-support-fp16-calculation/107181, will it be significantly slower?

Question about the dimension of head

Hi, I notice that the head dimension is currently fixed at 32 because of a constraint in the CUDA kernel. I wonder what would happen if I changed the head dimension to 64, since some codebases set it to 64.

When will the code be released?

Hi, I think your work shows surprising performance, but parallelism is a concern of mine. When will you release your code?

Legacy Torch implementation for Dilated Neighborhood Attention (DiNAT)

Hello,

Congrats for your work!

Would it be possible to release a legacy torch-based implementation for Dilated Neighborhood Attention (DiNAT)? This would make experimentation much easier, without having to set up the CUDA-based part of the natten library.

I think your method has a great potential for long document processing in NLP applications!

Thanks!

problem about the dim of Q of transformers QKV

location: Neighborhood-Attention-Transformer/classification/cuda/nattenav_cuda_kernel.cu line 75
description: the last dimension of Q beyond KERNEL_SIZE^2 is not used in the Q*K product.

Maybe we could directly multiply the whole Q with its neighborhood of K; the complexity would only increase with the dimension of Q.

about the kernel size

Hi, I find that there is no explicit parameter specifying the kernel size in natten.py. How does the CUDA code get it? From the shape of rpb? Thanks!
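If the CUDA code does infer it from the shape of rpb, as guessed above, note that the bias table spans 2 * kernel_size - 1 relative positions per axis (consistent with rpb_size = 2 * kernel_size - 1 in the legacy code quoted elsewhere on this page), so the relationship would be the following consistency-check sketch, not repo code:

import torch

num_heads, kernel_size = 4, 7
# relative positional bias table: one entry per relative offset per head
rpb = torch.zeros(num_heads, 2 * kernel_size - 1, 2 * kernel_size - 1)
recovered_kernel_size = (rpb.shape[-1] + 1) // 2
assert recovered_kernel_size == kernel_size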

A question about the rpb in LegacyNeighborhoodAttention2D

My question

Why is the same relative position index used for several positions in the middle?

Information

def apply_pb(self, attn, height, width):
    """
    RPB implementation by @qwopqwop200
    https://github.com/qwopqwop200/Neighborhood-Attention-Transformer
    """
    num_repeat_h = torch.ones(self.kernel_size, dtype=torch.long)
    num_repeat_w = torch.ones(self.kernel_size, dtype=torch.long)
    num_repeat_h[self.kernel_size // 2] = height - (self.kernel_size - 1)
    num_repeat_w[self.kernel_size // 2] = width - (self.kernel_size - 1)
    bias_hw = (self.idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2 * self.kernel_size - 1)) + self.idx_w.repeat_interleave(num_repeat_w)
    bias_idx = bias_hw.unsqueeze(-1) + self.idx_k
    # Our RPB indexing in the kernel is in a different order, so we flip these
    # indices to ensure the weights match.
    bias_idx = torch.flip(bias_idx.reshape(-1, self.kernel_size ** 2), [0])
    return attn + self.rpb.flatten(1, 2)[:, bias_idx].reshape(self.num_heads, height * width, 1, self.kernel_size ** 2).transpose(0, 1)

A simple visualization:

rpb

The related code is copied from LegacyNeighborhoodAttention2D:

# %%
import matplotlib.pyplot as plt
import numpy as np
import torch

kernel_size = 3
height = width = 5
rpb_size = 2 * kernel_size - 1

# %%
fig, axes = plt.subplots(nrows=height, ncols=width, figsize=(8, 8))
shared_bg = np.zeros((height, width), dtype=np.uint8)

# %%
idx_h = torch.arange(0, kernel_size)
idx_w = torch.arange(0, kernel_size)
idx_k = ((idx_h.unsqueeze(-1) * rpb_size) + idx_w).reshape(-1)
print(idx_k.reshape(kernel_size, kernel_size))

# %%
num_repeat_h = torch.ones(kernel_size, dtype=torch.long)
num_repeat_w = torch.ones(kernel_size, dtype=torch.long)
num_repeat_h[kernel_size // 2] = height - (kernel_size - 1)
num_repeat_w[kernel_size // 2] = width - (kernel_size - 1)
bias_hw = (
    idx_h.repeat_interleave(num_repeat_h).unsqueeze(-1) * (2 * kernel_size - 1)
) + idx_w.repeat_interleave(num_repeat_w)
bias_idx = (bias_hw.unsqueeze(-1) + idx_k).reshape(-1, kernel_size ** 2)
print(bias_idx)
'''
tensor([[ 0,  1,  2,  5,  6,  7, 10, 11, 12],
        [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
        [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
        [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
        [ 2,  3,  4,  7,  8,  9, 12, 13, 14],
        [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 7,  8,  9, 12, 13, 14, 17, 18, 19],
        [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 7,  8,  9, 12, 13, 14, 17, 18, 19],
        [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
        [ 7,  8,  9, 12, 13, 14, 17, 18, 19],
        [10, 11, 12, 15, 16, 17, 20, 21, 22],
        [11, 12, 13, 16, 17, 18, 21, 22, 23],
        [11, 12, 13, 16, 17, 18, 21, 22, 23],
        [11, 12, 13, 16, 17, 18, 21, 22, 23],
        [12, 13, 14, 17, 18, 19, 22, 23, 24]])
'''

# %%
for h in range(height):
    for w in range(width):
        new_bg = shared_bg.flatten().copy()
        # index by h * width + w (here height == width == 5, so the original
        # h * height + w happened to give the same result)
        new_bg[bias_idx[h * width + w]] = 255
        new_bg = new_bg.reshape(height, width)
        axes[h, w].imshow(new_bg)

# %%
plt.show()

nattenav_cuda.so: cannot open shared object file: No such file or directory

Can you help me?

module = importlib.util.module_from_spec(spec)

File "", line 583, in module_from_spec
File "", line 1043, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /home/zhulifu/.cache/torch_extensions/py37_cu113/nattenav_cuda/nattenav_cuda.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/media/DATA/zhulifu/bin/Neighborhood-Attention-Transformer-main/detection/cuda/natten.py", line 15, in
import nattenav_cuda
ModuleNotFoundError: No module named 'nattenav_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "cuda/gradcheck.py", line 1, in
from natten import NATTENAVFunction, NATTENQKRPBFunction
File "/media/DATA/zhulifu/bin/Neighborhood-Attention-Transformer-main/detection/cuda/natten.py", line 19, in
raise RuntimeError("Could not load NATTEN CUDA extension. " +
RuntimeError: Could not load NATTEN CUDA extension. Please make sure your device has CUDA, the CUDA toolkit for PyTorch is installed, and that you've compiled NATTEN correctly.

pytorch 1.12.0 CUDA 11.6 Win10 VS2019 build error

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

2 errors detected in the compilation of "C:/pytorch/NAT/natten/src/nattenav_cuda_kernel.cu".
nattenav_cuda_kernel.cu
ninja: build stopped: subcommand failed.

add web demo/model to Huggingface

Hi, would you be interested in adding Neighborhood-Attention-Transformer to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models/datasets/spaces (web demos) can be added to a user account or organization, similar to GitHub.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

PE added on query and key

Hi. I see that the current version only supports PE as a bias added to the attention map. I wonder if a future version will support adding PE to the query and key, which is another common form of PE. Thanks again for your work and prompt replies!
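For context, "PE on query and key" usually means injecting positional information into q and k before the dot product, instead of adding a bias to the attention map. A generic sketch of that style, not from this repo (names and shapes illustrative):

import torch

L, C = 196, 64
pos = torch.randn(L, C)    # positional embedding (learned in practice)
q, k, v = (torch.randn(L, C) for _ in range(3))
attn = ((q + pos) @ (k + pos).T) * C ** -0.5   # PE folded into q and k
out = attn.softmax(dim=-1) @ v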

Relation to visual attention network (VAN).

Dear authors:

Congratulations on your excellent results on DiNAT.

However, I think the idea of this paper is similar to VAN (code).

Both adopt dilation to enlarge the receptive field, giving the network both locality and global context. Besides, both apply dilation in a visual backbone and achieve great performance on downstream tasks such as semantic segmentation.

Why not compare with it?

Best,
Menghao

CUDA extension error

Thank you for your good work; however, there is an error when I build the CUDA extension.
torch = 1.11.0
python = 3.7
cuda = 10.1

Traceback (most recent call last):
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1746, in _run_ninja_build
    env=env)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/nattencuda.py", line 20, in <module>
    'nattenav_cuda', [f'{this_dir}/src/nattenav_cuda.cpp', f'{this_dir}/src/nattenav_cuda_kernel.cu'], verbose=False)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1156, in load
    keep_intermediates=keep_intermediates)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1367, in _jit_compile
    is_standalone=is_standalone)
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1472, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nattenav_cuda': [1/3] /home/yckj3822/anaconda3/envs/unsup3d/bin/x86_64-conda_cos6-linux-gnu-c++ -MMD -MF nattenav_cuda.o.d -DTORCH_EXTENSION_NAME=nattenav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/TH -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/src/nattenav_cuda.cpp -o nattenav_cuda.o
[2/3] /usr/local/cuda-10.1/bin/nvcc  -ccbin /home/yckj3822/anaconda3/envs/unsup3d/bin/x86_64-conda_cos6-linux-gnu-cc -DTORCH_EXTENSION_NAME=nattenav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/TH -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/src/nattenav_cuda_kernel.cu -o nattenav_cuda_kernel.cuda.o
FAILED: nattenav_cuda_kernel.cuda.o
/usr/local/cuda-10.1/bin/nvcc  -ccbin /home/yckj3822/anaconda3/envs/unsup3d/bin/x86_64-conda_cos6-linux-gnu-cc -DTORCH_EXTENSION_NAME=nattenav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/TH -isystem /home/yckj3822/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yckj3822/anaconda3/envs/unsup3d/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/src/nattenav_cuda_kernel.cu -o nattenav_cuda_kernel.cuda.o
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc: In instantiation of 'static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]':
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:578:28:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5033:20:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5054:24:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:656:134:   required from 'std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:6676:95:   required from here
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:1067:16: error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]' without object
       __p->_M_set_sharable();
       ~~~~~~~~~^~
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc: In instantiation of 'static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]':
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:578:28:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5033:20:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:5054:24:   required from 'static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:656:134:   required from 'std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]'
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.h:6681:95:   required from here
/home/yckj3822/anaconda3/envs/unsup3d/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/basic_string.tcc:1067:16: error: cannot call member function 'void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]' without object
ninja: build stopped: subcommand failed.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/nattencuda.py", line 27, in <module>
    import nattenav_cuda
ModuleNotFoundError: No module named 'nattenav_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "natten/gradcheck.py", line 11, in <module>
    from nattencuda import NATTENAVFunction, NATTENQKRPBFunction
  File "/home/yckj3822/GAN/Neighborhood-Attention-Transformer-main/natten/nattencuda.py", line 30, in <module>
    raise RuntimeError("Could not load NATTEN CUDA extension. " +
RuntimeError: Could not load NATTEN CUDA extension. Please make sure your device has CUDA, the CUDA toolkit for PyTorch is installed, and that you've compiled NATTEN correctly.

Saliency map

Can you share the method or code you used to draw the saliency map? Thanks a lot!

how to build two NAT cuda

Hi, I want to build two NAT CUDA extensions with different head dimensions, but I find that the second build always overwrites the first. How can I modify setup.py to distinguish them? Is there other code that needs to change? Thanks!
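One plausible approach (a sketch, not the repo's actual setup.py; paths and names below are illustrative): register each build as a separate CUDAExtension with a unique name, since the build defines TORCH_EXTENSION_NAME from that name and the last build otherwise overwrites the first.

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='natten_variants',
    ext_modules=[
        # hypothetical source trees, one per head dimension
        CUDAExtension('nattenav_cuda_dim32',
                      ['src_dim32/nattenav_cuda.cpp', 'src_dim32/nattenav_cuda_kernel.cu']),
        CUDAExtension('nattenav_cuda_dim64',
                      ['src_dim64/nattenav_cuda.cpp', 'src_dim64/nattenav_cuda_kernel.cu']),
    ],
    cmdclass={'build_ext': BuildExtension},
)

Assuming the sources use PYBIND11_MODULE(TORCH_EXTENSION_NAME, m), as is standard for torch extensions (the build logs on this page show -DTORCH_EXTENSION_NAME being passed), renaming the extension in setup.py should suffice, and the two modules can then be imported under their distinct names.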

About license

Thanks for the great work. Please add an MIT license to the repo. Many thanks.

throughput of nat_tiny vs resnet50

hi~ I find that nat_tiny and resnet50, two models of the same size, have very different throughput on an NVIDIA GeForce 2080 Ti. How does the comparison look on your machine?

(screenshots of the timing results for nat_tiny and resnet50; please ignore the accuracy numbers in the images, as the input is not the ImageNet test set)

Comparison with zero-padding version.

Excellent work!
BTW, the paper claims that the proposed edge/corner neighborhood selection performs better than the zero-padding version; I wonder about the performance of the latter, which is not reported in the paper?

Running NAT requires removing visibility for old (sm_52 or earlier) GPUs

Thank you for the great code. Initially, I had this error:

/home/gauenk/Documents/packages/nat/natten/src/natten1dav_cuda_kernel.cu(58): error: identifier "__hfma2" is undefined
          detected during instantiation of "void natten1dav_cuda_forward_kernel_fp16<KS,NS,scalar_t>(at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, at::PackedTensorAccessor32<scalar_t, 4UL, at::DefaultPtrTraits>, int, int, int, int, int) [with KS=5, NS=2, scalar_t=c10::Half]

After debugging, I discovered the problem. I have a newer GPU (sm_75) and an older GPU (sm_52). If an old GPU is visible, the ninja compiler will add its "sm" to the list. Notice the "sm_52" in the text below. This architecture predates half-precision hardware support.

[1/3] c++ -MMD -MF natten1dav_cuda.o.d -DTORCH_EXTENSION_NAME=natten1dav_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -I/usr/local/cuda/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/gauenk/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/gauenk/Documents/packages/nat/natten/src/natten1dav_cuda.cpp -o natten1dav_cuda.o

The fix is easy: just allow only your "good" GPUs (sm_75 or newer) to be visible.

export CUDA_VISIBLE_DEVICES=0

I am leaving this comment in case this helps someone else.

abbreviation for rpb

Thanks for your awesome work.

Can you provide some clues about what rpb and apply_pb are? They do not appear in standard attention.

Thanks

how to get model object?

pip install git+https://github.com/rwightman/pytorch-image-models.git@9d6aad44f8fd32e89e5cca503efe3ada5071cc2a

got this:

ERROR: Command errored out with exit status 128: git clone --filter=blob:none -q https://github.com/rwightman/pytorch-image-models.git /tmp/pip-req-build-wldj6iv4 Check the logs for full command output.

and if I run :
pip install git+https://github.com/rwightman/pytorch-image-models.git

then run the training, I get this:
RuntimeError: Unknown model (nat_tiny)

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

 ** On entry to SGEMM  parameter number 10 had an illegal value
Traceback (most recent call last):
  File "check_flops.py", line 34, in <module>
    model(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 620, in forward
    x = self.forward_features(x)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 616, in forward_features
    return self.features(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 434, in forward
    new_features = layer(features)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 392, in forward
    bottleneck_output = self.bottleneck_fn(prev_features)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 349, in bottleneck_fn
    bottleneck_output = self.block(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/models/natnext510_511.py", line 128, in forward
    x = self.attn(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/tiger/convnext/natten/nattencuda.py", line 121, in forward
    qkv = self.qkv(x).reshape(B, H, W, 3, self.num_heads, self.head_dim).permute(3, 0, 4, 1, 2, 5)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

My input shape is torch.Size([1, 256, 14, 14]). Why am I getting this error?
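For what it's worth, the qkv reshape in the traceback suggests the attention module expects channels-last input of shape (B, H, W, C), while torch.Size([1, 256, 14, 14]) looks channels-first. If that is the cause (a guess, not a confirmed diagnosis), a permute before the block would be the fix:

import torch

x = torch.randn(1, 256, 14, 14)          # channels-first (B, C, H, W)
x = x.permute(0, 2, 3, 1).contiguous()   # channels-last (B, H, W, C)
print(x.shape)                           # torch.Size([1, 14, 14, 256])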

Any plans to release the Neighborhood Attention code using only Pytorch?

Hi. Thank you for your good work.
I implemented Neighborhood Attention using only PyTorch. As far as I know, Neighborhood Attention was implemented in pure PyTorch at the beginning of this work. Personally, I would like to compare my implementation with the original code.
Do you have any plans to release a Neighborhood Attention implementation using only PyTorch?

Questions about the algorithm speed.

Hi, thanks for your good work.
I notice that the paper does not compare algorithm speed. I would like to see a speed comparison of NAT vs. Swin Transformer and CNN models.
Thanks!

Details of Training

Hi @alihassanijr, thanks for the great repository. For reproducing your results, how many nodes were used to train these models? I see that config files are provided for each model, but I wonder if any changes are needed when training on multiple nodes.

CUDA out of memory

Your work is very good, and we have improved our transformer model based on your ideas, but why does CUDA run out of memory at the same batch_size? In theory, the computation should be reduced and the batch_size should be able to be set larger.

Hello, I have already installed CUDA as required; why do I get a CUDA extension error?

Traceback (most recent call last):
File "E:\executable_code\Neighborhood-Attention-Transformer-main\detection\cuda\natten.py", line 10, in
'nattenav_cuda', ['cuda/nattenav_cuda.cpp', 'cuda/nattenav_cuda_kernel.cu'], verbose=False)
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils\cpp_extension.py", line 1156, in load
keep_intermediates=keep_intermediates)
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils\cpp_extension.py", line 1334, in _jit_compile
is_standalone=is_standalone,
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils_cpp_extension_versioner.py", line 45, in bump_version_if_changed
hash_value = hash_source_files(hash_value, source_files)
File "C:\D_installation_packet\Anaconda\installion_package\envs\NAT\lib\site-packages\torch\utils_cpp_extension_versioner.py", line 15, in hash_source_files
with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'cuda/nattenav_cuda.cpp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\executable_code\Neighborhood-Attention-Transformer-main\detection\cuda\natten.py", line 15, in
import nattenav_cuda
ModuleNotFoundError: No module named 'nattenav_cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/executable_code/Neighborhood-Attention-Transformer-main/detection/cuda/gradcheck.py", line 1, in
from natten import NATTENAVFunction, NATTENQKRPBFunction
File "E:\executable_code\Neighborhood-Attention-Transformer-main\detection\cuda\natten.py", line 18, in
raise RuntimeError("Could not load NATTEN CUDA extension. " +
RuntimeError: Could not load NATTEN CUDA extension. Please make sure your device has CUDA, the CUDA toolkit for PyTorch is installed, and that you've compiled NATTEN correctly.
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3'

About the detail of NATTENQKRPBFunction and NATTENAVFunction

Hi, first, congratulations on your great work! I am new to Transformers. I have successfully installed the CUDA version of NA, but when reading your core NA code I could not understand the details of these two functions: NATTENQKRPBFunction and NATTENAVFunction. Could you explain? Thank you so much.
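At the risk of oversimplifying the CUDA kernels, the two functions correspond to the two matrix products of neighborhood attention. A conceptual per-token sketch in plain PyTorch (shapes illustrative, not the real interface):

import torch

heads, neighborhood, d = 4, 9, 32           # 9 = 3x3 neighborhood
q = torch.randn(heads, d)                   # one query token, per head
k_n = torch.randn(heads, neighborhood, d)   # its neighboring keys
v_n = torch.randn(heads, neighborhood, d)   # its neighboring values
rpb = torch.randn(heads, neighborhood)      # relative positional biases

# "QKRPB": attention logits = q . (neighboring k) + relative positional bias
attn = torch.einsum('hd,hnd->hn', q, k_n) + rpb
# "AV": weighted sum of the neighboring values
out = torch.einsum('hn,hnd->hd', attn.softmax(dim=-1), v_n)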

Valid Padded Behavior

First off, great paper! I've been looking for transformers with some of the same locality inductive biases as CNNs. I was wondering if you would be able to add support for a valid padding based alternative though. So rather than handling the edges with the altered behavior, just allow reduced output size like in valid padded convolution layers. This is very important in the domain we work in where we need to be equivariant to the specific crop of the image and the dimensions of the input at inference time.

Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.

Hi, thanks for your excellent work. I found this problem when applying it to other tasks.
System environment: 3090 Ti, torch 1.10, CUDA 11.3
EPOCH 1
learning rate = [0.0001]
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [21,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [22,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [23,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [24,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [45,0,0], thread: [25,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.

How can I implement a function like joint image filtering?

Hi, thanks for your great work and open-source code!

In this code, the input feature first computes weights via q and k and then aggregates itself within a local window; q, k, and v are different mappings of the same feature. I want to implement something like joint image filtering: I have a reference feature and a target feature, and I want to obtain weights from the reference feature and then use those weights to aggregate the target feature. Namely, q and k would be mappings of the reference feature, while v would be a mapping of the target feature.

How can I achieve this functionality using the existing code? Thanks!
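For what it's worth, in terms of the naive single-head sketch under "How Neighborhood Attention works" above, the idea would amount to wiring the inputs differently (a conceptual sketch, not an answer based on the repo's API):

import torch
# reusing neighborhood_attention_2d from the sketch earlier on this page
ref = torch.randn(5, 5, 8)   # reference feature: provides q and k
tgt = torch.randn(5, 5, 8)   # target feature: provides v
out = neighborhood_attention_2d(q=ref, k=ref, v=tgt, kernel_size=3)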
