kaiyuyue / cgnl-network.pytorch

Compact Generalized Non-local Network (NeurIPS 2018)

Home Page: https://arxiv.org/abs/1810.13125

License: MIT License

Python 75.27% C++ 17.12% Cuda 7.62%

cgnl-network nl-network compact-generalized-non-local-block non-local-block computer-vision pytorch caffe attention fast-attention


cgnl-network.pytorch's Issues

CGNL for Image Segmentation

How do we use CGNL in image segmentation tasks?

Hi! Thanks for your code and paper. I have several questions about this work.
(a). The results reported in your paper are about 1% lower than those in this repo. Why?
(b). In your paper, you also insert 5 NL blocks into ResNet. What are the specific positions of these blocks?
(c). Did you insert 5 NL/CGNL blocks when training on ImageNet?
Many thanks!

Normalization

Hi,
I find your work very interesting. However, I have two questions regarding normalization:

In the original non-local neural networks work, the product of phi and theta is normalized BEFORE it multiplies g to produce the output (in their work it is done using a softmax layer). I do not see any such normalization in your work - why?
Your Taylor expansion is based on the assumption that both theta and phi are of unit L2 norm. I do not see this enforced in your code - what have I missed?
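
For readers unfamiliar with the normalization this question refers to, here is a minimal plain-Python sketch (no PyTorch needed) contrasting softmax-normalized weights with raw dot products for a single query position. The helper names `softmax` and `attend` and the toy numbers are illustrative assumptions; this is not code from the repo or the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(theta_i, phis, gs, normalize=True):
    """Aggregate scalar values gs with weights derived from theta_i . phi_j."""
    scores = [sum(a * b for a, b in zip(theta_i, phi)) for phi in phis]
    weights = softmax(scores) if normalize else scores
    return sum(w * g for w, g in zip(weights, gs)), weights

theta_i = [1.0, 2.0]                          # query embedding (toy values)
phis = [[0.5, 0.5], [1.0, 0.0], [0.0, 3.0]]   # key embeddings (toy values)
gs = [1.0, 2.0, 3.0]                          # values to aggregate

y_norm, w_norm = attend(theta_i, phis, gs, normalize=True)   # softmax BEFORE multiplying g
y_raw, w_raw = attend(theta_i, phis, gs, normalize=False)    # raw dot products, no normalization
```

With normalization the weights sum to 1, so the output stays within the range of the values; without it the output scales with the magnitude of the embeddings.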

Thanks,

Question about the SpatialCGNL dot-product kernel

Thank you for your work.

I have a question about the SpatialCGNL dot-product kernel.

In your code, the dot-product kernel is computed as p = p.view(b, 1, c * h * w), g = g.view(b, c * h * w, 1), att = torch.bmm(p, g), so att has shape (b, 1, 1). What does an attention tensor of this shape mean?
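
The shapes in question can be traced without PyTorch. The sketch below re-implements batched matrix multiplication for nested lists (the `bmm` and `shape` helpers are illustrative stand-ins written for this example, not `torch.bmm` itself) and uses `n` in place of `c * h * w`.

```python
def bmm(a, b):
    """Batched matrix multiply for nested lists: (B, n, m) x (B, m, p) -> (B, n, p)."""
    return [
        [[sum(x * y for x, y in zip(row, col)) for col in zip(*mat_b)] for row in mat_a]
        for mat_a, mat_b in zip(a, b)
    ]

def shape(t):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

b, n = 2, 4  # n stands in for c * h * w
p = [[[float(i + k) for k in range(n)]] for i in range(b)]   # shape (b, 1, n)
g = [[[float(k + 1)] for k in range(n)] for _ in range(b)]   # shape (b, n, 1)
t = [[[1.0] * n] for _ in range(b)]                          # shape (b, 1, n)

att = bmm(p, g)   # one inner product per sample: shape (b, 1, 1)
x = bmm(att, t)   # that scalar rescales t: shape (b, 1, n)
```

So `att` holds one scalar per sample, and the second `bmm` multiplies every entry of `t` for that sample by it.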

Reproduce mini-kinetics

I implemented I3D and Non-local (NL) based on the code released by Wang et al. (2018); however, CGNL got slightly lower accuracy than NL on mini-Kinetics using 8-frame and 32-frame 5-block ResNet-50 models.
Since I can reproduce the NL results, I wonder whether CGNL needs any additional detail to reach the scores in the paper. Does CGNL use any different hyperparameters?

Instantiating SpatialCGNL with the default parameters.

Hi! Am I doing something wrong? I tried to instantiate the resnet.SpatialCGNL() module, passing only the required parameters inplanes and planes, but it gives me an error.
Working Directory: cgnl-network.pytorch/model
Please find the minimal reproducing code:

from resnet import SpatialCGNL
SpatialCGNL(20, 10)

Error Traceback:

  File "/cgnl-network.pytorch/model/resnet.py", line 138, in __init__
    self.z = nn.Conv2d(planes, inplanes, kernel_size=1, stride=1,
  File "python3.8/site-packages/torch/nn/modules/conv.py", line 340, in __init__
    super(Conv2d, self).__init__(
  File "python3.8/site-packages/torch/nn/modules/conv.py", line 24, in __init__
    if in_channels % groups != 0:
TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'

I believe this is because the parameter groups in resnet.SpatialCGNL is set to None.
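
The failing expression in the traceback can be reproduced in plain Python. In the sketch below, `check_groups` mimics only the divisibility check quoted above, and `resolve_groups` is a hypothetical helper (not part of the repo or of PyTorch) showing one way a `None` default could be mapped to an ordinary convolution.

```python
def check_groups(in_channels, out_channels, groups):
    """Mimic the divisibility check quoted in the traceback."""
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    if out_channels % groups != 0:
        raise ValueError("out_channels must be divisible by groups")
    return groups

def resolve_groups(groups):
    """Hypothetical helper: treat a missing groups value as 1 (a regular convolution)."""
    return 1 if groups is None else groups

# Same channel counts as the failing layer: nn.Conv2d(planes=10, inplanes=20, ...)
try:
    check_groups(10, 20, None)
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__   # 'int' % None is not defined

safe_groups = check_groups(10, 20, resolve_groups(None))
```

Assuming the constructor accepts `groups` as a keyword, as the issue suggests, passing an explicit integer such as `groups=1` when instantiating the module may avoid the error; whether the rest of the module behaves as intended with that value is not confirmed here.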

video classification

Thanks for the code. Can it be used for video classification?

About GNL

Hello, in the SpatialCGNL code:
t = t.view(b, 1, c * h * w)
p = p.view(b, 1, c * h * w)
g = g.view(b, c * h * w, 1)
att = torch.bmm(p, g)
x = torch.bmm(att, t)
x = x.view(b, c, h, w)
The att computed here is a single value, which is then multiplied by t. What is the meaning of this?

Heat-map Visualization

How can I generate the heat-map visualization for video frames shown in Figure 6 of the paper?

Thank you for your support.

Question about the results in paper.

Hi, thanks for the interesting paper :)
I have some questions.

The paper does not say whether the reported results are the mean, the median, or the best run.
If they are the mean, how many runs did you average over?
And are the results on GitHub included in that mean?

Approximation about Eq. (10)

Thanks for presenting such interesting work.
I wonder whether you use β = exp(−γ(∥θ∥^2 + ∥φ∥^2)) to approximate β = exp(−γ(∥θi∥^2 + ∥φj∥^2)) in Eq. (10)?
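
As background for this question, the sketch below numerically checks the algebraic identity behind a β of this form, assuming the kernel in question is the RBF kernel exp(−γ∥θ − φ∥^2) (an assumption consistent with the β quoted above, not a statement about what the paper or the repo actually computes). Expanding the squared distance gives β · exp(2γ θ·φ), and the exponential can then be truncated as a Taylor series. All function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rbf(theta, phi, gamma):
    """Exact RBF kernel exp(-gamma * ||theta - phi||^2)."""
    diff = [x - y for x, y in zip(theta, phi)]
    return math.exp(-gamma * dot(diff, diff))

def rbf_taylor(theta, phi, gamma, order):
    """beta * sum_{p=0}^{order} (2*gamma*theta.phi)^p / p!"""
    beta = math.exp(-gamma * (dot(theta, theta) + dot(phi, phi)))
    s = dot(theta, phi)
    series = sum((2 * gamma * s) ** p / math.factorial(p) for p in range(order + 1))
    return beta * series

theta = [0.6, 0.8]   # unit L2 norm
phi = [1.0, 0.0]     # unit L2 norm
gamma = 0.5

exact = rbf(theta, phi, gamma)
approx_low = rbf_taylor(theta, phi, gamma, order=3)
approx_high = rbf_taylor(theta, phi, gamma, order=10)

# For unit-norm inputs, beta no longer depends on the positions i, j.
beta_unit = math.exp(-gamma * (dot(theta, theta) + dot(phi, phi)))
```

When both vectors have unit L2 norm, β reduces to exp(−2γ), which is the same constant for every pair of positions.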

where to add nln-block

In which block did you add the nln-block? There are some magic numbers in the function _make_layer:

    for i in range(1, blocks):
        if (i == 5 and blocks == 6) or \
           (i == 22 and blocks == 23) or \
           (i == 35 and blocks == 36):
            ...

And did you add the nln-block only to layer3?
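
One way to read the quoted condition is to enumerate where it fires for the standard ResNet stage depths. The sketch below assumes `_make_layer` is called once per stage with the usual block counts for ResNet-50/101/152; what the repo does inside the `if` is elided in the issue and is not reproduced here. `RESNET_STAGES`, `matches`, and `insertion_points` are names made up for this example.

```python
# Standard ResNet stage depths (number of blocks in layer1..layer4).
RESNET_STAGES = {
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
    "resnet152": [3, 8, 36, 3],
}

def matches(i, blocks):
    """The condition quoted from _make_layer."""
    return ((i == 5 and blocks == 6) or
            (i == 22 and blocks == 23) or
            (i == 35 and blocks == 36))

def insertion_points(stages):
    """Return (layer_name, block_index) pairs where the condition fires."""
    hits = []
    for layer_idx, blocks in enumerate(stages, start=1):
        for i in range(1, blocks):
            if matches(i, blocks):
                hits.append((f"layer{layer_idx}", i))
    return hits

points = {name: insertion_points(stages) for name, stages in RESNET_STAGES.items()}
```

Under these assumptions each pair (i, blocks) satisfies i == blocks - 1, so the condition fires only at the last block index of the stage with 6, 23, or 36 blocks, which is layer3 in all three architectures.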

Accuracy only ~70% on ucf101.

Thanks for the great work. I followed the training strategy in the paper to train an I3DResNet50 on UCF101, starting from the ImageNet-pretrained model. For training I sample 64 consecutive frames and then drop frames evenly; for testing I sample 30x32 frames. The I3DResNet is converted from the C2D model described in the Non-local network paper.
However, I only get about 70% accuracy. Could you provide the script for the video classification task or give some suggestions? Thank you.
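
To make the training-time sampling described above concrete, here is a minimal sketch of "take 64 consecutive frames, then drop evenly". The function name `sample_clip`, the random start offset, and the stride of 2 are assumptions made for illustration; the issue does not state how many frames are kept, and this is not the authors' pipeline.

```python
import random

def sample_clip(num_frames, window=64, stride=2, rng=None):
    """Pick `window` consecutive frame indices at a random offset, then keep every `stride`-th."""
    rng = rng or random.Random()
    start = rng.randint(0, max(0, num_frames - window))   # random clip position
    consecutive = range(start, min(start + window, num_frames))
    return list(consecutive)[::stride]                    # even temporal subsampling

# Hypothetical 300-frame video; a seeded generator keeps the example repeatable.
indices = sample_clip(num_frames=300, rng=random.Random(0))
```

With these defaults a 64-frame window yields 32 evenly spaced indices; a different `stride` gives a different clip length.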

In dot-prod kernel, a typo or intended?

According to the definition of bmm, this line outputs a b x 1 x 1 tensor; however, Eqn. 12 in the paper seems to describe an NC x NC kernel.
Could you clarify my observation?
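
One possible reading, offered as an assumption rather than the authors' answer: if Eqn. 12 amounts to applying the NC x NC outer-product kernel θφᵀ to g, then associativity lets the product be evaluated as θ(φᵀg), where φᵀg is a single scalar per sample. The plain-Python sketch below checks that both orders give the same result on toy vectors (length 4 stands in for NC).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy flattened embeddings; length 4 stands in for N * C.
theta = [1.0, -2.0, 0.5, 3.0]
phi = [0.5, 1.0, -1.0, 2.0]
g = [2.0, 0.0, 1.0, -1.0]

# Route 1: build the full NC x NC kernel theta * phi^T, then apply it to g.
kernel = [[t * p for p in phi] for t in theta]
y_full = [dot(row, g) for row in kernel]

# Route 2: compute the scalar phi^T g first, then rescale theta by it.
att = dot(phi, g)
y_fast = [att * t for t in theta]
```

Route 2 never materializes the NC x NC matrix, which would explain a b x 1 x 1 intermediate under this reading.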
