kaiyuyue / cgnl-network.pytorch

Compact Generalized Non-local Network (NeurIPS 2018)

Home Page: https://arxiv.org/abs/1810.13125

License: MIT License

Python 75.27% C++ 17.12% Cuda 7.62%
cgnl-network nl-network compact-generalized-non-local-block non-local-block computer-vision pytorch caffe attention fast-attention

cgnl-network.pytorch's Issues

Question about experiments

Hi! Thanks for your code and paper. I have several questions about this work.
(a) The results in your paper are about 1% lower than those in this repo. Why?
(b) In your paper, you also insert 5 NL blocks in ResNet. What are the specific positions of these blocks?
(c) Did you insert 5 NL/CGNL blocks when training on ImageNet?
Many thanks !

Normalization

Hi,
I find your work very interesting. However, I have two questions regarding normalization:

  1. In the original non-local neural networks work, the product of phi and theta is normalized BEFORE it multiplies g to produce the output (in their work it is done using a softmax layer). I do not see any such normalization in your work - why?

  2. Your Taylor expansion is based on the assumption that both theta and phi are of unit L2 norm. I do not see this enforced in your code - what have I missed?

Thanks,
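
As an illustration of the second point, and not a description of what the repository does: if theta and phi were explicitly rescaled to unit L2 norm, the factor exp(−γ(∥θ∥^2 + ∥φ∥^2)) that appears when an RBF kernel is factored would reduce to the constant exp(−2γ). The helper names l2_normalize and beta, and the value of gamma, are hypothetical.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Rescale a vector to unit L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def beta(theta, phi, gamma):
    """exp(-gamma * (||theta||^2 + ||phi||^2)) for one pair of vectors."""
    sq_norms = sum(x * x for x in theta) + sum(x * x for x in phi)
    return math.exp(-gamma * sq_norms)

gamma = 0.5  # illustrative value only
theta = l2_normalize([2.0, 3.0, 6.0])
phi = l2_normalize([1.0, 2.0, 2.0])

# With unit-norm inputs, beta no longer depends on theta or phi.
b = beta(theta, phi, gamma)
```

Without such a normalization step, the factor varies from pair to pair, which is what the question is pointing at.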

Question about SpatialCGNL dot product kernel

Thank you for your work.

I have a question about the SpatialCGNL dot product kernel.

In your code, the dot product kernel is computed as p = p.view(b, 1, c * h * w), g = g.view(b, c * h * w, 1), att = torch.bmm(p, g), so the shape of att is (b, 1, 1). What is the meaning of this shape?
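
To make the shapes concrete, here is a small pure-Python sketch (no PyTorch required) that mirrors the reshapes described above. It assumes p and g have already been flattened to length c * h * w per sample; the helper name bmm_row_by_col is hypothetical. A batched product of a (1, N) row with an (N, 1) column is one inner product per sample, which is why the result has shape (b, 1, 1).

```python
def bmm_row_by_col(p_rows, g_cols):
    """Mimic a batched (b, 1, N) x (b, N, 1) product, giving (b, 1, 1).

    p_rows and g_cols are lists of b flat vectors, each of length N.
    """
    out = []
    for p, g in zip(p_rows, g_cols):
        assert len(p) == len(g)
        dot = sum(pi * gi for pi, gi in zip(p, g))
        out.append([[dot]])  # a 1 x 1 matrix for this sample
    return out

b, c, h, w = 2, 3, 2, 2
n = c * h * w  # flattened length per sample

# Made-up values: sample s holds s, s + 1, ..., s + n - 1.
p = [[float(i + s) for i in range(n)] for s in range(b)]
g = [[1.0] * n for _ in range(b)]

att = bmm_row_by_col(p, g)
shape = (len(att), len(att[0]), len(att[0][0]))
```

Each sample contributes exactly one number: the inner product of its flattened p and g over all channels and positions.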

Reproduce mini-kinetics

I implemented I3D and Non-local (NL) based on the code released by Wang et al. (2018); however, CGNL got slightly lower accuracy than NL on mini-Kinetics using 8-frame and 32-frame 5-block ResNet-50 models.
Since I was able to reproduce NL, I wonder whether CGNL needs some additional detail to reach the scores in the paper. Does CGNL use any different hyperparameters?

Instantiating SpatialCGNL with the default parameters.

Hi! Am I doing something wrong? I tried to instantiate the resnet.SpatialCGNL() module with only the required parameters inplanes and planes, but it raises an error.
Working Directory: cgnl-network.pytorch/model
Please find the minimal reproducing code:

from resnet import SpatialCGNL
SpatialCGNL(20, 10)

Error Traceback:

  File "/cgnl-network.pytorch/model/resnet.py", line 138, in __init__
    self.z = nn.Conv2d(planes, inplanes, kernel_size=1, stride=1,
  File "python3.8/site-packages/torch/nn/modules/conv.py", line 340, in __init__
    super(Conv2d, self).__init__(
  File "python3.8/site-packages/torch/nn/modules/conv.py", line 24, in __init__
    if in_channels % groups != 0:
TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'

I believe this is because the parameter groups in resnet.SpatialCGNL is set to None.
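
The failing check can be reproduced without PyTorch: the traceback shows Conv2d evaluating in_channels % groups, and with groups=None Python raises the same TypeError. The sketch below uses hypothetical stand-in functions (check_groups, try_groups), not the real constructor. Passing an explicit integer such as groups=1 to SpatialCGNL is a likely workaround, assuming the module forwards that value to nn.Conv2d unchanged.

```python
def check_groups(in_channels, groups):
    """Hypothetical stand-in for the divisibility check seen in the traceback."""
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    return True

def try_groups(in_channels, groups):
    """Return True on success, or the TypeError message on failure."""
    try:
        return check_groups(in_channels, groups)
    except TypeError as exc:
        return str(exc)

# planes=10 is the in_channels of the failing Conv2d in SpatialCGNL(20, 10).
failure = try_groups(10, None)  # groups left at None
success = try_groups(10, 1)     # groups given as an explicit integer
```

Here failure holds the same "unsupported operand type(s)" message as the traceback, while the call with groups=1 passes the check.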

video classification

Thanks for the code. Can it be used for video classification?

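About GNL

Hello, I have a question about this part of the SpatialCGNL code:
t = t.view(b, 1, c * h * w)
p = p.view(b, 1, c * h * w)
g = g.view(b, c * h * w, 1)
att = torch.bmm(p, g)
x = torch.bmm(att, t)
x = x.view(b, c, h, w)
The att computed here is a single value, which is then multiplied by t. What is the meaning of this?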
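
One way to read these lines, assuming a plain dot-product kernel over the flattened c * h * w positions: applying the full pairwise attention matrix (the outer product of t and p) to g gives the same result as scaling t by the single inner product of p and g, by associativity of matrix multiplication. Under that assumption the scalar att is a compact way to compute the full attention, not a replacement for it. The pure-Python check below uses made-up values and hypothetical function names.

```python
def full_attention(t, p, g):
    """Build the N x N matrix A[i][j] = t[i] * p[j], then return A applied to g."""
    n = len(t)
    attn = [[t[i] * p[j] for j in range(n)] for i in range(n)]
    return [sum(attn[i][j] * g[j] for j in range(n)) for i in range(n)]

def compact_attention(t, p, g):
    """Compute the scalar att = p . g first, then scale t by it."""
    att = sum(pj * gj for pj, gj in zip(p, g))
    return [att * ti for ti in t]

# Made-up flattened vectors for a single sample (N = 4).
t = [1.0, 2.0, 3.0, 4.0]
p = [0.5, -1.0, 2.0, 0.0]
g = [2.0, 1.0, 0.5, 3.0]

full = full_attention(t, p, g)        # O(N^2) work and memory
compact = compact_attention(t, p, g)  # O(N) work and memory
```

Both routes give the same vector, but the compact one never materializes the N x N matrix.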

Heat-map Visualization

How can I generate the heat-map visualization for video frames shown in Figure 6 of the paper?

Thank you for your support.

Question about the results in paper.

Hi, thanks for the interesting paper :)
I have some questions.

The paper does not say whether the reported results are the mean, the median, or the best run.
If they are means, how many runs were averaged?
And are the results on GitHub included in that mean?

Approximation about Eq. (10)

Thanks for presenting such interesting work.
I wonder if you use β = exp(−γ(∥θ∥^2 +∥φ∥^2)) to approximate β = exp(−γ(∥θi∥^2 +∥φj∥^2)) in Eq. (10)?
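
For reference, the Gaussian RBF kernel factors as exp(−γ∥θ − φ∥^2) = exp(−γ(∥θ∥^2 + ∥φ∥^2)) · exp(2γ θ·φ), and the second factor has the Taylor series Σ_k (2γ θ·φ)^k / k!. The sketch below checks this standard identity numerically with the pair-specific β. It is not taken from the repository or from Eq. (10) itself; the function names, the value of γ, and the truncation order are illustrative assumptions.

```python
import math

def rbf_exact(theta, phi, gamma):
    """exp(-gamma * ||theta - phi||^2), computed directly."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(theta, phi))
    return math.exp(-gamma * sq_dist)

def rbf_taylor(theta, phi, gamma, order):
    """beta * sum_{k=0}^{order} (2 * gamma * theta.phi)^k / k!."""
    sq_norms = sum(a * a for a in theta) + sum(b * b for b in phi)
    beta = math.exp(-gamma * sq_norms)  # pair-specific beta
    dot = sum(a * b for a, b in zip(theta, phi))
    series = sum((2 * gamma * dot) ** k / math.factorial(k)
                 for k in range(order + 1))
    return beta * series

theta = [0.6, 0.8]
phi = [1.0, 0.0]
gamma = 0.5  # illustrative value only

exact = rbf_exact(theta, phi, gamma)
approx = rbf_taylor(theta, phi, gamma, order=10)
```

Replacing the pair-specific β with one shared value is exact only when all the norms are equal, for example when θ and φ are normalized.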

where to add nln-block

In which block did you add the nln-block? There are some magic numbers in the function _make_layer:
    for i in range(1, blocks):
        if (i == 5 and blocks == 6) or \
           (i == 22 and blocks == 23) or \
           (i == 35 and blocks == 36):
            ...
Did you add the nln-block only to layer3?
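
Reading the quoted condition on its own, and assuming the standard ResNet stage sizes (3, 4, 6, 3 for ResNet-50; 3, 4, 23, 3 for ResNet-101; 3, 8, 36, 3 for ResNet-152), it can only fire in the stage with 6, 23, or 36 blocks, which is layer3, and only at the last block index. Whether the repository applies further conditions elsewhere is not visible in the snippet. The names cgnl_positions and RESNET_STAGES below are hypothetical.

```python
def cgnl_positions(blocks):
    """Return the indices i in range(1, blocks) that satisfy the quoted condition."""
    return [
        i for i in range(1, blocks)
        if (i == 5 and blocks == 6)
        or (i == 22 and blocks == 23)
        or (i == 35 and blocks == 36)
    ]

# Assumed standard stage sizes for layer1..layer4.
RESNET_STAGES = {
    "resnet50": [3, 4, 6, 3],
    "resnet101": [3, 4, 23, 3],
    "resnet152": [3, 8, 36, 3],
}

# For each depth, map every stage to the block indices that match.
matches = {
    name: {f"layer{k + 1}": cgnl_positions(n) for k, n in enumerate(stages)}
    for name, stages in RESNET_STAGES.items()
}
```

Under these assumptions, matches is empty for layer1, layer2, and layer4 at every depth, and holds a single index for layer3.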

Accuracy only ~70% on UCF101.

Thanks for the great work. I followed the training strategy in the paper to train an I3DResNet50 on UCF101, starting from the ImageNet-pretrained model. For training, I sample 64 consecutive frames and drop frames evenly; for testing, I sample 30x32 frames. The I3DResNet is converted from the C2D model described in the Non-local network paper.
However, I only get about 70% accuracy. Could you provide the script for the video classification task or offer some suggestions? Thank you.
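
One reading of "sample 64 consecutive frames and drop frames evenly", assumed here rather than confirmed by the thread, is to keep every second frame of a 64-frame window, which yields a 32-frame clip. The function name clip_indices and its defaults are hypothetical, and the sketch covers only index selection, not decoding or augmentation.

```python
def clip_indices(start, span=64, stride=2):
    """Frame indices for one clip: a `span`-frame window, keeping every `stride`-th frame."""
    return list(range(start, start + span, stride))

# Example: a training clip whose window starts at frame 10.
idx = clip_indices(start=10)
```

Comparing indices like these with the ones a training pipeline actually produces is a quick way to rule out a sampling mismatch.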
