
x3d-multigrid's People

Contributors

kkahatapitiya, piergiaj


x3d-multigrid's Issues

How to test video-level acc?

Hi, I appreciate your beautiful work! @kkahatapitiya Could you tell me how you measured the 71.48% Top-1 accuracy (3-view) on Kinetics-400? Have you open-sourced your video-level accuracy test code? When I test the pretrained model, the accuracy I get is lower than the number you report. (My video-level accuracy averages the predictions over all clips of a test video.)
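For reference, a minimal sketch of such video-level evaluation (not the evaluation code used in this repo; the clip_loader interface yielding (clips, labels, video_ids) and the (N, num_classes) output shape are assumptions):

import torch
import torch.nn.functional as F
from collections import defaultdict

@torch.no_grad()
def video_level_accuracy(model, clip_loader, device='cuda'):
    # Average the softmax scores of every clip/view sampled from a video,
    # then take the argmax per video.
    model.eval()
    scores, labels = defaultdict(float), {}
    for clips, clip_labels, video_ids in clip_loader:
        probs = F.softmax(model(clips.to(device)), dim=1).cpu()   # assumes (N, num_classes) logits
        for p, y, vid in zip(probs, clip_labels, video_ids):
            scores[vid] = scores[vid] + p      # accumulate clip scores per video
            labels[vid] = int(y)
    correct = sum(int(scores[v].argmax()) == labels[v] for v in scores)
    return correct / max(len(scores), 1)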

num_samples error

Hi,

When I run train_x3d_charades.py, I get the following error:

raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

I'm using the same dataset as in the code (Charades_v1_rgb). Do you have any suggestions?
Thank you.

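For what it's worth, this ValueError means the constructed Dataset has length zero, which usually points to a wrong data root or annotation path rather than a sampler bug. A quick diagnostic sketch (with assumed paths, not part of the repo):

import os

CHARADES_ROOT = '/path/to/Charades_v1_rgb'      # assumed location of the extracted frames
print('root exists:', os.path.isdir(CHARADES_ROOT))
if os.path.isdir(CHARADES_ROOT):
    print('num video folders:', len(os.listdir(CHARADES_ROOT)))

# After train_x3d_charades.py builds its Dataset object, confirm it is non-empty
# before it is wrapped in a DataLoader:
# assert len(dataset) > 0, 'Dataset is empty -- check the data root and annotation paths'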

How to set the hyperparameters when doing validation?

Hi, thanks a lot for sharing your implementation! I want to use your pretrained model for validation. If I only have one GPU, how should I modify the hyperparameters, especially the base_bn_splits argument used in generate_model? I would also like to know whether the model named "x3d_multigrid_kinetics_fb_pretrained.pt" was converted from the model provided by Facebook. Looking forward to your reply.
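A sketch of what single-GPU validation could look like, assuming the generate_model signature quoted elsewhere in these issues and an aggregate_sub_bn_stats() helper on the model for folding split-BN statistics back into the main BN layers (both should be verified against x3d.py):

import torch
import x3d as resnet_x3d

model = resnet_x3d.generate_model(x3d_version='M', n_classes=400, n_input_channels=3,
                                  dropout=0.5, base_bn_splits=1)   # 1 split: plain BN on one GPU
state = torch.load('x3d_multigrid_kinetics_fb_pretrained.pt', map_location='cpu')
model.load_state_dict(state, strict=False)   # the checkpoint may be wrapped in an extra dict key
if hasattr(model, 'aggregate_sub_bn_stats'):
    model.aggregate_sub_bn_stats()           # sync sub-BN running stats before eval
model.eval()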

Dataset generation

How to generate the following files?
KINETICS_TRAIN_ANNO
KINETICS_VAL_ANNO
KINETICS_CLASS_LABELS
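The exact format these constants point to is defined by this repo's dataset loader, so the following is only a hypothetical sketch of building a class-label list and per-split annotation files from the official Kinetics CSVs (columns label, youtube_id, time_start, time_end, split); check the dataset class for the format it actually parses:

import csv
import json

def load_rows(csv_path):
    with open(csv_path) as f:
        return list(csv.DictReader(f))

def build_anno(rows, classes, out_path):
    # Map each trimmed clip id to its label index, mirroring the Kinetics naming scheme.
    anno = {'%s_%06d_%06d' % (r['youtube_id'], int(r['time_start']), int(r['time_end'])):
            classes.index(r['label']) for r in rows}
    with open(out_path, 'w') as f:
        json.dump(anno, f)

train_rows = load_rows('kinetics400_train.csv')
val_rows = load_rows('kinetics400_val.csv')
classes = sorted({r['label'] for r in train_rows})   # reuse the train label list for both splits

with open('kinetics400_labels.txt', 'w') as f:              # e.g. KINETICS_CLASS_LABELS
    f.write('\n'.join(classes))
build_anno(train_rows, classes, 'kinetics400_train.json')   # e.g. KINETICS_TRAIN_ANNO
build_anno(val_rows, classes, 'kinetics400_val.json')       # e.g. KINETICS_VAL_ANNO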

X3D No Multigrid

I am planning to take your implementation in x3d.py and use it in my own training environment to train X3D with a constant batch size. I don't want to use any of the multigrid features, and I will be using my own dataloaders, datasets, and so on.
In the model instantiation snippet below, I am unsure about one parameter:

x3d = resnet_x3d.generate_model(x3d_version=X3D_VERSION, n_classes=400, n_input_channels=3,
                                dropout=0.5, base_bn_splits=BASE_BS_PER_GPU//CONST_BN_SIZE)

What is base_bn_splits? If I use a single GPU and a constant batch size, what value do I need to give this parameter? Thanks a lot! @kkahatapitiya
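My reading of x3d.py (please verify): base_bn_splits controls how many sub-groups the per-GPU batch is divided into inside the split batch-norm layers that multigrid training relies on. With a single GPU and a constant batch size, a value of 1 should make every BN layer normalize over the whole batch, i.e. behave like ordinary BatchNorm:

import x3d as resnet_x3d

X3D_VERSION = 'M'    # whichever variant is being trained
x3d = resnet_x3d.generate_model(x3d_version=X3D_VERSION, n_classes=400,
                                n_input_channels=3, dropout=0.5,
                                base_bn_splits=1)   # no multigrid: one BN split per GPU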

Changing input clip length

Good day!

I am having trouble finding where to specify the input clip length when defining the X3D model. Currently I am aiming to change the number of input frames (the temporal duration parameter) to 20 for X3D-M training, so that the input clip (with the gamma_tau sampling stride) is read at 10 FPS.
Please provide some insight on how that can be achieved.
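As far as I can tell (worth verifying), the network in x3d.py pools over time at the head, so the clip length is determined by the data pipeline and the sampling constants in the training script rather than by generate_model. A small sketch with illustrative names:

import torch
import x3d as resnet_x3d

CLIP_LEN = 20        # desired number of input frames (illustrative constant)
# gamma_tau (the frame-sampling stride) would likewise be set where clips are decoded,
# e.g. in the dataloader, not in the model definition.

model = resnet_x3d.generate_model(x3d_version='M', n_classes=400, n_input_channels=3,
                                  dropout=0.5, base_bn_splits=1)
clip = torch.rand(2, 3, CLIP_LEN, 224, 224)   # (N, C, T, H, W) for X3D-M crops
out = model(clip)
print(out.shape)      # the temporal dimension is pooled away inside the network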

Why does eval mode degenerate?

Thanks for your clean implementation! @kkahatapitiya
I have two problems to consult you about:

  1. I find that the predictions in eval mode are always the same after I finish training X3D on the Kinetics-200 dataset, but they are normal if I run inference with model.train(). I have failed to find the reason. (base_bn_splits=8 and base_bn_splits=1 give the same observation; I trained the model in the normal way.)
  2. Why do some layerx.x.bnx.split_bn.running_var and running_mean stay constant throughout the whole training process?
    [Chart: running_mean and running_var of a split_bn layer staying flat over training]
    As the chart above shows, why do running_mean and running_var stay the same along the whole training process?
    Appreciate it.
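My understanding of the sub-batch-norm design used here (hedged; the attribute and method names follow my reading of x3d.py and should be double-checked): during training the split_bn branch normalizes each sub-group of the batch, while eval mode uses the main bn branch. If the split statistics are never folded back into the main BN layers, eval-mode outputs can collapse to a near-constant prediction. A small diagnostic sketch:

import torch

def inspect_bn(model):
    # Compare the running statistics of the per-split BN and the main BN in each
    # sub-batch-norm module (attribute names 'split_bn' and 'bn' are assumptions).
    for name, m in model.named_modules():
        if hasattr(m, 'split_bn') and hasattr(m, 'bn'):
            print(name,
                  'split_bn:', m.split_bn.running_mean.mean().item(),
                  m.split_bn.running_var.mean().item(),
                  '| bn:', m.bn.running_mean.mean().item(),
                  m.bn.running_var.mean().item())

# Before switching to validation, fold the per-split stats into the main BN:
# model.aggregate_sub_bn_stats()    # assumed helper name; verify against this repo
# model.eval()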

x3d.py

I added the following code to the file:

if __name__ == '__main__':
    net = generate_model('S').cuda()
    # print(net)
    from torchsummary import summary
    inputs = torch.rand(8, 3, 10, 112, 112).cuda()
    output = net(inputs)
    print(output.shape)
    summary(net, input_size=(3, 10, 112, 112), batch_size=8, device='cuda')

The code runs successfully except for the summary call. The error report was:

 File "x3d.py", line 382, in <module>
    summary(net,input_size=(3,10,112,112),batch_size=8,device='cuda')
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torchsummary\torchsummary.py", line 72, in summary
    model(*x)
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "x3d.py", line 324, in forward
    x = self.bn1(x)
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py", line 1128, in _call_impl
    result = forward_call(*input, **kwargs)
  File "x3d.py", line 52, in forward
    x = x.view(n // self.num_splits, c * self.num_splits, t, h, w)
RuntimeError: shape '[0, 192, 10, 56, 56]' is invalid for input of size 1505280

I found that the shape of x in forward was (2, 3, 10, 112, 112) instead of (8, 3, 10, 112, 112), and I don't know why.
Do you know why this happens?
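If I recall correctly, torchsummary always forwards a dummy tensor with a batch size of 2 (its batch_size argument only affects the printed numbers), so x arrives as (2, 3, 10, 112, 112); with the default base_bn_splits the split batch-norm then computes n // num_splits == 0, which produces the invalid view. A hedged sketch of two possible workarounds (not tested against this repo):

from torchsummary import summary

# Build the model with a single BN split so a batch of 2 survives the split-BN reshape.
net = generate_model('S', base_bn_splits=1).cuda()
summary(net, input_size=(3, 10, 112, 112), device='cuda')

# Alternatively, torchinfo accepts the full input size including the batch dimension:
# from torchinfo import summary
# summary(net, input_size=(8, 3, 10, 112, 112))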

Performance Comparison

Hi @kkahatapitiya, thanks for your clear reproduction.
I have two questions from testing your code:

  1. What is the specific performance on Kinetics-400? You say it achieves 62.62% Top-1 accuracy (3-view) on Kinetics-400 when trained for ~200k iterations from scratch, but I do not know which version of X3D got this result. How many epochs did you train to get it?

  2. As for the figure below from the original paper, X3D-M is reported at 4.73G FLOPs, but when I measure the X3D-M from this code I get 3.76G FLOPs. Could you please explain the difference?

[Figure: FLOPs table for the X3D variants from the original X3D paper]
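For comparison, a sketch of how the FLOPs could be measured with fvcore; the reported number depends on the assumed input (16 frames at 224x224 for X3D-M in the paper) and on which operators the counter covers, which may account for part of the gap. This is an illustration, not the authors' measurement:

import torch
from fvcore.nn import FlopCountAnalysis
import x3d as resnet_x3d

model = resnet_x3d.generate_model(x3d_version='M', n_classes=400, n_input_channels=3,
                                  dropout=0.5, base_bn_splits=1)
model.eval()
clip = torch.rand(1, 3, 16, 224, 224)       # single view at the paper's X3D-M input size
flops = FlopCountAnalysis(model, clip)
print(flops.total() / 1e9, 'GFLOPs (counted as multiply-adds)')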

Pretrained models

Hello, which configurations of X3D have you trained and included in the repo? (X3D-M, X3D-L, X3D-XL, ...)

Model conversion

Thank you a lot for sharing your implementation. It is really helpful for applying the X3D network to a custom deep learning problem.

The original repo only provides Caffe2 pretrained models. How did you convert them to PyTorch format? (I also want to try other versions of X3D.)
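For what it's worth, the general Caffe2-to-PyTorch pattern is sketched below as a hypothesis, not the authors' actual script: it assumes the Caffe2 checkpoint is a pickle holding a 'blobs' dict of numpy arrays (as in PySlowFast), and the blob-name-to-parameter-name mapping, which is the real work, is left as a placeholder:

import pickle
import torch

with open('x3d_m_caffe2.pkl', 'rb') as f:            # assumed checkpoint file name
    blobs = pickle.load(f, encoding='latin1')['blobs']

def c2_name_to_pt_name(name):
    # Placeholder: map Caffe2 blob names (e.g. 'conv1_w') to this repo's parameter
    # names; this mapping has to be written by hand for each architecture.
    return name

state_dict = {c2_name_to_pt_name(k): torch.from_numpy(v)
              for k, v in blobs.items() if not k.endswith('_momentum')}
torch.save(state_dict, 'x3d_m_converted.pt')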

Training from scratch

Is it possible to train it from scratch? And if so, what dataset format do I have to provide?
