
x3d-multigrid's People

Contributors

kkahatapitiya, piergiaj


x3d-multigrid's Issues

How to test video-level acc?

Hi, I appreciate your beautiful work! @kkahatapitiya Could you tell me how you measured the 71.48% Top-1 accuracy (3-view) on Kinetics-400? Have you open-sourced your video-level accuracy test code? When I test the pretrained model, the accuracy I get is lower than the number you report. (My video-level accuracy averages the predictions over all clips of a test video.)
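For reference, a minimal sketch of such video-level evaluation (not the evaluation code used in this repo; the clip_loader interface yielding (clips, labels, video_ids) and the (N, num_classes) output shape are assumptions):

import torch
import torch.nn.functional as F
from collections import defaultdict

@torch.no_grad()
def video_level_accuracy(model, clip_loader, device='cuda'):
    # Average the softmax scores of every clip/view sampled from a video,
    # then take the argmax per video.
    model.eval()
    scores, labels = defaultdict(float), {}
    for clips, clip_labels, video_ids in clip_loader:
        probs = F.softmax(model(clips.to(device)), dim=1).cpu()   # assumes (N, num_classes) logits
        for p, y, vid in zip(probs, clip_labels, video_ids):
            scores[vid] = scores[vid] + p      # accumulate clip scores per video
            labels[vid] = int(y)
    correct = sum(int(scores[v].argmax()) == labels[v] for v in scores)
    return correct / max(len(scores), 1)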

num_samples error

Hi,

When I run train_x3d_charades.py, I get the following error:

raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

I'm using the same dataset as in the code (Charades_v1_rgb). Do you have any suggestions?
Thank you.

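For what it's worth, this ValueError means the constructed Dataset has length zero, which usually points to a wrong data root or annotation path rather than a sampler bug. A quick diagnostic sketch (with assumed paths, not part of the repo):

import os

CHARADES_ROOT = '/path/to/Charades_v1_rgb'      # assumed location of the extracted frames
print('root exists:', os.path.isdir(CHARADES_ROOT))
if os.path.isdir(CHARADES_ROOT):
    print('num video folders:', len(os.listdir(CHARADES_ROOT)))

# After train_x3d_charades.py builds its Dataset object, confirm it is non-empty
# before it is wrapped in a DataLoader:
# assert len(dataset) > 0, 'Dataset is empty -- check the data root and annotation paths'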

How to set the hyperparameters when doing validation?

Hi, thanks a lot for sharing your implementation! I want to use your pretrained model for validation. If I only have one GPU, how should I modify the hyperparameters, especially the base_bn_splits argument used in generate_model? I would also like to know whether the model named "x3d_multigrid_kinetics_fb_pretrained.pt" was converted from the model provided by Facebook. Looking forward to your reply.
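A sketch of what single-GPU validation could look like, assuming the generate_model signature quoted elsewhere in these issues and an aggregate_sub_bn_stats() helper on the model for folding split-BN statistics back into the main BN layers (both should be verified against x3d.py):

import torch
import x3d as resnet_x3d

model = resnet_x3d.generate_model(x3d_version='M', n_classes=400, n_input_channels=3,
                                  dropout=0.5, base_bn_splits=1)   # 1 split: plain BN on one GPU
state = torch.load('x3d_multigrid_kinetics_fb_pretrained.pt', map_location='cpu')
model.load_state_dict(state, strict=False)   # the checkpoint may be wrapped in an extra dict key
if hasattr(model, 'aggregate_sub_bn_stats'):
    model.aggregate_sub_bn_stats()           # sync sub-BN running stats before eval
model.eval()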

Dataset generation

How to generate the following files?
KINETICS_TRAIN_ANNO
KINETICS_VAL_ANNO
KINETICS_CLASS_LABELS
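The exact format these constants point to is defined by this repo's dataset loader, so the following is only a hypothetical sketch of building a class-label list and per-split annotation files from the official Kinetics CSVs (columns label, youtube_id, time_start, time_end, split); check the dataset class for the format it actually parses:

import csv
import json

def load_rows(csv_path):
    with open(csv_path) as f:
        return list(csv.DictReader(f))

def build_anno(rows, classes, out_path):
    # Map each trimmed clip id to its label index, mirroring the Kinetics naming scheme.
    anno = {'%s_%06d_%06d' % (r['youtube_id'], int(r['time_start']), int(r['time_end'])):
            classes.index(r['label']) for r in rows}
    with open(out_path, 'w') as f:
        json.dump(anno, f)

train_rows = load_rows('kinetics400_train.csv')
val_rows = load_rows('kinetics400_val.csv')
classes = sorted({r['label'] for r in train_rows})   # reuse the train label list for both splits

with open('kinetics400_labels.txt', 'w') as f:              # e.g. KINETICS_CLASS_LABELS
    f.write('\n'.join(classes))
build_anno(train_rows, classes, 'kinetics400_train.json')   # e.g. KINETICS_TRAIN_ANNO
build_anno(val_rows, classes, 'kinetics400_val.json')       # e.g. KINETICS_VAL_ANNO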

X3D No Multigrid

I am planning to take your implementation in x3d.py and use it in my own training environment to train X3D with a constant batch size. I don't want to use any of the multigrid features, and I will be using my own dataloaders, datasets, and so on.
In the model instantiation snippet below, I am unsure about one parameter:

x3d = resnet_x3d.generate_model(x3d_version=X3D_VERSION, n_classes=400, n_input_channels=3,
                                dropout=0.5, base_bn_splits=BASE_BS_PER_GPU//CONST_BN_SIZE)

What is base_bn_splits? If I use a single GPU and a constant batch size, what value do I need to give this parameter? Thanks a lot! @kkahatapitiya
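My reading of x3d.py (please verify): base_bn_splits controls how many sub-groups the per-GPU batch is divided into inside the split batch-norm layers that multigrid training relies on. With a single GPU and a constant batch size, a value of 1 should make every BN layer normalize over the whole batch, i.e. behave like ordinary BatchNorm:

import x3d as resnet_x3d

X3D_VERSION = 'M'    # whichever variant is being trained
x3d = resnet_x3d.generate_model(x3d_version=X3D_VERSION, n_classes=400,
                                n_input_channels=3, dropout=0.5,
                                base_bn_splits=1)   # no multigrid: one BN split per GPU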

Changing input clip length

Good day!

I am having trouble finding where to specify the input clip length when defining the X3D model. Currently I am aiming to change the number of input frames (the temporal duration parameter) to 20 for X3D-M training, so that the input clip (with the gamma_tau sampling stride) is read at 10 FPS.
Please provide some insight on how that can be achieved.
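As far as I can tell (worth verifying), the network in x3d.py pools over time at the head, so the clip length is determined by the data pipeline and the sampling constants in the training script rather than by generate_model. A small sketch with illustrative names:

import torch
import x3d as resnet_x3d

CLIP_LEN = 20        # desired number of input frames (illustrative constant)
# gamma_tau (the frame-sampling stride) would likewise be set where clips are decoded,
# e.g. in the dataloader, not in the model definition.

model = resnet_x3d.generate_model(x3d_version='M', n_classes=400, n_input_channels=3,
                                  dropout=0.5, base_bn_splits=1)
clip = torch.rand(2, 3, CLIP_LEN, 224, 224)   # (N, C, T, H, W) for X3D-M crops
out = model(clip)
print(out.shape)      # the temporal dimension is pooled away inside the network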

Why does eval mode degenerate?

Thanks for your clean implementation! @kkahatapitiya
I have two problems to consult you about:

  1. I find that the predictions in eval mode are always the same after I finish training X3D on the Kinetics-200 dataset, but they are normal if I run inference with model.train(). I have failed to find the reason. (base_bn_splits=8 and base_bn_splits=1 give the same observation; I trained the model in the normal way.)
  2. Why do some layerx.x.bnx.split_bn.running_var and running_mean stay constant throughout the whole training process?
    [Chart: running_mean and running_var of a split_bn layer staying flat over training]
    As the chart above shows, why do running_mean and running_var stay the same along the whole training process?
    Appreciate it.
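My understanding of the sub-batch-norm design used here (hedged; the attribute and method names follow my reading of x3d.py and should be double-checked): during training the split_bn branch normalizes each sub-group of the batch, while eval mode uses the main bn branch. If the split statistics are never folded back into the main BN layers, eval-mode outputs can collapse to a near-constant prediction. A small diagnostic sketch:

import torch

def inspect_bn(model):
    # Compare the running statistics of the per-split BN and the main BN in each
    # sub-batch-norm module (attribute names 'split_bn' and 'bn' are assumptions).
    for name, m in model.named_modules():
        if hasattr(m, 'split_bn') and hasattr(m, 'bn'):
            print(name,
                  'split_bn:', m.split_bn.running_mean.mean().item(),
                  m.split_bn.running_var.mean().item(),
                  '| bn:', m.bn.running_mean.mean().item(),
                  m.bn.running_var.mean().item())

# Before switching to validation, fold the per-split stats into the main BN:
# model.aggregate_sub_bn_stats()    # assumed helper name; verify against this repo
# model.eval()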

x3d.py

I added the following code to the file:

if __name__ == '__main__':
    net = generate_model('S').cuda()
    # print(net)
    from torchsummary import summary
    inputs = torch.rand(8, 3, 10, 112, 112).cuda()
    output = net(inputs)
    print(output.shape)
    summary(net, input_size=(3, 10, 112, 112), batch_size=8, device='cuda')

The code runs successfully except for the summary call. The error report was:

 File "x3d.py", line 382, in <module>
    summary(net,input_size=(3,10,112,112),batch_size=8,device='cuda')
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torchsummary\torchsummary.py", line 72, in summary
    model(*x)
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "x3d.py", line 324, in forward
    x = self.bn1(x)
  File "D:\software\program\Anaconda3\envs\pytorch1\lib\site-packages\torch\nn\modules\module.py", line 1128, in _call_impl
    result = forward_call(*input, **kwargs)
  File "x3d.py", line 52, in forward
    x = x.view(n // self.num_splits, c * self.num_splits, t, h, w)
RuntimeError: shape '[0, 192, 10, 56, 56]' is invalid for input of size 1505280

I found that the shape of x in forward was (2, 3, 10, 112, 112) instead of (8, 3, 10, 112, 112), and I don't know why.
Do you know why this happens?
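If I recall correctly, torchsummary always forwards a dummy tensor with a batch size of 2 (its batch_size argument only affects the printed numbers), so x arrives as (2, 3, 10, 112, 112); with the default base_bn_splits the split batch-norm then computes n // num_splits == 0, which produces the invalid view. A hedged sketch of two possible workarounds (not tested against this repo):

from torchsummary import summary

# Build the model with a single BN split so a batch of 2 survives the split-BN reshape.
net = generate_model('S', base_bn_splits=1).cuda()
summary(net, input_size=(3, 10, 112, 112), device='cuda')

# Alternatively, torchinfo accepts the full input size including the batch dimension:
# from torchinfo import summary
# summary(net, input_size=(8, 3, 10, 112, 112))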

Performance Comparison

Hi @kkahatapitiya, thanks for your clear reproduction.
I have two questions from testing your code:

  1. What is the specific performance on Kinetics-400? You say it achieves 62.62% Top-1 accuracy (3-view) on Kinetics-400 when trained for ~200k iterations from scratch, but I do not know which version of X3D got this result. How many epochs did you train to get it?

  2. As for the figure below from the original paper, X3D-M is reported at 4.73G FLOPs, but when I measure the X3D-M from this code I get 3.76G FLOPs. Could you please explain the difference?

[Figure: FLOPs table for the X3D variants from the original X3D paper]
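For comparison, a sketch of how the FLOPs could be measured with fvcore; the reported number depends on the assumed input (16 frames at 224x224 for X3D-M in the paper) and on which operators the counter covers, which may account for part of the gap. This is an illustration, not the authors' measurement:

import torch
from fvcore.nn import FlopCountAnalysis
import x3d as resnet_x3d

model = resnet_x3d.generate_model(x3d_version='M', n_classes=400, n_input_channels=3,
                                  dropout=0.5, base_bn_splits=1)
model.eval()
clip = torch.rand(1, 3, 16, 224, 224)       # single view at the paper's X3D-M input size
flops = FlopCountAnalysis(model, clip)
print(flops.total() / 1e9, 'GFLOPs (counted as multiply-adds)')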

Pretrained models

Hello, which configurations of X3D have you trained and included in the repo? (X3D-M, X3D-L, X3D-XL, ...)

Model conversion

Thank you a lot for sharing your implementation. It is really helpful for applying the X3D network to a custom deep learning problem.

The original repo only provides Caffe2 pretrained models. How did you convert them to PyTorch format? (I also want to try other versions of X3D.)
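For what it's worth, the general Caffe2-to-PyTorch pattern is sketched below as a hypothesis, not the authors' actual script: it assumes the Caffe2 checkpoint is a pickle holding a 'blobs' dict of numpy arrays (as in PySlowFast), and the blob-name-to-parameter-name mapping, which is the real work, is left as a placeholder:

import pickle
import torch

with open('x3d_m_caffe2.pkl', 'rb') as f:            # assumed checkpoint file name
    blobs = pickle.load(f, encoding='latin1')['blobs']

def c2_name_to_pt_name(name):
    # Placeholder: map Caffe2 blob names (e.g. 'conv1_w') to this repo's parameter
    # names; this mapping has to be written by hand for each architecture.
    return name

state_dict = {c2_name_to_pt_name(k): torch.from_numpy(v)
              for k, v in blobs.items() if not k.endswith('_momentum')}
torch.save(state_dict, 'x3d_m_converted.pt')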

Training from scratch

Is it possible to train it from scratch? And if so, what dataset format do I have to provide?
