
actionclip's People

Contributors

jiazheng-xing, sallymmx


actionclip's Issues

No train/validation/test list for HMDB51 or Kinetics?

I am looking to reproduce the model results on the HMDB51 dataset, but I cannot find the train/test lists used by the network. Why are train_rgb_split1.txt and val_rgb_split1.txt missing?

If they are just the original HMDB51 split lists appended together, why is there no test set? There is currently no way to properly reproduce the model results.

Fine-tuning on UCF101

Could you release the trained model file that reaches 97% accuracy? I tried fine-tuning on UCF101, but my results fall far short of 97%.

About the pretrained model

Why do I get the following error when I run './scripts/run_test.sh ./configs/k400/k400_test.yaml' with the Kinetics-400 pretrained model the authors provide in readme.md:
model = build_model(state_dict or model.state_dict(), joint=joint,tsm=tsm,T=T,dropout=dropout, emb_dropout=emb_dropout,pretrain=pretrain).to(device)
File "/home/houyf22/lzu/ActionCLIP-master/clip/model.py", line 314, in build_model
vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'

However, there is no error when I use CLIP's original pretrained model. Did the authors not adapt the code to their own pretrained model, or am I missing a step?
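Not an official fix, but a minimal debugging sketch: the KeyError suggests build_model never sees the expected CLIP keys, which commonly happens when a released checkpoint wraps the weights (e.g. under a "model_state_dict" key, with "module." prefixes from DataParallel). The file name and wrapper keys below are assumptions, not confirmed by the authors:

```python
import torch

# Hypothetical path to the downloaded Kinetics-400 checkpoint.
ckpt = torch.load("vit-b-16-32f.pt", map_location="cpu")

# Many training scripts save a dict of dicts; inspect the top-level keys first.
print(list(ckpt.keys())[:10])

# If the weights sit under e.g. "model_state_dict" with a "module." prefix,
# unwrap them before passing the result to build_model().
state_dict = ckpt.get("model_state_dict", ckpt)
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}

# CLIP ViT backbones have "visual.conv1.weight" but no "visual.layer1.*" keys,
# so this check hints whether the unwrapped dict is a ViT state dict.
print("visual.conv1.weight" in state_dict)
```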

Cannot reproduce results

Hello, I tried to reproduce the Kinetics 400 results using the config file k400_test.yaml and the 32-frame ViT-B/16 model. I get the following results: Epoch: [DotMap()/DotMap()]: Top1: 81.32102272727273, Top5: 95.90097402597402. This is slightly lower than the 82.32% and 96.20% provided in the README. Any insights? Thanks.

Also, do you have code for the 10-clip, 3-crop setting used with the best-performing model? If I understand correctly, this setting achieves the 83.8% and 97.1% reported in the paper, is that right?
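Not the authors' code, but a minimal sketch of how multi-view evaluation is commonly done, assuming a `model` that maps a batch of clips to class logits and a `views` tensor built from 10 temporal clips x 3 spatial crops (names and shapes are assumptions):

```python
import torch

@torch.no_grad()
def multi_view_predict(model, views):
    # views: [num_views, C, T, H, W] -- e.g. 30 views from 10 clips x 3 crops.
    scores = torch.softmax(model(views), dim=-1)   # [num_views, num_classes]
    return scores.mean(dim=0)                      # averaged scores, [num_classes]
```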

Dataset preparation

hi sallymmx:

Thanks for open-sourcing this work for everyone to learn from. I have a few questions about preparing the dataset:
1. I looked at how TSN prepares its data; it computes optical flow, but I am not clear on how the RGB frames are prepared. Is every video converted into frames?
2. In train_frames_new.txt for k400, one line is "atasets/kinetics400/data2/extracted_train_frames/riding_elephant/7MNOnIQx5wY_000000_000010 166 269". What do the columns "166 269" mean?
3. In k400_test.yaml, what does num_segments mean? The number of sampled frames?
4. What does seg_length mean? (A sampling sketch follows this list.)
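Not an authoritative answer, but a minimal sketch of the TSN-style sampling these config names usually refer to, assuming num_segments is the number of evenly spaced temporal segments and seg_length is the number of consecutive frames taken from each segment (both are assumptions about this repo's conventions):

```python
import numpy as np

def sample_frame_indices(num_frames, num_segments, seg_length=1):
    # Spread num_segments starting points evenly over the video and take
    # seg_length consecutive frames from each (random offsets are a common
    # training-time variant; this is the deterministic test-time version).
    ticks = np.linspace(0, max(num_frames - seg_length, 0), num_segments).astype(int)
    indices = [t + i for t in ticks for i in range(seg_length)]
    return [min(i, num_frames - 1) for i in indices]

# e.g. an 8-segment, 1-frame-per-segment sample from a 166-frame video
print(sample_frame_indices(166, num_segments=8, seg_length=1))
```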

Cannot reproduce results with the provided config

Hello, I tried your config file k400_ft_tem.yaml but cannot reproduce the ViT-32_8segments result; I get 76.9. I noticed that k400_ft_tem.yaml uses data augmentation while k400_ft_tem_test.yaml does not. Should data augmentation be used? Do you have any idea where the reproduction problem might come from? Thanks.

About input data- video file, prompts

I'm testing several custom video files on the ActionCLIP model, but I want to use input generated by a webcam (an OpenCV video capture instance) rather than a video file path. Does this model support that kind of video data as input? Thank you.
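The repo's data pipeline expects frame paths, but as a rough sketch (not part of ActionCLIP itself) one can buffer webcam frames with OpenCV and stack them into a clip tensor; `preprocess` and `num_segments` below are placeholders for whatever transform and frame count the chosen config uses:

```python
import cv2
import torch

def grab_clip(num_segments, preprocess, cam_index=0):
    """Capture num_segments frames from a webcam and stack them into a clip tensor."""
    cap = cv2.VideoCapture(cam_index)
    frames = []
    while len(frames) < num_segments:
        ok, frame = cap.read()
        if not ok:
            break  # camera unavailable or stream ended
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(preprocess(frame))   # e.g. resize/crop/normalize -> [C, H, W]
    cap.release()
    return torch.stack(frames)             # [num_segments, C, H, W]
```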

About label list

Hi,
Nice work. Could you provide the label lists for the K400 and Charades datasets? Thank you.

About the results under the few-shot setting

Hi, I am interested in how the few-shot accuracies reported in the paper were obtained.
1. Did you re-fine-tune following the usual few-shot paradigm (meta-learning)?
2. In the zero-shot setting one can directly compute the similarity between video features and the label text, but in the few-shot setting each class has a few labeled samples in addition to its label. How do these few samples contribute to the final prediction score?
Looking forward to your reply! 🙏
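Not the authors' protocol, but one common way (purely an assumption here) to use the few labeled support videos is to average their encoded features into class prototypes and mix prototype similarity with the zero-shot text similarity; a minimal sketch assuming all features are already L2-normalized and `alpha` is a hypothetical mixing weight:

```python
import torch

def few_shot_scores(video_feat, text_feats, support_feats_per_class, alpha=0.5):
    """video_feat: [D]; text_feats: [K, D]; support_feats_per_class: list of K tensors [n_k, D]."""
    protos = torch.stack([f.mean(dim=0) for f in support_feats_per_class])  # [K, D]
    protos = protos / protos.norm(dim=-1, keepdim=True)
    text_sim = text_feats @ video_feat    # zero-shot term: similarity to label text
    proto_sim = protos @ video_feat       # few-shot term: similarity to class prototypes
    return alpha * text_sim + (1 - alpha) * proto_sim
```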

About KLLoss

Thanks for your amazing work!
The KLLoss in the implementation ends up divided by the feature dimension (the code multiplies the elementwise-mean loss by batch_size), instead of by the batch size.
The PyTorch docs point out that reduction='batchmean' matches the mathematical definition of KL divergence. I'm writing to ask the reason for this implementation choice.
Thanks in advance.
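For reference, the two reductions differ only by a constant factor; a minimal PyTorch snippet (independent of this repo's KLLoss wrapper) contrasting the default elementwise mean with reduction='batchmean':

```python
import torch
import torch.nn.functional as F

logits_a = torch.randn(4, 400)            # batch of 4, 400 classes
logits_b = torch.randn(4, 400)
log_p = F.log_softmax(logits_a, dim=-1)   # kl_div expects log-probabilities as input
q = F.softmax(logits_b, dim=-1)           # and probabilities as target

mean_loss = F.kl_div(log_p, q, reduction="mean")       # averages over all B*C elements
batchmean = F.kl_div(log_p, q, reduction="batchmean")  # sums over classes, averages over batch
print(batchmean, mean_loss * q.shape[-1])               # equal up to floating-point error
```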

4.5 Comparison with State-of-the-art Methods: the results in the appendix

Hello, your work has benefited me a lot. I read your paper carefully and am very interested in the HMDB51 and UCF101 results in the appendix referenced in Section 4.5!
Where can I find the material from the appendix of your paper?
I look forward to your reply, and happy New Year!

Configuration question

How come there is no file in the config that specifies the dataset paths?

About accuracy

I ran ActionCLIP on my own dataset and the test accuracy keeps jumping around, by more than 20 points.

The details about multi-label video classification (Charades)

It is mentioned in the paper that the method is also effective for multi-classification. “ActionCLIP achieves the top performance of 44.3 mAP, which demonstrates its effectiveness on multi-label video classification."

Could you please explain the details of how multiple categories are handled? Thanks.
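Not the authors' evaluation code, but a common way to handle multi-label classification is to score each class independently with a sigmoid and report mAP; a minimal sketch using scikit-learn's average_precision_score (the logits/labels shapes here are assumptions):

```python
import numpy as np
import torch
from sklearn.metrics import average_precision_score

def multilabel_map(logits, labels):
    """logits: [N, num_classes] raw scores; labels: [N, num_classes] multi-hot 0/1."""
    scores = torch.sigmoid(logits).detach().cpu().numpy()
    labels = labels.cpu().numpy()
    # Average precision per class, then mean over classes that have positives.
    aps = [average_precision_score(labels[:, c], scores[:, c])
           for c in range(labels.shape[1]) if labels[:, c].sum() > 0]
    return float(np.mean(aps))
```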

Looking for help to try ActionCLIP for custom videos

Hello @sallymmx

Great work on this project. I am new to CLIP and ActionCLIP, and I am looking for some help to set up and try out ActionCLIP on some custom videos. Would you be kind enough to do some online tutoring sessions? I know that your time and knowledge are valuable, so I would be happy to pay for your time.

Looking forward to hearing from you. Thanks.

About the bibtex

Hi there,

Thank you for sharing your great work. I am wondering whether this is an accepted paper (according to the bibtex provided in the README) or a pre-print on arXiv? I cannot seem to find the paper by searching for the information in the bibtex.

About tsm

It seems none of the settings apply TSM for the Pre-network Prompt, since I didn't find any config that sets tsm=True.

So is TSM unused in ActionCLIP?

random seed

Hello, thank you for your excellent work! I noticed that there is no random seed set in the code. How can I ensure that the experiment is reproducible?
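The repo indeed does not set a seed; a minimal sketch of the usual PyTorch recipe (full determinism on GPU also depends on cuDNN settings, data-loader workers, and non-deterministic ops):

```python
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trades speed for reproducibility on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```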

Validation set contains strange ids

Hi, thank you for releasing the code.
After checking the validation file, I found ~180 video ids that cannot be found in the official val file, or that cannot be decoded; the videos are listed below:

tCKnYXne_o
H-Ww0gGDWU
Sam59CH5_o
blRZN-6_ZM
gxLOV_s9wU
Q2OO6q6-iM
I3_52Xh7oU
sSTHZHHp-c
l309dqYR-8
ebcuq_qItc
7tTouR10Qro
6uq-NBo3Bk
D-Fa71ta14
ioNctElzaas
PcOAmaZMNZY
4SJ-uWc3PQ
WI7e5-wURs
JN0MXb-zi8
IyR-sGt0uw
UqI-TBQRgg
mMT4Nt_c-A
UxN_uuEZC0
6k2ntyDP-c
u8zlG-OS_E
8IjJv90K-M
fZ0IO-Q-ZQ
rrtLyJs-3w
06oD_bFxOQ
Lo3hFbum_o
a06xpsZj-U
5EvC-g-KUQ
GQ-4QfVpXc
TyWiE-4zpM
esH_aGzBrw
7mJ12n-xyM
LktSuL8_7M
Xfmjzt_n24
ctWolbJDJyc
H8Ny92IEyaM
OnU1Hr_jlY
7TZOYU_Ta0
2DwBhMUH-I
9ILBd-ArtM
LKhtbW5q-c
0-s1eu4sF4
jToAVyxs-g
clhd73_vDQ
LESFP2wh_8
NtUqv_6vdU
I02uj1Sc7TM
I0luMKjIZyg
5vTJ-N4jrI
hR-iDJcjgU
F-aEPmjERo
ITTI-fkvo
C84yBh-fQw
NMaC-IGv_Q
YGwB8HJj-g
SxAU4_1c_o
S6wwANH-EY
hu_Ld-ddk
blq_c14hGQ
isSe2P8T-4
owWHGvn_b0
OoJW-OeFtw
G_lySaTeNM
XH_50Lp8qQ
PBf_Wa6vO8
pUDgyU_KGc
DYPEKYAcEFg
HWxZHHT_l4
IAo-mNduUk
6Xp_ymM0Lg
ZY_EfSlzGA
Zpj4-Z2YRk
c3-qJC_azg
B_pr_4s7vY
sAA809R_u1E
6c-sV_gmq0
QfuO07EqYhI
D6-UmndVJk
eJVHxmkm-4
bVVs-nntQ
RZXH93_XNY
5-dvLrzE78
kZsdc1A_J8
54Bs-0kdhA
Y-fUYGcb7o
5g0IDBneA
yAlJ1P_SGg
AgJx-0yaFQ
P56BlJO-gw
J-8cbYBG7c
aQR-rCWaVQ
7qK_w-g3Y
0IEt9-NeV4
TL-9g8KBFA
PB0FuE-fdA
7Od7A1-B9s
egPJubR-CE
Etym1-30wM
3E3GBXAUc
j3eNzQR-EI
rQuS0w-1b8
JcZ7Ry_9kg
j4Anoe2ug8k
Eq_X-uRNm8
0ML-FXomBw
G2XYLk0-38
RlFMUo-JE
QbBRu7a_xM
CMel_KnSzw
W9AQZ-gUro
W07v4Ci-zY
5sx6NEtkd1E
jToBK-njO8
spJJybwq-g
wvsuK9HBif0
r5c12Eo_jY
FawHl2-DAg
dguKqz_F5w
GBKRR-OvqM
adU_0hUdr8
DWE7WQkBvBc
ieIssRi8iXU
Q_vnBY8YP8
afwq-zVgJk
JP5gc9_J4I
iox_MbwxNg
aGmWWA-h3s
vI8Vp2-gfiU
7-dud_cqq8
o9-ONbnlRw
aVXC13LEJgU
MPqy00mB-Y
d8_H5d2sd4
2C-yeMmge0
FGj7-Cxu_0
3FihEVl-R8
cMG2QyN-mE
UZLHav3t_NQ
t_T_nYKdh8
DNb_6w2cZM
D5-ZGEjiWI
58I5s_qDVU
gmBW-mkRXA
VP-VaZhno0
p1Cftd_xo
73Kg-MKmwE
k0w_3JFfmE
m-YKP0ReEE
pn5NxJmok
c96bD-9fHs
jR-X0LqwpY
96AfwOj-qw
LrBC1_yf04
Oni-SybW0
6yaNVdS-2E
4J-bkpjVb4
v7DhQiuKEd0
n0WAbM8z-o
m-2ka9iN9M
WRh2_MJLLs
3MhOA-vSO8
u5A74I0-M
kahgmRD-4g
b-YkpzFphk
vYfm8bO-TM
gtC_avp2gU
5Vu8HJ_eMg
wdnasc-fCg
xxBx1jv-ks
Ykfa-4qx0
UHRaVrN_us
JwMoMeZ_v8
0ew-c0w7uc
uVv1h-xAe0
u4SSk4kWqLA
2L8B_meOLE
c-YbuFrXbI
Bu-6oESyxQ

I am not sure whether this difference would cause the ~1% performance drop I see in my testing, since I loaded the pretrained model and checked all the testing parameters provided in the config file.

About the zero-shot and few-shot evaluation settings

Hi, for the zero-shot experiments, do you take the model trained on the seen classes as the pretrained model and then use it directly to predict the unseen classes? Is zero-shot testing the same as regular testing except for the test set? Also, how are the sample sets split in the few-shot setting?

About GPUs

How do I set the number of GPUs and specify which GPU to use? Whenever I run an experiment, nobody else in the lab can run theirs.
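Independent of this repo's own config, one way to restrict which GPUs a run can see is to set CUDA_VISIBLE_DEVICES before CUDA is initialized; a minimal sketch:

```python
import os

# Must be set before torch initializes CUDA; GPUs 2 and 3 then appear as cuda:0 and cuda:1.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

import torch
print(torch.cuda.device_count())  # 2
```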

about dataset

Has anyone tried this on their own small dataset? How well does it work?

Pretrained models from baidu?

I have had no luck downloading the model weights. They are hosted on a website called Baidu that is entirely in Chinese, and it wants to install some client that is also entirely in Chinese. Is it possible to add an alternative download link for non-Chinese users?

The checkpoint results do not match the paper

Hello,

ViT-B/16 with 32 frames is reported as 83.8% in the paper, but the GitHub page says 82.32%, and the result I get with this checkpoint is 81.3%.

The ViT-B/32, 8-frame checkpoint also seems to have been uploaded by mistake: its file name and test results are identical to those of ViT-B/16 with 32 frames.

About running the code

The code only runs with an internet connection. How can I set it up to run without one?
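The network access usually comes from clip.load() downloading the backbone weights on first use (if the run also logs to Weights & Biases, setting WANDB_MODE=offline disables that network traffic). A hedged sketch of pointing clip.load at a pre-downloaded copy; download_root is a parameter of openai/CLIP's clip.load, but verify it against the CLIP version pinned by this repo, and the cache path below is a placeholder:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download ViT-B/16 once on a machine with internet (the default cache is ~/.cache/clip),
# copy that directory to the offline machine, then point clip.load at it.
model, preprocess = clip.load("ViT-B/16", device=device, download_root="/path/to/clip_cache")
```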
