
actionclip's People

Contributors

jiazheng-xing, sallymmx


actionclip's Issues

No train/validation/test list for HMDB51 or Kinetics?

I am looking to reproduce the model results on the HMDB51 dataset, but I cannot find the train/test lists used by the network. Why are train_rgb_split1.txt and val_rgb_split1.txt missing?

If they are just the original HMDB51 split lists appended together, why is there no test set? There is currently no way to properly reproduce the model results.

Fine-tuning on UCF101

Could you release the trained model file that reaches 97% accuracy? I tried fine-tuning on UCF101, but my results fall far short of 97%.

About the pretrained model

Why do I get the following error when I run './scripts/run_test.sh ./configs/k400/k400_test.yaml' with the Kinetics-400 pretrained model the authors provide in readme.md:
model = build_model(state_dict or model.state_dict(), joint=joint,tsm=tsm,T=T,dropout=dropout, emb_dropout=emb_dropout,pretrain=pretrain).to(device)
File "/home/houyf22/lzu/ActionCLIP-master/clip/model.py", line 314, in build_model
vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'

However, there is no error when I use CLIP's original pretrained model. Did the authors not adapt the code to their own pretrained model, or am I missing a step?
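Not an official fix, but a minimal debugging sketch: the KeyError suggests build_model never sees the expected CLIP keys, which commonly happens when a released checkpoint wraps the weights (e.g. under a "model_state_dict" key, with "module." prefixes from DataParallel). The file name and wrapper keys below are assumptions, not confirmed by the authors:

```python
import torch

# Hypothetical path to the downloaded Kinetics-400 checkpoint.
ckpt = torch.load("vit-b-16-32f.pt", map_location="cpu")

# Many training scripts save a dict of dicts; inspect the top-level keys first.
print(list(ckpt.keys())[:10])

# If the weights sit under e.g. "model_state_dict" with a "module." prefix,
# unwrap them before passing the result to build_model().
state_dict = ckpt.get("model_state_dict", ckpt)
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}

# CLIP ViT backbones have "visual.conv1.weight" but no "visual.layer1.*" keys,
# so this check hints whether the unwrapped dict is a ViT state dict.
print("visual.conv1.weight" in state_dict)
```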

Cannot reproduce results

Hello, I tried to reproduce the Kinetics 400 results using the config file k400_test.yaml and the 32-frame ViT-B/16 model. I get the following results: Epoch: [DotMap()/DotMap()]: Top1: 81.32102272727273, Top5: 95.90097402597402. This is slightly lower than the 82.32% and 96.20% provided in the README. Any insights? Thanks.

Also, do you have code for the 10-clip, 3-crop setting used with the best-performing model? If I understand correctly, this setting achieves the 83.8% and 97.1% reported in the paper, is that right?
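Not the authors' code, but a minimal sketch of how multi-view evaluation is commonly done, assuming a `model` that maps a batch of clips to class logits and a `views` tensor built from 10 temporal clips x 3 spatial crops (names and shapes are assumptions):

```python
import torch

@torch.no_grad()
def multi_view_predict(model, views):
    # views: [num_views, C, T, H, W] -- e.g. 30 views from 10 clips x 3 crops.
    scores = torch.softmax(model(views), dim=-1)   # [num_views, num_classes]
    return scores.mean(dim=0)                      # averaged scores, [num_classes]
```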

Dataset preparation

hi sallymmx:

Thanks for open-sourcing this work for everyone to learn from. I have a few questions about preparing the dataset:
1. I looked at how TSN prepares its data; it computes optical flow, but I am not clear on how the RGB frames are prepared. Is every video converted into frames?
2. In train_frames_new.txt for k400, one line is "atasets/kinetics400/data2/extracted_train_frames/riding_elephant/7MNOnIQx5wY_000000_000010 166 269". What do the columns "166 269" mean?
3. In k400_test.yaml, what does num_segments mean? The number of sampled frames?
4. What does seg_length mean? (A sampling sketch follows this list.)
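Not an authoritative answer, but a minimal sketch of the TSN-style sampling these config names usually refer to, assuming num_segments is the number of evenly spaced temporal segments and seg_length is the number of consecutive frames taken from each segment (both are assumptions about this repo's conventions):

```python
import numpy as np

def sample_frame_indices(num_frames, num_segments, seg_length=1):
    # Spread num_segments starting points evenly over the video and take
    # seg_length consecutive frames from each (random offsets are a common
    # training-time variant; this is the deterministic test-time version).
    ticks = np.linspace(0, max(num_frames - seg_length, 0), num_segments).astype(int)
    indices = [t + i for t in ticks for i in range(seg_length)]
    return [min(i, num_frames - 1) for i in indices]

# e.g. an 8-segment, 1-frame-per-segment sample from a 166-frame video
print(sample_frame_indices(166, num_segments=8, seg_length=1))
```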

Cannot reproduce results with the provided config

Hello, I tried your config file k400_ft_tem.yaml but cannot reproduce the ViT-32_8segments result; I get 76.9. I noticed that k400_ft_tem.yaml uses data augmentation while k400_ft_tem_test.yaml does not. Should data augmentation be used? Do you have any idea where the reproduction problem might come from? Thanks.

About input data- video file, prompts

I'm testing several custom video files on the ActionCLIP model, but I want to use input generated by a webcam (an OpenCV video capture instance) rather than a video file path. Does this model support that kind of video data as input? Thank you.
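The repo's data pipeline expects frame paths, but as a rough sketch (not part of ActionCLIP itself) one can buffer webcam frames with OpenCV and stack them into a clip tensor; `preprocess` and `num_segments` below are placeholders for whatever transform and frame count the chosen config uses:

```python
import cv2
import torch

def grab_clip(num_segments, preprocess, cam_index=0):
    """Capture num_segments frames from a webcam and stack them into a clip tensor."""
    cap = cv2.VideoCapture(cam_index)
    frames = []
    while len(frames) < num_segments:
        ok, frame = cap.read()
        if not ok:
            break  # camera unavailable or stream ended
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(preprocess(frame))   # e.g. resize/crop/normalize -> [C, H, W]
    cap.release()
    return torch.stack(frames)             # [num_segments, C, H, W]
```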

About label list

Hi,
Nice work. Could you provide the label lists for the K400 and Charades datasets? Thank you.

About the results under the few-shot setting

Hi, I am interested in how the few-shot accuracies reported in the paper were obtained.
1. Did you re-fine-tune following the usual few-shot paradigm (meta-learning)?
2. In the zero-shot setting one can directly compute the similarity between video features and the label text, but in the few-shot setting each class has a few labeled samples in addition to its label. How do these few samples contribute to the final prediction score?
Looking forward to your reply! 🙏
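Not the authors' protocol, but one common way (purely an assumption here) to use the few labeled support videos is to average their encoded features into class prototypes and mix prototype similarity with the zero-shot text similarity; a minimal sketch assuming all features are already L2-normalized and `alpha` is a hypothetical mixing weight:

```python
import torch

def few_shot_scores(video_feat, text_feats, support_feats_per_class, alpha=0.5):
    """video_feat: [D]; text_feats: [K, D]; support_feats_per_class: list of K tensors [n_k, D]."""
    protos = torch.stack([f.mean(dim=0) for f in support_feats_per_class])  # [K, D]
    protos = protos / protos.norm(dim=-1, keepdim=True)
    text_sim = text_feats @ video_feat    # zero-shot term: similarity to label text
    proto_sim = protos @ video_feat       # few-shot term: similarity to class prototypes
    return alpha * text_sim + (1 - alpha) * proto_sim
```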

About KLLoss

Thanks for your amazing work!
The KLLoss in the implementation ends up divided by the feature dimension (the code multiplies the elementwise-mean loss by batch_size), instead of by the batch size.
The PyTorch docs point out that reduction='batchmean' matches the mathematical definition of KL divergence. I'm writing to ask the reason for this implementation choice.
Thanks in advance.
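For reference, the two reductions differ only by a constant factor; a minimal PyTorch snippet (independent of this repo's KLLoss wrapper) contrasting the default elementwise mean with reduction='batchmean':

```python
import torch
import torch.nn.functional as F

logits_a = torch.randn(4, 400)            # batch of 4, 400 classes
logits_b = torch.randn(4, 400)
log_p = F.log_softmax(logits_a, dim=-1)   # kl_div expects log-probabilities as input
q = F.softmax(logits_b, dim=-1)           # and probabilities as target

mean_loss = F.kl_div(log_p, q, reduction="mean")       # averages over all B*C elements
batchmean = F.kl_div(log_p, q, reduction="batchmean")  # sums over classes, averages over batch
print(batchmean, mean_loss * q.shape[-1])               # equal up to floating-point error
```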

4.5 Comparison with State-of-the-art Methods: the results in the appendix

Hello, your work has benefited me a lot. I read your paper carefully and am very interested in the HMDB51 and UCF101 results in the appendix referenced in Section 4.5!
Where can I find the material from the appendix of your paper?
I look forward to your reply, and happy New Year!

Configuration question

How come there is no file in the config that specifies the dataset paths?

About accuracy

I ran ActionCLIP on my own dataset and the test accuracy keeps jumping around, by more than 20 points.

The details about multi-label video classification (Charades)

It is mentioned in the paper that the method is also effective for multi-classification. “ActionCLIP achieves the top performance of 44.3 mAP, which demonstrates its effectiveness on multi-label video classification."

Could you please explain the details of how multiple categories are handled? Thanks.
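Not the authors' evaluation code, but a common way to handle multi-label classification is to score each class independently with a sigmoid and report mAP; a minimal sketch using scikit-learn's average_precision_score (the logits/labels shapes here are assumptions):

```python
import numpy as np
import torch
from sklearn.metrics import average_precision_score

def multilabel_map(logits, labels):
    """logits: [N, num_classes] raw scores; labels: [N, num_classes] multi-hot 0/1."""
    scores = torch.sigmoid(logits).detach().cpu().numpy()
    labels = labels.cpu().numpy()
    # Average precision per class, then mean over classes that have positives.
    aps = [average_precision_score(labels[:, c], scores[:, c])
           for c in range(labels.shape[1]) if labels[:, c].sum() > 0]
    return float(np.mean(aps))
```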

Looking for help to try ActionCLIP for custom videos

Hello @sallymmx

Great work on this project. I am new to CLIP and ActionCLIP, and I am looking for some help to set up and try out ActionCLIP on some custom videos. Would you be kind enough to do some online tutoring sessions? I know that your time and knowledge are valuable, so I would be happy to pay for your time.

Looking forward to hearing from you. Thanks.

About the bibtex

Hi there,

Thank you for sharing your great work. I am wondering whether this is an accepted paper (according to the bibtex provided in the README) or a pre-print on arXiv? I cannot seem to find the paper by searching for the information in the bibtex.

About tsm

It seems none of the settings apply TSM for the Pre-network Prompt, since I didn't find any config that sets tsm=True.

So is TSM unused in ActionCLIP?

random seed

Hello, thank you for your excellent work! I noticed that there is no random seed set in the code. How can I ensure that the experiment is reproducible?
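The repo indeed does not set a seed; a minimal sketch of the usual PyTorch recipe (full determinism on GPU also depends on cuDNN settings, data-loader workers, and non-deterministic ops):

```python
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trades speed for reproducibility on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```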

Validation set contains strange ids

Hi, thank you for releasing the code.
After checking the validation file, I found ~180 video ids that cannot be found in the official val file, or that cannot be decoded; the videos are listed below:

tCKnYXne_o
H-Ww0gGDWU
Sam59CH5_o
blRZN-6_ZM
gxLOV_s9wU
Q2OO6q6-iM
I3_52Xh7oU
sSTHZHHp-c
l309dqYR-8
ebcuq_qItc
7tTouR10Qro
6uq-NBo3Bk
D-Fa71ta14
ioNctElzaas
PcOAmaZMNZY
4SJ-uWc3PQ
WI7e5-wURs
JN0MXb-zi8
IyR-sGt0uw
UqI-TBQRgg
mMT4Nt_c-A
UxN_uuEZC0
6k2ntyDP-c
u8zlG-OS_E
8IjJv90K-M
fZ0IO-Q-ZQ
rrtLyJs-3w
06oD_bFxOQ
Lo3hFbum_o
a06xpsZj-U
5EvC-g-KUQ
GQ-4QfVpXc
TyWiE-4zpM
esH_aGzBrw
7mJ12n-xyM
LktSuL8_7M
Xfmjzt_n24
ctWolbJDJyc
H8Ny92IEyaM
OnU1Hr_jlY
7TZOYU_Ta0
2DwBhMUH-I
9ILBd-ArtM
LKhtbW5q-c
0-s1eu4sF4
jToAVyxs-g
clhd73_vDQ
LESFP2wh_8
NtUqv_6vdU
I02uj1Sc7TM
I0luMKjIZyg
5vTJ-N4jrI
hR-iDJcjgU
F-aEPmjERo
ITTI-fkvo
C84yBh-fQw
NMaC-IGv_Q
YGwB8HJj-g
SxAU4_1c_o
S6wwANH-EY
hu_Ld-ddk
blq_c14hGQ
isSe2P8T-4
owWHGvn_b0
OoJW-OeFtw
G_lySaTeNM
XH_50Lp8qQ
PBf_Wa6vO8
pUDgyU_KGc
DYPEKYAcEFg
HWxZHHT_l4
IAo-mNduUk
6Xp_ymM0Lg
ZY_EfSlzGA
Zpj4-Z2YRk
c3-qJC_azg
B_pr_4s7vY
sAA809R_u1E
6c-sV_gmq0
QfuO07EqYhI
D6-UmndVJk
eJVHxmkm-4
bVVs-nntQ
RZXH93_XNY
5-dvLrzE78
kZsdc1A_J8
54Bs-0kdhA
Y-fUYGcb7o
5g0IDBneA
yAlJ1P_SGg
AgJx-0yaFQ
P56BlJO-gw
J-8cbYBG7c
aQR-rCWaVQ
7qK_w-g3Y
0IEt9-NeV4
TL-9g8KBFA
PB0FuE-fdA
7Od7A1-B9s
egPJubR-CE
Etym1-30wM
3E3GBXAUc
j3eNzQR-EI
rQuS0w-1b8
JcZ7Ry_9kg
j4Anoe2ug8k
Eq_X-uRNm8
0ML-FXomBw
G2XYLk0-38
RlFMUo-JE
QbBRu7a_xM
CMel_KnSzw
W9AQZ-gUro
W07v4Ci-zY
5sx6NEtkd1E
jToBK-njO8
spJJybwq-g
wvsuK9HBif0
r5c12Eo_jY
FawHl2-DAg
dguKqz_F5w
GBKRR-OvqM
adU_0hUdr8
DWE7WQkBvBc
ieIssRi8iXU
Q_vnBY8YP8
afwq-zVgJk
JP5gc9_J4I
iox_MbwxNg
aGmWWA-h3s
vI8Vp2-gfiU
7-dud_cqq8
o9-ONbnlRw
aVXC13LEJgU
MPqy00mB-Y
d8_H5d2sd4
2C-yeMmge0
FGj7-Cxu_0
3FihEVl-R8
cMG2QyN-mE
UZLHav3t_NQ
t_T_nYKdh8
DNb_6w2cZM
D5-ZGEjiWI
58I5s_qDVU
gmBW-mkRXA
VP-VaZhno0
p1Cftd_xo
73Kg-MKmwE
k0w_3JFfmE
m-YKP0ReEE
pn5NxJmok
c96bD-9fHs
jR-X0LqwpY
96AfwOj-qw
LrBC1_yf04
Oni-SybW0
6yaNVdS-2E
4J-bkpjVb4
v7DhQiuKEd0
n0WAbM8z-o
m-2ka9iN9M
WRh2_MJLLs
3MhOA-vSO8
u5A74I0-M
kahgmRD-4g
b-YkpzFphk
vYfm8bO-TM
gtC_avp2gU
5Vu8HJ_eMg
wdnasc-fCg
xxBx1jv-ks
Ykfa-4qx0
UHRaVrN_us
JwMoMeZ_v8
0ew-c0w7uc
uVv1h-xAe0
u4SSk4kWqLA
2L8B_meOLE
c-YbuFrXbI
Bu-6oESyxQ

I am not sure whether this difference would cause the ~1% performance drop I see in my testing, since I loaded the pretrained model and checked all the testing parameters provided in the config file.

About the zero-shot and few-shot evaluation settings

Hi, for the zero-shot experiments, do you take the model trained on the seen classes as the pretrained model and then use it directly to predict the unseen classes? Is zero-shot testing the same as regular testing except for the test set? Also, how are the sample sets split in the few-shot setting?

About GPUs

How do I set the number of GPUs and specify which GPU to use? Whenever I run an experiment, nobody else in the lab can run theirs.
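Independent of this repo's own config, one way to restrict which GPUs a run can see is to set CUDA_VISIBLE_DEVICES before CUDA is initialized; a minimal sketch:

```python
import os

# Must be set before torch initializes CUDA; GPUs 2 and 3 then appear as cuda:0 and cuda:1.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

import torch
print(torch.cuda.device_count())  # 2
```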

about dataset

Has anyone tried this on their own small dataset? How well does it work?

Pretrained models from baidu?

I have had no luck downloading the model weights. They are hosted on a website called Baidu that is entirely in Chinese, and it wants to install some client that is also entirely in Chinese. Is it possible to add an alternative download link for non-Chinese users?

The checkpoint results do not match the paper

Hello,

ViT-B/16 with 32 frames is reported as 83.8% in the paper, but the GitHub page says 82.32%, and the result I get with this checkpoint is 81.3%.

The ViT-B/32, 8-frame checkpoint also seems to have been uploaded by mistake: its file name and test results are identical to those of ViT-B/16 with 32 frames.

About running the code

The code only runs with an internet connection. How can I set it up to run without one?
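The network access usually comes from clip.load() downloading the backbone weights on first use (if the run also logs to Weights & Biases, setting WANDB_MODE=offline disables that network traffic). A hedged sketch of pointing clip.load at a pre-downloaded copy; download_root is a parameter of openai/CLIP's clip.load, but verify it against the CLIP version pinned by this repo, and the cache path below is a placeholder:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download ViT-B/16 once on a machine with internet (the default cache is ~/.cache/clip),
# copy that directory to the offline machine, then point clip.load at it.
model, preprocess = clip.load("ViT-B/16", device=device, download_root="/path/to/clip_cache")
```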
