acherstyx / cocap

[ICCV 2023] Accurate and Fast Compressed Video Captioning

Home Page: https://arxiv.org/abs/2309.12867

License: MIT License

Python 99.96% Shell 0.04%

compressed-video iccv2023 video-captioning

cocap's Introduction

Hi! I'm Yaojie Shen 👋

I’m currently a Master’s student at the Institute of Software, Chinese Academy of Sciences.
My recent research interests lie in the following fields:

Computer Vision
Multimodality
Large Language Models

Language & Tools

...

Knowledge Management Workflow

Obsidian, Zotero, Things, Notion, ...

cocap's People

uncle-cao xinyuxiao xiaohuang-max bingliangli

cocap's Issues

Training the model on a single GPU

I trained the model on a single NVIDIA RTX 4090 using the default config. However, the results on the test set are significantly worse than those reported in the paper; for example, CIDEr on MSVD drops from 113.0 to 101.5.
I also set the gradient accumulation steps to 32 in the config to match the batch size of 64 used in the paper, but it did not seem to help.
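For reference, gradient accumulation reproduces a large-batch gradient only when the per-step batch size times the accumulation steps (times the number of GPUs) equals the target batch size, and each micro-batch gradient is scaled by 1/steps. The pure-Python sketch below checks both points on a one-parameter linear model; the function names are illustrative and are not cocap config keys. Accumulation does not make components that depend on the per-step batch itself (such as batch normalization statistics) equivalent to a true large batch.

```python
def mse_grad(w, xs, ys):
    """d/dw of mean((w*x - y)**2) over one batch, for a 1-parameter linear model."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch):
    """Gradient accumulation: sum micro-batch gradients, each divided by the step count.

    With equal-sized micro-batches, the result equals the full-batch mean gradient.
    """
    assert len(xs) % micro_batch == 0, "batch must split into equal micro-batches"
    steps = len(xs) // micro_batch
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch, (i + 1) * micro_batch
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


def effective_batch_size(per_step_batch, accumulation_steps, num_gpus=1):
    """Number of samples contributing to one optimizer update."""
    return per_step_batch * accumulation_steps * num_gpus
```

For example, `effective_batch_size(2, 32)` gives 64, so 32 accumulation steps match a batch of 64 only if each step processes 2 samples.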

More information about the environment in which the code runs

I found that some versions of CUDA and PyTorch cause segfaults, dynamic-library loading errors, and similar failures, which makes the results impossible to reproduce. Could the author share a known working environment (i.e., the versions of the required libraries)?
Thank you.
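A quick way to report one's own environment in such a thread is to list installed package versions. The sketch below uses only the standard library; the package names in it are examples, not cocap's official requirements. If PyTorch is installed, `torch.version.cuda` additionally reports the CUDA version PyTorch was built with.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {name: version} for each distribution, plus the Python version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Example package list only; not taken from cocap's requirements.
    for name, ver in collect_versions(["torch", "torchvision", "numpy"]).items():
        print(f"{name}: {ver}")
```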

Inference Script

Hi @acherstyx,

Is there an inference script for running a demo?

Thanks

AssertionError: An object named 'MSRVTTCaptioningDatasetForCLIP' was already registered in 'DATASET' registry!

Hello! Thank you for your contribution.
I ran into a problem during training; here is the error log:

Looking forward to your reply.
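The message has the same wording as the duplicate-name assertion in registries such as fvcore's `Registry`: registering a second object under a name that is already taken fails. One way this can happen is when the module defining the class is executed twice (for example, imported under two different module paths), which re-runs its registration. The minimal registry below is a hypothetical sketch, not cocap's code, and reproduces the check.

```python
class Registry:
    """Minimal name -> object registry that rejects duplicate names (illustrative)."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self, obj):
        name = obj.__name__
        assert name not in self._obj_map, (
            f"An object named '{name}' was already registered in '{self._name}' registry!"
        )
        self._obj_map[name] = obj
        return obj  # returning obj lets register() be used as a class decorator

    def get(self, name):
        return self._obj_map[name]


DATASET = Registry("DATASET")


@DATASET.register
class MSRVTTCaptioningDatasetForCLIP:
    pass


# Registering the same name again (as a second execution of the module would)
# raises: AssertionError: An object named 'MSRVTTCaptioningDatasetForCLIP' ...
```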

About inference speed

Thank you for sharing your code.
Could you provide more details on how inference speed was measured for Fig. 2 and Table 3? I am a bit confused.

In Table 3, the inference time of your model is listed as 178 ms. Does this correspond to generating the caption for a single video file?

Also, are the costs of I/O and frame extraction excluded from these measurements?

Lastly, the videos in MSRVTT have different numbers of frames; how was this handled in Table 3? How many frames per video does your model consider?
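To illustrate the distinction the questions draw, the sketch below times frame extraction and captioning separately per video, with untimed warm-up runs and a median over videos. `decode` and `caption` are hypothetical placeholders, not cocap functions, and this is not the paper's benchmarking protocol. When the model runs on a GPU, `torch.cuda.synchronize()` should be called before each clock read, because CUDA kernels execute asynchronously.

```python
import statistics
import time


def time_stages(videos, decode, caption, warmup=3):
    """Median per-video latency (ms) of decoding and captioning, measured separately.

    `decode` (path -> frames) and `caption` (frames -> text) are hypothetical
    stand-ins for frame extraction and model inference.
    """
    for v in videos[:warmup]:
        caption(decode(v))  # warm-up runs are not timed
    decode_ms, caption_ms = [], []
    for v in videos:
        t0 = time.perf_counter()
        frames = decode(v)
        t1 = time.perf_counter()
        caption(frames)
        t2 = time.perf_counter()
        decode_ms.append((t1 - t0) * 1000)
        caption_ms.append((t2 - t1) * 1000)
    return {
        "decode_ms": statistics.median(decode_ms),
        "caption_ms": statistics.median(caption_ms),
    }
```

Reporting the two stages separately makes it explicit whether I/O and frame extraction are included in a quoted latency.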

Error from cv_reader when using MSVD

I converted the videos using the method you provided, but some videos in a batch failed to load (with num_workers=12, 4 loaded correctly and 8 failed). The failing videos are:

./dataset/msvd/videos_240_h264_keyint_60/Nd45qJn61Dw_0_10.avi
./dataset/msvd/videos_240_h264_keyint_60/5P6UU6m3cqk_57_75.avi
./dataset/msvd/videos_240_h264_keyint_60/PD6eQY7yCfw_32_37.avi
./dataset/msvd/videos_240_h264_keyint_60/77iDIp40m9E_159_181.avi
./dataset/msvd/videos_240_h264_keyint_60/9Wr48VFhZH8_45_50.avi
./dataset/msvd/videos_240_h264_keyint_60/HxRK-WqZ5Gk_30_50.avi
./dataset/msvd/videos_240_h264_keyint_60/UgUFP5baQ9Y_0_7.avi
./dataset/msvd/videos_240_h264_keyint_60/PqSZ89FqpiY_65_75.avi

I also converted these videos to MP4 but got the same error.

Is the problem with the MSVD dataset or with cv_reader? Training on MSRVTT works normally.

Your help will be greatly appreciated.
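To narrow down such failures, it can help to try decoding every file sequentially in a single process, so each failing file and its error are reported individually. In the sketch below, `read_video` is a hypothetical wrapper around whichever decoder the data loader uses (for example, cv_reader); its real API is not shown here.

```python
from pathlib import Path


def find_unreadable(paths, read_video):
    """Try to decode each video; return (filename, error repr) for every failure."""
    failures = []
    for path in paths:
        try:
            read_video(str(path))
        except Exception as exc:  # record any decoder error and keep scanning
            failures.append((Path(path).name, repr(exc)))
    return failures


# Hypothetical usage:
# find_unreadable(sorted(Path("./dataset/msvd/videos_240_h264_keyint_60").glob("*.avi")), read_video)
```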

How can I use the pretrained checkpoints?

Thanks for your work!
Could you tell me how to use the checkpoints you released?

About memory usage

Hello, thank you very much for publishing such high-quality code. When I run it on my own video dataset, the program's memory usage is very high, even on a workstation with 128 GB of RAM. This may be partly due to the size of my videos. Is there any way to reduce RAM usage by modifying the config?
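One rough way to see which settings drive host RAM is to estimate how much decoded video the data pipeline holds at once. Assuming the standard PyTorch `DataLoader`, each worker prefetches `prefetch_factor` batches (default 2), so roughly `num_workers * prefetch_factor` batches can be in memory simultaneously. The back-of-the-envelope sketch below counts only uint8 RGB frames; other inputs, model activations, and Python overhead would add to it, and whether cocap's config exposes each of these knobs is not confirmed here.

```python
def prefetched_frame_bytes(batch_size, frames, height, width, num_workers,
                           prefetch_factor=2, channels=3, bytes_per_value=1):
    """Rough size of decoded frames queued by DataLoader workers.

    Assumes uint8 RGB frames (1 byte per value) and ignores every other input.
    """
    per_video = frames * height * width * channels * bytes_per_value
    batches_in_flight = num_workers * prefetch_factor
    return batches_in_flight * batch_size * per_video
```

With illustrative values (not cocap defaults), `prefetched_frame_bytes(8, 8, 240, 320, num_workers=12)` gives 353,894,400 bytes, about 0.33 GiB; the estimate scales linearly with the number of workers, batch size, frame count, and resolution.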

Pretrained checkpoint

Thanks for your work!
Could you upload the model's pretrained checkpoint file?
I would like to test the weights by captioning input videos.

Thank you

Process gets stuck after one epoch

Hello, thank you very much for your open-source code. I've recently been reproducing your results and applying the code to my own dataset. However, the process hangs after completing one epoch, without reporting any error. I'd appreciate your help. Thank you very much!
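One way to see where a silent hang occurs is the standard library's `faulthandler` module, which can dump the Python stack of every thread after a timeout. The sketch below wraps a code region in such a watchdog; where to place it (for example, around the step that runs after the first epoch) is a hypothetical choice. It only reports the process it runs in, not DataLoader worker subprocesses.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s, file=None):
    """Dump all thread stacks every timeout_s seconds while the block is still running."""
    out = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)
    try:
        yield
    finally:
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage:
# with hang_watchdog(600):
#     run_one_epoch()
```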

Docker environment

Dear sir, I am having trouble installing cv_reader because I do not have sudo permission and my Linux server cannot connect to external networks. Is there a suitable Docker image, or some other way to prepare the environment? Thank you!

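Hello, I'd like to ask about the experimental setup

In the code you provided, it seems you do not evaluate on a validation set. For example, your MSVD_caption.json file appears to contain only train and test splits. Isn't it somewhat unfair to use the test set to select the best training checkpoint? None of the three datasets seems to use a validation set. Looking forward to your reply.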
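A common way to avoid selecting checkpoints on the test set is to hold out part of the training split for validation. The sketch below assumes the training video IDs are already available as a list; reading them from MSVD_caption.json is omitted because the file's layout is not shown here, and `holdout_split` with its 5% default is an illustrative choice.

```python
import random


def holdout_split(train_ids, val_fraction=0.05, seed=0):
    """Split training video IDs into (train, val) reproducibly.

    Only the original training split is divided; the test split stays
    untouched until final evaluation.
    """
    ids = sorted(train_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]
```

Checkpoints would then be chosen by their score on the held-out IDs, and the test split evaluated once with the chosen checkpoint.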