acherstyx / cocap

[ICCV 2023] Accurate and Fast Compressed Video Captioning

Home Page: https://arxiv.org/abs/2309.12867

License: MIT License

Python 99.96% Shell 0.04%
compressed-video iccv2023 video-captioning

cocap's Introduction

Hi! I'm Yaojie Shen 👋

I’m currently a Master’s student at the Institute of Software, Chinese Academy of Sciences.
My recent research interests lie in the fields of:

  • Computer Vision
  • Multimodality
  • Large Language Models

Language & Tools

Python, PyTorch, Git, CMake, ...

Knowledge Management Workflow

Obsidian, Zotero, Things, Notion, ...

cocap's People

Contributors

acherstyx

cocap's Issues

Train the model on a single GPU

I trained the model on a single Nvidia RTX 4090 using the default config settings. However, the results on the test set are significantly worse than those reported in the paper; for example, CIDEr on the MSVD dataset drops from 113.0 to 101.5.
I also set the accumulation step to 32 in the config to match the batch size of 64 used in the paper, but it did not seem to help.
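
The relationship between the per-step batch size and gradient accumulation can be checked with a short calculation. The sketch below is only an illustration: the function name and the per-step batch size of 2 are assumptions, not values taken from the repository's config.

```python
def effective_batch_size(per_step_batch: int, accumulation_steps: int, num_gpus: int = 1) -> int:
    """Number of samples that contribute to one optimizer update."""
    return per_step_batch * accumulation_steps * num_gpus


# Hypothetical single-GPU setting: 2 samples per step, 32 accumulation steps.
print(effective_batch_size(per_step_batch=2, accumulation_steps=32))  # 64
```

If the per-step batch size in the default config is larger than 2, an accumulation step of 32 would give an effective batch size above 64, which is worth ruling out before comparing against the paper.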

More information about the environment in which the code runs

I found that some specific versions of CUDA and PyTorch can cause segfaults, dynamic link library errors, and similar failures, which makes the results impossible to reproduce. I hope the author can provide details of a working environment (the versions of the required libraries).
Thank you.
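
When reporting or comparing environments, it can help to print the installed versions in one place. The following is a minimal sketch using only the Python standard library; the package list is a hypothetical example and is not taken from the repository's requirements.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version."""
    report = {"python": sys.version.split()[0], "platform": platform.platform()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Hypothetical package list; adjust it to the project's actual requirements.
    for key, value in collect_versions(["torch", "torchvision", "numpy"]).items():
        print(f"{key}: {value}")
```

This does not capture the CUDA toolkit itself. When PyTorch is installed, the CUDA version it was built against is available as `torch.version.cuda`.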

About inference speed

Thank you for sharing your code.
Could you please provide additional details regarding the inference speed calculation in Fig. 2 and Table 3? I am a bit confused.

Regarding Table 3, where the inference time for your model is listed as 178 ms, could you specify whether this is the time needed to generate a caption for one video file?

Additionally, I would appreciate clarification on whether the time costs of IO operations and frame extraction are excluded from these calculations.

Lastly, the videos in MSRVTT have different numbers of frames, so how was this handled in Table 3? For your model, how many frames per video are considered?
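
One way to make such numbers comparable is to time loading (IO and decoding) separately from model inference. The sketch below shows the idea with stand-in functions; `time_stages`, `load_fn`, and `infer_fn` are hypothetical names, and this is not the measurement protocol used in the paper.

```python
import time


def time_stages(load_fn, infer_fn, items, warmup=1):
    """Return mean per-item latency in milliseconds for loading and inference."""
    items = list(items)
    # Warm-up runs are executed first and excluded from the timing.
    for item in items[:warmup]:
        infer_fn(load_fn(item))

    load_total = infer_total = 0.0
    for item in items:
        t0 = time.perf_counter()
        data = load_fn(item)
        t1 = time.perf_counter()
        infer_fn(data)
        t2 = time.perf_counter()
        load_total += t1 - t0
        infer_total += t2 - t1

    n = max(len(items), 1)
    return {"load_ms": 1000 * load_total / n, "infer_ms": 1000 * infer_total / n}


# Stand-in functions so the sketch runs without a model or video files.
stats = time_stages(load_fn=lambda p: p.upper(), infer_fn=lambda d: len(d), items=["a.mp4", "b.mp4"])
print(stats)
```

For GPU models, the device usually has to be synchronized before each timer read (for example with `torch.cuda.synchronize()`), otherwise asynchronous kernel launches make the inference time look shorter than it is.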

ERROR about cv_reader when using MSVD

I converted the videos using the method you provided. Errors occurred in one batch of videos (num_workers=12; 4 were read correctly and 8 failed). The failing videos are:

./dataset/msvd/videos_240_h264_keyint_60/Nd45qJn61Dw_0_10.avi
./dataset/msvd/videos_240_h264_keyint_60/5P6UU6m3cqk_57_75.avi
./dataset/msvd/videos_240_h264_keyint_60/PD6eQY7yCfw_32_37.avi
./dataset/msvd/videos_240_h264_keyint_60/77iDIp40m9E_159_181.avi
./dataset/msvd/videos_240_h264_keyint_60/9Wr48VFhZH8_45_50.avi
./dataset/msvd/videos_240_h264_keyint_60/HxRK-WqZ5Gk_30_50.avi
./dataset/msvd/videos_240_h264_keyint_60/UgUFP5baQ9Y_0_7.avi
./dataset/msvd/videos_240_h264_keyint_60/PqSZ89FqpiY_65_75.avi

I also converted these failing videos to mp4 but got the same error.

I wonder if there is something wrong with the MSVD dataset or cv_reader (I can train normally on MSRVTT).

Your help will be greatly appreciated.
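
To narrow down whether the files or the reader are at fault, one option is to read every file one at a time in the main process, without data-loader workers, and record which paths fail and why. The sketch below assumes a `reader` callable that raises on failure; `find_unreadable` and `fake_reader` are hypothetical names, and the real decoding call would need to be substituted.

```python
def find_unreadable(paths, reader):
    """Call reader(path) on each path and collect the ones that raise."""
    failures = []
    for path in paths:
        try:
            reader(path)
        except Exception as exc:  # a native crash (segfault) cannot be caught here
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures


# Stand-in reader for demonstration; replace it with the real decoding call.
def fake_reader(path):
    if path.endswith("_bad.avi"):
        raise ValueError("cannot decode")


print(find_unreadable(["a.avi", "b_bad.avi"], fake_reader))
# [('b_bad.avi', 'ValueError: cannot decode')]
```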

About memory usage

Hello, thank you very much for publishing such high-quality code. When I run your code on my own video dataset, the program's memory usage is very high, even though the workstation I use has 128 GB of RAM. Of course, this may also be related to the size of my videos. Is there any way to reduce the RAM usage by modifying the config?

Pretrained checkpoint

Thanks for your work!
Could you upload the model's pretrained checkpoint file?
I want to test with the weights file to caption video input.

Thank you

The process gets stuck after one epoch

Hello, thank you very much for your open-source code. I have recently been working on reproducing your results and applying the code to my own dataset. However, I have run into a problem where the process gets stuck after completing one epoch, with no error reported. I hope you can help. Thank you very much!

Docker environment

Dear sir, I am having trouble installing cv_reader because I do not have sudo permission and my Linux server cannot connect to external networks. Is there a suitable Docker image or some other way to prepare the environment? Thank you, sir!

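Hello, I have a question about the experimental setup

In the code you provided, it seems that no validation set is used for validation? For example, your MSVD_caption.json file appears to contain only train and test splits. Isn't it somewhat unfair to select the best training checkpoint using the test set? None of the three datasets seems to use a validation set. I look forward to your reply.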
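
If an annotation file provides only train and test splits, one common workaround is to hold out part of the training set for checkpoint selection. The sketch below illustrates this with a deterministic shuffle; the function name, the 10% fraction, and the video IDs are assumptions and do not come from the repository.

```python
import random


def split_train_val(train_ids, val_fraction=0.1, seed=0):
    """Split training IDs into (train, val) lists with a fixed random seed."""
    ids = sorted(train_ids)  # sort first so the result does not depend on input order
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


# Hypothetical video IDs standing in for the entries of the train split.
train, val = split_train_val([f"vid{i}" for i in range(100)])
print(len(train), len(val))  # 90 10
```

Checkpoints would then be selected on `val` and the test split evaluated only once at the end.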
