
ttgeng233 / unav


Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Home Page: https://unav100.github.io

License: MIT License

Python 96.52% C++ 3.48%
audio-visual-events audio-visual-learning multi-modal-learning

unav's People

Contributors

ttengwang, ttgeng233


unav's Issues

Videos of duration greater than 1 minute

Hi.
I am training the model on untrimmed videos of 3-10 minutes' duration. I provide the segment start and segment end in seconds in the annotation file, e.g.
clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
ASDHY064_54544.mp4,349.0,414.0,Walking away,5.0,100000000.0,train

The duration field is set to a very large placeholder value.

During the evaluation stage, however, the predicted segments always fall within 0-60 seconds. Does anything in the code need to be modified to handle longer videos?
Thank you
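As a quick sanity check on annotation files in the format above (column names taken from the example; the check_annotations helper is hypothetical, not part of the repository), one can verify that every segment lies inside [0, duration]:

```python
import csv
import io

def check_annotations(text):
    """Return (clip_id, problem) pairs for rows whose segment does not
    satisfy 0 <= segment_start < segment_end <= duration."""
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        start = float(row["segment_start"])
        end = float(row["segment_end"])
        duration = float(row["duration"])
        if not (0.0 <= start < end <= duration):
            problems.append((row["clip_id"], "segment outside [0, duration]"))
    return problems

# Rows copied from the example above; the placeholder duration passes the
# check only because it is so large -- the real clip duration is safer.
example = """\
clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
"""
```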

nms-1d-cpu importError

When I execute python setup.py install --user, the following happens:

running install
/sda/home/xxx/anaconda3/envs/UnAV/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
/sda/home/xxx/anaconda3/envs/UnAV/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
running bdist_egg
running egg_info
writing nms_1d_cpu.egg-info/PKG-INFO
writing dependency_links to nms_1d_cpu.egg-info/dependency_links.txt
writing top-level names to nms_1d_cpu.egg-info/top_level.txt
/sda/home/xxx/anaconda3/envs/UnAV/lib/python3.10/site-packages/torch/utils/cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'nms_1d_cpu.egg-info/SOURCES.txt'
writing manifest file 'nms_1d_cpu.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/nms_1d_cpu.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for nms_1d_cpu.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/nms_1d_cpu.py to nms_1d_cpu.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.nms_1d_cpu.cpython-310: module references __file__
creating 'dist/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg
creating /sda/home/xxx/.local/lib/python3.10/site-packages/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg
Extracting nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg to /sda/home/xxx/.local/lib/python3.10/site-packages
Adding nms-1d-cpu 0.0.0 to easy-install.pth file
detected new path './nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg'

Installed /sda/home/xxx/.local/lib/python3.10/site-packages/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg
Processing dependencies for nms-1d-cpu==0.0.0
Finished processing dependencies for nms-1d-cpu==0.0.0

Then, when I try to train the model, another error occurs:

Traceback (most recent call last):
  File "/sda/home/xxx/Program/UnAV/./train.py", line 18, in <module>
    from libs.modeling import make_multimodal_meta_arch
  File "/sda/home/xxx/Program/UnAV/libs/modeling/__init__.py", line 7, in <module>
    from . import multimodal_meta_archs
  File "/sda/home/xxx/Program/UnAV/libs/modeling/multimodal_meta_archs.py", line 12, in <module>
    from ..utils import batched_nms
  File "/sda/home/xxx/Program/UnAV/libs/utils/__init__.py", line 1, in <module>
    from .nms import batched_nms
  File "/sda/home/xxx/Program/UnAV/libs/utils/nms.py", line 5, in <module>
    import nms_1d_cpu
ImportError: /sda/home/xxx/.local/lib/python3.10/site-packages/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg/nms_1d_cpu.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl

Do you know why this happens?
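For context, the undefined symbol above demangles to a c10::SymBool method, which typically means the nms_1d_cpu extension was compiled against a different PyTorch version than the one active at import time; rebuilding the extension inside the current environment usually resolves it. A small hypothetical helper (not part of the repository) to classify such failures:

```python
import importlib

def diagnose_extension(module_name):
    """Try to import a compiled extension and classify the failure.

    An 'undefined symbol' ImportError for a torch C++ extension usually
    means it was built against a different PyTorch version than the one
    currently installed; rebuilding it in the active environment fixes it.
    """
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as exc:
        if "undefined symbol" in str(exc):
            return "abi-mismatch: rebuild the extension against the installed torch"
        return f"import failed: {exc}"
```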

Visualization

Hi! Thank you for your good work! I obtained prediction results using the validation code and found that every segment in the video is shown. I would like to know how to generate the final boundaries of each event instance. Do multi-segment predictions under the same label need to be combined?
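One plausible post-processing step is to merge overlapping same-label segments into single event instances. This is a sketch under that assumption, not necessarily how the authors produce their visualizations:

```python
def merge_segments(segments):
    """Merge overlapping (start, end, label) predictions with the same
    label into single event instances."""
    by_label = {}
    for start, end, label in segments:
        by_label.setdefault(label, []).append((start, end))

    merged = []
    for label, spans in by_label.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:           # overlapping or touching: extend
                cur_end = max(cur_end, end)
            else:                          # gap: close the current instance
                merged.append((cur_start, cur_end, label))
                cur_start, cur_end = start, end
        merged.append((cur_start, cur_end, label))
    return sorted(merged)
```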

VGGish feature size

Hi. I am trying to extract visual and audio features from raw video clips. For visual features,
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
produces feature dimensions that match the ones you shared, e.g. 112x1024 RGB and flow features.

But for audio features, both with and without converting the video to 25 fps,
python main.py feature_type=vggish produces features that don't match the ones you shared; e.g. it gives only a 32x128 feature.
Could you please tell me what needs to be done to get the same 112x128 audio features?

Thank you.
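For reference, the feature counts follow directly from the windowing parameters: I3D produces one feature per sliding window over the frame sequence, while stock VGGish emits one 128-d embedding per 0.96 s of audio, so matching the visual feature count (112 here) would require shrinking the VGGish hop accordingly (an assumption about how the released features were aligned):

```python
def i3d_num_features(num_frames, stack_size=24, step_size=8):
    """One I3D feature per sliding window over the frame sequence."""
    return (num_frames - stack_size) // step_size + 1

def vggish_num_features(duration_s, hop_s=0.96):
    """Stock VGGish: one 128-d embedding per hop (0.96 s by default)."""
    return int(duration_s // hop_s)
```

For example, 112 visual features with stack 24 and step 8 correspond to 912 frames (36.48 s at 25 fps), while about 31 s of audio at the default hop yields only 32 embeddings, matching the 32x128 output observed above.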

Computation of i3d features

Hi. Thanks for awesome work.

I am not able to extract visual features efficiently. It is taking too much time, even on 4 GPUs with 24 GB of memory each. I am extracting visual features for 278 videos of 4-5 minutes' duration, split into 16 parallel subsets running simultaneously, yet only 60 videos were processed in 24 hours. Can you suggest a more efficient approach?

How much time did it take you to extract features for the roughly 10,000 one-minute videos?

Thank you.

Issue in evaluation: torch.cat() on an empty list

Thanks for your work. I am training the model on a custom video dataset. While doing evaluation after one epoch, I get this error:
modeling/multimodal_meta_archs.py", line 619, in
torch.cat(x) for x in [segs_all, scores_all, cls_idxs_all]
RuntimeError: torch.cat(): expected a non-empty list of Tensors
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:31 (most recent call first):

After some time, I figured out that in the inference_single_video() function:
topk idxs tensor([94585022332496, 94582502706592, 94585097936080, ..., 94585022005264, 140088139025376, 64],
device='cuda:0')
pt_idxs tensor([94585098952080, 140088139025376, 64, ..., 94585021827888, 140088139025376, 64],
device='cuda:0')
are becoming very large.

Can you suggest what could be the issue?
Thank you.
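Two hedged observations: the enormous index values are a classic symptom of reading tensors after a device-side assert has already corrupted the CUDA context (running with CUDA_LAUNCH_BLOCKING=1 surfaces the original failing kernel), and the torch.cat error simply means no candidate segments survived score filtering for that video. Below is a pure-Python stand-in for a guard one could add before the concatenation; it is an illustration, not the repository's actual fix:

```python
def safe_cat(parts):
    """Stand-in for torch.cat over a list of per-level results.

    If no segments survived filtering (parts is empty), return an empty
    result instead of raising "expected a non-empty list of Tensors".
    """
    if not parts:
        return []
    merged = []
    for p in parts:
        merged.extend(p)
    return merged
```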

Request for dataset download location

Hi, thank you for sharing interesting work!
I tried to download the UnAV100 dataset, but Baidu Pan requires a Baidu account.
Because a Chinese phone number is needed to create a Baidu account and I don't have one, I could not download the dataset.
Could you share the dataset via other cloud services, such as Google Drive or OneDrive?

missing nms_cpu.cpp file

I greatly appreciate the work you have done. However, while setting up this project I encountered two issues:

  1. Environment configuration bug: according to utils/setup.py, the file './csrc/nms_cpu.cpp' referenced below is missing from this project.
setup(
    name='nms_1d_cpu',
    ext_modules=[
        CppExtension(
            name = 'nms_1d_cpu',
            sources = ['./csrc/nms_cpu.cpp'],
            extra_compile_args=['-fopenmp']
        )
    ],
    cmdclass={
        'build_ext': BuildExtension
    }
)
  2. Parameter-passing bug: in multimodal_meta_archs.py, on line 266, the DependencyBlock class is called without the pyramid_level parameter being passed.
if self.use_dependency:
    self.dependency_block = make_dependency_block(
        'DependencyBlock',
        **{
            'in_channel': embd_dim*2,
            'n_embd': 128,
            'n_embd_ks': embd_kernel_size,
            'num_classes': self.num_classes,
            'pyramid_level': backbone_arch[-1] + 1, # n_head?
            'path_pdrop': self.train_droppath,
        }
    )

Request for dataset!

Hi, congratulations on your excellent work! Could you please provide the Baidu Drive link for the dataset (raw videos)? I'd appreciate your help.

Code for visualisations

Hey, thanks for your great work in this domain. I was wondering if you could provide the code used to generate figures like Fig. 3 in the supplementary material.

Thanks
