ttgeng233 / unav
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
Home Page: https://unav100.github.io
License: MIT License
Hi.
I am training the model on untrimmed videos of 3-10 minutes' duration. I provide the segment start and segment end in seconds in the annotation file, e.g.:
clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
ASDHY064_54544.mp4,349.0,414.0,Walking away,5.0,100000000.0,train
The duration value is set to a very large placeholder.
But during the evaluation stage, the predicted segments always fall between 0 and 60 seconds. Does anything in the code need to be modified for it to work with longer videos?
Thank you
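One plausible culprit, assuming the duration column is used to map segment times onto the feature grid (as in ActionFormer-style pipelines), is the sentinel value itself: if segments are normalized by duration, a huge placeholder can clamp everything. A minimal sketch of rewriting the rows with each clip's true length; the durations and the fix_durations helper below are hypothetical, not part of the repo (in practice you would read the real length with ffprobe or cv2.VideoCapture):

```python
import csv
import io

# Hypothetical true lengths in seconds (assumed values for illustration).
true_durations = {
    "ASDAB005.mp4": 540.0,   # e.g. a 9-minute video
    "ASDHY041.mp4": 420.0,   # e.g. a 7-minute video
}

def fix_durations(rows, durations):
    """Replace the sentinel duration with each clip's real length."""
    fixed = []
    for row in rows:
        row = dict(row)
        row["duration"] = durations.get(row["clip_id"], row["duration"])
        fixed.append(row)
    return fixed

raw = """clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY041.mp4,356.0,363.0,Crying,0.0,100000000.0,train
"""
rows = list(csv.DictReader(io.StringIO(raw)))
fixed = fix_durations(rows, true_durations)
print(fixed[0]["duration"])  # 540.0
```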
When I execute python setup.py install --user,
I get the following output:
running install
/sda/home/xxx/anaconda3/envs/UnAV/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` directly.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
********************************************************************************
!!
self.initialize_options()
/sda/home/xxx/anaconda3/envs/UnAV/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!
********************************************************************************
Please avoid running ``setup.py`` and ``easy_install``.
Instead, use pypa/build, pypa/installer or other
standards-based tools.
See https://github.com/pypa/setuptools/issues/917 for details.
********************************************************************************
!!
self.initialize_options()
running bdist_egg
running egg_info
writing nms_1d_cpu.egg-info/PKG-INFO
writing dependency_links to nms_1d_cpu.egg-info/dependency_links.txt
writing top-level names to nms_1d_cpu.egg-info/top_level.txt
/sda/home/xxx/anaconda3/envs/UnAV/lib/python3.10/site-packages/torch/utils/cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'nms_1d_cpu.egg-info/SOURCES.txt'
writing manifest file 'nms_1d_cpu.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/nms_1d_cpu.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for nms_1d_cpu.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/nms_1d_cpu.py to nms_1d_cpu.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying nms_1d_cpu.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.nms_1d_cpu.cpython-310: module references __file__
creating 'dist/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg
creating /sda/home/xxx/.local/lib/python3.10/site-packages/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg
Extracting nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg to /sda/home/xxx/.local/lib/python3.10/site-packages
Adding nms-1d-cpu 0.0.0 to easy-install.pth file
detected new path './nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg'
Installed /sda/home/xxx/.local/lib/python3.10/site-packages/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg
Processing dependencies for nms-1d-cpu==0.0.0
Finished processing dependencies for nms-1d-cpu==0.0.0
Then, when I try to train the model, another error occurs:
Traceback (most recent call last):
File "/sda/home/xxx/Program/UnAV/./train.py", line 18, in <module>
from libs.modeling import make_multimodal_meta_arch
File "/sda/home/xxx/Program/UnAV/libs/modeling/__init__.py", line 7, in <module>
from . import multimodal_meta_archs
File "/sda/home/xxx/Program/UnAV/libs/modeling/multimodal_meta_archs.py", line 12, in <module>
from ..utils import batched_nms
File "/sda/home/xxx/Program/UnAV/libs/utils/__init__.py", line 1, in <module>
from .nms import batched_nms
File "/sda/home/xxx/Program/UnAV/libs/utils/nms.py", line 5, in <module>
import nms_1d_cpu
ImportError: /sda/home/xxx/.local/lib/python3.10/site-packages/nms_1d_cpu-0.0.0-py3.10-linux-x86_64.egg/nms_1d_cpu.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
Do you know why this happens?
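For what it's worth, that symbol demangles to a c10 (libtorch) method, which points at an ABI mismatch: the extension's .so was compiled against a different PyTorch than the one importing it (c10::SymBool belongs to the 2.x line). A quick way to see what the extension expected, with the usual clean-rebuild fix shown as comments so nothing destructive runs by accident:

```shell
# Demangle the missing symbol to see what the extension expected from libc10:
echo '_ZNK3c107SymBool10guard_boolEPKcl' | c++filt
# prints: c10::SymBool::guard_bool(char const*, long) const
#
# If the torch that built the .so differs from the torch at import time,
# this symbol can be absent. Rebuild inside the active environment:
#   pip uninstall -y nms-1d-cpu
#   rm -rf build dist nms_1d_cpu.egg-info
#   python setup.py install --user
```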
Hi! Thank you for your good work! I obtained the prediction results using the validation code and found that every predicted segment in the video is listed. I would like to know how to generate the final boundaries for each event instance. Do I need to combine multi-segment predictions under the same label?
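A generic way to turn post-NMS detections into per-instance boundaries, assuming you simply want to fuse overlapping same-label segments (this is my own sketch, not the repo's code): group by label, sort by start time, and merge any segments that overlap.

```python
def merge_segments(preds):
    """Merge overlapping segments that share a label.

    preds: list of (start, end, label, score) tuples.
    Returns one merged segment per contiguous group, keeping the max score.
    """
    merged = []
    by_label = {}
    for s, e, lbl, sc in preds:
        by_label.setdefault(lbl, []).append((s, e, sc))
    for lbl, segs in by_label.items():
        segs.sort()                                # sort by start time
        cur_s, cur_e, cur_sc = segs[0]
        for s, e, sc in segs[1:]:
            if s <= cur_e:                         # overlaps current group
                cur_e = max(cur_e, e)
                cur_sc = max(cur_sc, sc)
            else:                                  # gap: close the group
                merged.append((cur_s, cur_e, lbl, cur_sc))
                cur_s, cur_e, cur_sc = s, e, sc
        merged.append((cur_s, cur_e, lbl, cur_sc))
    return merged

preds = [(1.0, 4.0, "Crying", 0.9), (3.5, 6.0, "Crying", 0.7),
         (10.0, 12.0, "Walking away", 0.8)]
print(merge_segments(preds))
# [(1.0, 6.0, 'Crying', 0.9), (10.0, 12.0, 'Walking away', 0.8)]
```

Whether merging is the right policy (versus keeping only the top-scoring segment) depends on how the benchmark defines an event instance, so treat this as a starting point.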
Hi. I am trying to extract visual and audio features from raw video clips. For visual features,
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
gives feature dimensions that match the ones you shared.
E.g. it gives 112x1024 RGB and flow features, which match yours.
But for audio features, whether or not I first convert the video to 25 fps,
python main.py feature_type=vggish
produces features that don't match the ones you shared.
E.g. it gives only a 32x128 feature.
Can you please tell me what needs to be done so that I can get the same 112x128 audio feature?
Thank you.
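In case it helps: with stack_size=24 and step_size=8 at 25 fps, the visual stride is 8/25 = 0.32 s per feature, while VGGish emits one 128-d embedding per 0.96 s patch, so the audio track naturally comes out about 3x shorter. One workaround (my own, not necessarily what the authors did) is to linearly interpolate the audio features along time to the visual length:

```python
import numpy as np

def match_audio_length(audio_feats, target_len):
    """Linearly interpolate a (T_a, C) feature array to (target_len, C)."""
    t_a, c = audio_feats.shape
    src = np.linspace(0.0, 1.0, t_a)        # original time grid
    dst = np.linspace(0.0, 1.0, target_len) # target time grid
    out = np.stack(
        [np.interp(dst, src, audio_feats[:, j]) for j in range(c)], axis=1
    )
    return out

audio = np.random.randn(32, 128)            # what vggish produced
aligned = match_audio_length(audio, 112)    # match the 112 visual steps
print(aligned.shape)                        # (112, 128)
```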
Hi. Thanks for the awesome work.
I am not able to extract visual features efficiently. It is taking too much time even on 4 GPUs with 24 GB of memory each. I am extracting visual features for 278 videos of 4-5 minutes' duration, divided into 16 parallel subsets run simultaneously, and only 60 videos have finished in 24 hours. Can you suggest a more efficient way to do this?
How much time did it take you to extract features for the roughly 10,000 one-minute videos?
Thank you.
Thanks for your work. I am training the model on a custom video dataset. While running evaluation after one epoch, I get the error
modeling/multimodal_meta_archs.py", line 619, in
torch.cat(x) for x in [segs_all, scores_all, cls_idxs_all]
RuntimeError: torch.cat(): expected a non-empty list of Tensors
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:31 (most recent call first):
After some time, I figured out that in the inference_single_video() function:
topk idxs tensor([94585022332496, 94582502706592, 94585097936080, ..., 94585022005264, 140088139025376, 64],
device='cuda:0')
pt_idxs tensor([94585098952080, 140088139025376, 64, ..., 94585021827888, 140088139025376, 64],
device='cuda:0')
are becoming very large.
Can you suggest what could be the issue?
Thank you.
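For anyone hitting the same thing: a device-side assert followed by garbage-looking index tensors is the classic symptom of an out-of-range index earlier on the GPU (once the assert fires, anything you print afterwards is undefined memory; rerunning with CUDA_LAUNCH_BLOCKING=1 makes the traceback point at the real line). A common culprit in custom annotations is a label_id that is >= num_classes, or a non-integer label column. A quick sanity check over the CSV; num_classes here is an assumed value for illustration:

```python
import csv
import io

def check_labels(csv_text, num_classes):
    """Return clip_ids whose label_id is non-integer or out of range."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lid = float(row["label_id"])
        if lid != int(lid) or not (0 <= int(lid) < num_classes):
            bad.append(row["clip_id"])
    return bad

raw = """clip_id,segment_start,segment_end,label,label_id,duration,subset
ASDAB005.mp4,460.0,465.0,Hitting others,7.0,100000000.0,train
ASDHY064_54544.mp4,349.0,414.0,Walking away,5.0,100000000.0,train
"""
print(check_labels(raw, num_classes=8))   # [] -> all label ids valid
print(check_labels(raw, num_classes=6))   # ['ASDAB005.mp4'] -> 7 >= 6
```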
Hi, thank you for sharing interesting work!
I tried to download the UnAV100 dataset, but Baidu Pan requires me to create a Baidu account.
Because a Chinese phone number is needed to create a Baidu account and I don't have one, I could not download the dataset.
Could you share the dataset via other cloud services, such as Google Drive or OneDrive?
I greatly appreciate the work you have done. However, while setting up this project, I encountered two issues:
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='nms_1d_cpu',
    ext_modules=[
        CppExtension(
            name='nms_1d_cpu',
            sources=['./csrc/nms_cpu.cpp'],
            extra_compile_args=['-fopenmp']
        )
    ],
    cmdclass={
        'build_ext': BuildExtension
    }
)
if self.use_dependency:
    self.dependency_block = make_dependency_block(
        'DependencyBlock',
        **{
            'in_channel': embd_dim * 2,
            'n_embd': 128,
            'n_embd_ks': embd_kernel_size,
            'num_classes': self.num_classes,
            'pyramid_level': backbone_arch[-1] + 1,  # n_head?
            'path_pdrop': self.train_droppath,
        }
    )
Hi, congratulations on your excellent work! Could you please provide the Baidu Drive link for the dataset (raw videos)? I'd appreciate it if you could help me.
How to run on multiple GPUs?
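The repo doesn't appear to document this, but the generic PyTorch options apply: the one-line nn.DataParallel wrapper (simple, but slower) or DistributedDataParallel launched with torchrun (faster, but the training script must initialize a process group). A minimal DataParallel sketch on a stand-in model, not the repo's actual training loop:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # stand-in for the real model
if torch.cuda.device_count() > 1:
    # Replicates the module across all visible GPUs and splits the batch dim.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

x = torch.randn(8, 16)
if torch.cuda.is_available():
    x = x.cuda()
out = model(x)
print(out.shape)   # torch.Size([8, 4])
```

For DistributedDataParallel you would instead launch something like `torchrun --nproc_per_node=4 train.py ...`, assuming train.py is adapted to call torch.distributed.init_process_group and wrap the model per-rank.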
Hey, thanks for your great work in this domain. I was wondering if you could provide the code used to generate figures like Fig. 3 in the supplementary material.
Thanks