cdjkim / audiocaps
Repository for our NAACL-HLT 2019 paper: AudioCaps
Home Page: https://audiocaps.github.io/
License: MIT License
Hi,
Thanks so much for this great and impressive resource!
I am relatively new to the field of audio captioning, so apologies if my question is basic :)
I was wondering if you have a piece of code to download the relevant files?
Or do I need to download the entire AudioSet data? If so, can you please point me to code that does that reliably?
Thanks in advance,
Felix
Thanks a lot for your great contribution!
Could you release the audio classes (or sound events) corresponding to each sample in AudioCaps?
Hi all,
Thank you for creating and sharing the AudioCaps dataset. I found it to be very useful.
However, I noticed that the number of files in each set (training, validation, and test) is very different from the numbers presented in the official repository. Here are the number of files I obtained:
| Set | Number of files |
|---|---|
| Training | 45,458 |
| Validation | 2,245 |
| Test | 4,440 |
However, the original values in the repository are:
| Set | Number of files |
|---|---|
| Training | 49,838 |
| Validation | 495 |
| Test | 975 |
| Total | 51,308 |
I also noticed that the csv files contain more rows than what is proposed as the validation and test set, and are more similar to the number of files I obtained.
I am wondering if there is something I am missing or if there is an issue with the original values provided in the repository. Please let me know if there is any clarification needed or if there are any updates to the dataset.
I also created a python package to download the dataset very easily: https://github.com/MorenoLaQuatra/audiocaps-download
Thank you for your time and for providing this valuable resource.
Best regards,
Moreno La Quatra
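One way to check the counts is to compare the number of caption rows with the number of distinct clips in each CSV file. The sketch below assumes the CSV has a header with `youtube_id` and `start_time` columns. These column names, the function name `count_rows_and_clips`, and the sample rows are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import io


def count_rows_and_clips(csv_text):
    """Return (caption_rows, distinct_clips) for one split's CSV text.

    Assumes header columns named `youtube_id` and `start_time`
    (an assumption about the file layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = 0
    clips = set()
    for row in reader:
        rows += 1
        # A clip is identified here by its video id plus its start time.
        clips.add((row["youtube_id"], row["start_time"]))
    return rows, len(clips)


# Made-up sample: two clips, one of which has two captions.
sample = """audiocap_id,youtube_id,start_time,caption
1,aaaaaaaaaaa,30,A dog barks
2,aaaaaaaaaaa,30,Barking from a dog
3,bbbbbbbbbbb,10,Rain falls on a roof
"""

rows, clips = count_rows_and_clips(sample)
print(rows, clips)
```

If a split's row count is several times its clip count, the gap between the two tables above may come from counting captions rather than audio files.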
Hi, I want to download the audio data of AudioCaps, and I am writing to confirm the duration of the audio clips. The csv files provide only the start time. Is each audio clip in AudioCaps 10 seconds long?
So how does the dataset end up being downloaded?
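A minimal sketch of one way to fetch a single clip, assuming each clip is a 10-second segment beginning at the start time given in the CSV. The 10-second length is an assumption based on AudioSet's segment format and is not confirmed in this thread. The code only builds a `yt-dlp` command line; `yt-dlp` and `ffmpeg` would need to be installed separately, and the helper name `build_clip_command` and the output naming scheme are hypothetical.

```python
import shlex

CLIP_SECONDS = 10  # assumed clip length, not confirmed by the dataset authors


def build_clip_command(youtube_id, start_time, out_dir="audio"):
    """Build a yt-dlp command that extracts one audio segment as WAV."""
    end_time = start_time + CLIP_SECONDS
    url = f"https://www.youtube.com/watch?v={youtube_id}"
    return [
        "yt-dlp",
        "-x", "--audio-format", "wav",  # keep audio only, convert to WAV
        "--download-sections", f"*{start_time}-{end_time}",  # time range in seconds
        "-o", f"{out_dir}/{youtube_id}_{start_time}.%(ext)s",
        url,
    ]


# Made-up video id, used only to show the resulting command.
cmd = build_clip_command("aaaaaaaaaaa", 30)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the download. Some videos may no longer be available, so a full run should expect and log failures.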
Nice work and thanks for sharing the dataset and code!
In your paper, I found that "The final number of audio clips is about 115K, from which we obtain captions for 46K as the first version". Could you please also share the final dataset with only word labels? Thank you!
Best,
Yapeng
Hello, the pre-trained model cannot be downloaded (the Google Drive link returns an error).
Could I download the pre-trained AudioCaps model some other way?
Thank you.
Really enjoyed reading your work. Are the video captioning sentences available by any chance?