Comments (5)
Hi Calum,
I have obtained some decent results on BirdCLEF (pruned) by training on the training split with self-supervised learning methods. For long audio files, using a pre-trained audio network to select the segments with high activation of the bird class boosts performance. We have a pre-print if you'd like to take a look at it: https://arxiv.org/pdf/2312.15824.pdf
We also have a repo for this work: https://github.com/ilyassmoummad/ssl4birdsounds
Happy new year!
Best,
Ilyass
from metaaudio-a-few-shot-audio-classification-benchmark.
Hi there,
Thanks for the in-depth details of your issue!
Based on the information you have provided, I would suggest some code profiling as a starting point. I can't say I have used torchaudio.load extensively (I generally opt for .pt pre-processing and use torch.load natively); however, in my experience torchaudio I/O can take slightly longer, as it has a built-in conversion process. That being said, if loading time feels painfully long, something else is probably occurring. I would suggest timing the following components and seeing where the majority of the time is being spent:
- torchaudio load call
- librosa get_duration call (some functions I have used in librosa in the past are quite slow). Also note that, from what I understand, librosa uses numpy, so there may be a double conversion here, i.e. torch tensor -> numpy array -> torch tensor
- Time variation using different values of num_loaders. In my experience, the relationship between the num_loaders setting and loading time is rarely linear, so this may be having an impact. For example, if num_loaders is too high, the large file size of BirdCLEF samples could mean that system memory or VRAM fills up and overflows, which would likely slow loading down massively over a full set. My recommendation is to set this to 0 while you evaluate the other options above and then play with it as a last test
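As a rough illustration of the timing suggested above, here is a minimal sketch using only the standard library. The helper name `time_call` is hypothetical and not part of the MetaAudio codebase; it simply averages wall-clock time over a few repeated calls so the components can be compared side by side.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn repeatedly; return (mean seconds per call, last result)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations), result


# Usage sketch (needs torchaudio and librosa installed; `path` is a
# hypothetical path to one audio file):
#   load_secs, _ = time_call(torchaudio.load, path)
#   # The keyword is `path` in recent librosa releases, `filename` in older ones.
#   dur_secs, _ = time_call(librosa.get_duration, path=path)
#   print(f"load: {load_secs:.3f}s  get_duration: {dur_secs:.3f}s")
```

The first call to a file is often slower than later ones because of operating system caching, so it may be worth looking at individual timings as well as the mean.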
Hope this helps!
Best,
Calum
Thank you, Calum, for your reply and for the tips and information you've provided.
I have a question about torch.load and the .pt format. How do you deal with long audio recordings? As far as I know, torch.load doesn't let us specify which frames to read, unlike torchaudio.load. Do you chunk the file into many 5 s files? If so, do you treat these chunks as different samples when sampling batches/episodes, or do you constrain sampling to at most one chunk per recording?
I ask because I have tried chunking the long audio recordings and concatenating the chunks along the first dimension so that I can sample from that dimension, but that requires me to load the whole three minutes (max duration).
Thanks in advance!
Best,
Ilyass
Hi Ilyass,
So the way I treated it when this codebase was written was in fact chunking. I preprocessed the datasets into 5 s parts and stacked them in a 2D tensor. At sample time, I would load that full sample and then subselect one of the already chunked parts (meaning I wasn't recombining them and then resampling). In theory this isn't the best way to go, as you are leaving many combinations unsampled, e.g. the second half of one clip and the first half of another. Although not ideal, experiments at the time suggested that final performance wasn't really impacted, so I opted for it due to the speed-up.
Having said all of this, I now use a different pipeline which uses normal full-sample loading with proper subselection, and the loading doesn't seem to be that expensive. Did you get any interesting results from timing these functions? I haven't used torchaudio frame selection from file before; can you verify that this is truly faster than just loading the file, manually subsampling and then discarding the rest?
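To make the difference between the two sampling strategies concrete, the sketch below works purely on frame indices. All function names are hypothetical and the code is not taken from the MetaAudio codebase; it assumes each recording is at least one window long and does not handle padding of shorter files.

```python
import random


def _check(total_frames, chunk_frames):
    # Assumption: recordings shorter than one chunk are padded elsewhere.
    if chunk_frames <= 0 or total_frames < chunk_frames:
        raise ValueError("recording must contain at least one full chunk")


def chunk_offsets(total_frames, chunk_frames):
    """Start frames of the non-overlapping, full-length chunks."""
    _check(total_frames, chunk_frames)
    return list(range(0, total_frames - chunk_frames + 1, chunk_frames))


def random_grid_offset(total_frames, chunk_frames, rng=random):
    """Pre-chunked strategy: pick the start of one fixed chunk."""
    return rng.choice(chunk_offsets(total_frames, chunk_frames))


def random_free_offset(total_frames, chunk_frames, rng=random):
    """Free strategy: any valid start, so windows can straddle chunk borders."""
    _check(total_frames, chunk_frames)
    return rng.randint(0, total_frames - chunk_frames)


# Usage sketch: either offset can be used to slice a fully loaded waveform,
# or passed to torchaudio.load(path, frame_offset=offset,
# num_frames=chunk_frames) to read only that window from disk.
```

Which of the two reads is faster in practice depends on the file format and storage, so it is worth timing both on the actual data.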
Best,
Calum
Hi Calum,
Thank you so much for sharing these techniques. I have tried all of them in the past, but the issue persisted. I realized that my problem originated from loading data stored on a remote server. Fetching the data each time in the __getitem__ function led to very long loading times. Now I am using the data on a local machine, and everything is running smoothly! Thanks a lot again! It won't be long before I have some results on BirdCLEF 2020 (pruned), and I will gladly share them with you!
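For readers who cannot move the whole dataset to a local disk, one possible workaround is to copy each file the first time it is requested and reuse the local copy afterwards. The sketch below assumes the remote server is mounted as an ordinary filesystem path (for example over NFS); the helper name `local_copy` and the cache directory are hypothetical.

```python
import shutil
from pathlib import Path


def local_copy(remote_path, cache_dir):
    """Return a local path for remote_path, copying the file on first access."""
    remote_path = Path(remote_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Note: files with the same name in different remote folders would collide.
    local_path = cache_dir / remote_path.name
    if not local_path.exists():
        shutil.copy2(remote_path, local_path)
    return local_path


# Usage sketch inside a dataset's __getitem__ (paths are hypothetical):
#   path = local_copy(self.files[index], "/tmp/birdclef_cache")
#   waveform, sample_rate = torchaudio.load(path)
```

Only the first epoch pays the network cost in this setup; later epochs read from local disk.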
Best,
Ilyass