Hi, I have an issue concerning the loading time of BirdClef2020(prun

Loading of BirdClef2020(pruned) about metaaudio-a-few-shot-audio-classification-benchmark (CLOSED)

cheggan commented on June 3, 2024

Loading of BirdClef2020(pruned)

from metaaudio-a-few-shot-audio-classification-benchmark.

Comments (5)

ilyassmoummad commented on June 3, 2024

Hi Calum,

I have obtained some decent results on BirdCLEF (pruned) by training on the training split with self-supervised learning methods. For long audio files, using a pre-trained audio network to select the segments with high activation for the bird class boosts performance. We have a pre-print if you'd like to take a look: https://arxiv.org/pdf/2312.15824.pdf
We also have a repo for this work: https://github.com/ilyassmoummad/ssl4birdsounds

Happy new year!

Best,
Ilyass

CHeggan commented on June 3, 2024

Hi there,

Thanks for the in-depth details of your issue!

Based on the information you have provided, I would suggest some code profiling as a starting point. I can't say I have used torchaudio load extensively (I generally pre-process to .pt files and use torch.load natively), but in my experience torchaudio I/O can take slightly longer, as it has a built-in conversion process. That said, if the loading time feels painfully long, something else is probably occurring. I would suggest timing the following components to see where most of the time is going:

torchaudio load call
librosa get duration call (some librosa functions I have used in the past are quite slow). Also note that, from what I understand, librosa uses numpy, so there may be a double conversion here, i.e. torch tensor -> numpy array -> torch tensor
Time variation across different values of num_loaders. In my experience the relationship between num_loaders and loading time is rarely linear, so this may be having an impact. For example, if num_loaders is too high, the large file size of BirdCLEF samples could fill and overflow system memory or VRAM, which would likely put a massive hit on loading time over a full set. My recommendation is to set this to 0 while you evaluate the other options above, and then experiment with it as a last test
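
The timing suggested above could be done with a small helper like the one below. This is only a minimal sketch using the standard library; `time_call` and `summarise` are hypothetical names and are not part of this codebase.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return per-call wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(label, durations):
    """Format the median and worst-case duration in milliseconds."""
    median_ms = statistics.median(durations) * 1e3
    max_ms = max(durations) * 1e3
    return f"{label}: median {median_ms:.2f} ms, max {max_ms:.2f} ms"
```

For example, `print(summarise("torchaudio.load", time_call(torchaudio.load, path)))`, where `path` is one of your own audio files, and likewise for the librosa duration call. The first call can be slower than the rest if the file is not yet in the OS cache, which is why the max is reported next to the median.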

Hope this helps!

Best,
Calum

ilyassmoummad commented on June 3, 2024

Thank you Calum for your reply and for the tips and information you've provided.
I have a question about torch.load and the .pt format. How do you deal with long audio recordings? As far as I know, torch.load doesn't let us specify which frames to read, unlike torchaudio.load. Do you chunk each file into many 5s files? If so, do you treat these chunks as different samples when you sample batches/episodes, or do you constrain sampling to at most one chunk per recording?
I ask because I have tried chunking the long audio recordings and concatenating the chunks along the first dimension in order to sample from that dimension, but that requires me to load the whole three minutes (max duration).

Thanks in advance!

Best,
Ilyass

CHeggan commented on June 3, 2024

Hi Ilyass,

So the way I treated it when this codebase was written was in fact chunked. I preprocessed the datasets into 5s parts and stacked them in a 2D tensor. At sample time, I would load that full sample and then subselect one of the already chunked parts (meaning I wasn't recombining them and then resampling). In theory this isn't the best way to go, as you are leaving many combinations unsampled, e.g. the second half of one clip and the first half of another. Although not ideal, experiments at the time suggested that final performance wasn't really impacted, so I opted for it due to the speed-up.
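
As a rough illustration of the chunked approach described above (not the actual preprocessing code from this repo), a sketch could look like the following. The function names are hypothetical, and it assumes a mono 1D waveform whose trailing partial chunk is simply dropped.

```python
import random


def chunk_bounds(num_samples, chunk_len):
    """(start, end) sample indices of each full chunk; a trailing partial chunk is dropped."""
    n_chunks = num_samples // chunk_len
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(n_chunks)]


def stack_chunks(waveform, chunk_len):
    """Preprocessing step: reshape a 1D waveform into a 2D (n_chunks, chunk_len) array.

    Relies only on len(), slicing and .reshape(), so a 1D torch tensor
    (or NumPy array) can be passed in.
    """
    n_chunks = len(waveform) // chunk_len
    return waveform[: n_chunks * chunk_len].reshape(n_chunks, chunk_len)


def pick_chunk(chunks, rng=random):
    """Sample-time step: return one of the pre-cut rows, chosen uniformly.

    Raises ValueError if there are no rows, i.e. the recording was shorter
    than one chunk (such clips would need padding first).
    """
    return chunks[rng.randrange(len(chunks))]
```

With a 5s chunk at sample rate `sr`, `chunk_len` would be `5 * sr`. `chunk_bounds` makes the limitation explicit: only windows starting at a multiple of `chunk_len` can ever be sampled.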

Having said all of this, I now use a different pipeline which uses normal full-sample loading with proper subselection, and the loading doesn't seem to be that expensive. Did you get any interesting results from timing these functions? I haven't used torchaudio frame selection from file before; can you verify that this is truly faster than loading the whole file, subsampling manually and discarding the rest?
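
To make the two options concrete, here is a hedged sketch of both: reading only a window via the `frame_offset` and `num_frames` arguments of `torchaudio.load`, versus loading the whole file and slicing. The helper names are hypothetical. It assumes a torchaudio version that exposes `torchaudio.info` and these arguments; whether the partial read actually avoids decoding the whole file depends on the backend and audio format, so it is worth timing both.

```python
import random


def random_window(total_frames, sample_rate, window_seconds=5.0, rng=random):
    """Return (frame_offset, num_frames) for a random window that fits in the file."""
    num_frames = min(int(sample_rate * window_seconds), total_frames)
    frame_offset = rng.randint(0, total_frames - num_frames)
    return frame_offset, num_frames


def load_window_partial(path, window_seconds=5.0, rng=random):
    """Ask torchaudio to read only the selected window from the file."""
    import torchaudio  # imported lazily so random_window works without it

    # Note: some format/backend combinations may not report num_frames reliably.
    info = torchaudio.info(path)
    offset, n = random_window(info.num_frames, info.sample_rate, window_seconds, rng)
    return torchaudio.load(path, frame_offset=offset, num_frames=n)


def load_window_full(path, window_seconds=5.0, rng=random):
    """Load the whole file, then slice the window and discard the rest."""
    import torchaudio

    waveform, sample_rate = torchaudio.load(path)  # shape: (channels, frames)
    offset, n = random_window(waveform.shape[1], sample_rate, window_seconds, rng)
    return waveform[:, offset:offset + n], sample_rate
```

Both loaders return a `(waveform, sample_rate)` pair, so they can be swapped inside a dataset's getitem and timed against each other on the same files.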

Best,
Calum

ilyassmoummad commented on June 3, 2024

Hi Calum,

Thank you so much for sharing these techniques. I had tried all of them in the past, but the issue persisted. I realized that my problem originated from loading data stored on a remote server: fetching the data each time in the getitem function led to very long loading times. Now I am using the data on a local machine, and everything is running smoothly! Thanks a lot again! It won't take long before I have some results on BirdCLEF 2020 (pruned), and I will gladly share them with you!
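
For anyone who cannot move the whole dataset locally, one possible workaround is to copy each file to local disk the first time it is requested. This is only a sketch under the assumption that the remote server is mounted as a regular filesystem path; `ensure_local` and the cache directory are hypothetical, and this is not what was done here (the data was simply moved to a local machine).

```python
import shutil
from pathlib import Path


def ensure_local(remote_path, cache_dir):
    """Copy remote_path into cache_dir on first use and return the local path.

    Files are keyed by name only, so two remote files with the same name
    in different directories would collide.
    """
    remote_path = Path(remote_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / remote_path.name
    if not local_path.exists():
        shutil.copy2(remote_path, local_path)  # the only slow remote read
    return local_path
```

Calling this at the top of getitem means each file is fetched from the remote mount once, and later epochs read from local disk.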

Best,
Ilyass
