Comments (5)

ilyassmoummad commented on June 3, 2024

Hi Calum,

I have obtained some decent results on BirdCLEF (pruned) by training on the training split with self-supervised learning methods. For long audio files, using a pre-trained audio network to select the segments with high activation for the bird class boosts performance. We have a pre-print if you'd like to take a look: https://arxiv.org/pdf/2312.15824.pdf
We also have a repo for this work: https://github.com/ilyassmoummad/ssl4birdsounds

Happy new year !

Best,
Ilyass

from metaaudio-a-few-shot-audio-classification-benchmark.

CHeggan commented on June 3, 2024

Hi there,

Thanks for the in-depth details of your issue!

Based on the information you have provided, I would suggest some code profiling as a starting point. I can't say I have used torchaudio load extensively (I generally opt for .pt pre-processing and use torch.load natively), but in my experience torchaudio I/O can take slightly longer, as it has a built-in conversion process. That said, if loading time feels painfully long, something else is probably occurring. I would suggest timing the following components and seeing where the majority of the time is spent:

  • torchaudio load call
  • librosa get duration call (some librosa functions I have used in the past are quite slow). Also note that, as far as I understand, librosa uses numpy, so there may be a double conversion here, i.e. torch tensor -> numpy array -> torch tensor
  • Time variation with different values of num_loaders. In my experience the relationship between num_loaders and loading time is rarely linear, so this may be having an impact. For example, if num_loaders is too high, the large file size of BirdCLEF samples could fill and overflow system memory/VRAM, which would likely hit loading time hard over a full set. My recommendation is to set this to 0 while you evaluate the other options above, and then experiment with it as a last test
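A minimal sketch of the timing suggested in the list above, assuming each step can be wrapped in a zero-argument callable. The helper name `time_call` and the stand-in workload `fake_load` are hypothetical; in practice you would pass the real torchaudio load call or the librosa duration call instead.

```python
import time
from statistics import mean


def time_call(fn, repeats=5):
    """Return (mean, worst) wall-clock seconds over `repeats` calls of fn()."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return mean(durations), max(durations)


# Stand-in workload (hypothetical); replace with the real loading steps.
def fake_load():
    return sum(range(100_000))


steps = {"fake_load": fake_load}
timings = {name: time_call(fn) for name, fn in steps.items()}
for name, (avg, worst) in timings.items():
    print(f"{name}: mean {avg:.6f}s, worst {worst:.6f}s")
```

Reporting the worst case alongside the mean helps spot occasional stalls (for example a slow disk or network read) that an average alone would hide.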

Hope this helps!

Best,
Calum

ilyassmoummad commented on June 3, 2024

Thank you Calum for your reply and for the tips and information you've provided.
I have a question about torch.load and the .pt format. How do you deal with long audio recordings? As far as I know, with torch.load we can't specify the frames to sample, unlike torchaudio.load. Do you chunk the file into many 5s files? In that case, do you treat these chunks as different samples when you sample batches/episodes, or do you constrain sampling to at most one of them?
I have tried chunking the long audio recordings and concatenating the chunks along the first dimension in order to sample from that dimension, but that requires me to load the whole three minutes (max duration).

Thanks in advance !

Best,
Ilyass

CHeggan commented on June 3, 2024

Hi Ilyass,

The way I handled it when this codebase was written was in fact chunking. I preprocessed the datasets into 5s parts and stacked them in a 2D tensor. At sample time, I would load the full sample and then subselect one of the already chunked parts (meaning I wasn't recombining them and then resampling). In theory this isn't the best way to go, as you are leaving many combinations unsampled, e.g. the second half of one clip and the first half of another. Although not ideal, experiments at the time suggested that final performance wasn't really impacted, so I opted for it due to the speed-up.
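As a rough illustration of the fixed-chunk idea described above (not the codebase's actual preprocessing), the sketch below computes the boundaries of non-overlapping 5s chunks and picks one at random. The function names, the 16 kHz sample rate, and the choice to drop a trailing partial chunk are all assumptions made for the example.

```python
import random


def chunk_bounds(num_frames, sample_rate, chunk_seconds=5):
    """Return (start, end) frame indices of full, non-overlapping chunks."""
    chunk_len = sample_rate * chunk_seconds
    n_chunks = num_frames // chunk_len  # a trailing partial chunk is dropped (assumption)
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(n_chunks)]


def pick_chunk(bounds, rng=random):
    """Pick one pre-computed chunk uniformly at random."""
    return rng.choice(bounds)


# A 12 s clip at an assumed 16 kHz yields two full 5 s chunks.
bounds = chunk_bounds(num_frames=12 * 16_000, sample_rate=16_000)
print(bounds)  # [(0, 80000), (80000, 160000)]
print(pick_chunk(bounds, random.Random(0)))
```

Because the boundaries are fixed ahead of time, a window that straddles two chunks can never be drawn, which is the unsampled-combinations limitation mentioned above.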

Having said all of this, I now use a different pipeline which uses normal full-sample loading with proper subselection, and the loading doesn't seem to be that expensive. Did you get any interesting results from timing these functions? I haven't used torchaudio frame selection from file before; can you verify that this is truly faster than just loading the file, manually subsampling, and then discarding the rest?
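For reference, a minimal sketch of choosing a random window anywhere in a recording, which is what free subselection (as opposed to fixed chunks) amounts to. The helper `random_window` and the 16 kHz sample rate are hypothetical; the returned pair is meant to map onto the `frame_offset` and `num_frames` arguments of torchaudio.load, assuming a torchaudio version that accepts them.

```python
import random


def random_window(total_frames, sample_rate, seconds=5, rng=random):
    """Return (frame_offset, num_frames) for a random window inside the file."""
    window = min(sample_rate * seconds, total_frames)  # short files: take everything
    offset = rng.randint(0, total_frames - window)     # randint bounds are inclusive
    return offset, window


# A three-minute recording at an assumed 16 kHz sample rate.
offset, length = random_window(total_frames=180 * 16_000, sample_rate=16_000,
                               rng=random.Random(0))
print(offset, length)
```

The same pair can be used to slice an already loaded tensor, which makes it straightforward to time partial reads against load-then-slice on the same windows.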

Best,
Calum

ilyassmoummad commented on June 3, 2024

Hi Calum,

Thank you so much for sharing these techniques. I have tried all of them in the past, but the issue persisted. I realized that my problem originated from loading data stored on a remote server. Fetching the data each time in the getitem function led to very long loading times. Now I am using the data on a local machine, and everything is running smoothly! Thanks a lot again! It won't be long before I have some results on BirdCLEF 2020 (pruned), and I will gladly share them with you!
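When moving the whole dataset to local disk is not practical, one possible alternative (an assumption for illustration, not what was done here) is to copy each file to a local cache the first time it is requested. The helper `cached_path` is hypothetical, and temporary directories stand in for the remote mount and the local disk.

```python
import shutil
import tempfile
from pathlib import Path


def cached_path(remote_path, cache_dir):
    """Copy remote_path into cache_dir on first use; return the local copy."""
    remote_path, cache_dir = Path(remote_path), Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    local = cache_dir / remote_path.name
    if not local.exists():
        shutil.copy2(remote_path, local)
    return local


# Demo: temporary directories stand in for the remote mount and the local disk.
with tempfile.TemporaryDirectory() as remote, tempfile.TemporaryDirectory() as cache:
    src = Path(remote) / "clip.wav"
    src.write_bytes(b"fake audio bytes")
    first = cached_path(src, cache)
    second = cached_path(src, cache)  # already cached, so no second copy is made
    same_path = first == second
    same_bytes = first.read_bytes() == src.read_bytes()
    print(same_path, same_bytes)
```

This sketch keys the cache on the file name only, so files with the same name in different remote folders would collide; a real setup would need a key derived from the full path.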

Best,
Ilyass
