audio-classification's Introduction

Rethinking CNN Models for Audio Classification

This repository contains the PyTorch code for our paper Rethinking CNN Models for Audio Classification. The experiments are conducted on the following three datasets, which can be downloaded from the links provided:

  1. ESC-50
  2. UrbanSound8K
  3. GTZAN

Preprocessing

The preprocessing is done separately to save time during the training of the models.

For ESC-50:

python preprocessing/preprocessingESC.py --csv_file /path/to/file.csv --data_dir /path/to/audio_data/ --store_dir /path/to/store_spectrograms/ --sampling_rate 44100

For UrbanSound8K:

python preprocessing/preprocessingUSC.py --csv_file /path/to/csv_file/ --data_dir /path/to/audio_data/ --store_dir /path/to/store_spectrograms/

For GTZAN:

python preprocessing/preprocessingGTZAN.py --data_dir /path/to/audio_data/ --store_dir /path/to/store_spectrograms/ --sampling_rate 22050

Training the Models

The configurations for training the models are provided in the config folder. The sample_config.json explains the details of all the variables in the configurations. The command for training is:

python train.py --config_path /config/your_config.json
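
As a rough illustration of how a script can consume the --config_path argument shown above, the sketch below parses the flag and reads the JSON file into a dictionary. It is not the repository's train.py: the helper names parse_args and load_config are hypothetical, and the actual keys are the ones documented in sample_config.json.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command line; only --config_path is assumed here."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--config_path", type=str, required=True,
                        help="Path to a JSON configuration file")
    return parser.parse_args(argv)


def load_config(path):
    """Read a JSON configuration file into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

A script built this way would call load_config(parse_args().config_path) once at startup and pass the resulting dictionary to the code that builds the data loaders and the model.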

audio-classification's People

Contributors

ayaneai, kamalesh0406

audio-classification's Issues

About datasetaug

In dataloaders/datasetaug.py (lines 23-29):

sample = value
limits = ((-2, 2), (0.9, 1.2))

if self.mode=="train":
	pitch_shift = np.random.randint(limits[0][1], limits[0][1] + 1)
	time_stretch = np.random.random() * (limits[1][1] - limits[1][0]) + limits[1][0]
	new_audio = librosa.effects.time_stretch(librosa.effects.pitch_shift(sample, self.sr, pitch_shift), time_stretch)

Is there something wrong with pitch_shift?
With pitch_shift = np.random.randint(limits[0][1], limits[0][1] + 1), the pitch shift is always 2, because both bounds are taken from limits[0][1] and the upper bound of np.random.randint is exclusive.
I think it should be pitch_shift = np.random.randint(limits[0][0], limits[0][1] + 1), so that the pitch shift ranges from -2 to 2 as expected.
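
The difference between the two calls can be checked with a small sketch. To keep it dependency-free, it uses the standard library's random.randrange as a stand-in for np.random.randint; both exclude the upper bound. The function names are hypothetical and only illustrate the issue reported here.

```python
import random

# (pitch-shift range in semitones, time-stretch rate range), as in the quoted code
LIMITS = ((-2, 2), (0.9, 1.2))


def draw_pitch_shift_original(limits=LIMITS):
    # Mirrors the quoted call: both bounds come from limits[0][1],
    # so the half-open interval [2, 3) contains only the value 2.
    return random.randrange(limits[0][1], limits[0][1] + 1)


def draw_pitch_shift_proposed(limits=LIMITS):
    # Proposed change: the lower bound comes from limits[0][0],
    # so the half-open interval [-2, 3) covers -2, -1, 0, 1 and 2.
    return random.randrange(limits[0][0], limits[0][1] + 1)


def draw_time_stretch(limits=LIMITS):
    # Same formula as the quoted code: a uniform rate between 0.9 and 1.2.
    return random.random() * (limits[1][1] - limits[1][0]) + limits[1][0]
```

Drawing many samples from each function shows the original form returning a single value and the proposed form covering the whole integer range.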

matrix normalization

How did you normalize the (3, 128, 250) inputs? No normalization happens during audio preprocessing.
Does DenseNet normalize its inputs? If yes, where?

Thanks

train.py is running but no outputs

Salam Kamalesh,

I added a print command before "with tqdm(total=len(data_loader)) as t:" and it was the last output in the console. I stopped running the file after more than 5 hours and nothing changed, unfortunately.

Do you have any idea why this may happen?

Thanks a lot!!

Error occurs when running the classification for 'UrbanSound8k' with normalization.

Thank you for the great work.

When we tried to run the UrbanSound8K classification task with augmentation, an error occurred (in fold 2).

"Padding size should be less than the dimension 2 of the samples."

It occurred in the following code in "datasetaug.py".

spec = torchaudio.transforms.MelSpectrogram(sample_rate=self.sr, n_fft=self.fft, win_length=window_length, hop_length=hop_length, n_mels=self.melbins)(clip)

Is this expected, or am I running a different version of librosa?

Thank you.
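
One possible cause, stated as an assumption rather than something confirmed in this thread: by default the STFT behind torchaudio's MelSpectrogram pads the signal by reflection with n_fft // 2 samples on each side, and reflection padding fails when the clip is not longer than the padding. UrbanSound8K contains very short clips, and time-stretching with a rate above 1 shortens them further. The sketch below shows the length check and a zero-padding workaround on a plain list of samples; both helper names are hypothetical and not part of the repository.

```python
def min_samples_for_reflect_pad(n_fft):
    # Reflection padding of n_fft // 2 samples per side needs an input
    # that is strictly longer than the padding itself.
    return n_fft // 2 + 1


def zero_pad_to_length(samples, target_length):
    """Right-pad a 1-D sequence of samples with zeros up to target_length."""
    missing = target_length - len(samples)
    if missing <= 0:
        # Already long enough: return an unchanged copy.
        return list(samples)
    return list(samples) + [0.0] * missing
```

Under this assumption, padding each augmented clip to at least min_samples_for_reflect_pad(n_fft) samples before computing the spectrogram would avoid the error.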

Accuracy of each fold

Sorry to disturb you. I forked your code and tried to use 'resnet' to predict samples in the 'UrbanSound8k' dataset. However, the accuracy on the first fold is 76.9%, which is far below 84.76% [1]. Is this normal? Could you make your per-fold results public? I look forward to your reply.

[1] Palanisamy, Kamalesh, et al. “Rethinking CNN Models for Audio Classification.” ArXiv Preprint ArXiv:2007.11154, 2020.

About the Integrated Gradients

Hi,
Thanks for your contribution. I am interested in your paper and am trying to run the scripts. I noticed that you mention integrated gradients results in your paper, which are impressive. Could you provide the related code?

Thanks

About paper

Hello, thanks for your contribution. I would like to know the current status of your paper. Has it been accepted?

Prediction/Inference for novel data

Do you provide a utility/function for inference on novel data, that is, a way to apply a trained model to a previously unseen audio file?

Question regarding json file

Hello Kamalesh,

I am interested in your paper and am trying to run your solution. I have a question regarding the UrbanSound8K config file. You set the number of folds to 1. Why did you do this?

About GPU Utilization

Thanks for your great work!
I tried to run your project, but training is very slow. GPU utilization is very low, only 1%, while GPU memory usage looks normal at about 4 GB.
I don't know why this happens. Looking forward to your answer!
