tyiannak / deep_audio_features

PyTorch implementation of deep audio embedding calculation

License: MIT License

Python 99.90% Shell 0.10%

deep_audio_features's Introduction

Welcome 👋

I am a Principal Researcher at the Multimedia Analysis Group of the Institute of Informatics and Telecommunications of the National Center for Scientific Research "Demokritos" and a Machine Learning Director at Behavioral Signals. You can find more info about me at my web page.

deep_audio_features's People

Contributors

Stargazers

Watchers

Forkers

hadryan nikosmichas anupsingh15 sandybei minhquan-pham kevinn1999 kathy883 iamaer4fa

deep_audio_features's Issues

Max sequence length computation error

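The code below does not compute the max sequence length correctly: the term (config.HOP_LENGTH - config.HOP_LENGTH) is always zero, so nothing is subtracted from the duration. Please check the length formula.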

deep_audio_features/utils/load_dataset.py

Lines 115 to 121 in 71e0ba5

    with contextlib.closing(wave.open(f, 'r')) as fp:
        frames = fp.getnframes()
        fs = fp.getframerate()
        duration = frames / float(fs)
        length = int((duration -
                      (config.HOP_LENGTH - config.HOP_LENGTH)) / \
                     (config.HOP_LENGTH) + 1)
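For comparison, the usual frame-count formula subtracts the window length (not zero) before dividing by the hop: floor((N - W) / H) + 1. Below is a minimal sketch, assuming the config lengths are in seconds and that the intended term is the window length; `num_frames` and its `win_sec`/`hop_sec` parameters are hypothetical names, not the project's API. It converts to samples so the division is exact integer arithmetic.

```python
def num_frames(n_samples: int, fs: int, win_sec: float, hop_sec: float) -> int:
    """Count the full analysis windows that fit in a signal (hypothetical helper)."""
    # Convert window and hop from seconds to samples. Working in integers avoids
    # float division such as 0.95 / 0.05 landing just below 19 and being
    # truncated one frame short.
    win = int(round(win_sec * fs))
    hop = int(round(hop_sec * fs))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    if n_samples < win:
        return 0  # signal shorter than a single window
    # floor((N - W) / H) + 1
    return (n_samples - win) // hop + 1
```

With the snippet above, `n_samples` and `fs` would come from `fp.getnframes()` and `fp.getframerate()`, and `win_sec` from whatever window-length setting the config provides (assumed here).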

Code cleanup, folder re-organization and readme update

Combine deep feature models: create a basic script that solves any audio classification task using pyAudioAnalysis

Testing Script failure

Hello,
After a successful run of the training script, I got a model and applied it to the testing script. This is what I got after the execution:

Traceback (most recent call last):
  File "basic_test.py", line 94, in <module>
    test_model(modelpath=model, ifile=ifile, layers_dropped=layers_dropped)
  File "basic_test.py", line 54, in test_model
    fuse=fuse)
TypeError: __init__() got an unexpected keyword argument 'spec_size'

Thanks,

Bug in classification report?

There's a bug related to path depth in the classification report. To reproduce:

from deep_audio_features.bin import classification_report as cr
cr.test_report('/Users/tyiannak/Downloads/soundscape_8k_1s.pt',
               ['/Users/tyiannak/Downloads/soundscape_8k_1sec/test/1',
                '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/2/',
                '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/3',
                '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/4',
                '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/5'])

Loaded model class mapping: {0: '1', 1: '2', 2: '3', 3: '4', 4: '5'}
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-3e59e19efe16> in <module>
----> 1 cr.test_report('/Users/tyiannak/Downloads/soundscape_8k_1s.pt', ['/Users/tyiannak/Downloads/soundscape_8k_1sec/test/1', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/2/', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/3', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/4', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/5'])

/usr/local/lib/python3.9/site-packages/deep_audio_features/bin/classification_report.py in test_report(model_path, folders)
     58 
     59     max_seq_length = model.max_sequence_length
---> 60     files_test, y_test, class_mapping = load_dataset.load(
     61         folders=folders, test=False,
     62         validation=False, class_mapping=class_mapping)

/usr/local/lib/python3.9/site-packages/deep_audio_features/utils/load_dataset.py in load(folders, test_val, test, validation, class_mapping)
     71         folder2idx = {v: k for k, v in idx2folder.items()}
     72 
---> 73     labels = list(map(lambda x: folder2idx[x], labels))
     74 
     75     class_mapping = {}

/usr/local/lib/python3.9/site-packages/deep_audio_features/utils/load_dataset.py in <lambda>(x)
     71         folder2idx = {v: k for k, v in idx2folder.items()}
     72 
---> 73     labels = list(map(lambda x: folder2idx[x], labels))
     74 
     75     class_mapping = {}

KeyError: '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/1'

If I cd into the soundscape_8k_1sec folder and then run

cr.test_report('../soundscape_8k_1s.pt', ['test/1', 'test/2/', 'test/3', 'test/4', 'test/5'])

Everything runs ok.

Also, if I use the long paths with the bin.basic_training script, it runs OK too. So something is probably going wrong in load_dataset.load(), around the class mapping assignment, when it is called from classification_report.
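One possible way to make the label lookup independent of path depth and trailing slashes is to compare folder base names only. This is a sketch, assuming the model's class mapping stores folder base names (as the `Loaded model class mapping: {0: '1', ...}` line suggests); `folder_class_name` and `labels_from_folders` are hypothetical helpers, not functions from `load_dataset`.

```python
import os


def folder_class_name(folder: str) -> str:
    """Reduce a class folder path to its base name (hypothetical helper)."""
    # normpath strips the trailing separator ("test/2/" -> "test/2"), so
    # basename returns "2" instead of an empty string.
    return os.path.basename(os.path.normpath(folder))


def labels_from_folders(folders, class_mapping):
    """Map each folder to a class index via its base name (hypothetical helper).

    class_mapping is assumed to be {index: folder_base_name}.
    """
    name2idx = {name: idx for idx, name in class_mapping.items()}
    return [name2idx[folder_class_name(f)] for f in folders]
```

With this approach, absolute paths, relative paths and paths with a trailing slash all resolve to the same class name.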

CNN + TRL

CNN with a Tensor Regression Layer instead of a linear layer

Add class names when loading and saving the CNN models

Can this provide any output that can be used to coach language learners' pronunciation?

You were listed here: https://github.com/tyiannak/pyAudioAnalysis
I'm just looking around at the moment, so if you can give me some idea of what this does, I would be very grateful. I'm being a bit lazy here, as I do have a comp sci degree.

Convolutional autoencoder train and test functions and scripts

Basic Testing Script

Add initialization methods

Add folder setup script

Int16 melgram

Check if an Int16 melgram has comparable performance to a float32 melgram
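A quick way to gauge what an Int16 melgram loses is to quantize float values linearly and measure the round-trip error. This is a minimal sketch, assuming a dB-scaled melgram with a known bound `max_abs`; `quantize_int16` and `dequantize_int16` are hypothetical helpers, not part of the project. It only shows storage size and reconstruction error, not whether classification performance is comparable, which is what the issue asks.

```python
from array import array

INT16_MAX = 32767


def quantize_int16(values, max_abs):
    """Linearly map floats in [-max_abs, max_abs] to int16 codes (hypothetical helper)."""
    scale = INT16_MAX / max_abs
    # Values outside the assumed range are clipped before rounding.
    return array("h", (round(max(-max_abs, min(max_abs, v)) * scale) for v in values))


def dequantize_int16(codes, max_abs):
    """Map int16 codes back to floats (hypothetical helper)."""
    scale = INT16_MAX / max_abs
    return [c / scale for c in codes]
```

Each value then takes 2 bytes instead of 4, and for in-range values the reconstruction error is at most half a quantization step (`max_abs / 32767 / 2`).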

Error in histogram file name on Windows 10.

Got this error on Windows 10 running:

C:\Python310\lib\site-packages\deep_audio_features\bin\basic_training.py -i "genres/blues" "genres/classical" "genres/country" "genres/disco" "genres/hiphop", "genres/jazz" "genres/metal" "genres/pop" "genres/reggae" "genres/rock" -o "energy"

...
--> Plotting histogram of spectrogram sizes.
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python310\lib\site-packages\deep_audio_features\bin\basic_training.py", line 64, in train_model
    train_set = FeatureExtractorDataset(X=files_train, y=y_train,
  File "C:\Python310\lib\site-packages\deep_audio_features\dataloading\dataloading.py", line 86, in __init__
    self.plot_hist(spec_sizes, y)
  File "C:\Python310\lib\site-packages\deep_audio_features\dataloading\dataloading.py", line 261, in plot_hist
    plt.savefig(ct.strftime("%m_%d_%Y, %H:%M:%S") + ".png")
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\pyplot.py", line 1023, in savefig
    res = fig.savefig(*args, **kwargs)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\figure.py", line 3378, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\backend_bases.py", line 2366, in print_figure
    result = print_method(
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\backend_bases.py", line 2232, in <lambda>
    print_method = functools.wraps(meth)(lambda *args, **kwargs: meth(
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\backends\backend_agg.py", line 509, in print_png
    self._print_pil(filename_or_obj, "png", pil_kwargs, metadata)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\backends\backend_agg.py", line 458, in _print_pil
    mpl.image.imsave(
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\matplotlib\image.py", line 1689, in imsave
    image.save(fname, **pil_kwargs)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils../..\PIL\Image.py", line 2410, in save
    fp = builtins.open(filename, "w+b")
OSError: [Errno 22] Invalid argument: '01_09_2024, 08:51:43.png'
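Windows does not allow `:` in file names, which is why the `"%m_%d_%Y, %H:%M:%S"` pattern fails with Errno 22 (the comma and space are legal). A minimal sketch of a colon-free timestamp, assuming any separator is acceptable for these files; `histogram_filename` is a hypothetical helper, and its natural call site would be the `plt.savefig(...)` line in `plot_hist`.

```python
from datetime import datetime


def histogram_filename(ct: datetime) -> str:
    """Build a timestamped PNG name that is valid on Windows (hypothetical helper)."""
    # Windows forbids : \ / * ? " < > | in file names, so use '-' and '_'
    # instead of the ':' separators of "%H:%M:%S".
    return ct.strftime("%m_%d_%Y_%H-%M-%S") + ".png"
```

For example, `plt.savefig(histogram_filename(datetime.now()))` would produce a name like `01_09_2024_08-51-43.png`.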

"ValueError" in Transfer Learning script

Hi,
While running the transfer learning script in terminal, I get a "ValueError":
Resetting model to epoch 14.
Traceback (most recent call last):
  File "bin/transfer_learning.py", line 179, in <module>
    transfer_learning(model=modelpath, folders=folders, strategy=strategy)
  File "bin/transfer_learning.py", line 122, in transfer_learning
    best_model, train_losses, valid_losses, train_accuracy, \
ValueError: too many values to unpack (expected 6)

What can I do?

Thanks!
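The error means the function called on line 122 returns more values than the six names on the left-hand side, which typically happens when a training function gains a return value and a caller is not updated. The sketch below reproduces and fixes that pattern with a hypothetical `train_routine` that returns seven values; it does not claim to know which values the project's training function actually returns.

```python
def train_routine():
    """Hypothetical stand-in: returns seven values while the caller expects six."""
    return "best_model", [0.9], [1.0], [0.6], [0.55], 0.7, 14


def reproduce():
    """Unpack into six names, as the failing call site does."""
    try:
        best_model, train_losses, valid_losses, train_acc, valid_acc, extra = train_routine()
    except ValueError as err:
        return str(err)
    return None


# Fix: unpack exactly as many names as the callee returns, or collect any
# surplus values with a starred target.
best_model, train_losses, valid_losses, train_acc, *rest = train_routine()
```

Matching the left-hand side to the callee's current return signature is the cleaner fix; the starred form is a tolerant fallback when only the leading values are needed.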

Add segmentation in retrieval scripts

Combine deep feature models: add deep learning feature extractors as FE alternatives

Transfer learning refactoring

Change segmentation from spec-based to audio-based

Basic Training Script

Are there any pretrained models?

Hello, do you provide any pretrained models with this project?

Refactoring

Change model saving and loading

Store just weights, not the whole object
@lobracost

Test segmentation

Code cleanup, docs and readme update

"pop up windows" in scripts

Hi,

When I execute the "training script" (and I think the same happens with the other two scripts as well), it starts like this:

I see the terminal window and a pop-up window named "Figure 1". To proceed, I must close the pop-up window; if I don't, nothing happens. When I close it, the script continues like this:

Again, I have to close the window to continue the execution. When I close it, the script continues as expected, but:

This time I can't close the "Figure 1" pop-up window until the script has finished running.

So, assuming you can reproduce this problem, is there a way to fix it?
It would be great if the histograms could be saved as images automatically, without user interaction.

Thank you very much,

KeyError: 'Classifier' error in "3.4.3 Predict on an unknown sample using the combiner"

I get a KeyError when I run deep_audio_features/combine/predict.py after installing the additional libraries it requires.

Does anyone know how to deal with this?

Create fine-tuning script

Script for fine tuning of a model.
Similar code structure to the train function.

deep_audio_features/core/basic_training.py

Line 20 in 0b2c4b6

def train_model(folders=None):

Refactor

Configure architectures from config file
Class for training and validation

Error in audioTrainTest.extract_features_and_train if a class folder contains only 1 sample

295     for feat in features:
296         temp = []
297         for i in range(feat.shape[0]):
298             temp_fv = feat[i, :]

If one of the class folders has only 1 sample, feat will be a 1-D array of shape (136,), and line 298 will raise an error because it indexes a 1-D array with 2-D indices (feat[i, :]).

I propose the following fix

    for feat in features:
        if feat.ndim == 1: # this class has only 1 sample
            feat = feat.reshape((1, feat.shape[0]))
        temp = []
        for i in range(feat.shape[0]):
            temp_fv = feat[i, :]
