py-lidbox / lidbox
End-to-end spoken language identification out of the box.
License: MIT License
Lines 101 to 106 in 49c27a0
Defining an end-to-end pipeline in yaml adds an unnecessary layer of complexity. Perhaps a single example pipeline could be supported, but any customization is easier to do with a custom Python script using the lidbox API.
utt2path, utt2label etc -> pandas.DataFrame
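The kind of conversion suggested here could be sketched as below; the dict contents are made up for illustration and are not lidbox's actual metadata:

```python
import pandas as pd

# Hypothetical lidbox-style metadata dicts (contents made up for illustration).
utt2path = {"utt1": "/data/utt1.wav", "utt2": "/data/utt2.wav"}
utt2label = {"utt1": "fi", "utt2": "sv"}

# One-liner replacement for manual dict-juggling: a DataFrame indexed by
# utterance id, with one column per metadata dict.
meta = pd.DataFrame({"path": utt2path, "label": utt2label})
print(meta)
```

With metadata in a single DataFrame, joins, filters, and per-label aggregations become one-liners instead of hand-written loops over parallel dicts.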
Are there any plans to train more languages, e.g. adding this dataset:
https://www.50languages.com/
If it helps I can provide MP3-files as ZIPs.
This might be because the structure of the downloaded .tar.gz files has changed since the script was first written. Currently, validated.tsv lies deeper inside the directory tree than the script assumes, hence the following error at runtime:
unpacking './downloads/br.tar.gz'
cut: ./data/br/validated.tsv: No such file or directory
error: unable to load list of paths from metadata file at './data/br/validated.tsv'
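A hedged workaround until the script is updated: search for validated.tsv recursively instead of assuming a fixed depth. The helper below is a sketch, not part of lidbox:

```python
from pathlib import Path

def find_validated_tsv(extracted_dir):
    """Return the first validated.tsv found anywhere under extracted_dir,
    regardless of how deeply the Common Voice archive nests it."""
    matches = sorted(Path(extracted_dir).rglob("validated.tsv"))
    if not matches:
        raise FileNotFoundError(f"no validated.tsv under {extracted_dir}")
    return matches[0]
```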
Hi,
Today I tried to use lidbox and ran into the following error:
File "D:\anaconda\envs\slid\lib\site-packages\lidbox\embed\sklearn_utils.py", line 6, in <module>
    from plda import Classifier as PLDAClassifier
ModuleNotFoundError: No module named 'plda'
Seems like a package is missing from the install. I installed lidbox through pip.
Thanks in advance!
Changing dataset metadata should always invalidate all existing signal caches. Compare e.g. config-file contents or save checksums of metadata alongside caches.
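One way the checksum idea could be sketched (helper names here are hypothetical, not lidbox API):

```python
import hashlib
import json

def metadata_checksum(metadata: dict) -> str:
    """Stable checksum of dataset metadata. Store this alongside the signal
    cache and rebuild the cache whenever the checksum changes."""
    # sort_keys makes the serialization order-independent, so the checksum
    # only changes when the metadata contents actually change.
    canonical = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def cache_is_valid(saved_checksum: str, metadata: dict) -> bool:
    """True if the cached signals were built from this exact metadata."""
    return saved_checksum == metadata_checksum(metadata)
```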
For example, the x-vector architecture should be trainable on arbitrary-length input. Without ragged batches, this limits the batch size to 1. By supporting ragged batches, we could train with larger batch sizes.
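A minimal sketch of ragged batching in tf.data, assuming TF 2.x; the utterances below are toy 1-D stand-ins for variable-length feature sequences:

```python
import tensorflow as tf

# Hypothetical utterances of different lengths (1-D stand-ins for features).
utterances = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]

ds = tf.data.Dataset.from_generator(
    lambda: iter(utterances),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.float32))

# Batch without padding or truncation: each batch is a RaggedTensor,
# so batch_size > 1 works even though the elements differ in length.
ragged_ds = ds.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=2))

for batch in ragged_ds:
    print(batch.bounding_shape())
```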
There's a ridiculous amount of hand-written dict-juggling that could be one-liners in pandas.
https://github.com/py-lidbox/lidbox/tree/master/lidbox/features
Especially the correctness of DSP-related functions.
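A sketch of the kind of DSP correctness test meant here; `frame_signal` is a toy stand-in, not lidbox's actual implementation, and the test checks it against values that are obvious by inspection:

```python
import numpy as np

def frame_signal(signal, frame_length, frame_step):
    """Split a 1-D signal into overlapping frames (no padding)."""
    num_frames = 1 + (len(signal) - frame_length) // frame_step
    return np.stack([signal[i * frame_step:i * frame_step + frame_length]
                     for i in range(num_frames)])

# Unit-test style check on a signal where the expected frames are trivial
# to verify by hand: np.arange makes each sample equal to its own index.
signal = np.arange(10.0)
frames = frame_signal(signal, frame_length=4, frame_step=2)
assert frames.shape == (4, 4)
assert np.array_equal(frames[1], np.array([2.0, 3.0, 4.0, 5.0]))
```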
Line 81 in abc2a43
E.g. separate per-class metrics from summary metrics.
Hi, thanks for creating such a well documented project! I'm part of a university student group using this to train a model on (we hope) 20-30 languages commonly found in Australia. Your explanations in the examples section were incredibly useful, especially since none of us have any experience in this area. Please excuse any ignorance in the following questions.
We received the warnings below when running the project with the same datasets and code as your 'common-voice-small' example, but they didn't prevent model training from completing. Now that we're increasing the size of our dataset to include additional languages (totaling ~12 GB of audio), we're hitting predictable seg faults when caching the dataset or during model training. We're guessing the issues stem from the CUDA version installed on the university machines, which we have no control over. We're wondering if you've encountered these issues using lidbox, and/or if you have advice on circumventing them.
This is running on Ubuntu 20.04.6 LTS with an NVIDIA A40 w/ 45 GB RAM.
Memory leak in CUDA11.x
The warning below appears either when caching the dataset or, if we omit caching, when model training begins. Using nvidia-smi to monitor GPU memory shows that usage gradually increases until it reaches capacity, at which point the program seg faults. So it certainly seems the issue is caused by the cuFFT plan creation memory leak.
tensorflow/core/kernels/fft_ops.cc:472] The CUDA FFT plan cache capacity of 512 has been exceeded. This may lead to extra time being spent constantly creating new plans. For CUDA 11.x, there is also a memory leak in cuFFT plan creation which may cause GPU memory usage to slowly increase. If this causes an issue, try modifying your fft parameters to increase cache hits, or build TensorFlow with CUDA 10.x or 12.x, or use explicit device placement to run frequently-changing FFTs on CPU.
If we omit pre-caching the dataset, we proceed to training and see the above warning along with the following warning:
The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.
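For reference, the ordering difference that warning refers to can be sketched with a toy tf.data pipeline (the numbers here are illustrative, not lidbox's pipeline):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# Triggers the warning: cache() never sees the full dataset before take()
# truncates it, so the partial cache is discarded on every epoch.
bad = ds.cache().take(3).repeat(2)

# Recommended ordering: truncate first, then cache the now-complete result,
# so later epochs are served from the cache.
good = ds.take(3).cache().repeat(2)

print([int(x) for x in good])
```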
My assumption is that this second warning simply increases training time, because it affects access speed rather than the number of FFTs being executed. Is that the case?
Do you have any advice about how to modify our usage of Lidbox in order to minimise the effect of this memory leak? Until we resolve the seg fault issue we are running training on CPU, which works but is incredibly slow.
Thank you for your time,
Toby
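One workaround the TF warning text itself suggests, sketched under the assumption that the FFTs come from an STFT feature extractor: pin the FFT ops to the CPU so no cuFFT plans are created at all. The frame parameters below are illustrative, not lidbox's actual config.

```python
import tensorflow as tf

# 1 second of fake 16 kHz audio, batch of 1.
signal = tf.random.normal([1, 16000])

# Explicit device placement: running the STFT on CPU avoids cuFFT plan
# creation entirely, sidestepping the CUDA 11.x plan-creation leak.
with tf.device("/CPU:0"):
    stft = tf.signal.stft(signal, frame_length=400, frame_step=160,
                          fft_length=512)

print(stft.shape)
```

This trades some throughput on the feature-extraction step for stable GPU memory; the rest of the model can still run on the GPU.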
This error comes from running step 5 lidbox train-embeddings -v config.xvector-NB.yaml
in the Common Voice example:
...
2020-07-01 18:25:25.740 I lidbox.embeddings.sklearn_utils: Wrote embedding demo to './lidbox-cache/naive_bayes/common-voice-4-embeddings/figures/test/embeddings-PCA-2D.png'
2020-07-01 18:25:28.541 I lidbox.embeddings.sklearn_utils: Wrote embedding demo to './lidbox-cache/naive_bayes/common-voice-4-embeddings/figures/test/embeddings-PCA-3D.png'
2020-07-01 18:25:28.541 I lidbox.embeddings.sklearn_utils: Fitting with train_X (22794, 3) and train_y (22794,) classifier:
GaussianNB(priors=None, var_smoothing=1e-09)
Traceback (most recent call last):
File "/Users/knethil/.pyenv/versions/3.7.5/bin/lidbox", line 11, in <module>
load_entry_point('lidbox==0.5.0', 'console_scripts', 'lidbox')()
File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/__main__.py", line 36, in main
ret = command.run()
File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/cli.py", line 184, in run
metrics = lidbox.api.fit_embedding_classifier_and_evaluate_test_set(split2ds, split2meta, labels, config)
File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 305, in fit_embedding_classifier_and_evaluate_test_set
utt2prediction, utt2target = process_predictions(test_data["ids"], predictions["test"], "test")
File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 279, in process_predictions
utt2prediction = generate_worst_case_predictions_for_missed_utterances(utt2prediction, utt2target, labels)
File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 326, in generate_worst_case_predictions_for_missed_utterances
predictions = np.stack([p for _, p in utt2prediction])
File "<__array_function__ internals>", line 6, in stack
File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/numpy/core/shape_base.py", line 423, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
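The failing call is `np.stack` on an empty list, which always raises this error. A minimal reproduction plus a hedged guard (not the project's actual fix):

```python
import numpy as np

utt2prediction = []  # no missed utterances -> empty prediction list

# np.stack with zero arrays raises "need at least one array to stack".
try:
    np.stack([p for _, p in utt2prediction])
except ValueError as e:
    print(e)

# A guard that skips stacking when there is nothing to stack:
arrays = [p for _, p in utt2prediction]
predictions = np.stack(arrays) if arrays else np.empty((0, 0))
```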
I tried to follow the code, and this seems to be in the part where you process predictions of the NB classifier. Is there a way to bypass this training/prediction step and just get the x-vector embeddings from the trained model?
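In general, embeddings can be pulled from a trained Keras model by wrapping an intermediate layer as a new output; the sketch below uses a tiny made-up model, and the layer name "embedding" is hypothetical (check `model.summary()` for the real one in the x-vector model):

```python
import tensorflow as tf

def build_embedding_extractor(model, layer_name):
    """New model that maps the original inputs to an intermediate
    layer's output, bypassing any classifier head after it."""
    return tf.keras.Model(inputs=model.inputs,
                          outputs=model.get_layer(layer_name).output)

# Tiny stand-in model for demonstration; the real x-vector model is larger.
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(4, name="embedding")(inputs)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

extractor = build_embedding_extractor(model, "embedding")
emb = extractor(tf.zeros([3, 8]))
print(emb.shape)
```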