
py-lidbox / lidbox

48 stars · 9 watchers · 13 forks · 6.67 MB

End-to-end spoken language identification out of the box.

License: MIT License

Python 99.52% JavaScript 0.30% Makefile 0.18%
language-recognition language-identification tensorflow spoken-language-recognition spoken-language-identification audio-analysis speech deep-learning big-data

lidbox's People

Contributors

janaab11, matiaslindgren, vinye


lidbox's Issues

scripts/prepare.bash in Common Voice example is broken

This might be because the structure of the downloaded .tar.gz archives has changed since the script was first written. Currently, validated.tsv lies deeper inside the extracted directory than the script assumes, producing the following error at runtime:

unpacking './downloads/br.tar.gz'
cut: ./data/br/validated.tsv: No such file or directory
error: unable to load list of paths from metadata file at './data/br/validated.tsv'
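A possible fix sketch for the script: search the unpacked tree for validated.tsv instead of hard-coding its depth. The demo directory layout below is made up purely to illustrate the lookup, not the actual archive structure:

```shell
# build a mock unpacked layout with validated.tsv at an arbitrary depth
# (directory names are invented for this demo), then locate it with find
mkdir -p demo_data/br/cv-corpus/br
touch demo_data/br/cv-corpus/br/validated.tsv
# -print -quit stops at the first match
validated=$(find demo_data/br -name validated.tsv -print -quit)
echo "using metadata file at '$validated'"
```

This keeps the script working regardless of how many directory levels the archive introduces.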

No module called 'plda'

Hi,
Today I tried to use lidbox and ran into the following error:

File "D:\anaconda\envs\slid\lib\site-packages\lidbox\embed\sklearn_utils.py", line 6, in <module>
    from plda import Classifier as PLDAClassifier
ModuleNotFoundError: No module named 'plda'

It seems a package is missing from the install. I installed lidbox through pip.
Thanks in advance!
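For context, plda is not published on PyPI, so pip cannot resolve it as a regular dependency. Assuming the import in the traceback refers to the RaviSoji/plda package on GitHub (an assumption, not confirmed by lidbox's docs), a workaround is to install it from source and verify the module resolves:

```python
# check whether the optional 'plda' module is importable; if missing, it
# can be installed from GitHub (repo URL is an assumption):
#   pip install git+https://github.com/RaviSoji/plda.git
import importlib.util

def plda_available():
    # find_spec returns None when the module cannot be imported
    return importlib.util.find_spec("plda") is not None

print("plda importable:", plda_available())
```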

Make cache-invalidation more aggressive

Changing dataset metadata should always invalidate all existing signal caches. For example, compare config-file contents, or save checksums of the metadata alongside the caches.
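A minimal sketch of the checksum idea, assuming one cache directory per dataset; the file names and layout here are hypothetical, not lidbox's actual cache format:

```python
# store a checksum of the metadata/config file alongside the cache and
# treat the cache as stale whenever the checksum no longer matches
import hashlib
import os

def file_checksum(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def cache_is_valid(cache_dir, config_path):
    stamp = os.path.join(cache_dir, "metadata.md5")
    if not os.path.exists(stamp):
        return False
    with open(stamp) as f:
        return f.read().strip() == file_checksum(config_path)

def write_cache_stamp(cache_dir, config_path):
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, "metadata.md5"), "w") as f:
        f.write(file_checksum(config_path))
```

On every run the pipeline would call cache_is_valid before reusing cached signals and write_cache_stamp after rebuilding them.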

Support for ragged batches during training

For example, the x-vector architecture should be trained on arbitrary length input. Without ragged batches, this limits the batch size to 1. By supporting ragged batches, we could train with larger batch sizes.
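A sketch of what ragged batching could look like with tf.data, assuming a TF 2.x version that provides dense_to_ragged_batch; the shapes are illustrative, not lidbox's actual feature pipeline:

```python
import tensorflow as tf

def gen():
    # three "utterances" of different lengths, shaped [time, features]
    for n in (3, 5, 2):
        yield tf.zeros([n, 4])

ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec([None, 4], tf.float32))
# batch without padding or truncation: each batch is a tf.RaggedTensor
ds = ds.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=3))
batch = next(iter(ds))
print(batch.shape)  # (3, None, 4)
```

A model whose layers accept ragged input (or that converts to a padded dense tensor plus a mask) could then train on batches larger than 1 without fixing the utterance length.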

Using 'common-voice-small' example setup with larger dataset results in seg fault (core dumped) error

Hi, thanks for creating such a well documented project! I'm part of a university student group using this to train a model on (we hope) 20-30 languages commonly found in Australia. Your explanations in the examples section were incredibly useful, especially since none of us have any experience in this area. Please excuse any ignorance in the following questions.

We received the warnings below when running the project with the same datasets and code as your 'common-voice-small' example, but they didn't prevent model training from completing. Now that we're increasing the size of our dataset to include additional languages (totaling ~12 GB of audio), we're hitting predictable seg faults when caching the dataset or during model training. We're guessing the issues stem from the CUDA version installed on the university machines, which is something we have no control over. We're wondering if you've encountered these issues using lidbox, and/or if you have advice on circumventing them.

This is running on Ubuntu 20.04.6 LTS with an NVIDIA A40 (45 GB RAM).

Memory leak in CUDA11.x

The warning below appears either when caching the dataset or, if we omit caching, when model training begins. Using nvidia-smi to monitor GPU memory shows that usage gradually increases until it reaches capacity, at which point the program seg faults. So it certainly seems the issue is caused by the cuFFT plan-creation memory leak.

tensorflow/core/kernels/fft_ops.cc:472] The CUDA FFT plan cache capacity of 512 has been exceeded. This may lead to extra time being spent constantly creating new plans. For CUDA 11.x, there is also a memory leak in cuFFT plan creation which may cause GPU memory usage to slowly increase. If this causes an issue, try modifying your fft parameters to increase cache hits, or build TensorFlow with CUDA 10.x or 12.x, or use explicit device placement to run frequently-changing FFTs on CPU.

The seg fault during caching: [screenshot omitted]

If we omit pre-caching the dataset, we proceed to training and see the above warning along with the following warning:

The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to dataset.cache().take(k).repeat(). You should use dataset.take(k).cache().repeat() instead.

My assumption is that this second warning merely increases training time, since it affects data-access speed rather than the number of FFTs executed. Is that the case?
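The reordering suggested by that caching warning can be sketched as follows (the dataset and sizes are illustrative stand-ins, not the lidbox pipeline):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)
# problematic: cache() before take() means the cache never sees the full
# dataset, so the partially cached contents are discarded every epoch
bad = ds.cache().take(3).repeat(2)
# suggested: truncate first, then cache the (now complete) small dataset
good = ds.take(3).cache().repeat(2)
print(list(good.as_numpy_iterator()))  # [0, 1, 2, 0, 1, 2]
```

Both orderings yield the same elements here; the difference is that only the second one actually produces a reusable cache.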

Do you have any advice about how to modify our usage of Lidbox in order to minimise the effect of this memory leak? Until we resolve the seg fault issue we are running training on CPU, which works but is incredibly slow.

Thank you for your time,
Toby
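One of the mitigations listed in the TF warning above, explicit device placement of the FFT-heavy ops on CPU, can be sketched like this. The frame parameters are typical values for 16 kHz audio (25 ms frames, 10 ms step), not lidbox's actual configuration:

```python
import tensorflow as tf

def stft_on_cpu(signal):
    # pin the STFT to CPU so cuFFT plans are never created on the GPU,
    # sidestepping the CUDA 11.x plan-creation memory leak
    with tf.device("/CPU:0"):
        return tf.signal.stft(signal, frame_length=400, frame_step=160)

spec = stft_on_cpu(tf.zeros([1, 16000]))
print(spec.shape)  # (1, 98, 257)
```

The trade-off is that the FFTs themselves run slower on CPU, but the rest of the model can still train on the GPU, which is usually far faster than moving the whole pipeline to CPU.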

Error while running train-embeddings option in cli

This error comes from running step 5 lidbox train-embeddings -v config.xvector-NB.yaml in the Common Voice example:

...
2020-07-01 18:25:25.740 I lidbox.embeddings.sklearn_utils: Wrote embedding demo to './lidbox-cache/naive_bayes/common-voice-4-embeddings/figures/test/embeddings-PCA-2D.png'
2020-07-01 18:25:28.541 I lidbox.embeddings.sklearn_utils: Wrote embedding demo to './lidbox-cache/naive_bayes/common-voice-4-embeddings/figures/test/embeddings-PCA-3D.png'
2020-07-01 18:25:28.541 I lidbox.embeddings.sklearn_utils: Fitting with train_X (22794, 3) and train_y (22794,) classifier:
  GaussianNB(priors=None, var_smoothing=1e-09)
Traceback (most recent call last):
  File "/Users/knethil/.pyenv/versions/3.7.5/bin/lidbox", line 11, in <module>
    load_entry_point('lidbox==0.5.0', 'console_scripts', 'lidbox')()
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/__main__.py", line 36, in main
    ret = command.run()
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/cli.py", line 184, in run
    metrics = lidbox.api.fit_embedding_classifier_and_evaluate_test_set(split2ds, split2meta, labels, config)
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 305, in fit_embedding_classifier_and_evaluate_test_set
    utt2prediction, utt2target = process_predictions(test_data["ids"], predictions["test"], "test")
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 279, in process_predictions
    utt2prediction = generate_worst_case_predictions_for_missed_utterances(utt2prediction, utt2target, labels)
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/lidbox/api.py", line 326, in generate_worst_case_predictions_for_missed_utterances
    predictions = np.stack([p for _, p in utt2prediction])
  File "<__array_function__ internals>", line 6, in stack
  File "/Users/knethil/.pyenv/versions/3.7.5/lib/python3.7/site-packages/numpy/core/shape_base.py", line 423, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

I tried to follow the code, and this seems to happen in the part where the predictions of the NB classifier are processed. Is there a way to bypass this training/prediction step and just get the x-vector embeddings from the trained model?
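For reference, the ValueError in the traceback comes from np.stack being called on an empty sequence. A guard like the following would avoid the crash when the list of missed utterances is empty; the function name mirrors the traceback, but the code itself is a hypothetical sketch, not lidbox's implementation:

```python
import numpy as np

def stack_predictions(utt2prediction, num_labels):
    # np.stack raises "need at least one array to stack" on an empty
    # sequence, so handle the "no missed utterances" case explicitly
    if not utt2prediction:
        return np.empty((0, num_labels))
    return np.stack([p for _, p in utt2prediction])

print(stack_predictions([], 4).shape)  # (0, 4)
```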
