
lucasnewman / best-rq-pytorch


Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random-projection quantizer, in PyTorch.

License: MIT License

Python 100.00%
automatic-speech-recognition speech-synthesis text-to-speech

best-rq-pytorch's People

Contributors

lucasnewman

best-rq-pytorch's Issues

What is the final result like?

What is the final result like? Can you take a screenshot to show it?
This is my result:
trainable parameters: 657398946
training with dataset of 2052 samples and validating with randomly splitted 109 samples
do you want to clear previous experiment checkpoints and results? (y/n)
y
0: loss: 6.983
0: valid loss 5.440
0: saving model to results

Process end, code exit -1073741819 (0xC0000005)
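The exit code in this log is a signed 32-bit rendering of a Windows NTSTATUS value. Masking it to an unsigned 32-bit integer gives 0xC0000005 (STATUS_ACCESS_VIOLATION), which points to a native crash rather than a Python exception. A minimal sketch of the conversion (the helper name is hypothetical):

```python
def to_ntstatus(exit_code: int) -> str:
    """Render a (possibly negative) process exit code as an unsigned 32-bit hex NTSTATUS."""
    # Windows exit codes are 32-bit unsigned; some tools print them as signed ints.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(to_ntstatus(-1073741819))
```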

Pre-training cost of BEST-RQ

Thanks for your great work.

I would like to know how long pre-training takes for a 0.3B model.
Can you share your experience with the cost of pre-training BEST-RQ? (batch size, GPU type, number of GPUs, training time, etc.)

Missing Convolution Subsampling?

Hi Lucas,
I'm looking over the code, and I believe you have missed the two convolution subsampling layers in conformer.py. From the paper:

4.1.1. Non-streaming models: "The model has two convolution layers at the bottom which provide 4 times temporal-dimension reduction for the input sequences. The rest of the layers are a stack of Conformer models. We explore 0.6B model size which is extensively studied in the previous works. The model contains 24 layers of Conformer models."

If you'd like I can create a pull request and implement this for you now.
Thanks - If I've misunderstood the paper, please call me out! 😅
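The 4x temporal reduction described in the quoted passage can be checked with the standard output-length formula for a strided convolution. The sketch below assumes two layers with kernel size 3, stride 2, and no padding, a common Conformer front-end choice; the paper excerpt only states the overall 4x reduction, so these hyperparameters and the helper names are assumptions.

```python
def conv_output_length(length: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Frames produced along the time axis by one convolution (no dilation)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def subsampled_length(length: int, num_layers: int = 2, kernel_size: int = 3,
                      stride: int = 2, padding: int = 0) -> int:
    # Apply the same strided convolution num_layers times along the time axis.
    for _ in range(num_layers):
        length = conv_output_length(length, kernel_size, stride, padding)
    return length


if __name__ == "__main__":
    frames = 1000  # hypothetical input: 10 s of audio at a 10 ms frame shift
    out = subsampled_length(frames)
    print(out, frames / out)  # roughly 4x fewer frames reach the Conformer blocks
```

In a PyTorch module this would typically correspond to two strided convolution layers (with activations in between) placed before the Conformer stack, so attention runs over about a quarter of the original frames.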

What are the GPU memory specs needed to run pretraining and kmeans?

Hi,
I am trying to run pretraining of the full model (which should have ~650M parameters) on a 24 GB GPU, and it only runs if I set the batch size to 1 (which makes the training useless). How much memory is needed to run the full training with the preset batch size?
Also, once training finished, I tried to run the k-means fitting script, and it seems to require even more memory. Any idea what is needed there as well?

Thanks!
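As a rough lower bound, the memory for weights, gradients, and optimizer state can be estimated from the parameter count alone. The sketch below assumes fp32 training with an Adam-style optimizer (two fp32 moments per parameter, so 16 bytes per parameter) and uses the 657,398,946 trainable parameters printed in the first issue's log; activation memory, which grows with batch size and sequence length, comes on top and is not modelled. For k-means it assumes the clustered features are held in memory as one dense fp32 matrix, and the frame count in the example is hypothetical.

```python
GIB = 1024 ** 3


def training_state_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    # Assumed: 4 B weights + 4 B gradients + 8 B for two fp32 Adam moments.
    # Activations are NOT included.
    return num_params * bytes_per_param


def feature_matrix_bytes(num_frames: int, feature_dim: int, bytes_per_value: int = 4) -> int:
    # Assumed: a dense fp32 matrix of features passed to k-means.
    return num_frames * feature_dim * bytes_per_value


if __name__ == "__main__":
    params = 657_398_946  # trainable parameters reported in the training log
    print(f"weights + grads + Adam state: {training_state_bytes(params) / GIB:.1f} GiB")
    # Hypothetical: 10 million frames of 1024-dim encoder features.
    print(f"k-means feature matrix: {feature_matrix_bytes(10_000_000, 1024) / GIB:.1f} GiB")
```

Mixed precision or gradient checkpointing changes the training numbers, and fitting k-means on a random subset of frames (or with a mini-batch variant) avoids materializing every feature at once.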

Question about the implementation

Hi Lucas,

Thanks for sharing your implementation of the framework. I don't quite understand why the labels are passed into the conformer model instead of the original data. As I understand it, the conformer encodes the original data and predicts the corresponding labels (indices in the codebook), so the input here shouldn't be the labels, right?

outputs = self.conformer(
    labels,
    mask = mask,
    return_layer_output = return_layer_output,
    return_emb = return_emb
)
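In BEST-RQ as described in the paper, the labels are training targets: a frozen random projection maps each frame of input features to a vector, and the label is the index of the nearest entry in a frozen random codebook after L2 normalization. The encoder sees the masked features and is trained to predict those indices at the masked positions. The plain-Python sketch below illustrates only that quantizer. It is not the repository's code, the class and helper names are hypothetical, the sizes are illustrative, and both matrices are simply drawn from Gaussians here.

```python
import math
import random


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # guard against a zero vector
    return [x / norm for x in vector]


class RandomProjectionQuantizer:
    """Frozen random projection plus frozen random codebook; nothing here is trained."""

    def __init__(self, input_dim, codebook_dim, codebook_size, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(input_dim)
        # Projection matrix A with codebook_dim rows and input_dim columns.
        self.projection = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
                           for _ in range(codebook_dim)]
        # Codebook entries are stored already L2-normalized.
        self.codebook = [l2_normalize([rng.gauss(0.0, 1.0) for _ in range(codebook_dim)])
                         for _ in range(codebook_size)]

    def __call__(self, frame):
        projected = l2_normalize([sum(a * x for a, x in zip(row, frame))
                                  for row in self.projection])
        # Label = index of the nearest codebook entry (squared L2 distance).
        distances = [sum((p - c) ** 2 for p, c in zip(projected, code))
                     for code in self.codebook]
        return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(10)]  # fake 80-dim features
    quantizer = RandomProjectionQuantizer(input_dim=80, codebook_dim=16, codebook_size=8192)
    labels = [quantizer(frame) for frame in frames]  # prediction targets, not encoder inputs
    print(labels)
```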

The version of CUDA, torch, torchaudio, and torchvision

Can you tell me which versions of CUDA, torch, torchaudio, and torchvision you used?
I found that "torch.cuda.is_available()" returned False.
I then changed the versions of torch, torchaudio, and torchvision to match my CUDA version.
After that, "torch.cuda.is_available()" returned True.
However, an "ImportError" related to "torchaudio" then appeared.
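A False result from torch.cuda.is_available() often means a CPU-only torch build is installed, and an ImportError from torchaudio after swapping torch usually means torchaudio no longer matches the installed torch release or build. The stdlib-only sketch below reports the installed versions and flags differing local build tags (such as the +cu118 suffix on wheels from the PyTorch package index). The helper names are hypothetical, and matching tags alone do not guarantee compatibility, so the release pairing should still be checked against PyTorch's published version table.

```python
from importlib import metadata

PACKAGES = ("torch", "torchaudio", "torchvision")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_tag(version):
    """Return the local build tag, e.g. '2.1.0+cu118' -> 'cu118'; None if absent."""
    if version is None or "+" not in version:
        return None
    return version.split("+", 1)[1]


def has_mismatched_builds(versions):
    # Compare tags only across installed packages; an untagged version counts as its own variant.
    tags = {build_tag(v) for v in versions.values() if v is not None}
    return len(tags) > 1


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    if has_mismatched_builds(found):
        print("warning: packages appear to come from different CUDA/CPU builds")
```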

Beartype Issue

Hello,
I am encountering the following issue when testing the pretrain.py script from the examples folder.
beartype.roar.BeartypeDecorHintNonpepException: Method best_rq_pytorch.conformer.ConformerWrapper.__init__() parameter "conformer" type hint <built-in function any> either PEP-noncompliant or currently unsupported by @beartype.

I do not understand how to address this; could you please provide some guidance? Perhaps my data is not formatted correctly. Could you point me to the expected format?
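The error message points at an annotation rather than at the data: the conformer parameter is annotated with the built-in function any, which is not a type, whereas typing.Any is a valid hint. Because @beartype inspects hints when it decorates the method, the error would typically be raised before any data is loaded. The stdlib-only illustration below does not invoke beartype; the two classes are hypothetical stand-ins for ConformerWrapper.

```python
from typing import Any


class BrokenWrapper:
    # Lowercase `any` is the builtin function any(), not a type; beartype rejects it as a hint.
    def __init__(self, conformer: any):
        self.conformer = conformer


class FixedWrapper:
    # typing.Any is a valid hint; a concrete class (e.g. a torch module type) would also work.
    def __init__(self, conformer: Any):
        self.conformer = conformer


if __name__ == "__main__":
    broken = BrokenWrapper.__init__.__annotations__["conformer"]
    fixed = FixedWrapper.__init__.__annotations__["conformer"]
    print(broken is any, isinstance(broken, type))  # the annotation is a function object
    print(fixed is Any)
```

In practice this would mean changing the annotation on ConformerWrapper.__init__ in best_rq_pytorch/conformer.py to typing.Any or a concrete module type.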

Implementation of BestRQTrainerASR

Hey! Thanks for your code. I was wondering if you have plans to implement the BestRQTrainerASR and BestRQTrainASRWrapper classes. If not, could you please guide me on how to implement them?
