lucasnewman / best-rq-pytorch

Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.

License: MIT License

Python 100.00%

automatic-speech-recognition speech-synthesis text-to-speech


best-rq-pytorch's Issues

What is the final result like?

What does the final result look like? Could you share a screenshot of it?
This is my result:
trainable parameters: 657398946
training with dataset of 2052 samples and validating with randomly splitted 109 samples
do you want to clear previous experiment checkpoints and results? (y/n)
y
0: loss: 6.983
0: valid loss 5.440
0: saving model to results

Process end, code exit -1073741819 (0xC0000005)
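
The log above shows step 0 completing and the checkpoint save starting before the process ended. The negative exit code is the signed 32-bit form of a Windows status code; reinterpreted as unsigned it is 0xC0000005, which Windows uses for an access violation (a crash in native code rather than a Python exception). A minimal sketch of the conversion, where the helper name `to_unsigned32` is made up for illustration:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit process exit code as an unsigned value."""
    return code & 0xFFFFFFFF


exit_code = -1073741819  # value reported in the log above
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
```

This only decodes the number. It does not say which native library crashed.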

Pre-training cost of BEST-RQ

Thanks for your great work.

I would like to know how long pre-training takes for the 0.3B model.
Can you share your experience of the cost of pre-training BEST-RQ? (Batch size, the GPU you used, the number of GPUs, training time, etc.)

Missing Convolution Subsampling?

Hi Lucas,
I've been looking over the code, and I believe the two convolution subsampling layers are missing from conformer.py. The paper says:

4.1.1. NON-STREAMING MODELS The model has two convolution layers at the bottom which provide 4 times temporal-dimension reduction for the input sequences. The rest of the layers are a stack of Conformer models. We explore 0.6B model size which is extensively studied in the previous works. The model contains 24 layers of Conformer models.

If you'd like I can create a pull request and implement this for you now.
Thanks - If I've misunderstood the paper, please call me out! 😅
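
To make the "4 times temporal-dimension reduction" concrete, here is a small sketch of the frame-count arithmetic for two stacked stride-2 convolutions. The kernel size, stride, and padding values are assumptions chosen for illustration; the quoted passage only states that there are two convolution layers and a 4x reduction.

```python
def conv_out_len(n_frames: int, kernel_size: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a 1D convolution along the time axis (standard formula)."""
    return (n_frames + 2 * padding - kernel_size) // stride + 1


def subsampled_len(n_frames: int, num_layers: int = 2) -> int:
    """Apply the (assumed) stride-2 convolution num_layers times."""
    for _ in range(num_layers):
        n_frames = conv_out_len(n_frames)
    return n_frames


print(subsampled_len(1000))  # 250, i.e. a 4x shorter sequence
```

Any mask or label sequence would need to be built at this reduced frame rate so that it lines up with the encoder output.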

What are the GPU memory specs needed to run pretraining and kmeans?

Hi,
I am trying to run pretraining of the full model (which should have ~650M parameters) on a 24 GB GPU, and it only runs if I set the batch size to 1, which makes the training useless. How much memory is needed to run the full training with the preset batch size?
Also, once training finished, I tried to run the k-means fitting script, and it seems to require even more memory. Any idea how much is needed there as well?

Thanks!
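
A rough back-of-the-envelope sketch of the memory taken by model state alone, before any activations. It assumes fp32 weights and gradients plus an Adam-style optimizer holding two fp32 values per parameter; none of these settings are confirmed for this repository. The parameter count is the one printed in the training log earlier on this page.

```python
def training_state_gib(
    n_params: int,
    bytes_weights: int = 4,    # assumed fp32 weights
    bytes_grads: int = 4,      # assumed fp32 gradients
    bytes_optimizer: int = 8,  # assumed Adam-style: two fp32 moments per parameter
) -> float:
    """Memory for weights, gradients and optimizer state, in GiB."""
    total_bytes = n_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 1024**3


n_params = 657_398_946  # "trainable parameters" from the log above
print(round(training_state_gib(n_params), 1))  # about 9.8
```

Under these assumptions roughly 10 GiB of a 24 GB card is used before a single activation is stored, which is consistent with only very small batches fitting. Activation memory grows with batch size and sequence length and is not estimated here.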

Question about the implementation

Hi Lucas,

Thanks for sharing your implementation of the framework. I don't quite understand why the labels are passed into the conformer model instead of the original data. To my understanding, the conformer is used to encode the original data and predict the corresponding labels (indices in the codebook), so the input here shouldn't be the labels, right?

best-rq-pytorch/best_rq_pytorch/best_rq.py

Lines 144 to 149 in b4b0d8d

    
    outputs = self.conformer(
        labels,
        mask = mask,
        return_layer_output = return_layer_output,
        return_emb = return_emb
    )
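
For reference, this is a sketch of how the BEST-RQ paper describes producing target labels: each feature frame is multiplied by a frozen random projection matrix, L2-normalized, and assigned the index of the nearest entry in a frozen random codebook. It is written in plain Python for illustration and is not the code from best_rq.py; all function names and sizes are made up. Whether the variable called `labels` in the excerpt above holds indices or features cannot be determined from the six quoted lines alone.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def make_quantizer(feat_dim, code_dim, codebook_size, seed=0):
    """Create a frozen random projection and a frozen random codebook."""
    rng = random.Random(seed)
    projection = [[rng.gauss(0, 1) for _ in range(code_dim)] for _ in range(feat_dim)]
    codebook = [
        l2_normalize([rng.gauss(0, 1) for _ in range(code_dim)])
        for _ in range(codebook_size)
    ]
    return projection, codebook


def quantize(frame, projection, codebook):
    """Return the codebook index nearest to the projected, normalized frame."""
    code_dim = len(projection[0])
    projected = l2_normalize(
        [sum(frame[i] * projection[i][j] for i in range(len(frame))) for j in range(code_dim)]
    )
    # For unit vectors, the smallest Euclidean distance is the largest dot product.
    sims = [sum(p * c for p, c in zip(projected, code)) for code in codebook]
    return max(range(len(codebook)), key=sims.__getitem__)


projection, codebook = make_quantizer(feat_dim=8, code_dim=4, codebook_size=16)
rng = random.Random(1)
frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
labels = [quantize(frame, projection, codebook) for frame in frames]
print(labels)  # five integer indices in the range 0..15
```

In the paper's setup these indices are the prediction targets, and the encoder receives the masked speech features.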

Changed three parameters: sample_rate, win_length, and hop_length

brq = BestRQ(
    sample_rate = 22050,
    win_length = 1024,
    hop_length = 256,

Hi, I changed the three parameters shown above, and then the program appears to have crashed. Can these three parameters not be changed?
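
A small sketch of what these values imply for the frame rate, plus one generic STFT constraint worth checking. `torch.stft` requires `win_length` to be no larger than `n_fft`; whether that constraint is what failed here is only a guess, and the `n_fft` values below are hypothetical, not taken from the repository.

```python
def frame_stats(sample_rate: int, win_length: int, hop_length: int) -> dict:
    """Frame rate and window/hop durations implied by the spectrogram settings."""
    return {
        "frames_per_second": sample_rate / hop_length,
        "window_ms": 1000 * win_length / sample_rate,
        "hop_ms": 1000 * hop_length / sample_rate,
    }


def stft_params_ok(n_fft: int, win_length: int, hop_length: int) -> bool:
    """Generic STFT sanity check: the window must fit inside the FFT size."""
    return 0 < win_length <= n_fft and hop_length > 0


stats = frame_stats(sample_rate=22050, win_length=1024, hop_length=256)
print({name: round(value, 2) for name, value in stats.items()})
print(stft_params_ok(n_fft=1024, win_length=1024, hop_length=256))  # True
print(stft_params_ok(n_fft=400, win_length=1024, hop_length=256))   # False
```

With these settings the model sees about 86 frames per second of audio, so any component that assumes a different frame rate would also need adjusting.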

The versions of CUDA, torch, torchaudio, and torchvision

Can you tell me which versions of CUDA, torch, torchaudio, and torchvision you use?
At first, "torch.cuda.is_available()" returned False.
I then changed the versions of torch, torchaudio, and torchvision to match my CUDA version.
After that, "torch.cuda.is_available()" returned True,
but an "ImportError" related to torchaudio appeared.
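
torchaudio and torchvision releases are each built against a specific torch release, so an ImportError after reinstalling often points to a mismatched set. A minimal sketch for collecting the installed versions in one place, so they can be compared or pasted into a report. The helper names are made up for illustration, and the script tolerates packages that are missing or fail to import.

```python
import importlib


def collect_versions(packages=("torch", "torchaudio", "torchvision")):
    """Return each package's version string, or the import error it raised."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # mismatched builds can raise more than ImportError
            report[name] = f"import failed: {type(exc).__name__}: {exc}"
    return report


def cuda_report():
    """Return torch's CUDA build version and availability, or None without torch."""
    try:
        import torch
    except Exception:
        return None
    return {
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


for package, version in collect_versions().items():
    print(f"{package}: {version}")
print("cuda:", cuda_report())
```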

Beartype Issue

Hello,
I am encountering the following issue when testing the pretrain.py script from the examples folder:
beartype.roar.BeartypeDecorHintNonpepException: Method best_rq_pytorch.conformer.ConformerWrapper.__init__() parameter "conformer" type hint <built-in function any> either PEP-noncompliant or currently unsupported by @beartype.

I do not understand how to address this, could you please provide some guidance? Perhaps my data is not well formatted, could you point me to the correct formatting?
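
Going by the error text alone, the hint on the "conformer" parameter is `<built-in function any>`, that is, Python's lowercase built-in `any()` function rather than a type hint such as `typing.Any`. That would make this an annotation problem rather than a data-formatting one. The sketch below shows the difference using a hypothetical stand-in class; it is not the repository's `ConformerWrapper`, and which hint the maintainer intended is an assumption.

```python
import builtins
from typing import Any


def is_builtin_any(hint) -> bool:
    """True if an annotation is the built-in any() function, not a type hint."""
    return hint is builtins.any


class WrapperSketch:
    """Hypothetical stand-in, annotated with typing.Any instead of any."""

    def __init__(self, conformer: Any):
        self.conformer = conformer


hint = WrapperSketch.__init__.__annotations__["conformer"]
print(is_builtin_any(any))   # True: this is what the error message reports
print(is_builtin_any(hint))  # False: typing.Any is a valid hint
```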

Implementation of BestRQTrainerASR

Hey! Thanks for your code. I was wondering whether you plan to implement the BestRQTrainerASR class and the BestRQTrainASRWrapper class. If not, could you please guide me on how to implement them?

Is there a pretrained model?
