pythainlp / pythaiasr

Python Thai Automatic Speech Recognition

License: Apache License 2.0

Python 97.27% Dockerfile 2.73%

thai-language thai-nlp asr automatic-speech-recognition hacktoberfest hacktoberfest2022

pythaiasr's Introduction

PyThaiASR

Python Thai Automatic Speech Recognition

PyThaiASR is a Python package for Automatic Speech Recognition with a focus on the Thai language. It includes an offline Thai automatic speech recognition model.

License: Apache-2.0 License

Google Colab: Link Google colab

Model homepage: https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th

Install

pip install pythaiasr

For Wav2Vec2 with a language model: if you want to use a wannaphong/wav2vec2-large-xlsr-53-th-cv8-* model with a language model, you need to install the following.

pip install pythaiasr[lm]
pip install https://github.com/kpu/kenlm/archive/refs/heads/master.zip

Usage

from pythaiasr import asr

file = "a.wav"
print(asr(file))

API

asr(data: str, model: str = _model_name, lm: bool=False, device: str=None, sampling_rate: int=16_000)

data: path to a sound file, or a numpy array of the audio
model: name of the ASR model (see "Options for model" below)
lm: whether to use a language model (not available for the airesearch/wav2vec2-large-xlsr-53-th model)
device: device to run the model on
sampling_rate: sample rate of the audio, in Hz
return: Thai text from ASR

Options for model

airesearch/wav2vec2-large-xlsr-53-th (default) - AI RESEARCH - PyThaiNLP model
wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm - Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer)
wannaphong/wav2vec2-large-xlsr-53-th-cv8-deepcut - Thai Wav2Vec2 with CommonVoice V8 (deepcut tokenizer)
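As a rough sketch of how the parameters and model options above fit together, the snippet below collects the keyword arguments for asr() and rejects the one combination the parameter list rules out (lm=True with the default airesearch model). The names asr_options and transcribe are hypothetical helpers written for this example, not part of pythaiasr. The check is an assumption based only on the parameter description above.

```python
# Model names copied from the "Options for model" list above.
DEFAULT_MODEL = "airesearch/wav2vec2-large-xlsr-53-th"
LM_MODELS = (
    "wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm",
    "wannaphong/wav2vec2-large-xlsr-53-th-cv8-deepcut",
)


def asr_options(model=DEFAULT_MODEL, lm=False, device=None, sampling_rate=16_000):
    """Hypothetical helper: validate and collect keyword arguments for asr()."""
    # The parameter list says the language model is not used with the default model.
    if lm and model == DEFAULT_MODEL:
        raise ValueError(
            "lm=True needs one of the wannaphong models: " + ", ".join(LM_MODELS)
        )
    return {
        "model": model,
        "lm": lm,
        "device": device,
        "sampling_rate": sampling_rate,
    }


def transcribe(path, **options):
    """Hypothetical wrapper: call pythaiasr.asr with validated options."""
    # Imported lazily so asr_options() can be used without loading pythaiasr.
    from pythaiasr import asr

    return asr(path, **asr_options(**options))
```

For example, transcribe("a.wav", model=LM_MODELS[0], lm=True) would request the newmm model with its language model, provided pythaiasr[lm] and kenlm are installed as described under Install.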

You can read more about the models listed above.

Docker

To use this inside of Docker do the following:

docker build -t <Your Tag name> .
docker run --entrypoint /bin/bash -it <Your Tag name>

You will then get access to an interactive shell environment where you can use python with all packages installed.

pythaiasr's Issues

Out of memory

I tested with a 1:00 min WAV file, but it ran out of GPU memory.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 138.00 MiB (GPU 0; 5.93 GiB total capacity; 4.99 GiB already allocated; 126.19 MiB free; 5.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How could I run with a larger file?
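One possible workaround, not confirmed by the maintainers, is to split the audio into shorter pieces and transcribe them one at a time, so that each forward pass sees only a few seconds of audio. The sketch below assumes the waveform is already loaded as a mono sequence of samples. The names split_waveform and transcribe_in_chunks are hypothetical, and the 10-second chunk length and the space used to join the partial texts are arbitrary choices.

```python
from typing import Callable, Iterator, Sequence


def split_waveform(
    samples: Sequence[float],
    sampling_rate: int = 16_000,
    chunk_seconds: float = 10.0,
) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each at most `chunk_seconds` long."""
    chunk_size = int(sampling_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("sampling_rate * chunk_seconds must be at least one sample")
    for start in range(0, len(samples), chunk_size):
        # Plain slicing works for lists and numpy arrays alike.
        yield samples[start:start + chunk_size]


def transcribe_in_chunks(
    samples: Sequence[float],
    transcribe: Callable[..., str],
    sampling_rate: int = 16_000,
    chunk_seconds: float = 10.0,
) -> str:
    """Transcribe each chunk separately and join the partial texts with spaces."""
    parts = [
        transcribe(chunk, sampling_rate=sampling_rate)
        for chunk in split_waveform(samples, sampling_rate, chunk_seconds)
    ]
    # Drop empty results (for example, silent chunks) before joining.
    return " ".join(part for part in parts if part)
```

With pythaiasr, `transcribe` could be `asr` itself, since the API section says `data` may be a numpy array. Cutting at fixed boundaries can split a word across two chunks, so the joined text may differ from a single-pass transcription, and whether this is enough to fit a given GPU is untested here.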

[TODO] Add input as waveform and batch

Add support for waveform-array inputs and batched inputs, for faster processing when using a GPU.

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Hello,

I always get this error when I run print(asr(file)). Any help with that? I'm running it in the Colab notebook.
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
296 _single(0), self.dilation, self.groups)
297 return F.conv1d(input, weight, bias, self.stride,
--> 298 self.padding, self.dilation, self.groups)
299
300 def forward(self, input: Tensor) -> Tensor:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function

Great day to you and thanks a lot for your contribution and determination on this project.

In Colab, I have this code:

%%capture
!pip install pythaiasr
!pip -q install pydub

import IPython
from pythaiasr import asr
file = "/content/voice data/helloWeeHee.wav"
IPython.display.Audio(file)

#pythaiasr
asr(file, "airesearch/wav2vec2-large-xlsr-53-th")
asr(file, "wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm")

However, I got an error. I then tried some solutions from Stack Overflow, such as importing tensorflow and running !pip install pythaiasr[lm]