jonatasgrosman / asrecognition

ASRecognition: just an easy-to-use library for Automatic Speech Recognition.

License: MIT License

Languages: Makefile 14.85%, Python 85.15%
Topics: audio, speech, automatic-speech-recognition, asr, voice-recognition, speech-recognition, speech-to-text

asrecognition's Introduction

Hi there 👋! My name is Jonatas Grosman.

I'm just a geek that loves to build useful things 🤓

asrecognition's People

Contributors

jonatasgrosman

asrecognition's Issues

mp3 files do not work

Despite the example in the README using an mp3 file, it does not work on my MacBook:

>>> asr.transcribe(["/Users/emiel/Downloads/test.mp3"])
Traceback (most recent call last):
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/librosa/core/audio.py", line 149, in load
    with sf.SoundFile(path) as sf_desc:
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/Users/emiel/Downloads/test.mp3': File contains data in an unknown format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/asrecognition/engine.py", line 104, in transcribe
    data = data.map(_load_audio, num_proc=self.number_of_workers)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
    return self._map_single(
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1997, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/asrecognition/engine.py", line 99, in _load_audio
    waveform, sampling_rate = librosa.load(item["path"], sr=16_000)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/librosa/core/audio.py", line 166, in load
    y, sr_native = __audioread_load(path, offset, duration, dtype)
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/librosa/core/audio.py", line 190, in __audioread_load
    with audioread.audio_open(path) as input_file:
  File "/Users/emiel/miniconda3/lib/python3.8/site-packages/audioread/__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError

WAV files work fine, but it seems that pip-installing ASRecognition alone does not guarantee mp3 support.
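
The traceback above ends in audioread's NoBackendError, which usually means no external decoder such as ffmpeg was found. A possible workaround, as a minimal sketch: convert the mp3 to a 16 kHz mono WAV with the ffmpeg command-line tool before transcribing. The helper names (ffmpeg_available, build_ffmpeg_command, mp3_to_wav) are hypothetical and not part of ASRecognition; the 16 kHz rate is taken from the librosa.load call in the traceback.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16_000) -> List[str]:
    """Build the ffmpeg call: overwrite dst, mono output, resampled to sample_rate."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def mp3_to_wav(src: str, sample_rate: int = 16_000) -> str:
    """Convert src to a WAV file next to it and return the new path (hypothetical helper)."""
    if not ffmpeg_available():
        raise RuntimeError(
            "ffmpeg not found on PATH; install it first "
            "(for example `brew install ffmpeg` on macOS)"
        )
    dst = str(Path(src).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

The resulting path could then be passed to asr.transcribe([...]) in place of the mp3 path. Installing ffmpeg alone may already be enough for librosa to read mp3 files directly, in which case the conversion step is unnecessary.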

Setting device for ASR model

Hi Jonatas,

First of all thank you for the amazing work, both for pre-trained language-specific models and for the very easy-to-use library.

I want to ask if it is possible to specify the device (GPU or CPU) for the ASR model. Actually, I tried:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

However, it doesn't seem to work. Any suggestion?
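
For context, creating a torch.device object does not move anything by itself; in plain PyTorch the model and its inputs have to be sent to that device explicitly. The sketch below shows that general pattern with a small stand-in layer instead of the real ASR model. Whether ASREngine exposes a device option is not confirmed here, so this is only an illustration of the PyTorch side, and pick_device is a hypothetical helper name.

```python
import torch


def pick_device() -> torch.device:
    """Prefer the GPU when CUDA is available, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()

# Stand-in for the ASR model (assumption: any torch.nn.Module behaves the same way).
model = torch.nn.Linear(4, 2).to(device)

# Inputs must live on the same device as the model's parameters.
inputs = torch.randn(3, 4).to(device)

with torch.no_grad():
    outputs = model(inputs)

print(outputs.device.type)
```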

Word level audio start and end

Hi @jonatasgrosman, first of all: Thank you for your great library. Can you imagine a way to implement a function that prints each recognized word with its individual start and end time, e.g. in milliseconds?
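
One way this could work, as a rough sketch rather than anything the library provides: Wav2Vec2 CTC models emit one prediction per audio frame, so word boundaries can be read off the frame indices. The code assumes the per-frame argmax has already been mapped to token strings, that "<pad>" is the CTC blank and "|" the word delimiter (the usual Wav2Vec2 tokenizer defaults), and that each frame covers about 20 ms of 16 kHz audio. The function name word_timestamps is hypothetical.

```python
from typing import List, Tuple


def word_timestamps(
    frame_tokens: List[str],
    frame_ms: float = 20.0,   # assumption: one frame per 20 ms of audio
    blank: str = "<pad>",     # assumption: CTC blank token
    delimiter: str = "|",     # assumption: word delimiter token
) -> List[Tuple[str, float, float]]:
    """Return (word, start_ms, end_ms) for each word in a per-frame token sequence."""
    words: List[Tuple[str, float, float]] = []
    chars: List[str] = []
    start = end = 0
    prev = blank
    for i, tok in enumerate(frame_tokens):
        if tok == delimiter:
            # A delimiter closes the word collected so far, if any.
            if chars:
                words.append(("".join(chars), start * frame_ms, end * frame_ms))
                chars = []
        elif tok != blank:
            if tok != prev:
                # New emission: CTC merges repeats that have no blank in between.
                if not chars:
                    start = i
                chars.append(tok)
            end = i + 1  # the word lasts at least until the end of this frame
        prev = tok
    if chars:
        words.append(("".join(chars), start * frame_ms, end * frame_ms))
    return words


frames = ["<pad>", "h", "h", "<pad>", "i", "|", "|", "<pad>", "y", "o", "o"]
print(word_timestamps(frames))
# [('hi', 20.0, 100.0), ('yo', 160.0, 220.0)]
```

The timings are only as precise as the frame resolution, and CTC models tend to emit characters in short spikes, so the end time of a word is approximate.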

Requirements misleading

In your requirements, you're only listing Python 3.7+. However, having Python 3.7+ is not enough. I recently tried to use this tool with TensorFlow 2.1.0 installed in addition to Python 3.7.9, and got the following error while importing ASRecognition:

Code:

from asrecognition import ASREngine

Error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
   2149         try:
-> 2150             return importlib.import_module("." + module_name, self.__name__)
   2151         except Exception as e:

~\.conda\envs\asdfghjkl\lib\importlib\__init__.py in import_module(name, package)
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128 

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _load_unlocked(spec)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap_external.py in exec_module(self, module)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\modeling_tf_utils.py in <module>
     29 from tensorflow.python.keras.engine import data_adapter
---> 30 from tensorflow.python.keras.engine.keras_tensor import KerasTensor
     31 from tensorflow.python.keras.saving import hdf5_format

ModuleNotFoundError: No module named 'tensorflow.python.keras.engine.keras_tensor'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
   2149         try:
-> 2150             return importlib.import_module("." + module_name, self.__name__)
   2151         except Exception as e:

~\.conda\envs\asdfghjkl\lib\importlib\__init__.py in import_module(name, package)
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128 

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _load_unlocked(spec)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap_external.py in exec_module(self, module)

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\models\__init__.py in <module>
     18 
---> 19 from . import (
     20     albert,

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\models\layoutlm\__init__.py in <module>
     21 from ...file_utils import _LazyModule, is_tf_available, is_tokenizers_available, is_torch_available
---> 22 from .configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
     23 from .tokenization_layoutlm import LayoutLMTokenizer

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\models\layoutlm\configuration_layoutlm.py in <module>
     21 from ... import is_torch_available
---> 22 from ...onnx import OnnxConfig, PatchingSpec
     23 from ...utils import logging

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\onnx\__init__.py in <module>
     16 from .config import EXTERNAL_DATA_FORMAT_SIZE_LIMIT, OnnxConfig, OnnxConfigWithPast, PatchingSpec
---> 17 from .convert import export, validate_model_outputs
     18 from .utils import ParameterFormat, compute_serialized_parameters_size

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\onnx\convert.py in <module>
     22 
---> 23 from .. import PreTrainedModel, PreTrainedTokenizer, TensorType, TFPreTrainedModel, is_torch_available
     24 from ..file_utils import is_torch_onnx_dict_inputs_support_available

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _handle_fromlist(module, fromlist, import_, recursive)

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in __getattr__(self, name)
   2139         elif name in self._class_to_module.keys():
-> 2140             module = self._get_module(self._class_to_module[name])
   2141             value = getattr(module, name)

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
   2153                 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its traceback):\n{e}"
-> 2154             ) from e
   2155 

RuntimeError: Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
No module named 'tensorflow.python.keras.engine.keras_tensor'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1588/1100290613.py in <module>
----> 1 from asrecognition import ASREngine
      2 
      3 asr = ASREngine("en")
      4 
      5 # 2 - Use the loaded ASR engine to transcribe a list of audio files

~\.conda\envs\asdfghjkl\lib\site-packages\asrecognition\__init__.py in <module>
      1 import datasets
      2 import logging
----> 3 from asrecognition.engine import ASREngine
      4 
      5 datasets.logging.get_verbosity = lambda: logging.NOTSET

~\.conda\envs\asdfghjkl\lib\site-packages\asrecognition\engine.py in <module>
      5 import logging
      6 from typing import List, Dict, Optional
----> 7 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
      8 
      9 MODEL_PATH_BY_LANGUAGE = {

~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _handle_fromlist(module, fromlist, import_, recursive)

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in __getattr__(self, name)
   2138             value = self._get_module(name)
   2139         elif name in self._class_to_module.keys():
-> 2140             module = self._get_module(self._class_to_module[name])
   2141             value = getattr(module, name)
   2142         else:

~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
   2152             raise RuntimeError(
   2153                 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its traceback):\n{e}"
-> 2154             ) from e
   2155 
   2156     def __reduce__(self):

RuntimeError: Failed to import transformers.models.wav2vec2 because of the following error (look up to see its traceback):
Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
No module named 'tensorflow.python.keras.engine.keras_tensor'
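
The root cause in the traceback above is that transformers tries to import tensorflow.python.keras.engine.keras_tensor, which the installed TensorFlow does not provide. A small diagnostic sketch for checking an environment before importing ASRecognition follows; module_available and diagnose are hypothetical helper names, and the suggested remedies are assumptions rather than documented requirements of the library.

```python
import importlib.util

# Module named in the traceback as missing.
NEEDED = "tensorflow.python.keras.engine.keras_tensor"


def module_available(name: str) -> bool:
    """Return True if the module can be located.

    For dotted names, find_spec imports the parent packages first, so a
    missing or broken parent is reported as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def diagnose() -> str:
    """Describe whether the installed TensorFlow has the module transformers needs."""
    if not module_available("tensorflow"):
        return "TensorFlow is not installed; transformers should skip its TensorFlow code."
    if module_available(NEEDED):
        return "TensorFlow provides keras_tensor; this particular import error is not expected."
    return (
        "TensorFlow is installed but lacks keras_tensor; consider upgrading "
        "TensorFlow or using an environment without it."
    )


if __name__ == "__main__":
    print(diagnose())
```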

transcription errors and language model implementation

Hi Jonatas,

Congratulations on your work.

After transcribing some audio files in Italian I found that, although some words are clearly recognized, they are transcribed without a space between them.
Could this be a problem in the library itself rather than in the recognition?

Here are some examples:
(transcript)          ->  (ground truth)
dell'europagrillo     ->  dell'europa grillo
nuclearipalermo       ->  nucleari palermo
dispersiproseguiamo   ->  dispersi proseguiamo
capopolitico          ->  capo politico
mendicavoti           ->  mendica voti
votoonline            ->  voto online
dellegamento          ->  del legamento
meritavaquando        ->  meritava quando
martabassino          ->  marta bassino
compagnoniintanto     ->  compagnoni intanto

As for the language model, have you tried to include it in your library?
Or could you give me some guidelines for integrating the language model?

Thanks for your help!
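
Until a language model is integrated, glued words like the ones listed above could in principle be split in a post-processing step. The sketch below is not a language model: it is a plain dictionary-based segmentation that picks the split with the fewest known words, and it assumes a word list for the target language is available. The function name split_glued_words and the tiny vocabulary are made up for illustration.

```python
from typing import List, Optional, Set


def split_glued_words(text: str, vocabulary: Set[str]) -> List[str]:
    """Split text into the fewest vocabulary words; return [text] if no split exists."""
    n = len(text)
    # best[i] holds the shortest word list covering text[:i], or None if none was found.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = text[start:end]
            if prefix is not None and piece in vocabulary:
                candidate = prefix + [piece]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    result = best[n]
    return result if result is not None else [text]


# Hypothetical mini vocabulary built from the examples in this issue.
vocab = {"nucleari", "palermo", "capo", "politico", "voto", "online"}

print(" ".join(split_glued_words("nuclearipalermo", vocab)))
print(" ".join(split_glued_words("capopolitico", vocab)))
```

A real language model used during CTC decoding would also weigh how likely each word sequence is, which this sketch does not attempt; it can mis-split words that happen to be concatenations of shorter dictionary entries.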
