Hi there! My name is Jonatas Grosman.
I'm just a geek who loves to build useful things.
ASRecognition: just an easy-to-use library for Automatic Speech Recognition.
License: MIT License
Although the example in the README uses an mp3 file, mp3 input does not work on my MacBook:
>>> asr.transcribe(["/Users/emiel/Downloads/test.mp3"])
Traceback (most recent call last):
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/librosa/core/audio.py", line 149, in load
with sf.SoundFile(path) as sf_desc:
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/Users/emiel/Downloads/test.mp3': File contains data in an unknown format.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/asrecognition/engine.py", line 104, in transcribe
data = data.map(_load_audio, num_proc=self.number_of_workers)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
return self._map_single(
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1997, in _map_single
example = apply_function_on_filtered_inputs(example, i, offset=offset)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/asrecognition/engine.py", line 99, in _load_audio
waveform, sampling_rate = librosa.load(item["path"], sr=16_000)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/librosa/core/audio.py", line 166, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/librosa/core/audio.py", line 190, in __audioread_load
with audioread.audio_open(path) as input_file:
File "/Users/emiel/miniconda3/lib/python3.8/site-packages/audioread/__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError
WAV files work fine, but it seems that just pip-installing ASRecognition does not guarantee mp3 support.
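A possible workaround, offered as an assumption rather than a confirmed diagnosis: the traceback ends in audioread's NoBackendError, which is raised when audioread finds no decoder able to open the file. The sketch below checks for an ffmpeg executable and converts the mp3 to a 16 kHz mono WAV file before transcription. The helper names (ffmpeg_available, build_convert_command, convert_to_wav) are hypothetical and are not part of ASRecognition.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_convert_command(src: str, dst: str, sample_rate: int = 16_000) -> List[str]:
    """Build an ffmpeg command that resamples `src` to mono audio at `sample_rate` Hz."""
    # -y overwrites dst, -ar sets the sample rate, -ac 1 forces a single channel.
    return ["ffmpeg", "-y", "-i", src, "-ar", str(sample_rate), "-ac", "1", dst]


def convert_to_wav(src: str) -> str:
    """Convert `src` to a WAV file next to it and return the new path."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    dst = str(Path(src).with_suffix(".wav"))
    subprocess.run(build_convert_command(src, dst), check=True)
    return dst
```

With ffmpeg installed (on macOS, for example, through Homebrew), the path returned by convert_to_wav could be passed to asr.transcribe in place of the mp3 path.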
Hi Jonatas,
First of all thank you for the amazing work, both for pre-trained language-specific models and for the very easy-to-use library.
I want to ask if it is possible to specify the device (GPU or CPU) for the ASR model. I tried:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
However, it doesn't seem to work. Any suggestion?
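One reason that line alone may have no effect: creating a torch.device only describes a target. In PyTorch, the model and every input tensor still have to be moved there with .to(device). Whether ASREngine offers a way to do this is not stated in this thread, so the sketch below only illustrates the generic pattern. pick_device and move_to_device are hypothetical helpers, and the batch is assumed to be a dict of tensors.

```python
import importlib.util
from typing import Any, Dict, Tuple


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


def move_to_device(model: Any, batch: Dict[str, Any], device: str) -> Tuple[Any, Dict[str, Any]]:
    """Move a model and all of its input tensors to the same device."""
    model = model.to(device)  # nn.Module.to returns the module itself
    batch = {name: tensor.to(device) for name, tensor in batch.items()}
    return model, batch
```

The key point is that model and inputs must end up on the same device; moving only one of them typically raises a device-mismatch error at inference time.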
Hi @jonatasgrosman, first of all: thank you for your great library. Can you imagine a way to implement a function that prints each recognized word with its individual start and end time, e.g. in milliseconds?
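A rough idea of how this could work, as a sketch and not the library's method: the engine loads a CTC model (Wav2Vec2ForCTC), which emits one token prediction per fixed-length audio frame, so word boundaries can be read off the frame indices. The function below assumes the per-frame tokens are already decoded to strings, that "<pad>" is the CTC blank and "|" the word delimiter, and that each frame covers 20 ms. All of these are assumptions that should be checked against the actual model and tokenizer; word_timestamps is a hypothetical name.

```python
from typing import List, Optional, Tuple


def word_timestamps(
    frames: List[str],
    frame_ms: float = 20.0,
    blank: str = "<pad>",
    delimiter: str = "|",
) -> List[Tuple[str, float, float]]:
    """Collapse per-frame CTC tokens into (word, start_ms, end_ms) tuples."""
    words: List[Tuple[str, float, float]] = []
    chars: List[str] = []
    start: Optional[int] = None
    end = 0
    prev = blank
    for i, tok in enumerate(frames):
        if tok == delimiter:
            if chars:  # a delimiter closes the current word
                words.append(("".join(chars), start * frame_ms, end * frame_ms))
                chars, start = [], None
        elif tok != blank:
            if tok != prev:  # CTC rule: consecutive repeats are one character
                chars.append(tok)
            if start is None:
                start = i
            end = i + 1  # the word lasts until the end of this frame
        prev = tok
    if chars:  # flush the last word if no trailing delimiter was seen
        words.append(("".join(chars), start * frame_ms, end * frame_ms))
    return words
```

For example, the frames ["<pad>", "h", "h", "i", "<pad>", "|", "y", "o"] give "hi" from 20 ms to 80 ms and "yo" from 120 ms to 160 ms.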
Your requirements list only Python 3.7+. However, having Python 3.7+ is not enough. I recently tried to use this tool with TensorFlow 2.1.0 installed alongside Python 3.7.9, and I get the following error when importing ASRecognition:
Code:
from asrecognition import ASREngine
Error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
2149 try:
-> 2150 return importlib.import_module("." + module_name, self.__name__)
2151 except Exception as e:
~\.conda\envs\asdfghjkl\lib\importlib\__init__.py in import_module(name, package)
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)
128
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _load_unlocked(spec)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap_external.py in exec_module(self, module)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\modeling_tf_utils.py in <module>
29 from tensorflow.python.keras.engine import data_adapter
---> 30 from tensorflow.python.keras.engine.keras_tensor import KerasTensor
31 from tensorflow.python.keras.saving import hdf5_format
ModuleNotFoundError: No module named 'tensorflow.python.keras.engine.keras_tensor'
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
2149 try:
-> 2150 return importlib.import_module("." + module_name, self.__name__)
2151 except Exception as e:
~\.conda\envs\asdfghjkl\lib\importlib\__init__.py in import_module(name, package)
126 level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)
128
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _load_unlocked(spec)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap_external.py in exec_module(self, module)
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\models\__init__.py in <module>
18
---> 19 from . import (
20 albert,
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\models\layoutlm\__init__.py in <module>
21 from ...file_utils import _LazyModule, is_tf_available, is_tokenizers_available, is_torch_available
---> 22 from .configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
23 from .tokenization_layoutlm import LayoutLMTokenizer
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\models\layoutlm\configuration_layoutlm.py in <module>
21 from ... import is_torch_available
---> 22 from ...onnx import OnnxConfig, PatchingSpec
23 from ...utils import logging
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\onnx\__init__.py in <module>
16 from .config import EXTERNAL_DATA_FORMAT_SIZE_LIMIT, OnnxConfig, OnnxConfigWithPast, PatchingSpec
---> 17 from .convert import export, validate_model_outputs
18 from .utils import ParameterFormat, compute_serialized_parameters_size
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\onnx\convert.py in <module>
22
---> 23 from .. import PreTrainedModel, PreTrainedTokenizer, TensorType, TFPreTrainedModel, is_torch_available
24 from ..file_utils import is_torch_onnx_dict_inputs_support_available
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _handle_fromlist(module, fromlist, import_, recursive)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in __getattr__(self, name)
2139 elif name in self._class_to_module.keys():
-> 2140 module = self._get_module(self._class_to_module[name])
2141 value = getattr(module, name)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
2153 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its traceback):\n{e}"
-> 2154 ) from e
2155
RuntimeError: Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
No module named 'tensorflow.python.keras.engine.keras_tensor'
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1588/1100290613.py in <module>
----> 1 from asrecognition import ASREngine
2
3 asr = ASREngine("en")
4
5 # 2 - Use the loaded ASR engine to transcribe a list of audio files
~\.conda\envs\asdfghjkl\lib\site-packages\asrecognition\__init__.py in <module>
1 import datasets
2 import logging
----> 3 from asrecognition.engine import ASREngine
4
5 datasets.logging.get_verbosity = lambda: logging.NOTSET
~\.conda\envs\asdfghjkl\lib\site-packages\asrecognition\engine.py in <module>
5 import logging
6 from typing import List, Dict, Optional
----> 7 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
8
9 MODEL_PATH_BY_LANGUAGE = {
~\.conda\envs\asdfghjkl\lib\importlib\_bootstrap.py in _handle_fromlist(module, fromlist, import_, recursive)
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in __getattr__(self, name)
2138 value = self._get_module(name)
2139 elif name in self._class_to_module.keys():
-> 2140 module = self._get_module(self._class_to_module[name])
2141 value = getattr(module, name)
2142 else:
~\.conda\envs\asdfghjkl\lib\site-packages\transformers\file_utils.py in _get_module(self, module_name)
2152 raise RuntimeError(
2153 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its traceback):\n{e}"
-> 2154 ) from e
2155
2156 def __reduce__(self):
RuntimeError: Failed to import transformers.models.wav2vec2 because of the following error (look up to see its traceback):
Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
No module named 'tensorflow.python.keras.engine.keras_tensor'
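The first error in the chain is the missing module tensorflow.python.keras.engine.keras_tensor, which suggests that the installed TensorFlow is older than what the installed transformers release expects. A small diagnostic along the following lines could make the mismatch explicit before the import is attempted. The minimum version "2.4.0" is an assumption, not a documented requirement of ASRecognition, the helper names are hypothetical, and importlib.metadata requires Python 3.8 or newer.

```python
import re
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a version string such as "2.1.0" into a comparable tuple (2, 1, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_too_old(installed: str, minimum: str = "2.4.0") -> bool:
    """Return True if `installed` is lower than `minimum` (assumed threshold)."""
    return version_tuple(installed) < version_tuple(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    from importlib import metadata  # available from Python 3.8

    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling is_too_old("2.1.0") returns True under the assumed threshold, which matches the setup described in this report.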
Hi Jonatas,
Congratulations on your work.
After transcribing some Italian audio files, I found that some words are clearly recognized but are transcribed joined to each other, without a space.
Could this be a problem in the library itself, independent of the recognition?
Here are some examples:
transcript           -> ground truth
dell'europagrillo    -> dell'europa grillo
nuclearipalermo      -> nucleari palermo
dispersiproseguiamo  -> dispersi proseguiamo
capopolitico         -> capo politico
mendicavoti          -> mendica voti
votoonline           -> voto online
dellegamento         -> del legamento
meritavaquando       -> meritava quando
martabassino         -> marta bassino
compagnoniintanto    -> compagnoni intanto
As for the language model, have you tried to include it in your library?
Or could you give me some guidelines for integrating the language model?
Thanks for your help!
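Until a language model is integrated, one stopgap is to re-split merged tokens against a word list. This is a swapped-in post-processing technique, not something the library is known to do, and it is only a sketch: split_merged is a hypothetical helper, and the vocabulary would have to come from an Italian word list of your choice. It uses dynamic programming to find a split into known words with the fewest pieces.

```python
from typing import List, Optional, Set


def split_merged(token: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split `token` into the fewest words from `vocab`, or return None."""
    n = len(token)
    # best[i] holds the shortest known split of token[:i], if any exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = token[start:end]
            if prefix is None or piece not in vocab:
                continue
            candidate = prefix + [piece]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]
```

With a vocabulary containing "nucleari" and "palermo", the token "nuclearipalermo" splits into those two words; tokens that cannot be covered by the vocabulary return None and should be left unchanged. Some cases are inherently ambiguous without context, which is where a real language model would help.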