I saw that support for portuguese was added a few commits ago and decided to give it a

Error when trying to use the pt align model. about whisperx HOT 8 CLOSED

m-bain commented on July 16, 2024

Error when trying to use the pt align model.

from whisperx.

Comments (8)

jonatasgrosman commented on July 16, 2024 1

I've created a PR (https://github.com/m-bain/whisperX/pull/55) to fix this issue by forcing the model to ignore the embedded language model if one exists in the model's repo.

rhenanbartels commented on July 16, 2024

I've looked at the code in whisperx/transcribe.py, and the original exception is not the one saying that the align_model could not be found.

First, the original exception was:

ImportError:
Wav2Vec2ProcessorWithLM requires the pyctcdecode library but it was not found in your environment. You can install it with pip:
pip install pyctcdecode. Please note that you may need to restart your runtime after installation.

After running pip install pyctcdecode

the exception became:

name 'kenlm' is not defined
Error loading model from huggingface, check https://huggingface.co/models for finetuned wav2vec2.0 models

and running pip install kenlm did the trick (as a temporary solution).

I don't know the cause of the error yet, since the model is available here.

Another potential solution is to use the original source, facebook/wav2vec2-large-xlsr-53-portuguese.

Note: the same error is thrown with the "nl" language.
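
The dependency chain described above (pyctcdecode first, then kenlm) can be checked up front. This is a minimal sketch, not part of whisperx; the helper name `missing_optional_deps` is hypothetical, and it only tests whether the modules can be found for import, not whether kenlm was built correctly.

```python
import importlib.util


def missing_optional_deps(names=("pyctcdecode", "kenlm")):
    """Return the module names from `names` that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_deps()
    if missing:
        print("Missing optional dependencies: " + ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("pyctcdecode and kenlm are both importable.")
```

Running this before loading an alignment model whose repo ships a language model would surface the ImportError cause directly, rather than the generic "could not be found" message.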

vogelcodes commented on July 16, 2024

pip install kenlm failed and I didn't notice it.
I'm on Windows, so I had to download the Visual Studio C++ Build Tools.
I have also installed pyctcdecode, but I still get the same error:

New language found (pt)! Previous was (en), loading new alignment model for new language...
C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\huggingface_hub\utils\_deprecation.py:100: FutureWarning: Deprecated argument(s) used in 'snapshot_download': allow_regex. Will not be supported from version '0.12'.

Please use allow_patterns and ignore_patterns instead.
warnings.warn(message, FutureWarning)
Fetching 4 files: 100%|████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4032.98it/s]
'charmap' codec can't decode byte 0x81 in position 33: character maps to <undefined>
Error loading model from huggingface, check https://huggingface.co/models for finetuned wav2vec2.0 models
Traceback (most recent call last):
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisperx\transcribe.py", line 428, in load_align_model
processor = AutoProcessor.from_pretrained(model_name)
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\transformers\models\auto\processing_auto.py", line 259, in from_pretrained
return processor_class.from_pretrained(
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\transformers\models\wav2vec2_with_lm\processing_wav2vec2_with_lm.py", line 161, in from_pretrained
decoder = BeamSearchDecoderCTC.load_from_hf_hub(
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyctcdecode\decoder.py", line 831, in load_from_hf_hub
return cls.load_from_dir(cached_directory)
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyctcdecode\decoder.py", line 792, in load_from_dir
alphabet = Alphabet.loads(fi.read())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 33: character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\Scripts\whisperx-script.py", line 33, in <module>
sys.exit(load_entry_point('whisperx==1.0', 'console_scripts', 'whisperx')())
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisperx\transcribe.py", line 525, in cli
align_model, align_metadata = load_align_model(result["language"], device)
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisperx\transcribe.py", line 433, in load_align_model
raise ValueError(f'The chosen align_model "{model_name}" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)')
ValueError: The chosen align_model "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)
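
The first traceback above ends in cp1252.py, which suggests the alphabet file was opened with the Windows default encoding rather than UTF-8. The sketch below reproduces that failure mode in isolation; it does not touch pyctcdecode, and the helper names are hypothetical. The sample character is an assumption chosen only because its UTF-8 encoding contains the byte 0x81.

```python
import tempfile
from pathlib import Path

# U+0101 encodes to the bytes C4 81 in UTF-8; 0x81 is undefined in cp1252.
SAMPLE = "\u0101"


def decodes_as_cp1252(raw: bytes) -> bool:
    """Return True if `raw` can be decoded with the cp1252 code page."""
    try:
        raw.decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


def read_utf8(path) -> str:
    """Read a text file with an explicit UTF-8 encoding, ignoring the locale default."""
    return Path(path).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "alphabet.json"
        target.write_bytes(SAMPLE.encode("utf-8"))
        print("cp1252 can decode it:", decodes_as_cp1252(target.read_bytes()))
        print("utf-8 read succeeds:", read_utf8(target) == SAMPLE)
```

If this is indeed the cause, enabling Python's UTF-8 mode (setting the PYTHONUTF8=1 environment variable or passing -X utf8) may work around it on Windows without changing any library code; that has not been verified against this setup.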

DvGils commented on July 16, 2024

I get the same error. It persists after installing pyctcdecode, Visual Studio C++ Build Tools, and kenlm.

Jeronymous commented on July 16, 2024

I have the same issue when I try the Russian wav2vec model https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-russian:

Traceback (most recent call last):
  File "/usr/local/bin/whisperx", line 33, in <module>
    sys.exit(load_entry_point('whisperx', 'console_scripts', 'whisperx')())
  File "/xxx/whisperX/whisperx/transcribe.py", line 453, in cli
    align_model, align_metadata = load_align_model(align_language, device, model_name=align_model)
  File "/xxx/whisperX/whisperx/alignment.py", line 62, in load_align_model
    raise ValueError(f'The chosen align_model "{model_name}" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)')
ValueError: The chosen align_model "jonatasgrosman/wav2vec2-large-xlsr-53-russian" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)

m-bain commented on July 16, 2024

These are errors with the Hugging Face models; maybe the author of these models, @jonatasgrosman, can help?

jonatasgrosman commented on July 16, 2024

Hi everyone! I don't know precisely what's happening here, but it may be because whisperx uses Hugging Face's pipelines for ASR. These pipelines have rules that force the use of a language model when one is present in the model repository (you can see a language_model folder in the reported repos).

That is why whisperx works for other models that don't have a language_model folder. For models with a language_model folder, you'll need to install the pyctcdecode and kenlm dependencies to make whisperx work.

@m-bain I think you could force the pipeline to ignore the embedded language model by default to prevent this kind of issue.
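
One way to ignore the embedded language model is to request the plain wav2vec2 processor explicitly instead of letting AutoProcessor pick Wav2Vec2ProcessorWithLM, as it does in the traceback above. This is a sketch under that assumption and is not necessarily how whisperx implements the fix; both function names are hypothetical, while Wav2Vec2Processor and Wav2Vec2ForCTC are existing transformers classes.

```python
from typing import Iterable


def has_embedded_language_model(repo_files: Iterable[str]) -> bool:
    """Return True if a repo file listing contains a language_model/ folder."""
    return any(
        path.replace("\\", "/").startswith("language_model/") for path in repo_files
    )


def load_align_model_without_lm(model_name: str, device: str = "cpu"):
    """Load a CTC wav2vec2 model with a plain processor, skipping any bundled LM."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Wav2Vec2Processor wraps only the feature extractor and the tokenizer,
    # so pyctcdecode and kenlm should not be required at load time.
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
    return model, processor
```

For example, `load_align_model_without_lm("jonatasgrosman/wav2vec2-large-xlsr-53-portuguese")` would download the model named in the error message above; forced alignment only needs the CTC emissions and the tokenizer vocabulary, so the language model is not used in any case.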

m-bain commented on July 16, 2024

Thanks @jonatasgrosman!
