Comments (8)
I've created a PR to fix this issue by forcing the model to ignore the embedded language model if one exists in the model's repo.
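Here is a minimal sketch of what ignoring the embedded language model could look like. It assumes the fix swaps AutoProcessor for the plain Wav2Vec2Processor; this is an illustration, not necessarily what the PR does. The Windows traceback in this thread shows AutoProcessor.from_pretrained dispatching to Wav2Vec2ProcessorWithLM, and that class is what pulls in pyctcdecode. The helper name load_ctc_processor and its ignore_lm flag are hypothetical.

```python
import importlib


def load_ctc_processor(model_name: str, ignore_lm: bool = True):
    """Hypothetical helper: load a CTC processor for alignment.

    With ignore_lm=True it uses Wav2Vec2Processor, which loads only the
    feature extractor and tokenizer, so a language_model/ folder in the repo
    is not read and pyctcdecode/kenlm are not needed. With ignore_lm=False it
    defers to AutoProcessor, which may resolve to Wav2Vec2ProcessorWithLM.
    """
    # Imported lazily so this helper can be defined without transformers installed.
    transformers = importlib.import_module("transformers")
    cls = transformers.Wav2Vec2Processor if ignore_lm else transformers.AutoProcessor
    return cls.from_pretrained(model_name)
```

Calling load_ctc_processor("jonatasgrosman/wav2vec2-large-xlsr-53-portuguese") would download the processor files from the Hugging Face Hub. Keeping the class choice behind a flag makes it easy to opt back into LM decoding once pyctcdecode and kenlm are installed.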
I've looked at the code in whisperx/transcribe.py, and the original exception is not the one saying that the align_model could not be found.
First, the original exception was:
ImportError: Wav2Vec2ProcessorWithLM requires the pyctcdecode library but it was not found in your environment. You can install it with pip: pip install pyctcdecode. Please note that you may need to restart your runtime after installation.
After running pip install pyctcdecode, the exception became:
name 'kenlm' is not defined
Error loading model from huggingface, check https://huggingface.co/models for finetuned wav2vec2.0 models
Running pip install kenlm then did the trick (as a temporary solution).
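Since the failures above only surface one missing package at a time, a quick check for both optional decoding dependencies can save a round trip. This is a minimal sketch; the names LM_DECODING_DEPS and missing_modules are hypothetical. Note that importlib.util.find_spec only confirms a package can be located, not that a compiled extension such as kenlm actually loads, so a stricter check would attempt the import itself.

```python
import importlib.util
from typing import List, Sequence

# Optional packages that Wav2Vec2ProcessorWithLM needs, per the errors quoted above.
LM_DECODING_DEPS = ("pyctcdecode", "kenlm")


def missing_modules(names: Sequence[str] = LM_DECODING_DEPS) -> List[str]:
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing), "-> pip install " + " ".join(missing))
else:
    print("pyctcdecode and kenlm are both importable")
```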
I don't know the cause of the error yet, since the model is available on Hugging Face.
Another potential solution is to use the original source, facebook/wav2vec2-large-xlsr-53-portuguese.
Note: the same error is thrown for the "nl" language.
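Using a different model comes down to passing an explicit model name instead of the per-language default; the Russian traceback below shows load_align_model receiving model_name=align_model, and the whisperx CLI exposes this as --align_model. The sketch below only illustrates that override logic. DEFAULT_ALIGN_MODELS is a hypothetical excerpt limited to model names quoted in this thread, and resolve_align_model is not a whisperx function.

```python
from typing import Dict, Optional

# Hypothetical excerpt of a language -> alignment-model table; only model
# names that appear in this thread are listed.
DEFAULT_ALIGN_MODELS: Dict[str, str] = {
    "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese",
    "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian",
}


def resolve_align_model(language: str, override: Optional[str] = None) -> str:
    """Return the user's override if given, else the per-language default."""
    if override:
        return override
    try:
        return DEFAULT_ALIGN_MODELS[language]
    except KeyError:
        raise ValueError(f"No default alignment model for language {language!r}") from None


print(resolve_align_model("pt"))
print(resolve_align_model("pt", "facebook/wav2vec2-large-xlsr-53-portuguese"))
```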
pip install kenlm had failed and I didn't notice it.
I'm on Windows, so I had to download the Visual Studio C++ Build Tools.
I have also installed pyctcdecode, but I still get the same error:
New language found (pt)! Previous was (en), loading new alignment model for new language...
C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\huggingface_hub\utils\_deprecation.py:100: FutureWarning: Deprecated argument(s) used in 'snapshot_download': allow_regex. Will not be supported from version '0.12'. Please use allow_patterns and ignore_patterns instead.
warnings.warn(message, FutureWarning)
Fetching 4 files: 100%|████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4032.98it/s]
'charmap' codec can't decode byte 0x81 in position 33: character maps to <undefined>
Error loading model from huggingface, check https://huggingface.co/models for finetuned wav2vec2.0 models
Traceback (most recent call last):
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisperx\transcribe.py", line 428, in load_align_model
processor = AutoProcessor.from_pretrained(model_name)
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\transformers\models\auto\processing_auto.py", line 259, in from_pretrained
return processor_class.from_pretrained(
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\transformers\models\wav2vec2_with_lm\processing_wav2vec2_with_lm.py", line 161, in from_pretrained
decoder = BeamSearchDecoderCTC.load_from_hf_hub(
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyctcdecode\decoder.py", line 831, in load_from_hf_hub
return cls.load_from_dir(cached_directory)
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pyctcdecode\decoder.py", line 792, in load_from_dir
alphabet = Alphabet.loads(fi.read())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 33: character maps to <undefined>

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\Scripts\whisperx-script.py", line 33, in <module>
sys.exit(load_entry_point('whisperx==1.0', 'console_scripts', 'whisperx')())
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisperx\transcribe.py", line 525, in cli
align_model, align_metadata = load_align_model(result["language"], device)
File "C:\Users\danie\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisperx\transcribe.py", line 433, in load_align_model
raise ValueError(f'The chosen align_model "{model_name}" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)')
ValueError: The chosen align_model "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)
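The 'charmap' error in this log looks like an encoding mismatch: on Windows, open() without an explicit encoding falls back to the locale code page (cp1252 here, per the encodings\cp1252.py frame), while the file being read is UTF-8. The sketch below reproduces that with a hypothetical stand-in for the alphabet file read at Alphabet.loads(fi.read()); the file name, its "labels" key, and the helper names are assumptions. Running Python in UTF-8 mode (PYTHONUTF8=1 or -X utf8) changes that default and may be worth trying, though nobody in this thread confirmed it.

```python
import json
import os
import tempfile


def write_alphabet(path: str) -> None:
    # Hypothetical alphabet file: "Á" is stored in UTF-8 as b"\xc3\x81".
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"labels": ["a", "Á"]}, f, ensure_ascii=False)


def read_alphabet(path: str, encoding: str):
    """Return the parsed labels, or the UnicodeDecodeError raised while reading."""
    try:
        with open(path, encoding=encoding) as f:
            return json.loads(f.read())["labels"]
    except UnicodeDecodeError as err:
        return err


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "alphabet.json")
    write_alphabet(path)
    as_cp1252 = read_alphabet(path, "cp1252")  # byte 0x81 is undefined in cp1252
    as_utf8 = read_alphabet(path, "utf-8")

print(type(as_cp1252).__name__)  # UnicodeDecodeError
print(as_utf8)                   # ['a', 'Á']
```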
I get the same error. It persists after installing pyctcdecode, the Visual Studio C++ Build Tools, and kenlm.
I have the same issue when I try the Russian wav2vec model (https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-russian):
Traceback (most recent call last):
File "/usr/local/bin/whisperx", line 33, in <module>
sys.exit(load_entry_point('whisperx', 'console_scripts', 'whisperx')())
File "/xxx/whisperX/whisperx/transcribe.py", line 453, in cli
align_model, align_metadata = load_align_model(align_language, device, model_name=align_model)
File "/xxx/whisperX/whisperx/alignment.py", line 62, in load_align_model
raise ValueError(f'The chosen align_model "{model_name}" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)')
ValueError: The chosen align_model "jonatasgrosman/wav2vec2-large-xlsr-53-russian" could not be found in huggingface (https://huggingface.co/models) or torchaudio (https://pytorch.org/audio/stable/pipelines.html#id14)
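In both tracebacks, load_align_model replaces whatever went wrong with a generic "could not be found" ValueError; in the Windows log the real cause survives only through the implicit "During handling of the above exception" context. The sketch below shows how re-raising with `from` keeps the root cause attached and how to walk the chain back to it. The wrapper load_or_explain, the helper root_cause, and the failing stand-in loader are hypothetical, not whisperx code.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_explain(model_name: str, loader: Callable[[str], T]) -> T:
    """Hypothetical wrapper: keep the original error attached via `raise ... from`."""
    try:
        return loader(model_name)
    except Exception as err:
        raise ValueError(f'The chosen align_model "{model_name}" could not be loaded: {err!r}') from err


def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ (explicit) or __context__ (implicit) links to the innermost error."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        nxt = exc.__cause__ if exc.__cause__ is not None else exc.__context__
        if nxt is None:
            break
        exc = nxt
    return exc


def failing_loader(name: str):
    # Stand-in for AutoProcessor.from_pretrained failing on a missing optional dependency.
    raise ImportError("Wav2Vec2ProcessorWithLM requires the pyctcdecode library")


try:
    load_or_explain("jonatasgrosman/wav2vec2-large-xlsr-53-russian", failing_loader)
except ValueError as exc:
    cause = root_cause(exc)
    print(type(cause).__name__, "-", cause)
```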
These are errors with the Hugging Face models; maybe the author of these models, @jonatasgrosman, can help?
Hi everyone! I don't know precisely what's happening here, but it may be because whisperx uses Hugging Face's pipelines for ASR. These pipelines have rules that force the use of a language model when one is present in the model repository (you can see a language_model folder in the reported repos).
That is why whisperx works for other models that don't have a language_model folder. For the models with a language_model folder, you'll need to install the pyctcdecode and kenlm dependencies to make whisperx work.
@m-bain I think you could force the pipeline to ignore the embedded language model by default to prevent this kind of issue.
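The rule described above can be approximated as "a language_model/ folder in the repo means the LM-backed processor is needed, unless it is explicitly ignored". This is a minimal sketch of that decision, keyed on folder presence as described in the comment; the function names and file listings are hypothetical. A real listing could come from huggingface_hub.list_repo_files(repo_id), which requires network access.

```python
from typing import Iterable


def has_embedded_lm(repo_files: Iterable[str]) -> bool:
    """True if any repo path sits under a top-level language_model/ folder."""
    return any(path.startswith("language_model/") for path in repo_files)


def processor_class_name(repo_files: Iterable[str], ignore_lm: bool = False) -> str:
    """Name the processor class that would be needed under the rule above."""
    if has_embedded_lm(repo_files) and not ignore_lm:
        return "Wav2Vec2ProcessorWithLM"  # requires pyctcdecode and kenlm
    return "Wav2Vec2Processor"  # feature extractor + tokenizer only


# Hypothetical file listings for a repo with and without an embedded LM.
with_lm = ["config.json", "vocab.json", "language_model/alphabet.json", "language_model/unigrams.txt"]
without_lm = ["config.json", "vocab.json", "pytorch_model.bin"]

print(processor_class_name(with_lm))                  # Wav2Vec2ProcessorWithLM
print(processor_class_name(with_lm, ignore_lm=True))  # Wav2Vec2Processor
print(processor_class_name(without_lm))               # Wav2Vec2Processor
```

Defaulting ignore_lm to True in a helper like this would match the suggestion to ignore the embedded language model by default.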
Thanks, @jonatasgrosman!