Comments (10)
Can you upload the audio file to reproduce?
from whisper-diarization.
Unfortunately not this one. I can try to find one that I can share.
Are you suggesting the error is related to this specific audio file?
from whisper-diarization.
OK, I'm checking with another file to start with, and I noticed that it says:
[NeMo W 2024-05-22 20:23:04 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
I'm not doing translation, so I assume this is not a problem, right?
from whisper-diarization.
Not a problem
from whisper-diarization.
I'm confused. I tried the above command on a different file twice and got two different errors, each different from the one reported above.
First time ended with
Suppressing numeral and symbol tokens
Traceback (most recent call last):
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
File "/opt/anaconda3/envs/pretzel/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers.models.wav2vec2_bert.configuration_wav2vec2_bert'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/xhxxch/whisper-dia/diarize.py", line 124, in <module>
alignment_model, alignment_tokenizer, alignment_dictionary = load_alignment_model(
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/ctc_forced_aligner/alignment_utils.py", line 276, in load_alignment_model
AutoModelForCTC.from_pretrained(
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 540, in from_pretrained
if kwargs_orig.get("quantization_config", None) is not None:
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 751, in keys
return getattribute_from_module(self._modules[module_name], attr)
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 752, in <listcomp>
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 748, in _load_attr_from_module
module_name = model_type_to_module_name(model_type)
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 692, in getattribute_from_module
return None
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1462, in __getattr__
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1474, in _get_module
RuntimeError: Failed to import transformers.models.wav2vec2_bert.configuration_wav2vec2_bert because of the following error (look up to see its traceback):
No module named 'transformers.models.wav2vec2_bert.configuration_wav2vec2_bert'
While the above process was executing, I also ran pip install 'nemo_toolkit[nlp]'. Assuming that this may be why I'm getting a different error, I ran pip uninstall 'nemo_toolkit[nlp]' and, just to make sure that I still have what I need, ran pip install 'nemo_toolkit[asr]' again.
After that the very same command failed immediately with
objc[6072]: Class AVFFrameReceiver is implemented in both /opt/anaconda3/envs/pretzel/lib/libavdevice.58.8.100.dylib (0x1759f0798) and /opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x17860c760). One of the two will be used. Which one is undefined.
objc[6072]: Class AVFAudioReceiver is implemented in both /opt/anaconda3/envs/pretzel/lib/libavdevice.58.8.100.dylib (0x1759f07e8) and /opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x17860c7b0). One of the two will be used. Which one is undefined.
Traceback (most recent call last):
File "/Users/xhxxch/whisper-dia/diarize.py", line 3, in <module>
from helpers import (
File "/Users/xhxxch/whisper-dia/helpers.py", line 7, in <module>
from whisperx.alignment import DEFAULT_ALIGN_MODELS_HF, DEFAULT_ALIGN_MODELS_TORCH
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
from .transcribe import load_model
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
from .asr import load_model
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
from .vad import load_vad_model, merge_chunks
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
from pyannote.audio.pipelines import VoiceActivityDetection
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
from .speaker_diarization import SpeakerDiarization
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 56, in <module>
from nemo.collections.asr.models import (
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/__init__.py", line 15, in <module>
from nemo.collections.asr import data, losses, models, modules
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/models/__init__.py", line 36, in <module>
from nemo.collections.asr.models.transformer_bpe_models import EncDecTransfModelBPE
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/models/transformer_bpe_models.py", line 52, in <module>
from nemo.collections.nlp.modules.common import TokenClassifier
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/__init__.py", line 15, in <module>
from nemo.collections.nlp import data, losses, models, modules
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/__init__.py", line 31, in <module>
from nemo.collections.nlp.models.machine_translation import MTEncDecModel
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/__init__.py", line 15, in <module>
from nemo.collections.nlp.models.machine_translation.mt_enc_dec_bottleneck_model import MTBottleneckModel
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/mt_enc_dec_bottleneck_model.py", line 23, in <module>
from nemo.collections.nlp.models.machine_translation.mt_enc_dec_model import MTEncDecModel
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py", line 38, in <module>
from nemo.collections.common.tokenizers.chinese_tokenizers import ChineseProcessor
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/common/tokenizers/chinese_tokenizers.py", line 38, in <module>
import opencc
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/opencc.py", line 24, in <module>
libopencc = CDLL('libopencc.so.1', use_errno=True)
File "/opt/anaconda3/envs/pretzel/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(libopencc.so.1, 0x0006): tried: 'libopencc.so.1' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibopencc.so.1' (no such file), '/opt/anaconda3/envs/pretzel/lib/python3.10/lib-dynload/../../libopencc.so.1' (no such file), '/opt/anaconda3/envs/pretzel/bin/../lib/libopencc.so.1' (no such file), '/usr/lib/libopencc.so.1' (no such file, not in dyld cache), 'libopencc.so.1' (no such file), '/usr/local/lib/libopencc.so.1' (no such file), '/usr/lib/libopencc.so.1' (no such file, not in dyld cache)
Edit: I reinstalled the requirements (except for nemo, which fails via the requirements.txt), but the error remains the same, no matter what audio file I use.
from whisper-diarization.
Please reinstall ctc-forced-aligner again; it needs to be recompiled against the torch version you are using. Also upgrade transformers to the latest version, or at least 4.34.
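To confirm which versions are actually installed in the active environment before and after reinstalling, a check along these lines may help. It is a minimal sketch using only the standard library: the helper names `version_tuple`, `meets_minimum`, and `report` are hypothetical, the distribution names are assumed to match the pip package names, and the 4.34 threshold is simply the one mentioned above.

```python
from importlib import metadata
import re


def version_tuple(version: str) -> tuple:
    """Turn '4.41.0.dev0' or '2.3.0+cu121' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric components only; pre-release tags are ignored."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.34' and '4.34.0' compare equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report(packages=("transformers", "torch", "ctc-forced-aligner")) -> dict:
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = report()
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    if versions["transformers"]:
        print("transformers >= 4.34:", meets_minimum(versions["transformers"], "4.34"))
```

Running this inside the same conda environment that runs diarize.py should show whether the upgrade actually took effect there.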
from whisper-diarization.
or it's better to reinstall all the requirements
from whisper-diarization.
or it's better to reinstall all the requirements
I did, but that didn't change anything.
What seems to work (still executing, so far) is the solution mentioned in #177 (comment). I did
brew install opencc
ln -s /opt/homebrew/lib/libopencc.dylib libopencc.so.1
Now I'm waiting for the command to finish processing after Suppressing numeral and symbol tokens
What puzzles me, though, is why I previously (with the first test file above) didn't get an error about libopencc.so.1 and now suddenly did.
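A quick way to see whether the dynamic loader can find the library at all, independent of NeMo, is to try loading it directly. This is a sketch, not part of the project: `can_load` is a hypothetical helper, and the name `libopencc.so.1` is taken from the traceback above.

```python
import ctypes
import ctypes.util


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # find_library returns a path or name if the platform's search finds
    # something matching 'opencc', otherwise None.
    print("find_library('opencc'):", ctypes.util.find_library("opencc"))
    # This is the exact name that opencc.py passes to CDLL in the traceback.
    print("libopencc.so.1 loadable:", can_load("libopencc.so.1"))
```

Since the symlink above is created in the current directory, and the dlopen error lists the bare name `libopencc.so.1` among the paths it tried, the result may depend on which directory the script is run from.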
Edit: OK, we're back to where we were in the OP:
Suppressing numeral and symbol tokens
Some weights of the model checkpoint at MahmoudAshraf/mms-300m-1130-forced-aligner were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at MahmoudAshraf/mms-300m-1130-forced-aligner and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/Users/xhxxch/whisper-dia/diarize.py", line 155, in <module>
spans = get_spans(tokens_starred, segments, alignment_tokenizer.decode(blank_id))
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/ctc_forced_aligner/alignment_utils.py", line 63, in get_spans
assert seg.label == ltr, f"{seg.label} != {ltr}"
AssertionError: g != <star>
But this is with a different audio file, so the error is not specific to one file. I suspect it's not so much about the audio file as about the language. You can probably take any audio file in Swedish and reproduce the error.
Maybe this is related: as I am trying to understand how your script works, it looks like it uses a wav2vec2 model, just like whisperX. That made me wonder how it works with Swedish audio, given that Swedish is not one of the languages for which whisperX already has a wav2vec2 model (when I tried whisperX, I used KBLab/wav2vec2-large-voxrex-swedish).
from whisper-diarization.
@tophee My script uses a multilingual alignment model. If you changed the default model to one that has the native vocabulary of the language, you need to turn romanization off too. Can you upload the audio file so I can test? I tried a Swedish audio file and it worked fine with the default model.
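The interaction between romanization and the model's vocabulary can be illustrated with a toy check. This is only a sketch under stated assumptions: both vocabularies are made up for illustration, and `crude_romanize` (stripping diacritics via Unicode decomposition) is a stand-in, not the romanizer the aligner actually uses.

```python
import unicodedata

# Hypothetical vocabularies, for illustration only.
ROMANIZED_VOCAB = set("abcdefghijklmnopqrstuvwxyz' ")
NATIVE_SWEDISH_VOCAB = ROMANIZED_VOCAB | set("åäö")


def crude_romanize(text: str) -> str:
    """Strip combining marks after NFKD decomposition (e.g. 'ö' -> 'o')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def out_of_vocab(text: str, vocab: set) -> set:
    """Return the characters of text that the vocabulary cannot represent."""
    normalized = unicodedata.normalize("NFC", text.lower())
    return {ch for ch in normalized if ch not in vocab}


if __name__ == "__main__":
    sentence = "hör på språket"
    for label, text in (("raw", sentence), ("romanized", crude_romanize(sentence))):
        for vocab_name, vocab in (
            ("romanized vocab", ROMANIZED_VOCAB),
            ("native vocab", NATIVE_SWEDISH_VOCAB),
        ):
            missing = sorted(out_of_vocab(text, vocab))
            print(f"{label:9} text vs {vocab_name}: missing {missing}")
```

In this toy setup the raw text only fits the native vocabulary, while the romanized text no longer contains the å/ä/ö characters that a native-vocabulary model would work with, which is one way to picture why the romanization setting and the choice of model need to agree.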
from whisper-diarization.