
Comments (10)

MahmoudAshraf97 commented on September 26, 2024

Can you upload the audio file to reproduce?

from whisper-diarization.

tophee commented on September 26, 2024

Unfortunately not this one. I can try to find one that I can share.

Are you suggesting the error is related to this specific audio file?


MahmoudAshraf97 commented on September 26, 2024


tophee commented on September 26, 2024

OK, I'm checking with another file, to start with. And I noticed that it says:

[NeMo W 2024-05-22 20:23:04 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.

I'm not doing translation, so I assume this is not a problem, right?


MahmoudAshraf97 commented on September 26, 2024

Not a problem


tophee commented on September 26, 2024

I'm confused. I tried the above command on a different file twice and got two different errors, each different from the one reported above.

The first run ended with:

Suppressing numeral and symbol tokens
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers.models.wav2vec2_bert.configuration_wav2vec2_bert'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/xhxxch/whisper-dia/diarize.py", line 124, in <module>
    alignment_model, alignment_tokenizer, alignment_dictionary = load_alignment_model(
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/ctc_forced_aligner/alignment_utils.py", line 276, in load_alignment_model
    AutoModelForCTC.from_pretrained(
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 540, in from_pretrained
    if kwargs_orig.get("quantization_config", None) is not None:
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 751, in keys
    return getattribute_from_module(self._modules[module_name], attr)
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 752, in <listcomp>
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 748, in _load_attr_from_module
    module_name = model_type_to_module_name(model_type)
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 692, in getattribute_from_module
    return None
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1462, in __getattr__
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1474, in _get_module
RuntimeError: Failed to import transformers.models.wav2vec2_bert.configuration_wav2vec2_bert because of the following error (look up to see its traceback):
No module named 'transformers.models.wav2vec2_bert.configuration_wav2vec2_bert'

While the above process was executing I also did pip install 'nemo_toolkit[nlp]'. Assuming that this may be the reason why I'm getting a different error, I did pip uninstall 'nemo_toolkit[nlp]' and just to make sure that I still have what I need I did pip install 'nemo_toolkit[asr]' again.

After that the very same command failed immediately with

objc[6072]: Class AVFFrameReceiver is implemented in both /opt/anaconda3/envs/pretzel/lib/libavdevice.58.8.100.dylib (0x1759f0798) and /opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x17860c760). One of the two will be used. Which one is undefined.
objc[6072]: Class AVFAudioReceiver is implemented in both /opt/anaconda3/envs/pretzel/lib/libavdevice.58.8.100.dylib (0x1759f07e8) and /opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x17860c7b0). One of the two will be used. Which one is undefined.
Traceback (most recent call last):
  File "/Users/xhxxch/whisper-dia/diarize.py", line 3, in <module>
    from helpers import (
  File "/Users/xhxxch/whisper-dia/helpers.py", line 7, in <module>
    from whisperx.alignment import DEFAULT_ALIGN_MODELS_HF, DEFAULT_ALIGN_MODELS_TORCH
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
    from .asr import load_model
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
    from .vad import load_vad_model, merge_chunks
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
    from pyannote.audio.pipelines import VoiceActivityDetection
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
    from .speaker_diarization import SpeakerDiarization
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 56, in <module>
    from nemo.collections.asr.models import (
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/__init__.py", line 15, in <module>
    from nemo.collections.asr import data, losses, models, modules
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/models/__init__.py", line 36, in <module>
    from nemo.collections.asr.models.transformer_bpe_models import EncDecTransfModelBPE
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/models/transformer_bpe_models.py", line 52, in <module>
    from nemo.collections.nlp.modules.common import TokenClassifier
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/__init__.py", line 15, in <module>
    from nemo.collections.nlp import data, losses, models, modules
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/__init__.py", line 31, in <module>
    from nemo.collections.nlp.models.machine_translation import MTEncDecModel
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/__init__.py", line 15, in <module>
    from nemo.collections.nlp.models.machine_translation.mt_enc_dec_bottleneck_model import MTBottleneckModel
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/mt_enc_dec_bottleneck_model.py", line 23, in <module>
    from nemo.collections.nlp.models.machine_translation.mt_enc_dec_model import MTEncDecModel
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py", line 38, in <module>
    from nemo.collections.common.tokenizers.chinese_tokenizers import ChineseProcessor
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/common/tokenizers/chinese_tokenizers.py", line 38, in <module>
    import opencc
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/opencc.py", line 24, in <module>
    libopencc = CDLL('libopencc.so.1', use_errno=True)
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(libopencc.so.1, 0x0006): tried: 'libopencc.so.1' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibopencc.so.1' (no such file), '/opt/anaconda3/envs/pretzel/lib/python3.10/lib-dynload/../../libopencc.so.1' (no such file), '/opt/anaconda3/envs/pretzel/bin/../lib/libopencc.so.1' (no such file), '/usr/lib/libopencc.so.1' (no such file, not in dyld cache), 'libopencc.so.1' (no such file), '/usr/local/lib/libopencc.so.1' (no such file), '/usr/lib/libopencc.so.1' (no such file, not in dyld cache)

Edit: I reinstalled the requirements (except for nemo, which fails via the requirements.txt), but the error remains the same, no matter what audio file I use.


MahmoudAshraf97 commented on September 26, 2024

Please reinstall ctc-forced-aligner; it needs to be recompiled against the torch version you are using. Also upgrade transformers to the latest version, or at least 4.34.
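
One way to check the version advice above before reinstalling anything is a small script like the following. It is only a sketch using the standard library: the helper names `installed_version` and `meets_minimum` are made up for illustration, the distribution name `ctc-forced-aligner` is an assumption, and the numeric comparison assumes plain release versions such as 4.34.0.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def _numeric_parts(version: str) -> tuple:
    """Keep the leading numeric components, e.g. '4.41.0.dev0' -> (4, 41, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    return _numeric_parts(installed) >= _numeric_parts(minimum)


if __name__ == "__main__":
    # Distribution names are assumptions; a missing package prints None.
    for name in ("transformers", "torch", "ctc-forced-aligner"):
        print(name, installed_version(name))
    found = installed_version("transformers")
    if found is not None:
        print("transformers >= 4.34:", meets_minimum(found, "4.34"))
```

Running it inside the same conda environment as diarize.py shows which versions that environment actually resolves, which may differ from what was last installed with pip.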


MahmoudAshraf97 commented on September 26, 2024

or it's better to reinstall all the requirements


tophee commented on September 26, 2024

or it's better to reinstall all the requirements

I did, but that didn't change anything.

What seems to work (still executing, so far) is the solution mentioned in #177 (comment). I did

brew install opencc
ln -s /opt/homebrew/lib/libopencc.dylib libopencc.so.1

Now I'm waiting for the command to finish after Suppressing numeral and symbol tokens

What puzzles me, though, is why I previously (with the first test file above) didn't get an error about libopencc.so.1 and now suddenly did.
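
To see whether the dynamic loader can find the library named in the traceback, independent of the diarization script, a check along these lines may help. It is a minimal sketch: `can_load` is a hypothetical helper, and it only reports whether `ctypes` can open the given name from the current environment and working directory.

```python
import ctypes
import ctypes.util


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # File name taken from the traceback: opencc.py asks for exactly this name.
    print("libopencc.so.1 loadable:", can_load("libopencc.so.1"))
    # find_library searches the platform's usual locations for "opencc".
    print("find_library('opencc'):", ctypes.util.find_library("opencc"))
```

Because the symlink above was created in the current directory, running the check from different working directories may show whether the result depends on where the script is started.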

Edit: OK, we're back to where we were in the OP:

Suppressing numeral and symbol tokens
Some weights of the model checkpoint at MahmoudAshraf/mms-300m-1130-forced-aligner were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at MahmoudAshraf/mms-300m-1130-forced-aligner and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/Users/xhxxch/whisper-dia/diarize.py", line 155, in <module>
    spans = get_spans(tokens_starred, segments, alignment_tokenizer.decode(blank_id))
  File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/ctc_forced_aligner/alignment_utils.py", line 63, in get_spans
    assert seg.label == ltr, f"{seg.label} != {ltr}"
AssertionError: g != <star>

But this is with a different audio file, so the error is not specific to one file. I suspect it's not so much about the audio file as about the language. You can probably take any audio file in Swedish and reproduce the error.

Maybe this is related: as I am trying to understand how your script works, it looks like it is using a wav2vec2 model, just like whisperX. That made me wonder how it works with Swedish audio, given that Swedish is not one of the languages for which whisperX already has a wav2vec2 model (when I tried whisperX I used KBLab/wav2vec2-large-voxrex-swedish).


MahmoudAshraf97 commented on September 26, 2024

@tophee My script uses a multilingual alignment model. If you changed the default model to one that has the native vocabulary of the language, you need to turn romanization off too. Can you upload the audio file so I can test? I have tried a Swedish audio file and it worked fine with the default model.
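
The point about romanization and vocabulary can be illustrated with a toy example. This is a sketch under stated assumptions, not the aligner's actual code: `LATIN_VOCAB` is a made-up stand-in for a romanized vocabulary, and `crude_romanize` merely strips diacritics, which is far simpler than a real romanizer.

```python
import unicodedata

# Assumption for illustration: a romanized vocabulary of lowercase Latin
# letters plus apostrophe and space. The real model vocabulary may differ.
LATIN_VOCAB = set("abcdefghijklmnopqrstuvwxyz' ")


def out_of_vocab(text: str, vocab: set) -> set:
    """Return the characters of text that the vocabulary cannot represent."""
    normalized = unicodedata.normalize("NFC", text.lower())
    return {ch for ch in normalized if ch not in vocab}


def crude_romanize(text: str) -> str:
    """Strip diacritics via NFKD decomposition (a stand-in, not a real romanizer)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    word = "bl\u00e5b\u00e4r"  # Swedish word containing two non-ASCII letters
    print(out_of_vocab(word, LATIN_VOCAB))
    print(out_of_vocab(crude_romanize(word), LATIN_VOCAB))
```

The mismatch runs both ways: a romanized vocabulary cannot represent native characters, and a model with a native vocabulary would be fed romanized text it was not trained on, which is presumably why the model and the romanization setting have to be changed together.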
