Comments (3)
Any other details? Which version of Python are you using? Any errors?
from whisper-diarization.
Hey @transcriptionstream, thanks for your reply!
I'm on Python 3.10 and I don't get any errors.
60
00:05:28,056 --> 00:05:32,820
Speaker 1: Right now we spend the same amount of compute on each token, a dumb one, or like figuring out some complicated math.
61
00:05:32,820 --> 00:05:33,700
Speaker 1: !
62
00:05:33,700 --> 00:05:36,383
Speaker 0: Subscribe to Unconfuse Me wherever you listen to podcasts.
Up to cue 60 everything is transcribed accurately. After that, a lot of spoken text is missing, and the output resumes with the audio of cue 62, so everything in between is skipped.
When I repeat the run, the length of the skipped audio differs:
47
00:04:57,496 --> 00:05:02,519
Speaker 0: So, you know, to generate every new word, it's essentially doing the same thing.
48
00:05:02,519 --> 00:05:33,700
Speaker 0: !
49
00:05:33,700 --> 00:05:36,383
Speaker 0: Subscribe to Unconfuse Me wherever you listen to podcasts.
Now the skipped part is much longer, but the last sentence is still there.
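To compare runs more easily, cues like the "!" ones above can be found automatically. The following is only a rough sketch and not part of whisper-diarization: the function name `find_suspicious_cues` and the `max_seconds_per_word` threshold are hypothetical, and it assumes a standard SRT file with blank lines between cues and a "Speaker N:" prefix on each text line.

```python
import re

# Matches an SRT timing line such as "00:05:02,519 --> 00:05:33,700".
CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields of an SRT time to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def find_suspicious_cues(srt_text, max_seconds_per_word=5.0):
    """Return (index, duration_s, text) for cues that look like skipped audio.

    A cue is flagged when its text contains no word characters at all
    (for example "!") or when it lasts very long relative to its word count.
    The threshold is an arbitrary illustrative value (an assumption).
    """
    suspicious = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # not a complete cue: index, timing, text
        match = CUE_TIME.search(lines[1])
        if match is None:
            continue
        fields = match.groups()
        duration = to_seconds(*fields[4:]) - to_seconds(*fields[:4])
        text = " ".join(lines[2:])
        # Drop the speaker label so it is not counted as spoken text.
        spoken = re.sub(r"^Speaker \d+:\s*", "", text)
        words = re.findall(r"\w+", spoken)
        if not words or duration / len(words) > max_seconds_per_word:
            suspicious.append((int(lines[0]), round(duration, 3), spoken))
    return suspicious


if __name__ == "__main__":
    import sys

    # Usage: python find_gaps.py output.srt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for index, duration, text in find_suspicious_cues(handle.read()):
            print(f"cue {index}: {duration:.3f}s, text={text!r}")
```

Running this on the output of each run shows at a glance which cues were collapsed and how much audio each one covers.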
That's the full log:
python diarize.py -a /home/pascal/code/video_translator/data/sent_lvl_sd/bgates_saltmann2/audio_file_enh.wav --whisper-model large-v3 --suppress_numerals --device cuda --language en
/home/pascal/anaconda3/envs/whisper_diar_inf/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
[NeMo W 2024-03-27 17:19:05 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
Failed to align segment ("!"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("!"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("!"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("!"): no characters in this segment found in model dictionary, resorting to original...
[NeMo I 2024-03-27 17:20:14 msdd_models:1092] Loading pretrained diar_msdd_telephonic model from NGC
[NeMo I 2024-03-27 17:20:14 cloud:58] Found existing object /home/pascal/.cache/torch/NeMo/NeMo_1.22.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo.
[NeMo I 2024-03-27 17:20:14 cloud:64] Re-using file from: /home/pascal/.cache/torch/NeMo/NeMo_1.22.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo
[NeMo I 2024-03-27 17:20:14 common:913] Instantiating model from pre-trained checkpoint
[NeMo W 2024-03-27 17:20:15 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: true
[NeMo W 2024-03-27 17:20:15 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
[NeMo W 2024-03-27 17:20:15 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
emb_dir: null
sample_rate: 16000
num_spks: 2
soft_label_thres: 0.5
labels: null
batch_size: 15
emb_batch_size: 0
shuffle: false
seq_eval_mode: false
[NeMo I 2024-03-27 17:20:15 features:289] PADDING: 16
[NeMo I 2024-03-27 17:20:15 features:289] PADDING: 16
[NeMo I 2024-03-27 17:20:15 audio_preprocessing:517] Numba CUDA SpecAugment kernel is being used
[NeMo I 2024-03-27 17:20:15 save_restore_connector:249] Model EncDecDiarLabelModel was successfully restored from /home/pascal/.cache/torch/NeMo/NeMo_1.22.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo.
[NeMo I 2024-03-27 17:20:15 features:289] PADDING: 16
[NeMo I 2024-03-27 17:20:15 audio_preprocessing:517] Numba CUDA SpecAugment kernel is being used
[NeMo I 2024-03-27 17:20:15 clustering_diarizer:127] Loading pretrained vad_multilingual_marblenet model from NGC
[NeMo I 2024-03-27 17:20:15 cloud:58] Found existing object /home/pascal/.cache/torch/NeMo/NeMo_1.22.0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.
[NeMo I 2024-03-27 17:20:15 cloud:64] Re-using file from: /home/pascal/.cache/torch/NeMo/NeMo_1.22.0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo
[NeMo I 2024-03-27 17:20:15 common:913] Instantiating model from pre-trained checkpoint
[NeMo W 2024-03-27 17:20:15 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
Train config :
manifest_filepath: /manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json
sample_rate: 16000
labels:
- background
- speech
batch_size: 256
shuffle: true
is_tarred: false
tarred_audio_filepaths: null
tarred_shard_strategy: scatter
augmentor:
shift:
prob: 0.5
min_shift_ms: -10.0
max_shift_ms: 10.0
white_noise:
prob: 0.5
min_level: -90
max_level: -46
norm: true
noise:
prob: 0.5
manifest_path: /manifests/noise_0_1_musan_fs.json
min_snr_db: 0
max_snr_db: 30
max_gain_db: 300.0
norm: true
gain:
prob: 0.5
min_gain_dbfs: -10.0
max_gain_dbfs: 10.0
norm: true
num_workers: 16
pin_memory: true
[NeMo W 2024-03-27 17:20:15 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).
Validation config :
manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json
sample_rate: 16000
labels:
- background
- speech
batch_size: 256
shuffle: false
val_loss_idx: 0
num_workers: 16
pin_memory: true
[NeMo W 2024-03-27 17:20:15 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
Test config :
manifest_filepath: null
sample_rate: 16000
labels:
- background
- speech
batch_size: 128
shuffle: false
test_loss_idx: 0
[NeMo I 2024-03-27 17:20:15 features:289] PADDING: 16
[NeMo I 2024-03-27 17:20:15 save_restore_connector:249] Model EncDecClassificationModel was successfully restored from /home/pascal/.cache/torch/NeMo/NeMo_1.22.0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.
[NeMo I 2024-03-27 17:20:15 msdd_models:864] Multiscale Weights: [1, 1, 1, 1, 1]
[NeMo I 2024-03-27 17:20:15 msdd_models:865] Clustering Parameters: {
"oracle_num_speakers": false,
"max_num_speakers": 8,
"enhanced_count_thres": 80,
"max_rp_threshold": 0.25,
"sparse_search_volume": 30,
"maj_vote_spk_count": false
}
[NeMo I 2024-03-27 17:20:15 speaker_utils:93] Number of files to diarize: 1
[NeMo I 2024-03-27 17:20:15 clustering_diarizer:309] Split long audio file to avoid CUDA memory issue
splitting manifest: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.29it/s]
[NeMo I 2024-03-27 17:20:16 classification_models:273] Perform streaming frame-level VAD
[NeMo I 2024-03-27 17:20:16 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-03-27 17:20:16 collections:446] Dataset loaded with 8 items, total duration of 0.10 hours.
[NeMo I 2024-03-27 17:20:16 collections:448] # 8 files loaded accounting to # 1 labels
vad: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:01<00:00, 7.35it/s]
[NeMo I 2024-03-27 17:20:17 clustering_diarizer:250] Generating predictions with overlapping input segments
[NeMo I 2024-03-27 17:20:18 clustering_diarizer:262] Converting frame level prediction to speech/no-speech segment in start and end times format.
creating speech segments: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.57it/s]
[NeMo I 2024-03-27 17:20:19 clustering_diarizer:287] Subsegmentation for embedding extraction: scale0, /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/subsegments_scale0.json
[NeMo I 2024-03-27 17:20:19 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2024-03-27 17:20:19 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-03-27 17:20:19 collections:446] Dataset loaded with 343 items, total duration of 0.13 hours.
[NeMo I 2024-03-27 17:20:19 collections:448] # 343 files loaded accounting to # 1 labels
[1/5] extract embeddings: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 10.06it/s]
[NeMo I 2024-03-27 17:20:19 clustering_diarizer:389] Saved embedding files to /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings
[NeMo I 2024-03-27 17:20:19 clustering_diarizer:287] Subsegmentation for embedding extraction: scale1, /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/subsegments_scale1.json
[NeMo I 2024-03-27 17:20:19 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2024-03-27 17:20:19 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-03-27 17:20:19 collections:446] Dataset loaded with 420 items, total duration of 0.13 hours.
[NeMo I 2024-03-27 17:20:19 collections:448] # 420 files loaded accounting to # 1 labels
[2/5] extract embeddings: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 12.25it/s]
[NeMo I 2024-03-27 17:20:20 clustering_diarizer:389] Saved embedding files to /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings
[NeMo I 2024-03-27 17:20:20 clustering_diarizer:287] Subsegmentation for embedding extraction: scale2, /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/subsegments_scale2.json
[NeMo I 2024-03-27 17:20:20 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2024-03-27 17:20:20 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-03-27 17:20:20 collections:446] Dataset loaded with 535 items, total duration of 0.14 hours.
[NeMo I 2024-03-27 17:20:20 collections:448] # 535 files loaded accounting to # 1 labels
[3/5] extract embeddings: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 13.27it/s]
[NeMo I 2024-03-27 17:20:20 clustering_diarizer:389] Saved embedding files to /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings
[NeMo I 2024-03-27 17:20:20 clustering_diarizer:287] Subsegmentation for embedding extraction: scale3, /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/subsegments_scale3.json
[NeMo I 2024-03-27 17:20:21 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2024-03-27 17:20:21 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-03-27 17:20:21 collections:446] Dataset loaded with 722 items, total duration of 0.14 hours.
[NeMo I 2024-03-27 17:20:21 collections:448] # 722 files loaded accounting to # 1 labels
[4/5] extract embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 15.12it/s]
[NeMo I 2024-03-27 17:20:21 clustering_diarizer:389] Saved embedding files to /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings
[NeMo I 2024-03-27 17:20:21 clustering_diarizer:287] Subsegmentation for embedding extraction: scale4, /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/subsegments_scale4.json
[NeMo I 2024-03-27 17:20:21 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2024-03-27 17:20:21 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2024-03-27 17:20:21 collections:446] Dataset loaded with 1106 items, total duration of 0.15 hours.
[NeMo I 2024-03-27 17:20:21 collections:448] # 1106 files loaded accounting to # 1 labels
[5/5] extract embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 18.48it/s]
[NeMo I 2024-03-27 17:20:22 clustering_diarizer:389] Saved embedding files to /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings
clustering: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.52it/s]
[NeMo I 2024-03-27 17:20:23 clustering_diarizer:464] Outputs are saved in /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs directory
[NeMo W 2024-03-27 17:20:23 der:185] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-03-27 17:20:23 msdd_models:960] Loading embedding pickle file of scale:0 at /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings/subsegments_scale0_embeddings.pkl
[NeMo I 2024-03-27 17:20:23 msdd_models:960] Loading embedding pickle file of scale:1 at /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings/subsegments_scale1_embeddings.pkl
[NeMo I 2024-03-27 17:20:23 msdd_models:960] Loading embedding pickle file of scale:2 at /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings/subsegments_scale2_embeddings.pkl
[NeMo I 2024-03-27 17:20:23 msdd_models:960] Loading embedding pickle file of scale:3 at /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings/subsegments_scale3_embeddings.pkl
[NeMo I 2024-03-27 17:20:23 msdd_models:960] Loading embedding pickle file of scale:4 at /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/embeddings/subsegments_scale4_embeddings.pkl
[NeMo I 2024-03-27 17:20:23 msdd_models:938] Loading cluster label file from /home/pascal/code/chamelaion_inference/speaker_diarization/temp_outputs/speaker_outputs/subsegments_scale4_cluster.label
[NeMo I 2024-03-27 17:20:23 collections:761] Filtered duration for loading collection is 0.000000.
[NeMo I 2024-03-27 17:20:23 collections:764] Total 1 session files loaded accounting to # 1 audio clips
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 36.66it/s]
[NeMo I 2024-03-27 17:20:23 msdd_models:1403] [Threshold: 0.7000] [use_clus_as_main=False] [diar_window=50]
[NeMo I 2024-03-27 17:20:23 speaker_utils:93] Number of files to diarize: 1
[NeMo I 2024-03-27 17:20:23 speaker_utils:93] Number of files to diarize: 1
[NeMo W 2024-03-27 17:20:23 der:185] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-03-27 17:20:23 speaker_utils:93] Number of files to diarize: 1
[NeMo W 2024-03-27 17:20:23 der:185] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-03-27 17:20:23 speaker_utils:93] Number of files to diarize: 1
[NeMo W 2024-03-27 17:20:23 der:185] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate
[NeMo I 2024-03-27 17:20:23 msdd_models:1431]