
vasistalodagala / whisper-finetune


Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.

License: MIT License

Python 100.00%
asr jax pytorch speech-recognition transformers whisper


whisper-finetune's Issues

how to add a new language?

Dear All,

I would like to recognize Taiwanese Hakka speech using fine-tuned Whisper. However, Hakka is not supported by WhisperTokenizer. Any idea?

Here is my code and log:

ngpu=10  # number of GPUs to perform distributed training on.

torchrun --nproc_per_node=${ngpu} train/fine-tune_on_custom_dataset.py \
--model_name vasista22/whisper-telugu-base \
--language hakka \
--sampling_rate 16000 \
--num_proc 4 \
--train_strategy epoch \
--learning_rate 3e-3 \
--warmup 1000 \
--train_batchsize 16 \
--eval_batchsize 8 \
--num_epochs 20 \
--resume_from_ckpt None \
--output_dir op_dir_epoch \
--train_datasets output_data/train  \
--eval_datasets output_data/dev output_data/test


ValueError: Unsupported language: hakka. Language should be one of: ['english', 'chinese', 'german', 'spanish', 'russian', 'korean', 'french', 'japanese', 'portuguese', 'turkish', 'polish', 'catalan', 'dutch', 'arabic', 'swedish', 'italian', 'indonesian', 'hindi', 'finnish', 'vietnamese', 'hebrew', 'ukrainian', 'greek', 'malay', 'czech', 'romanian', 'danish', 'hungarian', 'tamil', 'norwegian', 'thai', 'urdu', 'croatian', 'bulgarian', 'lithuanian', 'latin', 'maori', 'malayalam', 'welsh', 'slovak', 'telugu', 'persian', 'latvian', 'bengali', 'serbian', 'azerbaijani', 'slovenian', 'kannada', 'estonian', 'macedonian', 'breton', 'basque', 'icelandic', 'armenian', 'nepali', 'mongolian', 'bosnian', 'kazakh', 'albanian', 'swahili', 'galician', 'marathi', 'punjabi', 'sinhala', 'khmer', 'shona', 'yoruba', 'somali', 'afrikaans', 'occitan', 'georgian', 'belarusian', 'tajik', 'sindhi', 'gujarati', 'amharic', 'yiddish', 'lao', 'uzbek', 'faroese', 'haitian creole', 'pashto', 'turkmen', 'nynorsk', 'maltese', 'sanskrit', 'luxembourgish', 'myanmar', 'tibetan', 'tagalog', 'malagasy', 'assamese', 'tatar', 'hawaiian', 'lingala', 'hausa', 'bashkir', 'javanese', 'sundanese', 'burmese', 'valencian', 'flemish', 'haitian', 'letzeburgesch', 'pushto', 'panjabi', 'moldavian', 'moldovan', 'sinhalese', 'castilian'].
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 1353, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3358, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3261, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/usr1/liao/whisper-hakka/train/fine-tune_on_custom_dataset.py", line 198, in prepare_dataset
    batch["labels"] = processor.tokenizer(transcription).input_ids
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2538, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2644, in _call_one
    return self.encode_plus(
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2717, in encode_plus
    return self._encode_plus(
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 652, in _encode_plus
    return self.prepare_for_model(
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3156, in prepare_for_model
    total_len = len_ids + len_pair_ids + (self.num_special_tokens_to_add(pair=pair) if add_special_tokens else 0)
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 479, in num_special_tokens_to_add
    return len(self.build_inputs_with_special_tokens(token_ids_0, token_ids_1 if pair else None))
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/models/whisper/tokenization_whisper.py", line 428, in build_inputs_with_special_tokens
    return self.prefix_tokens + token_ids_0 + [self.eos_token_id]
  File "/home/liao/anaconda3/envs/pytorch/lib/python3.9/site-packages/transformers/models/whisper/tokenization_whisper.py", line 406, in prefix_tokens
    raise ValueError(
ValueError: Unsupported language: hakka. Language should be one of: ['english', 'chinese', 'german', 'spanish', 'russian', 'korean', 'french', 'japanese', 'portuguese', 'turkish', 'polish', 'catalan', 'dutch', 'arabic', 'swedish', 'italian', 'indonesian', 'hindi', 'finnish', 'vietnamese', 'hebrew', 'ukrainian', 'greek', 'malay', 'czech', 'romanian', 'danish', 'hungarian', 'tamil', 'norwegian', 'thai', 'urdu', 'croatian', 'bulgarian', 'lithuanian', 'latin', 'maori', 'malayalam', 'welsh', 'slovak', 'telugu', 'persian', 'latvian', 'bengali', 'serbian', 'azerbaijani', 'slovenian', 'kannada', 'estonian', 'macedonian', 'breton', 'basque', 'icelandic', 'armenian', 'nepali', 'mongolian', 'bosnian', 'kazakh', 'albanian', 'swahili', 'galician', 'marathi', 'punjabi', 'sinhala', 'khmer', 'shona', 'yoruba', 'somali', 'afrikaans', 'occitan', 'georgian', 'belarusian', 'tajik', 'sindhi', 'gujarati', 'amharic', 'yiddish', 'lao', 'uzbek', 'faroese', 'haitian creole', 'pashto', 'turkmen', 'nynorsk', 'maltese', 'sanskrit', 'luxembourgish', 'myanmar', 'tibetan', 'tagalog', 'malagasy', 'assamese', 'tatar', 'hawaiian', 'lingala', 'hausa', 'bashkir', 'javanese', 'sundanese', 'burmese', 'valencian', 'flemish', 'haitian', 'letzeburgesch', 'pushto', 'panjabi', 'moldavian', 'moldovan', 'sinhalese', 'castilian'].
"""


Fine-tuning the Whisper model on a custom dataset -- trainer error.

I am running the fine-tuning script on a custom dataset. In the trainer initialization,

trainer = Seq2SeqTrainer(
    training_args,
    model=model,
    train_dataset=raw_dataset["train"],
    eval_dataset=raw_dataset["eval"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)

I am getting a TypeError: __init__() got multiple values for argument 'model'.
Can anyone help me with this?
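A likely cause (a sketch of the fix, not an official answer from the repo): Seq2SeqTrainer's first positional parameter is model, so passing training_args positionally while also passing model= as a keyword gives Python two values for model. Passing everything by keyword avoids the error:

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=raw_dataset["train"],
    eval_dataset=raw_dataset["eval"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)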

Information On Batch Size And Learning Rate

The Discord link in the README does not work for me.

Do you have any information on what batch size or learning rate to use? I could only find the maximum learning rate that was used in the paper. Experimentally, I found that too small a batch size seems to cause issues.

What batch size and learning rate do you recommend and why?
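Not an official recommendation, but for reference, a hedged sketch of keeping a large effective batch size on limited hardware through gradient accumulation (the concrete values are assumptions):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="op_dir",                  # hypothetical output directory
    per_device_train_batch_size=8,        # micro-batch that fits in memory
    gradient_accumulation_steps=4,        # effective batch size = 8 * 4 per GPU
    learning_rate=1e-5,                   # assumption: a small fraction of the pre-training peak LR
    warmup_steps=500,
)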

dataset with segment metadata

Hi,
Thank you for this code.

If my dataset contains long WAV files with segment metadata, how can I prepare it?
For example:
wav_1 path_wave

seg_1 wav_1 beginning_segment end_segment
seg_1 wav_1 1.2 3.2
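A minimal sketch of one way to prepare such data (assumptions: soundfile is installed, the segment table has the columns shown above, and the column names below are hypothetical):

import soundfile as sf
from datasets import Dataset, Audio

# (segment_id, wav_path, start_sec, end_sec) rows parsed from the metadata file
segments = [("seg_1", "path_wave", 1.2, 3.2)]

rows = []
for seg_id, wav_path, start, end in segments:
    audio, sr = sf.read(wav_path)                            # load the long recording
    rows.append({
        "audio": {"array": audio[int(start * sr):int(end * sr)], "sampling_rate": sr},
        "sentence": "",                                       # transcription of this segment
    })

ds = Dataset.from_list(rows).cast_column("audio", Audio(sampling_rate=16000))
ds.save_to_disk("output_data/train")                          # directory layout is an assumption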

pytorch_model.bin not getting saved

While fine-tuning the vasista22/whisper-kannada-small model on a customized dataset, after training all the other JSON and bin files are saved in the output directory except pytorch_model.bin, and when saving it says some keys were missing while saving the model ([proj_out.weight]). Why that happens I have no clue; I actually run the whole thing on Google Colab.

import torch
import evaluate
from dataclasses import dataclass
from typing import Any, Dict, List, Union
from datasets import DatasetDict, Audio, load_from_disk, concatenate_datasets
from transformers.models.whisper.english_normalizer import BasicTextNormalizer
from transformers import (
    WhisperFeatureExtractor,
    WhisperTokenizer,
    WhisperProcessor,
    WhisperForConditionalGeneration,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = 'vasista22/whisper-kannada-small'
language = 'Kannada'
sampling_rate = 16000
num_proc = 1
train_strategy = 'steps'
learning_rate = 1.75e-5*0.1
warmup = 20
train_batchsize = 16
eval_batchsize = 8

num_epochs = 20

num_steps = 50
resume_from_ckpt = None
output_dir = 'model_1'
train_datasets = ['/content/drive/MyDrive/Children/prepared_data']
eval_datasets = ['/content/drive/MyDrive/Children/prepared_data']

gradient_checkpointing = True
freeze_feature_encoder = False
freeze_encoder = False
do_normalize_eval = True
do_lower_case = False
do_remove_punctuation = False
normalizer = BasicTextNormalizer()

feature_extractor = WhisperFeatureExtractor.from_pretrained(model_name)
tokenizer = WhisperTokenizer.from_pretrained(model_name, language=language, task="transcribe")
processor = WhisperProcessor.from_pretrained(model_name, language=language, task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_name)

if model.config.decoder_start_token_id is None:
    raise ValueError("Make sure that config.decoder_start_token_id is correctly defined")

if freeze_feature_encoder:
    model.freeze_feature_encoder()

if freeze_encoder:
    model.freeze_encoder()
    model.model.encoder.gradient_checkpointing = False

model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

if gradient_checkpointing:
    model.config.use_cache = False

def load_custom_dataset(split):
    ds = []
    if split == 'train':
        for dset in train_datasets:
            ds.append(load_from_disk(dset))
    if split == 'eval':
        for dset in eval_datasets:
            ds.append(load_from_disk(dset))

    ds_to_return = concatenate_datasets(ds)
    ds_to_return = ds_to_return.shuffle(seed=22)
    return ds_to_return

def prepare_dataset(batch):
    # load and (possibly) resample audio data to 16kHz
    audio = batch["audio"]

    # compute log-Mel input features from input audio array
    batch["input_features"] = processor.feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    # compute input length of audio sample in seconds
    batch["input_length"] = len(audio["array"]) / audio["sampling_rate"]

    # optional pre-processing steps
    transcription = batch["sentence"]
    if do_lower_case:
        transcription = transcription.lower()
    if do_remove_punctuation:
        transcription = normalizer(transcription).strip()

    # encode target text to label ids
    batch["labels"] = processor.tokenizer(transcription).input_ids
    return batch

max_label_length = model.config.max_length
min_input_length = 0.0
max_input_length = 30.0
def is_in_length_range(length, labels):
    return min_input_length < length < max_input_length and 0 < len(labels) < max_label_length

print('DATASET PREPARATION IN PROGRESS...')
raw_dataset = DatasetDict()
raw_dataset["train"] = load_custom_dataset('train')
raw_dataset["eval"] = load_custom_dataset('eval')

raw_dataset = raw_dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))
raw_dataset = raw_dataset.map(prepare_dataset, num_proc=num_proc)

raw_dataset = raw_dataset.filter(
    is_in_length_range,
    input_columns=["input_length", "labels"],
    num_proc=num_proc,
)

@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [{"input_features": feature["input_features"]} for feature in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        # if bos token is appended in previous tokenization step,
        # cut bos token here as it's appended later anyways
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels

        return batch

data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)
print('DATASET PREPARATION COMPLETED')

metric = evaluate.load("wer")
def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids

    # replace -100 with the pad_token_id
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id

    # we do not want to group tokens when computing the metrics
    pred_str = processor.tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = processor.tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    if do_normalize_eval:
        pred_str = [normalizer(pred) for pred in pred_str]
        label_str = [normalizer(label) for label in label_str]

    wer = 100 * metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}

############################### TRAINING ARGS AND TRAINING ############################

if train_strategy == 'epoch':
    training_args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=train_batchsize,
        gradient_accumulation_steps=1,
        learning_rate=learning_rate,
        warmup_steps=warmup,
        gradient_checkpointing=gradient_checkpointing,
        fp16=True,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=num_epochs,
        save_total_limit=10,
        per_device_eval_batch_size=eval_batchsize,
        predict_with_generate=True,
        generation_max_length=225,
        logging_steps=500,
        report_to=["tensorboard"],
        load_best_model_at_end=True,
        metric_for_best_model="wer",
        greater_is_better=False,
        optim="adamw_bnb_8bit",
        resume_from_checkpoint=resume_from_ckpt,
    )

elif train_strategy == 'steps':
    training_args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=train_batchsize,
        gradient_accumulation_steps=1,
        learning_rate=learning_rate,
        warmup_steps=warmup,
        gradient_checkpointing=gradient_checkpointing,
        fp16=True,
        evaluation_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=50,
        max_steps=num_steps,
        save_total_limit=10,
        per_device_eval_batch_size=eval_batchsize,
        predict_with_generate=True,
        generation_max_length=225,
        logging_steps=500,
        report_to=["tensorboard"],
        load_best_model_at_end=True,
        metric_for_best_model="wer",
        greater_is_better=False,
        optim="adamw_bnb_8bit",
        resume_from_checkpoint=resume_from_ckpt,
    )

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=raw_dataset["train"],
    eval_dataset=raw_dataset["eval"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)

processor.save_pretrained(output_dir)
model.save_pretrained(output_dir)
print('TRAINING IN PROGRESS...')
trainer.train()
print('DONE TRAINING')

This was the code, please help
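Not a confirmed answer, but two likely explanations worth checking: recent transformers versions save weights as model.safetensors by default, so pytorch_model.bin does not appear unless safetensors serialization is disabled, and the proj_out.weight message is usually harmless because that projection is tied to the decoder embedding and is re-created on load. A hedged sketch of forcing the legacy file name:

# after trainer.train() has finished
trainer.save_model(output_dir)                                 # default (safetensors) format
model.save_pretrained(output_dir, safe_serialization=False)    # writes pytorch_model.bin explicitly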

Fine-tuning is not progressing; it just stops, and the remaining time shows as a question mark.

ARGUMENTS OF INTEREST:
{'model_name': 'openai/whisper-large-v3', 'language': 'hungarian', 'sampling_rate': 16000, 'num_proc': 2, 'train_strategy': 'epoch', 'learning_rate': 0.003, 'warmup': 1000, 'train_batchsize': 16, 'eval_batchsize': 8, 'num_epochs': 20, 'num_steps': 100000, 'resume_from_ckpt': 'None', 'output_dir': 'models', 'train_datasets': ['/data/train'], 'eval_datasets': ['/data/dev']}

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
DATASET PREPARATION IN PROGRESS...
Loading cached shuffled indices for dataset at /tmp/tmpiriwoix_/data//train/cache-b8d5b2a7daea0d9d.arrow
Loading cached shuffled indices for dataset at /tmp/tmpiriwoix_/data/dev/cache-fedae2d849cf9c83.arrow
Map (num_proc=2): 0%| | 0/1000 [00:00<?, ? examples/s]
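A hedged guess rather than a confirmed cause: datasets .map() with num_proc > 1 can hang in some environments when run inside torchrun. One way to rule that out is to prepare and cache the dataset once in a single process before launching distributed training (paths below are hypothetical):

from datasets import load_from_disk

ds = load_from_disk("/data/train")                 # path taken from the arguments above
ds = ds.map(lambda batch: batch, num_proc=1)       # stand-in for the real prepare_dataset function
ds.save_to_disk("/data/train_prepared")            # cached copy to train from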

Error while evaluating on hf dataset

I'm getting this error while evaluating on an HF dataset:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/datasets/load.py:2554: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=<use_auth_token>' instead.
warnings.warn(
Decode Progress: 0it [00:01, ?it/s]
Traceback (most recent call last):
  File "/Users/prox/PycharmProjects/liveWhisper/testing.py", line 227, in <module>
    main(args)
  File "/Users/prox/PycharmProjects/liveWhisper/testing.py", line 102, in main
    for out in tqdm(whisper_asr(data(dataset), batch_size=args.batch_size), desc='Decode Progress'):
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                           ^^^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 42, in fetch
    return self.collate_fn(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 194, in inner
    padded[key] = _pad(items, key, _padding_value, padding_side)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 100, in _pad
    max_length = max(item[key].shape[1] for item in items)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/prox/PycharmProjects/liveWhisper/venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 100, in <genexpr>
    max_length = max(item[key].shape[1] for item in items)
                     ~~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range

missing arguments

On line 10 of data_prep.py, the .txt extensions are missing from the file names, which causes file errors. Please update it to:
scp_entries = open(f"{args.source_data_dir}/audio_paths.txt", 'r').readlines()
txt_entries = open(f"{args.source_data_dir}/text.txt", 'r').readlines()
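For anyone hitting this, a hedged sketch (not code from the repo; the fallback helper open_first is hypothetical) that accepts either the extension-less names or the .txt variants:

import os

def open_first(*candidates):
    # return a handle for the first candidate path that actually exists
    for path in candidates:
        if os.path.exists(path):
            return open(path, "r")
    raise FileNotFoundError(candidates)

scp_entries = open_first(f"{args.source_data_dir}/audio_paths",
                         f"{args.source_data_dir}/audio_paths.txt").readlines()
txt_entries = open_first(f"{args.source_data_dir}/text",
                         f"{args.source_data_dir}/text.txt").readlines()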

Create LICENSE

Please add a license to this repo. Might I suggest an MIT license?

Insufficient VRAM

While trying to fine-tune the openai/whisper-medium model on the google/fleurs dataset, even when using only one language (Greek), I very quickly run out of VRAM on a GPU with 20 GB of VRAM.

Is there some way to reduce the VRAM consumption?
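Some memory savers that are commonly combined for Whisper fine-tuning; a hedged sketch, with values chosen as illustrations rather than recommendations (adamw_bnb_8bit additionally requires the bitsandbytes package):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="op_dir",                 # hypothetical output directory
    per_device_train_batch_size=4,       # smaller micro-batch
    gradient_accumulation_steps=8,       # keeps the effective batch size at 32
    gradient_checkpointing=True,         # trades compute for activation memory
    fp16=True,                           # half-precision training
    optim="adamw_bnb_8bit",              # 8-bit optimizer states
)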

fine-tuning does not seem to improve/converge

I managed to start the fine-tuning process with my own labelled English speech data, i.e. using
fine-tune_on_custom_dataset.py
(see the output below).

However, the process does not seem to converge and eval_wer stays stuck at a very high level.
Any idea what may be going wrong?
I am using the 'standard' parameters from the example code.
Question regarding the audio files: I assume that 16 kHz WAV files (16-bit integer samples) are expected (i.e. with a WAV header,
not headerless PCM in any particular byte order), right?

Thanks for any hint!
Kind regards

{'loss': 1.2395, 'learning_rate': 0.001488, 'epoch': 0.13}
{'loss': 1.8445, 'learning_rate': 0.002988, 'epoch': 0.27}
{'loss': 1.8692, 'learning_rate': 0.002979891891891892, 'epoch': 0.4}
{'loss': 1.8025, 'learning_rate': 0.0029596621621621622, 'epoch': 0.53}
{'loss': 1.7203, 'learning_rate': 0.002939391891891892, 'epoch': 0.67}
{'loss': 1.5855, 'learning_rate': 0.0029191621621621625, 'epoch': 0.8}
{'loss': 1.5751, 'learning_rate': 0.002900716216216216, 'epoch': 0.93}
{'eval_loss': nan, 'eval_wer': 100.0, 'eval_runtime': 22.4018, 'eval_samples_per_second': 2.232, 'eval_steps_per_second': 0.312, 'epoch': 1.0}
{'loss': 2.4114, 'learning_rate': 0.002896135135135135, 'epoch': 1.07}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 1.2}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 1.33}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 1.47}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 1.6}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 1.73}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 1.87}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.0}
{'eval_loss': nan, 'eval_wer': 100.0, 'eval_runtime': 21.6604, 'eval_samples_per_second': 2.308, 'eval_steps_per_second': 0.323, 'epoch': 2.0}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.13}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.27}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.4}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.53}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.67}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.8}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 2.93}
{'eval_loss': nan, 'eval_wer': 100.0, 'eval_runtime': 21.5522, 'eval_samples_per_second': 2.32, 'eval_steps_per_second': 0.325, 'epoch': 3.0}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.07}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.2}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.33}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.47}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.6}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.73}
{'loss': 0.0, 'learning_rate': 0.002896135135135135, 'epoch': 3.87}
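A hedged observation rather than a diagnosis: a training loss that collapses to exactly 0.0 together with eval_loss = nan usually points to divergence or fp16 overflow, and the ~3e-3 peak learning rate visible in the log is very high for fine-tuning. A more conservative configuration to try (the values are assumptions):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="op_dir_epoch",           # as in the example command earlier in this page
    learning_rate=1e-5,                  # assumption: far below the 3e-3 shown in the log
    warmup_steps=500,
    per_device_train_batch_size=16,
    fp16=True,
)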
