tomchang25 / whisper-auto-transcribe

Auto-transcribe tool based on Whisper

License: MIT License

Python 95.54% Batchfile 4.46%
asr text-to-speech deep-learning speech-recognition speech-to-text language-model pytorch speech-processing voice-activity-detection gradio

whisper-auto-transcribe's Introduction


whisper-auto-transcribe


Easily generate free subtitles for your video



View Demo · Report Bug · Request Feature

About The Project

Features:

  • Automatically generates subtitles for video or audio content
  • Translates content to English
  • Supports 99 languages
  • Offers high accuracy and ease of use
  • Provides support for GPU acceleration and CLI mode

Unique features:

  • Includes a one-click installer
  • Increases timestamp precision from 1 second to 0.01 second
  • Supports YouTube integration
  • Previews subtitles in the video
  • Supports background music muting, which works even during heavy-metal live performances
  • Supports long files; 3-hour files have been tested
  • Resolves the issue of subtitle repetition
  • Supports batch processing

Planned features:

  • Subtitle editing
  • Improved translation

The tool is based on Whisper, the open-source speech recognition model developed by OpenAI.

For more details, you can check the OpenAI Whisper project.

(back to top)

How to use

Installation

  1. Install Python 3 and Git

  2. Clone the repo

    # Change the current directory to your home directory
    # You can clone into any other location except "Program Files" and "Program Files (x86)"
    cd ~
    
    # Stable version
    git clone https://github.com/tomchang25/whisper-auto-transcribe.git
    cd whisper-auto-transcribe
  3. Open webui.bat

  4. Check for any errors and confirm that the final lines of output read:

    Launching Web UI with arguments:
    Running on local URL:  http://127.0.0.1:7860
    
  5. Open your browser and go to http://127.0.0.1:7860

(Optional) Command-line interface

  1. Open enable_venv.bat.

  2. Now, you can use the CLI mode.

    # Get help messages
    python .\cli.py -h
    
    # A simple example
    python .\cli.py .\mp4\1min.mp4 --output .\tmp\123456.srt -lang ja --task translate --model large
    
    # A batch example
    python .\cli.py .\mp4 --output .\batch\ --model small --model medium

(Optional) GPU acceleration (CUDA 11.3)

  1. Install CUDA
  2. Install CUDNN
  3. Uninstall the CPU version of PyTorch
    pip uninstall torch torchvision torchaudio
  4. Reinstall the GPU version of PyTorch
    # on Windows
    python -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
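
  5. (Optional) Verify that the GPU build is active. A quick check, assuming the venv is enabled:
    # Prints True plus the GPU name if CUDA is usable
    python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no CUDA device')"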

(back to top)

Demo

Heavy Metal (Watch on YouTube)

404
0:53:33.590 --> 0:53:38.190
From the depths of hellish silence, bastard spells, explosive violence
(From the depths of hell in silence, Cast their spells, explosive violence)

405
0:53:38.670 --> 0:53:43.190
Russian minds have my protected, glorious mission undetected
(Russian night time flight perfected, Flawless vision, undetected)

406
0:53:44.190 --> 0:53:48.190
Put down in all the flames, I'm going strong, I'm half-moon's number one
(Pushing on and on, their planes are going strong, Air Force number one)

407
0:53:49.110 --> 0:53:53.030
Talking with the moon, looking for the truth, I'm moon's number one
(Somewhere down below they're looking for the foe, Bomber's on the run)

408
0:53:53.870 --> 0:53:58.190
You can hide, you can move, just to write, learn to expect, learn to think dark
(You can't hide, you can't move, just abide, Their attack's been proved (raiders in the dark))

409
0:53:59.110 --> 0:54:03.190
Silence is the night, the witch is in the fight, never miss the mark
(Silent through the night the witches join the fight. Never miss their mark)

410
0:54:04.150 --> 0:54:08.090
Canvas, wings of death, the pattern is your fate
(Canvas wings of death, Prepare to meet your fate)

411
0:54:09.190 --> 0:54:13.030
Night on the regiment, 188
(Night Bomber Regiment, 588)

412
0:54:14.190 --> 0:54:19.090
Undetected, unexpected, wings of glory, tell the story
(Undetected, unexpected, Wings of glory, Tell their story)

413
0:54:19.530 --> 0:54:24.110
Deviation, deviation, undetected, stealth, perfected
(Aviation, deviation, Undetected, Stealth perfected)

414
0:54:24.330 --> 0:54:28.150
Silence in ground, retreated to the sound, helpless in the air
(Foes are losing ground, retreating to the sound, Death is in the air)

415
0:54:29.130 --> 0:54:33.150
Suddenly appears, the world in your face, mindful, the witch is there
(Suddenly appears, confirming all your fears, Strike from witches lair)

416
0:54:33.830 --> 0:54:36.850
Let it fall, come around, I don't sound so, we're about to drown
(Target found, come around, barrels sound, From the battleground)

417
0:54:37.210 --> 0:54:41.210
Lashes, standing high, the old genie awaits, the beaten at the gates
(Rodina awaits, defeat them at the gates, Live to fight and fly)

418
0:54:41.790 --> 0:54:43.430
Just to fight and fly
()

419
0:54:44.250 --> 0:54:48.190
Canvas, wings of death, the pattern is your fate
(Canvas wings of death, Prepare to meet your fate)

420
0:54:49.270 --> 0:54:53.070
Night on the regiment, 188
(Night Bomber Regiment, 588)

421
0:54:54.190 --> 0:54:59.110
Undetected, unexpected, wings of glory, tell the story
(Undetected, unexpected, Wings of glory, Tell their story)

422
0:54:59.470 --> 0:55:04.110
Deviation, deviation, undetected, stealth, perfected
(Aviation, deviation, Undetected, Stealth perfected)

423
0:55:24.140 --> 0:55:27.410
Beneath the starlight of the heavens
(Beneath the starlight of the heavens)

424
0:55:29.200 --> 0:55:31.720
Unlikely heroes in disguise
(Unlikely heroes in the skies)

425
0:55:31.720 --> 0:55:34.040
Canvas, wings of death, the witch is gonna die
(Canvas wings of death, Prepare to meet your fate)

426
0:55:34.660 --> 0:55:37.320
Stay in fear, humble horizon
(As they appear on the horizon)

427
0:55:39.540 --> 0:55:43.460
Win when wisdom, and the night witch has come
(The wind will whisper when the Night Witches come)

428
0:55:44.460 --> 0:55:48.560
Undetected, unexpected, wings of glory, tell the story
(Undetected, unexpected, Wings of glory, Tell their story)

429
0:55:49.480 --> 0:55:53.540
Deviation, deviation, undetected, stealth, perfected
(Aviation, deviation, undetected, Stealth perfected)

430
0:55:54.340 --> 0:55:58.140
From the depths of hell in silence, lost in spells, explosive violence
(From the depths of hell in silence, Cast their spells, explosive violence)

431
0:55:59.260 --> 0:56:04.220
Russian beta, but perfected, bonus mission, undetected
(Russian night time flight perfected, Flawless vision, undetected)

0
0:00:00,0 --> 0:00:10,0
 The most popular is the Yashino Nakama, which stands on the shore of the Makurazaki City in Kagoshima Prefecture.

1
0:00:11,0 --> 0:00:22,0
 Makurazaki City used to be called the Typhoon Ginza, and the typhoon was approaching it frequently.

2
0:00:22,0 --> 0:00:27,0
 On Sunday, the Typhoon Ginza approached the Makurazaki City.

3
0:00:28,0 --> 0:00:41,0
 One of the four trees was named Yasshi on SNS, and there were many supportive comments.

4
0:00:42,0 --> 0:00:44,0
 Yasshi, do your best!

5
0:00:45,0 --> 0:00:47,0
 Yasshi, run away quickly!

6
0:00:47,0 --> 0:00:51,0
 Run away? If you have to, take off your roots and run away?

7
0:00:51,0 --> 0:01:17,0
 There are also voices asking to sell Yasshi goods.



(back to top)

Limitations

Currently, there are several restrictions on this project.

  1. GPU acceleration only works in a CUDA environment.

Also, if you want to use GPU acceleration, please make sure you have enough GPU VRAM. Here are some recommended values.

Precision | Whisper model | Required VRAM | *Time used | Performance
1         | tiny          | ~1 GB         | ~1/20      | ~Disaster
2         | base          | ~1 GB         | ~1/10      | ~YouTube
3         | small         | ~2 GB         | ~1/8       | -
4         | medium        | ~5 GB         | ~1/5       | -
5         | large         | ~10 GB        | ~1/2       | ~Sonix.ai

*Time used is relative to the video/audio duration, tested on 10 minutes of English audio with GPU acceleration.
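
To see how much VRAM your GPU actually has before choosing a model, a quick sketch using PyTorch (assuming CUDA is already set up):

    # Print the total VRAM of the first CUDA device, in GB
    python -c "import torch; p = torch.cuda.get_device_properties(0); print(round(p.total_memory / 1024**3, 1), 'GB')"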

(back to top)

Contact

Report Bugs: https://github.com/tomchang25/whisper-auto-transcribe/issues

Project Link: https://github.com/tomchang25/whisper-auto-transcribe

My Twitter: https://twitter.com/Greysuki

My Gmail: [email protected]

(back to top)

License

The code and the model weights of Whisper are released under the MIT License.

This project is distributed under the MIT License. Please refer to LICENSE.txt for more information.

(back to top)

Acknowledgments

(back to top)

whisper-auto-transcribe's People

Contributors

tomchang25

whisper-auto-transcribe's Issues

Subtitles repeating

A lot of the SRT files I created from Japanese audio have many repeating lines, with both transcribe and translate.

4
0:02:10.580 --> 0:02:13.580
and very easy

5
0:02:13.580 --> 0:02:16.580
and very easy

6
0:02:16.580 --> 0:02:19.580
and very easy

7
0:02:19.580 --> 0:02:22.580
and very easy

8
0:02:22.580 --> 0:02:25.580
and very easy

9
0:02:25.580 --> 0:02:28.580
and very easy
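
A possible post-processing workaround for the repeats shown above: a minimal sketch that collapses consecutive identical cues in a well-formed SRT file. merge_repeats and the file names are hypothetical, not part of this tool.

def merge_repeats(srt_text):
    # Each cue is "index\nstart --> end\ntext...", separated by blank lines
    cues = [c.split("\n") for c in srt_text.strip().split("\n\n")]
    merged = []
    for cue in cues:
        timing, text = cue[1], "\n".join(cue[2:])
        if merged and merged[-1][1] == text:
            # Same text as the previous cue: keep its start, extend its end
            start = merged[-1][0].split(" --> ")[0]
            end = timing.split(" --> ")[1]
            merged[-1][0] = f"{start} --> {end}"
        else:
            merged.append([timing, text])
    # Renumber the surviving cues from 1
    return "\n\n".join(f"{i}\n{t}\n{x}" for i, (t, x) in enumerate(merged, 1)) + "\n"

with open("input.srt", encoding="utf-8") as f:
    deduped = merge_repeats(f.read())
with open("input.deduped.srt", "w", encoding="utf-8") as f:
    f.write(deduped)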

Stable whisper throws an error on extremely long audio

I'm tracking this issue now, but I don't think it will be resolved in the near future.
As an alternative solution, you can either change the model or slice your file.

It appears that one hour is a dividing line.

Edit:
I have found where the problem is.
Setting the language to 'auto' may cause the model to incorrectly identify the language, resulting in errors.
In my case, audio in two different languages was incorrectly identified as Welsh:

English -> Welsh
Japanese -> Welsh

Setting the language explicitly should solve this problem.
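
For example, with the whisper Python API (a minimal sketch; the file name is a placeholder):

import whisper

model = whisper.load_model("medium")
# An explicit language skips auto-detection and avoids the Welsh misidentification
result = model.transcribe("audio.mp3", language="ja")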

0.3.2b2 - Many GB temp files left undeleted

Still not cleaning up temp files after processing is done.

For a 6.8GB MKV I still get:

  • 1 remaining 6GB MKV file in Temp\tempfreesubtitle\
  • 2 remaining 2.25GB+2.25GB WAV files in Temp\htdemucs\

The program should do the clean-up after the task is completed. The 'Temp' folder is NOT a folder that Windows will delete or clean on its own.
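
A minimal sketch of the kind of cleanup step the program could run after a task completes, assuming the leftovers live in the tempfreesubtitle and htdemucs subfolders named above:

import shutil
import tempfile
from pathlib import Path

# Remove this project's temp subfolders once the task has finished
for name in ("tempfreesubtitle", "htdemucs"):
    folder = Path(tempfile.gettempdir()) / name
    shutil.rmtree(folder, ignore_errors=True)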

What is it?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\venv\lib\site-packages\gradio\routes.py", line 399, in run_predict
output = await app.get_blocks().process_api(
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 1303, in process_api
result = await self.call_function(
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 1026, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\src\transcribe_gui.py", line 327, in handle_form_submit
subtitle_file_path = task.transcribe(
File "D:\whisper-auto-transcribe-main\new\whisper-auto-transcribe\src\utils\task.py", line 110, in transcribe
raise Exception(
Exception: Error. Vocal extracter unavailable. Received:
C:\Users\73B51\AppData\Local\Temp\6a2c5585a48a66a68c6f1a9593cb8d763b9e0aad\Не хочу-0-100.mp3, C:\Users\73B51\AppData\Local\Temp\tempfreesubtitle\main.mp3, tmp/2023-05-19 23-39-52
demucs --two-stems=vocals "C:\Users\73B5~1\AppData\Local\Temp\tempfreesubtitle\main.mp3" -o "tmp/2023-05-19 23-39-52" --filename "{stem}.{ext}"

(the app crashes with this error immediately when you go to the result page)

GPU not available

I have an RTX 3070 and followed all the installation steps for GPU support, but the GUI still doesn't let me select "GPU" as the Device:


0.3.1 changes " " spaces to "-" dashes in names

Why does version 0.3.1 now change " " spaces in names to "-" dashes?
That didn't happen before... it worked perfectly well using "C:\My Movies\Whatever with spaces\A Title With Many Spaces In The Name.mp4" without automatically converting it to "C:\My-Movies\Whatever-with-spaces\A-Title-With-Many-Spaces-In-The-Name.mp4"...

[FEATURE] Support CLI mode

Hi,

Thank you for this nice project; I was thinking of doing the same. However, my use case is rather different: I have many video files that lack subtitles, so to speed things up I thought I'd automate generating the subs with this tool. It seems to be GUI-focused, though. If possible, could you add a CLI mode, so the tool can be containerized and run in the cloud if needed?

For example

$ ./python wat.py --input file.mkv --lang jpn --audio-track a:1 --output sub.srt

It would be extremely helpful to have a CLI mode.

Thank you.

EDIT: I managed to create something that works, but keep in mind I am no Python dev at all.

#!/usr/bin/env python3
import argparse
import datetime

import torch
import whisper

parser = argparse.ArgumentParser(description='Whisper Auto Transcribe')

parser.add_argument('input', metavar='input', type=str,
                    help='Input video file')

parser.add_argument('--output', metavar='output', type=str,
                    help='SRT file output.',
                    required=True)

parser.add_argument('--language', metavar='language', type=str,
                    help='Input language code [ISO 639-1]. Default [auto].',
                    required=False, default='auto')

parser.add_argument('--task', metavar='task', type=str,
                    help='Task mode [translate, transcribe] Default [translate].',
                    required=False, default='translate')

parser.add_argument('--device', metavar='device', type=str,
                    help='Use device. [cpu, cuda] Default [cpu].',
                    required=False, default='cpu')

parser.add_argument('--model', metavar='model', type=str,
                    help='Use model. [tiny, base, small, medium, large] Default [base].',
                    required=False, default='base')


def transcribe_start(model_type, file_path, language, output_path, task="transcribe", device=None):
    model = whisper.load_model(model_type, device=device)

    print(("Task: {task} \nModel: {model_type}\nInput: {file_path} \nDevice: {device} \nLanguage: {language} \nOutput: {output_path}")
          .format(model_type=model_type, file_path=file_path, output_path=output_path, language=language, task=task, device=model.device))

    if language == 'auto':
        language = None

    result = model.transcribe(
        file_path, language=language, task=task, verbose=False)

    with open(output_path, "w", encoding="UTF-8") as f:
        for seg in result["segments"]:
            id = seg["id"]
            # Use int() for the whole seconds and a zero-padded fraction from
            # the same value, so 5.9 s becomes 0:00:05,900 rather than 0:00:06,900
            start = (
                str(datetime.timedelta(seconds=int(seg["start"])))
                + ","
                + f"{seg['start'] % 1:.3f}"[2:]
            )
            end = (
                str(datetime.timedelta(seconds=int(seg["end"])))
                + ","
                + f"{seg['end'] % 1:.3f}"[2:]
            )
            text = seg["text"]
            f.write(f"{id}\n{start} --> {end}\n{text}\n\n")

    del model.encoder
    del model.decoder

    torch.cuda.empty_cache()

    return output_path, result


if __name__ == "__main__":
    args = parser.parse_args()
    res = transcribe_start(model_type=args.model, file_path=args.input,
                           language=args.language, task=args.task, output_path=args.output, device=args.device)
    print(("{task} file is found at [{file}].\n").format(
        file=res[0], task=args.task))
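
For reference, a hypothetical invocation of the script above (file names are placeholders):

python wat.py video.mkv --output sub.srt --language ja --task transcribe --device cuda --model medium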

What are the differences between the transcribe models?

In cli.py, three model types are listed: whisper, whisper_timestams, stable_whisper. Could you elaborate on the differences between them? @tomchang25

Thanks for making this tool available. I used it to transcribe my entire audiobook library and it worked great!

Different results CUDA - CPU

Results are very different using CUDA or CPU...

This is just an example:
tyod-216-2 Subtitles - CPU & CUDA.zip

Of course, CUDA is 10x faster or more. But it seems it sometimes just misses a lot. Sometimes there are parts of the video with people clearly talking where there isn't a single subtitle line for a long time... is this due to the duration of the files?

webui.bat/install fails with Python 3.11

Problem:

There are no .whl packages of torch 1.12.1+cu113 available for Python 3.11; the newest ones target Python 3.10 at most.
This causes errors during install if Python 3.11 is the default interpreter when webui.bat creates the venv for this project.

Fix:

Make sure your Python version is (as of writing) 3.7, 3.8, 3.9, or 3.10, and that this is what the venv refers to.

How I fixed my install:

In the whisper-auto-transcribe folder:
I deleted the venv created by webui.bat to remove the reference to the Python 3.11 install.
(At this point you would find and install Python 3.10 and use its install location; I already had my old install in Program Files.)
Recreate the venv:
PS F:\whisper-auto-transcribe> & "C:\Program Files\Python310\python.exe" -m venv venv
Now the venv is using Python 3.10, and webui.bat runs as intended.

This is especially tricky for new users, as no official installers/binaries exist for older Python versions (including 3.10), which is where the current README points users.
There are some good samaritans building them (Google can help you find them), but I'm not sure it is good to just link to one here, so no PR from me.
Then there is the build-your-own-Python way, but if you can manage that, the missing .whls for torch wouldn't really be a problem for you.
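
A quick way to confirm which interpreter a venv points at, assuming the standard Windows venv layout:

# From the project folder: print the Python version the venv was created with
.\venv\Scripts\python.exe --version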

Subtitle Naming Scheme

Is it possible to have an option to produce a subtitle file with the same name as the input file (in my case a video)?

Thank you for the guide.
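
A minimal sketch of deriving the subtitle path from the input path (hypothetical, not an existing option in this tool):

from pathlib import Path

video = Path(r"C:\My Movies\episode 01.mp4")
# Same folder and stem as the input, with an .srt extension
srt_path = video.with_suffix(".srt")   # C:\My Movies\episode 01.srt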

webui doesn't work

When using transcribe or translate, the browser reports an error, and this is the output from the console:

Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Traceback (most recent call last):
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\src\utils\task.py", line 108, in transcribe
subprocess.run(cmd, check=True)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\venv\lib\site-packages\gradio\routes.py", line 399, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 1303, in process_api
result = await self.call_function(
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 1026, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\src\transcribe_gui.py", line 327, in handle_form_submit
subtitle_file_path = task.transcribe(
File "C:\Users\user\Downloads\autotranscribe\whisper-auto-transcribe\src\utils\task.py", line 110, in transcribe
raise Exception(
Exception: Error. Vocal extracter unavailable. Received:
C:\Users\user\AppData\Local\Temp\bebf2e05e25135efa50c8a746b06c2875e007655\IMG_5503.MP4, C:\Users\user\AppData\Local\Temp\tempfreesubtitle\main.MP4, C:\Users\user\AppData\Local\Temp
demucs --two-stems=vocals "C:\Users\user\AppData\Local\Temp\tempfreesubtitle\main.MP4" -o "C:\Users\user\AppData\Local\Temp" --filename "{stem}.{ext}"

Error when running

Hello. Getting this error after installing.

Log:

venv "F:\whisper-auto-transcribe\venv\Scripts\Python.exe"
Python 3.9.4 (tags/v3.9.4:1f2e308, Apr 6 2021, 13:40:21) [MSC v.1928 64 bit (AMD64)]
Commit hash: 0f023b4
Check torch and torchvision
Check gradio
Installing requirements for Web UI
Launching Web UI with arguments:
Traceback (most recent call last):
File "F:\whisper-auto-transcribe\launch.py", line 244, in <module>
start_webui()
File "F:\whisper-auto-transcribe\launch.py", line 238, in start_webui
from gui import gui
File "F:\whisper-auto-transcribe\gui.py", line 1, in <module>
from src import transcribe_gui
File "F:\whisper-auto-transcribe\src\transcribe_gui.py", line 7, in <module>
from src.utils import task
File "F:\whisper-auto-transcribe\src\utils\task.py", line 8, in <module>
import stable_whisper
File "F:\whisper-auto-transcribe\venv\lib\site-packages\stable_whisper\__init__.py", line 1, in <module>
from .whisper_word_level import *
File "F:\whisper-auto-transcribe\venv\lib\site-packages\stable_whisper\whisper_word_level.py", line 8, in <module>
import whisper
File "F:\whisper-auto-transcribe\venv\lib\site-packages\whisper\__init__.py", line 13, in <module>
from .model import ModelDimensions, Whisper
File "F:\whisper-auto-transcribe\venv\lib\site-packages\whisper\model.py", line 13, in <module>
from .transcribe import transcribe as transcribe_function
File "F:\whisper-auto-transcribe\venv\lib\site-packages\whisper\transcribe.py", line 20, in <module>
from .timing import add_word_timestamps
File "F:\whisper-auto-transcribe\venv\lib\site-packages\whisper\timing.py", line 7, in <module>
import numba
File "F:\whisper-auto-transcribe\venv\lib\site-packages\numba\__init__.py", line 42, in <module>
from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
File "F:\whisper-auto-transcribe\venv\lib\site-packages\numba\np\ufunc\__init__.py", line 3, in <module>
from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
File "F:\whisper-auto-transcribe\venv\lib\site-packages\numba\np\ufunc\decorators.py", line 3, in <module>
from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception
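
This particular SystemError from numba's _internal module is a known symptom of a numba release that doesn't support the installed NumPy version (commonly NumPy 1.24 with an older numba). Assuming that mismatch is the cause here, a hedged workaround inside the project venv:

# Either pin NumPy below 1.24...
pip install "numpy<1.24"
# ...or upgrade numba to a release built against your NumPy
pip install --upgrade numba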

Error when running on Google Colab

When using it on Google Colab (T4 GPU) through the command-line interface, I get an error related to the vocal extracter. Specifically:

Running Command:
!python /content/whisper-auto-transcribe/cli.py '/content/Files/Youtube.mp4' --output '/content/tmp/Youtube.srt' -lang ja --model large

Error Message:

FileNotFoundError: [Errno 2] No such file or directory: 'demucs --two-stems=vocals "/tmp/tempfreesubtitle/main.mp4" -o "/tmp" --filename "{stem}.{ext}"'

Exception: Error. Vocal extracter unavailable. Received: 
/content/Files/Youtube.mp4, /tmp/tempfreesubtitle/main.mp4, /tmp
demucs --two-stems=vocals "/tmp/tempfreesubtitle/main.mp4" -o "/tmp" --filename "{stem}.{ext}"

Output:

Traceback (most recent call last):
  File "/content/whisper-auto-transcribe/src/utils/task.py", line 108, in transcribe
    subprocess.run(cmd, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'demucs --two-stems=vocals "/tmp/tempfreesubtitle/main.mp4" -o "/tmp" --filename "{stem}.{ext}"'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/whisper-auto-transcribe/cli.py", line 139, in <module>
    cli()
  File "/content/whisper-auto-transcribe/cli.py", line 121, in cli
    subtitle_path = transcribe(
  File "/content/whisper-auto-transcribe/src/utils/task.py", line 110, in transcribe
    raise Exception(
Exception: Error. Vocal extracter unavailable. Received: 
/content/Files/Youtube.mp4, /tmp/tempfreesubtitle/main.mp4, /tmp
demucs --two-stems=vocals "/tmp/tempfreesubtitle/main.mp4" -o "/tmp" --filename "{stem}.{ext}"

Solution I have tried:
I set vocal_extracter to False on line 27 in src/utils/task.py, and it works properly with no issues. Source: (#47 (comment)).
But I would like a better solution that runs without disabling the vocal extracter.
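
The FileNotFoundError above hints at the likely cause: the entire demucs command line appears to be passed to subprocess.run as a single string, which POSIX treats as one (nonexistent) executable name. A minimal sketch of the fix, assuming task.py builds cmd as a string:

import shlex
import subprocess

cmd = 'demucs --two-stems=vocals "/tmp/tempfreesubtitle/main.mp4" -o "/tmp" --filename "{stem}.{ext}"'
# Split the string into an argv list so the OS can locate the "demucs" executable
subprocess.run(shlex.split(cmd), check=True)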

Can't get webui.bat running successfully

venv "D:\whisper-auto-transcribe\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 06bd1ea
Check torch and torchvision
Fetching updates for Custom Gradio...
Checking out commint for Custom Gradio with hash: 8ac54ca7902a04ede2cb93a11ff42c6cb011b296...
Traceback (most recent call last):
File "D:\whisper-auto-transcribe\launch.py", line 198, in
git_clone(
File "D:\whisper-auto-transcribe\launch.py", line 169, in git_clone
run(
File "D:\whisper-auto-transcribe\launch.py", line 63, in run
raise RuntimeError(message)
RuntimeError: Couldn't checkout commit 8ac54ca7902a04ede2cb93a11ff42c6cb011b296 for Custom Gradio.
Command: git -C repositories\gradio checkout 8ac54ca7902a04ede2cb93a11ff42c6cb011b296
Error code: 128
stdout:
stderr: fatal: reference is not a tree: 8ac54ca7902a04ede2cb93a11ff42c6cb011b296

Error during processing (ValueError: Expected parameter logits)

Hi, I was running a translation and it crashed partway through.
The command I ran was:
python C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\cli.py "D:\j\X.mp4" --output "D:\j\X.srt" -lang ja --task translate --model-size large --device cuda
The only change I've made to the repo is setting vocal_extracter=False in task.py, because it didn't start otherwise.
Stacktrace:
43%|██████████████████████████████▋ | 2698.92/6231.83 [04:35<06:00, 9.79sec/s]
Traceback (most recent call last):
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\cli.py", line 139, in <module>
cli()
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\cli.py", line 121, in cli
subtitle_path = transcribe(
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\src\utils\task.py", line 156, in transcribe
result = used_model.transcribe(
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\stable_whisper\whisper_word_level.py", line 453, in transcribe_stable
result: DecodingResult = decode_with_fallback(mel_segment, ts_token_mask=ts_token_mask)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\stable_whisper\whisper_word_level.py", line 337, in decode_with_fallback
decode_result, audio_features = model.decode(seg,
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\stable_whisper\decode.py", line 112, in decode_stable
result = task.run(mel)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\whisper\decoding.py", line 729, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\stable_whisper\decode.py", line 61, in _main_loop
tokens, completed = self.decoder.update(tokens, logits, sum_logprobs)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\whisper\decoding.py", line 276, in update
next_tokens = Categorical(logits=logits / self.temperature).sample()
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\torch\distributions\categorical.py", line 64, in __init__
super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
File "C:\Users\san-a\Downloads\tools\whisper-auto-transcribe-0.3.2b2\venv\lib\site-packages\torch\distributions\distribution.py", line 55, in __init__
raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

Nan errors on a lot of Japanese video

Hi, first of all, nice project! I can transcribe quite a few videos, but more often than not they fail with the following exception (I'm using the CLI, so the input is consistent). I couldn't figure out why, though. Any suggestions?

An error occurred during transcription: Expected parameter logits (Tensor of shape (1, 51865)) of distribution Categorical(logits: torch.Size([1, 51865])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')

unexpected keyword argument 'caption'

Getting error:
cuda:0
Detected language: English
100%|███████████████████████████████████████████████████████████████████| 69968/69968 [02:45<00:00, 422.93frames/s]
tmp/Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab.srt tmp/Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab.vtt tmp/Updated | Near-Automated Voice Cloning | Whisper STT + Coqui TTS | Fine Tune a VITS Model on Colab.ass
Traceback (most recent call last):
File "C:\Users\CHP_7575\Documents\whisper-auto-transcribe\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\CHP_7575\Documents\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "C:\Users\CHP_7575\Documents\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 947, in postprocess_data
prediction_value = postprocess_update_dict(
File "C:\Users\CHP_7575\Documents\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 371, in postprocess_update_dict
update_dict = block.get_specific_update(update_dict)
File "C:\Users\CHP_7575\Documents\whisper-auto-transcribe\venv\lib\site-packages\gradio\blocks.py", line 257, in get_specific_update
specific_update = cls.update(**generic_update)
TypeError: Video.update() got an unexpected keyword argument 'caption'

I tried running inside the venv and without it, and tried different gradio versions as well. It does not work for me on .mp3 files or when using a YouTube video.
