
No outputs · whisper-ctranslate2 · 46 comments · CLOSED

softcatala commented on May 14, 2024
No outputs

from whisper-ctranslate2.

Comments (46)

dgoryeo commented on May 14, 2024

Hi @Zacharie-Jacob, I tried an additional test on a 4-minute clip with the same command line:

whisper-ctranslate2 $_ --model medium --language 'Japanese' --vad_filter True --device cuda --compute_type int8 --output_format srt --output_dir $directory --task translate --word_timestamps True --verbose True > $directory\$filename.md

Here are the results:

  • 4-minute video mp4: No output.
  • 4-minute audio wav: No output.
  • 4-minute audio wav, 16 kHz mono: Works! The srt was generated.

I repeated the same test with --device cpu, and it worked well on all three inputs above.

tariq0101 commented on May 14, 2024

I think it's exactly the same problem.
On GPU, the software finishes transcribing but no output files are created; you can still copy the results from the terminal.
On CPU, the software finishes transcribing and creates the output files.
This is probably a bug in faster-whisper if you can't find any problems in your code.

dgoryeo commented on May 14, 2024

I can confirm that I have the same problem. No output file is created. My command line is (in PowerShell):

whisper-ctranslate2 $_ --model medium --language 'Japanese' --vad_filter True --device cuda --compute_type int8 --output_format srt --output_dir $directory --task translate --word_timestamps True --verbose True > $directory\$filename.md

runw99 commented on May 14, 2024

In my environment, I can trigger the bug almost reliably. The transcription prints completely on the command line, but nothing is written to the current directory, and Windows reports "python has stopped working". The problem is probably a dictionary-referencing and memory-reclamation issue. My temporary workaround is to move the writer call from whisper_ctranslate2.py into transcribe.py. Although it damages the code structure, what matters to me right now is that it works.

# \Anaconda\envs\Lib\site-packages\src\whisper_ctranslate2\whisper_ctranslate2.py
def main():
    ...
    for audio_path in audio:
        result = Transcribe().inference(
            ...
            output_format, 
            output_dir,
            audio_path,
        )
        # writer = get_writer(output_format, output_dir)
        # writer(result, audio_path)
# \Anaconda\envs\Lib\site-packages\src\whisper_ctranslate2\transcribe.py
class Transcribe:
    ...
    def inference(
        ...
        output_format, 
        output_dir,
        audio_path,
    ):
        ...
        
        result = dict(
            text=all_text,
            segments=list_segments,
            language=language_name,
        )

        from .writers import get_writer
        writer = get_writer(output_format, output_dir)
        writer(result, audio_path)

        # return result

The detailed process of my debugging

Environment

OS: Windows 10
python: 3.9.16150.1013
GPU: GTX1660ti (mobile)
IDE: VS code

package:
numpy==1.23.3
faster-whisper==0.4.1
ctranslate2==3.11.0
tqdm==4.65.0
sounddevice==0.4.6

Trigger the bug

  1. Audio file: 5m.mp3. about 100 segments.

  2. Model: guillaumekln/faster-whisper-tiny or guillaumekln/faster-whisper-large-v2

  3. In cmd or PowerShell: whisper-ctranslate2 ".\5m.mp3" --language Japanese --model_directory "..\model\faster-whisper-tiny"

  4. It prints the results on the screen correctly. After that, "python has stopped working" appears and no output files are created.

Set the breakpoint

# whisper_ctranslate2\whisper_ctranslate2.py
for audio_path in audio:
    result = Transcribe().inference(...) 
    print(result) # some operation. Setting breakpoint here and moving the mouse on result will trigger `python has stopped working`

Error analysis (unconfirmed)

  1. Small audio files work well, but large files fail.

  2. openai/whisper works well for me. The difference from openai/whisper is that in whisper_ctranslate2, def transcribe(...) has been changed to:

class Transcribe:
    ...
    def inference(...):
        list_segments = []
        last_pos = 0
        accumated_inc = 0
        all_text = ""
        ...
        return dict(
            text=all_text,
            segments=list_segments,
            language=language_name,
        )

My guess is that, because list_segments is a local variable of Transcribe.inference, after result = Transcribe().inference(...) returns, the memory-reclamation mechanism frees the memory that result["segments"] points to.

list_segments = [
    { },
    ...
]

Some failed attempts

ucrtbase.dll

In Windows Event Viewer, we can see that the crash seems to be related to ucrtbase.dll. However, I searched online and found nothing related, and updating the DLL didn't help either.

Writers

  1. Replacing the main content of whisper_ctranslate2/writers.py with openai/whisper's utils.py (with modifications): useless.

  2. Placing the content of writers.py directly in whisper_ctranslate2/whisper_ctranslate2.py: also useless.

emcodem commented on May 14, 2024

No luck getting any kind of output, using a 16 kHz wav that I use for testing Const-me Whisper and whisper.cpp; the expected result is a 10-minute translation.

C:\Users\emcod>whisper-ctranslate2 c:\temp\test.wav --model medium
There are old cache files at `C:\Users\emcod\.cache\whisper-ctranslate2` which are no longer used. Consider deleting them
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Downloading (…)56e98277/config.json: 100%|█████████████████████████████████████████| 2.26k/2.26k [00:00<00:00, 752kB/s]
C:\python3100\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\emcod\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading (…)98277/vocabulary.txt: 100%|██████████████████████████████████████████| 460k/460k [00:00<00:00, 2.17MB/s]
Downloading (…)98277/tokenizer.json: 100%|████████████████████████████████████████| 2.20M/2.20M [00:01<00:00, 2.18MB/s]
Downloading model.bin: 100%|██████████████████████████████████████████████████████| 1.53G/1.53G [03:02<00:00, 8.39MB/s]

C:\Users\emcod>

Another try with default params:

C:\Users\emcod>whisper-ctranslate2 c:\temp\test.wav --language de
There are old cache files at `C:\Users\emcod\.cache\whisper-ctranslate2` which are no longer used. Consider deleting them
Detected language 'German' with probability 1.000000

Then I follow the instructions and delete the "old cache files" at C:\Users\emcod\.cache\whisper-ctranslate2 (I delete the whole .cache folder):


C:\Users\emcod>whisper-ctranslate2 c:\temp\test.wav --language de
Downloading (…)e94b4c8a/config.json: 100%|████████████████████████████████████████| 2.37k/2.37k [00:00<00:00, 1.19MB/s]
C:\python3100\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\emcod\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading (…)b4c8a/vocabulary.txt: 100%|██████████████████████████████████████████| 460k/460k [00:00<00:00, 1.44MB/s]
Downloading (…)b4c8a/tokenizer.json: 100%|█████████████████████████████████████████| 2.20M/2.20M [00:07<00:00, 308kB/s]
Downloading model.bin: 100%|████████████████████████████████████████████████████████| 484M/484M [00:57<00:00, 8.35MB/s]
Detected language 'German' with probability 1.000000███████████████████████████████| 2.20M/2.20M [00:07<00:00, 309kB/s]

C:\Users\emcod>

Try to enable debug logging:

C:\Users\emcod>whisper-ctranslate2 --verbose true c:\temp\test.wav
whisper-ctranslate2: error: argument --verbose: invalid str2bool value: 'true'

same with

C:\Users\emcod>whisper-ctranslate2 --verbose 1 c:\temp\test.wav
whisper-ctranslate2: error: argument --verbose: invalid str2bool value: '1'

Now, reading some Python docs, I see that "true" is often written as "True":

C:\Users\emcod>whisper-ctranslate2 --verbose True c:\temp\test.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>

OK, try some other stuff:

C:\Users\emcod>whisper-ctranslate2 --verbose True c:\temp\test.wav --compute_type int8
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>

jordimas commented on May 14, 2024

Version 0.2.6 should fix this.

avc1657 commented on May 14, 2024
  1. Please make sure that you are using version 0.16. If not, please update to this version.

I'm on 0.17, checked by running whisper-ctranslate2 --version.

  2. While the tool is running, can you see anything on the terminal? You should see the transcription as it is produced.

Yes, the transcription appears in the terminal.

  3. Can you try just "whisper-ctranslate2 [the video file] --model large-v2". Does it work?

OK, I've just tried this and noticed something. By the way, I decided to run it on a 2-minute FLAC file to speed things up. I ran the program using "whisper-ctranslate2 [the audio file] --model tiny": it didn't work. Then I ran it with large-v2 and, to my surprise, it worked. I tried large-v2 again and it worked again. Then I went back to tiny and it stopped working. Then I tried base: it doesn't work. Finally, I tried large-v2 again and it worked. But previously even large-v2 was not working.

tariq0101 commented on May 14, 2024

Hi, I found that this only happens on GPU; it produces output when I add "--device CPU".
I'm sorry I can't provide anything, because it only happens on my personal videos.
Videos with clear, professional audio don't loop and do produce output, so it's probably an issue with Whisper itself and not your software.
Is there a way to produce log files?

rsmith02ct commented on May 14, 2024

Hi Jordimas, I am having similar issues.

The first is that nothing gets output unless output type and location are set (though perhaps that is by design?)

The second is that unless I add "--device CPU", no data is returned; I just go back to the command prompt. This is true for a short clear wav, a longer mp4, English and Japanese.

I have an RTX 2080 Super with the current studio driver (531.61). I am able to use basic Whisper installations with CUDA, as well as Const-me, etc. Is there something I need to set up here or in the NVIDIA Control Panel?

For test video we can use the same one I shared before.

whisper-ctranslate2.exe --language ja --model "large-v2" --device CPU --output_dir "C:\Users\rsmit\Dropbox\Videos" --output_format "srt" "C:\Users\rsmit\Dropbox\Videos\10 MPantry final new titles 2.mp4"
This works, and actually very well in terms of quality! No issues at all.

Change to CUDA and it fails.

whisper-ctranslate2.exe --language ja --model "large-v2" --device CUDA --output_dir "C:\Users\rsmit\Dropbox\Videos" --output_format "srt" "C:\Users\rsmit\Dropbox\Videos\10 MPantry final new titles 2.mp4"

(Screenshot attached: 2023-04-16 13:30:15)

Base model, etc. also fail.

NVIDIA Control Panel reports I have NVIDIA CUDA 12.1.107 driver. It has a compute capability of 7.5.
I also installed the standalone cuda_12.1.0_531.14_windows.exe

jordimas commented on May 14, 2024

The first is that nothing gets output unless output type and location are set (though perhaps that is by design?)

No, this is not by design. By design, it outputs all formats and writes to the current directory.

@rsmith02ct Could you please create a separate ticket for this issue? It's different from this one. Thanks.

Zacharie-Jacob commented on May 14, 2024

I had this same problem. I was unable to pinpoint it specifically to whisper-ctranslate2, but the problem is exactly the same as yours. It displays the translation, there are no errors, and no output files are written.

It does write out if I choose a very small file (like a minute or two long), but longer files just mysteriously do not have any outputs.

I do not know enough about the code itself to know if it makes sense that longer files would not produce outputs but shorter files will.

tariq0101 commented on May 14, 2024

I can confirm that the 16 kHz mono conversion works, but a lot of information is lost and the output is very different from the CPU output on the original file.

Qel0droma commented on May 14, 2024

I have the same problem. I can see all the text in PowerShell as it transcribes and translates; then, when it's done, nothing. No srt files are generated. whisper-ctranslate2 "file name here.mp4" --device cuda --device_index 0 --vad_filter true --vad_min_speech_duration_ms 50 --vad_min_silence_duration_ms 2000 --vad_max_speech_duration_s 10 --condition_on_previous_text False --language Japanese --task translate --output_format srt --model large-v2

Qel0droma commented on May 14, 2024

@Qel0droma Are you using a GPU?

yes

guillaumekln commented on May 14, 2024

Hi,

I think it's the same issue as SYSTRAN/faster-whisper#71 which I can now reproduce on Windows.

When the output files are missing, you can verify that the process crashed with a non-zero exit code:

PS > $LASTEXITCODE
-1073740791

The process crashes when the model is unloaded but only when the transcription triggered the temperature fallback. If you disable the temperature fallback it should work without issue. Try adding this option on the command line:

--temperature_increment_on_fallback None

The crash seems to happen only on Windows.

@jordimas In the meantime, you could slightly change the code to ensure the WhisperModel instance is still alive when writing the results on disk.
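For reference, the temperature fallback convention behind --temperature_increment_on_fallback can be sketched in a few lines (a rough stdlib sketch of the Whisper CLI convention, not the project's exact code; build_temperature_schedule is a hypothetical name):

```python
def build_temperature_schedule(temperature=0.0, increment=0.2):
    # Sketch of the Whisper CLI convention: with an increment, decoding
    # retries at progressively higher temperatures up to 1.0; passing
    # None disables the fallback retries entirely.
    if increment is None:
        return [temperature]
    steps = int(round((1.0 - temperature) / increment))
    return [round(temperature + i * increment, 10) for i in range(steps + 1)]

print(build_temperature_schedule())                # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(build_temperature_schedule(increment=None))  # [0.0]
```

With the increment set to None there is only a single decoding attempt, so the fallback path that triggers the crash described above never runs.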

coder543 commented on May 14, 2024

Even using --temperature_increment_on_fallback None, I am getting zero output (even on the console) if I use the GPU on Windows. I am using a 3090, and I did install the various dependencies as far as I can tell. It would be nice if we got an error message of some kind.

guillaumekln commented on May 14, 2024

You could load the model once and then use the same model instance to transcribe each file. This should work around the issue and also be more efficient than reloading the model each time.
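This suggestion can be sketched with a stand-in model class (DummyModel and transcribe_all are hypothetical names used for illustration; a real implementation would use faster_whisper.WhisperModel):

```python
class DummyModel:
    # Stand-in for faster_whisper.WhisperModel, used only to show the
    # shape of the fix: one model instance shared across all files.
    def transcribe(self, audio_path):
        return {"text": f"transcript of {audio_path}", "segments": [], "language": "en"}

def transcribe_all(audio_paths):
    model = DummyModel()      # load the model once, not once per file
    results = {}
    for audio_path in audio_paths:
        # The model stays alive while each result is produced and written out.
        results[audio_path] = model.transcribe(audio_path)
    del model                 # unload only after all files are done
    return results

print(sorted(transcribe_all(["a.wav", "b.wav"])))  # ['a.wav', 'b.wav']
```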

umiyuki commented on May 14, 2024

I followed guillaumekln's tip and modified the code:
move the WhisperModel creation to the main function of whisper_ctranslate2.py instead of the inference function, and pass the model to the inference function. You also need to add
from faster_whisper import WhisperModel
to whisper_ctranslate2.py.

guillaumekln commented on May 14, 2024

Hi, this change does not fix the issue according to user reports in SYSTRAN/faster-whisper#71. I have a hard time debugging this issue as I don't typically develop on Windows.

For now I suggest that you update the code to keep the model alive until all transcriptions are complete.

worldjoe commented on May 14, 2024

I loaded 0.2.7 and, sure enough, it fixed the problem for me. I had been forced to use --device cpu for a while now, which is significantly slower than CUDA with my 3080. Thank you.

jordimas commented on May 14, 2024

Thanks for reporting this

Could you please do the following:

  1. Please make sure that you are using version 0.16. If not, please update to this version.

  2. While the tool is running, can you see anything on the terminal? You should see the transcription as it is produced.

  3. Can you try just "whisper-ctranslate2 [the video file] --model large-v2". Does it work?

Thanks

jordimas commented on May 14, 2024

Hello. I'm unable to reproduce this problem on my Windows machine.

My only question is whether you have tried doing inference on CPU vs. GPU, and whether this makes any difference.

Thanks

tariq0101 commented on May 14, 2024

Hi, I have the same problem: the transcription appears on the screen until the end of the audio, but no files are produced.
The model is only using 50% of VRAM, so it's definitely not running out of memory.
I'm also on Windows, Python 3.9.
This only happens on some files; smaller or "clearer" files work fine. I think it's looping at the end or something like that.
"--vad_filter True" doesn't seem to make any difference.

jordimas commented on May 14, 2024

Do you have any file that you can share so I can try to reproduce it? Thanks

Zacharie-Jacob commented on May 14, 2024

I can confirm that I have the same problem. No output file is created. My command line is (in powershell):

whisper-ctranslate2 $_ --model medium --language 'Japanese' --vad_filter True --device cuda --compute_type int8 --output_format srt --output_dir $directory --task translate --word_timestamps True --verbose True > $directory\$filename.md

Could you try running this on a clip that is only one or two minutes, and see if it works? That seems like it works for me, which may help narrow down a cause if that is a reproducible pattern.

rsmith02ct commented on May 14, 2024

Hmm, here I don't see any text in the cmd terminal window when CUDA is enabled (and there's no text output). When set to CPU, it works fine on every file I've given it, in English and Japanese. I'm using an NVIDIA RTX 2080 Super with the current studio driver and the CUDA SDK also installed (Windows 11).

jordimas commented on May 14, 2024

Thanks for investing time on this @runw99

Regarding memory: Python uses reference counting, so it should delete a variable when it goes out of scope.

Here is an article that explains how memory works in Python:

https://rushter.com/blog/python-garbage-collector/

You can actually check an object's reference count by doing:

import sys
print(sys.getrefcount(foo))

I have no idea why this happens, but I do not believe it is due to the variable going out of scope (being recycled).
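To see what the refcount check shows, here is a toy version of an inference-style result (a minimal sketch, not the project's code), confirming that a returned dict keeps its segments list alive after the function exits:

```python
import sys

def build_result():
    # Toy stand-in for Transcribe.inference: a local list returned inside a dict.
    list_segments = [{"text": "hello"}]
    return dict(text="hello", segments=list_segments)

result = build_result()
# The dict still references the list, so it was not reclaimed when the
# function's local variable went out of scope.
print(result["segments"][0]["text"])             # hello
# getrefcount reports at least 2: the dict's reference plus the temporary
# reference created by passing the object to getrefcount itself.
print(sys.getrefcount(result["segments"]) >= 2)  # True
```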

runw99 commented on May 14, 2024

Thanks for your reply. The article you mentioned helped me review garbage collection in Python, and I learned something new.
I went back and tried some copy.deepcopy(list_segments) operations, but they still didn't fix the bug. So perhaps garbage collection really isn't the cause.

I have never encountered such a bug before, and I am curious about its cause and solution. Looking forward to the follow-up.

Thank you again for the patient answers; this project really saves me a lot of effort when running a big model.

nikes commented on May 14, 2024

I ran 355 files, ranging in length from 10 to 120 minutes.
In the output I got 150 (×5) files with text.
So I confirm that there is definitely a problem.
The original Whisper project works correctly, so it's strange...

Purfview commented on May 14, 2024

@rsmith02ct reported that my standalone build doesn't have this bug. (It doesn't use the CLI from this repo.)

I can confirm that 16khz mono conversion works, but a lot of the information are lost and the output is very different than CPU on original file.

Faster-whisper converts to the same audio format using the PyAV library; OpenAI uses ffmpeg.
Strangely, transcription quality and timestamp accuracy suffer significantly on audio converted by ffmpeg.exe. I have no idea why this happens, and I'm too lazy to investigate...

dgoryeo commented on May 14, 2024

I second @rsmith02ct; I too have noticed that when I convert audio with Audacity, the results are better than with ffmpeg.

zx3777 commented on May 14, 2024

Same problem. Version 1.0 could produce outputs, but it would frequently miss large stretches of dialogue.

jordimas commented on May 14, 2024

@Qel0droma Are you using a GPU?

emcodem commented on May 14, 2024

Win 11:

C:\Users\emcod>whisper-ctranslate2 --verbose True --temperature_increment_on_fallback None c:\temp\1234.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>echo %errorlevel%
-1073740791

C:\Users\emcod>whisper-ctranslate2 --verbose True c:\temp\test.wav --compute_type int8
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>whisper-ctranslate2 --verbose True --temperature_increment_on_fallback None c:\temp\1234.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>echo %errorlevel%
-1073740791

C:\Users\emcod>whisper-ctranslate2 --temperature_increment_on_fallback None c:\temp\1234.wav
Detecting language using up to the first 30 seconds. Use `--language` to specify the language

C:\Users\emcod>whisper-ctranslate2 --temperature_increment_on_fallback None  --language de c:\temp\1234.wav
Detected language 'German' with probability 1.000000

C:\Users\emcod>whisper-ctranslate2 --temperature_increment_on_fallback None  --language de c:\temp\test.wav
Detected language 'German' with probability 1.000000

C:\Users\emcod>echo %errorlevel%
-1073740791

Going to try on another OS tomorrow.

Zacharie-Jacob commented on May 14, 2024

Thank you, this fixes my problem. Yes, I am on Windows.

Unfortunately, that setting was particularly useful, as it prevents the translation from falling into ruts. I will have to make do with a combination of other settings for now.

Zacharie-Jacob commented on May 14, 2024

Even using --temperature_increment_on_fallback None, I am getting zero output (even on the console) if I use the GPU on Windows. I am using a 3090, and I did install the various dependencies as far as I can tell. It would be nice if we got an error message of some kind.

It looks like it is linked to the general use of temperature, perhaps? I was under the impression that you could have no temperature increment while still using temperature and best_of, but it seems I get intermittently missing outputs if I use any temperature settings at all, other than setting the fallback to None.

jordimas commented on May 14, 2024

@jordimas In the meantime, you could slightly change the code to ensure the WhisperModel instance is still alive when writing the results on disk.

Thanks a lot for looking into this issue. I was trying to gather more evidence before reporting it as a CTranslate2 issue, but it's great that you are looking at this.

Based on the feedback in this thread, and the fact that I don't even have a Windows box with CUDA to test on, I don't know whether it's worth doing a fix in whisper-ctranslate2 or just waiting for the issue to be fixed in ctranslate2.

jpenney commented on May 14, 2024

Just to test, I made a local change to ensure the model is not unloaded until after the outputs are written. This sort of works, in that if it was going to crash, the files are written out before it crashes; but if you passed multiple files to be processed, it still crashes when the model is unloaded. So:

PS X:\to-process> whisper-ctranslate2 --model large-v2 --task translate --vad_filter True --language ja --output_format all --patience 2.0 -o translate-out file1.wav file2.wav file3.wav

Assuming the crash currently occurs with file2.wav: before the change, only file1.wav's outputs were written; now file2.wav's outputs are written before the crash, but file3.wav still isn't processed.

diff --git a/src/whisper_ctranslate2/transcribe.py b/src/whisper_ctranslate2/transcribe.py
index ca53fac..c422037 100644
--- a/src/whisper_ctranslate2/transcribe.py
+++ b/src/whisper_ctranslate2/transcribe.py
@@ -187,7 +187,7 @@ class Transcribe:
                 last_pos = segment.end
                 pbar.update(increment)

-        return dict(
+        return model, dict(
             text=all_text,
             segments=list_segments,
             language=language_name,
diff --git a/src/whisper_ctranslate2/whisper_ctranslate2.py b/src/whisper_ctranslate2/whisper_ctranslate2.py
index 1ff8335..58862a8 100644
--- a/src/whisper_ctranslate2/whisper_ctranslate2.py
+++ b/src/whisper_ctranslate2/whisper_ctranslate2.py
@@ -514,7 +514,7 @@ def main():
         return

     for audio_path in audio:
-        result = Transcribe().inference(
+        model, result = Transcribe().inference(
             audio_path,
             model_dir,
             cache_directory,
@@ -531,6 +531,7 @@ def main():
         )
         writer = get_writer(output_format, output_dir)
         writer(result, audio_path, writer_args)
+        model = None

     if verbose:
         print(f"Transcription results written to '{output_dir}' directory")

So it's not that helpful to try to work around this from whisper-ctranslate2. Hopefully it can be resolved upstream.

Zacharie-Jacob commented on May 14, 2024

Is there a good workaround for this? Not having access to temperature at all results in substantially worse model output.

jordimas commented on May 14, 2024

Hello @guillaumekln. Do you have a timeline for releasing OpenNMT/CTranslate2#1201? If it's going to take more than a week, I can release a version that changes the structure of the code (though my preference is to get this fixed upstream).

Thanks,

Jordi

jordimas commented on May 14, 2024

I will merge https://github.com/Softcatala/whisper-ctranslate2/pull/44/files in the next few hours. This should fix the issue. Feedback is welcome, since I don't have a Windows box handy either. Thanks

iGerman00 commented on May 14, 2024

Version 0.2.6 should fix this.

I'm currently having the same issue on 0.2.7.

C:\Users\igerm\Desktop\whisper〉whisper-ctranslate2 --model large-v2 --language English -f all --verbose True audio.wav
Detected language 'English' with probability 1.000000

And then it exits. CPU works.

emcodem commented on May 14, 2024

Detected language 'English' with probability 1.000000
IMHO that message should be fixed; it didn't actually "detect" anything, because the user disabled automatic detection by specifying the language.

@iGerman00 See if this works for you: https://github.com/Purfview/whisper-standalone-win
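A guard along these lines would suppress the misleading message (language_message is a hypothetical helper for illustration, not the project's actual code):

```python
def language_message(cli_language, detected_language, probability):
    # Only claim a detection when detection actually ran, i.e. when the
    # user did not pass --language on the command line.
    if cli_language is not None:
        return f"Using specified language '{cli_language}'"
    return f"Detected language '{detected_language}' with probability {probability:.6f}"

print(language_message("de", None, None))     # Using specified language 'de'
print(language_message(None, "German", 1.0))  # Detected language 'German' with probability 1.000000
```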

eric-gitta-moore commented on May 14, 2024

I also have a similar problem, but in my case there is no output at all, and the return code is not 0.

(whisper) PS D:\BaiduNetdiskDownload> pip list 
Package             Version
------------------- ----------
av                  10.0.0
certifi             2023.11.17
cffi                1.16.0
charset-normalizer  3.3.2
colorama            0.4.6
coloredlogs         15.0.1
ctranslate2         3.23.0
faster-whisper      0.10.0
filelock            3.13.1
flatbuffers         23.5.26
fsspec              2023.12.2
huggingface-hub     0.19.4
humanfriendly       10.0
idna                3.6
mpmath              1.3.0
numpy               1.26.2
onnxruntime         1.16.3
packaging           23.2
pip                 23.3.1
protobuf            4.25.1
pycparser           2.21
pyreadline3         3.4.1
PyYAML              6.0.1
requests            2.31.0
setuptools          68.2.2
sounddevice         0.4.6
sympy               1.12
tokenizers          0.15.0
tqdm                4.66.1
typing_extensions   4.9.0
urllib3             2.1.0
wheel               0.41.2
whisper-ctranslate2 0.3.4
(whisper) PS D:\BaiduNetdiskDownload> whisper-ctranslate2.exe aaa.mp4 --model small --language zh --verbose True                                                                                                                    
stream 0, timescale not set
Detected language 'Chinese' with probability 1.000000
(whisper) PS D:\BaiduNetdiskDownload> 

ysshin commented on May 14, 2024

Does this problem still exist? I am seeing it, so I think it does...

zx3777 commented on May 14, 2024
