sonbonghee / stream-translator-gpt

This project forked from ionic-bond/stream-translator-gpt


A stream-translator fork with VAD based audio slicing & ChatGPT translation.

License: MIT License



stream-translator

Command-line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and OpenAI's Whisper for transcription/translation.

This fork optimizes the VAD-based audio-slicing logic and adds OpenAI's GPT API to support translation into languages other than English.

Sample: Open In Colab

Prerequisites

  1. Install ffmpeg and add it to your PATH.
  2. Install CUDA on your system. If you installed a CUDA version other than 11.3, change cu113 in requirements.txt accordingly. You can check the installed CUDA version with nvcc --version.

Setup

  1. Set up a virtual environment.
  2. git clone https://github.com/ionic-bond/stream-translator-gpt
  3. pip install -r requirements.txt
  4. Make sure that PyTorch is installed with CUDA support. Whisper will probably not run in real time on a CPU.
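Assuming a Unix-like shell, the steps above might look like the following (the virtual-environment name and the final PyTorch check are illustrative additions, not part of the project's instructions):

```shell
# Create and activate a virtual environment (the name "venv" is arbitrary).
python -m venv venv
source venv/bin/activate

# Clone the repository and install its dependencies.
git clone https://github.com/ionic-bond/stream-translator-gpt
cd stream-translator-gpt
pip install -r requirements.txt

# Check that PyTorch sees the GPU; prints True when CUDA support is available.
python -c "import torch; print(torch.cuda.is_available())"
```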

Command-line usage

python translator.py URL --flags

By default, the URL can be of the form twitch.tv/forsen and yt-dlp is used to obtain the .m3u8 link which is passed to ffmpeg.
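For example, a hypothetical invocation (the stream URL and flag values are illustrative; see the flag table below for defaults):

```shell
# Transcribe an English-language Twitch stream with the small Whisper model.
python translator.py twitch.tv/forsen --model small --task transcribe --language en
```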

| Flag | Default | Description |
| --- | --- | --- |
| --format | wa* | Stream format code; passed directly to yt-dlp. |
| --cookies | | Used to open member-only streams; passed directly to yt-dlp. |
| --frame_duration | 0.1 | Length, in seconds, of each unit of live-stream data processed. |
| --continuous_no_speech_threshold | 0.8 | Slice when there is no speech for this many continuous seconds. |
| --min_audio_length | 3.0 | Minimum slice audio length in seconds. |
| --max_audio_length | 30.0 | Maximum slice audio length in seconds. |
| --prefix_retention_length | 0.8 | Length of the prefix audio retained during slicing. |
| --vad_threshold | 0.5 | Voice activity detection threshold; a frame whose speech probability exceeds this value is treated as speech. |
| --model | small | Model size. See here for available models. |
| --task | translate | Whether to transcribe the audio (keep the original language) or translate it to English. |
| --language | auto | Language spoken in the stream. See here for available languages. |
| --history_buffer_size | 0 | Number of previous audio/text segments used to condition the model. Set to 0 to use only the audio from the last processing step. Note that this can easily lead to repetition/loops if the chosen language/model settings do not produce good results to begin with. |
| --beam_size | 5 | Number of beams in beam search. Set to 0 to use a greedy algorithm instead (faster but less accurate). |
| --best_of | 5 | Number of candidates when sampling with non-zero temperature. |
| --direct_url | | Pass the URL directly to ffmpeg instead of using yt-dlp to obtain the stream URL. |
| --use_faster_whisper | | Use the faster_whisper implementation instead of the original OpenAI implementation. |
| --use_whisper_api | | Use the OpenAI Whisper API instead of the local Whisper model. |
| --whisper_filters | emoji_filter | Filters applied to Whisper results, separated by ",". |
| --output_timestamps | | Prepend a timestamp to each line of output text. |
| --openai_api_key | | OpenAI API key; required when using GPT translation or the Whisper API. |
| --gpt_translation_prompt | | If set, translate the result text to the target language via the ChatGPT API. Example: "Translate from Japanese to Chinese". |
| --gpt_translation_history_size | 0 | Number of previous messages sent with each GPT API call. If the history size is 0, the GPT API is called in parallel; if it is greater than 0, it is called serially. |
| --gpt_model | gpt-3.5-turbo | GPT model name: gpt-3.5-turbo or gpt-4. |
| --gpt_translation_timeout | 15 | If a ChatGPT translation takes longer than this many seconds, it is discarded. |
| --cqhttp_url | | If set, send the result text to the cqhttp server. |
| --cqhttp_token | | cqhttp access token; may be omitted if no token is set on the server side. |
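To make the slicing flags concrete, here is a minimal Python sketch of VAD-based slicing driven by those parameters. This is not the project's actual code: `speech_probs` stands in for per-frame VAD output, and prefix retention (`--prefix_retention_length`) is omitted for brevity.

```python
# Illustrative sketch (not the project's code) of VAD-based audio slicing.
# A slice is emitted when silence has lasted continuous_no_speech_threshold
# seconds (and the buffer is at least min_audio_length), or when the buffer
# reaches max_audio_length.

def slice_stream(frames, speech_probs,
                 frame_duration=0.1,
                 continuous_no_speech_threshold=0.8,
                 min_audio_length=3.0,
                 max_audio_length=30.0,
                 vad_threshold=0.5):
    """frames: audio frames; speech_probs: per-frame VAD speech probabilities."""
    slices, buf, silent_frames = [], [], 0
    for frame, prob in zip(frames, speech_probs):
        buf.append(frame)
        # Reset the silence counter on speech, extend it on silence.
        silent_frames = 0 if prob > vad_threshold else silent_frames + 1
        buf_seconds = len(buf) * frame_duration
        long_silence = (silent_frames * frame_duration >= continuous_no_speech_threshold
                        and buf_seconds >= min_audio_length)
        if long_silence or buf_seconds >= max_audio_length:
            slices.append(buf)
            buf, silent_frames = [], 0
    if buf:
        slices.append(buf)  # flush the trailing partial slice
    return slices
```

With the defaults above, 5 seconds of speech followed by a pause is cut at the silence gap, while uninterrupted speech is force-cut every 30 seconds.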

Using faster-whisper

faster-whisper provides significant performance gains over the original OpenAI implementation (roughly 4x faster with about half the memory). To use it, install cuDNN into your CUDA directory, then run the CLI with --use_faster_whisper.
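Returning to the GPT flags above, here is a minimal Python sketch of how --gpt_translation_history_size could switch between parallel and serial API calls. This is not the project's code; `call_gpt` is a hypothetical stand-in for the real ChatGPT request.

```python
# Illustrative sketch (not the project's code) of parallel vs. serial
# GPT translation depending on the history size.
from concurrent.futures import ThreadPoolExecutor

def call_gpt(text, history):
    # Hypothetical stand-in; a real version would send `history` as prior
    # messages in the chat-completion request.
    return f"translated({text})"

def translate_all(texts, history_size=0):
    if history_size == 0:
        # No shared history: requests are independent, so run them in parallel.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda t: call_gpt(t, []), texts))
    # With history, each call depends on earlier results, so run serially.
    results, history = [], []
    for text in texts:
        out = call_gpt(text, history)
        results.append(out)
        history = (history + [(text, out)])[-history_size:]
    return results
```

This illustrates the trade-off the flag controls: history improves translation consistency but forces serial calls, while a history size of 0 allows lower-latency parallel requests.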

Contributors

ionic-bond, fortypercnt
