endrence3 / openedai-whisper

This project forked from matatonic/openedai-whisper

License: GNU Affero General Public License v3.0

OpenedAI Whisper

An OpenAI API compatible speech to text server for audio transcription and translations, aka. Whisper.

Compatible with the OpenAI audio/transcriptions and audio/translations API
Does not connect to the OpenAI API and does not require an OpenAI API Key
Not affiliated with OpenAI in any way

API Compatibility:

/v1/audio/transcriptions
/v1/audio/translations

Parameter Support:

Details:

CUDA or CPU support (automatically detected)
float32, float16 or bfloat16 support (automatically detected)
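
As a rough illustration of what automatic detection could look like, the sketch below picks a device and dtype from two capability flags. The helper name `pick_device_and_dtype` and the preference order (bfloat16 over float16 on CUDA, float32 on CPU) are assumptions made for this example and are not taken from whisper.py.

```python
def pick_device_and_dtype(cuda_available: bool, bf16_supported: bool) -> tuple[str, str]:
    """Hypothetical helper: choose a torch device and dtype name from capability flags."""
    if not cuda_available:
        # Assumption: CPU inference falls back to full precision.
        return "cpu", "float32"
    # Assumption: prefer bfloat16 where the GPU supports it, otherwise float16.
    dtype = "bfloat16" if bf16_supported else "float16"
    return "cuda", dtype


# With PyTorch installed, the flags could be obtained like this:
#   cuda = torch.cuda.is_available()
#   bf16 = cuda and torch.cuda.is_bf16_supported()
print(pick_device_and_dtype(cuda_available=False, bf16_supported=False))
```

Passing `-d` or `-t` explicitly (see the options under Usage) overrides whatever the server would detect on its own.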

Tested whisper models:

openai/whisper-large-v2 (the default)
openai/whisper-large-v3
distil-whisper/distil-medium.en
openai/whisper-tiny.en
...

Version: 0.1.0, Last update: 2024-03-15

API Documentation

Usage

Installation instructions

To run on a GPU, install CUDA for your operating system first. CPU-only use does not require it.

# Install the Python requirements
pip install -r requirements.txt
# Install ffmpeg (Debian/Ubuntu shown; use your platform's package manager elsewhere)
sudo apt install ffmpeg

Usage

Usage: whisper.py [-m <model_name>] [-d <device>] [-t <dtype>] [-P <port>] [-H <host>] [--preload]


Description:
OpenedAI Whisper API Server

Options:
-h, --help            Show this help message and exit.
-m MODEL, --model MODEL
                      The model to use for transcription.
                      Ex. distil-whisper/distil-medium.en (default: openai/whisper-large-v2)
-d DEVICE, --device DEVICE
                      Set the torch device for the model. Ex. cuda:1 (default: auto)
-t DTYPE, --dtype DTYPE
                      Set the torch data type for processing (float32, float16, bfloat16) (default: auto)
-P PORT, --port PORT  Server tcp port (default: 8000)
-H HOST, --host HOST  Host to listen on, Ex. 0.0.0.0 (default: localhost)
--preload             Preload model and exit. (default: False)
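
The options above map naturally onto Python's `argparse`. The following is a minimal reconstruction from the help text only, not the project's actual source; the function name `build_parser` is hypothetical, and the integer type for `--port` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented options; a sketch, not the code in whisper.py."""
    parser = argparse.ArgumentParser(
        prog="whisper.py",
        description="OpenedAI Whisper API Server",
        # Appends "(default: ...)" to each help string, as in the usage text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-m", "--model", default="openai/whisper-large-v2",
                        help="The model to use for transcription.")
    parser.add_argument("-d", "--device", default="auto",
                        help="Set the torch device for the model. Ex. cuda:1")
    parser.add_argument("-t", "--dtype", default="auto",
                        help="Set the torch data type for processing (float32, float16, bfloat16)")
    # Assumption: the port is parsed as an integer.
    parser.add_argument("-P", "--port", type=int, default=8000, help="Server tcp port")
    parser.add_argument("-H", "--host", default="localhost",
                        help="Host to listen on, Ex. 0.0.0.0")
    parser.add_argument("--preload", action="store_true",
                        help="Preload model and exit.")
    return parser


# Example: a smaller model, listening on all interfaces, port 8001.
args = build_parser().parse_args(
    ["-m", "distil-whisper/distil-medium.en", "-H", "0.0.0.0", "-P", "8001"]
)
print(args.model, args.host, args.port, args.device, args.dtype, args.preload)
```

The equivalent command line would be `whisper.py -m distil-whisper/distil-medium.en -H 0.0.0.0 -P 8001`, leaving device and dtype on `auto`.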

Sample API Usage

You can use it like this:

curl -s http://localhost:8000/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F model="whisper-1" -F file="@audio.mp3" -F response_format=text

Or just like this:

curl -s http://localhost:8000/v1/audio/transcriptions -F model="whisper-1" -F file="@audio.mp3"
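
The same multipart request can also be sent with only the Python standard library, which may help where neither curl nor the openai package is available. This is a sketch under assumptions: the helper names `build_multipart` and `transcribe` are hypothetical, and the generic `application/octet-stream` part type is an illustrative choice rather than something the server is known to require.

```python
import io
import os
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part named "file" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(data)
    # Closing delimiter ends the multipart body.
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST an audio file to the transcriptions endpoint and return the response body."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": "whisper-1", "response_format": "text"},
            os.path.basename(path),
            f.read(),
        )
    req = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Usage (requires the server to be running):
# print(transcribe("audio.mp3"))
```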

Or like this example from the OpenAI Speech to text guide Quickstart:

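from openai import OpenAI

# The server does not require an API key; 'sk-1111' is only a placeholder for the client.
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')

# Open the file in a context manager so it is closed once the upload finishes.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcription.text)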
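
The /v1/audio/translations endpoint can be reached through the same client. A minimal sketch, assuming the server is running locally on the default port; the wrapper name `translate_to_english` is hypothetical, and the English output follows the behaviour of the OpenAI translations API that this server aims to be compatible with.

```python
def translate_to_english(path: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send an audio file to /v1/audio/translations and return the translated text."""
    # Imported lazily so this sketch can be loaded without the openai package installed.
    from openai import OpenAI

    # 'sk-1111' is a placeholder; the server does not require an OpenAI API key.
    client = OpenAI(api_key="sk-1111", base_url=base_url)
    with open(path, "rb") as audio_file:
        translation = client.audio.translations.create(model="whisper-1", file=audio_file)
    return translation.text


# Usage (requires the server to be running):
# print(translate_to_english("/path/to/file/audio.mp3"))
```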