This project is forked from matatonic/openedai-whisper

License: GNU Affero General Public License v3.0

OpenedAI Whisper

An OpenAI API-compatible speech-to-text server for audio transcription and translation, a.k.a. Whisper.

  • Compatible with the OpenAI audio/transcriptions and audio/translations API
  • Does not connect to the OpenAI API and does not require an OpenAI API Key
  • Not affiliated with OpenAI in any way

API Compatibility:

  • /v1/audio/transcriptions
  • /v1/audio/translations

Parameter Support:

  • file
  • model (only whisper-1 exists, so this is ignored)
  • language
  • prompt (not yet supported)
  • temperature
  • response_format:
    • json
    • text
    • srt
    • vtt
    • verbose_json (partial support, some fields missing)
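
The parameters above map directly onto multipart form fields. The following is a minimal sketch, assuming a hypothetical helper named `build_form_fields` (it is not part of this project), that validates `response_format` against the listed values and leaves out the unsupported `prompt`:

```python
# Hypothetical helper, not part of openedai-whisper: builds the non-file
# form fields for a transcription or translation request.
SUPPORTED_FORMATS = ("json", "text", "srt", "vtt", "verbose_json")


def build_form_fields(response_format="json", language=None, temperature=None):
    """Return a dict of form fields to send alongside the `file` field."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # `model` is ignored by the server, but OpenAI-style clients send it anyway.
    fields = {"model": "whisper-1", "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        # Form fields are transmitted as strings.
        fields["temperature"] = str(temperature)
    return fields
```

For example, `build_form_fields("srt", language="en")` yields the fields for an English SRT subtitle request; the audio itself still goes in the separate `file` field.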

Details:

  • CUDA or CPU support (automatically detected)
  • float32, float16 or bfloat16 support (automatically detected)
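
How the server performs this detection is not described here. As an illustration only, one plausible policy is sketched below; the function name `pick_device_and_dtype` and the preference order (bfloat16, then float16 on CUDA, float32 on CPU) are assumptions, not the project's actual logic:

```python
# Illustrative only: the project's real detection logic may differ.
def pick_device_and_dtype(cuda_available, bf16_supported):
    """Map hardware capabilities to a (device, dtype) pair of names."""
    if not cuda_available:
        # Assumed fallback: full precision on CPU.
        return "cpu", "float32"
    # Assumed preference: bfloat16 where the GPU supports it, else float16.
    return "cuda", ("bfloat16" if bf16_supported else "float16")
```

With PyTorch installed, the two flags could come from `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`. Either choice can be overridden explicitly with the `-d` and `-t` options.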

Tested whisper models:

  • openai/whisper-large-v2 (the default)
  • openai/whisper-large-v3
  • distil-whisper/distil-medium.en
  • openai/whisper-tiny.en
  • ...

Version: 0.1.0, Last update: 2024-03-15

API Documentation

Usage

Installation instructions

You will need to install CUDA for your operating system if you want to use CUDA.

# Install the Python requirements
pip install -r requirements.txt
# install ffmpeg
sudo apt install ffmpeg

Usage

Usage: whisper.py [-m <model_name>] [-d <device>] [-t <dtype>] [-P <port>] [-H <host>] [--preload]


Description:
OpenedAI Whisper API Server

Options:
-h, --help            Show this help message and exit.
-m MODEL, --model MODEL
                      The model to use for transcription.
                      Ex. distil-whisper/distil-medium.en (default: openai/whisper-large-v2)
-d DEVICE, --device DEVICE
                      Set the torch device for the model. Ex. cuda:1 (default: auto)
-t DTYPE, --dtype DTYPE
                      Set the torch data type for processing (float32, float16, bfloat16) (default: auto)
-P PORT, --port PORT  Server tcp port (default: 8000)
-H HOST, --host HOST  Host to listen on, Ex. 0.0.0.0 (default: localhost)
--preload             Preload model and exit. (default: False)
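
For readers who want to wrap or script the server, the help text above can be reproduced with `argparse` roughly as follows. This is a sketch reconstructed from the help output, not the project's source; the function name `build_parser` is hypothetical:

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented whisper.py options."""
    parser = argparse.ArgumentParser(
        prog="whisper.py",
        description="OpenedAI Whisper API Server",
        # Appends "(default: ...)" to each option's help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-m", "--model", default="openai/whisper-large-v2",
                        help="The model to use for transcription.")
    parser.add_argument("-d", "--device", default="auto",
                        help="Set the torch device for the model. Ex. cuda:1")
    parser.add_argument("-t", "--dtype", default="auto",
                        help="Set the torch data type for processing")
    parser.add_argument("-P", "--port", type=int, default=8000,
                        help="Server tcp port")
    parser.add_argument("-H", "--host", default="localhost",
                        help="Host to listen on, Ex. 0.0.0.0")
    parser.add_argument("--preload", action="store_true",
                        help="Preload model and exit.")
    return parser
```

Calling `build_parser().parse_args(["-m", "distil-whisper/distil-medium.en", "-d", "cuda:1"])` gives a namespace with the smaller model on the second GPU and every other option at its documented default.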

Sample API Usage

You can use it like this:

curl -s http://localhost:8000/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F model="whisper-1" -F file="@audio.mp3" -F response_format=text

Or just like this:

curl -s http://localhost:8000/v1/audio/transcriptions -F model="whisper-1" -F file="@audio.mp3"

Or like this example from the OpenAI Speech to text guide Quickstart:

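from openai import OpenAI

# No real key is needed; the client only requires a non-empty value.
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')

# Open the file in a context manager so the handle is closed after the upload.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcription.text)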
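
If the `openai` package is not available, the same request can be made with the Python standard library alone. The sketch below is one way to do it: the helper names `encode_multipart` and `transcribe` are hypothetical, and it assumes the server is running at the default `http://localhost:8000` and returns the default `json` response format with a `text` field, as the OpenAI API does:

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, filename, file_bytes):
    """Encode form fields plus one `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000/v1"):
    """POST an audio file to the transcription endpoint and return the text."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            {"model": "whisper-1"}, os.path.basename(path), f.read()
        )
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the server running, `print(transcribe("audio.mp3"))` should behave like the second `curl` command above.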

Docker support

You can run the server via Docker like so:

docker compose build
docker compose up

Options can be set via whisper.env.
