xiepengli / faster-whisper-server

This project forked from fedirz/faster-whisper-server

Home Page: https://hub.docker.com/r/fedirz/faster-whisper-server

License: MIT License

Shell 0.38% Python 97.07% Nix 2.54%

Faster Whisper Server

faster-whisper-server is an OpenAI API-compatible transcription server that uses faster-whisper as its backend. Features:

  • GPU and CPU support.
  • Easily deployable using Docker.
  • Configurable through environment variables (see config.py).
  • OpenAI API compatible.

Please create an issue if you find a bug, have a question, or want to suggest a feature.

OpenAI API Compatibility ++

See OpenAI API reference for more information.

  • Audio file transcription via POST /v1/audio/transcriptions endpoint.
    • Unlike OpenAI's API, faster-whisper-server also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and would rather receive the transcription in chunks as they are processed than wait for the whole file to be transcribed. It works similarly to how chat messages are streamed when chatting with LLMs.
  • Audio file translation via POST /v1/audio/translations endpoint.
  • (WIP) Live audio transcription via WS /v1/audio/transcriptions endpoint.
    • LocalAgreement2 (paper | original implementation) algorithm is used for live transcription.
    • Only single-channel, 16000 Hz sample rate, raw, 16-bit little-endian audio is supported.
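
The audio format required by the live endpoint can be checked before any data is sent. The following is a minimal sketch, assuming the audio starts out as a PCM WAV file; the function name `read_raw_pcm` and the constants are hypothetical and not part of faster-whisper-server. It uses only Python's standard `wave` module and returns the raw frames with the WAV header stripped.

```python
import wave

# Format the live endpoint expects, per the list above.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def read_raw_pcm(path):
    """Return the raw PCM frames of a WAV file, or raise ValueError on a format mismatch."""
    with wave.open(path, "rb") as wav:
        actual = (wav.getnchannels(), wav.getframerate(), wav.getsampwidth())
        expected = (EXPECTED_CHANNELS, EXPECTED_SAMPLE_RATE, EXPECTED_SAMPLE_WIDTH)
        if actual != expected:
            raise ValueError(
                f"expected (channels, rate, width)={expected}, got {actual}"
            )
        # 16-bit PCM stored in a WAV container is already little-endian,
        # so the frames can be passed on unchanged.
        return wav.readframes(wav.getnframes())
```

A file that fails the check would need to be resampled or downmixed first, for example with the ffmpeg flags `-ac 1 -ar 16000 -f s16le` used in the live transcription command in this README.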

Quick Start

Using Docker

docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cuda
# or
docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu

Using Docker Compose

curl -sO https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
docker compose up --detach faster-whisper-server-cuda
# or
docker compose up --detach faster-whisper-server-cpu

Usage

OpenAI API CLI

export OPENAI_API_KEY="cant-be-empty"
export OPENAI_BASE_URL=http://localhost:8000/v1/
openai api audio.transcriptions.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format text

openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format verbose_json

OpenAI API Python SDK

from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

# Open the file in a context manager so it is closed after the request
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-large-v3", file=audio_file
    )
print(transcript.text)
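
The SDK call can also pass the spoken language, which the CURL section of this README recommends for faster transcription. The following is a small sketch; the helper name `transcribe` and the `MODEL` constant are hypothetical, while `language` is an existing parameter of `client.audio.transcriptions.create` in the OpenAI Python SDK.

```python
MODEL = "Systran/faster-distil-whisper-large-v3"


def transcribe(client, path, language=None):
    """Transcribe the audio file at `path` and return the text.

    `client` is expected to be an `openai.OpenAI` instance pointed at the
    server, e.g. OpenAI(api_key="cant-be-empty",
    base_url="http://localhost:8000/v1/").
    """
    # Only send `language` when the caller knows it; otherwise the server
    # is left to detect it.
    extra = {"language": language} if language else {}
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=MODEL, file=audio_file, **extra
        )
    return result.text
```

With a client configured as in the example above, `transcribe(client, "audio.wav", language="en")` would return the transcript as a string.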

CURL

# If `model` isn't specified, the default model is used
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "stream=true"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "model=Systran/faster-distil-whisper-large-v3"
# It's recommended that you always specify the language, as that will reduce the transcription time
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "language=en"

curl http://localhost:8000/v1/audio/translations -F "file=@audio.wav"
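
The `stream=true` request can also be made from Python. The sketch below assumes the third-party `requests` package and assumes that the server frames each chunk as a server-sent event (a line starting with `data:`); that framing is not stated in this README, so check it against your server's actual output. The names `sse_payloads` and `stream_transcription` are hypothetical.

```python
def sse_payloads(lines):
    """Yield the payload of each `data:` line, skipping every other line."""
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def stream_transcription(path, base_url="http://localhost:8000/v1"):
    """Yield transcription chunks as the server produces them."""
    # Third-party dependency, imported here so that the parsing helper
    # above stays usable without it.
    import requests

    with open(path, "rb") as audio_file:
        with requests.post(
            f"{base_url}/audio/transcriptions",
            files={"file": audio_file},
            data={"stream": "true"},
            stream=True,  # do not buffer the whole response body
        ) as response:
            response.raise_for_status()
            # Without an explicit encoding, iter_lines() yields bytes.
            response.encoding = response.encoding or "utf-8"
            yield from sse_payloads(response.iter_lines(decode_unicode=True))
```

Calling `for chunk in stream_transcription("audio.wav"): print(chunk)` would then print each chunk as soon as it arrives rather than after the whole file is processed.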

Live Transcription (using WebSocket)

From live-audio example

websocat must be installed. The following command transcribes live audio from a microphone:

ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
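
The same pipeline can be approximated in Python. This is a rough sketch, assuming the third-party `websockets` package and audio that is already in the raw format described earlier (single channel, 16000 Hz, 16-bit little-endian). The names `pcm_chunks` and `stream_pcm` are hypothetical, and the shape and timing of the messages the server sends back are assumptions, since this README does not describe them.

```python
import asyncio

SAMPLE_RATE = 16000   # samples per second
BYTES_PER_SAMPLE = 2  # 16-bit audio


def pcm_chunks(pcm, chunk_ms=100):
    """Split raw PCM bytes into chunks of `chunk_ms` milliseconds each."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


async def stream_pcm(pcm, url="ws://localhost:8000/v1/audio/transcriptions",
                     chunk_ms=100):
    """Send PCM audio at roughly real-time pace and print server messages."""
    import websockets  # third-party dependency

    async with websockets.connect(url) as ws:

        async def receive():
            # Print whatever the server sends back, as it arrives.
            async for message in ws:
                print(message)

        receiver = asyncio.create_task(receive())
        for chunk in pcm_chunks(pcm, chunk_ms):
            await ws.send(chunk)  # bytes are sent as a binary frame
            await asyncio.sleep(chunk_ms / 1000)
        # Give the server a moment to return trailing results.
        await asyncio.sleep(2)
        receiver.cancel()
```

Running `asyncio.run(stream_pcm(raw_bytes))` against a local server would stream the audio in 100 ms chunks; the chunk size is an arbitrary choice for illustration.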

Contributors

fedirz
