
Lightweight speech-to-speech web-based chat app combining speech recognition, LLM completion and text-to-speech. Implemented with Python (Flask) and vanilla JavaScript only.

License: MIT License



turk-chat conversation agent

If you're looking for the older async pygame version: https://github.com/KF-R/turk-chat-pygame


Features:

  • Ultra-lightweight; only a Python Flask server and vanilla JS.
  • Integrated ring buffer, sound activity detection algorithm and real-time animated speech visualization.
  • Automatic speech detection with termination detection; no push-to-talk or activation (listens, responds, listens... )
  • Speech is recorded, then transcribed by either the OpenAI Whisper API or a CTranslate2-based fast Whisper implementation.
  • Transcribed speech, along with full chat history, is submitted to OpenAI API for a chat response.
  • Response is filtered for numbers, years, code blocks etc. in order to provide more naturalistic TTS.
  • Filtered response is read via the ElevenLabs Text-To-Speech API or a fast local TTS engine (Balacoon; see Note below).
  • Spoken response is visualized by way of a real-time waveform animation.
  • After the spoken response is complete, listening is resumed in order to facilitate fluid on-going conversation.
  • Integrated web access tools; turk-chat can grab current headlines, read Wikipedia, summarise web pages etc.
  • Toggle between basic and advanced LLM back ends (e.g. GPT-3.5 vs GPT-4)
  • Obligatory Larson scanner using KITT and Cylon modes for a bit of additional visual feedback.
  • Simplified UI mode added (with KITT head-unit visualizer).
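The automatic speech detection described above could be sketched as follows. This is only an illustration in Python under stated assumptions, not the project's actual code: the class name `SpeechDetector`, the RMS energy threshold, and the frame counts are all hypothetical. It shows how a ring buffer can keep a short pre-roll of audio so the first syllable is not clipped, and how a run of quiet frames can mark the end of an utterance.

```python
from collections import deque
import math


def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


class SpeechDetector:
    """Energy-threshold detector with a pre-roll ring buffer (hypothetical sketch)."""

    def __init__(self, threshold=500.0, preroll_frames=5, silence_frames=8):
        self.threshold = threshold                # RMS level treated as speech (assumed value)
        self.silence_frames = silence_frames      # quiet frames that end an utterance
        self.ring = deque(maxlen=preroll_frames)  # most recent quiet frames
        self.recording = []                       # frames of the current utterance
        self.active = False
        self.quiet_run = 0

    def feed(self, frame):
        """Feed one frame; return the finished utterance (a list of frames) or None."""
        loud = rms(frame) >= self.threshold
        if not self.active:
            if loud:
                # Speech onset: prepend the pre-roll held in the ring buffer.
                self.recording = list(self.ring) + [frame]
                self.ring.clear()
                self.active = True
                self.quiet_run = 0
            else:
                self.ring.append(frame)
            return None
        self.recording.append(frame)
        self.quiet_run = 0 if loud else self.quiet_run + 1
        if self.quiet_run >= self.silence_frames:
            # Termination detected: hand back the utterance and reset.
            utterance, self.recording = self.recording, []
            self.active = False
            self.quiet_run = 0
            return utterance
        return None
```

Each call to `feed` takes one frame of samples; a non-None return value is a complete utterance ready for transcription. A fixed threshold is the simplest choice; an adaptive noise floor would likely be more robust in practice.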

Usage:

  • git clone https://github.com/KF-R/turk-chat
  • Install requirements
    • sudo apt install portaudio19-dev
    • cd turk-chat
    • pip install -r requirements.txt
  • Set up API keys (See below and/or my_env.py.example)
  • Launch turk_flask.py, which is a Python Flask application.
  • Visit localhost port 5000 in your browser (e.g. https://127.0.0.1:5000/)
  • Approve the ad-hoc SSL certificate to authorise the page.
  • Click the Start Listening button. The first time you do this, you'll be asked to grant permissions to your microphone.
  • Start talking. Be patient with the response.
  • After your chat agent has finished speaking its response, it will automatically resume listening.
  • The Voice drop-down list is populated with the voice names from your ElevenLabs voice library.
  • You can change the responding voice without affecting the on-going conversation.
  • Use the model switch to toggle between basic (e.g. GPT-3.5) and advanced (e.g. GPT-4) models.
  • To stop listening, click the Stop Listening button or refresh the page.
  • To clear/archive the chat message and engine logs, click the Reset button.
  • Archived conversations will be stored in the archive directory.
  • Code blocks generated by your chat partner will be stored in the sandbox directory.
  • Previously recorded .wav files are kept in the audio_in directory.
  • Previously generated .mp3 files are kept in the audio_out directory.
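As an illustration of how code blocks might be pulled out of a response and stored in the sandbox directory before the rest is spoken, here is a minimal Python sketch. The function name `split_code_blocks`, the file naming scheme, and the spoken placeholder text are assumptions made for this example; the project's own filtering may work differently.

```python
import re
from pathlib import Path

TICKS = "`" * 3  # Markdown code fence marker
FENCE = re.compile(TICKS + r"(\w*)\n(.*?)" + TICKS, re.DOTALL)


def split_code_blocks(response, sandbox_dir, placeholder="(code saved to the sandbox)"):
    """Save fenced code blocks to sandbox_dir; return (spoken_text, saved_paths)."""
    sandbox = Path(sandbox_dir)
    sandbox.mkdir(parents=True, exist_ok=True)
    saved = []

    def _store(match):
        # Use the fence's language tag as the file suffix (hypothetical naming).
        suffix = match.group(1) or "txt"
        path = sandbox / f"block_{len(saved) + 1}.{suffix}"
        path.write_text(match.group(2), encoding="utf-8")
        saved.append(path)
        return placeholder

    spoken = FENCE.sub(_store, response)
    return spoken, saved
```

The returned `spoken` text contains no fenced code, so it can be passed on to the TTS step, while `saved` lists the files written to the sandbox.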

API keys:

As written, the app expects a file named 'my_env.py' in your home directory, defining the API keys as follows:

API_KEY_OPENAI = '<insert_your_OpenAI_API_key_here>'
API_KEY_ELEVENLABS = '<insert_your_ElevenLabs_API_key_here>'
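One way such a file could be read from the home directory is sketched below. The helper name `load_api_keys` and the returned dictionary layout are hypothetical, and the application itself may import the file differently; only the file name and the two variable names come from the text above.

```python
import importlib.util
from pathlib import Path


def load_api_keys(env_path=None):
    """Load API_KEY_OPENAI and API_KEY_ELEVENLABS from a my_env.py file."""
    # Default location assumed from the README: ~/my_env.py
    path = Path(env_path) if env_path else Path.home() / "my_env.py"
    spec = importlib.util.spec_from_file_location("my_env", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # raises FileNotFoundError if the file is missing
    return {
        "openai": module.API_KEY_OPENAI,
        # The ElevenLabs key is treated as optional here (an assumption).
        "elevenlabs": getattr(module, "API_KEY_ELEVENLABS", None),
    }
```

Calling `load_api_keys()` with no argument looks for `my_env.py` in the current user's home directory; passing a path is convenient for testing.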

Note:

The local TTS engine is Balacoon (https://balacoon.com/freeware/tts/package), which is x64-based. If you're running on another platform, e.g. Arm, you'll need to stick with the ElevenLabs API and disable or replace 'Balacoon'.
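A platform check along these lines could guard the choice of TTS engine. This is a sketch only: the function name `pick_tts_engine` and the engine labels are hypothetical and do not come from the project's code.

```python
import platform

# Machine names commonly reported for x64 (Linux/macOS and Windows respectively).
X64_NAMES = {"x86_64", "amd64"}


def pick_tts_engine(machine=None, prefer_local=True):
    """Return 'balacoon' on x64 when local TTS is preferred, else 'elevenlabs'."""
    machine = (machine or platform.machine()).lower()
    if prefer_local and machine in X64_NAMES:
        return "balacoon"
    return "elevenlabs"
```

On an Arm machine, where `platform.machine()` typically reports a value such as `aarch64` or `arm64`, this falls back to the ElevenLabs API.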


v0.4.x
