VOICE TO GPT

Voice-to-GPT is a web application that allows users to interact with an AI assistant using voice commands. The application records users' voice input, transcribes it, and sends the transcribed text to OpenAI's GPT-4 for processing. The AI assistant responds with an answer, which is then converted back to speech and played to the user.

Functionalities

  • Voice input: Users can speak their questions or commands directly into their microphone.
  • Automatic speech recognition (ASR): The application transcribes users' voice input using Whisper ASR.
  • AI assistant: The transcribed text is sent to OpenAI's GPT-4, which processes the input and generates an appropriate response.
  • Text-to-speech (TTS): The AI assistant's response is converted back to speech and played to the user.

Please follow the instructions in the "Installation" section to set up and run the application.

Installation

Before running the application, add your OpenAI API key to your environment:

export OPENAI_API_KEY=......
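
Because the application depends on this variable, it can help to fail early with a clear message when it is missing. The following is a minimal sketch, not the project's actual startup code; the helper name `require_api_key` is hypothetical and only the variable name `OPENAI_API_KEY` comes from this README.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise if it is unset.

    `require_api_key` is a hypothetical helper used for illustration;
    it is not part of the project or of the OpenAI library.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Calling `require_api_key()` once at startup would surface a missing key immediately instead of on the first request to the API.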

Once the application is running, you only need to say aloud what you want to ask the GPT API.

To build the Docker image, run

docker build -t audio-to-gpt .

and to start the container, run

docker run -p 5001:5000 -e OPENAI_API_KEY=$OPENAI_API_KEY audio-to-gpt

After that, open your browser at

https://127.0.0.1:5001

and enjoy.

Keep in mind that response speed depends on the resources of the environment where the application runs (your own computer, Lambda, Cloud Run, etc.).

Usage

  1. Open the application in a web browser.
  2. Click the "Record" button and speak your question or command into the microphone.
  3. Click the "Stop" button when you're done speaking.
  4. The application will transcribe your speech, send the text to GPT-4, and play the AI assistant's response.
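
The flow in step 4 can be sketched as three chained stages. This is only an illustration under stated assumptions, not the project's implementation: the class `VoicePipeline` and its field names are hypothetical, and each stage is passed in as a plain callable so that the real Whisper, GPT-4, and text-to-speech calls can be plugged in without the sketch depending on any particular library signature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical container for the three stages of one voice request."""

    transcribe: Callable[[bytes], str]   # recorded audio -> text (e.g. Whisper)
    ask: Callable[[str], str]            # question text -> reply text (e.g. GPT-4)
    synthesize: Callable[[str], bytes]   # reply text -> audio to play back (TTS)

    def run(self, audio: bytes) -> bytes:
        # Stage 1: speech to text; reject recordings with no usable speech.
        text = self.transcribe(audio).strip()
        if not text:
            raise ValueError("no speech detected in the recording")
        # Stage 2: send the transcribed text to the assistant.
        reply = self.ask(text)
        # Stage 3: convert the assistant's reply back to audio.
        return self.synthesize(reply)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without a microphone or an API key.
    demo = VoicePipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        ask=lambda prompt: f"You asked: {prompt}",
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    print(demo.run(b"what time is it"))
```

Keeping the stages as injected callables also makes it straightforward to test the flow with stand-ins, as the demo at the bottom does.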

Dependencies

Flask

Flask-CORS

OpenAI

Whisper

Contributing

If you'd like to contribute to this project, please submit a pull request with your proposed changes. Be sure to provide a clear description of the changes and any relevant information.

License

This project is licensed under the MIT License. Please refer to the LICENSE file for more information.

Contributors

pabloinigo
