
DIY-Astra

DIY-Astra is a Flask application that uses computer vision and natural language processing to create an interactive AI assistant. It captures a live video feed from a webcam, analyzes the captured images with the Google AI API, and generates text responses based on that visual input. The responses are then converted to audio with the ElevenLabs API and played back to the user.
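
The flow described above amounts to a repeated capture, analyze, speak cycle. The sketch below is a hypothetical outline, not the code in app.py: the class name AstraPipeline and its stage callables are illustrative assumptions, and the real app would plug in OpenCV, the Google AI client and the ElevenLabs API where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AstraPipeline:
    """One capture -> analyze -> speak cycle, with each stage injected.

    Hypothetical structure: in the real app these callables would wrap
    OpenCV, the Google AI client and the ElevenLabs API. Here they are
    plain functions so the control flow can be read on its own.
    """
    capture_frame: Callable[[], bytes]         # e.g. a JPEG-encoded webcam frame
    describe_image: Callable[[bytes], str]     # image bytes -> generated text
    synthesize_speech: Callable[[str], bytes]  # text -> audio bytes
    play_audio: Callable[[bytes], None]        # plays the audio locally
    emit_text: Callable[[str], None]           # pushes the text to the web UI

    def step(self) -> str:
        frame = self.capture_frame()
        text = self.describe_image(frame)
        # Show the text first so the UI updates before audio playback starts.
        self.emit_text(text)
        self.play_audio(self.synthesize_speech(text))
        return text


if __name__ == "__main__":
    shown, played = [], []
    pipeline = AstraPipeline(
        capture_frame=lambda: b"fake-jpeg-bytes",
        describe_image=lambda frame: f"I see {len(frame)} bytes of image data.",
        synthesize_speech=lambda text: text.encode("utf-8"),
        play_audio=played.append,
        emit_text=shown.append,
    )
    print(pipeline.step())
```

Injecting each stage keeps the loop independent of any one vendor SDK, so a stage can be swapped or stubbed without touching the others.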

Features

  • Live video feed capture from the webcam
  • Image analysis using the Google AI API
  • Text generation based on visual input
  • Text-to-speech conversion using the ElevenLabs API
  • Real-time audio playback of generated responses
  • Web-based user interface for interaction and control
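
The text-to-speech feature comes down to one HTTP request per response. As a hedged sketch, assuming the ElevenLabs v1 REST endpoint POST /v1/text-to-speech/{voice_id} with the key sent in an xi-api-key header, the request could be built with the standard library as below. The function names and the voice ID are illustrative placeholders, and the project itself lists Requests rather than urllib as its HTTP client; check the current ElevenLabs API reference before relying on the exact path or body fields.

```python
import json
import urllib.request

# Assumption: ElevenLabs v1 text-to-speech endpoint; verify against the current API docs.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, api_key: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech POST request."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,  # ElevenLabs reads the API key from this header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 audio in the response body
        },
        method="POST",
    )


def synthesize_speech(text: str, api_key: str, voice_id: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw audio bytes (requires network access)."""
    request = build_tts_request(text, api_key, voice_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The returned MP3 bytes are what a library such as Pydub could then decode for real-time playback.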

Requirements

To run DIY-Astra, you need the following dependencies installed:

  • Python 3.x
  • Flask
  • Flask-SocketIO
  • OpenCV (cv2)
  • Pydub
  • Google Generative AI Client Library
  • Pillow (PIL)
  • Requests
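
Before running the app, it can help to confirm that every package above is importable. The following is a small standalone check, not part of the repository. It relies on the usual import names for these libraries, which differ from some pip package names: cv2 for OpenCV, PIL for Pillow, flask_socketio for Flask-SocketIO and google.generativeai for the Google Generative AI client.

```python
import importlib.util

# Import names (not pip names) for the dependencies listed above.
REQUIRED_MODULES = (
    "flask",
    "flask_socketio",
    "cv2",
    "pydub",
    "google.generativeai",
    "PIL",
    "requests",
)


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises if the parent of a dotted name (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```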

You also need valid API keys for the following services:

  • Google AI API (GOOGLE_API_KEY)
  • ElevenLabs API (ELEVENLABS_API_KEY)
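
The installation steps below hard-code both keys in app.py. An alternative, shown here as a hypothetical helper rather than the project's own method, is to read GOOGLE_API_KEY and ELEVENLABS_API_KEY from environment variables and fail fast when either is missing.

```python
import os

# Hypothetical helper: reads the keys from the environment instead of editing app.py.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "ELEVENLABS_API_KEY")


def load_api_keys(env=None):
    """Return the required API keys, raising if any is unset or blank."""
    env = os.environ if env is None else env
    keys = {name: env.get(name, "").strip() for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError(f"Missing API key(s): {', '.join(missing)}")
    return keys
```

With this approach the keys would be exported in the shell before running python app.py, which keeps them out of version control.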

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/diy-astra.git
  2. Navigate to the project directory:

    cd diy-astra
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Set up the API keys:

    • Replace GOOGLE_API_KEY in app.py with your Google AI API key.
    • Replace ELEVENLABS_API_KEY in app.py with your ElevenLabs API key.
  5. Run the application:

    python app.py
  6. Open your web browser and navigate to http://localhost:5001 to access the DIY-Astra interface.

Usage

  1. Make sure your webcam is connected and accessible.
  2. Launch the DIY-Astra application by running python app.py.
  3. The interface is served at http://localhost:5001; if it does not open in your default web browser automatically, navigate there manually.
  4. The live video feed from your webcam will be displayed in the interface.
  5. DIY-Astra will continuously capture images, analyze them using the Google AI API, and generate text responses based on the visual input.
  6. The generated text responses will be displayed in the text container below the video feed.
  7. The text responses will also be converted to audio using the ElevenLabs API and played back in real-time.
  8. To pause DIY-Astra, click the "Stop" button in the interface; click "Resume" to continue.
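
The continuous loop in step 5 and the Stop/Resume buttons in step 8 can be modeled with a threading.Event that the worker checks before each cycle. This is a hedged sketch: the CaptureLoop class and its method names are hypothetical, and in the actual app the buttons presumably reach the server through Flask-SocketIO events whose names are not documented here.

```python
import threading
from typing import Callable, Optional


class CaptureLoop:
    """Hypothetical pause/resume wrapper around one capture/analyze/speak step.

    UI handlers for "Stop" and "Resume" would call stop() and resume();
    a background worker thread calls run_forever().
    """

    def __init__(self, step: Callable[[], str]):
        self._step = step
        self._running = threading.Event()
        self._running.set()  # analysis starts enabled

    def stop(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    @property
    def is_running(self) -> bool:
        return self._running.is_set()

    def run_once(self) -> Optional[str]:
        """Run one step, or return None without doing work while stopped."""
        if not self._running.is_set():
            return None
        return self._step()

    def run_forever(self) -> None:
        """Worker loop meant for a daemon thread; it sleeps in wait() while stopped."""
        while True:
            self._running.wait()
            self.run_once()
```

A worker could be started with threading.Thread(target=loop.run_forever, daemon=True).start(); blocking on the event avoids busy-waiting while the assistant is paused.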

File Structure

  • app.py: The main Flask application file containing the server-side logic.
  • templates/index.html: The HTML template for the user interface.
  • static/css/styles.css: The CSS stylesheet for styling the user interface.
  • static/js/script.js: The JavaScript file for client-side interactions and socket communication.
  • requirements.txt: The list of required Python dependencies.

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Contributors

  • doriandarko
