Giter Site home page Giter Site logo

sirivisiongpt's Introduction

SiriVisionGPT

Talk with ChatGPT using your voice and camera.

As an extension to SiriGPT, I created SiriVisionGPT. With SiriVisionGPT, you can not only talk to chatGPT with your voice, but also using your camera, making use of YOLO-NAS object recognition.

SiriVisionGPT uses pytorch mps to speed up the yolo object detection, but can also run on CPU. This will limit the framerate of yolo however.

Usage

To use siriVisionGPT.py, you first need to install all requirements, use the following command.

python3 -m pip install -r requirements.txt

Next, you need an OpenAI api key. Add this key to your .env file, and you can start the client using:

python3 siriVisionGPT.py.py

or:

python3 siriVisionGPT.py.py -mh <max_history> -c <num_channels> -d <recording_duration>

With parameters:

-h, --help            show this help message and exit
-mh, --history        Maximum number of messages GPT will save and use as history of the conversation. 
                      More hisotry means more tokens used. Default is 10.
-c, --channels        Number of channels to record. Default is 2.
-d, --duration        Number of seconds to record. Default is 6.
-yd, --yoloduration   Number of seconds to record video. Default is 9.
-yc, --camera         Index of the camera yolo will use as input. Default is 0.

The max history parameter is the amount of messages ChatGPT uses as reference history. All the tokens of these messages will need to parsed, so more history means a more expensive chat.

Vision mode

After the word 'show' has been spoken, the camera will automatically be activated in the next prompt. A comma seperated string of the detected objects will then be used as the a prompt. For example:

(You) Im going to show you items of food, and you need to tell me if they are gluten free.

(SiriVisionGPT) Okay!

(You) Shows items on camera

(SiriVisionGPT) Answers

Barcode mode

After the words 'show' and 'barcode' have been spoken, barcode mode is activated. You can now show a barcode on the camera and if this barcode is present in the openfoodfacts database, information about the product is sent to GPT to use in the answer to your question. For example:

(You) Im going to show you a barcode. Can I eat this if I am allergic to peanuts?

(SiriVisionGPT) Okay!

(You) Shows barcode on camera

(SiriVisionGPT) Answers

Demo

WIP

sirivisiongpt's People

Contributors

deboradum avatar

Stargazers

 avatar 昱彤 avatar  avatar SAP Sentinel avatar  avatar Thomsooo avatar  avatar xf4n avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.