Giter Site home page Giter Site logo

dalle-picture-frame's Introduction

DALL-E Picture Frame

A DALL-E 2 powered voice activated picture frame.

This was originally written to run on a Raspberry PI with a USB speakerphone attached. I used a Raspberry PI 4 running Raspbian and the INNOTRIK Bluetooth Conference Speaker with Microphone attached. It's the same hardware used in my Conversation Speaker project.

Setup

1. Azure Cognitive Services

You will need to deploy an instance of Azure Cognitive Services for the Speech-To-Text (https://azure.microsoft.com/en-us/products/cognitive-services/). You get 5 hours of audio processing for free and it's relatively inexpensive after that.

2. OpenAI's DALL-E 2

You will need to sign up for OpenAI's DALL-E 2 (https://openai.com/dall-e-2/). You get 50 free credits when signing up and 15 free credits every month after that. For more, you can spend $15 to get enough credits for 460 image requests. Note that this application uses three image requests per rendering to attain 1080p.

3. 'feh' image viewer

The slideshow uses feh to run. Simply run sudo apt install feh from your linux shell (e.g. bash).

3. dotnet 6.x SDK

You will need to install the dotnet 6.x SDK: https://dotnet.microsoft.com/en-us/download/dotnet/6.0

3. Set Configuration

Look at ./configuration.json and run each of the dotnet user-secrets set commands before building. This adds the keys and endpoints for Azure Cognitive Services and OpenAI to the application.

4. Build and run

From the /src/dalleframecon directory, run dotnet run and that should start the show.

How it works

The application...

  1. Starts and starts a separate process showing a slideshow of previous images.
  2. Starts listening to you for the wake phrase, "Hey, DALL-E" (pronounced 'Hey Dolly').
  3. Then sounds a notification 'bing!'.
  4. Listens to you for what to render (e.g. "A 3D render of a white furry monster in a purple room.").
  5. Responds with "Sure thing, one moment."
  6. Then updates with "Just adding a few more details."
  7. On completion says "Here you go!".
  8. Then restarts the randomized slideshow of previous images, starting with the image just requested.
  9. And finally starts listening for the wake phrase again.

Notes

  • The application makes three calls to DALL-E in order to render a 1080p image.
    • DALL-E itself can only render a 1024x1024 image at the largest. The application resends the first image back to DALL-E and requests a render the remaining left and right images, then the application stiches it all together for proper aspect ratio and upscales it to 1080p.
  • Images are stored in the application's directory under a sub-directory "images".
  • The application may not be able to start the slideshow if there are no images in the "images" directory.
  • The application does not handle error codes from OpenAI or Cognitive Services. If some returns a failure, such as for content violations, the application will stop listening but continue continue the slideshow.

dalle-picture-frame's People

Contributors

adrianwyatt avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.