Giter Site home page Giter Site logo

ai-waifu's Introduction

AI Waifu (VTuber)

GitHub GitHub top language Static Badge

Anime AI Waifu is an AI powered voice assistant with VTuber's model, that combines the charm of anime characters with cutting-edge technologies. This project is meant to create an engaging experience where you can interact with desired character in real-time without powerful hardware.

Features

  • ๐ŸŽค Voice Interaction: Speak to your AI waifu and get instant (almost) responses.

    • Whisper - openai's paid speech recognition.
    • Google sr - free speech recognition alternative.
    • Console - if you don't want use microphone just type prompts with your keyboard.
  • ๐Ÿค– AI Chatbot Integration: Conversations are powered by an AI chatbot, ensuring engaging and dynamic interactions.

    • Openai's 'gpt-3.5-turbo' or any other available model.
    • File with personality and behaviour description.
    • Remembers previous messages.
  • ๐Ÿ“ข Text-to-Speech: Hear your AI waifu's responses as she speaks back to you, creating an immersive experience.

    • Google tts - free and simple solution.
    • ElevenLabs - amazing results, tons of voices.
    • Console - get text responses in your console (but VTube model will be just idle).
  • ๐ŸŒ Integration with VTube Studio: Seamlessly connect your AI waifu to VTube Studio for an even more lifelike and visually engaging interaction.

    • Lipsync while talking.

Showcase

Video demonstration

*Demonstration in real time without cutouts or speed up. This is real delay in answers.

Installation

To run this project, you need:

  1. Install Python 3.10.5 if you don't already have it installed.

  2. Clone the repository by running git clone https://github.com/JarikDem-Bot/ai-waifu.git

  3. Install the required Python packages by running pip install -r requirements.txt in the project directory.

  4. Create .env file inside the project directory and enter your API keys

    .env template
    OPENAI_API_KEY='YOUR_OPEN_AI_KEY'
    ELEVENLABS_API_KEY='YOUR_ELEVENLABS_KEY'
  5. Install VB-Cable

  6. Install and set VTube Studio

    Settings:
    • Select CABLE Output as microphone. Select Preview microphone audio to hear waifu's answers

    • Select input and output for Mouth Open. Optionally you can set "breathing" to get idle movents.

  7. Select your required settings in main.py in waifu.initialize

    Arguments:
    • user_input_service (str) - the way to interact with Waifu

      • "whisper" - OpenAI's whisper speech to text service; paid, requires OpanAi API key.
      • "google" - free google speech to text service.
      • "console" - type your promt in console with text (absoulutely free).
      • None or unspecified - default value is "whisper".
    • stt_duration (float) - the maximum number of seconds that it will dynamically adjust the threshold for before returning. This value should be at least 0.5 in order to get a representative sample of the ambient noise. Default value is 0.5.

    • mic_index (int) - index of the device to use for audio input. If None or unspecified will use default microphone.

    • chatbot_service (str) - service that will generate responses

      • "openai" - OpenAI text generation servise; paid, requires OpanAi API key.
      • "test" - returns prewritten message; used as dummy text for developement to reduce time and cost of testings.
      • None or unspecified - default value is "openai".
    • chatbot_model (str) - model used for text generation. List of available models you can find here. Default value is "gpt-3.5-turbo".

    • chatbot_temperature (float) - determines creativity of the generated text. A higher value leads to more creative result. A lower value leads to less creative and more similar results. Default value is 0.5.

    • personality_file (str) - relative path to txt file with waifu's description. Default value is "personality.txt".

    • tts_service (str) - service that "reads" Waifu's responses

      • "google" - free Google's tts, voice feels very "robotic".
      • "elevenlabs" - ElevenLabs tts with good quality; paid, requires ElevenLabs API key.
      • "console" - output will be printed in console (free).
      • None or unspecified - default value is "google".
    • output_device - (int) output device ID or (str) output device name substring. If VB-Cable is used, you need to find device, that will start with CABLE Input (VB-Audio Virtual using sd.query_devices() command.

    • tts_voice (str) - ElevenLabs voice name. Default value is "Elli".

    • tts_model (str) - ElevenLabs model. Recommended values are "eleven_monolingual_v1" and "eleven_multilingual_v1". Default value is "eleven_monolingual_v1".

  8. Run the project by executing python main.py in the project directory.


Warning

Depending on the selected input mode, program may send all recorded sounds or other data to the 3-rd parties such as: Google (stt, tts), OpenAI (stt, text generation), ElevenLabs (tts).

License

MIT

ai-waifu's People

Contributors

jarikdem-bot avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

ai-waifu's Issues

Discord

I was trying to find a way for the AI to use discord, I thought about changing the inputs and outputs around but unfortunately only outputs are allowed, you're probably busy but do you have something in mind for what I want?

waifu.py", line 33, in Waifu

NotOpenSSLWarning: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: urllib3/urllib3#3020
warnings.warn(
Traceback (most recent call last):
File "/Users/rajeev/Downloads/ai-waifu-master/main.py", line 1, in
from waifu import Waifu
File "/Users/rajeev/Downloads/ai-waifu-master/waifu.py", line 12, in
class Waifu:
File "/Users/rajeev/Downloads/ai-waifu-master/waifu.py", line 33, in Waifu
def initialise(self, user_input_service:str | None = None, stt_duration:float | None = None, mic_index:int | None = None,
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Invalid number of channels

File "C:\Users\chibi\Desktop\Nero\main.py", line 25, in
main()
File "C:\Users\chibi\Desktop\Nero\main.py", line 6, in main
waifu.initialize(user_input_service='whisper',
File "C:\Users\chibi\Desktop\Nero\waifu.py", line 46, in initialize
self.update_tts(service=tts_service, output_device=output_device, voice=tts_voice, model=tts_model)
File "C:\Users\chibi\Desktop\Nero\waifu.py", line 99, in update_tts
sd.check_output_settings(output_device)
File "C:\Python\lib\site-packages\sounddevice.py", line 697, in check_output_settings
_check(_lib.Pa_IsFormatSupported(_ffi.NULL, parameters, samplerate))
File "C:\Python\lib\site-packages\sounddevice.py", line 2747, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Invalid number of channels [PaErrorCode -9998]

I'm stumped, I don't even know where to start

Discord Back Again

Do you have an idea on how to get the bot to pick up audio in discord calls to respond to?

Token limit

How do you get rid of your tokens? I know our current limit depends on our version of chat gpt, but I would just like to know how to clear it out beforehand.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.