This project forked from audio-agi/wavjourney

WavJourney: Compositional Audio Creation with LLMs

Home Page: https://audio-agi.github.io/WavJourney_demopage/

License: Other

Python 96.91% Dockerfile 2.01% Shell 1.08%


🎵 WavJourney: Compositional Audio Creation with LLMs

We are actively seeking research and commercial cooperation in advancing AI-assisted multimedia storytelling. If you are interested, please email [email protected] for more details! 👍

This repository contains the official implementation of "WavJourney: Compositional Audio Creation with Large Language Models".

Starting with a text prompt, WavJourney can create audio content with engaging storylines encompassing personalized speakers, lifelike speech in context, emotionally resonant music compositions, and impactful sound effects that enhance the auditory experience.

You are welcome to share your creations on Discord or with the Hugging Face community!


Preliminaries

  1. Install the environment:
bash ./scripts/EnvsSetup.sh
  2. Activate the conda environment:
conda activate WavJourney
  3. (Optional) Modify the default configuration in config.yaml; the details are described in the configuration file itself.
  4. Pre-download the models (this might take some time):
python scripts/download_models.py
  5. Set the WAVJOURNEY_OPENAI_KEY environment variable to access the GPT-4 API [Guidance]:
export WAVJOURNEY_OPENAI_KEY=your_openai_key_here
  6. Set the environment variables for the API services:
# Set the port for the WAVJOURNEY service to 8021
export WAVJOURNEY_SERVICE_PORT=8021

# Set the URL for the WAVJOURNEY service to 127.0.0.1
export WAVJOURNEY_SERVICE_URL=127.0.0.1

# Limit the maximum script lines for WAVJOURNEY to 999
export WAVJOURNEY_MAX_SCRIPT_LINES=999
  7. Start the Python API services (e.g., Text-to-Speech, Text-to-Audio):
bash scripts/start_services.sh
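
Before starting the services, it can help to confirm that the environment variables from the steps above are actually set in the current shell. The following is a minimal sketch, not part of the repository: the helper name missing_wavjourney_vars is hypothetical, and treating all four variables as required is an assumption based on the setup steps.

```python
import os

# Variable names are copied from the setup steps above.
REQUIRED_VARS = (
    "WAVJOURNEY_OPENAI_KEY",
    "WAVJOURNEY_SERVICE_PORT",
    "WAVJOURNEY_SERVICE_URL",
    "WAVJOURNEY_MAX_SCRIPT_LINES",
)


def missing_wavjourney_vars(env=None):
    """Return the listed variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_wavjourney_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All WavJourney environment variables are set.")
```

Run it in the same shell where the export commands were issued, since exported variables are only visible to processes started from that shell.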

Web APP

bash scripts/start_ui.sh

Commandline Usage

python wavjourney_cli.py -f --input-text "Generate a one-minute introduction to quantum mechanics" 
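
If you want to call the command-line tool from another Python script, one option is to wrap the command above with the standard subprocess module. This is a rough sketch under stated assumptions: the helpers build_cli_command and run_wavjourney are hypothetical, the -f flag is copied verbatim from the command above without assuming what it does, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_cli_command(input_text, include_f_flag=True):
    """Build the argument list for wavjourney_cli.py (hypothetical helper)."""
    cmd = [sys.executable, "wavjourney_cli.py"]
    if include_f_flag:
        # Flag copied from the documented command; its meaning is not described here.
        cmd.append("-f")
    cmd += ["--input-text", input_text]
    return cmd


def run_wavjourney(input_text):
    """Run the CLI from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_cli_command(input_text), check=True)


if __name__ == "__main__":
    run_wavjourney("Generate a one-minute introduction to quantum mechanics")
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or other special characters.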

Kill the services

You can kill the running services via this command:

python scripts/kill_services.py

(Advanced features) Speaker customization

You can add voice presets to WavJourney to customize the voice actors. Simply provide the voice id, the description and a sample wav file, and WavJourney will pick the voice automatically based on the audio script. Predefined system voice presets are in data/voice_presets.

You can manage voice presets via the UI. To add a voice to the voice presets from the command line, run the script below:

python add_voice_preset.py --id "id" --desc "description" --wav-path path/to/wav --session-id ''
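
When adding several presets, it may be convenient to assemble the command above programmatically. The sketch below only builds the argument list using the flags shown in the command; the helper name build_add_preset_command is hypothetical, and the check on the .wav extension is an assumption based on the README asking for a sample wav file.

```python
import sys
from pathlib import Path


def build_add_preset_command(voice_id, description, wav_path, session_id=""):
    """Build the argument list for add_voice_preset.py (hypothetical helper)."""
    wav = Path(wav_path)
    # Assumption: the sample must be a .wav file, as the text above suggests.
    if wav.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file, got: {wav_path}")
    return [
        sys.executable, "add_voice_preset.py",
        "--id", voice_id,
        "--desc", description,
        "--wav-path", str(wav),
        "--session-id", session_id,
    ]


if __name__ == "__main__":
    # Example values are placeholders, not presets shipped with the project.
    print(build_add_preset_command("narrator", "Calm male narrator", "samples/narrator.wav"))
```

The resulting list can be passed to subprocess.run(..., check=True) from the repository root.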

What makes for a good voice prompt? See detailed instructions here.

Hardware requirement

  • The VRAM of the GPU in the default configuration should be greater than 16 GB.
  • Operating system: Linux.
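
To check the VRAM requirement above on a machine with NVIDIA drivers installed, one approach is to query nvidia-smi and compare each GPU's total memory against the threshold. This is a sketch under assumptions: the helper names are hypothetical, nvidia-smi is assumed to be on the PATH, and "16 GB" is read as 16 × 1024 MiB because nvidia-smi reports memory in MiB.

```python
import subprocess

# Assumption: the "16 GB" requirement is interpreted as 16 * 1024 MiB.
MIN_VRAM_MIB = 16 * 1024


def parse_vram_mib(nvidia_smi_output):
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_vram_mib():
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)


def gpus_meeting_requirement(vram_mib_per_gpu, minimum=MIN_VRAM_MIB):
    """Return the indices of GPUs whose total memory is greater than the minimum."""
    return [i for i, mib in enumerate(vram_mib_per_gpu) if mib > minimum]


if __name__ == "__main__":
    ok = gpus_meeting_requirement(query_vram_mib())
    print("GPUs with more than 16 GB of VRAM:", ok if ok else "none")
```

The comparison is strict because the requirement is stated as "greater than 16 GB"; adjust the threshold if your config.yaml differs from the default configuration.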

Citation

If you find this work useful, you can cite the paper below:

@article{liu2023wavjourney,
    title   = {WavJourney: Compositional Audio Creation with Large Language Models},
    author  = {Liu, Xubo and Zhu, Zhongkai and Liu, Haohe and Yuan, Yi and Huang, Qiushi and Liang, Jinhua and Cao, Yin and Kong, Qiuqiang and Plumbley, Mark D and Wang, Wenwu},
    journal = {arXiv preprint arXiv:2307.14335},
    year    = {2023}
}

Appreciation

  • Bark for a zero-shot text-to-speech synthesis model.
  • AudioCraft for state-of-the-art audio generation models.

Disclaimer

We are not responsible for audio generated using semantics created by this model. Just don't use it for illegal purposes.
