Giter Site home page Giter Site logo

rozwhite / audio-diffusion Goto Github PK

View Code? Open in Web Editor NEW

This project forked from teticio/audio-diffusion

0.0 0.0 0.0 4.67 MB

Apply Denoising Diffusion Probabilistic Models using the new Hugging Face diffusers package to synthesize music instead of images.

License: GNU General Public License v3.0

Python 2.65% Jupyter Notebook 97.35%

audio-diffusion's Introduction

title emoji colorFrom colorTo sdk sdk_version app_file pinned license
Audio Diffusion
๐ŸŽต
pink
blue
gradio
3.1.4
app.py
false
gpl-3.0

audio-diffusion Open in Colab

Apply Denoising Diffusion Probabilistic Models using the new Hugging Face diffusers package to synthesize music instead of images.


UPDATE: I've trained a new model on 30,000 samples that have been used in music, sourced from WhoSampled and YouTube. The idea is that the model could be used to generate loops or "breaks" that can be sampled to make new tracks. People ("crate diggers") go to a lot of lengths or are willing to pay a lot of money to find breaks in old records.


mel spectrogram


Audio can be represented as images by transforming to a mel spectrogram, such as the one shown above. The class Mel in mel.py can convert a slice of audio into a mel spectrogram of x_res x y_res and vice versa. The higher the resolution, the less audio information will be lost. You can see how this works in the test_mel.ipynb notebook.

A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio.

You can play around with the model on Google Colab or Hugging Face spaces. Check out some automatically generated loops here.


Generate Mel spectrogram dataset from directory of audio files

Training can be run with Mel spectrograms of resolution 64x64 on a single commercial grade GPU (e.g. RTX 2080 Ti). The hop_length should be set to 1024 for better results.

python audio_to_images.py \
  --resolution 64 \
  --hop_length 1024 \
  --input_dir path-to-audio-files \
  --output_dir data-test

Generate dataset of 256x256 Mel spectrograms and push to hub (you will need to be authenticated with huggingface-cli login).

python audio_to_images.py \
  --resolution 256 \
  --input_dir path-to-audio-files \
  --output_dir data-256 \
  --push_to_hub teticio\audio-diffusion-256

Train model

Run training on local machine.

accelerate launch --config_file accelerate_local.yaml \
  train_unconditional.py \
  --dataset_name data-64 \
  --resolution 64 \
  --hop_length 1024 \
  --output_dir ddpm-ema-audio-64 \
  --train_batch_size 16 \
  --num_epochs 100 \
  --gradient_accumulation_steps 1 \
  --learning_rate 1e-4 \
  --lr_warmup_steps 500 \
  --mixed_precision no

Run training on local machine with batch_size of 2 and gradient_accumulation_steps 8 to compensate, so that 256x256 resolution model fits on commercial grade GPU and push to hub.

accelerate launch --config_file accelerate_local.yaml \
  train_unconditional.py \
  --dataset_name teticio/audio-diffusion-256 \
  --resolution 256 \
  --output_dir ddpm-ema-audio-256 \
  --num_epochs 100 \
  --train_batch_size 2 \
  --eval_batch_size 2 \
  --gradient_accumulation_steps 8 \
  --learning_rate 1e-4 \
  --lr_warmup_steps 500 \
  --mixed_precision no \
  --push_to_hub True \
  --hub_model_id audio-diffusion-256 \
  --hub_token $(cat $HOME/.huggingface/token)

Run training on SageMaker.

accelerate launch --config_file accelerate_sagemaker.yaml \
  strain_unconditional.py \
  --dataset_name teticio/audio-diffusion-256 \
  --resolution 256 \
  --output_dir ddpm-ema-audio-256 \
  --train_batch_size 16 \
  --num_epochs 100 \
  --gradient_accumulation_steps 1 \
  --learning_rate 1e-4 \
  --lr_warmup_steps 500 \
  --mixed_precision no

audio-diffusion's People

Contributors

teticio avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.