
figaro's Introduction

FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control

Listen to the samples on Soundcloud.

Paper: https://openreview.net/forum?id=NyR8OZFHw6i

Colab Demo: https://colab.research.google.com/drive/1UAKFkbPQTfkYMq1GxXfGZOJXOXU_svo6


Getting started

Prerequisites:

  • Python 3.9
  • Conda

Setup

  1. Clone this repository to your disk
  2. Install required packages (see requirements.txt). With venv:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
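
Since Conda is listed as a prerequisite, an equivalent setup with a Conda environment might look like this (the environment name figaro is just an example):

conda create -n figaro python=3.9
conda activate figaro
pip install -r requirements.txt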

Preparing the Data

To train models and to generate new samples, we use the Lakh MIDI dataset (although any collection of MIDI files can be used).

  1. Download (size: 1.6GB) and extract the archive file:
wget http://hog.ee.columbia.edu/craffel/lmd/lmd_full.tar.gz
tar -xzf lmd_full.tar.gz
  2. You may wish to remove the archive file now: rm lmd_full.tar.gz

Download Pre-Trained Models

If you don't wish to train your own models, you can download our pre-trained models.

  1. Download (size: 2.3GB) and extract the archive file:
wget -O checkpoints.zip https://polybox.ethz.ch/index.php/s/a0HUHzKuPPefWkW/download
unzip checkpoints.zip
  2. You may wish to remove the archive file now: rm checkpoints.zip

Training

Training arguments such as the model type, batch size, and model parameters are passed to the training script via environment variables.

Available model types are:

  • vq-vae: VQ-VAE model used for the learned description
  • figaro: FIGARO with both the expert and learned description
  • figaro-expert: FIGARO with only the expert description
  • figaro-learned: FIGARO with only the learned description
  • figaro-no-inst: FIGARO (expert) without instruments
  • figaro-no-chord: FIGARO (expert) without chords
  • figaro-no-meta: FIGARO (expert) without style (meta) information
  • baseline: Unconditional decoder-only baseline following Huang et al. (2018)

An example invocation of the training script is given by the following command:

MODEL=figaro-expert python src/train.py

For models using the learned description (figaro and figaro-learned), a pre-trained VQ-VAE checkpoint needs to be provided as well:

MODEL=figaro VAE_CHECKPOINT=./checkpoints/vq-vae.ckpt python src/train.py

Generation

To generate samples, make sure you have a trained checkpoint prepared (either download one or train it yourself). Also make sure that the dataset is prepared as described in Preparing the Data; it is needed to extract the descriptions based on which new samples are generated.

An example invocation of the generation script is given by the following command:

python src/generate.py --model figaro-expert --checkpoint ./checkpoints/figaro-expert.ckpt

For models using the learned description (figaro and figaro-learned), a pre-trained VQ-VAE checkpoint needs to be provided as well:

python src/generate.py --model figaro --checkpoint ./checkpoints/figaro.ckpt --vae_checkpoint ./checkpoints/vq-vae.ckpt

Evaluation

We provide the evaluation scripts used to calculate the description metrics on a given set of generated samples. Refer to the previous section for how to generate samples yourself.

Example usage:

python src/evaluate.py --samples_dir ./samples/figaro-expert

It has been pointed out that the order of the dataset files (from which the splits are calculated) is non-deterministic and depends on the OS. To address this and to ensure reproducibility, I have added the exact files used for training/validation/testing in the respective file in the splits folder.

Parameters

The following environment variables are available for controlling hyperparameters beyond their default values.

Training (train.py)

Model

| Variable | Description | Default value |
| --- | --- | --- |
| MODEL | Model architecture to be trained | |
| D_MODEL | Hidden size of the model | 512 |
| CONTEXT_SIZE | Number of tokens in the context to be passed to the auto-encoder | 256 |
| D_LATENT | [VQ-VAE] Dimensionality of the latent space | 1024 |
| N_CODES | [VQ-VAE] Codebook size | 2048 |
| N_GROUPS | [VQ-VAE] Number of groups to split the latent vector into before discretization | 16 |
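
As an example of overriding these defaults, a VQ-VAE run with non-default hyperparameters could be launched as follows (the specific values are purely illustrative, not recommendations):

MODEL=vq-vae D_MODEL=256 N_CODES=1024 python src/train.py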

Optimization

| Variable | Description | Default value |
| --- | --- | --- |
| EPOCHS | Max. number of training epochs | 16 |
| MAX_TRAINING_STEPS | Max. number of training iterations | 100,000 |
| BATCH_SIZE | Number of samples in each batch | 128 |
| TARGET_BATCH_SIZE | Number of samples in each backward step; gradients are accumulated over TARGET_BATCH_SIZE // BATCH_SIZE batches | 256 |
| WARMUP_STEPS | Number of learning rate warmup steps | 4000 |
| LEARNING_RATE | Initial learning rate, decayed after a constant warmup of WARMUP_STEPS steps | 1e-4 |
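
For example, with BATCH_SIZE=32 and TARGET_BATCH_SIZE=256, gradients would be accumulated over 256 // 32 = 8 batches before each optimizer step (values chosen for illustration only):

MODEL=figaro-expert BATCH_SIZE=32 TARGET_BATCH_SIZE=256 python src/train.py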

Others

| Variable | Description | Default value |
| --- | --- | --- |
| CHECKPOINT | Path to a checkpoint from which to resume training | |
| VAE_CHECKPOINT | Path to the VQ-VAE checkpoint to be used for the learned description | |
| ROOT_DIR | Folder containing the MIDI files to train on | ./lmd_full |
| OUTPUT_DIR | Folder for saving checkpoints | ./results |
| LOGGING_DIR | Folder for saving logs | ./logs |
| N_WORKERS | Number of workers to be used for the dataloader | available CPUs |
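
Putting these together, a training run on a custom MIDI folder with custom output locations might look like this (the paths are placeholders):

MODEL=figaro-expert ROOT_DIR=./my_midi_files OUTPUT_DIR=./results LOGGING_DIR=./logs python src/train.py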

Generation (generate.py)

The generation script uses command line arguments instead of environment variables.

| Argument | Description | Default value |
| --- | --- | --- |
| --model | Specify which model will be loaded | |
| --checkpoint | Path to the checkpoint for the specified model | |
| --vae_checkpoint | Path to the VQ-VAE checkpoint to be used for the learned description (if applicable) | |
| --lmd_dir | Folder containing MIDI files to extract descriptions from | ./lmd_full |
| --output_dir | Folder to save generated MIDI samples to | ./samples |
| --max_iter | Max. number of tokens that should be generated | 16,000 |
| --max_bars | Max. number of bars that should be generated | 32 |
| --make_medleys | Set to True if descriptions should be combined into medleys | False |
| --n_medley_pieces | Number of pieces to be combined into one medley | 2 |
| --n_medley_bars | Number of bars to take from each piece | 16 |
| --verbose | Logging level; set to 0 for silent execution | 2 |
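
For instance, to generate medleys using the flags above (the argument values are illustrative):

python src/generate.py --model figaro-expert --checkpoint ./checkpoints/figaro-expert.ckpt --make_medleys True --n_medley_pieces 2 --n_medley_bars 16 --output_dir ./samples/medleys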

Evaluation (evaluate.py)

The evaluation script uses command line arguments instead of environment variables.

| Argument | Description | Default value |
| --- | --- | --- |
| --samples_dir | Folder containing the generated samples to be evaluated | ./samples |
| --output_file | CSV file to which a detailed log of all metrics will be saved | ./metrics.csv |
| --max_samples | Limit on the number of samples used to compute evaluation metrics | 1024 |
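
For example, to evaluate at most 512 samples and write the metrics to a custom CSV file (values illustrative):

python src/evaluate.py --samples_dir ./samples/figaro-expert --output_file ./figaro-expert-metrics.csv --max_samples 512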

figaro's People

Contributors

alicimertcan, dvruette


figaro's Issues

Using pm.time_to_tick() twice on Tempo

While looking through your input_representation.py, I noticed that you use self.pm.time_to_tick() twice when saving the tempo_items.

I am talking about this part:

    max_tick = self.pm.time_to_tick(self.pm.get_end_time())
    existing_ticks = {item.start: item.pitch for item in self.tempo_items}
    wanted_ticks = np.arange(0, max_tick+1, DEFAULT_RESOLUTION)
    output = []
    for tick in wanted_ticks:
      if tick in existing_ticks:
        output.append(Item(
          name='Tempo',
          start=self.pm.time_to_tick(tick),
          end=None,
          velocity=None,
          pitch=existing_ticks[tick]))
      else:
        output.append(Item(
          name='Tempo',
          start=self.pm.time_to_tick(tick),
          end=None,
          velocity=None,
          pitch=output[-1].pitch))
    self.tempo_items = output

In line 145 you use max_tick = self.pm.time_to_tick(self.pm.get_end_time()) and then use a loop to go through the ticks from 0 to max_tick. When you append items to the output, you use start=self.pm.time_to_tick(tick), but tick is already a tick and not a time. This gives far bigger values for the tempo start compared to the chords and notes.

I don't know if changing this will help when using your pretrained weights, since this bug may have been there since training. I just wanted to note it nonetheless.
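
For reference, the change implied by this report would be to use the tick value directly, since wanted_ticks is already expressed in ticks. This is only a sketch of the suggested fix, not a patch verified against the repository or the pre-trained weights:

    # wanted_ticks comes from np.arange(0, max_tick+1, DEFAULT_RESOLUTION),
    # so tick is already a tick value and needs no time-to-tick conversion.
    output.append(Item(
      name='Tempo',
      start=tick,
      end=None,
      velocity=None,
      pitch=existing_ticks[tick]))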

Training with extended chord vocabulary

I am trying to train FIGARO with an extended chord vocabulary (using the provided checkpoints).

I edited get_chord_tokens(...) in vocab.py to match the chord qualities in my dataset. However, when loading the checkpoint, I ran into a size mismatch error for in_layer.weight and out_layer.weight, which is expected since the vocabulary changed.
Do you happen to know which additional steps are needed to continue training from the existing checkpoints, with a dataset that contains more chord qualities than the ones from the paper?

Thank you in advance!
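
One common workaround for this kind of vocabulary change (a sketch only, not an official answer; the layer names come from the error message above, and the model variable is assumed to be the freshly initialized model with the extended vocabulary) is to load the checkpoint manually, drop the tensors whose shapes no longer match, and then fine-tune:

    import torch

    ckpt = torch.load('./checkpoints/figaro.ckpt', map_location='cpu')
    state_dict = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt

    # Keep only weights whose shapes still match the new model
    # (this skips e.g. in_layer.weight / out_layer.weight after the vocabulary change).
    model_state = model.state_dict()
    filtered = {k: v for k, v in state_dict.items()
                if k in model_state and v.shape == model_state[k].shape}
    model.load_state_dict(filtered, strict=False)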

about your forked MuseMorphose

I am sorry for the off-topic question.
I saw your fork of MuseMorphose.
I cannot use the original pre-trained models.
So, if you don't mind, could you please provide your pre-trained models?
Thank you.

TypeError: __init__() got an unexpected keyword argument 'vae_run'

When I try to generate MIDI with the command:
python src/generate.py --model figaro --checkpoint ./checkpoints/figaro.ckpt --vae_checkpoint ./checkpoints/vq-vae.ckpt
the issue can be reproduced.
I am sure that the versions of all the packages I installed are correct.

more detailed information about "duration"

Thank you for your work.

Do you mind helping me out with a little bit more information about the "mean_duration" parameter for the expert description?
The only detailed information about duration that I can find in the paper is: "Mean duration is quantized to 32 logarithmically spaced intervals in [0, 128] positions (12 positions per quarter note)."
If I understand this correctly, this would imply that a duration value of 1 is equal to a note being played for 1/12 quarter notes, whereas a duration value of 32 would mean that it is played for 128/12 quarter notes. However, looking at the generated results from the example descriptions, this doesn't seem to be the case.

Kind regards, and thank you in advance!
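
Based only on the sentence quoted from the paper, the bin edges might be constructed roughly as follows. This is a sketch of one possible reading (starting at 1 rather than 0, since logarithmic spacing cannot include 0), not the actual implementation:

    import numpy as np

    # 32 logarithmically spaced bin edges covering durations in (0, 128] positions,
    # where 12 positions correspond to one quarter note.
    bins = np.geomspace(1, 128, num=32)
    duration_in_positions = 24              # e.g. a half note (2 quarter notes * 12)
    token = np.searchsorted(bins, duration_in_positions)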
