
gst_tacotron2_wavenet's Introduction

Tacotron-2:

TensorFlow implementation of DeepMind's Tacotron-2, a deep neural network architecture described in the paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Repository Structure:

Tacotron-2
├── datasets
├── en_UK		(0)
│   └── by_book
│       └── female
├── en_US		(0)
│   └── by_book
│       ├── female
│       └── male
├── LJSpeech-1.1	(0)
│   └── wavs
├── logs-Tacotron	(2)
│   ├── eval-dir
│   │   ├── plots
│   │   └── wavs
│   ├── mel-spectrograms
│   ├── plots
│   ├── pretrained
│   └── wavs
├── logs-Wavenet	(4)
│   ├── eval-dir
│   │   ├── plots
│   │   └── wavs
│   ├── plots
│   ├── pretrained
│   └── wavs
├── papers
├── tacotron
│   ├── models
│   └── utils
├── tacotron_output	(3)
│   ├── eval
│   ├── gta
│   ├── logs-eval
│   │   ├── plots
│   │   └── wavs
│   └── natural
├── wavenet_output	(5)
│   ├── plots
│   └── wavs
├── training_data	(1)
│   ├── audio
│   ├── linear
│   └── mels
└── wavenet_vocoder
    └── models

The previous tree shows the current state of the repository (separate training, one step at a time).

  • Step (0): Get your dataset; the examples here are set up for LJSpeech, en_US and en_UK (from M-AILABS).
  • Step (1): Preprocess your data. This will give you the training_data folder.
  • Step (2): Train your Tacotron model. Yields the logs-Tacotron folder.
  • Step (3): Synthesize/evaluate the Tacotron model. Gives the tacotron_output folder.
  • Step (4): Train your WaveNet model. Yields the logs-Wavenet folder.
  • Step (5): Synthesize audio using the WaveNet model. Gives the wavenet_output folder.
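The six steps above can be strung together as a small driver script. This is only a sketch that replays the CLI commands documented in this README with their default arguments (the command list and the `run_pipeline` helper are illustrative, not part of the repository); adjust the arguments for your dataset.

```python
# Hypothetical driver for the separate-training workflow (steps 1-5).
# Each stage's command is exactly the one documented later in this README.
import subprocess

PIPELINE = [
    ["python", "preprocess.py"],                                    # (1) -> training_data
    ["python", "train.py", "--model=Tacotron"],                     # (2) -> logs-Tacotron
    ["python", "synthesize.py", "--model=Tacotron", "--GTA=True"],  # (3) -> tacotron_output
    ["python", "train.py", "--model=WaveNet"],                      # (4) -> logs-Wavenet
    ["python", "synthesize.py", "--model=WaveNet"],                 # (5) -> wavenet_output
]

def run_pipeline(dry_run=True):
    """With dry_run=True, just return the commands as strings;
    otherwise run each stage in order, stopping at the first failure."""
    if dry_run:
        return [" ".join(cmd) for cmd in PIPELINE]
    for cmd in PIPELINE:
        subprocess.run(cmd, check=True)
```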

Note:

  • Our preprocessing only supports LJSpeech and LJSpeech-like datasets (M-AILABS speech data)! If your dataset is stored differently, you will probably need to write your own preprocessing script.
  • In the previous tree, files are not shown and the maximum depth was set to 3 for simplicity.
  • If you train both models at the same time, the repository structure will differ.

Model Architecture:

The model described by the authors can be divided into two parts:

  • Spectrogram prediction network
  • Wavenet vocoder
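At a high level the two parts simply compose: text goes through the spectrogram prediction network, and its output goes through the vocoder. A toy sketch with stand-in callables (the function and variable names here are illustrative; the real models live under tacotron/ and wavenet_vocoder/):

```python
# Toy composition of the two-stage model (stand-in callables, not the
# repository's classes): the spectrogram prediction network maps text to
# mel-spectrogram frames, and the WaveNet vocoder maps frames to audio.

def text_to_speech(text, tacotron, wavenet):
    mel = tacotron(text)    # text -> mel-spectrogram frames
    return wavenet(mel)     # mel-spectrogram frames -> waveform samples

# Trivial stand-ins: one 80-channel "frame" per character, then an
# energy-summing "vocoder".
mel_model = lambda text: [[0.0] * 80 for _ in text]
vocoder = lambda mel: [sum(frame) for frame in mel]
```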

For an in-depth exploration of the model architecture, training procedure and preprocessing logic, refer to our wiki.

Current state:

To get an overview of our progress on this project, please refer to this discussion.

Since the two parts of the global model are trained separately, we can start by training the feature prediction model and use its predictions later during WaveNet training.

How to start

First, you need Python 3 installed along with TensorFlow.

Next, install the requirements. If you are an Anaconda user: (otherwise replace pip with pip3 and python with python3)

pip install -r requirements.txt

Dataset:

We tested the code above on the LJSpeech dataset, which contains almost 24 hours of labeled recordings of a single female speaker. (Further information on the dataset is available in the README file included with the download.)

We are also running tests on the new M-AILABS speech dataset, which contains more than 700 hours of speech (more than 80 GB of data) across more than 10 languages.

After downloading the dataset, extract the compressed file, and place the folder inside the cloned repository.

Preprocessing

Before running the following steps, please make sure you are inside the Tacotron-2 folder:

cd Tacotron-2

Preprocessing can then be started using:

python preprocess.py

The dataset can be chosen with the --dataset argument. If using the M-AILABS dataset, you also need to provide the language, voice, reader, merge_books and book arguments as needed. The default is LJSpeech.

Example M-AILABS:

python preprocess.py --dataset='M-AILABS' --language='en_US' --voice='female' --reader='mary_ann' --merge_books=False --book='northandsouth'

or if you want to use all books for a single speaker:

python preprocess.py --dataset='M-AILABS' --language='en_US' --voice='female' --reader='mary_ann' --merge_books=True

This should take no longer than a few minutes.
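A quick way to sanity-check the result is to verify that the expected subfolders were created. This sketch assumes preprocessing writes the audio, linear and mels subfolders shown in the repository tree above (the helper name is ours, not the repository's):

```python
# Sanity-check the preprocessing output, assuming the training_data layout
# shown in the repository tree above (audio/, linear/, mels/).
import os

def missing_outputs(root="training_data", expected=("audio", "linear", "mels")):
    """Return the expected subfolders that are absent under `root`."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(root, name))]
```

If the returned list is non-empty, preprocessing did not finish (or ran from the wrong directory).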

Training:

To train both models sequentially (one after the other):

python train.py --model='Tacotron-2'

or:

python train.py --model='Both'

The feature prediction model can be trained separately using:

python train.py --model='Tacotron'

Checkpoints will be made every 250 steps and stored under the logs-Tacotron folder.

Naturally, training the WaveNet vocoder separately is done by:

python train.py --model='WaveNet'

Logs will be stored inside logs-Wavenet.

Note:

  • If the model argument is not provided, training defaults to Tacotron-2 model training. (both models)
  • Please refer to the train arguments under train.py for the set of options you can use.

Synthesis

To synthesize audio in an End-to-End (text to audio) manner (both models at work):

python synthesize.py --model='Tacotron-2'

For the spectrogram prediction network (separately), there are three types of mel-spectrogram synthesis:

  • Evaluation (synthesis on custom sentences). This is what we'll usually use after having a full end-to-end model.

python synthesize.py --model='Tacotron' --mode='eval'

  • Natural synthesis (let the model make predictions alone by feeding the last decoder output into the next time step).

python synthesize.py --model='Tacotron' --GTA=False

  • Ground-Truth-Aligned synthesis (DEFAULT: the model is assisted by the true labels in a teacher-forcing manner). This synthesis method is used when predicting the mel spectrograms that train the WaveNet vocoder. (It yields better results, as stated in the paper.)

python synthesize.py --model='Tacotron' --GTA=True
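The difference between natural and GTA synthesis comes down to what is fed back into the decoder at each time step. A toy illustration with a stand-in one-step decoder (this is not the repository's code; `step` is a hypothetical placeholder for one Tacotron decoder step):

```python
# Toy autoregressive decoder loop contrasting natural synthesis (feed back
# the model's own prediction) with ground-truth-aligned / teacher-forced
# synthesis (feed back the true frame).

def decode(ground_truth, gta, step=lambda prev: prev * 0.5 + 1.0):
    frames, prev = [], 0.0            # start from a zero <GO> frame
    for t in range(len(ground_truth)):
        frames.append(step(prev))
        # GTA: condition on the true frame; natural: on our own prediction
        prev = ground_truth[t] if gta else frames[-1]
    return frames
```

With GTA, each prediction is conditioned on a true frame, so errors cannot accumulate across time steps; that is why GTA spectrograms make better WaveNet training targets.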

Synthesizing waveforms conditioned on previously synthesized mel spectrograms (separately) can be done with:

python synthesize.py --model='WaveNet'

Note:

  • If the model argument is not provided, synthesis defaults to Tacotron-2 model synthesis. (End-to-End TTS)
  • Please refer to the synthesis arguments under synthesize.py for the set of options you can use.

Pretrained model and Samples:

Pre-trained models and audio samples will be added at a later date. You can, however, check some early insights into the model's performance (at early stages of training) here.

References and Resources:

gst_tacotron2_wavenet's Issues

Conv2DCustomBackpropFilterOp only supports NHWC.

I saw another repo very similar to this one, and it had an issue describing the problem I'm having with WaveNet. I was able to train Tacotron to 100k steps and it saved out the GTA files as well. When I try to train WaveNet, I get the error Conv2DCustomBackpropFilterOp only supports NHWC.

I found Rayhane-mamah/Tacotron-2/issues/73#issuecomment-497370684, which has a solution, but it relies on code that was added to that repo after this one was forked, so it doesn't work for me.

Is it possible to support Windows with this code?

Generated 578 test batches of size 1 in 0.807 sec
2020-03-18 13:38:25.399613: E T:\src\github\tensorflow\tensorflow\core\common_runtime\executor.cc:697] Executor failed to create kernel. Invalid argument: Conv2DCustomBackpropFilterOp only supports NHWC.
         [[Node: model/optimizer/gradients/model/inference/conv2d_transpose_1/conv2d_transpose_grad/Conv2DBackpropFilter = Conv2DBackpropFilter[T=DT_FLOAT, _class=["loc:@model/optimizer/clip_by_global_norm/mul_199"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 16], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/optimizer/gradients/model/inference/conv2d_transpose_1/BiasAdd_grad/tuple/control_dependency/_2171, model/optimizer/gradients/model/inference/conv2d_transpose/conv2d_transpose_grad/Shape, model/inference/conv2d_transpose/BiasAdd/_2173)]]
Exiting due to Exception: Conv2DCustomBackpropFilterOp only supports NHWC.
         [[Node: model/optimizer/gradients/model/inference/conv2d_transpose_1/conv2d_transpose_grad/Conv2DBackpropFilter = Conv2DBackpropFilter[T=DT_FLOAT, _class=["loc:@model/optimizer/clip_by_global_norm/mul_199"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 16], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/optimizer/gradients/model/inference/conv2d_transpose_1/BiasAdd_grad/tuple/control_dependency/_2171, model/optimizer/gradients/model/inference/conv2d_transpose/conv2d_transpose_grad/Shape, model/inference/conv2d_transpose/BiasAdd/_2173)]]

Caused by op 'model/optimizer/gradients/model/inference/conv2d_transpose_1/conv2d_transpose_grad/Conv2DBackpropFilter', defined at:
  File "train.py", line 127, in <module>
    main()
  File "train.py", line 119, in main
    wavenet_train(args, log_dir, hparams, args.wavenet_input)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\train.py", line 244, in wavenet_train
    return train(log_dir, args, hparams, input_path)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\train.py", line 167, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\train.py", line 119, in model_train_mode
    model.add_optimizer(global_step)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\models\wavenet.py", line 365, in add_optimizer
    gradients, variables = zip(*optimizer.compute_gradients(self.loss))
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\training\optimizer.py", line 514, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 596, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 779, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 398, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 779, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\nn_grad.py", line 54, in _Conv2DBackpropInputGrad
    data_format=op.get_attr("data_format")),
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1190, in conv2d_backprop_filter
    dilations=dilations, name=name)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
    op_def=op_def)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'model/inference/conv2d_transpose_1/conv2d_transpose', defined at:
  File "train.py", line 127, in <module>
    main()
[elided 2 identical lines from previous traceback]
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\train.py", line 167, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\train.py", line 117, in model_train_mode
    feeder.input_lengths, x=feeder.inputs)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\models\wavenet.py", line 169, in initialize
    y_hat = self.step(x, c, g, softmax=False) #softmax is automatically computed inside softmax_cross_entropy if needed
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\models\wavenet.py", line 435, in step
    c = transposed_conv(c)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\wavenet_vocoder\models\modules.py", line 333, in __call__
    return self.convt(inputs)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\layers\base.py", line 362, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 736, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\keras\layers\convolutional.py", line 781, in call
    data_format=conv_utils.convert_data_format(self.data_format, ndim=4))
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 1254, in conv2d_transpose
    name=name)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1340, in conv2d_backprop_input
    dilations=dilations, name=name)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)

InvalidArgumentError (see above for traceback): Conv2DCustomBackpropFilterOp only supports NHWC.
         [[Node: model/optimizer/gradients/model/inference/conv2d_transpose_1/conv2d_transpose_grad/Conv2DBackpropFilter = Conv2DBackpropFilter[T=DT_FLOAT, _class=["loc:@model/optimizer/clip_by_global_norm/mul_199"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 16], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/optimizer/gradients/model/inference/conv2d_transpose_1/BiasAdd_grad/tuple/control_dependency/_2171, model/optimizer/gradients/model/inference/conv2d_transpose/conv2d_transpose_grad/Shape, model/inference/conv2d_transpose/BiasAdd/_2173)]]
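The traceback shows the op was built with data_format="NCHW", which TensorFlow's CPU convolution kernels do not implement; only NHWC (channels_last) is supported on CPU. The usual workaround is to build the layer as channels_last, or to permute the tensor to NHWC around the op. A layout-only sketch of the permutation involved (plain Python, no TensorFlow; the tf.transpose permutations mentioned in the comments are the standard ones, not code from this repository):

```python
# TensorFlow's CPU conv kernels only implement NHWC, so an NCHW tensor must
# be permuted before the op and permuted back afterwards. In graph code this
# corresponds to tf.transpose(x, [0, 2, 3, 1]) before a channels_last
# Conv2DTranspose and tf.transpose(y, [0, 3, 1, 2]) after it.

def nchw_to_nhwc(shape):
    n, c, h, w = shape
    return (n, h, w, c)

def nhwc_to_nchw(shape):
    n, h, w, c = shape
    return (n, c, h, w)
```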

local variable 'checkpoint' referenced before assignment

I was training with the LJSpeech dataset and default train variables; when it reached 100k steps it said

Loading checkpoint: logs-Tacotron-2\taco_pretrained/tacotron_model.ckpt-100000
Loaded metadata for 13100 examples (24.06 hours)
starting synthesis
  0%|▏                                                                            | 29/13100 [00:38<4:45:34,  1.31s/it]
Generated 32 train batches of size 48 in 62.968 sec
  0%|▏                                                                            | 30/13100 [00:43<9:06:18,  2.51s/it]Exception in thread background:
Traceback (most recent call last):
  File "C:\Users\camja\Anaconda3\envs\taco\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\Users\camja\Anaconda3\envs\taco\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\camja\Desktop\gst_tacotron2_wavenet\tacotron\feeder.py", line 166, in _enqueue_next_train_group
    self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
    run_metadata_ptr)
  File "C:\Users\camja\Anaconda3\envs\taco\lib\site-packages\tensorflow\python\client\session.py", line 1053, in _run
    raise RuntimeError('Attempted to use a closed Session.')
RuntimeError: Attempted to use a closed Session.

However, after that it was showing this

 67%|████████████████████████████████████████████████▉                        | 8787/13100 [2:31:59<1:06:18,  1.08it/s]T

but I had assumed it might just be loading the files again, which I was wrong about. It looks like it was generating the GTA files in tacotron_output, but I foolishly Ctrl-C'd out of it, thinking it was something else.

Is it possible to get back into my 100k model, or must I start over?

If I try to resume, I get this

python train.py --model='Tacotron-2' --restore=True --tacotron_train_steps=250000 --wavenet_train_steps=250000
Using TensorFlow backend.

#############################################################

Tacotron GTA Synthesis

###########################################################

Traceback (most recent call last):
  File "train.py", line 127, in <module>
    main()
  File "train.py", line 121, in main
    train(args, log_dir, hparams)
  File "train.py", line 64, in train
    input_path = tacotron_synthesize(args, hparams, checkpoint)
UnboundLocalError: local variable 'checkpoint' referenced before assignment
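The UnboundLocalError is a control-flow bug rather than a corrupted checkpoint: in train.py, checkpoint is only assigned inside the Tacotron-training branch, so resuming straight into GTA synthesis reaches the tacotron_synthesize(...) call with the name unbound. The saved 100k-step model should still be intact on disk. A minimal reproduction and a defensive fix (a hypothetical simplification of train.py, not the actual code):

```python
# Minimal reproduction of the bug: a name assigned on only one branch but
# used unconditionally afterwards (a simplified stand-in for train.py).

SAVE_DIR = "logs-Tacotron-2/taco_pretrained"  # where checkpoints are saved

def buggy(tacotron_already_trained):
    if not tacotron_already_trained:
        checkpoint = SAVE_DIR        # only assigned on the training path
    return checkpoint                # UnboundLocalError when skipped

def fixed(tacotron_already_trained):
    # Defensive fix: default to the saved checkpoint directory up front so
    # GTA synthesis can load the existing model even when training is skipped.
    checkpoint = SAVE_DIR
    if not tacotron_already_trained:
        pass  # normal Tacotron training would run here and update `checkpoint`
    return checkpoint
```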

please share samples

Hi @karamarieliu, could you please share samples from this repo?

And is this repo ready to use?

I would also like to try GST based on Tacotron-2.
