azraelkuan / fftnet

FFTNet: a Real-Time Speaker-Dependent Neural Vocoder

Python 100.00%

vocoder text-to-speech fftnet deep-neural-networks

fftnet's Introduction

FFTNet

A TensorFlow implementation of FFTNet.

Quick Start

Install the requirements:

pip install -r requirements.txt

Download the data.

Extract Features

python preprocess.py \
    --name cmu_arctic \
    --in_dir your_data_dir \
    --out_dir the_feature_dir \
    --hparams "input_type=mulaw-quantize"  # mulaw-quantize gave better results in my tests
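The `--hparams` value is a string of hyperparameter overrides. Below is a minimal sketch of how such a string is commonly parsed. It assumes the comma-separated `name=value` convention used by TensorFlow 1.x's `tf.contrib.training.HParams.parse`. `parse_hparams` and the default values are hypothetical and are not this repository's code.

```python
def parse_hparams(overrides, defaults):
    """Return a copy of `defaults` with 'name=value,name=value' overrides applied.

    Hypothetical helper: each value is converted to the type of its default.
    """
    hparams = dict(defaults)
    for item in overrides.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string or a trailing comma
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        if name not in hparams:
            raise KeyError(f"unknown hparam {name!r}")
        default = hparams[name]
        if isinstance(default, bool):
            # bool("False") is True, so booleans need explicit parsing
            hparams[name] = value.lower() in ("true", "1", "yes")
        else:
            hparams[name] = type(default)(value)
    return hparams


# Hypothetical defaults; the repository's real hparams may differ.
DEFAULTS = {"input_type": "raw", "batch_size": 1}
print(parse_hparams("input_type=mulaw-quantize", DEFAULTS))
# {'input_type': 'mulaw-quantize', 'batch_size': 1}
```

Converting each value by the type of its default means `batch_size=8` becomes the integer 8 rather than the string "8".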

Training Process

You can split your train.txt into two parts in your data_dir (train.txt and val.txt) to get a training set and a validation set:

python train.py \
    --train_file "your_data_dir/train.txt" \
    --val_file "your_data_dir/val.txt" \
    --name "upsample_slt"
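Below is a minimal sketch of the split. It assumes `train.txt` lists one training example per line, in whatever line format `preprocess.py` wrote. The helper names, the 5% validation fraction, and the seed are hypothetical choices.

```python
import random


def split_metadata(lines, val_fraction=0.05, seed=1234):
    """Shuffle non-empty lines and return (train_lines, val_lines)."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def split_file(src, train_path, val_path, val_fraction=0.05, seed=1234):
    """Read `src` and write the two parts, one example per line."""
    with open(src, encoding="utf-8") as f:
        train, val = split_metadata(f.readlines(), val_fraction, seed)
    with open(train_path, "w", encoding="utf-8") as f:
        f.write("".join(ln + "\n" for ln in train))
    with open(val_path, "w", encoding="utf-8") as f:
        f.write("".join(ln + "\n" for ln in val))
```

The training command reads `your_data_dir/train.txt`, so copy the original file first (for example to a hypothetical `all.txt`) and split from that copy. Otherwise the full list is overwritten.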

Synthesis Process

python synthesis.py \
    --checkpoint_path "your_checkpoint_dir" \
    --output "your_output_dir" \
    --local_path "local_condition_path"

fftnet's People

Contributors

Stargazers

Watchers

Forkers

templeblock nature1317 jaekookang zhangxt suhee05 wgwangang edresson tsaiyihsun alokprasad railsloes

fftnet's Issues

Training with a new dataset

Hi,

I tried to train on a new dataset of audio files, but I already run into problems during preprocessing:

python preprocess.py --name=my_dataset --in_dir=path_to_my_dataset_files/ --out_dir=feature_dir/ --hparams "input_type=mulaw-quantize"

but I get:

Traceback (most recent call last):
  File "preprocess.py", line 53, in <module>
    assert name in ["cmu_arctic", "ljspeech"]
AssertionError
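The assertion in this traceback means `preprocess.py` only accepts `--name cmu_arctic` or `--name ljspeech`. A new dataset therefore needs its own preprocessing branch. Below is a hypothetical sketch of a friendlier check that lists the supported names instead of raising a bare `AssertionError`. The helper name is an assumption and is not part of the repository.

```python
# Names taken from the assertion shown in the traceback above.
SUPPORTED_DATASETS = ("cmu_arctic", "ljspeech")


def check_dataset_name(name):
    """Return `name` if preprocessing supports it, else raise a descriptive error."""
    if name not in SUPPORTED_DATASETS:
        raise ValueError(
            f"unsupported dataset {name!r}; supported names are: "
            f"{', '.join(SUPPORTED_DATASETS)}. A new dataset needs its own "
            "preprocessing code before it can be used."
        )
    return name
```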

Is a pretrained model available?

Out of range label values at beginning of training?

First, thank you very much for this implementation. Great work!

I'm hitting a problem right at the start of training: it looks like invalid values are being passed to the cross-entropy loss.

Any ideas?

Receptive Field: 2048 samples
pad value: 0
Start new training....
/home/rig/speech/fftnet/FFTNet/utils/__init__.py:67: RuntimeWarning: invalid value encountered in log1p
  return np.log1p(x) if isnumpy or isscalar else tf.log1p(x)
Traceback (most recent call last):
  File "/home/rig/.conda/envs/fftn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/rig/.conda/envs/fftn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/rig/.conda/envs/fftn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -2147483648 which is outside the valid range of [0, 256).  Label values: 328 327 327 326 326 327 326 325 326 327 327 328 331 333 333 334 335 335 336 340 343 344 347 350 352 354 360 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 
...
-2147483648 -2147483648 -2147483648 363 361 362 361 362 364 363 363 362 360 362 361 360 363 364 364 363 361 361 358 356 356 356 356 358 358 361 364 364 -2147483648 -2147483648 365 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648
	 [[Node: model/loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/loss/SparseSoftmaxCrossEntropyWithLogits/Reshape, model/loss/SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]
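One plausible reading of this log is offered here as an assumption, not a confirmed diagnosis. Under the standard mu-law formula, labels such as 328 or 365 only appear when the input exceeds the [-1, 1] range. The `log1p` RuntimeWarning suggests that some inputs also produced NaN, and NaN typically becomes -2147483648 when cast to int32.

The sketch below applies standard mu-law companding and quantizes to 256 classes after first scaling the waveform to float in [-1, 1]. The function names are hypothetical, and this is not necessarily the repository's exact code.

```python
import numpy as np


def to_float_audio(wav):
    """Scale signed-integer PCM (e.g. int16) to float32 and clip to [-1, 1]."""
    wav = np.asarray(wav)
    if np.issubdtype(wav.dtype, np.signedinteger):
        scale = float(-np.iinfo(wav.dtype).min)  # 32768 for int16
        wav = wav.astype(np.float32) / scale
    return np.clip(wav.astype(np.float32), -1.0, 1.0)


def mulaw(x, mu=255):
    """Mu-law companding: maps [-1, 1] onto [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def mulaw_quantize(x, mu=255):
    """Map x in [-1, 1] to integer labels in [0, mu] (256 classes for mu=255)."""
    y = mulaw(x, mu)
    return np.floor((y + 1) / 2 * mu + 0.5).astype(np.int32)


pcm = np.array([-32768, 0, 32767], dtype=np.int16)
print(mulaw_quantize(to_float_audio(pcm)).tolist())  # [0, 128, 255]
```

Skipping the scaling step (for example, `mulaw_quantize(np.array([2.0]))`) yields a label above 255. This matches the kind of out-of-range value reported above.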

Did you test it on CPU?

Can it run in real time?
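Real-time performance is usually judged by the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio. Values below 1.0 mean faster than real time. Below is a minimal sketch. `measure_rtf` and the generator passed to it are hypothetical stand-ins for a call into `synthesis.py`.

```python
import time


def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Wall-clock synthesis time divided by the duration of the generated audio."""
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(generate, sample_rate):
    """Time `generate()`, which must return a sequence of audio samples."""
    start = time.perf_counter()
    samples = generate()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


# 2 s of compute for 4 s of 16 kHz audio -> faster than real time.
print(real_time_factor(2.0, 64000, 16000))  # 0.5
```

The measured value depends on CPU versus GPU and on whether model loading is included, so it helps to state which was timed.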

About the inference speed

Hi, thanks for your work. I get an error during training when I set batch_size larger than 1:

Cannot batch tensors with different shapes in component 0. First element had shape [52480] and element 1 had shape [47872].

It seems that the differing wav lengths are the cause, so I set batch_size to 1 and the error no longer appears. But that is only a workaround; I would never train with batch_size = 1. Do you have any idea how to fix this? Thank you!
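The error comes from stacking waveforms of different lengths into one tensor. One common workaround is to randomly crop each clip to a fixed length and pad clips that are too short. The NumPy sketch below assumes that training on fixed-length segments is acceptable; the helper names and the 16000-sample length are hypothetical. In TensorFlow, `tf.data.Dataset.padded_batch` is an alternative that pads every element to the longest one in the batch.

```python
import numpy as np


def crop_or_pad(wav, length, rng, pad_value=0):
    """Return exactly `length` samples: a random crop if `wav` is longer,
    otherwise `wav` right-padded with `pad_value`."""
    wav = np.asarray(wav)
    if len(wav) >= length:
        start = int(rng.integers(0, len(wav) - length + 1))
        return wav[start:start + length]
    return np.pad(wav, (0, length - len(wav)), constant_values=pad_value)


def make_batch(wavs, length, seed=0, pad_value=0):
    """Stack variable-length clips into a (batch, length) array."""
    rng = np.random.default_rng(seed)
    return np.stack([crop_or_pad(w, length, rng, pad_value) for w in wavs])


# The two lengths from the error message above.
batch = make_batch([np.zeros(52480), np.zeros(47872)], length=16000)
print(batch.shape)  # (2, 16000)
```

If local conditioning features are fed alongside the audio, they would have to be cropped at the matching frame offset (the audio offset divided by the hop size). This sketch does not handle that.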

PS: in modules.py, line 159, did you mean to use tf.nn.leaky_relu()? tf.nn.relu() has no alpha argument.
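For reference, `tf.nn.relu` takes no slope argument. `tf.nn.leaky_relu(features, alpha=0.2)` keeps a small slope for negative inputs. Below is a NumPy sketch of both functions. For 0 <= alpha <= 1, leaky ReLU equals max(alpha * x, x).

```python
import numpy as np


def relu(x):
    """Standard ReLU: negative inputs become 0; there is no slope parameter."""
    return np.maximum(x, 0.0)


def leaky_relu(x, alpha=0.2):
    """Leaky ReLU for 0 <= alpha <= 1: negative inputs are scaled by alpha."""
    return np.maximum(alpha * x, x)


x = np.array([-1.0, 0.0, 2.0])
print(relu(x).tolist())        # [0.0, 0.0, 2.0]
print(leaky_relu(x).tolist())  # [-0.2, 0.0, 2.0]
```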
