
fftnet's Introduction

FFTNet

A TensorFlow implementation of FFTNet.

Quick Start

  1. Install requirements

pip install -r requirements.txt

  2. Download data

  3. Extract Features

python preprocess.py \
    --name cmu_arctic \
    --in_dir your_data_dir \
    --out_dir the_feature_dir \
    --hparams "input_type=mulaw-quantize"  # mulaw-quantize worked better in my tests

  4. Training Process

You can split your train.txt into two parts, train.txt and val.txt, in your data_dir.

python train.py \
    --train_file "your_data_dir/train.txt" \
    --val_file "your_data_dir/val.txt" \
    --name "upsample_slt"

  5. Synthesis Process

python synthesis.py \
    --checkpoint_path "your_checkpoint_dir" \
    --output "your_output_dir" \
    --local_path "local_condition_path"
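
The training step expects both a train.txt and a val.txt. A minimal sketch of one way to produce them, assuming the metadata file holds one utterance per line (that format is an assumption, as are the helper names `split_lines` and `split_file` and the 10% validation share):

```python
import random


def split_lines(lines, val_fraction=0.1, seed=0):
    """Shuffle metadata lines and split them into (train, val) lists."""
    lines = [ln for ln in lines if ln.strip()]  # drop blank lines
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)       # fixed seed for a reproducible split
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def split_file(src_path, train_path, val_path, val_fraction=0.1, seed=0):
    """Read src_path and write the two parts to train_path and val_path."""
    with open(src_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    train, val = split_lines(lines, val_fraction, seed)
    with open(train_path, "w", encoding="utf-8") as f:
        f.write("\n".join(train) + "\n")
    with open(val_path, "w", encoding="utf-8") as f:
        f.write("\n".join(val) + "\n")
    return len(train), len(val)
```

Writing the two parts to new files rather than overwriting the original metadata keeps the full list available if a different split is wanted later.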


fftnet's Issues

Training with a new dataset

Hi,

I tried to train on a new dataset of audio files, but I already run into problems at the preprocessing step:
python preprocess.py --name=my_dataset --in_dir=path_to_my_dataset_files/ --out_dir=feature_dir/ --hparams "input_type=mulaw-quantize"
but I get an AssertionError:

Traceback (most recent call last):
  File "preprocess.py", line 53, in <module>
    assert name in ["cmu_arctic", "ljspeech"]
AssertionError
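
The assertion shows that preprocess.py accepts only the names "cmu_arctic" and "ljspeech", so a custom name such as my_dataset is rejected before any audio is read. How the script dispatches on the name is not shown in this thread. The sketch below is therefore a hypothetical pattern, not the repository's code: it shows one way a script could register extra datasets and report unsupported names with a clearer error. `DATASET_BUILDERS`, `register_dataset`, `list_my_dataset`, and `get_builder` are all invented for illustration.

```python
import glob
import os
from typing import Callable, Dict, List

# Hypothetical registry: dataset name -> function that lists its wav files.
DATASET_BUILDERS: Dict[str, Callable[[str], List[str]]] = {}


def register_dataset(name):
    """Decorator that adds a file-listing function to the registry."""
    def decorator(fn):
        DATASET_BUILDERS[name] = fn
        return fn
    return decorator


@register_dataset("my_dataset")
def list_my_dataset(in_dir):
    # Assumes a flat directory of .wav files; adjust for other layouts.
    return sorted(glob.glob(os.path.join(in_dir, "*.wav")))


def get_builder(name):
    """Look up a dataset by name, failing with a descriptive message."""
    if name not in DATASET_BUILDERS:
        raise ValueError(
            f"Unknown dataset {name!r}; supported: {sorted(DATASET_BUILDERS)}"
        )
    return DATASET_BUILDERS[name]
```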

Out of range label values at beginning of training?

First, thank you very much for this implementation. Great work!

Training fails immediately, and it looks as if invalid label values are being passed into the cross-entropy loss.

Any ideas?

Receptive Field: 2048 samples
pad value: 0
Start new training....
/home/rig/speech/fftnet/FFTNet/utils/__init__.py:67: RuntimeWarning: invalid value encountered in log1p
  return np.log1p(x) if isnumpy or isscalar else tf.log1p(x)
Traceback (most recent call last):
  File "/home/rig/.conda/envs/fftn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/rig/.conda/envs/fftn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/rig/.conda/envs/fftn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of -2147483648 which is outside the valid range of [0, 256).  Label values: 328 327 327 326 326 327 326 325 326 327 327 328 331 333 333 334 335 335 336 340 343 344 347 350 352 354 360 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 
...
-2147483648 -2147483648 -2147483648 363 361 362 361 362 364 363 363 362 360 362 361 360 363 364 364 363 361 361 358 356 356 356 356 358 358 361 364 364 -2147483648 -2147483648 365 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648
	 [[Node: model/loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/loss/SparseSoftmaxCrossEntropyWithLogits/Reshape, model/loss/SparseSoftmaxCrossEntropyWithLogits/Reshape_1)]]
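
The valid range [0, 256) in the error matches 8-bit mu-law quantization. Labels such as 328 fall outside it, and -2147483648 is the minimum int32 value, which is what a NaN becomes on many platforms when cast to int32. One plausible cause, not confirmed in this thread, is a waveform that is not scaled to [-1, 1] or that contains non-finite values before quantization. Below is a minimal sketch of a sanity check. It assumes the standard mu-law formula, which may differ from the repository's own implementation, and `mulaw_quantize` and `check_waveform` are hypothetical names:

```python
import numpy as np

MU = 255  # 256 classes, matching the [0, 256) range in the error message


def mulaw_quantize(x, mu=MU):
    """Standard mu-law companding, then quantization to integer labels."""
    x = np.asarray(x, dtype=np.float64)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)


def check_waveform(x):
    """Raise if the waveform would yield labels outside [0, mu]."""
    x = np.asarray(x, dtype=np.float64)
    if not np.all(np.isfinite(x)):
        raise ValueError("waveform contains NaN or inf values")
    peak = float(np.max(np.abs(x)))
    if peak > 1.0:
        raise ValueError(f"waveform peak {peak} exceeds 1.0; rescale to [-1, 1]")
    return x


# Usage: validate first, then quantize.
labels = mulaw_quantize(check_waveform(np.array([-1.0, 0.0, 1.0])))
```

Running a check like this over the extracted features before training would show whether the bad labels come from the data rather than from the model.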

About the inference speed

Hi, thanks for your work. I run into a problem during training when I set batch_size larger than 1:

Cannot batch tensors with different shapes in component 0. First element had shape [52480] and element 1 had shape [47872].

It seems that the differing lengths of the wav files are the cause, so I set batch_size to 1 and the problem went away. But that is only a workaround, and I do not want to train with batch_size = 1. Do you have any idea how to fix this? Thank you.

P.S. In modules.py line 159, did you mean to use tf.nn.leaky_relu()? tf.nn.relu() has no alpha argument.
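
The batching error arises because tensors of different lengths cannot be stacked into one batch. A common workaround is to crop or pad every waveform to the same number of samples before batching. The NumPy sketch below assumes that training on fixed-length excerpts is acceptable. `fixed_length_crop` and `make_batch` are hypothetical helpers, not part of this repository, and any local conditioning features would need to be cropped in alignment with the audio.

```python
import numpy as np


def fixed_length_crop(wav, num_samples, rng, pad_value=0):
    """Return exactly num_samples samples: random crop if long, right-pad if short."""
    wav = np.asarray(wav)
    if len(wav) < num_samples:
        return np.pad(wav, (0, num_samples - len(wav)),
                      mode="constant", constant_values=pad_value)
    start = rng.integers(0, len(wav) - num_samples + 1)  # upper bound is exclusive
    return wav[start:start + num_samples]


def make_batch(wavs, num_samples, seed=0):
    """Stack variable-length waveforms into one (batch, num_samples) array."""
    rng = np.random.default_rng(seed)
    return np.stack([fixed_length_crop(w, num_samples, rng) for w in wavs])


# Usage with the two lengths from the error message above.
batch = make_batch([np.zeros(52480), np.zeros(47872)], num_samples=8192)
```

If whole utterances are needed instead of excerpts, padding each batch to its longest element (for example with tf.data's padded_batch) is the usual alternative, at the cost of masking the padded positions in the loss.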
