azraelkuan / tensorflow_wavenet_vocoder

WaveNet vocoder using TensorFlow

Python 100.00%

wavenet-vocoder wavenet vocoder tensorflow speech-synthesis python3

tensorflow_wavenet_vocoder's Issues

Some questions about my implementation

Hi azraelkuan, thanks for your work and for sharing it!
I ran into three problems while trying the code.

My environment: Windows 10, python==3.6.7, tensorflow==1.11, Anaconda 3.

First, after running "preprocess.py", my file LJSpeech-1-mel.npy was only 176 KB and LJSpeech-1-audio.npy was only 281 KB. I suspect that either the output is being overwritten repeatedly or the problem comes from a difference between Windows and Linux, but I am not sure.

Second, during the synthesis step I could not find a file called "eval.txt". After preprocessing, the output path contains only three files: LJSpeech-1-audio.npy, LJSpeech-1-mel.npy, and train.txt.

Third, the command-line arguments seem to have changed and no longer match the command in the README. For '--eval_txt' I simply passed the preprocessing output folder.

tensorflow_wavenet_vocoder>python mul_generate.py --eval_txt ./FeaPath/ --wav_out_path ./WavOut/ checkpoint ./log_ljspeech/train/2018-11-18T18-07-48/model.ckpt-99999 ---hparams gc_enable=False,global_channel=0,global_cardinality=0,NPY_DATAROOT=/your_npy_datadir/,sample_rate=22050
usage: mul_generate.py [-h] [--logdir LOGDIR] [--temperature TEMPERATURE] [--save_every SAVE_EVERY] [--eval_txt EVAL_TXT] [--hparams HPARAMS] checkpoint
mul_generate.py: error: unrecognized arguments: --wav_out_path checkpoint ./log_ljspeech/train/2018-11-18T18-07-48/model.ckpt-99999 ---hparams gc_enable=False,global_channel=0,global_cardinality=0,NPY_DATAROOT=/your_npy_datadir/,sample_rate=22050

Or can this code not run on Windows?
Tell me if I'm wrong, thanks ^_^
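
The usage line printed in this report suggests that this version of mul_generate.py takes the checkpoint as a positional argument and has no --wav_out_path option. A minimal sketch of a parser reconstructed only from that usage text (argument types and defaults are assumptions, the hparams string is shortened, and the real script may differ) shows which invocation shape would be accepted:

```python
import argparse


def build_parser():
    # Reconstructed from the printed usage line only; this is NOT the
    # repository's actual parser. Types and defaults are assumptions.
    parser = argparse.ArgumentParser(prog="mul_generate.py")
    parser.add_argument("--logdir")
    parser.add_argument("--temperature")
    parser.add_argument("--save_every")
    parser.add_argument("--eval_txt")
    parser.add_argument("--hparams")
    # Positional: the path is given on its own, with no "checkpoint" keyword.
    parser.add_argument("checkpoint")
    return parser


parser = build_parser()

# Accepted shape: options with two dashes each, then the checkpoint path.
args = parser.parse_args([
    "--eval_txt", "./FeaPath/",
    "--hparams", "gc_enable=False,sample_rate=22050",
    "./log_ljspeech/train/2018-11-18T18-07-48/model.ckpt-99999",
])
print(args.checkpoint)

# The reported command adds --wav_out_path, the literal word "checkpoint",
# and a three-dash ---hparams; parse_known_args collects what is left over.
_, unknown = parser.parse_known_args([
    "--eval_txt", "./FeaPath/",
    "--wav_out_path", "./WavOut/",
    "checkpoint",
    "./log_ljspeech/train/2018-11-18T18-07-48/model.ckpt-99999",
    "---hparams", "gc_enable=False",
])
print(unknown)
```

Under these assumptions, dropping --wav_out_path and the word "checkpoint", and writing --hparams with two dashes, would give a command the parser accepts. Whether '--eval_txt' expects a folder or a text file cannot be determined from the usage line alone.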

Missing the last segment of the local condition

s = np.random.randint(0, len(local_condition) - max_frames)

--> s = np.random.randint(0, len(local_condition) - max_frames + 1)
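
np.random.randint(low, high) draws from the half-open interval [low, high), so the largest start index the original line can return is len(local_condition) - max_frames - 1, and the window ending at the final frame is never selected. A small deterministic sketch of the reachable start indices (the helper name and the sizes 10 and 4 are illustrative, not taken from the repository):

```python
def reachable_starts(num_frames, max_frames, plus_one):
    # np.random.randint(0, high) returns integers in [0, high), i.e. the
    # same values as range(high); enumerate them instead of sampling.
    high = num_frames - max_frames + (1 if plus_one else 0)
    return list(range(high))


num_frames, max_frames = 10, 4  # illustrative sizes only

before = reachable_starts(num_frames, max_frames, plus_one=False)
after = reachable_starts(num_frames, max_frames, plus_one=True)

# The window starting at s covers frames s .. s + max_frames - 1.
print(max(before) + max_frames - 1)  # 8: frame 9 is never reached
print(max(after) + max_frames - 1)   # 9: the final frame is reachable
```

The corrected bound also covers the case len(local_condition) == max_frames, where the original call would pass high = 0 and np.random.randint raises a ValueError.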

How many training steps are needed for acceptable results?

Thanks for sharing this great work.
I have trained on Chinese data for 48000 steps, but I still get only noise from mul_generate.py. I also found that the loss first decreases and then increases. Can you give me some advice? Thanks.

How to increase the batch size?

Hi azraelkuan!
I am using your WaveNet vocoder on my own corpus with all other parameters unchanged, but the loss does not converge.
My GPU is a 2080 Ti with 11 GB of memory, but the batch size can only be 1; when I increase it to 2, it runs out of memory.
How can I increase the batch size?
Thanks!

Environment question

Is it also possible to train without a GPU?

Some questions about my implementation, please help me

Exception in thread Thread-1:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\wtx\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\Anaconda3\envs\wtx\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "D:\tf-wavenet_vocoder-master\apps\vocoder\datasets\data_feeder.py", line 145, in thread_main
    assert_ready_for_upsampling(wav, local_condition)
  File "D:\tf-wavenet_vocoder-master\apps\vocoder\datasets\data_feeder.py", line 50, in assert_ready_for_upsampling
    assert len(x) % len(c) == 0 and len(x) // len(c) == audio.get_hop_size()
AssertionError
(The identical traceback is reported for Thread-2 through Thread-8.)
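
The assertion in this traceback holds only when the audio length is exactly the number of conditioning frames times the hop size. A minimal length-only sketch for checking a preprocessed pair before training (the function name and the sample numbers are hypothetical; in the repository the hop size comes from audio.get_hop_size()):

```python
def upsampling_mismatch(num_samples, num_frames, hop_size):
    """Return how many samples the audio is off by (0 means the assert passes).

    Mirrors the failing check
        len(x) % len(c) == 0 and len(x) // len(c) == hop_size
    which, for num_frames > 0, is the same as
        len(x) == len(c) * hop_size
    """
    if num_frames <= 0:
        raise ValueError("conditioning features are empty")
    return num_samples - num_frames * hop_size


hop_size = 256  # illustrative value, not necessarily the repository's setting

print(upsampling_mismatch(25600, 100, hop_size))  # 0
print(upsampling_mismatch(25700, 100, hop_size))  # 100
print(upsampling_mismatch(25344, 100, hop_size))  # -256
```

A positive result means the audio has extra samples relative to the features, and a negative one means it is too short. Either case may point at the preprocessing step, or at hop-size settings that differ between preprocessing and training.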

Could you please share some trained parameters?

Dear Kuan Chen,
Could you please share your checkpoints for LJSpeech and CMU_arctic, as well as the corresponding hparams settings? That would be very helpful for me.
Thank you in advance!

Songxiang Liu

Does your project generate raw speech samples from acoustic features, or from text, like TTS?

I found that the following project generates raw speech samples from acoustic features, like earlier vocoders (WORLD, DIO, etc.):
https://github.com/r9y9/wavenet_vocoder

But your project seems to generate waveforms conditioned on text files?

So its input might differ from that of a traditional vocoder, right?

Thanks

