Comments (30)
@lapwing I updated some code today; the quality of the generated speech is now much better than before (87K steps). It also works with a small reduce factor.
eval-87k-r2.zip
I will also eval the performance these days.
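For readers unfamiliar with the term: the reduce factor r is the number of mel frames the decoder emits per step, so a smaller r (e.g. r=2) means finer-grained decoding at the cost of more decoder steps. A minimal numpy sketch of the unpacking involved (shapes illustrative, not necessarily the repo's exact sizes):

```python
import numpy as np

r = 2            # reduce factor: mel frames emitted per decoder step
n_mels = 80      # mel channels (illustrative)
steps = 5        # decoder steps

# Each decoder step packs r frames into one output vector...
decoder_out = np.zeros((steps, n_mels * r))
# ...which is unpacked into the final mel spectrogram:
mel = decoder_out.reshape(steps * r, n_mels)
assert mel.shape == (10, 80)
```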
from gst-tacotron.
@syang1993 Thanks! Will wait to see good results.
BTW, could we feed the eval mels to r9y9's wavenet?
@syang1993 Hi, could you send me the trained model that you used to generate the samples for eval-87k-r2.zip? Thank you!
Yes, the voice in the link is a little shaky. Can you share the hyper-parameters and the alignments of the test sentences from your experiments?
Besides, I found that character-level inputs work better than phoneme-level in my earlier experiments, though the paper used phoneme inputs.
The new update seems to have set cmu-dict to false. Is that what you used to get those results?
Thank you very much, and sorry for the delay! I have uploaded the files you wanted to the link above, in case you still want to take a look.
The quality is very good! Do you have any plans to release the checkpoints?
@fazlekarim Yes, the hparams in the repo match my experiments exactly.
@lapwing I didn't see the setting in your link; anyway, I guess you can try the new code.
Since I only have a single GPU, it will take several days to test the new code, so I'd be very grateful if you could help test the performance. For now, I find the quality is better, but the style is learned more slowly than before (100K steps). I will continue training to see if it gets stable results with and without style attention. I will also upload the checkpoints and new samples once I finish these experiments in a few days.
Sure! I will do the same and will upload the results so that we can compare. Thanks for your nice work!
@syang1993 Are those samples generated directly from Tacotron? The audio quality is amazing.
@butterl Which sample do you mean? The samples attached in this issue were generated from the gst-tacotron repo directly using Blizzard 2011 data. The samples on the demo page were also generated directly from gst-tacotron using Blizzard 2013 data. I also did experiments with tacotron using BC2011 data; the samples can be found in keithito/tacotron#182
@syang1993 Thanks for reaching out. I've tried keithito/tacotron and Rayhane-mamah/Tacotron-2; both seem to generate wavs with shake & echo like @lapwing's sample, or even worse (even with a 300K-step WaveNet as vocoder), while the sample wavs you attached are much clearer. You posted "I updated some codes today" 15 days ago, but I couldn't find the exact patch.
I will try this amazing repo to reproduce the results.
@butterl Maybe you can try the modified keithito tacotron in my repo, which is forked from the original and fixes the issues to support a small reduce factor. @fazlekarim may have tried this repo; I'm not sure whether he got good results. The commit for "I updated some code today" is ba10ee1
@butterl I was satisfied with my results. I can show them to you if you are interested.
@fazlekarim Thanks for reaching out. I'd be very interested in your samples, because mine are much worse with other repos even when trained to 400K steps. I will switch to this one and give feedback.
This is the only one I have saved on this computer. Let me know what you think about it.
@fazlekarim Thanks for reaching out. The wav is good, but it seems to have more shaking than the eval-87k-r2.zip samples @syang1993 shared.
@syang1993 I trained on my machine and the results are good, but eval fails sometimes (2 out of 3 runs), and with use_gst=False eval returns an error:
Use random weight for GST.
Traceback (most recent call last):
File "eval.py", line 65, in <module>
main()
File "eval.py", line 61, in main
run_eval(args)
File "eval.py", line 25, in run_eval
synth.load(args.checkpoint, args.reference_audio)
File "/home/public/gst-tacotron/synthesizer.py", line 29, in load
self.model.initialize(inputs, input_lengths, mel_targets=mel_targets, reference_mel=reference_mel)
File "/home/public/gst-tacotron/models/tacotron.py", line 88, in initialize
style_embeddings = tf.matmul(random_weights, tf.nn.tanh(gst_tokens))
UnboundLocalError: local variable 'gst_tokens' referenced before assignment
@butterl How many steps did you train? Did you use the BC2013 or BC2011 data?
If you set use_gst=False, it means you will not use the style attention, so you must feed a reference_audio to the model during eval.
@syang1993 The training step count is 77k.
I tried two experiments on eval:
- use_gst=True, feeding a wav from the training set: the output sometimes fails (not aligned, and the wav is quiet)
- use_gst=False, with a reference_audio path fed: the error below turns up; it seems the network shapes don't match
Loading checkpoint: ./logs-tacotron/model.ckpt-77000
Traceback (most recent call last):
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [384,256] rhs shape= [512,256]
[[Node: save/Assign_152 = Assign[T=DT_FLOAT, _class=["loc:@model/inference/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/memory_layer/kernel, save/RestoreV2/_213)]]
[[Node: save/RestoreV2/_154 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_160_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:169)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "eval.py", line 65, in <module>
main()
File "eval.py", line 61, in main
run_eval(args)
File "eval.py", line 25, in run_eval
synth.load(args.checkpoint, args.reference_audio)
File "/home/public/gst-tacotron/synthesizer.py", line 37, in load
saver.restore(self.session, checkpoint_path)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1755, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [384,256] rhs shape= [512,256]
[[Node: save/Assign_152 = Assign[T=DT_FLOAT, _class=["loc:@model/inference/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/memory_layer/kernel, save/RestoreV2/_213)]]
[[Node: save/RestoreV2/_154 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_160_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:169)]]
Caused by op 'save/Assign_152', defined at:
File "eval.py", line 65, in <module>
main()
File "eval.py", line 61, in main
run_eval(args)
File "eval.py", line 25, in run_eval
synth.load(args.checkpoint, args.reference_audio)
File "/home/public/gst-tacotron/synthesizer.py", line 36, in load
saver = tf.train.Saver()
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1293, in __init__
self.build()
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1302, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1339, in _build
build_save=build_save, build_restore=build_restore)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 796, in _build_internal
restore_sequentially, reshape)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 471, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 161, in restore
self.op.get_shape().is_fully_defined())
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 280, in assign
validate_shape=validate_shape)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 58, in assign
use_locking=use_locking, name=name)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [384,256] rhs shape= [512,256]
[[Node: save/Assign_152 = Assign[T=DT_FLOAT, _class=["loc:@model/inference/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/memory_layer/kernel, save/RestoreV2/_213)]]
[[Node: save/RestoreV2/_154 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_160_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:169)]]
@butterl Since the model is more complex than tacotron, it may need more data and training steps to converge. The flag use_gst selects between two different models; you must train a new model with the use_gst=False setting.
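The [384,256] vs [512,256] mismatch is consistent with that: the memory layer's input width depends on whether the style embedding is concatenated to the encoder outputs, so a checkpoint trained under one use_gst setting cannot be restored into a graph built under the other. A minimal sketch of why the kernel shapes diverge (dimensions illustrative; 256 + 128 = 384 happens to match the lhs in the log, but the repo's real sizes may differ):

```python
import numpy as np

enc_dim = 256     # encoder output width (illustrative)
style_dim = 128   # style embedding width (illustrative)

# Kernel of a dense layer fed encoder outputs concatenated with a style embedding:
kernel_with_style = np.zeros((enc_dim + style_dim, 256))
# Kernel of the same layer when no style embedding is concatenated:
kernel_plain = np.zeros((enc_dim, 256))

# TF's Saver.restore performs an Assign, which requires identical shapes,
# so restoring one variant's checkpoint into the other's graph fails.
assert kernel_with_style.shape != kernel_plain.shape
```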
@syang1993 I tried the 100K model and the output is good, but the eval text gets cut at ",".
E.g. "he'd like to help the girl, who's wearing the red coat." only outputs the wav before ", ", and outputs everything when the "," is removed. I added some prints:
wav = audio.inv_preemphasis(wav)
print("wav len=" + str(len(wav)))
end_point = audio.find_endpoint(wav)
wav = wav[:end_point]
print("wav len=" + str(len(wav)))
which printed:
wav len=400600
wav len=102400
It seems the wav is cut here by the silence detection.
@butterl Without the third line, does the generated wav contain the latter speech? If it does, maybe we need to modify the value of min_silence_sec (default 0.8) in the find_endpoint function. Thanks for pointing it out.
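To make the trade-off concrete, here is a self-contained sketch of a keithito-style find_endpoint: it scans for the first window of min_silence_sec that stays below an amplitude threshold, so a long mid-sentence pause at "," can be mistaken for the end of speech, while a larger min_silence_sec lets the pause through (sample rate and threshold values here are illustrative, not the repo's exact implementation):

```python
import numpy as np

SAMPLE_RATE = 16000  # illustrative

def find_endpoint(wav, threshold=0.01, min_silence_sec=0.8):
    """Return the index of the first sufficiently long silent window, or len(wav)."""
    window = int(SAMPLE_RATE * min_silence_sec)
    hop = window // 4
    for x in range(hop, len(wav) - window, hop):
        if np.max(np.abs(wav[x:x + window])) < threshold:
            return x + hop
    return len(wav)

# 1s of speech, a 1s pause (e.g. at a comma), then 1s more speech:
speech = 0.5 * np.ones(SAMPLE_RATE)
pause = np.zeros(SAMPLE_RATE)
wav = np.concatenate([speech, pause, speech])

# With the 0.8s default, the pause is treated as the endpoint...
assert find_endpoint(wav, min_silence_sec=0.8) < len(wav)
# ...while a 1.5s minimum keeps the whole utterance.
assert find_endpoint(wav, min_silence_sec=1.5) == len(wav)
```

Raising min_silence_sec (or lowering the threshold) keeps mid-sentence pauses, at the cost of more trailing silence in the output.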
I think the new code works very well. I trained up to 437k steps, and you can find the samples generated using your reference_audio-2.wav file at the following link:
https://www.dropbox.com/sh/8cbrog2mtc8h8xw/AABOTLi0j8-06At3zdrHeQNra?dl=0
However, I found that every time I run eval from the same checkpoint I get different results. Why is that?
@lapwing Thanks for sharing; it sounds good. I'm not sure why it generates different results; there may be an inference issue. I'm on a summer vacation these weeks and cannot test it, but I will test it later to find what causes this problem. If you figure it out, could you let me know? Thanks.
@syang1993 - is it possible to get the trained model that you used to generate the samples for eval-87k-r2.zip?
@ZohaibAhmed Hi, since I'm on a summer vacation these weeks, I will send it to you after I go back to school. Besides, you can train this model yourself using the Blizzard 2011 database; it will not take so long.
Thanks for your nice work. I have trained the model on the Blizzard 2013 dataset. The synthesized files from the 185k and 385k checkpoints are available at the following link. I used samples from LJ-Speech (LJ001-0001.wav) and Nancy (nancy.wav) as reference files for checking the performance. I also included the model.checkpoint files and the audio files at each step (step-185000-audio.wav, step-385000-audio.wav).
https://www.dropbox.com/sh/jhcynw65o1tmj7r/AABJN4cBotdbs-A5-Rk89vt0a?dl=0
Any idea on how to improve the shaking voice?
@lapwing Could you share the hyper-parameters? The pretrained model couldn't be reloaded with the default hyper-parameters. Thank you!
@lapwing Can you share the 437k model?
@lapwing Thanks for sharing , it sounds good. I'm not sure why it generate different results, there may exist a generation issue. [...]
At the top of eval.py, before anything else is imported, I put:
import random
random.seed(42)
import numpy
numpy.random.seed(42)
from tensorflow import set_random_seed
set_random_seed(42)
This sets a fixed seed for all random number generators that could be involved - and it does the trick. Now, I don't see any random numbers used in the gst-tacotron code itself that would cause randomness at inference time, but maybe something's going on in some imported lib. Anyway, the fixed seeds lead to reproducible results.
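One likely source of the inference-time randomness is inside the graph itself: keithito-derived Tacotron prenets apply dropout even at inference (I have not re-checked this repo's prenet, so treat this as an assumption), and TensorFlow draws those dropout masks from the graph-level seed, which would explain why set_random_seed makes runs repeatable. A numpy sketch of the effect (pure illustration, not the repo's code):

```python
import numpy as np

def prenet_like(x, rng, drop_rate=0.5):
    # Dropout kept on at inference, as in keithito-style prenets
    mask = rng.random(x.shape) >= drop_rate
    return x * mask / (1.0 - drop_rate)

x = np.ones(16)
# Two unseeded runs generally differ...
a = prenet_like(x, np.random.default_rng())
b = prenet_like(x, np.random.default_rng())
# ...while two runs from the same seed are identical:
c = prenet_like(x, np.random.default_rng(42))
d = prenet_like(x, np.random.default_rng(42))
assert np.array_equal(c, d)
```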
Hello! Thank you for your work! Could you send me the pretrained model please? [email protected]