
eval from checkpoints · gst-tacotron · HOT 30 · OPEN

syang1993 avatar syang1993 commented on June 11, 2024
eval from checkpoints

from gst-tacotron.

Comments (30)

syang1993 avatar syang1993 commented on June 11, 2024 3

@lapwing I updated some code today; with it, the quality of the generated speech is much better than before (87K steps). It also works with a small reduce factor.
eval-87k-r2.zip

I will also evaluate the performance over the next few days.


butterl avatar butterl commented on June 11, 2024 1

@syang1993 Thanks! Will wait to see the results.
BTW, could we feed the eval mels to r9y9's WaveNet vocoder?


peter05010402 avatar peter05010402 commented on June 11, 2024 1


@syang1993 Hi, could you send me the trained model you used to generate the samples in eval-87k-r2.zip? Thank you!


syang1993 avatar syang1993 commented on June 11, 2024

Yes, the voice in the link is a little shaky. Can you share the hyper-parameters and the alignments of the test sentences from your experiments?
Besides, I found that character-level inputs work better than phoneme-level inputs in my earlier experiments, though the paper used phoneme inputs.


fazlekarim avatar fazlekarim commented on June 11, 2024

The new update seems to set cmu-dict to False. Is that what you used to get those results?


marymirzaei avatar marymirzaei commented on June 11, 2024

Thank you very much! Sorry for the delay... I have uploaded the files you asked for to the link above, if you still want to take a look.
The quality is very good! Do you have any plans to release the checkpoints?


syang1993 avatar syang1993 commented on June 11, 2024

@fazlekarim Yes, the hparams in the repo match my experiments exactly.


syang1993 avatar syang1993 commented on June 11, 2024

@lapwing I didn't see the settings at your link; anyway, I guess you can try the new code.

Since I only have a single GPU, it will take several days to test the new code. I would be very grateful if you could help test the performance. For now, I find the quality is better, but the style is learned more slowly than before (100K steps). I will continue training to see if it gives stable results with and without style attention. I will also upload the checkpoints and new samples once I finish these experiments in a few days.


marymirzaei avatar marymirzaei commented on June 11, 2024

Sure! I will do the same and will upload the results so that we can compare. Thanks for your nice work!


butterl avatar butterl commented on June 11, 2024

@syang1993 Are those samples generated directly from Tacotron? The audio quality is amazing.


syang1993 avatar syang1993 commented on June 11, 2024

@butterl Which sample do you mean? The samples attached in this issue were generated directly from the gst-tacotron repo using Blizzard 2011 data. The samples on the demo page were also generated directly from gst-tacotron, using Blizzard 2013 data. I also ran experiments with Tacotron on the BC2011 data; those samples can be found in keithito/tacotron#182


butterl avatar butterl commented on June 11, 2024

@syang1993 Thanks for reaching out. I've tried keithito/tacotron and Rayhane-mamah/Tacotron-2; both seem to generate wavs with shake and echo like @lapwing's sample, or even worse (even with a 300K-step WaveNet as vocoder), while the sample wavs you attached are much clearer. You posted "I updated some codes today" 15 days ago, but I cannot find the exact patch.

Will try this repo to reproduce it.


syang1993 avatar syang1993 commented on June 11, 2024

@butterl Maybe you can try the modified keithito tacotron in my repo, which is forked from the original and fixed to support a small reduce factor. @fazlekarim may have tried this repo; I'm not sure whether he got good results. The commit for "I updated some code today" is ba10ee1


fazlekarim avatar fazlekarim commented on June 11, 2024

@butterl I was satisfied with my results. I can show them to you if you are interested.


butterl avatar butterl commented on June 11, 2024

@fazlekarim Thanks for reaching out. I would be very interested in your samples, because mine are much worse with other repos even when trained to 400K. I will now switch to this one and give feedback.


fazlekarim avatar fazlekarim commented on June 11, 2024

This is the only one I have saved on this computer. Let me know what you think about it.

eval-227300_ref-original.zip


butterl avatar butterl commented on June 11, 2024

@fazlekarim Thanks for reaching out. The wav is good, but it seems to have more shaking than the eval-87k-r2.zip @syang1993 shared.

@syang1993 I trained on my machine and the result is good, but eval fails sometimes (2 out of 3 runs).
(screenshot of the failed eval omitted)

And with use_gst=False, eval returns this error:

Use random weight for GST.
Traceback (most recent call last):
  File "eval.py", line 65, in <module>
    main()
  File "eval.py", line 61, in main
    run_eval(args)
  File "eval.py", line 25, in run_eval
    synth.load(args.checkpoint, args.reference_audio)
  File "/home/public/gst-tacotron/synthesizer.py", line 29, in load
    self.model.initialize(inputs, input_lengths, mel_targets=mel_targets, reference_mel=reference_mel)
  File "/home/public/gst-tacotron/models/tacotron.py", line 88, in initialize
    style_embeddings = tf.matmul(random_weights, tf.nn.tanh(gst_tokens))
UnboundLocalError: local variable 'gst_tokens' referenced before assignment
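The traceback suggests a branch issue: gst_tokens is apparently created only when GST is enabled, yet the random-weight path still references it. Below is a minimal sketch of that failure mode with an explicit guard added; only the names gst_tokens, random_weights, and the matmul come from the traceback, and everything else (function name, dimensions, the guard itself) is an assumption, not the repo's actual fix.

```python
import numpy as np

def build_style_embedding(use_gst, reference_mel=None, num_tokens=10, token_dim=256):
    """Hypothetical sketch of the branch in tacotron.py that raised UnboundLocalError."""
    if use_gst:
        # gst_tokens only exists on this branch; referencing it outside
        # this block is exactly the UnboundLocalError seen above.
        gst_tokens = np.random.randn(num_tokens, token_dim)
        random_weights = np.random.rand(1, num_tokens)
        return random_weights @ np.tanh(gst_tokens)
    if reference_mel is None:
        # Without GST there are no tokens to weight, so a reference audio
        # is required; fail fast with a clear message instead.
        raise ValueError("use_gst=False requires a reference_audio at eval time")
    # e.g. summarize the reference mel into a single style vector
    return reference_mel.mean(axis=0, keepdims=True)
```

With a guard like this, running eval with use_gst=False and no reference audio produces an actionable error rather than an UnboundLocalError deep inside graph construction.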


syang1993 avatar syang1993 commented on June 11, 2024

@butterl How many steps did you train? Did you use the BC2013 or the BC2011 data?

If you set use_gst=False, the style attention is not used, and you must feed a reference_audio to the model during eval.


butterl avatar butterl commented on June 11, 2024

@syang1993 The training step is 77k.
I tried two experiments on eval:

  1. use_gst=True, feeding a wav from the training set; the output sometimes fails (not aligned, and the wav is quiet).
  2. use_gst=False, with a reference_audio path fed; the error below comes out, and it seems the network shapes do not match:
Loading checkpoint: ./logs-tacotron/model.ckpt-77000
Traceback (most recent call last):
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [384,256] rhs shape= [512,256]
         [[Node: save/Assign_152 = Assign[T=DT_FLOAT, _class=["loc:@model/inference/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/memory_layer/kernel, save/RestoreV2/_213)]]
         [[Node: save/RestoreV2/_154 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_160_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:169)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eval.py", line 65, in <module>
    main()
  File "eval.py", line 61, in main
    run_eval(args)
  File "eval.py", line 25, in run_eval
    synth.load(args.checkpoint, args.reference_audio)
  File "/home/public/gst-tacotron/synthesizer.py", line 37, in load
    saver.restore(self.session, checkpoint_path)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1755, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [384,256] rhs shape= [512,256]
         [[Node: save/Assign_152 = Assign[T=DT_FLOAT, _class=["loc:@model/inference/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/memory_layer/kernel, save/RestoreV2/_213)]]
         [[Node: save/RestoreV2/_154 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_160_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:169)]]

Caused by op 'save/Assign_152', defined at:
  File "eval.py", line 65, in <module>
    main()
  File "eval.py", line 61, in main
    run_eval(args)
  File "eval.py", line 25, in run_eval
    synth.load(args.checkpoint, args.reference_audio)
  File "/home/public/gst-tacotron/synthesizer.py", line 36, in load
    saver = tf.train.Saver()
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1293, in __init__
    self.build()
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1302, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1339, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 796, in _build_internal
    restore_sequentially, reshape)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 471, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 161, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 280, in assign
    validate_shape=validate_shape)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 58, in assign
    use_locking=use_locking, name=name)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/home/public/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [384,256] rhs shape= [512,256]
         [[Node: save/Assign_152 = Assign[T=DT_FLOAT, _class=["loc:@model/inference/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/memory_layer/kernel, save/RestoreV2/_213)]]
         [[Node: save/RestoreV2/_154 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_160_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"](save/RestoreV2:169)]]


syang1993 avatar syang1993 commented on June 11, 2024

@butterl Since the model is more complex than Tacotron, it may need more data and training steps to converge. The use_gst flag selects between two different models; you must train a new model with use_gst=False.
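The restore failure above (lhs [384,256] vs rhs [512,256]) is consistent with this: the attention memory layer's input width depends on use_gst, because the encoder outputs are concatenated with a style embedding whose size differs between the two models. The exact widths below are assumptions chosen to reproduce the logged shapes (256-dim encoder outputs, plus a 128-dim reference-encoder embedding without GST or a 256-dim token embedding with GST):

```python
# Hypothetical dimension bookkeeping for model/inference/memory_layer/kernel.
ENCODER_DIM = 256      # assumed encoder output width
REF_ENCODER_DIM = 128  # assumed reference-encoder embedding width (use_gst=False)
GST_TOKEN_DIM = 256    # assumed style-token embedding width (use_gst=True)

def memory_layer_rows(use_gst):
    # The style embedding is concatenated onto each encoder output, so the
    # kernel's first dimension grows by whichever style width is in use.
    style_dim = GST_TOKEN_DIM if use_gst else REF_ENCODER_DIM
    return ENCODER_DIM + style_dim
```

Under these assumed widths, a checkpoint trained with one setting builds a [512,256] kernel and the other a [384,256] kernel, so tf.train.Saver.restore cannot load one into the other; the two settings really are two separate models.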


butterl avatar butterl commented on June 11, 2024

@syang1993 Tried the 100K model; the output is good, but the eval text gets cut at ",".
E.g. "he'd like to help the girl, who's wearing the red coat." only outputs the wav before the ",", and outputs everything when the "," is removed. I added some prints:

    wav = audio.inv_preemphasis(wav)
    print("wav len="+str(len(wav)))
    end_point = audio.find_endpoint(wav)
    wav = wav[:end_point]
    print("wav len="+str(len(wav)))

wav len=400600
wav len=102400
It seems the wav is cut here by the silence-trimming step.


syang1993 avatar syang1993 commented on June 11, 2024

@butterl Without the third line, does the generated wav contain the latter speech? If it does, maybe we need to raise the value of min_silence_sec (default 0.8) in the find_endpoint function. Thanks for pointing it out.
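For context, keithito-style Tacotron trims the synthesized audio with an energy-based endpoint detector, so a mid-sentence pause longer than min_silence_sec is mistaken for the end of the utterance. A hedged re-sketch of such a detector (the real find_endpoint lives in the repo's audio module; apart from min_silence_sec=0.8, the names and defaults here are assumptions):

```python
import numpy as np

def find_endpoint(wav, sample_rate=20000, threshold_db=-40, min_silence_sec=0.8):
    """Return the sample index just past the first window quieter than threshold."""
    window = int(sample_rate * min_silence_sec)
    hop = window // 4
    threshold = 10 ** (threshold_db / 20)  # dB -> linear amplitude
    for x in range(hop, len(wav) - window, hop):
        # A full window below the amplitude threshold counts as "silence".
        if np.max(np.abs(wav[x:x + window])) < threshold:
            return x + hop
    return len(wav)
```

Raising min_silence_sec (or lowering threshold_db) makes the detector tolerate longer pauses at punctuation, at the cost of keeping more trailing silence in the output.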


marymirzaei avatar marymirzaei commented on June 11, 2024

I think the new code works very well. I trained up to 437k; you can find the samples generated using your reference_audio-2.wav file at the following link:
https://www.dropbox.com/sh/8cbrog2mtc8h8xw/AABOTLi0j8-06At3zdrHeQNra?dl=0

However, I found that every time I run eval from the same checkpoint I get different results. Why is that?


syang1993 avatar syang1993 commented on June 11, 2024

@lapwing Thanks for sharing; it sounds good. I'm not sure why it generates different results; there may be a generation issue. I'm on summer vacation these weeks and cannot test it now, but I will test later to find what causes this problem. If you figure it out, could you let me know? Thanks.


ZohaibAhmed avatar ZohaibAhmed commented on June 11, 2024

@syang1993 - is it possible to get the trained model that you used to generate the samples for eval-87k-r2.zip?


syang1993 avatar syang1993 commented on June 11, 2024

@ZohaibAhmed Hi, since I'm on summer vacation these weeks, I will send it to you after I go back to school. Besides, you can train this model yourself using the Blizzard 2011 database; it will not take that long.


peter05010402 avatar peter05010402 commented on June 11, 2024

Thanks for your nice work. I have trained the model on the Blizzard 2013 dataset. The synthesized files from the 185k and 385k checkpoints are available at the following link. I used samples from LJ-Speech (LJ001-0001.wav) and Nancy (nancy.wav) as reference files to check the performance. I also included the model checkpoint files and the audio files at each step (step-185000-audio.wav, step-385000-audio.wav).
https://www.dropbox.com/sh/jhcynw65o1tmj7r/AABJN4cBotdbs-A5-Rk89vt0a?dl=0
Any idea on how to improve the shaking voice?

@lapwing could you share the hyper-parameters?
The pretrained model couldn't be reloaded with the default hyper-parameters.
Thank you!


ishandutta2007 avatar ishandutta2007 commented on June 11, 2024

@lapwing can you share the 437k model ?


renerocksai avatar renerocksai commented on June 11, 2024

@lapwing Thanks for sharing , it sounds good. I'm not sure why it generate different results, there may exist a generation issue. [...]

At the top of eval.py, before anything else is imported, I put

import random
random.seed(42)
import numpy
numpy.random.seed(42)
from tensorflow import set_random_seed
set_random_seed(42)

This sets a fixed seed for all the random number generators that could be involved, and it does the trick. I don't see any random numbers used in the gst-tacotron code itself that would cause randomness at inference time, but maybe something is going on in an imported lib. In any case, the fixed seeds make the results reproducible.


luantunez avatar luantunez commented on June 11, 2024

Hello! Thank you for your work! Could you send me the pretrained model please? [email protected]

