Comments (10)

yannier912 commented on September 3, 2024

The 🔗 links I added are formatted a bit oddly, but they still open.

from tacotronv2_wavernn_chinese.

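lturing commented on September 3, 2024

The wavernn in this repo does not use fmin, fmax, or similar parameters.
To be honest, I have not fine-tuned wavernn on D8, but I described a method in the adaptive branch.
I suggest training for a while longer and then listening to the results again.
I don't know what you plan to use this TTS for. If it is for learning, I recommend reading the code carefully so that you don't overlook details (I spent more than three months studying this).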

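yannier912 commented on September 3, 2024

@lturing Hello, what loss did your wavernn training finally reach?
I have a project that needs TTS, something like audiobooks. Time is tight, so I haven't gone very deep and am still learning. I used to work on images and video and am new to speech, so I'm a bit lost. Thank you for sharing.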

lturing commented on September 3, 2024

I'll take a look when I get home tonight.

yannier912 commented on September 3, 2024

OK, thank you!

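lturing commented on September 3, 2024

Sorry, I deleted the wavernn training records from the master branch. For lack of time, I have not fine-tuned wavernn either.
I recommend training wavernn on a multi-speaker dataset (for example thchs30, or a multi-speaker dataset from openslr; it does not have to be Chinese). A model trained this way generalizes better and can be used directly on new speakers without fine-tuning.
Also, because wavernn is very slow, you could use the multi-band melgan vocoder instead.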

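yannier912 commented on September 3, 2024

@lturing
Yesterday I asked someone in another thread. He fine-tuned to 680k steps, and with the loss at around 2.3 the results were fairly good, so I'll keep training and check again.
I'll try training wavernn on a multi-speaker dataset later to improve generalization. It would be great to end up with a universal vocoder. Thank you for the guidance!
I've also seen you mention the multi-band melgan vocoder several times. I downloaded the author's pretrained multi_band_melgan and parallel_wavegan models and plan to combine them directly with tacotron2 to see how they sound. Did you train multi_band_melgan yourself? The author's multi_band_melgan was trained on the Biaobei female-voice dataset, so I'm not sure how well it works for male voices.
Thanks for your patient help!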

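lturing commented on September 3, 2024

If multi-band melgan is trained on a single-speaker dataset, I expect it won't perform well on unseen voices either and will also need fine-tuning (you can try it first).
Also, the mel spectrogram predicted by the tacotron in this repo is in the range [-4, 4], but open-source vocoders (such as multi-band melgan) expect input mel spectrograms in a different range, so the values need to be adjusted.
Yes, I have one I trained myself. I found that a multi-band melgan vocoder trained on vctk (an English multi-speaker dataset) can be used directly to synthesize Chinese wav files.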
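
A minimal sketch of the range adjustment mentioned above, not the repository's actual code. It assumes the [-4, 4] values come from a symmetric dB normalization of the kind used in several Tacotron-2 implementations, and that the target vocoder expects log10 mel amplitudes. The constants `MIN_LEVEL_DB` and `REF_LEVEL_DB` and all function names are hypothetical; check them against the hparams your model was trained with. Any extra normalization the vocoder applies (for example mean/variance statistics) is not covered here.

```python
# Assumed hyperparameters (not confirmed in this thread).
MAX_ABS_VALUE = 4.0    # Tacotron mel output is assumed to lie in [-4, 4]
MIN_LEVEL_DB = -100.0  # assumed floor of the dB scale
REF_LEVEL_DB = 20.0    # assumed reference level subtracted before normalizing


def denormalize_to_db(mel_norm):
    """Map a symmetric [-4, 4] mel value back to decibels.

    Only arithmetic is used, so this also works elementwise on NumPy arrays.
    """
    scaled = (mel_norm + MAX_ABS_VALUE) / (2.0 * MAX_ABS_VALUE)  # -> [0, 1]
    return scaled * (-MIN_LEVEL_DB) + MIN_LEVEL_DB + REF_LEVEL_DB


def db_to_log10_amplitude(mel_db):
    """Convert decibels, defined as 20 * log10(amplitude), to log10(amplitude)."""
    return mel_db / 20.0


def tacotron_mel_to_log10(mel_norm):
    """Full conversion: [-4, 4] normalized mel -> log10 mel amplitude."""
    return db_to_log10_amplitude(denormalize_to_db(mel_norm))


if __name__ == "__main__":
    print(tacotron_mel_to_log10(0.0))  # -1.5 under the assumed constants
```

If the vocoder was trained on natural-log mels instead, multiply the result by ln(10); that too depends on the vocoder's own preprocessing.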

yannier912 commented on September 3, 2024

@lturing
Understood. Then I can train both multi-band melgan and wavernn on multi-speaker datasets and compare the results.
Thank you very much!

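coranholmes commented on September 3, 2024

The wavernn in this repo does not use fmin, fmax, or similar parameters.
To be honest, I have not fine-tuned wavernn on D8, but I described a method in the adaptive branch.
I suggest training for a while longer and then listening to the results again.
I don't know what you plan to use this TTS for. If it is for learning, I recommend reading the code carefully so that you don't overlook details (I spent more than three months studying this).

Hi, I just read the README in the adaptive branch. It only seems to explain how to fine-tune Tacotron and doesn't cover how to fine-tune wavernn. Could you briefly describe how? Thanks!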
