Giter Site home page Giter Site logo

fre-gan-pytorch's Introduction

Fre-GAN Vocoder

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Training:

python train.py --config config.json

Citation:

@misc{kim2021fregan,
      title={Fre-GAN: Adversarial Frequency-consistent Audio Synthesis}, 
      author={Ji-Hoon Kim and Sang-Hoon Lee and Ji-Hyun Lee and Seong-Whan Lee},
      year={2021},
      eprint={2106.02297},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Note

  • For more complete and end to end Voice cloning or Text to Speech (TTS) toolbox please visit Deepsync Technologies.

References:

fre-gan-pytorch's People

Contributors

rishikksh20 avatar thepowerfuldeez avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

fre-gan-pytorch's Issues

Training multi-speaker

Have you trained multi-speaker fre-gan? What is the better config on multi-speaker dataset?

Pre-trained model

Is there a pre-trained model available? It would save many hours of initial training. Many thanks.

comparison with univnet

Hi! How this work compares with UnivNet for which one you already implemented code: https://github.com/rishikksh20/UnivNet-pytorch
This paper is a little bit newer but afaik they're more concerned about generalizability of model for unseen speakers whlie this work focuses on overall quality (especially in high frequences)
can you maybe elaborate?

do nn upsample before mel condition

for generator code line 137:

if i >= self.cond_level: 
                mel = self.cond_up[i - self.cond_level](mel)
                x += mel
if i > self.cond_level:
    if output is None:
        output = self.res_output[i - self.cond_level - 1](x)
    else:
        output = self.res_output[i - self.cond_level - 1](output)

in the code, for the nn upsample input is: mel condition + resblock output.

image

but in the paper, nn upsample input only is resblock output or the last nn upsample output;

image

so, Is this more reasonable?

if i > self.cond_level:
    if output is None:
        output = self.res_output[i - self.cond_level - 1](x)
    else:
        output = self.res_output[i - self.cond_level - 1](output)
if i >= self.cond_level: 
                mel = self.cond_up[i - self.cond_level](mel)
                x += mel

Inconsistency with paper

https://arxiv.org/pdf/2106.02297.pdf

In section 2.3
"After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers"


if i == 0:
x = torch.cat([x, x_d1], dim=2)
if i == 1:
x = torch.cat([x, x_d2], dim=2)
i = i + 1

You are concatenating on the length dim resulting in an odd looking tensor where the first half is audio features and the 2nd half is DWT features, and local waveform/DWT information can't mix properly.

Is there any reason for this? I feel very confused looking at this, but you've done it twice so I assume there's some reason for this.

compare with hifigan

Hello, it is a great work. When use the predict mel spectrum , which is better between fregan and hifigan ?
Thanks in advance.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.