Neural Source-Filter BigVGAN

Just For Fun

nsf_bigvgan_mel

Dataset preparation

Place the dataset in the data_raw directory, one sub-directory per speaker, using the following file structure:

data_raw
├───speaker0
│   ├───000001.wav
│   ├───...
│   └───000xxx.wav
└───speaker1
    ├───000001.wav
    ├───...
    └───000xxx.wav
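
Before preprocessing, it can help to confirm that data_raw really follows this layout. The following is a minimal sketch and not part of the repository: `scan_dataset` and `report` are hypothetical helpers, and they assume that every speaker is a direct sub-directory of the root with its .wav files directly inside it.

```python
from pathlib import Path


def scan_dataset(root):
    """Return {speaker_name: sorted list of .wav file names} for a data_raw-style tree.

    Hypothetical helper: assumes each speaker is a direct sub-directory of
    `root` and that its .wav files sit directly inside that sub-directory.
    """
    root = Path(root)
    speakers = {}
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        wavs = sorted(
            f.name
            for f in speaker_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ".wav"
        )
        speakers[speaker_dir.name] = wavs
    return speakers


def report(speakers):
    """Build one summary line per speaker, flagging speakers with no audio."""
    lines = []
    for name, wavs in speakers.items():
        status = "ok" if wavs else "EMPTY"
        lines.append(f"{name}: {len(wavs)} wav file(s) [{status}]")
    return lines
```

Calling `print("\n".join(report(scan_dataset("./data_raw"))))` prints one line per speaker; a speaker marked EMPTY has no .wav files directly inside its directory.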

Install dependencies

  • 1, install software dependencies

    pip install -r requirements.txt

  • 2, download the pretrained model nsf_bigvgan_pretrain_32K.pth and test it

    python nsf_bigvgan_inference.py --config configs/nsf_bigvgan.yaml --model nsf_bigvgan_pretrain_32K.pth --wave test.wav

Data preprocessing

  • 1, resample to 32 kHz

    python prepare/preprocess_a.py -w ./data_raw -o ./data_bigvgan/waves-32k

  • 2, extract pitch

    python prepare/preprocess_f0.py -w data_bigvgan/waves-32k/ -p data_bigvgan/pitch

  • 3, extract mel spectrograms, shape [100, length]

    python prepare/preprocess_spec.py -w data_bigvgan/waves-32k/ -s data_bigvgan/mel

  • 4, generate the training index

    python prepare/preprocess_train.py

After preprocessing, the data_bigvgan directory should look like this:

data_bigvgan/
├── waves-32k
│   ├── speaker0
│   │   ├── 000001.wav
│   │   └── 000xxx.wav
│   └── speaker1
│       ├── 000001.wav
│       └── 000xxx.wav
├── pitch
│   ├── speaker0
│   │   ├── 000001.pit.npy
│   │   └── 000xxx.pit.npy
│   └── speaker1
│       ├── 000001.pit.npy
│       └── 000xxx.pit.npy
└── mel
    ├── speaker0
    │   ├── 000001.mel.pt
    │   └── 000xxx.mel.pt
    └── speaker1
        ├── 000001.mel.pt
        └── 000xxx.mel.pt
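
If a preprocessing step is interrupted, some utterances may end up without a pitch or mel file. The sketch below is one way to spot that before training. `find_missing_features` is a hypothetical helper, not a script shipped with the project, and it assumes exactly the directory names and file suffixes shown in the tree above.

```python
from pathlib import Path


def find_missing_features(root="data_bigvgan"):
    """List expected feature files that are absent, as paths relative to `root`.

    Assumes the layout shown above: waves-32k/<speaker>/<name>.wav,
    pitch/<speaker>/<name>.pit.npy and mel/<speaker>/<name>.mel.pt.
    """
    root = Path(root)
    missing = []
    for wav in sorted((root / "waves-32k").glob("*/*.wav")):
        speaker, stem = wav.parent.name, wav.stem
        expected = [
            root / "pitch" / speaker / f"{stem}.pit.npy",
            root / "mel" / speaker / f"{stem}.mel.pt",
        ]
        # Keep only the feature files that do not exist on disk.
        missing.extend(
            p.relative_to(root).as_posix() for p in expected if not p.exists()
        )
    return missing
```

An empty list means every resampled wav has both features; otherwise, rerunning the corresponding preprocess step is the likely fix.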

Train

  • 1, start training

    python nsf_bigvgan_trainer.py -c configs/nsf_bigvgan.yaml -n nsf_bigvgan

  • 2, resume training

    python nsf_bigvgan_trainer.py -c configs/nsf_bigvgan.yaml -n nsf_bigvgan -p chkpt/nsf_bigvgan/***.pth

  • 3, view log

    tensorboard --logdir logs/
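
When resuming, the checkpoint path has to be filled in by hand. As a convenience, the following sketch picks the most recently modified checkpoint in a directory. `latest_checkpoint` is a hypothetical helper, and the default `*.pth` pattern is only an assumption taken from the placeholder in the resume command; adjust it to whatever your checkpoint files are actually called.

```python
from pathlib import Path


def latest_checkpoint(directory, pattern="*.pth"):
    """Return the most recently modified file matching `pattern`, or None.

    Hypothetical helper: the default pattern mirrors the `***.pth` placeholder
    in the resume command and may need changing for your checkpoints.
    """
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Newest file by modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be passed to the trainer's `-p` option, for example `latest_checkpoint("chkpt/nsf_bigvgan")`.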

Inference

  • 1, export inference model

    python nsf_bigvgan_export.py --config configs/nsf_bigvgan.yaml --checkpoint_path chkpt/nsf_bigvgan/***.pt

  • 2, extract mel

    python spec/inference.py -w test.wav -m test.mel.pt

  • 3, extract F0

    python pitch/inference.py -w test.wav -p test.csv

  • 4, infer

    python nsf_bigvgan_inference.py --config configs/nsf_bigvgan.yaml --model nsf_bigvgan_g.pth --wave test.wav

    or

    python nsf_bigvgan_inference.py --config configs/nsf_bigvgan.yaml --model nsf_bigvgan_g.pth --mel test.mel.pt --pit test.csv

Augmentation of mel

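Acoustic models tend to produce over-smoothed mel spectrograms, so we apply a Gaussian blur to the mel input when training the vocoder. The blurred mel is mixed with a random share of the original mel before it is passed to the generator: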

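import torch

# Gaussian blur layer; get_gaussian_kernel is defined elsewhere in the project
model_b = get_gaussian_kernel(kernel_size=5, sigma=2, channels=1).to(device)
# blur the mel: add a channel axis, blur, then drop the axis again
mel_b = mel[:, None, :, :]
mel_b = model_b(mel_b)
mel_b = torch.squeeze(mel_b, 1)
# mix back a random share of the original mel, drawn from [0, 0.5)
mel_r = torch.rand(1).to(device) * 0.5
mel_b = (1 - mel_r) * mel_b + mel_r * mel
# generator step on the augmented mel
optim_g.zero_grad()
fake_audio = model_g(mel_b, pit)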

mel_gaussian_blur
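
To make the arithmetic of the augmentation concrete, here is a dependency-free sketch of the same three operations: building a Gaussian kernel, blurring, and mixing. It is not the project's `get_gaussian_kernel`, which returns a PyTorch layer. The function names below are hypothetical, and the zero padding at the borders is an assumption made for this illustration.

```python
import math
import random


def gaussian_kernel_2d(kernel_size=5, sigma=2.0):
    """Return a kernel_size x kernel_size Gaussian kernel normalised to sum to 1."""
    centre = (kernel_size - 1) / 2.0
    weights = [
        [
            math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2.0 * sigma ** 2))
            for x in range(kernel_size)
        ]
        for y in range(kernel_size)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur_2d(image, kernel):
    """Filter a 2-D list with `kernel`, treating out-of-range cells as zero.

    Zero padding is an assumption of this sketch, not a documented choice.
    """
    rows, cols = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - half, c + j - half
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


def mix(blurred, original, ratio):
    """Blend as in the training snippet: (1 - ratio) * blurred + ratio * original."""
    return [
        [(1.0 - ratio) * b + ratio * o for b, o in zip(b_row, o_row)]
        for b_row, o_row in zip(blurred, original)
    ]


# Toy "mel" with a single bright bin, so the smoothing is easy to see.
mel = [[0.0] * 7 for _ in range(7)]
mel[3][3] = 1.0

kernel = gaussian_kernel_2d(kernel_size=5, sigma=2.0)
ratio = random.random() * 0.5  # same range as torch.rand(1) * 0.5
augmented = mix(blur_2d(mel, kernel), mel, ratio)
```

Because the ratio is always below 0.5, the blurred mel carries more than half of the weight in every training step, while the random mix keeps the vocoder exposed to varying degrees of smoothing.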

Source of code and References

https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/tree/master/project/01-nsf

https://github.com/mindslab-ai/univnet [paper]

https://github.com/NVIDIA/BigVGAN [paper]
