This project forked from kan-bayashi/pytorchwavenetvocoder

WaveNet-Vocoder implementation with pytorch

Home Page: https://kan-bayashi.github.io/WaveNetVocoderSamples/

License: Apache License 2.0

PYTORCH-WAVENET-VOCODER

This repository provides a WaveNet vocoder implementation in PyTorch.

Requirements

  • cuda 8.0
  • python 3.6
  • virtualenv

A GPU with more than 10 GB of memory is recommended.

Setup

$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder/tools
$ make -j

Run example

All examples are based on Kaldi-style recipes.

# build SD model
$ cd egs/arctic/sd
$ ./run.sh 

# build SI-CLOSE model
$ cd egs/arctic/si-close
$ ./run.sh 

# build SI-OPEN model
$ cd egs/arctic/si-open
$ ./run.sh

If Slurm is installed on your servers, you can run the recipes with Slurm.

$ cd egs/arctic/sd

# edit configuration
$ vim cmd.sh # please edit as follows
---
# for local
# export train_cmd="run.pl"
# export cuda_cmd="run.pl --gpu 1"

# for slurm (you can change configuration file "conf/slurm.conf")
export train_cmd="slurm.pl --config conf/slurm.conf"
export cuda_cmd="slurm.pl --gpu 1 --config conf/slurm.conf"
---

$ vim conf/slurm.conf # edit <your_partition_name>
---
command sbatch --export=PATH  --ntasks-per-node=1
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0 --ntasks-per-node=1
option num_threads=1 --cpus-per-task 1  --ntasks-per-node=1
default gpu=0
option gpu=0 -p <your_partition_name>
option gpu=* -p <your_partition_name> --gres=gpu:$0 --time 10-00:00:00
---

# run the recipe
$ ./run.sh

When the recipe finishes, the generated wav files are in exp/train_*/wav_restored.
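
The conf/slurm.conf file above follows the Kaldi queue-config convention: a `command` line, `option name=value flags` rules where `$0` stands for the requested value, and `default` lines. The Python sketch below illustrates how such rules could be expanded into an sbatch command line. It is a simplified illustration under assumed semantics (an exact match takes precedence over a `*` wildcard), not the code of slurm.pl; `expand_options` and `CONFIG` are hypothetical names introduced only for this example.

```python
# Simplified illustration of how a Kaldi-style queue config could turn job
# options into sbatch flags. This is NOT the code of slurm.pl; expand_options
# and CONFIG are hypothetical names used only in this sketch.

CONFIG = """\
command sbatch --export=PATH  --ntasks-per-node=1
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0 --ntasks-per-node=1
option num_threads=1 --cpus-per-task 1  --ntasks-per-node=1
default gpu=0
option gpu=0 -p <your_partition_name>
option gpu=* -p <your_partition_name> --gres=gpu:$0 --time 10-00:00:00
"""


def expand_options(config_text, requested):
    """Build the sbatch command line for the requested options (assumed semantics)."""
    command = ""
    defaults = {}
    exact = {}     # (name, value) -> flags
    wildcard = {}  # name -> flags template containing $0
    for line in config_text.splitlines():
        parts = line.split(None, 2)
        if not parts:
            continue
        if parts[0] == "command":
            command = " ".join(line.split()[1:])
        elif parts[0] == "default":
            name, value = parts[1].split("=", 1)
            defaults[name] = value
        elif parts[0] == "option":
            name, value = parts[1].split("=", 1)
            flags = " ".join(parts[2].split()) if len(parts) > 2 else ""
            if value == "*":
                wildcard[name] = flags
            else:
                exact[(name, value)] = flags

    # Requested options override the defaults from the config file.
    options = dict(defaults)
    options.update(requested)

    pieces = [command]
    for name, value in options.items():
        value = str(value)
        if (name, value) in exact:      # exact rule wins over the wildcard rule
            flags = exact[(name, value)]
        elif name in wildcard:
            flags = wildcard[name].replace("$0", value)
        else:
            raise ValueError("no rule for option %s=%s" % (name, value))
        if flags:
            pieces.append(flags)
    return " ".join(pieces)


# Equivalent of cuda_cmd ("--gpu 1") and train_cmd (no GPU requested).
print(expand_options(CONFIG, {"gpu": 1}))
print(expand_options(CONFIG, {}))
```

Under these assumptions, requesting one GPU selects the `gpu=*` rule and adds `--gres=gpu:1` together with the partition, while a job without a GPU request falls back to `default gpu=0` and only receives the partition flag.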

Use pre-trained model to decode your own data

To synthesize your own data, you need the following files:

- checkpoint-final.pkl (model parameter file)
- model.conf (model configuration file)
- stats.h5 (feature statistics file)
- *.wav (your own wav file)
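
Before running the procedure, it may help to confirm that the files listed above are actually in place. The following is a minimal Python sketch of such a check; `check_model_dir` and `list_wavs` are hypothetical helpers that are not part of this repository, and the directory name in the usage section is assumed to match the pre-trained archive used in the procedure.

```python
from pathlib import Path

# File names are taken from the list above.
REQUIRED_FILES = ("checkpoint-final.pkl", "model.conf", "stats.h5")


def check_model_dir(model_dir):
    """Return the names of required model files that are missing from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


def list_wavs(wav_dir):
    """Return the sorted paths of all *.wav files under wav_dir (searched recursively)."""
    return sorted(str(p) for p in Path(wav_dir).rglob("*.wav"))


if __name__ == "__main__":
    # Assumed directory name: the unzipped pre-trained model archive.
    missing = check_model_dir("si-close_lr1e-4_wd0_bs20k_ns_up")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all model files found")
```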

The procedure is as follows:

$ cd egs/arctic/si-close

# download the pre-trained model, which was trained on 6 arctic speakers
$ wget "https://www.dropbox.com/s/xt7qqmfgamwpqqg/si-close_lr1e-4_wd0_bs20k_ns_up.zip?dl=0" -O si-close_lr1e-4_wd0_bs20k_ns_up.zip

# unzip 
$ unzip si-close_lr1e-4_wd0_bs20k_ns_up.zip

# make filelist of your own wav files
$ find <your_wav_dir> -name "*.wav" > wav.scp

# feature extraction
$ . ./path.sh
$ feature_extract.py \
    --waveforms wav.scp \
    --wavdir wav/test \
    --hdf5dir hdf5/test \
    --fs 16000 \
    --shiftms 5 \
    --minf0 <set_appropriate_value> \
    --maxf0 <set_appropriate_value> \
    --mcep_dim 24 \
    --mcep_alpha 0.41 \
    --highpass_cutoff 70 \
    --fftl 1024 \
    --n_jobs 1 
    
# make filelist of feature file
$ find hdf5/test -name "*.h5" > feats.scp
    
# decode with pre-trained model
$ decode.py \
    --feats feats.scp \
    --stats si-close_lr1e-4_wd0_bs20k_ns_up/stats.h5 \
    --outdir si-close_lr1e-4_wd0_bs20k_ns_up/wav \
    --checkpoint si-close_lr1e-4_wd0_bs20k_ns_up/checkpoint-final.pkl \
    --config si-close_lr1e-4_wd0_bs20k_ns_up/model.conf \
    --fs 16000 \
    --n_jobs 1 \
    --n_gpus 1

# make filelist of generated wav file
$ find si-close_lr1e-4_wd0_bs20k_ns_up/wav -name "*.wav" > wav_generated.scp

# restore noise shaping
$ noise_shaping.py \
    --waveforms wav_generated.scp \
    --stats si-close_lr1e-4_wd0_bs20k_ns_up/stats.h5 \
    --writedir si-close_lr1e-4_wd0_bs20k_ns_up/wav_restored \
    --fs 16000 \
    --shiftms 5 \
    --fftl 1024 \
    --mcep_dim_start 2 \
    --mcep_dim_end 27 \
    --mcep_alpha 0.41 \
    --mag 0.5 \
    --inv false \
    --n_jobs 1

Finally, you can listen to the generated wav files in si-close_lr1e-4_wd0_bs20k_ns_up/wav_restored.
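
The analysis parameters passed to feature_extract.py and noise_shaping.py must agree with each other. The short Python sketch below works through the arithmetic implied by the values in the commands above. The helper names are hypothetical, and the reading of `--mcep_dim_start` and `--mcep_dim_end` as an end-exclusive slice that includes the 0th coefficient is an assumption, not something stated in this document.

```python
# Values copied from the commands above.
FS = 16000           # --fs, sampling rate in Hz
SHIFT_MS = 5         # --shiftms, frame shift in milliseconds
FFTL = 1024          # --fftl, FFT length in samples
MCEP_DIM = 24        # --mcep_dim
MCEP_DIM_START = 2   # --mcep_dim_start
MCEP_DIM_END = 27    # --mcep_dim_end


def frame_shift_samples(fs, shift_ms):
    """Number of waveform samples between consecutive analysis frames."""
    return fs * shift_ms // 1000


def num_frequency_bins(fftl):
    """Non-redundant bins of a real-valued FFT of length fftl."""
    return fftl // 2 + 1


print("samples per frame shift:", frame_shift_samples(FS, SHIFT_MS))
print("frames per second:", 1000 // SHIFT_MS)
print("frequency bins:", num_frequency_bins(FFTL))

# Assumption: the slice is end-exclusive and covers coefficients 0..MCEP_DIM.
print("mcep slice width:", MCEP_DIM_END - MCEP_DIM_START)
print("mcep_dim + 1:", MCEP_DIM + 1)
```

With these settings each frame advances by 80 samples (200 frames per second), and the slice from 2 to 27 is 25 dimensions wide, which equals `mcep_dim + 1`. If you change `--mcep_dim` or `--shiftms` for feature extraction, the corresponding noise shaping options would presumably need to change with them.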

Results

Generated examples are available at https://kan-bayashi.github.io/WaveNetVocoderSamples/.

References

Please cite the following articles.

@article{hayashi2018sp,
  title={複数話者WaveNetボコーダに関する調査},
  author={林知樹 and 小林和弘 and 玉森聡 and 武田一哉 and 戸田智基},
  journal={電子情報通信学会技術研究報告},
  year={2018}
}
@inproceedings{hayashi2017multi,
  title={An Investigation of Multi-Speaker Training for WaveNet Vocoder},
  author={Hayashi, Tomoki and Tamamori, Akira and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proc. ASRU 2017},
  year={2017}
}
@inproceedings{tamamori2017speaker,
  title={Speaker-dependent WaveNet vocoder},
  author={Tamamori, Akira and Hayashi, Tomoki and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proceedings of Interspeech},
  pages={1118--1122},
  year={2017}
}

Author

Tomoki Hayashi @ Nagoya University
e-mail:[email protected]
