BetaVAE_VC

This repo contains the code for the paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE" (SLT 2022).

0. Setup Conda Environment

conda env create -f environment.yaml
conda activate betavae-vc-env

1. Data preprocessing

  • Download the corpora:
  1. English: VCTK
  2. Mandarin: AISHELL3
  • Set the paths in configs/haparams.py: corpus_dir for both VCTK and AISHELL3, and dataset_dir for the directory where the extracted features and TFRecord files will be written.
  • Prepare the dataset for training:
python preprocess.py
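
Before running preprocess.py, it can help to confirm that the directories set in configs/haparams.py actually contain audio. The helper below is hypothetical (it is not part of this repo), and the extensions it searches for (.wav, .flac) are an assumption about how your copies of VCTK and AISHELL3 are stored.

```python
from pathlib import Path

# Audio extensions to look for; an assumption, adjust to your corpus copies.
AUDIO_EXTS = (".wav", ".flac")


def count_audio_files(corpus_dir, exts=AUDIO_EXTS):
    """Recursively count audio files under corpus_dir (hypothetical helper)."""
    root = Path(corpus_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"corpus_dir does not exist: {root}")
    return sum(1 for p in root.rglob("*") if p.is_file() and p.suffix.lower() in exts)


def check_paths(corpus_dirs, dataset_dir):
    """Count audio files per corpus and create dataset_dir if it is missing."""
    counts = {name: count_audio_files(path) for name, path in corpus_dirs.items()}
    Path(dataset_dir).mkdir(parents=True, exist_ok=True)
    return counts
```

For example, check_paths({"VCTK": "/path/to/VCTK", "AISHELL3": "/path/to/AISHELL3"}, "/path/to/save/features/tfrecords") returns a per-corpus file count; a count of zero usually means the path points one level too high or too low.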

2. Training

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --out_dir ./outputs --data_dir /path/to/save/features/tfrecords
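
Every GPU command in this README is prefixed with the same two environment variables: CUDA_VISIBLE_DEVICES=0 restricts the process to the first GPU, and TF_FORCE_GPU_ALLOW_GROWTH=true asks TensorFlow to allocate GPU memory incrementally instead of reserving it all at start-up. If you prefer to launch the scripts from Python, a minimal sketch follows; gpu_env and run_script are hypothetical helper names, not part of the repo.

```python
import os
import subprocess
import sys


def gpu_env(gpu="0", base=None):
    """Return a copy of the environment with the two variables the README sets."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    # Let TensorFlow grow GPU memory on demand instead of grabbing it all.
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


def run_script(script, args, gpu="0"):
    """Run a script with the current interpreter; raises CalledProcessError on failure."""
    cmd = [sys.executable, script, *args]
    return subprocess.run(cmd, env=gpu_env(gpu), check=True)
```

The training command above then becomes run_script("train.py", ["--out_dir", "./outputs", "--data_dir", "/path/to/save/features/tfrecords"]). Using sys.executable keeps the child process inside the active conda environment.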

3. Inference

# inference from mels
# test-mels.txt contains a list of paths to mel-spectrograms in *.npy format, one path per line
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference-from-mel.py --ckpt_path ./outputs/models/ckpt-500 --test_dir outputs/tests --src_mels test-mels.txt --ref_mels test-mels.txt

# inference from wavs
# test-wavs.txt contains a list of paths to speech files in *.wav format, one path per line
CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference-from-wav.py --ckpt_path ./outputs/models/ckpt-500 --test_dir outputs/tests --src_wavs test-wavs.txt --ref_wavs test-wavs.txt
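
Both inference scripts read plain-text list files with one path per line. A small, hypothetical helper for building such a file from a directory of .wav files (or .npy mels, by changing the pattern) might look like this; it is a convenience sketch, not part of the repo.

```python
from pathlib import Path


def write_path_list(src_dir, list_file, pattern="*.wav"):
    """Write absolute paths of files matching pattern under src_dir, one per line.

    Paths are sorted as strings so the list is reproducible across runs.
    Returns the number of paths written.
    """
    paths = sorted(str(p.resolve()) for p in Path(src_dir).rglob(pattern) if p.is_file())
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For example, write_path_list("/path/to/mels", "test-mels.txt", "*.npy") produces a file suitable for --src_mels and --ref_mels. The commands above pass the same list as both source and reference; how the scripts pair sources with references is not described here, so check the script before relying on a particular ordering.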

4. Latent extraction

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python feature_extraction.py --data_dir /path/to/save/features/tfrecords --save_dir ./outputs/features --ckpt_path ./outputs/models/ckpt-300
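
The inference and extraction commands take --ckpt_path as a checkpoint prefix (ckpt-500, ckpt-300), i.e. the filename without the .index / .data-... suffix that TensorFlow checkpoints use. Assuming training writes files such as ./outputs/models/ckpt-<step>.index (the layout implied by the paths above, not verified against train.py), a stdlib-only way to pick the newest one is sketched below. When a TensorFlow `checkpoint` state file is present, tf.train.latest_checkpoint does the same job.

```python
import re
from pathlib import Path

# Matches TensorFlow-style index files such as "ckpt-500.index".
_CKPT_RE = re.compile(r"^(?P<prefix>.+)-(?P<step>\d+)\.index$")


def latest_checkpoint_prefix(model_dir, prefix="ckpt"):
    """Return the path prefix of the highest-step checkpoint, or None if none exist."""
    best_step, best = -1, None
    for index_file in Path(model_dir).glob(f"{prefix}-*.index"):
        m = _CKPT_RE.match(index_file.name)
        if m is None or m.group("prefix") != prefix:
            continue
        step = int(m.group("step"))  # compare numerically, so 1000 > 500
        if step > best_step:
            best_step = step
            best = str(index_file.with_suffix(""))  # strip ".index"
    return best
```

Comparing steps as integers matters: a plain string sort would rank ckpt-500 above ckpt-1000.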

5. EER computation based on the extracted latents

# compute EER using content embeddings
python tests/compute_eer.py --data_dir ./outputs/features/EN --mode content
# compute EER using speaker embeddings
python tests/compute_eer.py --data_dir ./outputs/features/EN --mode spk
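
The equal error rate (EER) is the operating point where the false-acceptance rate equals the false-rejection rate. The expectation for a well-disentangled model is a low EER from speaker embeddings and a much higher one from content embeddings. The sketch below shows one common recipe: cosine-score every pair of embeddings, mark a pair as a target trial when both come from the same speaker, then sweep the decision threshold. It is an independent illustration under those assumptions, not the code in tests/compute_eer.py, which may build trials or interpolate the crossing point differently.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (na * nb)


def make_trials(embeddings, speakers):
    """Score every unordered pair; label 1 if both utterances share a speaker."""
    scores, labels = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        scores.append(cosine(embeddings[i], embeddings[j]))
        labels.append(1 if speakers[i] == speakers[j] else 0)
    return scores, labels


def compute_eer(scores, labels):
    """EER via a threshold sweep over the observed scores.

    A trial is accepted when score >= threshold. Returns the mean of FAR and FRR
    at the threshold where the two are closest (no interpolation).
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t) / n_nontarget
        frr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t) / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

The sweep is quadratic in the number of trials, which is fine for a sanity check on a few hundred utterances but slow on large trial lists; sorting the scores once and accumulating counts is the usual speed-up.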

Cite this work

@inproceedings{slt2022_hui_disentanle,
  author    = {Hui Lu and
               Disong Wang and
               Xixin Wu and
               Zhiyong Wu and
               Xunying Liu and
               Helen Meng},
  title     = {Disentangled Speech Representation Learning for One-Shot Cross-Lingual
               Voice Conversion Using Beta-VAE},
  booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2022, Doha, Qatar,
               January 9-12, 2023},
  pages     = {814--821},
  publisher = {{IEEE}},
  year      = {2022},
  doi       = {10.1109/SLT54892.2023.10022787},
}
