VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion (Interspeech 2021)
Paper (coming soon) | Pre-trained models | Demo
Diagram of the VQMIVC system.
- Quick start with pre-trained models
  Python 3.6 is used; the other requirements are listed in 'requirements.txt':
    pip install -r requirements.txt
- Step1. Data preparation & preprocessing
  - Put the VCTK corpus under the directory 'Dataset/'
  - Training/testing speaker split and feature (mel + lf0) extraction (a rough sketch of this step follows the list):
    python preprocess.py
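
For orientation only, the snippet below sketches what mel + lf0 extraction typically looks like; preprocess.py defines the actual settings and file layout, so the parameter values here (16 kHz sampling rate, hop size 256, 80 mel bins) and the function name `extract_mel_lf0` are illustrative assumptions, not the repo's code.

```python
# Illustrative sketch of mel + lf0 extraction; preprocess.py defines the real settings.
import numpy as np
import librosa
import pyworld

def extract_mel_lf0(wav_path, sr=16000, n_fft=1024, hop_length=256, n_mels=80):
    # Load and peak-normalize the waveform (assumed 16 kHz sampling rate).
    wav, _ = librosa.load(wav_path, sr=sr)
    wav = wav / max(np.abs(wav).max(), 1e-8)

    # Log mel-spectrogram, transposed to (frames, n_mels).
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    logmel = np.log(np.maximum(mel, 1e-10)).T

    # F0 via WORLD (DIO + StoneMask); log-F0 with unvoiced frames kept at 0.
    frame_period_ms = hop_length / sr * 1000
    f0, t = pyworld.dio(wav.astype(np.float64), sr, frame_period=frame_period_ms)
    f0 = pyworld.stonemask(wav.astype(np.float64), f0, t, sr)
    lf0 = np.where(f0 > 0, np.log(f0), 0.0)
    return logmel, lf0
```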
- Step2. Model training
  - Training with mutual information minimization (MIM); a rough sketch of how the flags relate to the training loss is given after this list:
    python train.py use_CSMI=True use_CPMI=True use_PSMI=True
  - Training without MIM:
    python train.py use_CSMI=False use_CPMI=False use_PSMI=False
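
The three flags presumably toggle the pairwise mutual-information terms between the content (C), speaker (S), and pitch (P) representations (CSMI, CPMI, PSMI). The sketch below only illustrates how such flags could gate MI penalties in a total loss; the function, the `mi_estimates` values, and the weight `lambda_mi` are hypothetical and not taken from train.py.

```python
# Hypothetical illustration of how the flags could gate pairwise MI penalties;
# this is not the repo's actual loss code.
def total_loss(recon_loss, vq_loss, mi_estimates, cfg, lambda_mi=1.0):
    """mi_estimates is assumed to hold MI upper-bound estimates between the
    content (c), speaker (s), and pitch (p) representations."""
    loss = recon_loss + vq_loss
    if cfg.use_CSMI:  # content <-> speaker
        loss = loss + lambda_mi * mi_estimates["cs"]
    if cfg.use_CPMI:  # content <-> pitch
        loss = loss + lambda_mi * mi_estimates["cp"]
    if cfg.use_PSMI:  # pitch <-> speaker
        loss = loss + lambda_mi * mi_estimates["ps"]
    return loss
```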
- Step3. Model testing
  - Put the PWG (Parallel WaveGAN) vocoder under the directory 'vocoder/'
  - Inference with the model trained with MIM (a vocoding sketch follows this list):
    python convert.py checkpoint=checkpoints/useCSMITrue_useCPMITrue_usePSMITrue_useAmpTrue/model.ckpt-500.pt
  - Inference with the model trained without MIM:
    python convert.py checkpoint=checkpoints/useCSMIFalse_useCPMIFalse_usePSMIFalse_useAmpTrue/model.ckpt-500.pt
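
As a rough picture of the final stage, the sketch below shows how a converted mel-spectrogram could be turned into a waveform with a pre-trained PWG vocoder via the parallel_wavegan package. The checkpoint filename, mel shape, and 16 kHz sampling rate are assumptions; convert.py implements the full source-to-target conversion pipeline.

```python
# Sketch of vocoding a converted mel with a pre-trained PWG model;
# checkpoint name, mel layout, and sampling rate are assumptions.
import torch
import soundfile as sf
from parallel_wavegan.utils import load_model

vocoder = load_model("vocoder/checkpoint-3000000steps.pkl")  # hypothetical filename
vocoder.remove_weight_norm()
vocoder.eval()

# 'converted_mel' stands in for the (frames, n_mels) output of the VQMIVC decoder.
converted_mel = torch.zeros(200, 80)
with torch.no_grad():
    wav = vocoder.inference(converted_mel).view(-1).cpu().numpy()
sf.write("converted.wav", wav, 16000)  # assumed sampling rate
```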