FAcodec

This repository is a PyTorch implementation for training FAcodec, which was proposed in the paper NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.

This implementation differs slightly from the original paper. The original version is based on phoneme prediction, which requires text transcriptions and phoneme-audio alignment. This implementation instead predicts a semantic latent, which removes the need for alignment, the most difficult part of data preparation.

The current implementation has only been tested on the VCTK dataset, but it already demonstrates successful speech reconstruction and disentanglement of content, timbre and prosody.

Requirements

Python 3.10

Dataset

Prepare your dataset annotation in a .txt file, with each line containing the following tab-separated fields:

<absolute_path_to_audio>\t<speaker_id>\t<language>\t<transcript>\t<phonemized_transcript>

An example of the dataset annotation file is provided in data/val.txt. Speaker ids, languages and transcripts can be omitted if you do not have them, since they are only placeholders. Just make sure the audio paths are correct.
Put your prepared annotation files at ./data/train.txt and ./data/val.txt.
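
The annotation format above can be generated with a short script. The following is a minimal sketch, not part of this repository: the function names (make_annotation_line, write_annotations, find_missing_audio), the *.wav pattern and the placeholder values are all assumptions made for illustration. It fills the optional fields with placeholders rather than leaving them empty, since how the training code handles empty fields is not stated here.

```python
from pathlib import Path


def make_annotation_line(audio_path, speaker_id="0", language="en",
                         transcript="placeholder", phonemized="placeholder"):
    """Build one tab-separated annotation line (hypothetical helper).

    Field order follows the format described above:
    audio path, speaker id, language, transcript, phonemized transcript.
    The default values are assumed placeholders.
    """
    fields = [str(Path(audio_path).resolve()), speaker_id, language,
              transcript, phonemized]
    # A tab or newline inside a field would break the five-column layout.
    cleaned = [f.replace("\t", " ").replace("\n", " ") for f in fields]
    return "\t".join(cleaned)


def write_annotations(audio_dir, out_file, pattern="*.wav"):
    """Write one annotation line per audio file found under audio_dir.

    Returns the number of lines written. The "*.wav" pattern is an
    assumption; adjust it to match your dataset.
    """
    paths = sorted(Path(audio_dir).rglob(pattern))
    lines = [make_annotation_line(p) for p in paths]
    out = Path(out_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return len(lines)


def find_missing_audio(annotation_file):
    """Return (line_number, path) pairs whose audio file does not exist."""
    missing = []
    text = Path(annotation_file).read_text(encoding="utf-8")
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        audio_path = line.split("\t")[0]
        if not Path(audio_path).is_file():
            missing.append((line_number, audio_path))
    return missing
```

For example, write_annotations("/path/to/your/audio", "./data/train.txt") would produce the training annotation file (the audio directory is a stand-in for your own), and find_missing_audio("./data/train.txt") returns an empty list when every audio path is valid.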

Training

Download the semantic teacher (currently we are using SpeechTokenizer) from here and put the checkpoint and config file under ./w2v_models/.
Then run the following command:

accelerate launch train.py

The default TensorBoard log directory is ./Models/run/tensorboard.
Expect to train for at least 100k steps before successful disentanglement becomes observable.

Evaluation

Run the following command:

python eval.py

Results are logged to TensorBoard. The default log directory is ./Models/run/eval/tensorboard.

Inference

(To be implemented)

Pretrained Models

The pretrained model has so far been trained only on the VCTK dataset. It generalizes poorly beyond that data, but it does demonstrate successful speech reconstruction and disentanglement.
It will be released soon.
