FaceVC

This is the official implementation for "Face-based Voice Conversion: Learning the Voice behind a Face" (FaceVC).

The audio demo for FaceVC can be found at https://facevc.github.io/.

Data Preprocessing

To generate face embeddings, refer to https://github.com/timesler/facenet-pytorch.

To generate speaker embeddings and spectrograms, refer to https://github.com/auspicious3000/autovc.
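For the face-embedding step, a minimal sketch using the facenet-pytorch package is given below. It assumes one image per face, the `vggface2` pretrained weights (which produce a 512-dimensional vector), and a flat output layout of one `.npy` file per image. The helper names `embedding_path` and `extract_face_embedding`, and the output layout, are illustrative assumptions and not part of this repository.

```python
from pathlib import Path


def embedding_path(image_path: str, out_root: str) -> Path:
    """Map an image path to a .npy path under out_root (hypothetical layout)."""
    return Path(out_root) / (Path(image_path).stem + ".npy")


def extract_face_embedding(image_path: str, out_root: str) -> Path:
    """Detect one face, embed it with InceptionResnetV1, and save the vector."""
    # Heavy dependencies are imported lazily so embedding_path works without them.
    import numpy as np
    import torch
    from PIL import Image
    from facenet_pytorch import MTCNN, InceptionResnetV1

    mtcnn = MTCNN(image_size=160)                             # face detector and cropper
    resnet = InceptionResnetV1(pretrained="vggface2").eval()  # embedding network

    face = mtcnn(Image.open(image_path).convert("RGB"))
    if face is None:
        raise ValueError(f"no face detected in {image_path}")
    with torch.no_grad():
        emb = resnet(face.unsqueeze(0))[0]                    # shape: (512,)

    out = embedding_path(image_path, out_root)
    out.parent.mkdir(parents=True, exist_ok=True)
    np.save(out, emb.cpu().numpy())
    return out
```

Calling `extract_face_embedding("faces/id001.jpg", "face_emb")` would write `face_emb/id001.npy` under these assumptions; the directory that holds these files is what `root_face` would then point to.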

In-the-wild data

  1. Prepare a list of all training utterance paths (used to build the speaker dictionary).
  2. Prepare the face embeddings, speaker embeddings, and spectrograms of the in-the-wild data.
  3. Set the following paths in data_loader_noisy.py.
spk_lst = ''
root_face = ''
root_speech = ''
root_mel = ''
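How the utterance list becomes a speaker dictionary is not spelled out here. The sketch below shows one plausible way to do it, assuming each line of the list is a path whose parent directory is the speaker ID (for example `spk1/utt001.npy`). Both that layout and the function name `build_speaker_dict` are assumptions for illustration; the actual logic lives in the data loader.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_speaker_dict(utterance_paths):
    """Group utterance paths by speaker ID (assumed: the parent directory name)."""
    speakers = defaultdict(list)
    for line in utterance_paths:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the list file
        speaker_id = PurePosixPath(path).parent.name
        speakers[speaker_id].append(path)
    return dict(speakers)


# Example with an in-memory list; a real list would be read from the spk_lst file.
example = build_speaker_dict(["spk1/a.npy\n", "spk2/b.npy\n", "\n", "spk1/c.npy\n"])
```

Here `example` maps `spk1` to two utterances and `spk2` to one.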

Lab-collected data

  1. Prepare a list of all training utterance paths (used to build the speaker dictionary).
  2. Prepare the speaker embeddings and spectrograms of the lab-collected data.
  3. Set the following paths in data_loader_clean.py.
spk_lst = ''
root_speaker = ''
root_mel = ''
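Since the path variables default to empty strings, a quick sanity check before training can catch an unset or mistyped path early. The sketch below assumes that `spk_lst` is a file and that each `root_*` value is a directory; `check_paths` is a hypothetical helper, not part of the repository.

```python
import os


def check_paths(spk_lst, **roots):
    """Return a list of readable problems; an empty list means every path exists."""
    problems = []
    if not os.path.isfile(spk_lst):
        problems.append(f"spk_lst is not a file: {spk_lst!r}")
    for name, root in roots.items():
        if not os.path.isdir(root):
            problems.append(f"{name} is not a directory: {root!r}")
    return problems


# With the empty defaults, every path is reported as a problem.
unset = check_paths("", root_speaker="", root_mel="")
```

For the in-the-wild loader the same helper could be called with `root_face`, `root_speech`, and `root_mel` as keyword arguments.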

Training

  1. Create a virtual environment and install the dependencies.
$ python -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt
  2. Set the configuration in main.py according to the training stage.
    parser.add_argument('--stage', type=int, default=3)

    # Model configuration.
    ### Generator for stage I or pseudo generator for stage III ###
    ### Note: Weight of reconstruction loss is set to 1. ###
    parser.add_argument('--lambda_cd_pse', type=float, default=0.1, help='weight for hidden code loss')#1
    parser.add_argument('--lambda_ge2e_pse', type=float, default=0.05, help='weight for ge2e loss')
    parser.add_argument('--dim_neck_pse', type=int, default=32)
    parser.add_argument('--dim_emb_pse', type=int, default=512)
    parser.add_argument('--dim_pre_pse', type=int, default=512)
    parser.add_argument('--freq_pse', type=int, default=32)

    ### Generator for stage II or reference generator for stage III ###
    ### Note: Weight of reconstruction loss is set to 1. ###
    parser.add_argument('--lambda_cd_ref', type=float, default=0.1, help='weight for hidden code loss')#1
    parser.add_argument('--dim_neck_ref', type=int, default=32)
    parser.add_argument('--dim_emb_ref', type=int, default=256)
    parser.add_argument('--dim_pre_ref', type=int, default=512)
    parser.add_argument('--freq_ref', type=int, default=32)

    # Training configuration.
    ### Loading pretrained pseudo generator (from stage I) / reference generator (from stage II) for stage III ###
    parser.add_argument('--pseG_path', type=str, default='pretrain_VC/pseG/G.ckpt', help='pseG model name')
    parser.add_argument('--refG_path', type=str, default='pretrain_VC/refG/G.ckpt', help='refG model name')

    parser.add_argument('--batch_size', type=int, default=2, help='mini-batch size')
    parser.add_argument('--num_iters', type=int, default=2000000, help='number of total iterations')
    parser.add_argument('--len_crop', type=int, default=128, help='dataloader output sequence length')
    parser.add_argument('--clip', type=int, default=1, help='clip value of gradient clip')
    parser.add_argument('--model_id', type=str, default='test', help='model name')
    
    # Logging and checkpointing.
    parser.add_argument('--log_step', type=int, default=10)
    parser.add_argument('--save_step', type=int, default=1000)
  3. Run main.py.
$ python main.py
  4. Monitor training with TensorBoard.
$ tensorboard --logdir log --host tunnel_host --port tunnel_port
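The options listed above for main.py are ordinary argparse arguments. Assuming main.py parses them from the command line with `parse_args()`, a stage can likely be selected per run without editing the file, for example `python main.py --stage 1 --model_id stage1`. The sketch below reproduces only a few of those flags to illustrate the override behaviour; `build_parser` is a hypothetical helper, not a function from the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper holding a subset of the flags listed for main.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--stage', type=int, default=3)
    parser.add_argument('--batch_size', type=int, default=2, help='mini-batch size')
    parser.add_argument('--len_crop', type=int, default=128, help='dataloader output sequence length')
    parser.add_argument('--model_id', type=str, default='test', help='model name')
    return parser


# No arguments: the defaults from the listing apply.
defaults = build_parser().parse_args([])

# Explicit arguments override the defaults for this run only.
stage1 = build_parser().parse_args(['--stage', '1', '--model_id', 'stage1'])

print(defaults.stage, stage1.stage, stage1.model_id)  # prints: 3 1 stage1
```

Flags that are not passed, such as `--batch_size` here, keep their listed defaults.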

Testing

  1. Set the configuration in test_conversion.py according to the training stage.
parser.add_argument('--stage', type=int, default=4)
parser.add_argument('--outdir', type=str, default='reb_stage3_nofixGpse_tune1')

# stage I  : fill in G_pse_path
# stage II : fill in G_ref_path
# stage III: fill in G_pse_path, G_ref_path, W_path
parser.add_argument('--G_pse_path', type=str, default='', help='model path')
parser.add_argument('--G_ref_path', type=str, default='', help='model path')
parser.add_argument('--W_path', type=str, default='', help='model path')
  2. Run test_conversion.py.
$ python test_conversion.py
  3. Run vocoder.py.
$ python vocoder.py
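The comments in the test configuration state which checkpoint paths each stage needs. A small check like the one below can flag a missing path before a conversion run starts. The stage-to-path mapping is taken from those comments; the function name `missing_checkpoints` is hypothetical, and because the comments only cover stages I to III, any other stage value (including the listed default of 4) is treated here as undocumented.

```python
# Checkpoint arguments required per stage, per the comments in test_conversion.py.
REQUIRED = {
    1: ('G_pse_path',),
    2: ('G_ref_path',),
    3: ('G_pse_path', 'G_ref_path', 'W_path'),
}


def missing_checkpoints(stage, G_pse_path='', G_ref_path='', W_path=''):
    """Return the names of required checkpoint arguments that are still empty."""
    if stage not in REQUIRED:
        raise ValueError(f'no checkpoint rule documented for stage {stage}')
    given = {'G_pse_path': G_pse_path, 'G_ref_path': G_ref_path, 'W_path': W_path}
    return [name for name in REQUIRED[stage] if not given[name]]


# Stage III with only the pseudo generator supplied: two paths are still missing.
todo = missing_checkpoints(3, G_pse_path='pretrain_VC/pseG/G.ckpt')
```

In this example `todo` lists `G_ref_path` and `W_path`, the two arguments that still need a value.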

Contributors

hsiaohan0827
