Warning
Please note that this project is under active development and is not yet operational. Features may be incomplete, and functionality is not guaranteed. Development will be slow for a while, as I am busy with classes and other projects.
Video-to-video translation and dubbing via few-shot voice cloning and audio-based lip sync.
See the demo »
Report Bug
·
Request Feature
Demo video: demo.mp4
Currently supports English and Chinese
- Vocal isolation: Isolation of vocals from the source video using deep neural networks
- Transcription: Transcription of the source video via Whisper
- Translation: Translation of the transcription via CTranslate2 and OPUS-MT
- Few-shot voice cloning: Realistic voice cloning and TTS with as little as 5 seconds of audio from the source video
- Audio-based lip sync: Alteration of faces in the source video to match the translated audio
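The features above run as a linear pipeline, each stage consuming the previous stage's output. A minimal conceptual sketch of that ordering follows; every function name here is a hypothetical placeholder for illustration, not v2vt's actual API:

```python
# Conceptual sketch of the v2vt pipeline order. All functions are
# hypothetical stand-ins for the real stages, not v2vt's actual API.

def isolate_vocals(video_path):
    # Would separate vocals from background audio with a deep neural network
    return f"{video_path}:vocals"

def transcribe(vocals):
    # Would run speech-to-text on the isolated vocals (e.g. Whisper)
    return f"{vocals}:text"

def translate(text, target_lang):
    # Would translate the transcript (e.g. CTranslate2 + OPUS-MT)
    return f"{text}:{target_lang}"

def clone_and_speak(translation, reference_vocals):
    # Would synthesize the translation in the source speaker's voice,
    # cloned from a few seconds of the isolated vocals
    return f"{translation}:dubbed"

def lip_sync(video_path, dubbed_audio):
    # Would re-render faces in the source video to match the new audio
    return f"{video_path}+{dubbed_audio}"

def run_pipeline(video_path, target_lang="zh"):
    vocals = isolate_vocals(video_path)
    text = transcribe(vocals)
    translation = translate(text, target_lang)
    dubbed = clone_and_speak(translation, vocals)
    return lip_sync(video_path, dubbed)
```

Note that the isolated vocals are used twice: once as transcription input, and again as the reference audio for few-shot voice cloning.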
Currently tested only in a Windows 11 environment with Python 3.9, PyTorch 2.1.1, and CUDA 11.8.
- Python 3.9
- Anaconda (recommended)
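Since the project is only tested with Python 3.9, it is worth confirming your interpreter version before installing. A small illustrative check (not part of v2vt itself):

```python
import sys

# v2vt is currently only tested with Python 3.9; warn if the
# active interpreter differs.
major, minor = sys.version_info[:2]
if (major, minor) != (3, 9):
    print(f"Warning: v2vt is tested with Python 3.9, found {major}.{minor}")
else:
    print("Python version OK")
```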
- Clone the repo
git clone https://github.com/huangjackson/v2vt.git
cd v2vt
- Create a conda environment (recommended)
conda create -n v2vt python=3.9
conda activate v2vt
- Install ffmpeg
conda install ffmpeg
- Install PyTorch and CUDA
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
- Install requirements from requirements.txt
pip install -r requirements.txt
- Navigate to directory
cd v2vt
- Run CLI
python v2vt.py --help
Listed generally in order of priority:
- Vocal isolation
- Transcription
- Translation
- Voice cloning/TTS
  - Match speed of original video (#3)
- Multiple GPUs support
- Support training & using multiple models
- Lip sync
- Additional languages (currently only en & zh)
- Improve overall speed
- Improve logging (#4)
- Create Colab
- Create live demo on HuggingFace
See the open issues for a full list of proposed features (and known issues).
Any contributions are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
- Fork the Project
- Create your Feature Branch (git checkout -b feat/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feat/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
See individual files and folders for any other licenses credited.
Jackson Huang - [email protected]
Project Link: https://github.com/huangjackson/v2vt
Special thanks to the following people and projects:
- GPT-SoVITS
- video-retalking
- CTranslate2
- ultimatevocalremovergui
- KUIELab & Woosung Choi - For the original MDX-Net music demixing model
- KimberleyJensen - For the Kim Vocal 2 MDX-Net model
- OPUS-MT - For translation models
- faster-whisper