MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

Shurong Yang*, Huadong Li*, Juhao Wu*, Minhao Jing*†, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan

MEGVII Technology

*Equal contribution, †Lead this project, Corresponding author



News & TODO List

  • [✅2024.05.24] Inference settings are released.

  • [✅2024.05.31] arXiv paper is released.

  • [✅2024.06.13] Data curation pipeline is released.

  • [❌] Training setup to be released.

MegActor Features:

Usability: animates a portrait from a driving video while ensuring consistent motion.

Reproducibility: fully open-source and trained on publicly available datasets.

Efficiency: ⚡200 V100 hours of training to achieve pleasant motions on portraits.

Overview

Model

MegActor is an intermediate-representation-free portrait animator that uses the original video, rather than intermediate features, as the driving factor to generate realistic and vivid talking head videos. Specifically, we utilize two UNets: one extracts the identity and background features from the source image, while the other accurately generates and integrates motion features directly derived from the original videos. MegActor can be trained on low-quality, publicly available datasets and excels in facial expressiveness, pose diversity, subtle controllability, and visual quality.

Pre-generated results

demo.mp4
demo4.mp4
demo6.mp4

Preparation

  • Environments

    Detailed environment settings can be found in environment.yaml.

    • Linux
      conda env create -f environment.yaml
      pip install -U openmim
      
      mim install mmengine
      mim install "mmcv>=2.0.1"
      mim install "mmdet>=3.1.0"
      mim install "mmpose>=1.1.0"
      
      conda install -c conda-forge cudatoolkit-dev -y
      
  • Dataset

    • For a detailed description of the data processing procedure, please refer to the accompanying Data Process Pipeline.
  • Pretrained weights

    Our pretrained weights are available at https://huggingface.co/HVSiniX/RawVideoDriven. Alternatively, clone that repository and link its weights directory:

    git clone https://huggingface.co/HVSiniX/RawVideoDriven && ln -s RawVideoDriven/weights weights
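
Before running inference, it can help to confirm that the preparation steps took effect. The following is a minimal sketch and not part of the MegActor repository: the helper names `missing_packages` and `ensure_weights_link` are hypothetical, and it assumes the import names `mmengine`, `mmcv`, `mmdet`, and `mmpose` match the packages installed with `mim`.

```python
import importlib.util
from pathlib import Path

# Assumption: these import names match the packages installed via mim.
REQUIRED_PACKAGES = ("mmengine", "mmcv", "mmdet", "mmpose")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_weights_link(clone_dir="RawVideoDriven", link_name="weights"):
    """Create the `weights` link to `<clone_dir>/weights` if it is absent.

    Roughly mirrors `ln -s RawVideoDriven/weights weights`, except that the
    link stores an absolute path so it also works outside the current directory.
    Raises FileNotFoundError if the cloned weights directory does not exist.
    """
    target = Path(clone_dir) / "weights"
    link = Path(link_name)
    if not target.is_dir():
        raise FileNotFoundError(f"expected cloned weights at {target}")
    # Leave an existing link or directory untouched.
    if not (link.exists() or link.is_symlink()):
        link.symlink_to(target.resolve(), target_is_directory=True)
    return link
```

Calling `missing_packages()` from the repository root returns an empty list when all four packages are importable. `ensure_weights_link()` can be called after the `git clone` step in place of the `ln -s` command.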

Training

To be released.

Inference

Currently only single-GPU inference is supported.

CUDA_VISIBLE_DEVICES=0 python eval.py --config configs/infer12_catnoise_warp08_power_vasa.yaml --source {source image path} --driver {driving video path}
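
Because only single-GPU inference is supported, several source/driver pairs have to be processed one after another. The sketch below wraps the command above in a small loop. It is an illustration rather than a script shipped with MegActor: `build_eval_command` and `run_pairs` are hypothetical names, and only the `--config`, `--source`, and `--driver` flags shown above are used. It assumes it is run from the repository root with the current Python interpreter.

```python
import os
import subprocess
import sys

# Config path taken from the inference command above.
DEFAULT_CONFIG = "configs/infer12_catnoise_warp08_power_vasa.yaml"


def build_eval_command(source, driver, config=DEFAULT_CONFIG):
    """Build the argument list for one eval.py run."""
    return [
        sys.executable, "eval.py",
        "--config", config,
        "--source", str(source),
        "--driver", str(driver),
    ]


def run_pairs(pairs, gpu_id=0, dry_run=False):
    """Run eval.py once per (source image, driving video) pair on a single GPU.

    With dry_run=True the commands are built and returned but not executed.
    """
    # Restrict the child processes to one GPU, as in the shell command.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    commands = [build_eval_command(src, drv) for src, drv in pairs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, env=env, check=True)
    return commands
```

Passing `dry_run=True` first is a cheap way to inspect the commands before committing GPU time.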

Demo

For the Gradio interface, run:

python demo/run_gradio.py

BibTeX

@misc{yang2024megactor,
      title={MegActor: Harness the Power of Raw Video for Vivid Portrait Animation}, 
      author={Shurong Yang and Huadong Li and Juhao Wu and Minhao Jing and Linze Li and Renhe Ji and Jiajun Liang and Haoqiang Fan},
      year={2024},
      eprint={2405.20851},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Many thanks to the authors of mmengine, MagicAnimate, Controlnet_aux, and Detectron2.

Contact

If you have any questions, feel free to open an issue or contact us at [email protected], [email protected] or [email protected].
