
theericma / otavatar


This is the official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR2023].

Languages: Python 78.81%, Shell 0.08%, C++ 5.19%, Cuda 15.92%
Topics: cvpr2023, deep-learning, deepfake, face-animation, face-reenactment, image-animation, motion-transfer, pose-transfer, pytorch, talking-head

otavatar's People

Contributors

87003697, theEricMa


otavatar's Issues

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 150437) of binary: /data1/anconda3/envs/otavatar/bin/python

When I deployed according to the README, I encountered this issue. I'm not quite sure what caused it. Below are the code snippet and error log from my run. Please take a look and suggest a solution. @theEricMa

Loading ResNet ArcFace
loading id loss module: <All keys matched successfully>
Loading ResNet ArcFace
loading id loss module: <All keys matched successfully>
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic            Weight 1.0
Loss TV                   Weight 1.0
Loss pixel                Weight 1
Loss a_norm               Weight 0.0
Loss a_mutual             Weight 0.0
Loss local                Weight 10.0
Loss local_s              Weight 10.0
Loss id                   Weight 1.0
Loss id_s                 Weight 1.0
We train Generator
load [net_Warp] and [net_Warp_ema] from result/otavatar/epoch_00005_iteration_000002000_checkpoint.pt
Done with loading the checkpoint.
  0%|          | 0/19 [00:00<?, ?it/s]
  0%|          | 0/3537 [00:00<?, ?it/s]
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
100%|██████████| 100/100 [00:22<00:00,  4.51it/s]
100%|██████████| 100/100 [00:16<00:00,  6.12it/s]
100%|██████████| 3537/3537 [06:13<00:00,  9.48it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 150437) of binary: /data1/anconda3/envs/otavatar/bin/python
Traceback (most recent call last):
  File "/data1/anconda3/envs/otavatar/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data1/anconda3/envs/otavatar/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
=======================================================
inference_refine_1D_cam.py FAILED
-------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-05-23_21:02:27
  host      : zss-Precision-5820-Tower-X-Series
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 150437)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 150437
=======================================================

About requirement.yaml

Hi, I wonder why some modules are not included in environment.yml. As a result, users have to run the inference code to discover which modules are missing and then install them one by one.

About training consumption and inference speed

Great work, it inspired me a lot!

May I ask how much GPU memory is needed to train the model? I don't have many GPUs and I'm afraid the experiment can't be reproduced.

Besides, I'm also curious about the inference time (FPS).

Looking forward to hearing from you, thanks!

error

No module named 'models.volumetric_rendering'

ModuleNotFoundError: No module named 'third_part'

Both training and inference get stuck here. I don't know how many missing files are left; maybe you could check them on another clean machine?

  File "/home/x/OTAvatar/util/lpips.py", line 8, in <module>
    from third_part.PerceptualSimilarity.models import dist_model as dm
ModuleNotFoundError: No module named 'third_part'

BTW, there are some Python packages missing from environment.yaml: opencv-python, traitlets, PyYAML, lmdb.

Question about 'FaceTrainer' object has no attribute 'net_G_module'

I'm getting an error after loading the model during testing:
load [net_Warp] and [net_Warp_ema] from result\otavatar\epoch_00005_iteration_000002000_checkpoint.pt
Done with loading the checkpoint.
0%| | 0/3537 [00:00<?, ?it/s]
0%| | 0/19 [00:42<?, ?it/s]
Traceback (most recent call last):
  File "inference_refine_1D_cam.py", line 166, in <module>
    opt_Ws, w_opt, w_std = trainer.inverse_setup(1,)
  File "F:\00Liss\01mycode\09Voice_driven_face_generation\03code\39-OTAvatar-main\trainers\decouple_by_invert.py", line 600, in inverse_setup
    w_avg, w_std = self.sample_zs()
  File "F:\00Liss\01mycode\09Voice_driven_face_generation\03code\39-OTAvatar-main\trainers\decouple_by_invert.py", line 766, in sample_zs
    self.net_G_module.z_dim
AttributeError: 'FaceTrainer' object has no attribute 'net_G_module'
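
A plausible cause (a guess from the traceback, not verified against OTAvatar's trainer internals): in codebases of this style, net_G_module is usually assigned as net_G.module only when the generator is wrapped in DistributedDataParallel, so a run launched without the distributed launcher never gets the attribute. A minimal defensive sketch, where unwrap is a hypothetical helper:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def unwrap(net: torch.nn.Module) -> torch.nn.Module:
    """Return the underlying module whether or not net is DDP-wrapped."""
    return net.module if isinstance(net, DDP) else net

# e.g. in the trainer: self.net_G_module = unwrap(self.net_G)
# so the attribute exists both with and without torch.distributed.launch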

pretrained data

Hello, could you provide an updated Google Drive link for the pretrained data?
It's very hard for me to upload data to the remote server because of my limited upload speed and unstable connection (e.g., broken pipe with scp). Thanks a lot!

No checkpoint at 2000

I was able to run the inference, but there's still no checkpoint at iteration 2000. The output is a video at iteration 00000 which has no mouth movement.

Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Loading ResNet ArcFace
loading id loss module:
Loading ResNet ArcFace
loading id loss module:
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic Weight 1.0
Loss TV Weight 1.0
Loss pixel Weight 1
Loss a_norm Weight 0.0
Loss a_mutual Weight 0.0
Loss local Weight 10.0
Loss local_s Weight 10.0
Loss id Weight 1.0
Loss id_s Weight 1.0
We train Generator
No checkpoint found at iteration 2000.
0%| | 0/19 [00:00<?, ?it/s]
0%| | 0/19 [00:07<?, ?it/s]

Difference between code and paper

Hi, thanks for your great work!
In line 730 of decouple_by_invert.py, the parameters of the motion controller are updated together with the EG3D generator.
However, in the algorithm in the appendix, the parameters of the motion controller are not updated while finetuning theta_eg.
(screenshot of the algorithm from the appendix)
By the way, in line 16 of the algorithm, Lt is not mentioned in the original paper; is this a mistake?

metric code about cross-identity reenactment

@theEricMa Could you share the code for FID, AKD, AED, CSIM, and APD? I checked repos like PIRenderer, FOMM, and StyleHeat, but only FOMM shares code for AED and AKD, which may be right. So we sincerely hope you release your evaluation code for cross-identity reenactment.
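
In the meantime, a minimal sketch of CSIM under its usual definition (mean cosine similarity between face-recognition embeddings of source and generated frames; the embedding network, e.g. ArcFace, is assumed to be supplied separately):

import torch
import torch.nn.functional as F

def csim(emb_source: torch.Tensor, emb_generated: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between identity embeddings, e.g. from ArcFace."""
    return F.cosine_similarity(emb_source, emb_generated, dim=-1).mean()

# emb_* have shape (num_frames, embed_dim); higher CSIM = better identity preservation.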

Multi-GPU training error (RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:3)

@theEricMa My server environment can run single-GPU training, but it encounters the following issue when executing multi-GPU training tasks. After changing nproc_per_node from 1 to 4, this error occurred.

(otavatar) ➜  OTAvatar git:(main) ✗ CUDA_VISIBLE_DEVICES=2,3,4,5 python -m torch.distributed.launch --nproc_per_node=4 --master_port 12222 train_inversion.py --config ./config/otavatar.yaml --name otavatar_gpu4
...
loading id loss module: <All keys matched successfully>
loading id loss module: <All keys matched successfully>
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic            Weight 1.0
Loss TV                   Weight 1.0
Loss pixel                Weight 1
Loss a_norm               Weight 0.0
Loss a_mutual             Weight 0.0
Loss local                Weight 10.0
Loss local_s              Weight 10.0
Loss id                   Weight 1.0
Loss id_s                 Weight 1.0
loading id loss module: <All keys matched successfully>
loading id loss module: <All keys matched successfully>
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
No checkpoint found.
Epoch 0 ...
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator

  0%|          | 0/2 [00:00<?, ?it/s]
Setting up PyTorch plugin "bias_act_plugin"... Done.   (x4, interleaved across the 4 workers)
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.  (x4, interleaved across the 4 workers)
  0%|          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/gpfsdata/home/x/OTAvatar/loss/identity.py", line 353, in forward
    loss = criterion(self.facenet(gt_align).detach(), self.facenet(pred_align))
  File "/gpfsdata/home/x/miniconda3/envs/otavatar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfsdata/home/x/miniconda3/envs/otavatar/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward
    raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:3
...

Full log here err.log
What could be the possible reasons?
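
A likely culprit (a guess from the traceback, not a confirmed fix): loss/identity.py appears to wrap its facenet in nn.DataParallel, which requires parameters to live on device_ids[0], while under torch.distributed.launch each worker's model sits on its own local_rank device. A minimal sketch of the usual workaround, pinning the loss backbone to the worker's own device and skipping DataParallel (the Sequential below is only a stand-in for the ArcFace backbone):

import os
import torch
from torch import nn

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by the launcher
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.cuda.set_device(device)

# Stand-in for the identity-loss backbone; the real one loads ArcFace weights.
facenet = nn.Sequential(nn.Conv2d(3, 16, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
facenet = facenet.to(device).eval()  # pinned to this rank's GPU, no nn.DataParallel

x = torch.randn(2, 3, 112, 112, device=device)
with torch.no_grad():
    emb = facenet(x)  # runs entirely on this rank's device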

question about data.utils.process_camera_inv

I'm confused by this function. Do operations like trans[2] += -10; c *= 0.27; c[1] += 0.015; c[2] += 0.161; K[0,0] = K[1,1] = 2985.29/700 * focal / 1050; and pose[:3, 3] = pose[:3, 3]/4.0 * 2.7 have any special meaning?

import numpy as np

def process_camera_inv(translation, Rs, focals):  # crop_params):

    c_list = []

    N = len(translation)
    # for trans, R, crop_param in zip(translation, Rs, crop_params):
    for idx, (trans, R, focal) in enumerate(zip(translation, Rs, focals)):

        # temporally smooth the pose over a small window of neighboring frames
        idx_prev = max(idx - 1, 0)
        idx_last = min(idx + 2, N - 1)

        trans = np.mean(translation[idx_prev: idx_last], axis=0)
        R = np.mean(Rs[idx_prev: idx_last], axis=0)

        # why
        trans[2] += -10
        c = -np.dot(R, trans)

        # # no why
        # c = trans

        pose = np.eye(4)
        pose[:3, :3] = R

        # why
        c *= 0.27
        c[1] += 0.015
        c[2] += 0.161
        # c[2] += 0.050  # 0.160

        pose[0, 3] = c[0]
        pose[1, 3] = c[1]
        pose[2, 3] = c[2]

        # focal = 2985.29
        w = 1024  # 224
        h = 1024  # 224

        K = np.eye(3)
        K[0][0] = focal
        K[1][1] = focal
        K[0][2] = w / 2.0
        K[1][2] = h / 2.0

        # flip the y and z axes (OpenCV-style to OpenGL-style camera convention)
        Rot = np.eye(3)
        Rot[0, 0] = 1
        Rot[1, 1] = -1
        Rot[2, 2] = -1
        pose[:3, :3] = np.dot(pose[:3, :3], Rot)

        # fix intrinsics: normalize to image-size-relative units
        # (2985.29/700 ≈ 4.2647, which matches EG3D's normalized FFHQ focal length)
        K[0, 0] = 2985.29 / 700 * focal / 1050
        K[1, 1] = 2985.29 / 700 * focal / 1050
        K[0, 2] = 1 / 2
        K[1, 2] = 1 / 2
        assert K[0, 1] == 0
        assert K[2, 2] == 1
        assert K[1, 0] == 0
        assert K[2, 0] == 0
        assert K[2, 1] == 0

        # fix_pose_orig
        pose = np.array(pose).copy()

        # why (likely rescaling the camera distance toward EG3D's camera radius of 2.7)
        pose[:3, 3] = pose[:3, 3] / 4.0 * 2.7
        # # no why
        # t_1 = np.array([-1.3651,  4.5466,  6.2646])
        # s_1 = np.array([-2.3178, -2.3715, -1.9653]) + 1
        # t_2 = np.array([-2.0536,  6.4069,  4.2269])
        # pose[:3, 3] = (pose[:3, 3] + t_1) * s_1 + t_2

        c = np.concatenate([pose.reshape(-1), K.reshape(-1)])
        c_list.append(c.astype(np.float32))

    return c_list
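
For orientation, each returned vector concatenates a flattened 4x4 camera-to-world pose with a flattened 3x3 normalized intrinsics matrix (25 values per frame), which matches the camera-conditioning layout used by EG3D-style generators. A toy usage sketch with made-up inputs, assuming the function above is in scope:

import numpy as np

# Two identical frames (the function smooths over neighboring frames, so N >= 2 is simplest).
translation = [np.zeros(3), np.zeros(3)]
Rs = [np.eye(3), np.eye(3)]
focals = [1050.0, 1050.0]

c = process_camera_inv(translation, Rs, focals)[0]
pose = c[:16].reshape(4, 4)  # camera-to-world extrinsics
K = c[16:].reshape(3, 3)     # intrinsics in [0, 1] image coordinates
print(pose, K, sep="\n")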

How to transfer expression from target img to source img

After running this command:

export CUDA_VISIBLE_DEVICES=0
python -m torch.distributed.launch --nproc_per_node=1 --master_port 12345 inference_refine_1D_cam.py \
--config ./config/otavatar.yaml \
--name config/otavatar.yaml \
--no_resume \
--which_iter 2000 \
--image_size 512 \
--ws_plus \
--cross_id \
--cross_id_target WRA_EricCantor_000 \
--output_dir ./result/otavatar/evaluation/cross_ws_plus_WRA_EricCantor_000

I got these videos. It is obvious that only the pose has been transferred from the target video.
WRA_EricCantor_000_to_WRA_VickyHartzler_0002023512226202
How can I fix this?

Cannot find the config file

Nice work!
When running the inference_refine_1D_cam.py file, the error is an unknown location of the config file.
Indeed, the config file cannot be found under the Config folder.

Pretrained model batchsize and gpus

Hi, may I know how many GPUs were used for the training? Mine is 4 A100s (80 GB mem), so the batch size is 8 (per GPU) * 4 (GPUs) = 32; therefore 2000 iters will span more than 1 epoch. If you cannot support batch size 8 per GPU, please try more GPUs. A larger batch size leads to more stable training.

Originally posted by @theEricMa in #10 (comment)

I trained with 4 (per GPU) * 6 (GPUs); 1500 iters spent exactly 1 epoch.
The pretrained model is named epoch_00005_iteration_000002000; maybe you trained this model with more than 8 (per GPU) * 8 (GPUs)?
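
For reference, a quick sanity check of the iterations-per-epoch arithmetic from the numbers quoted in this thread (a sketch; the actual sample count depends on the processed HDTF data):

import math

def iters_per_epoch(num_samples: int, batch_per_gpu: int, num_gpus: int) -> int:
    """Iterations needed to see every sample once with a global batch."""
    return math.ceil(num_samples / (batch_per_gpu * num_gpus))

# 4 per GPU * 6 GPUs = 24 gave exactly 1500 iters/epoch above,
# implying roughly 24 * 1500 = 36,000 training samples.
num_samples = 24 * 1500
print(iters_per_epoch(num_samples, 4, 6))  # 1500
print(iters_per_epoch(num_samples, 8, 4))  # 1125, so 2000 iters is > 1 epoch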

Quantitative Evaluation

Hi, thanks for open-sourcing this awesome work. Could you please let me know how the numbers in Table 1 of the paper were obtained? I couldn't find details about Multi-View Reenactment and Cross-Identity Reenactment.

Specifically,

  • Are the metrics in Table 1 computed on HDTF or Multiface?
  • For Multi-View Reenactment, do you use the first frame of each test video for identity and then reconstruct the remaining frames? And for Cross-Identity Reenactment, do you use the WRA_EricCantor_000 video (as here) to drive the first frame of each test video?

Also, do you have any plan to release the scripts for CSIM, AED, APD, and AKD computation, or could you please point me to the external code you used for these metrics?

Thanks in advance!

About downloading processed data

I have downloaded the file hdtf_lmdb_inv.zip from Google Drive, but when I unzip it, it shows:

Archive: hdtf_lmdb_inv.zip
warning [hdtf_lmdb_inv.zip]: 61915031118 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [hdtf_lmdb_inv.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

There may be something wrong with the file. Would you please provide an MD5 checksum so the integrity of the zip file can be verified?
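
For what it's worth, the "extra bytes at beginning" warning also shows up when an archive this large (about 62 GB) is extracted with an unzip build lacking Zip64 support, or when the download was truncated; comparing checksums would settle it. A minimal standard-library sketch for hashing a large file (the reference hash would still need to come from the author):

import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a large file through MD5 without loading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(md5_of("hdtf_lmdb_inv.zip"))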

can you share your pretrained model?

Can you share your pretrained model?

Also, when doing inference, I met this error:
python: /opt/conda/conda-bld/magma-cuda113_1619629459349/work/interface_cuda/interface.cpp:899: void magma_queue_create_from_cuda_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int): Assertion `queue->dBarray__ != __null' failed.


Makes face animation with specified image

Hi!
Since the project (OTAvatar_processing) only handles video input, I converted the specified image into a video of 120 duplicated frames. Then I obtained the data (target video and source video, i.e., the specified image) in mdb format through the OTAvatar_processing project.
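
For reference, a minimal OpenCV sketch of that image-duplication step (file names and fps are illustrative):

import cv2

img = cv2.imread("source.png")  # the single source image
h, w = img.shape[:2]
writer = cv2.VideoWriter("source.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25.0, (w, h))
for _ in range(120):            # duplicate into 120 identical frames
    writer.write(img)
writer.release()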

After running project OTAvatar, the obtained results are as follows:

https://github.com/theEricMa/OTAvatar/assets/117260350/48f089c3-6b13-4712-85e7-4ad71a747325

The results are not very satisfactory. I don’t know if I made mistakes in processing the images.
Thanks!

question about emotion code, ws_stdv

Thank you for your work!
I'm confused when I read:

# model forward
ws_scaling, ws_trans, alpha = net_Warp(target_semantic)  # None, motion_latent, motion_feat
ws_scaling = ws_scaling + 1 if ws_scaling is not None else 1
ws_trans = ws_trans * self.ws_stdv.to(ws)  # ?

Why do you do this: ws_trans = ws_trans * self.ws_stdv.to(w_opt)?
Why doesn't ws_trans come directly from the network, instead of being multiplied by ws_stdv?
Anyway, I know what ws_stdv means, but how is it obtained? Does it come from the official EG3D?
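
For what it's worth, judging from the sample_zs traceback in an earlier issue (it reads self.net_G_module.z_dim and returns w_avg, w_std), a statistic like ws_stdv is typically estimated by pushing many random z vectors through the generator's mapping network, as in StyleGAN/EG3D truncation. A hedged sketch of that idea (mapping, z_dim, and c_dim mirror EG3D's public generator attributes; this is not OTAvatar's verbatim code):

import torch

@torch.no_grad()
def estimate_w_stats(generator, num_samples: int = 10000, device: str = "cuda"):
    """Estimate per-dimension mean/std of W by sampling z ~ N(0, I)."""
    z = torch.randn(num_samples, generator.z_dim, device=device)
    c = torch.zeros(num_samples, generator.c_dim, device=device)  # camera conditioning
    w = generator.mapping(z, c)         # (num_samples, num_ws, w_dim) in EG3D
    return w.mean(dim=0), w.std(dim=0)  # w_avg, w_std

Scaling the predicted ws_trans by this std presumably expresses the offset in units of natural W-space variation, which keeps the edited latent close to the generator's prior.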
