
theericma / otavatar


This is the official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR2023].

Languages: Python 78.81%, Shell 0.08%, C++ 5.19%, Cuda 15.92%
Topics: cvpr2023, deep-learning, deepfake, face-animation, face-reenactment, image-animation, motion-transfer, pose-transfer, pytorch, talking-head

otavatar's People

Contributors

87003697, theEricMa


otavatar's Issues

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 150437) of binary: /data1/anconda3/envs/otavatar/bin/python

When I deployed according to the README, I encountered this issue. I'm not quite sure what caused it. Below are the code snippet and error log from my run. Please take a look and suggest a solution. @theEricMa

Loading ResNet ArcFace
loading id loss module: <All keys matched successfully>
Loading ResNet ArcFace
loading id loss module: <All keys matched successfully>
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic            Weight 1.0
Loss TV                   Weight 1.0
Loss pixel                Weight 1
Loss a_norm               Weight 0.0
Loss a_mutual             Weight 0.0
Loss local                Weight 10.0
Loss local_s              Weight 10.0
Loss id                   Weight 1.0
Loss id_s                 Weight 1.0
We train Generator
load [net_Warp] and [net_Warp_ema] from result/otavatar/epoch_00005_iteration_000002000_checkpoint.pt
Done with loading the checkpoint.
  0%|          | 0/19 [00:00<?, ?it/s]
  0%|          | 0/3537 [00:00<?, ?it/s]
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
100%|██████████| 100/100 [00:22<00:00,  4.51it/s]
100%|██████████| 100/100 [00:16<00:00,  6.12it/s]
100%|██████████| 3537/3537 [06:13<00:00,  9.48it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 150437) of binary: /data1/anconda3/envs/otavatar/bin/python
Traceback (most recent call last):
  File "/data1/anconda3/envs/otavatar/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data1/anconda3/envs/otavatar/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
=======================================================
inference_refine_1D_cam.py FAILED
-------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-05-23_21:02:27
  host      : zss-Precision-5820-Tower-X-Series
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 150437)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 150437
=======================================================

About requirement.yaml

Hi, I wonder why some modules are not included in environment.yml. As a result, users have to run the inference code to discover which modules are missing and then install them one by one.

About training consumption and inference speed

Great work, it inspired me a lot!

May I ask how much GPU memory is needed to train the model? I don't have many GPUs and I'm afraid the experiment can't be reproduced.

Besides, I'm also curious about the inference time (FPS).

Looking forward to hearing from you, thanks!

error

No module named 'models.volumetric_rendering'

ModuleNotFoundError: No module named 'third_part'

Both training and inference get stuck here. I don't know how many missing files are left; maybe you could check them on another clean machine?

  File "/home/x/OTAvatar/util/lpips.py", line 8, in <module>
    from third_part.PerceptualSimilarity.models import dist_model as dm
ModuleNotFoundError: No module named 'third_part'

BTW, there are some Python packages missing from environment.yaml: opencv-python, traitlets, PyYAML, lmdb.

Question about 'FaceTrainer' object has no attribute 'net_G_module'

I'm getting an error after loading the model during testing:
load [net_Warp] and [net_Warp_ema] from result\otavatar\epoch_00005_iteration_000002000_checkpoint.pt
Done with loading the checkpoint.
0%| | 0/3537 [00:00<?, ?it/s]
0%| | 0/19 [00:42<?, ?it/s]
Traceback (most recent call last):
  File "inference_refine_1D_cam.py", line 166, in <module>
    opt_Ws, w_opt, w_std = trainer.inverse_setup(1,)
  File "F:\00Liss\01mycode\09Voice_driven_face_generation\03code\39-OTAvatar-main\trainers\decouple_by_invert.py", line 600, in inverse_setup
    w_avg, w_std = self.sample_zs()
  File "F:\00Liss\01mycode\09Voice_driven_face_generation\03code\39-OTAvatar-main\trainers\decouple_by_invert.py", line 766, in sample_zs
    self.net_G_module.z_dim
AttributeError: 'FaceTrainer' object has no attribute 'net_G_module'
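
A plausible cause (a guess from the traceback, not verified against OTAvatar's trainer internals): in codebases of this style, net_G_module is usually assigned as net_G.module only when the generator is wrapped in DistributedDataParallel, so a run launched without the distributed launcher never gets the attribute. A minimal defensive sketch, where unwrap is a hypothetical helper:

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def unwrap(net: torch.nn.Module) -> torch.nn.Module:
    """Return the underlying module whether or not net is DDP-wrapped."""
    return net.module if isinstance(net, DDP) else net

# e.g. in the trainer: self.net_G_module = unwrap(self.net_G)
# so the attribute exists both with and without torch.distributed.launch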

pretrained data

Hello, could you provide an updated Google Drive link for the pretrained data?
It's very hard for me to upload data to the remote server because of my limited upload speed and unstable connection (e.g., broken pipe with scp). Thanks a lot!

No checkpoint at 2000

I was able to run the inference, but there's still no checkpoint at iteration 2000. The output is a video at iteration 00000 which has no mouth movement.

Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Loading ResNet ArcFace
loading id loss module:
Loading ResNet ArcFace
loading id loss module:
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic Weight 1.0
Loss TV Weight 1.0
Loss pixel Weight 1
Loss a_norm Weight 0.0
Loss a_mutual Weight 0.0
Loss local Weight 10.0
Loss local_s Weight 10.0
Loss id Weight 1.0
Loss id_s Weight 1.0
We train Generator
No checkpoint found at iteration 2000.
0%| | 0/19 [00:00<?, ?it/s]
0%| | 0/19 [00:07<?, ?it/s]

Difference between code and paper

Hi, thanks for your great work!
In line 730 of decouple_by_invert.py, the parameters of the motion controller are updated together with the EG3D generator.
However, in the algorithm in the appendix, the parameters of the motion controller are not updated while finetuning theta_eg.
(screenshot of the algorithm from the appendix)
By the way, in line 16 of the algorithm, Lt is not mentioned in the original paper; is this a mistake?

metric code about cross-identity reenactment

@theEricMa Could you share the code for FID, AKD, AED, CSIM, and APD? I checked repos like PIRenderer, FOMM, and StyleHeat, but only FOMM shares code for AED and AKD, which may be right. So we sincerely hope you release your evaluation code for cross-identity reenactment.
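
In the meantime, a minimal sketch of CSIM under its usual definition (mean cosine similarity between face-recognition embeddings of source and generated frames; the embedding network, e.g. ArcFace, is assumed to be supplied separately):

import torch
import torch.nn.functional as F

def csim(emb_source: torch.Tensor, emb_generated: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity between identity embeddings, e.g. from ArcFace."""
    return F.cosine_similarity(emb_source, emb_generated, dim=-1).mean()

# emb_* have shape (num_frames, embed_dim); higher CSIM = better identity preservation.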

Multi-GPU training error (RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:3)

@theEricMa My server environment can run single-GPU training, but it encounters the following issue when executing multi-GPU training tasks. After changing nproc_per_node from 1 to 4, this error occurred.

(otavatar) ➜  OTAvatar git:(main) ✗ CUDA_VISIBLE_DEVICES=2,3,4,5 python -m torch.distributed.launch --nproc_per_node=4 --master_port 12222 train_inversion.py --config ./config/otavatar.yaml --name otavatar_gpu4
...
loading id loss module: <All keys matched successfully>
loading id loss module: <All keys matched successfully>
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic            Weight 1.0
Loss TV                   Weight 1.0
Loss pixel                Weight 1
Loss a_norm               Weight 0.0
Loss a_mutual             Weight 0.0
Loss local                Weight 10.0
Loss local_s              Weight 10.0
Loss id                   Weight 1.0
Loss id_s                 Weight 1.0
loading id loss module: <All keys matched successfully>
loading id loss module: <All keys matched successfully>
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
No checkpoint found.
Epoch 0 ...
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator

  0%|          | 0/2 [00:00<?, ?it/s]
Setting up PyTorch plugin "bias_act_plugin"... Done.   (x4, interleaved across the 4 workers)
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.  (x4, interleaved across the 4 workers)
  0%|          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/gpfsdata/home/x/OTAvatar/loss/identity.py", line 353, in forward
    loss = criterion(self.facenet(gt_align).detach(), self.facenet(pred_align))
  File "/gpfsdata/home/x/miniconda3/envs/otavatar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfsdata/home/x/miniconda3/envs/otavatar/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward
    raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:3
...

Full log here err.log
What could be the possible reasons?
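
A likely culprit (a guess from the traceback, not a confirmed fix): loss/identity.py appears to wrap its facenet in nn.DataParallel, which requires parameters to live on device_ids[0], while under torch.distributed.launch each worker's model sits on its own local_rank device. A minimal sketch of the usual workaround, pinning the loss backbone to the worker's own device and skipping DataParallel (the Sequential below is only a stand-in for the ArcFace backbone):

import os
import torch
from torch import nn

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by the launcher
device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.cuda.set_device(device)

# Stand-in for the identity-loss backbone; the real one loads ArcFace weights.
facenet = nn.Sequential(nn.Conv2d(3, 16, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
facenet = facenet.to(device).eval()  # pinned to this rank's GPU, no nn.DataParallel

x = torch.randn(2, 3, 112, 112, device=device)
with torch.no_grad():
    emb = facenet(x)  # runs entirely on this rank's device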

question about data.utils.process_camera_inv

I'm confused by this function. Do operations like trans[2] += -10; c *= 0.27; c[1] += 0.015; c[2] += 0.161; K[0,0] = K[1,1] = 2985.29/700 * focal / 1050; and pose[:3, 3] = pose[:3, 3]/4.0 * 2.7 have any special meaning?

import numpy as np

def process_camera_inv(translation, Rs, focals):  # crop_params):

    c_list = []

    N = len(translation)
    # for trans, R, crop_param in zip(translation, Rs, crop_params):
    for idx, (trans, R, focal) in enumerate(zip(translation, Rs, focals)):

        # temporally smooth the pose over a small window of neighboring frames
        idx_prev = max(idx - 1, 0)
        idx_last = min(idx + 2, N - 1)

        trans = np.mean(translation[idx_prev: idx_last], axis=0)
        R = np.mean(Rs[idx_prev: idx_last], axis=0)

        # why
        trans[2] += -10
        c = -np.dot(R, trans)

        # # no why
        # c = trans

        pose = np.eye(4)
        pose[:3, :3] = R

        # why
        c *= 0.27
        c[1] += 0.015
        c[2] += 0.161
        # c[2] += 0.050  # 0.160

        pose[0, 3] = c[0]
        pose[1, 3] = c[1]
        pose[2, 3] = c[2]

        # focal = 2985.29
        w = 1024  # 224
        h = 1024  # 224

        K = np.eye(3)
        K[0][0] = focal
        K[1][1] = focal
        K[0][2] = w / 2.0
        K[1][2] = h / 2.0

        # flip the y and z axes (OpenCV-style to OpenGL-style camera convention)
        Rot = np.eye(3)
        Rot[0, 0] = 1
        Rot[1, 1] = -1
        Rot[2, 2] = -1
        pose[:3, :3] = np.dot(pose[:3, :3], Rot)

        # fix intrinsics: normalize to image-size-relative units
        # (2985.29/700 ≈ 4.2647, which matches EG3D's normalized FFHQ focal length)
        K[0, 0] = 2985.29 / 700 * focal / 1050
        K[1, 1] = 2985.29 / 700 * focal / 1050
        K[0, 2] = 1 / 2
        K[1, 2] = 1 / 2
        assert K[0, 1] == 0
        assert K[2, 2] == 1
        assert K[1, 0] == 0
        assert K[2, 0] == 0
        assert K[2, 1] == 0

        # fix_pose_orig
        pose = np.array(pose).copy()

        # why (likely rescaling the camera distance toward EG3D's camera radius of 2.7)
        pose[:3, 3] = pose[:3, 3] / 4.0 * 2.7
        # # no why
        # t_1 = np.array([-1.3651,  4.5466,  6.2646])
        # s_1 = np.array([-2.3178, -2.3715, -1.9653]) + 1
        # t_2 = np.array([-2.0536,  6.4069,  4.2269])
        # pose[:3, 3] = (pose[:3, 3] + t_1) * s_1 + t_2

        c = np.concatenate([pose.reshape(-1), K.reshape(-1)])
        c_list.append(c.astype(np.float32))

    return c_list
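
For orientation, each returned vector concatenates a flattened 4x4 camera-to-world pose with a flattened 3x3 normalized intrinsics matrix (25 values per frame), which matches the camera-conditioning layout used by EG3D-style generators. A toy usage sketch with made-up inputs, assuming the function above is in scope:

import numpy as np

# Two identical frames (the function smooths over neighboring frames, so N >= 2 is simplest).
translation = [np.zeros(3), np.zeros(3)]
Rs = [np.eye(3), np.eye(3)]
focals = [1050.0, 1050.0]

c = process_camera_inv(translation, Rs, focals)[0]
pose = c[:16].reshape(4, 4)  # camera-to-world extrinsics
K = c[16:].reshape(3, 3)     # intrinsics in [0, 1] image coordinates
print(pose, K, sep="\n")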

How to transfer expression from target img to source img

After running this command:

export CUDA_VISIBLE_DEVICES=0
python -m torch.distributed.launch --nproc_per_node=1 --master_port 12345 inference_refine_1D_cam.py \
--config ./config/otavatar.yaml \
--name config/otavatar.yaml \
--no_resume \
--which_iter 2000 \
--image_size 512 \
--ws_plus \
--cross_id \
--cross_id_target WRA_EricCantor_000 \
--output_dir ./result/otavatar/evaluation/cross_ws_plus_WRA_EricCantor_000

I got these videos. It is obvious that only the pose has been transferred from the target video.
WRA_EricCantor_000_to_WRA_VickyHartzler_0002023512226202
How can I fix this?

Cannot find the config file

Nice work!
When running the inference_refine_1D_cam.py file, the error is an unknown location of the config file.
Indeed, the config file cannot be found under the Config folder.

Pretrained model batchsize and gpus

Hi, may I know how many GPUs were used for the training? Mine is 4 A100s (80 GB mem), so the batch size is 8 (per GPU) * 4 (GPUs) = 32; therefore 2000 iters will span more than 1 epoch. If you cannot support batch size 8 per GPU, please try more GPUs. A larger batch size leads to more stable training.

Originally posted by @theEricMa in #10 (comment)

I trained with 4 (per GPU) * 6 (GPUs); 1500 iters spent exactly 1 epoch.
The pretrained model is named epoch_00005_iteration_000002000; maybe you trained this model with more than 8 (per GPU) * 8 (GPUs)?
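
For reference, a quick sanity check of the iterations-per-epoch arithmetic from the numbers quoted in this thread (a sketch; the actual sample count depends on the processed HDTF data):

import math

def iters_per_epoch(num_samples: int, batch_per_gpu: int, num_gpus: int) -> int:
    """Iterations needed to see every sample once with a global batch."""
    return math.ceil(num_samples / (batch_per_gpu * num_gpus))

# 4 per GPU * 6 GPUs = 24 gave exactly 1500 iters/epoch above,
# implying roughly 24 * 1500 = 36,000 training samples.
num_samples = 24 * 1500
print(iters_per_epoch(num_samples, 4, 6))  # 1500
print(iters_per_epoch(num_samples, 8, 4))  # 1125, so 2000 iters is > 1 epoch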

Quantitative Evaluation

Hi, thanks for open-sourcing this awesome work. Could you please let me know how the numbers in Table 1 of the paper were obtained? I couldn't find details about Multi-View Reenactment and Cross-Identity Reenactment.

Specifically,

  • Are the metrics in Table 1 computed on HDTF or Multiface?
  • For Multi-View Reenactment, do you use the first frame of each test video for identity and then reconstruct the remaining frames? And for Cross-Identity Reenactment, do you use the WRA_EricCantor_000 video (as here) to drive the first frame of each test video?

Also, do you have any plan to release the scripts for CSIM, AED, APD, and AKD computation, or could you please point me to the external code you used for these metrics?

Thanks in advance!

About downloading processed data

I have downloaded the file hdtf_lmdb_inv.zip from Google Drive, but when I unzip it, it shows:

Archive: hdtf_lmdb_inv.zip
warning [hdtf_lmdb_inv.zip]: 61915031118 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [hdtf_lmdb_inv.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

There may be something wrong with the file. Would you please provide an MD5 checksum so the integrity of the zip file can be verified?
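
For what it's worth, the "extra bytes at beginning" warning also shows up when an archive this large (about 62 GB) is extracted with an unzip build lacking Zip64 support, or when the download was truncated; comparing checksums would settle it. A minimal standard-library sketch for hashing a large file (the reference hash would still need to come from the author):

import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a large file through MD5 without loading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(md5_of("hdtf_lmdb_inv.zip"))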

can you share your pretrained model?

Can you share your pretrained model?

Also, when doing inference, I met this error:
python: /opt/conda/conda-bld/magma-cuda113_1619629459349/work/interface_cuda/interface.cpp:899: void magma_queue_create_from_cuda_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int): Assertion `queue->dBarray__ != __null' failed.


Makes face animation with specified image

Hi!
Since the project (OTAvatar_processing) only handles video input, I converted the specified image into a video of 120 duplicated frames. Then I obtained the data (target video and source video, i.e., the specified image) in mdb format through the OTAvatar_processing project.
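
For reference, a minimal OpenCV sketch of that image-duplication step (file names and fps are illustrative):

import cv2

img = cv2.imread("source.png")  # the single source image
h, w = img.shape[:2]
writer = cv2.VideoWriter("source.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25.0, (w, h))
for _ in range(120):            # duplicate into 120 identical frames
    writer.write(img)
writer.release()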

After running project OTAvatar, the obtained results are as follows:

https://github.com/theEricMa/OTAvatar/assets/117260350/48f089c3-6b13-4712-85e7-4ad71a747325

The results are not very satisfactory. I don’t know if I made mistakes in processing the images.
Thanks!

question about emotion code, ws_stdv

Thank you for your work!
I'm confused when I read:

# model forward
ws_scaling, ws_trans, alpha = net_Warp(target_semantic)  # None, motion_latent, motion_feat
ws_scaling = ws_scaling + 1 if ws_scaling is not None else 1
ws_trans = ws_trans * self.ws_stdv.to(ws)  # ?

Why do you do this: ws_trans = ws_trans * self.ws_stdv.to(w_opt)?
Why doesn't ws_trans come directly from the network, instead of being multiplied by ws_stdv?
Anyway, I know what ws_stdv means, but how is it obtained? Does it come from the official EG3D?
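
For what it's worth, judging from the sample_zs traceback in an earlier issue (it reads self.net_G_module.z_dim and returns w_avg, w_std), a statistic like ws_stdv is typically estimated by pushing many random z vectors through the generator's mapping network, as in StyleGAN/EG3D truncation. A hedged sketch of that idea (mapping, z_dim, and c_dim mirror EG3D's public generator attributes; this is not OTAvatar's verbatim code):

import torch

@torch.no_grad()
def estimate_w_stats(generator, num_samples: int = 10000, device: str = "cuda"):
    """Estimate per-dimension mean/std of W by sampling z ~ N(0, I)."""
    z = torch.randn(num_samples, generator.z_dim, device=device)
    c = torch.zeros(num_samples, generator.c_dim, device=device)  # camera conditioning
    w = generator.mapping(z, c)         # (num_samples, num_ws, w_dim) in EG3D
    return w.mean(dim=0), w.std(dim=0)  # w_avg, w_std

Scaling the predicted ws_trans by this std presumably expresses the offset in units of natural W-space variation, which keeps the edited latent close to the generator's prior.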
