This is the official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR2023].
I deployed the project following the README and ran into the issue below. I'm not sure what caused it. The error log from my run follows; please take a look and suggest a solution. @theEricMa
Loading ResNet ArcFace
loading id loss module: <All keys matched successfully>
Loading ResNet ArcFace
loading id loss module: <All keys matched successfully>
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic Weight 1.0
Loss TV Weight 1.0
Loss pixel Weight 1
Loss a_norm Weight 0.0
Loss a_mutual Weight 0.0
Loss local Weight 10.0
Loss local_s Weight 10.0
Loss id Weight 1.0
Loss id_s Weight 1.0
We train Generator
load [net_Warp] and [net_Warp_ema] from result/otavatar/epoch_00005_iteration_000002000_checkpoint.pt
Done with loading the checkpoint.
0%| | 0/19 [00:00<?, ?it/sSetting up PyTorch plugin "bias_act_plugin"... Done. | 0/3537 [00:00<?, ?it/s]
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00, 4.51it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:16<00:00, 6.12it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3537/3537 [06:13<00:00, 9.48it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 150437) of binary: /data1/anconda3/envs/otavatar/bin/python█████████▉| 3536/3537 [06:13<00:00, 11.93it/s]
Traceback (most recent call last):
File "/data1/anconda3/envs/otavatar/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data1/anconda3/envs/otavatar/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data1/anconda3/envs/otavatar/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=======================================================
inference_refine_1D_cam.py FAILED
-------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-05-23_21:02:27
host : zss-Precision-5820-Tower-X-Series
rank : 0 (local_rank: 0)
exitcode : -9 (pid: 150437)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 150437
=======================================================
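A note for readers who hit the same log: an exit code of -9 reported by the launcher means the child process was terminated by signal 9 (SIGKILL), not that Python raised an exception. On Linux this is often, though not always, the kernel's out-of-memory killer reacting to exhausted host RAM; that cause is an assumption here and is not confirmed by the log above. A minimal sketch for decoding such exit codes with the standard library (the helper name describe_exitcode is hypothetical):

```python
import signal


def describe_exitcode(exitcode: int) -> str:
    """Translate a subprocess-style exit code into a readable description."""
    if exitcode < 0:
        # Negative codes mean "killed by signal N", as in the log above.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-exitcode} ({name})"
    if exitcode == 0:
        return "exited normally"
    return f"exited with error status {exitcode}"


print(describe_exitcode(-9))
```

If the description names SIGKILL, looking at the kernel log (for example with dmesg) for out-of-memory messages around the timestamp of the failure is a reasonable next step.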
Hi, I wonder why some modules are missing from environment.yml. As a result, users have to run the inference code to find out which modules are missing and then install them one by one.
Great work, it inspired me a lot!
May I ask how much GPU memory is needed to train the model? I don't have many GPUs, and I'm afraid I won't be able to reproduce the experiment.
Besides, I'm also curious about the inference speed (FPS).
Looking forward to hearing from you, thanks!
No module named 'models.volumetric_rendering'
Both training and inference get stuck here. I don't know how many more files are missing; could you check on a clean machine?
File "/home/x/OTAvatar/util/lpips.py", line 8, in <module>
from third_part.PerceptualSimilarity.models import dist_model as dm
ModuleNotFoundError: No module named 'third_part'
BTW, some Python packages are missing from environment.yml: opencv-python, traitlets, PyYAML, lmdb.
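Rather than discovering missing packages one failed run at a time, the environment can be checked up front. The sketch below covers only the four packages named in this report; the pip-to-import name mapping is the usual one for these packages, and the helper name find_missing is hypothetical. It does not detect missing repository files such as third_part or models.volumetric_rendering.

```python
import importlib.util

# pip package name -> import name, for the packages named in the report above.
REQUIRED = {
    "opencv-python": "cv2",
    "traitlets": "traitlets",
    "PyYAML": "yaml",
    "lmdb": "lmdb",
}


def find_missing(required: dict) -> list:
    """Return the pip names whose top-level import name cannot be located."""
    missing = []
    for pip_name, import_name in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print("missing packages:", find_missing(REQUIRED))
```

Anything the script prints can then be installed in one pip command instead of one per crash.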
I'm getting an error after loading the model during testing
load [net_Warp] and [net_Warp_ema] from result\otavatar\epoch_00005_iteration_000002000_checkpoint.pt
Done with loading the checkpoint.
0%| | 0/3537 [00:00<?, ?it/s]
0%| | 0/19 [00:42<?, ?it/s]
Traceback (most recent call last):
File "inference_refine_1D_cam.py", line 166, in <module>
opt_Ws, w_opt, w_std = trainer.inverse_setup(1,)
File "F:\00Liss\01mycode\09Voice_driven_face_generation\03code\39-OTAvatar-main\trainers\decouple_by_invert.py", line 600, in inverse_setup
w_avg, w_std = self.sample_zs()
File "F:\00Liss\01mycode\09Voice_driven_face_generation\03code\39-OTAvatar-main\trainers\decouple_by_invert.py", line 766, in sample_zs
self.net_G_module.z_dim
AttributeError: 'FaceTrainer' object has no attribute 'net_G_module'
Nice work!
Approximately when will the code be released?
Hello, could you provide a new Google Drive link for the pretrained data?
It's very hard for me to upload data to a remote server because of my limited upload speed and unstable connection (scp fails with broken-pipe errors). Thanks a lot.
I don't have a Chinese phone number, so I can't register for a Baidu account.
Great work! I would like to use my own datasets, so I need your scripts.
I was able to run the inference but there's still no checkpoint at 2000. The output result is a video at iteration 00000 which has no mouth movement.
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Perceptual loss:
Mode: vgg19
Loading ResNet ArcFace
loading id loss module:
Loading ResNet ArcFace
loading id loss module:
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic Weight 1.0
Loss TV Weight 1.0
Loss pixel Weight 1
Loss a_norm Weight 0.0
Loss a_mutual Weight 0.0
Loss local Weight 10.0
Loss local_s Weight 10.0
Loss id Weight 1.0
Loss id_s Weight 1.0
We train Generator
No checkpoint found at iteration 2000.
0%| | 0/19 [00:00<?, ?it/s] 0%| | 0/19 [00:07<?, ?it/s]
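The message "No checkpoint found at iteration 2000" suggests that inference looked for a checkpoint file for that iteration and did not find one. A quick way to see what is actually on disk is to list the files that follow the naming pattern shown in the earlier logs (epoch_00005_iteration_000002000_checkpoint.pt). This is a sketch under assumptions: the directory to scan (for example result/otavatar) is inferred from those logs, and the helper names are hypothetical, not part of the repository.

```python
import re
from pathlib import Path

# Pattern taken from the checkpoint name in the logs above, e.g.
# epoch_00005_iteration_000002000_checkpoint.pt
CKPT_RE = re.compile(r"epoch_(\d+)_iteration_(\d+)_checkpoint\.pt$")


def list_checkpoints(result_dir):
    """Return sorted (iteration, epoch, path) tuples found in result_dir."""
    found = []
    for path in Path(result_dir).glob("*_checkpoint.pt"):
        match = CKPT_RE.search(path.name)
        if match:
            epoch, iteration = int(match.group(1)), int(match.group(2))
            found.append((iteration, epoch, path))
    return sorted(found, key=lambda item: item[:2])


def find_iteration(result_dir, iteration):
    """Return the checkpoint path for the given iteration, or None if absent."""
    for it, _epoch, path in list_checkpoints(result_dir):
        if it == iteration:
            return path
    return None
```

If find_iteration returns None for 2000, the pretrained file is either missing or stored under a different directory than the one the run name points to.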
Hi, thanks for your great work !
In line 730 of decouple_by_invert.py, the parameters of the motion controller are updated together with the EG3D generator.
However, in the algorithm in the appendix, the parameters of the motion controller are not updated while fine-tuning theta_eg.
By the way, Lt in the 16th line of the algorithm is not mentioned in the original paper. Is this a mistake?
Can you please update the license file?
@theEricMa Could you share the code for FID, AKD, AED, CSIM and APD? I checked repos such as PIRenderer, FOMM and StyleHeat, but only FOMM shares code for AED and AKD, which may be correct. We sincerely hope you will release your evaluation code for cross-identity reenactment.
Could you provide a download link for the pretrained model?
Thanks
@theEricMa My server environment can run single-GPU training, but multi-GPU training fails. After changing nproc_per_node from 1 to 4, the following error occurred.
(otavatar) ➜ OTAvatar git:(main) ✗ CUDA_VISIBLE_DEVICES=2,3,4,5 python -m torch.distributed.launch --nproc_per_node=4 --master_port 12222 train_inversion.py --config ./config/otavatar.yaml --name otavatar_gpu4
...
loading id loss module: <All keys matched successfully>
loading id loss module: <All keys matched successfully>
Loss perceptual_inverse_lr Weight 1.0
Loss perceptual_inverse_sr Weight 1.0
Loss perceptual_refine_lr Weight 1.0
Loss perceptual_refine_sr Weight 1.0
Loss monotonic Weight 1.0
Loss TV Weight 1.0
Loss pixel Weight 1
Loss a_norm Weight 0.0
Loss a_mutual Weight 0.0
Loss local Weight 10.0
Loss local_s Weight 10.0
Loss id Weight 1.0
Loss id_s Weight 1.0
loading id loss module: <All keys matched successfully>
loading id loss module: <All keys matched successfully>
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
No checkpoint found.
Epoch 0 ...
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
Loading model from: /gpfsdata/home/x/OTAvatar/third_part/PerceptualSimilarity/weights/v0.1/alex.pth
We train Generator
0%| | 0/2 [00:00<?, ?it/s]Setting up PyTorch plugin "bias_act_plugin"... Setting up PyTorch plugin "bias_act_plugin"... Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
0%| | 0/100 [00:00<?, ?it/s]Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
Done.
Done.
Done.
Traceback (most recent call last):
File "/gpfsdata/home/x/OTAvatar/loss/identity.py", line 353, in forward
loss = criterion(self.facenet(gt_align).detach(), self.facenet(pred_align))
File "/gpfsdata/home/x/miniconda3/envs/otavatar/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/gpfsdata/home/x/miniconda3/envs/otavatar/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 154, in forward
raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:3
...
Full log here err.log
What could be the possible reasons?
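The traceback points at torch.nn.DataParallel (data_parallel.py), which expects the wrapped module on device_ids[0], cuda:0 by default, while the process with rank 3 has placed it on cuda:3. One commonly used remedy, offered here as an assumption and not as the maintainers' fix, is to determine each process's local rank and bind the wrapped module to that single device. The sketch below resolves the local rank from either the --local_rank argument that torch.distributed.launch passes to the script or the LOCAL_RANK environment variable; the helper name resolve_local_rank is hypothetical.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return this process's local rank as an int (0 if nothing is set)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    # torch.distributed.launch passes --local_rank (newer releases: --local-rank).
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun (and launch with --use_env) export LOCAL_RANK instead.
    return int(environ.get("LOCAL_RANK", 0))


# Hypothetical usage inside a training script (requires PyTorch and GPUs):
#   local_rank = resolve_local_rank()
#   torch.cuda.set_device(local_rank)
#   facenet = torch.nn.DataParallel(facenet.cuda(local_rank),
#                                   device_ids=[local_rank])
```

With one process per GPU, restricting device_ids to the local rank keeps the module and its inputs on the same device in every process.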
I'm confused when I read this function. Do operations such as trans[2] += -10; c *= 0.27, c[1] += 0.015, c[2] += 0.161; K[0,0] = 2985.29/700 * focal / 1050, K[1,1] = 2985.29/700 * focal / 1050; and pose[:3, 3] = pose[:3, 3]/4.0 * 2.7 have any special meaning?
import numpy as np

def process_camera_inv(translation, Rs, focals):  # crop_params):
    c_list = []
    N = len(translation)
    # for trans, R, crop_param in zip(translation, Rs, crop_params):
    for idx, (trans, R, focal) in enumerate(zip(translation, Rs, focals)):
        # average translation and rotation over a window of neighbouring frames
        idx_prev = max(idx - 1, 0)
        idx_last = min(idx + 2, N - 1)
        trans = np.mean(translation[idx_prev: idx_last], axis=0)
        R = np.mean(Rs[idx_prev: idx_last], axis=0)
        # why
        trans[2] += -10
        c = -np.dot(R, trans)
        # # no why
        # c = trans
        pose = np.eye(4)
        pose[:3, :3] = R
        # why
        c *= 0.27
        c[1] += 0.015
        c[2] += 0.161
        # c[2] += 0.050 # 0.160
        pose[0, 3] = c[0]
        pose[1, 3] = c[1]
        pose[2, 3] = c[2]
        # focal = 2985.29
        w = 1024  # 224
        h = 1024  # 224
        K = np.eye(3)
        K[0][0] = focal
        K[1][1] = focal
        K[0][2] = w / 2.0
        K[1][2] = h / 2.0
        # flip the y and z axes of the rotation
        Rot = np.eye(3)
        Rot[0, 0] = 1
        Rot[1, 1] = -1
        Rot[2, 2] = -1
        pose[:3, :3] = np.dot(pose[:3, :3], Rot)
        # fix intrinsics (overwrites the pixel-unit values set above)
        K[0, 0] = 2985.29 / 700 * focal / 1050
        K[1, 1] = 2985.29 / 700 * focal / 1050
        K[0, 2] = 1 / 2
        K[1, 2] = 1 / 2
        assert K[0, 1] == 0
        assert K[2, 2] == 1
        assert K[1, 0] == 0
        assert K[2, 0] == 0
        assert K[2, 1] == 0
        # fix_pose_orig
        pose = np.array(pose).copy()
        # why
        pose[:3, 3] = pose[:3, 3] / 4.0 * 2.7
        # # no why
        # t_1 = np.array([-1.3651, 4.5466, 6.2646])
        # s_1 = np.array([-2.3178, -2.3715, -1.9653]) + 1
        # t_2 = np.array([-2.0536, 6.4069, 4.2269])
        # pose[:3, 3] = (pose[:3, 3] + t_1) * s_1 + t_2
        # flattened 4x4 pose (16 values) followed by flattened 3x3 K (9 values)
        c = np.concatenate([pose.reshape(-1), K.reshape(-1)])
        c_list.append(c.astype(np.float32))
    return c_list
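One way to read the constants, offered as an observation and not as the authors' explanation: the two translation rescalings (c *= 0.27 plus offsets, then division by 4.0 and multiplication by 2.7) compose into a single scale and offset, and the output vector has 16 + 9 = 25 entries. The value 2985.29/700 is about 4.2647, and together with the factor 2.7 it appears to match the normalized focal length and camera radius used in EG3D's FFHQ setup; that correspondence is an assumption and is not confirmed in this thread. The pure-Python sketch below only checks the arithmetic; the function names are hypothetical.

```python
# Constants copied from process_camera_inv above.
SCALE_1 = 0.27
OFFSET = (0.0, 0.015, 0.161)
SCALE_2 = 2.7 / 4.0


def two_step(c):
    """Apply the translation steps in the order the function uses them."""
    scaled = [SCALE_1 * v + o for v, o in zip(c, OFFSET)]
    return [v * SCALE_2 for v in scaled]


def folded(c):
    """The same map written as one scale and one offset."""
    scale = SCALE_1 * SCALE_2                  # 0.18225
    offset = [o * SCALE_2 for o in OFFSET]     # (0, 0.010125, 0.108675)
    return [scale * v + o for v, o in zip(c, offset)]


def normalized_focal(focal):
    """Focal term written into K[0,0] and K[1,1]."""
    return 2985.29 / 700 * focal / 1050


point = [1.0, -2.0, 10.0]
print(two_step(point), folded(point))
print(normalized_focal(1050.0))  # about 4.2647 when focal == 1050
```

Seen this way, the translation is shrunk by a net factor of 0.18225 and shifted slightly along y and z, and the focal length is rescaled so that an input of 1050 maps to roughly 4.2647.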
After running this command
export CUDA_VISIBLE_DEVICES=0
python -m torch.distributed.launch --nproc_per_node=1 --master_port 12345 inference_refine_1D_cam.py \
--config ./config/otavatar.yaml \
--name config/otavatar.yaml \
--no_resume \
--which_iter 2000 \
--image_size 512 \
--ws_plus \
--cross_id \
--cross_id_target WRA_EricCantor_000 \
--output_dir ./result/otavatar/evaluation/cross_ws_plus_WRA_EricCantor_000
I got these videos. It is obvious that only the pose has been transferred from the target image.
How to fix it?
@theEricMa If I want to animate a face with my own image and driving videos, what should I do?
Hi, may I know how many GPUs are used for the training? Mine is 4 A100s (80GB memory), so the batch size is 8 (per GPU) * 4 (GPUs) = 32; therefore 2000 iterations span more than 1 epoch. If you cannot support a batch size of 8 per GPU, please try more GPUs. A larger batch size leads to more stable training.
Originally posted by @theEricMa in #10 (comment)
I trained with 4 (per GPU) * 6 (GPUs), and 1500 iterations took exactly 1 epoch.
The pretrained model is named epoch_00005_iteration_000002000, so maybe you trained it with more than 8 (per GPU) * 8 (GPUs)?
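The arithmetic behind this question can be written out. The sketch below uses only the numbers quoted in this thread; the dataset size is an estimate derived from them, and the reading of "epoch_00005" as five completed epochs before iteration 2000 is an assumption about the naming scheme, not something the thread confirms.

```python
import math


def iters_per_epoch(num_samples, per_gpu_batch, num_gpus):
    """Iterations needed to see every sample once."""
    return math.ceil(num_samples / (per_gpu_batch * num_gpus))


# Reported above: batch 4 per GPU on 6 GPUs, 1500 iterations per epoch.
approx_samples = 4 * 6 * 1500            # 36000, an estimate

# Setup quoted from the author: batch 8 per GPU on 4 GPUs.
author_iters = iters_per_epoch(approx_samples, 8, 4)

# Assumption: "epoch_00005" means five full epochs precede iteration 2000.
implied_global_batch = approx_samples / (2000 / 5)

print(approx_samples, author_iters, implied_global_batch)
```

Under these assumptions the quoted 8 * 4 setup needs 1125 iterations per epoch, so 2000 iterations would fall in the second epoch, while five completed epochs by iteration 2000 would imply a global batch of about 90. The mismatch may simply mean the epoch counter or the dataset differs from what is assumed here.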
@theEricMa @87003697 Great job, but how can I animate a single image?
Hi, Thanks for open-sourcing this awesome work. Could you please let me know how to get the numbers in Table 1 in the paper? I couldn't find details about Multi-View Reenactment and Cross-Identity Reenactment.
Specifically, did you use the WRA_EricCantor_000 video (as here) to drive the first frame of each test video?
Also, do you have any plan to release the scripts for computing CSIM, AED, APD and AKD, or could you point me to the external code you used for these metrics?
Thanks in advance!
Great work!
I would like to use your work as a baseline. When do you plan to upload the data preprocessing script?
I have downloaded the file hdtf_lmdb_inv.zip from google drive, but when I unzip the file, it shows
Archive: hdtf_lmdb_inv.zip
warning [hdtf_lmdb_inv.zip]: 61915031118 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [hdtf_lmdb_inv.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
There may be something wrong with the file. Could you please provide an MD5 checksum so I can verify the integrity of the zip file?
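Once a reference checksum is published, the downloaded archive can be verified locally before unzipping. A minimal sketch using only the Python standard library; the helper names are hypothetical, and the file is read in chunks because the archive here is tens of gigabytes.

```python
import hashlib
import zipfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_zip(path):
    """Cheap structural check: True if a zip end-of-central-directory record is found."""
    return zipfile.is_zipfile(path)


# Hypothetical usage:
#   print(md5_of_file("hdtf_lmdb_inv.zip"))
#   print(looks_like_zip("hdtf_lmdb_inv.zip"))
```

A digest that differs from the published one, or a False result from looks_like_zip, would point to a truncated or interrupted download.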
Can you share your pretrained model?
Also, when running inference, I got this error:
python: /opt/conda/conda-bld/magma-cuda113_1619629459349/work/interface_cuda/interface.cpp:899: void magma_queue_create_from_cuda_internal(magma_device_t, cudaStream_t, cublasHandle_t, cusparseHandle_t, magma_queue**, const char*, const char*, int): Assertion `queue->dBarray__ != __null' failed.
Hi!
Since the OTAvatar_processing project only handles videos, I converted the specified image into a 120-frame video of duplicated frames. I then used OTAvatar_processing to obtain the data (target video and source video from the specified image) in mdb format.
After running project OTAvatar, the obtained results are as follows:
https://github.com/theEricMa/OTAvatar/assets/117260350/48f089c3-6b13-4712-85e7-4ad71a747325
The results are not very satisfactory. I don’t know if I made mistakes in processing the images.
Thanks!
Thank you for your work!
I'm confused when I read
# model forward
ws_scaling, ws_trans, alpha = net_Warp(target_semantic) # None, motion_latent, motion_feat
ws_scaling = ws_scaling + 1 if ws_scaling is not None else 1
ws_trans = ws_trans * self.ws_stdv.to(ws) # ?
Why do you do ws_trans = ws_trans * self.ws_stdv.to(w_opt)?
Why is ws_trans not taken directly from the network, but multiplied by ws_stdv?
I understand what ws_stdv means, but how is it obtained? Does it come from the official EG3D code?
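A possible reading, not confirmed by the maintainers in this thread: the traceback in an earlier issue shows inverse_setup calling sample_zs, which returns w_avg and w_std, so the statistics are probably estimated by sampling. In StyleGAN-style inversion it is common to sample many z vectors, map them to w, and take the mean and standard deviation; multiplying a predicted offset by that standard deviation expresses it in units of the latent spread. The sketch below illustrates the idea in pure Python with a stand-in mapping function. The names toy_mapping, estimate_w_stats and apply_offset are hypothetical, and whether the repository uses a per-dimension or a scalar standard deviation is not known from this page.

```python
import random
import statistics


def toy_mapping(z):
    """Stand-in for a generator's mapping network (hypothetical, for illustration)."""
    return [2.0 * z[0] + 1.0, 0.5 * z[1] - 3.0]


def estimate_w_stats(mapping, z_dim, num_samples=10000, seed=0):
    """Sample z ~ N(0, I), map to w, and return per-dimension mean and std."""
    rng = random.Random(seed)
    ws = [mapping([rng.gauss(0.0, 1.0) for _ in range(z_dim)])
          for _ in range(num_samples)]
    columns = list(zip(*ws))
    w_avg = [statistics.fmean(col) for col in columns]
    w_std = [statistics.pstdev(col) for col in columns]
    return w_avg, w_std


def apply_offset(w, raw_offset, w_std):
    """Scale a unit-free offset by the latent std before adding it to w."""
    return [wi + oi * si for wi, oi, si in zip(w, raw_offset, w_std)]


w_avg, w_std = estimate_w_stats(toy_mapping, z_dim=2)
print(w_avg, w_std)
print(apply_offset(w_avg, [1.0, 1.0], w_std))
```

With this scaling, a network output of 1.0 always means "one standard deviation along that latent dimension", regardless of how wide the dimension is.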