weizhi-zhong / IP_LAP
CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
License: Apache License 2.0
How can I fix it?
Hi, thanks for sharing this incredible work with the community. It helps the community collaborate on and advance lip-sync technology.
It would be great if you could share the pre-trained checkpoint of the discriminator model used in the Video Renderer module. This would help with running fine-tuning experiments.
Thanks.
Hello, great work on this project! Is there some way to ignore the frames where faces weren't detected?
Hello, I want to train a model that works only for one specific person. When training the video renderer, are there any requirements on the dataset's duration and diversity?
Hello, at how many steps was the provided Video Renderer pre-trained model stopped, and roughly how long did it take to train? Thanks :)
@Weizhi-Zhong
Thanks for your extraordinary work. But when I run the code, I find that the lips shake a lot. Is some post-processing missing? The result is as follows:
https://drive.google.com/file/d/1L6XfTZKV_nvqv1FW3fFp1-9quOUgALYe/view?usp=sharing
Hello, I recently installed with no issues, but when I run the test on Windows Anaconda I get the following:
(iplip) C:\Users\leolo\IP_LAP>python inference_single.py
Traceback (most recent call last):
File "inference_single.py", line 34, in <module>
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False, device='cuda')
File "C:\Users\leolo\anaconda3\envs\iplip\lib\enum.py", line 354, in __getattr__
raise AttributeError(name) from None
AttributeError: _2D
All my versions seem correct according to the repo and the requirements.txt; the only thing I am doing differently is using
python inference_single.py
because when I use
CUDA_VISIBLE_DEVICES=0 python inference_single.py
I get the following error:
(iplip) C:\Users\leolo\IP_LAP>CUDA_VISIBLE_DEVICES=0 python inference_single.py
'CUDA_VISIBLE_DEVICES' is not recognized as an internal or external command,
operable program or batch file.
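For anyone hitting the same two problems, a hedged note (not from the repo authors): the VAR=value prefix is POSIX shell syntax that the Windows command prompt does not understand, and newer face_alignment releases renamed the LandmarksType._2D enum member to LandmarksType.TWO_D, which produces exactly this AttributeError. A minimal sketch of a version-tolerant workaround:

# On Windows cmd, set the variable on its own line, then run the script:
#   set CUDA_VISIBLE_DEVICES=0
#   python inference_single.py
# (PowerShell equivalent: $env:CUDA_VISIBLE_DEVICES = "0")

import face_alignment

# Use TWO_D when it exists (newer face_alignment), else fall back to the old _2D name.
landmarks_type = getattr(face_alignment.LandmarksType, "TWO_D", None) \
    or getattr(face_alignment.LandmarksType, "_2D")
fa = face_alignment.FaceAlignment(landmarks_type, flip_input=False, device="cuda")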
While training the landmark_generator, in order to resume from where training was interrupted, I found that calling load_checkpoint() with reset_optimizer=False produces the following error:
Start landmark_generator_training******************
Project_name: landmarks
Load checkpoint from: ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch_1166_checkpoint_step000035000.pth
Load optimizer state from ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch_1166_checkpoint_step000035000.pth
init dataset,filtering very short videos.....
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49644/49644 [00:04<00:00, 11383.00it/s]
complete,with available vids: 49475
init dataset,filtering very short videos.....
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 11265.29it/s]
complete,with available vids: 9976
0%| | 0/30 [00:00<?, ?it/s]Saved checkpoint: ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch1166_step000035000.pth
Evaluating model for 25 epochs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:45<00:00, 6.62s/it]
eval_L1_loss 0.005300633320584894 global_step: 35000
eval_velocity_loss 0.04097183309495449 global_step: 35000
0%| | 0/30 [02:56<?, ?it/s]
Traceback (most recent call last):
File "train_landmarks_generator.py", line 341, in
optimizer.step()
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 23, in use_grad
ret = func(self, *args, **kwargs)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 252, in step
found_inf=found_inf)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 316, in adam
found_inf=found_inf)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 363, in single_tensor_adam
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
If I instead set reset_optimizer=True, training works fine, but the generated .pth files are not quite the same size as before; I don't know why they differ by a dozen or so KB. Is this normal?
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 02:20 landmarks_epoch_166_checkpoint_step000005000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 03:40 landmarks_epoch_333_checkpoint_step000010000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 05:00 landmarks_epoch_500_checkpoint_step000015000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 06:21 landmarks_epoch_666_checkpoint_step000020000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 07:41 landmarks_epoch_833_checkpoint_step000025000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167281157 1月 2 09:01 landmarks_epoch_1000_checkpoint_step000030000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167281157 1月 2 10:21 landmarks_epoch_1166_checkpoint_step000035000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 13:36 landmarks_epoch1199_step000036000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 13:54 landmarks_epoch1232_step000037000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:01 landmarks_epoch1266_step000038000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:19 landmarks_epoch1299_step000039000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:38 landmarks_epoch1332_step000040000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:56 landmarks_epoch1366_step000041000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:15 landmarks_epoch1399_step000042000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:33 landmarks_epoch1432_step000043000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:51 landmarks_epoch1466_step000044000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:10 landmarks_epoch1499_step000045000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:28 landmarks_epoch1532_step000046000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:47 landmarks_epoch1566_step000047000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 18:05 landmarks_epoch1599_step000048000.pth
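Not the author, but the device-mismatch traceback above is a classic symptom of resuming: when the checkpoint is loaded with map_location='cpu', the restored optimizer state (exp_avg, exp_avg_sq) stays on the CPU while the model parameters sit on cuda:0. A hedged sketch of the usual fix, applied right after load_checkpoint() (the model and optimizer variable names are assumptions, not necessarily the repo's exact ones):

import torch

# Move every tensor in the restored optimizer state onto the same
# device as the model parameters before calling optimizer.step().
device = next(model.parameters()).device
for state in optimizer.state.values():
    for key, value in state.items():
        if torch.is_tensor(value):
            state[key] = value.to(device)

As for the file sizes: a small difference in .pth size after switching reset_optimizer is plausible, since the serialized optimizer state changes, rather than a sign of corruption.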
Many thanks to the author for open-sourcing such an excellent project!
I ran inference with my own video and audio and found a problem similar to wav2lip: during silent segments where the audio stops, the character's mouth still follows the original mouth shapes (the mouth may be closed, but the corners still twitch along with the original movements). Does the author have any ideas for solving this? Would adding such silent data during training improve the problem?
Hello, thank you very much for open-sourcing this excellent work. When using your code for rendering, we found that lip synchronization in Mandarin is not very good, and there can be unnatural synthesis artifacts on the face. Do you have any suggestions for optimization? Thank you.
When I run CUDA_VISIBLE_DEVICES=0 python inference_single.py, face detection works fine if the face in the frame is particularly large. However, when the face in the frame is relatively small, it throws a "not detect face" error from the following code:
inference_single.py code line 254: with mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1, refine_landmarks=True, min_detection_confidence=0.5) as face_mesh:
I have even tried changing min_detection_confidence to a very small value like 0.01, but it still cannot detect the face. How can I fix this? Thank you for creating such an amazing project. I really appreciate it.
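Not an official fix, but MediaPipe FaceMesh often misses faces that occupy only a small fraction of the frame, regardless of min_detection_confidence. One workaround is to upscale the frame before detection; since FaceMesh returns landmarks normalized to [0, 1], they map back to the original frame without extra bookkeeping. A minimal sketch under that assumption (SCALE is a hypothetical tuning knob):

import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
SCALE = 2  # hypothetical upscale factor; increase if the face is tiny

def detect_small_face(frame_bgr):
    # Enlarge the frame so the face covers more pixels for the detector.
    big = cv2.resize(frame_bgr, None, fx=SCALE, fy=SCALE,
                     interpolation=cv2.INTER_CUBIC)
    with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1,
                               refine_landmarks=True,
                               min_detection_confidence=0.5) as face_mesh:
        results = face_mesh.process(cv2.cvtColor(big, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    # Landmarks are normalized to [0, 1], so no rescaling back is needed.
    return results.multi_face_landmarks[0]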
How do you know that N_l:N_l+T is the lip_embedding and N_l+T: is the jaw_embedding, as used in the code below? I am using a larger number of landmark points, so I need to know how you derived this indexing.
The code is attached below:
#3. fuse embedding
output_tokens = self.fusion_transformer(ref_embedding, mel_embedding, pose_embedding)
#4. output landmark
lip_embedding = output_tokens[:, N_l:N_l+T, :]  # (B,T,dim)
jaw_embedding = output_tokens[:, N_l+T:, :]  # (B,T,dim)
output_mouse_landmark = self.mouse_keypoint_map(lip_embedding)  # (B,T,40*2)
output_jaw_landmark = self.jaw_keypoint_map(jaw_embedding)  # (B,T,17*2)
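Reading the slicing alone (my inference, not confirmed by the authors), the fused token sequence appears to be laid out along dimension 1 as [N_l reference tokens | T lip tokens | T jaw tokens], so the two slices simply skip the first N_l positions. A toy sketch of that assumed layout:

import torch

B, N_l, T, dim = 2, 15, 5, 512  # hypothetical sizes
# Assumed token order on dim 1: N_l reference tokens, then T lip, then T jaw.
output_tokens = torch.randn(B, N_l + 2 * T, dim)

lip_embedding = output_tokens[:, N_l:N_l + T, :]  # (B, T, dim)
jaw_embedding = output_tokens[:, N_l + T:, :]     # (B, T, dim)
assert lip_embedding.shape == (B, T, dim) and jaw_embedding.shape == (B, T, dim)

If that reading is right, using more landmark points changes only the per-token feature construction and the output sizes of the final mouse_keypoint_map / jaw_keypoint_map heads (40*2 and 17*2 here), not the N_l and T token bookkeeping.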
Hi,
Thanks for sharing the code and for your contribution to the field of talking-head generation.
I am confused about the training loss of the video_renderer. I trained the video_renderer on the LRS2 dataset for 24 epochs (33,220 steps), but the running_gen_loss seems to change randomly. The logs are as follows:
Is that normal, or did I do something wrong? Looking forward to your reply! Thanks a lot!
Hi,
It's me again.
I have successfully trained a talking head using your repo.
However, I noticed some bad cases: when the audio is actually silent, the lips of the talking head still move and the mouth opens.
Any insights about this? Thanks.
Hi, when training the landmark generator, does training need to be stopped manually? From what I can see, this while loop seems to have no break logic:
https://github.com/Weizhi-Zhong/IP_LAP/blob/main/train_landmarks_generator.py#L303
If so, at how many steps was the provided pre-trained model stopped, and how long did it take to train? Thanks.
Thanks for your great work!
But when I run inference_single.py on my own video, the code does not work. It shows:
Traceback (most recent call last):
  File "inference_single.py", line 509, in <module>
    full = merge_face_contour_only(original_background, T_input_frame[2], T_ori_face_coordinates[2][1], fa)  # (H,W,3)
  File "inference_single.py", line 145, in merge_face_contour_only
    preds = fa.get_landmarks(input_img)[0]  # 68x2
TypeError: 'NoneType' object is not subscriptable
So I wonder how to solve this problem. Thanks again!
Hello, I trained the landmarks model on videos I collected from bilibili; running_L1_loss is 0.0066 and running_velocity_loss is 0.0048. But in the inference result, the lips shake heavily and mismatch the upper face.
Could you give me some suggestions to fix that? Thanks very much!
I recently read your paper and found it a valuable contribution to the field. Thank you for your research.
I would like to know whether the provided pre-trained models, trained on an English dataset, can be fine-tuned on a Chinese dataset.
Is this feasible and effective? Do you have any insights or suggestions? Are there any challenges or caveats when applying these models to Chinese? Also, if you have advice on specific techniques or methods for fine-tuning on Chinese data, I would greatly appreciate your insights.
Thank you for your hard work on this project! I'm excited to see the code and would love to know when it will be made available to the public. Do you have an estimated timeline for when this might happen?
I see the pre-generated crops of faces from individual frames are currently resized to N x N.
Will aligning the faces in the same video clip using the facial landmarks that are output during the landmark detection help make a better landmark generator?
Thank you very much for this excellent work. After testing, I found that the face in the generated video has subtle jitter. After splitting the video into frames, I found that the difference between two generated frames is too large: in one of them the chin is very long, as if there were a double chin or the chin region overlapped, which produces the facial jitter. I have tried changing the face-smoothing value and shrinking the face detection box, but it still doesn't work. Do you have any ideas?
I prepared the materials required for training and executed the training script, and the following error occurred. I don't know how to solve it; please advise.
I have installed the environment needed for the program to run.
When I execute the training code, something like this happens:
I noticed that the program filtered out all my footage, so I changed min_len to 0.
As you can see, it then raises an error.
The error 'ValueError: Caught ValueError in DataLoader worker process 0.' goes away if I set num_workers to 0, but obviously that is not a proper fix.
How can I solve this issue?
After running inference_single.py, it generated this video: https://drive.google.com/file/d/1sJNC_3rjy1Op8aKz4cKBBgbpvkOIlYAx/view?usp=sharing
I debugged the code; it gets stuck at subprocess.call(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) and never reaches the line print("succeed output results to:", outfile_path).
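Not the author, but this hang is the classic pipe deadlock: subprocess.call(..., stdout=subprocess.PIPE) never reads the pipe, so once the child process (presumably ffmpeg here) fills the OS pipe buffer with output, it blocks forever. A hedged sketch of the standard fix, letting subprocess drain the pipes itself (command is the same variable quoted above):

import subprocess

# subprocess.run reads stdout/stderr to completion, so the child can
# never block on a full pipe buffer the way call(..., stdout=PIPE) can.
result = subprocess.run(command, shell=True, capture_output=True, text=True)
if result.returncode != 0:
    print("command failed:\n", result.stdout, result.stderr)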
As the title says:
there is a problem when concatenating T_pose and predict_content. How can I solve it?
Hi Dear,
Your work is excellent! But after testing many videos, I find the generated lips are too small, and the method cannot handle faces filmed from a long range. Do you have any ideas for solving these issues?
The output has severe displacement and strong jitter. I guess it may be a problem with face alignment and landmark detection, but I don't know what kind of improvement would help.
Thank you for your open-source work.
I tried CUDA_VISIBLE_DEVICES=0 python inference_single.py
and got the result ./test_result/129result_N_25_Nl_15.mp4
Everything works fine, except the inference speed is slow (only around 5.0 it/s on an RTX 4090).
Are there any suggestions for optimizing the speed?
Thanks for the awesome work!
I have 2 questions:
I attempted to run train_video_renderer.py on the LRS2 dataset using four RTX 4090 GPUs, but the training speed is exceptionally slow. In a previous issue, I noticed the author suggested running approximately 300 epochs for optimal results, but the speed I'm experiencing is much lower than expected. Does anyone have the same issue?
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation. I mean, the methods seem very similar. And what is the inference FPS on a 3090 Ti?
When training video_render on 8× V100 (16 GB), I get a "Segmentation fault"; but when I change the batch size to 16, training runs normally. Is this normal? If I want to make the most of my eight V100s, how should I modify the code?
I find that the last dimension of the landmarks_generator output is 57. Why 57?
Thanks for your work!
When I run the training code, the process is too slow, and I found that the time is mainly spent in the dataloader.
How can I solve this problem?
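Not author guidance, just generic PyTorch advice: when the DataLoader dominates step time, the usual levers are more worker processes, pinned memory, and persistent workers. A hedged sketch with illustrative values (the TensorDataset is a stand-in for the repo's actual Dataset; tune num_workers to your CPU core count):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3))  # placeholder for the repo's Dataset

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=8,            # parallel loading; 0 means load in the main process
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # avoid respawning workers every epoch
    prefetch_factor=2,        # batches pre-loaded per worker
)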
Is there any idea for a Chinese-language dataset that could replace LRS2 in the pipeline?
After training the landmark and render models, running inference with inference_single.py reports a checkpoint-loading error: the state dict is missing some keys. The information is as follows:
landmark_generator_model loaded from : checkpoints/landmark_generation/Pro_landmarkT5_d512_fe1024_lay4_head4/landmarkT5_d512_fe1024_lay4_head4_epoch_2020_checkpoint_step000012120.pth
renderer loaded from : checkpoints/renderer/Pro_renderer_T1_ref_N3/renderer_T1_ref_N3_epoch_7000_checkpoint_step000042000.pth
Load checkpoint from: checkpoints/landmark_generation/Pro_landmarkT5_d512_fe1024_lay4_head4/landmarkT5_d512_fe1024_lay4_head4_epoch_2020_checkpoint_step000012120.pth
--local/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
--local/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None
for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1
. You can also use weights=VGG19_Weights.DEFAULT
to get the most up-to-date weights.
warnings.warn(msg)
Perceptual loss:
Mode: vgg19
Load checkpoint from: checkpoints/renderer/Pro_renderer_T1_ref_N3/renderer_T1_ref_N3_epoch_7000_checkpoint_step000042000.pth
Traceback (most recent call last):
File "IP_LAP/inference_single.py", line 194, in
renderer = load_model(model=Renderer(), path=renderer_checkpoint_path)
File "IP_LAP/inference_single.py", line 173, in load_model
model.load_state_dict(new_s)
File "local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Renderer:
Missing key(s) in state_dict: "flow_module.conv1.weight", "flow_module.conv1.bias", "flow_module.conv1_bn.weight", "flow_module.conv1_bn.bias", "flow_module.conv1_bn.running_mean", "flow_module.conv1_bn.running_var", "flow_module.conv2.weight", "flow_module.conv2.bias", "flow_module.conv2_bn.weight", "flow_module.conv2_bn.bias", "flow_module.conv2_bn.running_mean", "flow_module.conv2_bn.running_var", "flow_module.spade_layer_1.conv_1.weight", "flow_module.spade_layer_1.conv_1.bias", "flow_module.spade_layer_1.conv_2.weight", "flow_module.spade_layer_1.conv_2.bias", "flow_module.spade_layer_1.spade_layer_1.conv1.weight", "flow_module.spade_layer_1.spade_layer_1.conv1.bias", "flow_module.spade_layer_1.spade_layer_1.gamma.weight", "flow_module.spade_layer_1.spade_layer_1.gamma.bias", "flow_module.spade_layer_1.spade_layer_1.beta.weight", "flow_module.spade_layer_1.spade_layer_1.beta.bias", "flow_module.spade_layer_1.spade_layer_2.conv1.weight", "flow_module.spade_layer_1.spade_layer_2.conv1.bias", "flow_module.spade_layer_1.spade_layer_2.gamma.weight", "flow_module.spade_layer_1.spade_layer_2.gamma.bias", "flow_module.spade_layer_1.spade_layer_2.beta.weight", "flow_module.spade_layer_1.spade_layer_2.beta.bias", "flow_module.spade_layer_2.conv_1.weight", "flow_module.spade_layer_2.conv_1.bias", "flow_module.spade_layer_2.conv_2.weight", "flow_module.spade_layer_2.conv_2.bias", "flow_module.spade_layer_2.spade_layer_1.conv1.weight", "flow_module.spade_layer_2.spade_layer_1.conv1.bias", "flow_module.spade_layer_2.spade_layer_1.gamma.weight", "flow_module.spade_layer_2.spade_layer_1.gamma.bias", "flow_module.spade_layer_2.spade_layer_1.beta.weight", "flow_module.spade_layer_2.spade_layer_1.beta.bias", "flow_module.spade_layer_2.spade_layer_2.conv1.weight", "flow_module.spade_layer_2.spade_layer_2.conv1.bias", "flow_module.spade_layer_2.spade_layer_2.gamma.weight", "flow_module.spade_layer_2.spade_layer_2.gamma.bias", "flow_module.spade_layer_2.spade_layer_2.beta.weight", "flow_module.spade_layer_2.spade_layer_2.beta.bias", "flow_module.spade_layer_4.conv_1.weight", "flow_module.spade_layer_4.conv_1.bias", "flow_module.spade_layer_4.conv_2.weight", "flow_module.spade_layer_4.conv_2.bias", "flow_module.spade_layer_4.spade_layer_1.conv1.weight", "flow_module.spade_layer_4.spade_layer_1.conv1.bias", "flow_module.spade_layer_4.spade_layer_1.gamma.weight", "flow_module.spade_layer_4.spade_layer_1.gamma.bias", "flow_module.spade_layer_4.spade_layer_1.beta.weight", "flow_module.spade_layer_4.spade_layer_1.beta.bias", "flow_module.spade_layer_4.spade_layer_2.conv1.weight", "flow_module.spade_layer_4.spade_layer_2.conv1.bias", "flow_module.spade_layer_4.spade_layer_2.gamma.weight", "flow_module.spade_layer_4.spade_layer_2.gamma.bias", "flow_module.spade_layer_4.spade_layer_2.beta.weight", "flow_module.spade_layer_4.spade_layer_2.beta.bias", "flow_module.conv_4.weight", "flow_module.conv_4.bias", "flow_module.conv_5.0.weight", "flow_module.conv_5.0.bias", "flow_module.conv_5.2.weight", "flow_module.conv_5.2.bias".
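Not a confirmed diagnosis, but missing flow_module.* keys usually mean the checkpoint was saved from a different Renderer definition than the one being constructed, or that the keys still carry a 'module.' prefix from DataParallel training that the loading code does not strip. A hedged diagnostic sketch, reusing renderer_checkpoint_path and Renderer from the traceback above:

import torch

ckpt = torch.load(renderer_checkpoint_path, map_location='cpu')
state = ckpt.get('state_dict', ckpt)
# Strip a potential DataParallel 'module.' prefix from every key.
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}
missing, unexpected = Renderer().load_state_dict(state, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)  # non-empty here points to a model mismatch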
I'm currently working on a project where I'm utilizing your method. I've observed that the method is configured to operate at 25 frames per second (fps), and I'm trying to understand the rationale behind this choice.
I have a collection of videos in my dataset that are recorded at 30fps, which is a standard frame rate for many recording devices, including smartphones.
However, when I downsample these videos (with ffmpeg) from 30fps to 25fps in order to match the method's operating rate, the resultant videos appear very choppy and lack smoothness. I have tried adding motion blur between the frames, but without much success.
Are there specific reasons why the method is set to work at 25fps instead of the more commonly used 30fps? Would it be possible to modify the method to operate at 30fps without significantly impacting its performance or the results?
Additionally, I would appreciate any suggestions on how to prevent the loss of smoothness when downsampling videos from 30fps to 25fps.
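One hedged suggestion (plain ffmpeg, nothing specific to this repo): instead of letting a simple fps conversion drop every sixth frame, motion-interpolate to 25 fps with the minterpolate filter, which synthesizes in-between frames and usually looks smoother. The file names below are placeholders:

import subprocess

# mi_mode=mci enables motion-compensated interpolation rather than
# plain frame dropping or duplication.
subprocess.run(['ffmpeg', '-i', 'input_30fps.mp4',
                '-vf', 'minterpolate=fps=25:mi_mode=mci',
                '-c:a', 'copy', 'output_25fps.mp4'], check=True)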
Hello author, why does the inferred video come out at such a low resolution and look so blurry?