weizhi-zhong / IP_LAP
CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
License: Apache License 2.0
How can I fix it?
Hi, thanks for sharing this incredible work with the community. It helps the community collaborate on and advance lip-sync technology.
It would be great if you could share the pre-trained checkpoint of the discriminator model used in the Video Renderer module. This would help with running fine-tuning experiments.
Thanks.
Hello, great work on this project! Is there some way to ignore the frames where faces weren't detected?
Hello, I want to train a model that works only for one specific person. When training the video renderer, are there any requirements on the dataset's duration and diversity?
Hello, at how many steps was the provided Video Renderer pre-trained model stopped, and roughly how long did it take to train? Thanks :)
@Weizhi-Zhong
Thanks for your extraordinary work. But when I run the code, I find that the lips shake a lot. Is some post-processing missing? The result is as follows:
https://drive.google.com/file/d/1L6XfTZKV_nvqv1FW3fFp1-9quOUgALYe/view?usp=sharing
Hello, I recently installed with no issues, but when I run the test on Windows Anaconda I get the following:
(iplip) C:\Users\leolo\IP_LAP>python inference_single.py
Traceback (most recent call last):
File "inference_single.py", line 34, in <module>
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False, device='cuda')
File "C:\Users\leolo\anaconda3\envs\iplip\lib\enum.py", line 354, in __getattr__
raise AttributeError(name) from None
AttributeError: _2D
All my versions seem correct according to the repo and the requirements.txt; the only thing I am doing differently is using
python inference_single.py
because when I use
CUDA_VISIBLE_DEVICES=0 python inference_single.py
I get the following error:
(iplip) C:\Users\leolo\IP_LAP>CUDA_VISIBLE_DEVICES=0 python inference_single.py
'CUDA_VISIBLE_DEVICES' is not recognized as an internal or external command,
operable program or batch file.
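For anyone hitting the same two problems, a hedged note (not from the repo authors): the VAR=value prefix is POSIX shell syntax that the Windows command prompt does not understand, and newer face_alignment releases renamed the LandmarksType._2D enum member to LandmarksType.TWO_D, which produces exactly this AttributeError. A minimal sketch of a version-tolerant workaround:

# On Windows cmd, set the variable on its own line, then run the script:
#   set CUDA_VISIBLE_DEVICES=0
#   python inference_single.py
# (PowerShell equivalent: $env:CUDA_VISIBLE_DEVICES = "0")

import face_alignment

# Use TWO_D when it exists (newer face_alignment), else fall back to the old _2D name.
landmarks_type = getattr(face_alignment.LandmarksType, "TWO_D", None) \
    or getattr(face_alignment.LandmarksType, "_2D")
fa = face_alignment.FaceAlignment(landmarks_type, flip_input=False, device="cuda")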
While training the landmark_generator, in order to resume from where training was interrupted, I found that calling load_checkpoint() with reset_optimizer=False produces the following error:
Start landmark_generator_training******************
Project_name: landmarks
Load checkpoint from: ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch_1166_checkpoint_step000035000.pth
Load optimizer state from ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch_1166_checkpoint_step000035000.pth
init dataset,filtering very short videos.....
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 49644/49644 [00:04<00:00, 11383.00it/s]
complete,with available vids: 49475
init dataset,filtering very short videos.....
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 11265.29it/s]
complete,with available vids: 9976
0%| | 0/30 [00:00<?, ?it/s]Saved checkpoint: ./checkpoints/landmark_generation/Pro_landmarks/landmarks_epoch1166_step000035000.pth
Evaluating model for 25 epochs
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [02:45<00:00, 6.62s/it]
eval_L1_loss 0.005300633320584894 global_step: 35000
eval_velocity_loss 0.04097183309495449 global_step: 35000
0%| | 0/30 [02:56<?, ?it/s]
Traceback (most recent call last):
File "train_landmarks_generator.py", line 341, in
optimizer.step()
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/optimizer.py", line 23, in use_grad
ret = func(self, *args, **kwargs)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 252, in step
found_inf=found_inf)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 316, in adam
found_inf=found_inf)
File "/opt/anaconda3/envs/iplap_py37/lib/python3.7/site-packages/torch/optim/adam.py", line 363, in single_tensor_adam
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
If I instead set reset_optimizer=True, training works fine, but the generated .pth files are not quite the same size as before; I don't know why they differ by a dozen or so KB. Is this normal?
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 02:20 landmarks_epoch_166_checkpoint_step000005000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 03:40 landmarks_epoch_333_checkpoint_step000010000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 05:00 landmarks_epoch_500_checkpoint_step000015000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 06:21 landmarks_epoch_666_checkpoint_step000020000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167279907 1月 2 07:41 landmarks_epoch_833_checkpoint_step000025000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167281157 1月 2 09:01 landmarks_epoch_1000_checkpoint_step000030000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167281157 1月 2 10:21 landmarks_epoch_1166_checkpoint_step000035000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 13:36 landmarks_epoch1199_step000036000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 13:54 landmarks_epoch1232_step000037000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:01 landmarks_epoch1266_step000038000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:19 landmarks_epoch1299_step000039000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:38 landmarks_epoch1332_step000040000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 15:56 landmarks_epoch1366_step000041000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:15 landmarks_epoch1399_step000042000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:33 landmarks_epoch1432_step000043000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 16:51 landmarks_epoch1466_step000044000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:10 landmarks_epoch1499_step000045000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:28 landmarks_epoch1532_step000046000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 17:47 landmarks_epoch1566_step000047000.pth
-rw-rw-r-- 1 tailangjun tailangjun 167261549 1月 2 18:05 landmarks_epoch1599_step000048000.pth
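Not the author, but the device-mismatch traceback above is a classic symptom of resuming: when the checkpoint is loaded with map_location='cpu', the restored optimizer state (exp_avg, exp_avg_sq) stays on the CPU while the model parameters sit on cuda:0. A hedged sketch of the usual fix, applied right after load_checkpoint() (the model and optimizer variable names are assumptions, not necessarily the repo's exact ones):

import torch

# Move every tensor in the restored optimizer state onto the same
# device as the model parameters before calling optimizer.step().
device = next(model.parameters()).device
for state in optimizer.state.values():
    for key, value in state.items():
        if torch.is_tensor(value):
            state[key] = value.to(device)

As for the file sizes: a small difference in .pth size after switching reset_optimizer is plausible, since the serialized optimizer state changes, rather than a sign of corruption.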
Many thanks to the author for open-sourcing such an excellent project!
I ran inference with my own video and audio and found a problem similar to wav2lip: during silent segments where the audio stops, the character's mouth still follows the original mouth shapes (the mouth may be closed, but the corners still twitch along with the original movements). Does the author have any ideas for solving this? Would adding such silent data during training improve the problem?
Hello, thank you very much for open-sourcing this excellent work. When using your code for rendering, we found that lip synchronization in Mandarin is not very good, and there can be unnatural synthesis artifacts on the face. Do you have any suggestions for optimization? Thank you.
When I run CUDA_VISIBLE_DEVICES=0 python inference_single.py, face detection works fine if the face in the frame is particularly large. However, when the face in the frame is relatively small, it throws a "not detect face" error from the following code:
inference_single.py code line 254: with mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1, refine_landmarks=True, min_detection_confidence=0.5) as face_mesh:
I have even tried changing min_detection_confidence to a very small value like 0.01, but it still cannot detect the face. How can I fix this? Thank you for creating such an amazing project. I really appreciate it.
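Not an official fix, but MediaPipe FaceMesh often misses faces that occupy only a small fraction of the frame, regardless of min_detection_confidence. One workaround is to upscale the frame before detection; since FaceMesh returns landmarks normalized to [0, 1], they map back to the original frame without extra bookkeeping. A minimal sketch under that assumption (SCALE is a hypothetical tuning knob):

import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
SCALE = 2  # hypothetical upscale factor; increase if the face is tiny

def detect_small_face(frame_bgr):
    # Enlarge the frame so the face covers more pixels for the detector.
    big = cv2.resize(frame_bgr, None, fx=SCALE, fy=SCALE,
                     interpolation=cv2.INTER_CUBIC)
    with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1,
                               refine_landmarks=True,
                               min_detection_confidence=0.5) as face_mesh:
        results = face_mesh.process(cv2.cvtColor(big, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    # Landmarks are normalized to [0, 1], so no rescaling back is needed.
    return results.multi_face_landmarks[0]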
How do you know that N_l:N_l+T is the lip_embedding and N_l+T: is the jaw_embedding, as used in the code below? I am using a larger number of landmark points, so I need to know how you derived this indexing.
The code is attached below:
#3. fuse embedding
output_tokens = self.fusion_transformer(ref_embedding, mel_embedding, pose_embedding)
#4. output landmark
lip_embedding = output_tokens[:, N_l:N_l+T, :]  # (B,T,dim)
jaw_embedding = output_tokens[:, N_l+T:, :]  # (B,T,dim)
output_mouse_landmark = self.mouse_keypoint_map(lip_embedding)  # (B,T,40*2)
output_jaw_landmark = self.jaw_keypoint_map(jaw_embedding)  # (B,T,17*2)
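Reading the slicing alone (my inference, not confirmed by the authors), the fused token sequence appears to be laid out along dimension 1 as [N_l reference tokens | T lip tokens | T jaw tokens], so the two slices simply skip the first N_l positions. A toy sketch of that assumed layout:

import torch

B, N_l, T, dim = 2, 15, 5, 512  # hypothetical sizes
# Assumed token order on dim 1: N_l reference tokens, then T lip, then T jaw.
output_tokens = torch.randn(B, N_l + 2 * T, dim)

lip_embedding = output_tokens[:, N_l:N_l + T, :]  # (B, T, dim)
jaw_embedding = output_tokens[:, N_l + T:, :]     # (B, T, dim)
assert lip_embedding.shape == (B, T, dim) and jaw_embedding.shape == (B, T, dim)

If that reading is right, using more landmark points changes only the per-token feature construction and the output sizes of the final mouse_keypoint_map / jaw_keypoint_map heads (40*2 and 17*2 here), not the N_l and T token bookkeeping.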
Hi,
Thanks for sharing the code and for your contribution to the field of talking-head generation.
I am confused about the training loss of the video_renderer. I trained the video_renderer on the LRS2 dataset for 24 epochs (33,220 steps), but the running_gen_loss seems to change randomly. The logs are as follows:
Is that normal, or did I do something wrong? Looking forward to your reply! Thanks a lot!
Hi,
It's me again.
I have successfully trained a talking head using your repo.
However, I noticed some bad cases: when the audio is actually silent, the lips of the talking head still move and the mouth opens.
Any insights about this? Thanks.
Hi, when training the landmark generator, does training need to be stopped manually? From what I can see, this while loop seems to have no break logic:
https://github.com/Weizhi-Zhong/IP_LAP/blob/main/train_landmarks_generator.py#L303
If so, at how many steps was the provided pre-trained model stopped, and how long did it take to train? Thanks.
Thanks for your great work!
But when I run inference_single.py on my own video, the code does not work. It shows:
Traceback (most recent call last):
  File "inference_single.py", line 509, in <module>
    full = merge_face_contour_only(original_background, T_input_frame[2], T_ori_face_coordinates[2][1], fa)  # (H,W,3)
  File "inference_single.py", line 145, in merge_face_contour_only
    preds = fa.get_landmarks(input_img)[0]  # 68x2
TypeError: 'NoneType' object is not subscriptable
So I wonder how to solve this problem. Thanks again!
Hello, I trained the landmarks model on videos I collected from bilibili; running_L1_loss is 0.0066 and running_velocity_loss is 0.0048. But in the inference result, the lips shake heavily and mismatch the upper face.
Could you give me some suggestions to fix that? Thanks very much!
I recently read your paper and found it a valuable contribution to the field. Thank you for your research.
I would like to know whether the provided pre-trained models, trained on an English dataset, can be fine-tuned on a Chinese dataset.
Is this feasible and effective? Do you have any insights or suggestions? Are there any challenges or caveats when applying these models to Chinese? Also, if you have advice on specific techniques or methods for fine-tuning on Chinese data, I would greatly appreciate your insights.
Thank you for your hard work on this project! I'm excited to see the code and would love to know when it will be made available to the public. Do you have an estimated timeline for when this might happen?
I see the pre-generated crops of faces from individual frames are currently resized to N x N.
Will aligning the faces in the same video clip using the facial landmarks that are output during the landmark detection help make a better landmark generator?
Thank you very much for this excellent work. After testing, I found that the face in the generated video has subtle jitter. After splitting the video into frames, I found that the difference between two generated frames is too large: in one of them the chin is very long, as if there were a double chin or the chin region overlapped, which produces the facial jitter. I have tried changing the face-smoothing value and shrinking the face detection box, but it still doesn't work. Do you have any ideas?
I prepared the materials required for training and executed the training script, and the following error occurred. I don't know how to solve it; please advise.
I have installed the environment needed for the program to run.
When I execute the training code, something like this happens:
I noticed that the program filtered out all my footage, so I changed min_len to 0.
As you can see, it then raises an error.
The error 'ValueError: Caught ValueError in DataLoader worker process 0.' goes away if I set num_workers to 0, but obviously that is not a proper fix.
How can I solve this issue?
After running inference_single.py, it generated this video: https://drive.google.com/file/d/1sJNC_3rjy1Op8aKz4cKBBgbpvkOIlYAx/view?usp=sharing
I debugged the code; it gets stuck at subprocess.call(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) and never reaches the line print("succeed output results to:", outfile_path).
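Not the author, but this hang is the classic pipe deadlock: subprocess.call(..., stdout=subprocess.PIPE) never reads the pipe, so once the child process (presumably ffmpeg here) fills the OS pipe buffer with output, it blocks forever. A hedged sketch of the standard fix, letting subprocess drain the pipes itself (command is the same variable quoted above):

import subprocess

# subprocess.run reads stdout/stderr to completion, so the child can
# never block on a full pipe buffer the way call(..., stdout=PIPE) can.
result = subprocess.run(command, shell=True, capture_output=True, text=True)
if result.returncode != 0:
    print("command failed:\n", result.stdout, result.stderr)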
As the title says:
there is a problem when concatenating T_pose and predict_content. How can I solve it?
Hi Dear,
Your work is excellent! But after testing many videos, I find the generated lips are too small, and the method cannot handle faces filmed from a long range. Do you have any ideas for solving these issues?
The output has severe displacement and strong jitter. I guess it may be a problem with face alignment and landmark detection, but I don't know what kind of improvement would help.
Thank you for your open-source work.
I tried CUDA_VISIBLE_DEVICES=0 python inference_single.py
and got the result ./test_result/129result_N_25_Nl_15.mp4
Everything works fine, except the inference speed is slow (only around 5.0 it/s on an RTX 4090).
Are there any suggestions for optimizing the speed?
Thanks for the awesome work!
I have 2 questions:
I attempted to run train_video_renderer.py on the LRS2 dataset using four RTX 4090 GPUs, but the training speed is exceptionally slow. In a previous issue, I noticed the author suggested running approximately 300 epochs for optimal results, but the speed I'm experiencing is much lower than expected. Does anyone have the same issue?
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation. I mean, the methods seem very similar. And what is the inference FPS on a 3090 Ti?
When training video_render on 8× V100 (16 GB), I get a "Segmentation fault"; but when I change the batch size to 16, training runs normally. Is this normal? If I want to make the most of my eight V100s, how should I modify the code?
I find that the last dimension of the landmarks_generator output is 57. Why 57?
Thanks for your work!
When I run the training code, the process is too slow, and I found that the time is mainly spent in the dataloader.
How can I solve this problem?
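Not author guidance, just generic PyTorch advice: when the DataLoader dominates step time, the usual levers are more worker processes, pinned memory, and persistent workers. A hedged sketch with illustrative values (the TensorDataset is a stand-in for the repo's actual Dataset; tune num_workers to your CPU core count):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3))  # placeholder for the repo's Dataset

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=8,            # parallel loading; 0 means load in the main process
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # avoid respawning workers every epoch
    prefetch_factor=2,        # batches pre-loaded per worker
)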
Is there any idea for a Chinese-language dataset that could replace LRS2 in the pipeline?
After training the landmark and render models, running inference with inference_single.py reports a checkpoint-loading error: the state dict is missing some keys. The information is as follows:
landmark_generator_model loaded from : checkpoints/landmark_generation/Pro_landmarkT5_d512_fe1024_lay4_head4/landmarkT5_d512_fe1024_lay4_head4_epoch_2020_checkpoint_step000012120.pth
renderer loaded from : checkpoints/renderer/Pro_renderer_T1_ref_N3/renderer_T1_ref_N3_epoch_7000_checkpoint_step000042000.pth
Load checkpoint from: checkpoints/landmark_generation/Pro_landmarkT5_d512_fe1024_lay4_head4/landmarkT5_d512_fe1024_lay4_head4_epoch_2020_checkpoint_step000012120.pth
--local/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
--local/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None
for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1
. You can also use weights=VGG19_Weights.DEFAULT
to get the most up-to-date weights.
warnings.warn(msg)
Perceptual loss:
Mode: vgg19
Load checkpoint from: checkpoints/renderer/Pro_renderer_T1_ref_N3/renderer_T1_ref_N3_epoch_7000_checkpoint_step000042000.pth
Traceback (most recent call last):
File "IP_LAP/inference_single.py", line 194, in
renderer = load_model(model=Renderer(), path=renderer_checkpoint_path)
File "IP_LAP/inference_single.py", line 173, in load_model
model.load_state_dict(new_s)
File "local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Renderer:
Missing key(s) in state_dict: "flow_module.conv1.weight", "flow_module.conv1.bias", "flow_module.conv1_bn.weight", "flow_module.conv1_bn.bias", "flow_module.conv1_bn.running_mean", "flow_module.conv1_bn.running_var", "flow_module.conv2.weight", "flow_module.conv2.bias", "flow_module.conv2_bn.weight", "flow_module.conv2_bn.bias", "flow_module.conv2_bn.running_mean", "flow_module.conv2_bn.running_var", "flow_module.spade_layer_1.conv_1.weight", "flow_module.spade_layer_1.conv_1.bias", "flow_module.spade_layer_1.conv_2.weight", "flow_module.spade_layer_1.conv_2.bias", "flow_module.spade_layer_1.spade_layer_1.conv1.weight", "flow_module.spade_layer_1.spade_layer_1.conv1.bias", "flow_module.spade_layer_1.spade_layer_1.gamma.weight", "flow_module.spade_layer_1.spade_layer_1.gamma.bias", "flow_module.spade_layer_1.spade_layer_1.beta.weight", "flow_module.spade_layer_1.spade_layer_1.beta.bias", "flow_module.spade_layer_1.spade_layer_2.conv1.weight", "flow_module.spade_layer_1.spade_layer_2.conv1.bias", "flow_module.spade_layer_1.spade_layer_2.gamma.weight", "flow_module.spade_layer_1.spade_layer_2.gamma.bias", "flow_module.spade_layer_1.spade_layer_2.beta.weight", "flow_module.spade_layer_1.spade_layer_2.beta.bias", "flow_module.spade_layer_2.conv_1.weight", "flow_module.spade_layer_2.conv_1.bias", "flow_module.spade_layer_2.conv_2.weight", "flow_module.spade_layer_2.conv_2.bias", "flow_module.spade_layer_2.spade_layer_1.conv1.weight", "flow_module.spade_layer_2.spade_layer_1.conv1.bias", "flow_module.spade_layer_2.spade_layer_1.gamma.weight", "flow_module.spade_layer_2.spade_layer_1.gamma.bias", "flow_module.spade_layer_2.spade_layer_1.beta.weight", "flow_module.spade_layer_2.spade_layer_1.beta.bias", "flow_module.spade_layer_2.spade_layer_2.conv1.weight", "flow_module.spade_layer_2.spade_layer_2.conv1.bias", "flow_module.spade_layer_2.spade_layer_2.gamma.weight", "flow_module.spade_layer_2.spade_layer_2.gamma.bias", "flow_module.spade_layer_2.spade_layer_2.beta.weight", "flow_module.spade_layer_2.spade_layer_2.beta.bias", "flow_module.spade_layer_4.conv_1.weight", "flow_module.spade_layer_4.conv_1.bias", "flow_module.spade_layer_4.conv_2.weight", "flow_module.spade_layer_4.conv_2.bias", "flow_module.spade_layer_4.spade_layer_1.conv1.weight", "flow_module.spade_layer_4.spade_layer_1.conv1.bias", "flow_module.spade_layer_4.spade_layer_1.gamma.weight", "flow_module.spade_layer_4.spade_layer_1.gamma.bias", "flow_module.spade_layer_4.spade_layer_1.beta.weight", "flow_module.spade_layer_4.spade_layer_1.beta.bias", "flow_module.spade_layer_4.spade_layer_2.conv1.weight", "flow_module.spade_layer_4.spade_layer_2.conv1.bias", "flow_module.spade_layer_4.spade_layer_2.gamma.weight", "flow_module.spade_layer_4.spade_layer_2.gamma.bias", "flow_module.spade_layer_4.spade_layer_2.beta.weight", "flow_module.spade_layer_4.spade_layer_2.beta.bias", "flow_module.conv_4.weight", "flow_module.conv_4.bias", "flow_module.conv_5.0.weight", "flow_module.conv_5.0.bias", "flow_module.conv_5.2.weight", "flow_module.conv_5.2.bias".
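Not a confirmed diagnosis, but missing flow_module.* keys usually mean the checkpoint was saved from a different Renderer definition than the one being constructed, or that the keys still carry a 'module.' prefix from DataParallel training that the loading code does not strip. A hedged diagnostic sketch, reusing renderer_checkpoint_path and Renderer from the traceback above:

import torch

ckpt = torch.load(renderer_checkpoint_path, map_location='cpu')
state = ckpt.get('state_dict', ckpt)
# Strip a potential DataParallel 'module.' prefix from every key.
state = {k[len('module.'):] if k.startswith('module.') else k: v
         for k, v in state.items()}
missing, unexpected = Renderer().load_state_dict(state, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)  # non-empty here points to a model mismatch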
I'm currently working on a project where I'm utilizing your method. I've observed that the method is configured to operate at 25 frames per second (fps), and I'm trying to understand the rationale behind this choice.
I have a collection of videos in my dataset that are recorded at 30fps, which is a standard frame rate for many recording devices, including smartphones.
However, when I downsample these videos (with ffmpeg) from 30fps to 25fps in order to match the method's operating rate, the resultant videos appear very choppy and lack smoothness. I have tried adding motion blur between the frames, but without much success.
Are there specific reasons why the method is set to work at 25fps instead of the more commonly used 30fps? Would it be possible to modify the method to operate at 30fps without significantly impacting its performance or the results?
Additionally, I would appreciate any suggestions on how to prevent the loss of smoothness when downsampling videos from 30fps to 25fps.
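One hedged suggestion (plain ffmpeg, nothing specific to this repo): instead of letting a simple fps conversion drop every sixth frame, motion-interpolate to 25 fps with the minterpolate filter, which synthesizes in-between frames and usually looks smoother. The file names below are placeholders:

import subprocess

# mi_mode=mci enables motion-compensated interpolation rather than
# plain frame dropping or duplication.
subprocess.run(['ffmpeg', '-i', 'input_30fps.mp4',
                '-vf', 'minterpolate=fps=25:mi_mode=mci',
                '-c:a', 'copy', 'output_25fps.mp4'], check=True)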
Hello author, why does the inferred video come out at such a low resolution and look so blurry?