Combine Lip Sync AI and Face Restoration AI to get ultra-high-quality videos.
Projects referenced:
Video sources:
High quality Lip sync
Nice work! What's the pipeline performance like? For example, how long does it take to process 60 seconds of Full HD video?
I have been experimenting for the past few days with both Wav2Lip-HD (not in auto) and retalker, and found that both are slow and very GPU-hungry.
I would like to know from each of you: WHICH GPU do you use (what card), and HOW LONG does processing take for you? What kind of videos/animations are you lip-syncing, and at what length? (How much time to process X seconds/minutes?)
Please contribute. I am about to drop this technology and give up on it; maybe other people's experiences will give me hope. Maybe this repo is faster? (I could not try it yet.)
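For anyone gathering numbers to answer this: a tiny timing helper makes it easy to benchmark each pipeline stage (face detection, Wav2Lip inference, GFPGAN restoration) separately, so you can report which one dominates on your card. This is a sketch; `timed_call` is my own helper, not part of any of these repos.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example (hypothetical stage function):
# frames, seconds = timed_call(run_wav2lip, "inputs/clip.mp4")
# print(f"wav2lip: {seconds:.1f}s for {len(frames)} frames")
```

Dividing the elapsed seconds by the clip duration gives a real-time factor that is easy to compare across GPUs.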
run:
command = 'ffmpeg -y -i {} -strict -2 {}'.format(args.audio, 'temp/temp.wav')
subprocess.call(command, shell=True)
output: "'ffmpeg' is not recognized as an internal or external command, operable program or batch file."
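That output is Windows cmd.exe reporting that ffmpeg is not on PATH. A hedged sketch of a guard you could add before the subprocess call (the function names here are mine, not the repo's):

```python
import shutil
import subprocess
import sys

def build_ffmpeg_cmd(audio_path, out_path="temp/temp.wav"):
    # Same arguments as the inference.py snippet above, but as an argv list,
    # so shell=True is not needed.
    return ["ffmpeg", "-y", "-i", audio_path, "-strict", "-2", out_path]

def extract_wav(audio_path):
    # Fail early with a readable message instead of cmd.exe's
    # "'ffmpeg' is not recognized..." error.
    if shutil.which("ffmpeg") is None:
        sys.exit("ffmpeg not found on PATH -- install it or add its bin/ folder to PATH")
    subprocess.run(build_ffmpeg_cmd(audio_path), check=True)
```

On Windows, installing ffmpeg and adding its `bin\` directory to the PATH environment variable (then reopening the terminal) resolves this.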
Hi @ajay-sainy, nice work and effort. I have tried it, but there are still some boxy artifacts around the chin and neck. Do you have any idea how to remove them, or how the quality could be improved?
I downloaded parsing_parsenet.pth and used it for inference, then this error occurred:
RuntimeError: unexpected EOF, expected 133541137 more bytes. The file might be corrupted.
I have re-downloaded it several times, but it still does not work.
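An "unexpected EOF" from torch.load almost always means the download was truncated (for example, a proxy returned an HTML error page instead of the file). One way to check a checkpoint before loading it, sketched here; the expected size and hash you compare against are placeholders you would take from the release page:

```python
import hashlib
import os

def file_fingerprint(path, chunk=1 << 20):
    # Return (size_in_bytes, sha256) so a truncated or corrupted checkpoint
    # can be spotted before torch.load raises "unexpected EOF".
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return os.path.getsize(path), h.hexdigest()

# size, digest = file_fingerprint("checkpoints/parsing_parsenet.pth")
# Compare size/digest against the values published with the release.
```

If the size keeps coming out short, try downloading with a different tool (wget/curl with resume) or from the mirror linked in the README.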
Hi,
Thanks for your project; your demo looks great. I get a couple of errors in the final ffmpeg step.
Can you help me? Thanks!
Screen capture here: https://markuphero.com/share/IgN6xYsCMRIdhDHKXduR
Demo results: https://space.bilibili.com/3494353091693235 ; contact metahuman668 if needed.
To discuss wav2lip-related questions, contact WeChat ID wav2lip to apply to join the group.
Hi! The GDrive folder that stored the original Wav2Lip model has moved. Can you please update the link to the new location?
Thanks
python inference.py --checkpoint_path checkpoints/wav2lip.pth --face /root/Wav2Lip-GFPGAN/inputs/girl.mp4 --audio /root/Wav2Lip-GFPGAN/inputs/girl.mp3 --outfile /root/Wav2Lip-GFPGAN/outputs/girl.mp4
100%|███████████████████████████████████████████████████████████████████████████████████| 84/84 [36:44<00:00, 26.24s/it]
0%| | 0/22 [36:44<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 280, in <module>
    main()
  File "inference.py", line 249, in main
    for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "inference.py", line 113, in datagen
    face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
  File "inference.py", line 92, in face_detect
    raise ValueError('Face not detected! Ensure the video contains a face in all the frames.')
ValueError: Face not detected! Ensure the video contains a face in all the frames.
My video does have a face, and it's the same woman speaking the whole time, e.g. the image below.
The GFPGAN project itself runs quite slowly; I wonder how fast the combined Wav2Lip + GFPGAN pipeline is.
Using cuda for inference.
VIDIOC_REQBUFS: Inappropriate ioctl for device
Reading video frames...
Number of frames available for inference: 0
Extracting raw audio...
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/opt/conda/conda-bld/ffmpeg_1597178665428/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'inputs/kimk_7s_raw.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : https://clipchamp.com
comment : Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
Duration: 00:00:08.48, start: 0.000000, bitrate: 7936 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080, 7778 kb/s, 30 fps, 30 tbr, 15360 tbn, 30720 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 192 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
ICMT : Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
ISFT : Lavf58.45.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc58.91.100 pcm_s16le
size= 1456kB time=00:00:08.45 bitrate=1411.4kbits/s speed=1.18e+03x
video:0kB audio:1456kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.013817%
(80, 677)
Traceback (most recent call last):
  File "inference.py", line 280, in <module>
    main()
  File "inference.py", line 232, in main
    mel_idx_multiplier = 80./fps
ZeroDivisionError: float division by zero
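fps is 0 here because OpenCV could not decode the input video at all (note "Number of frames available for inference: 0" in the log above), so `80./fps` divides by zero. A guard of this shape would surface the real problem; this is a sketch, not the repo's code:

```python
def mel_chunks_per_frame(fps, mel_fps=80.0):
    # inference.py computes mel_idx_multiplier = 80./fps; when OpenCV fails
    # to read the video it reports fps == 0 and the division blows up here.
    if not fps or fps <= 0:
        raise ValueError(
            "fps is 0 -- OpenCV could not read the --face video; "
            "check the path and that the codec is supported")
    return mel_fps / fps
```

In practice the fix is to re-encode the input (e.g. to H.264 yuv420p) or install an OpenCV build with the needed codec support.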
  File "inference_gfpgan.py", line 105, in main
    restorer = GFPGANer(
  File "/home/xxx/2023-05/wav2Lip-GFPGAN/GFPGAN-master/gfpgan/utils.py", line 76, in __init__
    self.face_helper = FaceRestoreHelper(
  File "/home/xxx/.local/lib/python3.8/site-packages/facexlib/utils/face_restoration_helper.py", line 103, in __init__
    self.face_parse = init_parsing_model(model_name='parsenet', device=self.device, model_rootpath=model_rootpath)
  File "/home/xxx/.local/lib/python3.8/site-packages/facexlib/parsing/__init__.py", line 20, in init_parsing_model
    load_net = torch.load(model_path, map_location=lambda storage, loc: storage)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/serialization.py", line 1020, in _legacy_load
    typed_storage._storage._set_from_file(
RuntimeError: unexpected EOF, expected 761657 more bytes. The file might be corrupted.
I have downloaded the model several times and it always reports this EOF error; the file may be corrupted.
concatTextFilePath = outputPath + "/concat.txt"
with open(concatTextFilePath, "w") as concatTextFile:
    for ips in range(batch):
        # Assumes the batch files were written with this same naming scheme,
        # i.e. fewer than 10 batches ("batch_000" + str(ips)).
        concatTextFile.write("file batch_000" + str(ips) + ".avi\n")
concatedVideoOutputPath = outputPath + "/concated_output.avi"
# The original cell interpolated {concatFilePath}, which is undefined;
# the variable is concatTextFilePath.
!ffmpeg -y -f concat -i {concatTextFilePath} -c copy {concatedVideoOutputPath}
finalProcessedOuputVideo = processedVideoOutputPath + '/final_with_audio.avi'
!ffmpeg -y -i {concatedVideoOutputPath} -i {inputAudioPath} -map 0 -map 1:a -c:v copy -shortest {finalProcessedOuputVideo}
from google.colab import files
files.download(finalProcessedOuputVideo)
Google Colab has a usage limit; I would like to run this on my own computer, which has no such limit.
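To run the notebook's concat/mux steps outside Colab (where `!` cells are unavailable), the same two ffmpeg commands can be issued with subprocess. Function and variable names here are mine, sketched from the notebook cell above:

```python
import subprocess

def concat_cmd(concat_txt, concatenated_out):
    # Join the batch AVIs listed in concat.txt without re-encoding.
    return ["ffmpeg", "-y", "-f", "concat", "-i", concat_txt,
            "-c", "copy", concatenated_out]

def mux_cmd(concatenated_out, audio_in, final_out):
    # Copy the video stream and mux in the audio track, trimming to the shorter.
    return ["ffmpeg", "-y", "-i", concatenated_out, "-i", audio_in,
            "-map", "0", "-map", "1:a", "-c:v", "copy", "-shortest", final_out]

def finalize(concat_txt, concatenated_out, audio_in, final_out):
    subprocess.run(concat_cmd(concat_txt, concatenated_out), check=True)
    subprocess.run(mux_cmd(concatenated_out, audio_in, final_out), check=True)
```

With `check=True`, a non-zero ffmpeg exit status raises immediately instead of failing silently mid-pipeline.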
Could you make a GUI version?
A wav2lip model technical discussion group has been created; add WeChat mmns329 if interested.
I get TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given, despite the input having the right format and length.
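This TypeError usually indicates a librosa >= 0.10 install: `librosa.filters.mel` became keyword-only, while Wav2Lip's audio.py still passes positional arguments. Two options, sketched here; the version pin is the commonly suggested workaround, not something this repo documents:

```shell
# Option 1: pin a pre-0.10 librosa whose mel() still accepts positional args
pip install "librosa==0.9.2"

# Option 2: in audio.py, change the positional call to keyword arguments, e.g.
#   librosa.filters.mel(sr=hp.sample_rate, n_fft=hp.n_fft, n_mels=hp.num_mels,
#                       fmin=hp.fmin, fmax=hp.fmax)
```

Option 2 is the more durable fix, since it works with both old and new librosa releases.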
Does the project require a specific Python version?
HD model training group: add WeChat An_9901.
How do I use this?