Combine Lip Sync AI and Face Restoration AI to get ultra-high-quality videos.
Projects referenced:
Video sources:
High quality Lip sync
Nice work! What's the pipeline performance like? For example, how long does it take to process 60 seconds of Full HD video?
I have been experimenting for the past few days with both Wav2Lip-HD (not in auto) and retalker, and found that both are slow and very GPU-hungry.
I would like to know from each of you: WHICH GPU do you use (what card), and HOW LONG does processing take for you? What kind of videos/animations are you lip-syncing, and at what length? (How much time to process X seconds/minutes?)
Please contribute. I am about to drop this technology and give up on it; maybe other people's experiences will give me hope. Maybe this repo is faster? (I could not try it yet.)
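For anyone gathering numbers to answer this: a tiny timing helper makes it easy to benchmark each pipeline stage (face detection, Wav2Lip inference, GFPGAN restoration) separately, so you can report which one dominates on your card. This is a sketch; `timed_call` is my own helper, not part of any of these repos.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example (hypothetical stage function):
# frames, seconds = timed_call(run_wav2lip, "inputs/clip.mp4")
# print(f"wav2lip: {seconds:.1f}s for {len(frames)} frames")
```

Dividing the elapsed seconds by the clip duration gives a real-time factor that is easy to compare across GPUs.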
run:
command = 'ffmpeg -y -i {} -strict -2 {}'.format(args.audio, 'temp/temp.wav')
subprocess.call(command, shell=True)
output: "'ffmpeg' is not recognized as an internal or external command, operable program or batch file."
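That output is Windows cmd.exe reporting that ffmpeg is not on PATH. A hedged sketch of a guard you could add before the subprocess call (the function names here are mine, not the repo's):

```python
import shutil
import subprocess
import sys

def build_ffmpeg_cmd(audio_path, out_path="temp/temp.wav"):
    # Same arguments as the inference.py snippet above, but as an argv list,
    # so shell=True is not needed.
    return ["ffmpeg", "-y", "-i", audio_path, "-strict", "-2", out_path]

def extract_wav(audio_path):
    # Fail early with a readable message instead of cmd.exe's
    # "'ffmpeg' is not recognized..." error.
    if shutil.which("ffmpeg") is None:
        sys.exit("ffmpeg not found on PATH -- install it or add its bin/ folder to PATH")
    subprocess.run(build_ffmpeg_cmd(audio_path), check=True)
```

On Windows, installing ffmpeg and adding its `bin\` directory to the PATH environment variable (then reopening the terminal) resolves this.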
Hi @ajay-sainy, nice work and effort. I have tried it, but there are still some boxy artifacts around the chin and neck. Do you have any idea how to remove them, or how the quality could be improved?
I downloaded parsing_parsenet.pth and used it for inference, then this error occurred:
RuntimeError: unexpected EOF, expected 133541137 more bytes. The file might be corrupted.
I have re-downloaded it several times, but it still does not work.
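An "unexpected EOF" from torch.load almost always means the download was truncated (for example, a proxy returned an HTML error page instead of the file). One way to check a checkpoint before loading it, sketched here; the expected size and hash you compare against are placeholders you would take from the release page:

```python
import hashlib
import os

def file_fingerprint(path, chunk=1 << 20):
    # Return (size_in_bytes, sha256) so a truncated or corrupted checkpoint
    # can be spotted before torch.load raises "unexpected EOF".
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return os.path.getsize(path), h.hexdigest()

# size, digest = file_fingerprint("checkpoints/parsing_parsenet.pth")
# Compare size/digest against the values published with the release.
```

If the size keeps coming out short, try downloading with a different tool (wget/curl with resume) or from the mirror linked in the README.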
Hi,
Thanks for your project; your demo looks great. I get a couple of errors in the final ffmpeg step.
Can you help me? Thanks!
Screen capture here: https://markuphero.com/share/IgN6xYsCMRIdhDHKXduR
Demo results: https://space.bilibili.com/3494353091693235 ; contact metahuman668 if needed.
To discuss wav2lip-related questions, contact WeChat ID wav2lip to apply to join the group.
Hi! The GDrive folder that stored the original Wav2Lip model has moved. Can you please update the link to the new location?
Thanks
python inference.py --checkpoint_path checkpoints/wav2lip.pth --face /root/Wav2Lip-GFPGAN/inputs/girl.mp4 --audio /root/Wav2Lip-GFPGAN/inputs/girl.mp3 --outfile /root/Wav2Lip-GFPGAN/outputs/girl.mp4
100%|███████████████████████████████████████████████████████████████████████████████████| 84/84 [36:44<00:00, 26.24s/it]
0%| | 0/22 [36:44<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 280, in <module>
    main()
  File "inference.py", line 249, in main
    for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen,
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "inference.py", line 113, in datagen
    face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
  File "inference.py", line 92, in face_detect
    raise ValueError('Face not detected! Ensure the video contains a face in all the frames.')
ValueError: Face not detected! Ensure the video contains a face in all the frames.
My video does have a face, and it's the same woman speaking the whole time, e.g. the image below.
The GFPGAN project itself runs quite slowly; I wonder how fast the combined Wav2Lip + GFPGAN pipeline is.
Using cuda for inference.
VIDIOC_REQBUFS: Inappropriate ioctl for device
Reading video frames...
Number of frames available for inference: 0
Extracting raw audio...
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/opt/conda/conda-bld/ffmpeg_1597178665428/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'inputs/kimk_7s_raw.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : https://clipchamp.com
comment : Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
Duration: 00:00:08.48, start: 0.000000, bitrate: 7936 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080, 7778 kb/s, 30 fps, 30 tbr, 15360 tbn, 30720 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 192 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
ICMT : Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
ISFT : Lavf58.45.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc58.91.100 pcm_s16le
size= 1456kB time=00:00:08.45 bitrate=1411.4kbits/s speed=1.18e+03x
video:0kB audio:1456kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.013817%
(80, 677)
Traceback (most recent call last):
  File "inference.py", line 280, in <module>
    main()
  File "inference.py", line 232, in main
    mel_idx_multiplier = 80./fps
ZeroDivisionError: float division by zero
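fps is 0 here because OpenCV could not decode the input video at all (note "Number of frames available for inference: 0" in the log above), so `80./fps` divides by zero. A guard of this shape would surface the real problem; this is a sketch, not the repo's code:

```python
def mel_chunks_per_frame(fps, mel_fps=80.0):
    # inference.py computes mel_idx_multiplier = 80./fps; when OpenCV fails
    # to read the video it reports fps == 0 and the division blows up here.
    if not fps or fps <= 0:
        raise ValueError(
            "fps is 0 -- OpenCV could not read the --face video; "
            "check the path and that the codec is supported")
    return mel_fps / fps
```

In practice the fix is to re-encode the input (e.g. to H.264 yuv420p) or install an OpenCV build with the needed codec support.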
  File "inference_gfpgan.py", line 105, in main
    restorer = GFPGANer(
  File "/home/xxx/2023-05/wav2Lip-GFPGAN/GFPGAN-master/gfpgan/utils.py", line 76, in __init__
    self.face_helper = FaceRestoreHelper(
  File "/home/xxx/.local/lib/python3.8/site-packages/facexlib/utils/face_restoration_helper.py", line 103, in __init__
    self.face_parse = init_parsing_model(model_name='parsenet', device=self.device, model_rootpath=model_rootpath)
  File "/home/xxx/.local/lib/python3.8/site-packages/facexlib/parsing/__init__.py", line 20, in init_parsing_model
    load_net = torch.load(model_path, map_location=lambda storage, loc: storage)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/serialization.py", line 1020, in _legacy_load
    typed_storage._storage._set_from_file(
RuntimeError: unexpected EOF, expected 761657 more bytes. The file might be corrupted.
I have downloaded the model several times and it always reports this EOF error; the file may be corrupted.
concatTextFilePath = outputPath + "/concat.txt"
with open(concatTextFilePath, "w") as concatTextFile:
    for ips in range(batch):
        # Assumes the batch files were written with this same naming scheme,
        # i.e. fewer than 10 batches ("batch_000" + str(ips)).
        concatTextFile.write("file batch_000" + str(ips) + ".avi\n")
concatedVideoOutputPath = outputPath + "/concated_output.avi"
# The original cell interpolated {concatFilePath}, which is undefined;
# the variable is concatTextFilePath.
!ffmpeg -y -f concat -i {concatTextFilePath} -c copy {concatedVideoOutputPath}
finalProcessedOuputVideo = processedVideoOutputPath + '/final_with_audio.avi'
!ffmpeg -y -i {concatedVideoOutputPath} -i {inputAudioPath} -map 0 -map 1:a -c:v copy -shortest {finalProcessedOuputVideo}
from google.colab import files
files.download(finalProcessedOuputVideo)
Google Colab has a usage limit; I would like to run this on my own computer, which has no such limit.
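To run the notebook's concat/mux steps outside Colab (where `!` cells are unavailable), the same two ffmpeg commands can be issued with subprocess. Function and variable names here are mine, sketched from the notebook cell above:

```python
import subprocess

def concat_cmd(concat_txt, concatenated_out):
    # Join the batch AVIs listed in concat.txt without re-encoding.
    return ["ffmpeg", "-y", "-f", "concat", "-i", concat_txt,
            "-c", "copy", concatenated_out]

def mux_cmd(concatenated_out, audio_in, final_out):
    # Copy the video stream and mux in the audio track, trimming to the shorter.
    return ["ffmpeg", "-y", "-i", concatenated_out, "-i", audio_in,
            "-map", "0", "-map", "1:a", "-c:v", "copy", "-shortest", final_out]

def finalize(concat_txt, concatenated_out, audio_in, final_out):
    subprocess.run(concat_cmd(concat_txt, concatenated_out), check=True)
    subprocess.run(mux_cmd(concatenated_out, audio_in, final_out), check=True)
```

With `check=True`, a non-zero ffmpeg exit status raises immediately instead of failing silently mid-pipeline.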
Could you make a GUI version?
A wav2lip model technical discussion group has been created; add WeChat mmns329 if interested.
I get TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given, despite the input having the right format and length.
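This TypeError usually indicates a librosa >= 0.10 install: `librosa.filters.mel` became keyword-only, while Wav2Lip's audio.py still passes positional arguments. Two options, sketched here; the version pin is the commonly suggested workaround, not something this repo documents:

```shell
# Option 1: pin a pre-0.10 librosa whose mel() still accepts positional args
pip install "librosa==0.9.2"

# Option 2: in audio.py, change the positional call to keyword arguments, e.g.
#   librosa.filters.mel(sr=hp.sample_rate, n_fft=hp.n_fft, n_mels=hp.num_mels,
#                       fmin=hp.fmin, fmax=hp.fmax)
```

Option 2 is the more durable fix, since it works with both old and new librosa releases.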
Does the project require a specific Python version?
HD model training group: add WeChat An_9901.
How do I use this?