wav2lip-gfpgan's People

Contributors

ajay-sainy, mishra-ankit


wav2lip-gfpgan's Issues

Performance

Nice work! What is the pipeline's performance? For example, how long does it take to process 60 seconds of Full HD video?

How much time do you need to lip-sync a 10-second or 1-minute video?

I have been trying for the last few days with both Wav2Lip HD (not in automatic mode) and the retalker, and found that both are slow and very GPU-hungry.
I would like to hear from each of you: how much GPU do you use (which card), and how long does it take? What kind of videos/animations are you lip-syncing, and how long are they? (How much time to process X seconds/minutes?)

Please share your experience. I am about to drop this technology and give up on it; maybe other people's experiences will give me hope. Maybe this repo is faster? (I could not try it yet.)

ffmpeg

Run:
command = 'ffmpeg -y -i {} -strict -2 {}'.format(args.audio, 'temp/temp.wav')
subprocess.call(command, shell=True)
Output: "'ffmpeg' is not recognized as an internal or external command, operable program or batch file."
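That message is the Windows shell reporting that no ffmpeg executable was found on the PATH. A minimal sketch of a workaround, assuming ffmpeg is installed somewhere on the machine (the fallback path below is only an example, not something this repo provides):

import shutil
import subprocess

# Locate ffmpeg on the PATH; fall back to an explicit install location.
# The fallback path is just an example for a typical Windows install; adjust it.
ffmpeg_bin = shutil.which('ffmpeg') or r'C:\ffmpeg\bin\ffmpeg.exe'

audio_path = 'inputs/input.mp3'  # inside inference.py this would be args.audio
command = '"{}" -y -i "{}" -strict -2 {}'.format(ffmpeg_bin, audio_path, 'temp/temp.wav')
subprocess.call(command, shell=True)

Alternatively, install ffmpeg and add its bin folder to the PATH so the original command works unchanged.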

model file parsing_parsenet.pth is corrupted.

I downloaded parsing_parsenet.pth and used it for inference, and then this error occurred:
RuntimeError: unexpected EOF, expected 133541137 more bytes. The file might be corrupted.

I have re-downloaded it several times, but it still does not work.
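An "unexpected EOF" from torch.load almost always means the download was cut short, so the file on disk is smaller than the checkpoint really is. A small check-and-redownload sketch, assuming the weights sit next to the script and that the facexlib release URL below is still current (verify it against the facexlib releases page):

import torch
from torch.hub import download_url_to_file

MODEL_PATH = 'parsing_parsenet.pth'  # adjust to where facexlib looks for the weights
# Assumed URL; confirm against https://github.com/xinntao/facexlib/releases
MODEL_URL = 'https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth'

try:
    torch.load(MODEL_PATH, map_location='cpu')
    print('Checkpoint loads cleanly.')
except (FileNotFoundError, EOFError, RuntimeError):
    print('Checkpoint missing or truncated; downloading again...')
    download_url_to_file(MODEL_URL, MODEL_PATH, progress=True)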

Does the code in Google Colab still work?

For me, only the example files work. If I use any other .mp4 and .mp3 files it always gives me this error; it worked fine before. It now also asks me to restart Colab and define the basePath again.

[screenshot of the Colab output ending in ^C]

To be clear, the ^C appears on its own; I am not pressing anything on the keyboard.

Face not detected! Ensure the video contains a face

python inference.py --checkpoint_path checkpoints/wav2lip.pth --face /root/Wav2Lip-GFPGAN/inputs/girl.mp4 --audio /root/Wav2Lip-GFPGAN/inputs/girl.mp3  --outfile /root/Wav2Lip-GFPGAN/outputs/girl.mp4

100%|███████████████████████████████████████████████████████████████████████████████████| 84/84 [36:44<00:00, 26.24s/it]
  0%|                                                                                            | 0/22 [36:44<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 280, in <module>
    main()
  File "inference.py", line 249, in main
    for i, (img_batch, mel_batch, frames, coords) in enumerate(tqdm(gen, 
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "inference.py", line 113, in datagen
    face_det_results = face_detect(frames) # BGR2RGB for CNN face detection
  File "inference.py", line 92, in face_detect
    raise ValueError('Face not detected! Ensure the video contains a face in all the frames.')
ValueError: Face not detected! Ensure the video contains a face in all the frames.

My video does have a face, and it is the same woman speaking the whole time, as in the image below.

[frame from the video showing the speaker's face]
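One way to narrow this down is to run a quick face-detection pass over the video before the full pipeline. The sketch below uses OpenCV's bundled Haar cascade rather than the S3FD detector Wav2Lip actually uses, so it is only a rough sanity check; if it finds faces in every frame but Wav2Lip still fails, the upstream Wav2Lip flags --pads, --resize_factor and --nosmooth are the usual knobs to try.

import cv2

VIDEO_PATH = '/root/Wav2Lip-GFPGAN/inputs/girl.mp4'  # the input from the command above

# OpenCV ships this cascade file with the package; it is not the detector Wav2Lip uses.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(VIDEO_PATH)
total, misses = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if len(cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)) == 0:
        misses += 1
    total += 1
cap.release()
print('{} frames read, {} frames with no detected face'.format(total, misses))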

How fast is it?

The GFPGAN project is already slow to run on its own; I wonder how fast Wav2Lip combined with GFPGAN is.

ZeroDivisionError: float division by zero

Using cuda for inference.
VIDIOC_REQBUFS: Inappropriate ioctl for device
Reading video frames...
Number of frames available for inference: 0
Extracting raw audio...
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/opt/conda/conda-bld/ffmpeg_1597178665428/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'inputs/kimk_7s_raw.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : https://clipchamp.com
comment : Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
Duration: 00:00:08.48, start: 0.000000, bitrate: 7936 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080, 7778 kb/s, 30 fps, 30 tbr, 15360 tbn, 30720 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 192 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'temp/temp.wav':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
ICMT : Create videos with https://clipchamp.com/en/video-editor - free online video editor, video compressor, video converter.
ISFT : Lavf58.45.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc58.91.100 pcm_s16le
size= 1456kB time=00:00:08.45 bitrate=1411.4kbits/s speed=1.18e+03x
video:0kB audio:1456kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.013817%
(80, 677)
Traceback (most recent call last):
  File "inference.py", line 280, in <module>
    main()
  File "inference.py", line 232, in main
    mel_idx_multiplier = 80./fps
ZeroDivisionError: float division by zero
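The division by zero follows directly from "Number of frames available for inference: 0" earlier in the log: OpenCV never read a frame (the VIDIOC_REQBUFS line suggests the path was opened like a capture device), so fps stays 0 when main() computes 80./fps. A small pre-flight check you can run before the pipeline, assuming only that the input is meant to be a regular video file:

import os
import cv2

VIDEO_PATH = 'inputs/kimk_7s_raw.mp4'

if not os.path.isfile(VIDEO_PATH):
    raise FileNotFoundError(VIDEO_PATH)

cap = cv2.VideoCapture(VIDEO_PATH)
if not cap.isOpened():
    raise RuntimeError('OpenCV could not open {} (bad path or missing codec?)'.format(VIDEO_PATH))

fps = cap.get(cv2.CAP_PROP_FPS)
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

if fps <= 0 or frames <= 0:
    raise RuntimeError('Unreadable video: {} frames at {} fps; try re-encoding it with ffmpeg.'.format(frames, fps))
print('OK: {} frames at {} fps'.format(frames, fps))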

Error when constructing GFPGANer

File "inference_gfpgan.py", line 105, in main
restorer = GFPGANer(
File "/home/xxx/2023-05/wav2Lip-GFPGAN/GFPGAN-master/gfpgan/utils.py", line 76, in init
self.face_helper = FaceRestoreHelper(
File "/home/xxx/.local/lib/python3.8/site-packages/facexlib/utils/face_restoration_helper.py", line 103, in init
self.face_parse = init_parsing_model(model_name='parsenet', device=self.device, model_rootpath=model_rootpath)
File "/home/xxx.local/lib/python3.8/site-packages/facexlib/parsing/init.py", line 20, in init_parsing_model
load_net = torch.load(model_path, map_location=lambda storage, loc: storage)
File "/home/xxx/.local/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/xxx/.local/lib/python3.8/site-packages/torch/serialization.py", line 1020, in _legacy_load
typed_storage._storage._set_from_file(
RuntimeError: unexpected EOF, expected 761657 more bytes. The file might be corrupted.

I have downloaded the model multiple times and always get the EOF error; the file may be corrupted.

Add combine batches and download

# Write an ffmpeg concat list with one line per processed batch file.
# The file names assume fewer than 10 batches (batch_0000 ... batch_0009).
concatTextFilePath = outputPath + "/concat.txt"
concatTextFile = open(concatTextFilePath, "w")
for ips in range(batch):
  concatTextFile.write("file batch_000" + str(ips) + ".avi\n")
concatTextFile.close()

# Concatenate the batch videos into one file (stream copy, no re-encode).
concatedVideoOutputPath = outputPath + "/concated_output.avi"
!ffmpeg -y -f concat -i {concatTextFilePath} -c copy {concatedVideoOutputPath}

# Mux the original audio back into the concatenated video, then download it.
finalProcessedOuputVideo = processedVideoOutputPath + '/final_with_audio.avi'
!ffmpeg -y -i {concatedVideoOutputPath} -i {inputAudioPath} -map 0 -map 1:a -c:v copy -shortest {finalProcessedOuputVideo}

from google.colab import files
files.download(finalProcessedOuputVideo)

How do I run it locally?

Google Colab has a usage limit; I would like to run this on my own computer, which has no such limit.
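A rough sketch of running the same two steps locally instead of in Colab, assuming the layout the notebook uses (Wav2Lip-master and GFPGAN-master side by side, checkpoints already downloaded); the GFPGAN flags follow the public inference_gfpgan.py and may differ in your checkout:

import subprocess

# Step 1: Wav2Lip generates the lip-synced (low-resolution) video.
subprocess.run([
    'python', 'Wav2Lip-master/inference.py',
    '--checkpoint_path', 'Wav2Lip-master/checkpoints/wav2lip.pth',
    '--face', 'inputs/input.mp4',
    '--audio', 'inputs/input.mp3',
    '--outfile', 'outputs/wav2lip_result.mp4',
], check=True)

# Step 2: GFPGAN restores the face in each extracted frame.
# The notebook first splits wav2lip_result.mp4 into frames (e.g. with ffmpeg)
# and afterwards re-assembles the restored frames and muxes the audio back in;
# those steps are omitted here.
subprocess.run([
    'python', 'GFPGAN-master/inference_gfpgan.py',
    '-i', 'outputs/frames',      # folder of frames extracted from wav2lip_result.mp4
    '-o', 'outputs/restored',
    '-v', '1.3', '-s', '2',
    '--only_center_face', '--bg_upsampler', 'None',
], check=True)

Running locally removes the Colab session limits, but the speed still depends entirely on your own GPU.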
