ali-vilab / dreamtalk

Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

Home Page: https://dreamtalk-project.github.io/

License: MIT License

Python 100.00%
audio-visual-learning face-animation talking-head video-generation

dreamtalk's People

Contributors

alibaba-oss, eltociear, lukevs, yifengma9

dreamtalk's Issues

Face in output not matching with the original face

First of all, the lip sync is the best I have seen. However, the face is slightly distorted and doesn't match the input face. Are there any settings I can edit in inference_for_demo_video.py for more accurate face matching? I tried changing the number of diffusion steps, even raising it to 500, but didn't notice any difference.

[Attached images: vlcsnap-2024-01-05-17h13m03s565, src_img]

failed when generating long video: `'numpy.ndarray' object has no attribute 'unsqueeze'`

When I generated a somewhat longer video (3 or 4 minutes), it failed with `'numpy.ndarray' object has no attribute 'unsqueeze'`.

I traced it to this line: https://github.com/ali-vilab/dreamtalk/blob/main/inference_for_demo_video.py#L118

if len(pose) >= len(gen_exp):
    selected_pose = pose[: len(gen_exp)]
else:
    selected_pose = pose[-1].unsqueeze(0).repeat(len(gen_exp), 1)
    selected_pose[: len(pose)] = pose

Here pose is a NumPy array (not a torch Tensor), so it has no unsqueeze method, and the second argument of NumPy's repeat is the axis, which should probably be 0. So I changed the line as follows:

selected_pose = pose[-1][np.newaxis, ...].repeat(len(gen_exp), 0)

It works.
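
For anyone hitting the same error, the whole trim-or-pad branch can be sketched in plain NumPy as below. This is a minimal sketch, assuming pose is a 2-D array of shape (frames, dims); pad_pose_to_length is a hypothetical helper name, not a function in the repository.

```python
import numpy as np


def pad_pose_to_length(pose: np.ndarray, target_len: int) -> np.ndarray:
    """Trim or pad a (frames, dims) pose array to exactly target_len frames.

    Hypothetical helper for illustration; assumes pose is a NumPy array.
    """
    if len(pose) >= target_len:
        # Enough pose frames: keep only the first target_len of them.
        return pose[:target_len]
    # Too few pose frames: start from copies of the last frame,
    # then overwrite the beginning with the real frames.
    padded = np.repeat(pose[-1][np.newaxis, ...], target_len, axis=0)
    padded[: len(pose)] = pose
    return padded


pose = np.arange(6, dtype=np.float32).reshape(3, 2)
print(pad_pose_to_length(pose, 5).shape)  # (5, 2)
```

np.repeat with axis=0 returns a new array, so writing the real frames into it does not modify the original pose.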

Meaning of the 73 coefficients generated

Hi! In the num_of_frames × 73 array generated by the diffusion model, do the 73 3DMM coefficients correspond to particular keypoints of the face? I know that 64:73 are for pose, but do the first 64 (0:64) have a one-to-one mapping to elements of the face, the way a 68-keypoint detector does?
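
To make the index ranges in the question concrete, here is a minimal sketch of the split being described. It only encodes what the question states (columns 64:73 are pose); the function name split_coefficients and the label "leading" for columns 0:64 are placeholders of mine, and the sketch says nothing about what each of those 64 values means.

```python
import numpy as np


def split_coefficients(coeffs: np.ndarray):
    """Split a (num_frames, 73) array into columns 0:64 and 64:73.

    Hypothetical helper: the 64/9 split follows the question above,
    not a documented API of the repository.
    """
    if coeffs.ndim != 2 or coeffs.shape[1] != 73:
        raise ValueError(f"expected shape (num_frames, 73), got {coeffs.shape}")
    leading = coeffs[:, :64]   # the columns the question asks about
    pose = coeffs[:, 64:73]    # the columns the question identifies as pose
    return leading, pose


leading, pose = split_coefficients(np.zeros((10, 73)))
print(leading.shape, pose.shape)  # (10, 64) (10, 9)
```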

TypeError: can only concatenate str (not "bool") to str

Traceback (most recent call last):
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/gradio/queueing.py", line 522, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/gradio/route_utils.py", line 260, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/gradio/blocks.py", line 1689, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/gradio/blocks.py", line 1255, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/saba/lib/python3.12/site-packages/gradio/utils.py", line 750, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/src/dreamtalk/app.py", line 78, in infer
"--style_clip_path=data/style_clip/3DMM/" + emotional_style,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "bool") to str
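
The traceback shows emotional_style arriving as a bool at the string concatenation in app.py, line 78. Why the UI passed a bool is not visible from the traceback. A minimal sketch of a guard that fails with a clearer message is below; build_style_clip_arg is a hypothetical helper, and the default directory is simply copied from the traceback line.

```python
def build_style_clip_arg(emotional_style, base_dir="data/style_clip/3DMM/"):
    """Build the --style_clip_path argument, rejecting non-string input.

    Hypothetical helper for illustration; not part of app.py.
    """
    if not isinstance(emotional_style, str) or not emotional_style:
        # A bool (or None) here usually means the UI value was never set
        # or the callback inputs are wired in the wrong order.
        raise ValueError(
            f"emotional_style must be a non-empty file name, got {emotional_style!r}"
        )
    return f"--style_clip_path={base_dir}{emotional_style}"


print(build_style_clip_arg("M030_front_neutral_level1_001.mat"))
```

With this check in place, a bad value raises a ValueError that names the offending value instead of failing inside the concatenation.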

nvrtc compilation failed

(dreamtalk) F:\AI\dreamtalk-main>python inference_for_demo_video.py --wav_path data/audio/acknowledgement_english.m4a --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat --image_path data/src_img/uncropped/male_face.png --cfg_scale 1.0 --max_gen_len 30 --output_name acknowledgement_english@M030_front_neutral_level1_001@male_face
D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 10.2.1 (GCC) 20200726
configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libgsm --enable-librav1e --disable-w32threads --enable-libmfx --enable-ffnvcodec --enable-cuda-llvm --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/audio/acknowledgement_english.m4a':
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A isommp42
creation_time : 2023-12-20T14:25:20.000000Z
iTunSMPB : 00000000 00000840 00000000 00000000000C23C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Duration: 00:00:16.57, start: 0.044000, bitrate: 246 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 244 kb/s (default)
Metadata:
creation_time : 2023-12-20T14:25:20.000000Z
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.
Output #0, wav, to 'tmp/acknowledgement_english@M030_front_neutral_level1_001@male_face\acknowledgement_english@M030_front_neutral_level1_001@male_face_16K.wav':
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A isommp42
iTunSMPB : 00000000 00000840 00000000 00000000000C23C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ISFT : Lavf58.45.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s (default)
Metadata:
creation_time : 2023-12-20T14:25:20.000000Z
handler_name : Core Media Audio
encoder : Lavc58.91.100 pcm_s16le
size= 518kB time=00:00:16.57 bitrate= 256.0kbits/s speed=1.02e+03x
video:0kB audio:518kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.014706%
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']

  • This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Traceback (most recent call last):
    File "inference_for_demo_video.py", line 211, in <module>
    max_audio_len=args.max_gen_len,
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
    File "inference_for_demo_video.py", line 105, in inference_one_video
    ddim_num_step=ddim_num_step,
    File "F:\AI\dreamtalk-main\core\networks\diffusion_net.py", line 226, in sample
    ready_style_code=ready_style_code,
    File "F:\AI\dreamtalk-main\core\networks\diffusion_net.py", line 170, in ddim_sample
    x_t_double, t=t_tensor_double, **context_double
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "F:\AI\dreamtalk-main\core\networks\diffusion_util.py", line 126, in forward
    style_code = self.style_encoder(style_clip, style_pad_mask)
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "F:\AI\dreamtalk-main\core\networks\generator.py", line 193, in forward
    style_code = self.aggregate_method(permute_style, pad_mask)
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "F:\AI\dreamtalk-main\core\networks\self_attention_pooling.py", line 31, in forward
    att_logits = self.W(batch_rep).squeeze(-1)
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "F:\AI\dreamtalk-main\core\networks\mish.py", line 51, in forward
    return mish(input)
    RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_tanh_mul(float* t0, float* t1, float* aten_mul) {
{
float v = __ldg(t0 + (((512 * blockIdx.x + threadIdx.x) / 65536) * 65536 + 256 * (((512 * blockIdx.x + threadIdx.x) / 256) % 256)) + (512 * blockIdx.x + threadIdx.x) % 256);
float v_1 = __ldg(t1 + (((512 * blockIdx.x + threadIdx.x) / 65536) * 65536 + 256 * (((512 * blockIdx.x + threadIdx.x) / 256) % 256)) + (512 * blockIdx.x + threadIdx.x) % 256);
aten_mul[(((512 * blockIdx.x + threadIdx.x) / 65536) * 65536 + 256 * (((512 * blockIdx.x + threadIdx.x) / 256) % 256)) + (512 * blockIdx.x + threadIdx.x) % 256] = v * (tanhf(v_1));
}
}

'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia

video:0kB audio:214kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.035636%
2024-02-25 18:52:47.991396: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:4193: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn(

Training

Hey,

Are the Style-aware Lip Expert and Style Predictor part of the repo?
I couldn't find them.
Is there any plan to release them?

Thank you

Parameter function

Dear mayf18,

Thank you for your great effort; it is great work.

What is the meaning and difference between style_clip_path and base_path parameters?

only audio inference

First, thank you for your great effort; it is great work. However, only inference with a style video has been released. Could you share inference code that takes only audio as input?

Thanks!

subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py'

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 522, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 260, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1689, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1255, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 750, in wrapper
response = f(*args, **kwargs)
File "", line 61, in infer
execute_command(command)
File "", line 20, in execute_command
subprocess.run(command, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py', '--wav_path=/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav', '--style_clip_path=data/style_clip/3DMM/M030_front_neutral_level1_001.mat', '--pose_path=data/pose/RichardShelby_front_neutral_level1_001.mat', '--image_path=/tmp/gradio/c98d5cb0b71d52ff1b386d8f1dda445058ce5825/IMG_5792-removebg-preview.jpeg', '--cfg_scale=1.0', '--max_gen_len=30', '--output_name=lipsynced_result_20240328141540']' returned non-zero exit status 1.
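
Note that the CalledProcessError above only reports the exit status; the actual failure is whatever inference_for_demo_video.py wrote to stderr. A minimal sketch of a wrapper that surfaces it is below. The name execute_command_verbose is hypothetical; it is a variant of the execute_command seen in the traceback, not the notebook's actual code.

```python
import subprocess
import sys


def execute_command_verbose(command):
    """Run a command and include its stderr in the error if it fails.

    Hypothetical variant of execute_command for debugging.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command exited with status {result.returncode}.\n"
            f"stderr:\n{result.stderr}"
        )
    return result.stdout


# Example with a trivial child process instead of the real inference script.
print(execute_command_verbose([sys.executable, "-c", "print('ok')"]).strip())
```

Running the failing command this way should show the child's own traceback, which is what is needed to diagnose the issue.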

RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch) nvrtc compilation failed:

Hello,

When I run the following command:
python inference_for_demo_video.py \
--wav_path data/audio/acknowledgement_chinese.m4a \
--style_clip_path data/style_clip/3DMM/M030_front_surprised_level3_001.mat \
--pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
--image_path data/src_img/cropped/zp1.png \
--disable_img_crop \
--cfg_scale 1.0 \
--max_gen_len 30 \
--output_name acknowledgement_chinese@M030_front_surprised_level3_001@zp1

I get the error
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/tmp/build/80754af9/ffmpeg_1587154242452/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/audio/acknowledgement_english.m4a':
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A isommp42
creation_time : 2023-12-20T14:25:20.000000Z
iTunSMPB : 00000000 00000840 00000000 00000000000C23C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Duration: 00:00:16.57, start: 0.044000, bitrate: 246 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 244 kb/s (default)
Metadata:
creation_time : 2023-12-20T14:25:20.000000Z
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.
Output #0, wav, to 'tmp/acknowledgement_english@M030_front_neutral_level1_001@male_face--device=cpu/acknowledgement_english@M030_front_neutral_level1_001@male_face--device=cpu_16K.wav':
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A isommp42
iTunSMPB : 00000000 00000840 00000000 00000000000C23C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ISFT : Lavf58.29.100
Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s (default)
Metadata:
creation_time : 2023-12-20T14:25:20.000000Z
handler_name : Core Media Audio
encoder : Lavc58.54.100 pcm_s16le
size= 518kB time=00:00:16.57 bitrate= 256.0kbits/s speed=1.23e+03x
video:0kB audio:518kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.014706%
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']

  • This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Traceback (most recent call last):
    File "inference_for_demo_video.py", line 224, in <module>
    max_audio_len=args.max_gen_len,
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
    File "inference_for_demo_video.py", line 105, in inference_one_video
    ddim_num_step=ddim_num_step,
    File "/home/ziyang/dreamtalk-main/core/networks/diffusion_net.py", line 226, in sample
    ready_style_code=ready_style_code,
    File "/home/ziyang/dreamtalk-main/core/networks/diffusion_net.py", line 170, in ddim_sample
    x_t_double, t=t_tensor_double, **context_double
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/ziyang/dreamtalk-main/core/networks/diffusion_util.py", line 126, in forward
    style_code = self.style_encoder(style_clip, style_pad_mask)
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/ziyang/dreamtalk-main/core/networks/generator.py", line 193, in forward
    style_code = self.aggregate_method(permute_style, pad_mask)
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/ziyang/dreamtalk-main/core/networks/self_attention_pooling.py", line 31, in forward
    att_logits = self.W(batch_rep).squeeze(-1)
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
    File "/home/ziyang/anaconda3/envs/dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/ziyang/dreamtalk-main/core/networks/mish.py", line 51, in forward
    return mish(input)
    RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)
    nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_tanh_mul(float* t0, float* t1, float* aten_mul) {
{
float v = __ldg(t0 + (((512 * blockIdx.x + threadIdx.x) / 65536) * 65536 + 256 * (((512 * blockIdx.x + threadIdx.x) / 256) % 256)) + (512 * blockIdx.x + threadIdx.x) % 256);
float v_1 = __ldg(t1 + (((512 * blockIdx.x + threadIdx.x) / 65536) * 65536 + 256 * (((512 * blockIdx.x + threadIdx.x) / 256) % 256)) + (512 * blockIdx.x + threadIdx.x) % 256);
aten_mul[(((512 * blockIdx.x + threadIdx.x) / 65536) * 65536 + 256 * (((512 * blockIdx.x + threadIdx.x) / 256) % 256)) + (512 * blockIdx.x + threadIdx.x) % 256] = v * (tanhf(v_1));
}
}

Is there a problem with my GPU? I have no way to solve it.

Best regards,
Ziyang Jiao
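
The log does not say which GPU is in use, so this is a guess rather than a diagnosis: one thing worth checking is whether the installed PyTorch build lists the GPU's compute capability among the architectures it was compiled for. The sketch below does the comparison in plain Python; capability_in_arch_list is a hypothetical helper, and the example architecture list is illustrative, not taken from the log.

```python
def capability_in_arch_list(capability, arch_list):
    """Return True if e.g. (8, 6) appears as 'sm_86' in arch_list.

    Hypothetical helper; a missing entry is a hint, not proof of the cause.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


# With PyTorch installed and a GPU visible, the inputs would come from:
#   import torch
#   capability = torch.cuda.get_device_capability(0)
#   arch_list = torch.cuda.get_arch_list()
example_arch_list = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]  # illustrative only
print(capability_in_arch_list((7, 5), example_arch_list))  # True
print(capability_in_arch_list((8, 6), example_arch_list))  # False
```

If the GPU's architecture is missing from the list, a PyTorch build that matches the GPU and CUDA version may be worth trying.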

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Hello,

When I run the following command:

python inference_for_demo_video.py \
--wav_path data/audio/acknowledgement_english.m4a \
--style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat \
--pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
--image_path data/src_img/uncropped/male_face.png \
--cfg_scale 1.0 \
--max_gen_len 30 \
--output_name acknowledgement_english@M030_front_neutral_level1_001@male_face

I get the error

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

My question is whether it is possible to use the CPU instead and, if so, how?

Best regards,
Stanko
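
In general PyTorch code, the usual pattern is to pick the device once and fall back to the CPU when CUDA is unavailable. Whether inference_for_demo_video.py exposes an option for this is not something I can confirm from this thread, so the sketch below is generic; resolve_device is a hypothetical helper.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return the device string to use, falling back to 'cpu' without CUDA.

    Hypothetical helper; kept free of torch imports so the logic is easy to test.
    """
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


# With PyTorch installed, this would typically be used as:
#   import torch
#   device = torch.device(resolve_device("cuda", torch.cuda.is_available()))
#   model = model.to(device)
#   state = torch.load(checkpoint_path, map_location=device)
print(resolve_device("cuda", cuda_available=False))  # cpu
```

Any hard-coded .cuda() calls in the code would also need to be replaced with .to(device) for this to work, and CPU inference is likely to be much slower.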

not work

subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py', '--wav_path=/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav', '--style_clip_path=data/style_clip/3DMM/M030_front_neutral_level1_001.mat', '--pose_path=data/pose/RichardShelby_front_neutral_level1_001.mat', '--image_path=/tmp/gradio/576327d42cb66a65583efd7a29ee41c0a3d6f92a/微信截图_20240130102341.png', '--cfg_scale=1.0', '--max_gen_len=30', '--output_name=lipsynced_result_20240130022752']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 695, in wrapper
    response = f(*args, **kwargs)
  File "<ipython-input-3-919eaa451d96>", line 49, in infer
    execute_command(command)
  File "<ipython-input-3-919eaa451d96>", line 32, in execute_command
    subprocess.run(command, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py', '--wav_path=/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav', '--style_clip_path=data/style_clip/3DMM/M030_front_neutral_level1_001.mat', '--pose_path=data/pose/RichardShelby_front_neutral_level1_001.mat', '--image_path=/tmp/gradio/576327d42cb66a65583efd7a29ee41c0a3d6f92a/微信截图_20240130102341.png', '--cfg_scale=1.0', '--max_gen_len=30', '--output_name=lipsynced_result_20240130022817']' returned non-zero exit status 1.

TypeError: __call__(): incompatible function arguments. The following argument types are supported: 1. (self: _dlib_pybind11.fhog_object_detector, image: numpy.ndarray, upsample_num_times: int = 0) -> _dlib_pybind11.rectangles

File "/content/drive/MyDrive/dreamtalk/inference_for_demo_video.py", line 207, in <module>
crop_src_image(args.image_path, src_img_path, 0.4)
File "/content/drive/MyDrive/dreamtalk/core/utils.py", line 438, in crop_src_image
faces = detector(img, 0)
TypeError: __call__(): incompatible function arguments. The following argument types are supported:
1. (self: _dlib_pybind11.fhog_object_detector, image: numpy.ndarray, upsample_num_times: int = 0) -> _dlib_pybind11.rectangles

Invoked with: <_dlib_pybind11.fhog_object_detector object at 0x7855377022f0>, None, 0
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "/content/drive/MyDrive/dreamtalk/app.py", line 47, in infer
execute_command(command)
File "/content/drive/MyDrive/dreamtalk/app.py", line 27, in execute_command
subprocess.run(command, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py', '--wav_path=/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav', '--style_clip_path=data/style_clip/3DMM/M030_front_neutral_level1_001.mat', '--pose_path=data/pose/RichardShelby_front_neutral_level1_001.mat', '--image_path=None', '--cfg_scale=1.0', '--max_gen_len=30', '--output_name=lipsynced_result_20240306024936']' returned non-zero exit status 1.
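
The log shows the detector being invoked with None as the image, and the command line contains '--image_path=None', so it looks like no source image reached the script. A minimal sketch of a check that could run before the command is built is below; validate_image_path is a hypothetical helper, not part of app.py.

```python
import os


def validate_image_path(image_path):
    """Return image_path if it points to an existing file, else raise.

    Hypothetical helper for illustration.
    """
    if image_path is None:
        # This is the case seen in the log: the UI passed no image at all.
        raise ValueError("No source image was provided.")
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Source image not found: {image_path}")
    return image_path


try:
    validate_image_path(None)
except ValueError as err:
    print(err)  # No source image was provided.
```

Failing early like this gives a readable message instead of the dlib argument-type error further down the pipeline.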

Training Script

Hey,
Is there a training script that you could release here for us to try out?
Thanks.

error: subprocess-exited-with-error

Something goes wrong during the installation phase and I can't tell what. Please help me fix it. Thank you very much!

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [350 lines of output]
Ignoring numpy: markers 'python_version == "3.6"' don't match your environment
Ignoring numpy: markers 'python_version == "3.7"' don't match your environment
Ignoring numpy: markers 'python_version == "3.8"' don't match your environment
Collecting setuptools
Downloading setuptools-69.0.3-py3-none-any.whl (819 kB)
------------------------------------ 819.5/819.5 kB 647.4 kB/s eta 0:00:00
Collecting wheel
Downloading wheel-0.42.0-py3-none-any.whl (65 kB)
---------------------------------------- 65.4/65.4 kB 3.4 MB/s eta 0:00:00
Collecting scikit-build
Downloading scikit_build-0.17.6-py3-none-any.whl (84 kB)
---------------------------------------- 84.3/84.3 kB ? eta 0:00:00
Collecting cmake
Downloading cmake-3.28.1-py2.py3-none-win_amd64.whl (35.8 MB)
---------------------------------------- 35.8/35.8 MB 4.3 MB/s eta 0:00:00
Collecting pip
Downloading pip-24.0-py3-none-any.whl (2.1 MB)
---------------------------------------- 2.1/2.1 MB 4.1 MB/s eta 0:00:00
Collecting numpy==1.19.3
Downloading numpy-1.19.3.zip (7.3 MB)
---------------------------------------- 7.3/7.3 MB 5.1 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

    Preparing metadata (pyproject.toml) did not run successfully.
    exit code: 1

    [306 lines of output]
    setup.py:67: RuntimeWarning: NumPy 1.19.3 may not yet support Python 3.10.
      warnings.warn(
    Running from numpy source directory.
    setup.py:480: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
      run_build = parse_setuppy_commands()
    Processing numpy/random\_bounded_integers.pxd.in
    Processing numpy/random\bit_generator.pyx
    Processing numpy/random\mtrand.pyx
    Processing numpy/random\_bounded_integers.pyx.in
    Processing numpy/random\_common.pyx
    Processing numpy/random\_generator.pyx
    Processing numpy/random\_mt19937.pyx
    Processing numpy/random\_pcg64.pyx
    Processing numpy/random\_philox.pyx
    Processing numpy/random\_sfc64.pyx
    Cythonizing sources
    blas_opt_info:
    blas_mkl_info:
    No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
    customize MSVCCompiler
      libraries mkl_rt not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    blis_info:
      libraries blis not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    openblas_info:
      libraries openblas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
    get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
    customize GnuFCompiler
    Could not locate executable g77
    Could not locate executable f77
    customize IntelVisualFCompiler
    Could not locate executable ifort
    Could not locate executable ifl
    customize AbsoftFCompiler
    Could not locate executable f90
    customize CompaqVisualFCompiler
    Could not locate executable DF
    customize IntelItaniumVisualFCompiler
    Could not locate executable efl
    customize Gnu95FCompiler
    Found executable C:\ProgramData\anaconda3\Library\mingw-w64\bin\gfortran.exe
    Using built-in specs.
    COLLECT_GCC=C:\ProgramData\anaconda3\Library\mingw-w64\bin\gfortran.exe
    COLLECT_LTO_WRAPPER=C:/ProgramData/anaconda3/Library/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-5.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --with-gxx-include-dir=/mingw64/include/c++/5.3.0 --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-version-specific-runtime-libs --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev5, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 5.3.0 (Rev5, Built by MSYS2 project)
      NOT AVAILABLE

    atlas_3_10_blas_threads_info:
    Setting PTATLAS=ATLAS
      libraries tatlas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    atlas_3_10_blas_info:
      libraries satlas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    atlas_blas_threads_info:
    Setting PTATLAS=ATLAS
      libraries ptf77blas,ptcblas,atlas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    atlas_blas_info:
      libraries f77blas,cblas,atlas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    accelerate_info:
      NOT AVAILABLE

    C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\system_info.py:1914: UserWarning:
        Optimized (vendor) Blas libraries are not found.
        Falls back to netlib Blas library which has worse performance.
        A better performance should be easily gained by switching
        Blas library.
      if self._calc_info(blas):
    blas_info:
      libraries blas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\system_info.py:1914: UserWarning:
        Blas (http://www.netlib.org/blas/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [blas]) or by setting
        the BLAS environment variable.
      if self._calc_info(blas):
    blas_src_info:
      NOT AVAILABLE

    C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\system_info.py:1914: UserWarning:
        Blas (http://www.netlib.org/blas/) sources not found.
        Directories to search for the sources can be specified in the
        numpy/distutils/site.cfg file (section [blas_src]) or by setting
        the BLAS_SRC environment variable.
      if self._calc_info(blas):
      NOT AVAILABLE

    non-existing path in 'numpy\\distutils': 'site.cfg'
    lapack_opt_info:
    lapack_mkl_info:
      libraries mkl_rt not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    openblas_lapack_info:
      libraries openblas not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
    get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
    customize GnuFCompiler
    customize IntelVisualFCompiler
    customize AbsoftFCompiler
    customize CompaqVisualFCompiler
    customize IntelItaniumVisualFCompiler
    customize Gnu95FCompiler
    Using built-in specs.
    COLLECT_GCC=C:\ProgramData\anaconda3\Library\mingw-w64\bin\gfortran.exe
    COLLECT_LTO_WRAPPER=C:/ProgramData/anaconda3/Library/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-5.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --with-gxx-include-dir=/mingw64/include/c++/5.3.0 --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-version-specific-runtime-libs --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev5, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 5.3.0 (Rev5, Built by MSYS2 project)
      NOT AVAILABLE

    openblas_clapack_info:
      libraries openblas,lapack not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
    get_default_fcompiler: matching types: '['gnu', 'intelv', 'absoft', 'compaqv', 'intelev', 'gnu95', 'g95', 'intelvem', 'intelem', 'flang']'
    customize GnuFCompiler
    customize IntelVisualFCompiler
    customize AbsoftFCompiler
    customize CompaqVisualFCompiler
    customize IntelItaniumVisualFCompiler
    customize Gnu95FCompiler
    Using built-in specs.
    COLLECT_GCC=C:\ProgramData\anaconda3\Library\mingw-w64\bin\gfortran.exe
    COLLECT_LTO_WRAPPER=C:/ProgramData/anaconda3/Library/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-5.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --with-gxx-include-dir=/mingw64/include/c++/5.3.0 --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-version-specific-runtime-libs --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev5, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 5.3.0 (Rev5, Built by MSYS2 project)
      NOT AVAILABLE

    flame_info:
      libraries flame not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    atlas_3_10_threads_info:
    Setting PTATLAS=ATLAS
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\lib
      libraries tatlas,tatlas not found in C:\ProgramData\anaconda3\lib
      libraries lapack_atlas not found in C:\
      libraries tatlas,tatlas not found in C:\
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\libs
      libraries tatlas,tatlas not found in C:\ProgramData\anaconda3\libs
    <class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
      NOT AVAILABLE

    atlas_3_10_info:
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\lib
      libraries satlas,satlas not found in C:\ProgramData\anaconda3\lib
      libraries lapack_atlas not found in C:\
      libraries satlas,satlas not found in C:\
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\libs
      libraries satlas,satlas not found in C:\ProgramData\anaconda3\libs
    <class 'numpy.distutils.system_info.atlas_3_10_info'>
      NOT AVAILABLE

    atlas_threads_info:
    Setting PTATLAS=ATLAS
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\lib
      libraries ptf77blas,ptcblas,atlas not found in C:\ProgramData\anaconda3\lib
      libraries lapack_atlas not found in C:\
      libraries ptf77blas,ptcblas,atlas not found in C:\
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\libs
      libraries ptf77blas,ptcblas,atlas not found in C:\ProgramData\anaconda3\libs
    <class 'numpy.distutils.system_info.atlas_threads_info'>
      NOT AVAILABLE

    atlas_info:
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\lib
      libraries f77blas,cblas,atlas not found in C:\ProgramData\anaconda3\lib
      libraries lapack_atlas not found in C:\
      libraries f77blas,cblas,atlas not found in C:\
      libraries lapack_atlas not found in C:\ProgramData\anaconda3\libs
      libraries f77blas,cblas,atlas not found in C:\ProgramData\anaconda3\libs
    <class 'numpy.distutils.system_info.atlas_info'>
      NOT AVAILABLE

    lapack_info:
      libraries lapack not found in ['C:\\ProgramData\\anaconda3\\lib', 'C:\\', 'C:\\ProgramData\\anaconda3\\libs']
      NOT AVAILABLE

    C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\system_info.py:1748: UserWarning:
        Lapack (http://www.netlib.org/lapack/) libraries not found.
        Directories to search for the libraries can be specified in the
        numpy/distutils/site.cfg file (section [lapack]) or by setting
        the LAPACK environment variable.
      return getattr(self, '_calc_info_{}'.format(name))()
    lapack_src_info:
      NOT AVAILABLE

    C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\system_info.py:1748: UserWarning:
        Lapack (http://www.netlib.org/lapack/) sources not found.
        Directories to search for the sources can be specified in the
        numpy/distutils/site.cfg file (section [lapack_src]) or by setting
        the LAPACK_SRC environment variable.
      return getattr(self, '_calc_info_{}'.format(name))()
      NOT AVAILABLE

    numpy_linalg_lapack_lite:
      FOUND:
        language = c
        define_macros = [('HAVE_BLAS_ILP64', None), ('BLAS_SYMBOL_SUFFIX', '64_')]

    C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\dist.py:275: UserWarning: Unknown distribution option: 'define_macros'
      warnings.warn(msg)
    running dist_info
    running build_src
    build_src
    building py_modules sources
    creating build
    creating build\src.win-amd64-3.10
    creating build\src.win-amd64-3.10\numpy
    creating build\src.win-amd64-3.10\numpy\distutils
    building library "npymath" sources
    Using built-in specs.
    COLLECT_GCC=C:\ProgramData\anaconda3\Library\mingw-w64\bin\gfortran.exe
    COLLECT_LTO_WRAPPER=C:/ProgramData/anaconda3/Library/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-5.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --with-gxx-include-dir=/mingw64/include/c++/5.3.0 --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-version-specific-runtime-libs --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev5, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 5.3.0 (Rev5, Built by MSYS2 project)
    Using built-in specs.
    COLLECT_GCC=C:\ProgramData\anaconda3\Library\mingw-w64\bin\gfortran.exe
    COLLECT_LTO_WRAPPER=C:/ProgramData/anaconda3/Library/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../gcc-5.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --with-gxx-include-dir=/mingw64/include/c++/5.3.0 --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-version-specific-runtime-libs --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev5, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
    Thread model: posix
    gcc version 5.3.0 (Rev5, Built by MSYS2 project)
    Traceback (most recent call last):
      File "C:\ProgramData\anaconda3\Lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 351, in <module>
        main()
      File "C:\ProgramData\anaconda3\Lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 333, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "C:\ProgramData\anaconda3\Lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 152, in prepare_metadata_for_build_wheel
        return hook(metadata_directory, config_settings)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\build_meta.py", line 157, in prepare_metadata_for_build_wheel
        self.run_setup()
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\build_meta.py", line 248, in run_setup
        super(_BuildMetaLegacyBackend,
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\build_meta.py", line 142, in run_setup
        exec(compile(code, __file__, 'exec'), locals())
      File "setup.py", line 508, in <module>
        setup_package()
      File "setup.py", line 500, in setup_package
        setup(**metadata)
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\core.py", line 169, in setup
        return old_setup(**new_attr)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\__init__.py", line 165, in setup
        return distutils.core.setup(**attrs)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 967, in run_commands
        self.run_command(cmd)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 986, in run_command
        cmd_obj.run()
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\command\dist_info.py", line 31, in run
        egg_info.run()
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\egg_info.py", line 24, in run
        self.run_command("build_src")
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\dist.py", line 986, in run_command
        cmd_obj.run()
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\build_src.py", line 144, in run
        self.build_sources()
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\build_src.py", line 155, in build_sources
        self.build_library_sources(*libname_info)
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\build_src.py", line 288, in build_library_sources
        sources = self.generate_sources(sources, (lib_name, build_info))
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\build_src.py", line 378, in generate_sources
        source = func(extension, build_dir)
      File "numpy\core\setup.py", line 658, in get_mathlib_info
        st = config_cmd.try_link('int main(void) { return 0;}')
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\command\config.py", line 243, in try_link
        self._link(body, headers, include_dirs,
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\config.py", line 162, in _link
        return self._wrap_method(old_config._link, lang,
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\config.py", line 96, in _wrap_method
        ret = mth(*((self,)+args))
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\command\config.py", line 137, in _link
        (src, obj) = self._compile(body, headers, include_dirs, lang)
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\config.py", line 105, in _compile
        src, obj = self._wrap_method(old_config._compile, lang,
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\command\config.py", line 96, in _wrap_method
        ret = mth(*((self,)+args))
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\command\config.py", line 132, in _compile
        self.compiler.compile([src], include_dirs=include_dirs)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\_msvccompiler.py", line 401, in compile
        self.spawn(args)
      File "C:\Users\13760\AppData\Local\Temp\pip-build-env-0kjqpbj2\overlay\Lib\site-packages\setuptools\_distutils\_msvccompiler.py", line 505, in spawn
        return super().spawn(cmd, env=env)
      File "C:\Users\13760\AppData\Local\Temp\pip-install-anaxll1k\numpy_c1fe2d57c83c445a91a8530ae37300e0\numpy\distutils\ccompiler.py", line 90, in <lambda>
        m = lambda self, *args, **kw: func(self, *args, **kw)
    TypeError: CCompiler_spawn() got an unexpected keyword argument 'env'
    [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed

  Encountered error while generating package metadata.

  See above for output.

  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

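CUDA 12.3: project does not run after installation

My CUDA version is 12.3, and the project does not run after installation. I installed the matching build from https://pytorch.org/ with conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia, but the project still does not run. Do I need to upgrade Python? I followed the installation steps in the README: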
conda create -n dreamtalk python=3.7.0
conda activate dreamtalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg

pip install urllib3==1.26.6
pip install transformers==4.28.1
pip install dlib

But it still does not run.

(dreamtalk) C:\Users\sunny\Documents\dreamtalk>python inference_for_demo_video.py ^
More? --wav_path data/audio/acknowledgement_english.m4a ^
More? --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat ^
More? --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat ^
More? --image_path data/src_img/uncropped/male_face.png ^
More? --cfg_scale 1.0 ^
More? --max_gen_len 30 ^
More? --output_name acknowledgement_english@M030_front_neutral_level1_001@male_face
Traceback (most recent call last):
  File "inference_for_demo_video.py", line 20, in <module>
    from generators.utils import get_netG, render_video
  File "C:\Users\sunny\Documents\dreamtalk\generators\utils.py", line 8, in <module>
    import torchvision
  File "C:\Users\sunny\.conda\envs\dreamtalk\lib\site-packages\torchvision\__init__.py", line 5, in <module>
    from torchvision import datasets, io, models, ops, transforms, utils
  File "C:\Users\sunny\.conda\envs\dreamtalk\lib\site-packages\torchvision\models\__init__.py", line 16, in <module>
    from .maxvit import *
  File "C:\Users\sunny\.conda\envs\dreamtalk\lib\site-packages\torchvision\models\maxvit.py", line 3, in <module>
    from typing import Any, Callable, List, Optional, OrderedDict, Sequence, Tuple
ImportError: cannot import name 'OrderedDict' from 'typing' (C:\Users\sunny\.conda\envs\dreamtalk\lib\typing.py)

(dreamtalk) C:\Users\sunny\Documents\dreamtalk>
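
The ImportError above is consistent with the interpreter being older than the installed torchvision expects: typing.OrderedDict was added in Python 3.7.2, while the environment was created with python=3.7.0, and the torchvision build that was picked up ships models/maxvit.py, so it is probably newer than the torchvision==0.9.0 pinned in the install steps. A minimal sketch for checking this before launching inference follows; the helper name check_typing_ordereddict is hypothetical and not part of the repository.

```python
import sys
import typing


def check_typing_ordereddict(version_info=sys.version_info, typing_module=typing):
    """Return (ok, message) describing whether typing.OrderedDict is importable.

    typing.OrderedDict was added in Python 3.7.2, so older 3.7.x
    interpreters raise ImportError on `from typing import OrderedDict`.
    """
    version = ".".join(str(part) for part in version_info[:3])
    if hasattr(typing_module, "OrderedDict"):
        return True, f"Python {version}: typing.OrderedDict is available"
    return False, (
        f"Python {version}: typing.OrderedDict is missing; "
        "use Python 3.7.2 or newer, or reinstall the torchvision version "
        "pinned in the project's install steps"
    )


if __name__ == "__main__":
    ok, message = check_typing_ordereddict()
    print(message)
```

If the check fails, recreating the environment with a newer 3.7.x patch release, or reinstalling the pinned torch and torchvision versions, are the two options this sketch points at; which one fits depends on the GPU driver in use.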

av.error.FileNotFoundError: [Errno 2] No such file or directory

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav':
Duration: 00:00:16.57, bitrate: 768 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, mono, s16, 768 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.
Output #0, wav, to 'tmp/lipsynced_result_20240306024455/lipsynced_result_20240306024455_16K.wav':
Metadata:
ISFT : Lavf58.76.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.134.100 pcm_s16le
size= 518kB time=00:00:16.57 bitrate= 256.1kbits/s speed= 905x
video:0kB audio:518kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.014706%
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-03-06 02:45:04.494899: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-06 02:45:04.494954: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-06 02:45:04.496171: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-06 02:45:05.725115: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:4296: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
warnings.warn(
Traceback (most recent call last):
File "/content/drive/MyDrive/dreamtalk/inference_for_demo_video.py", line 230, in <module>
render_video(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/content/drive/MyDrive/dreamtalk/generators/utils.py", line 112, in render_video
torchvision.io.write_video(silent_video_path, transformed_imgs.cpu(), fps)
File "/usr/local/lib/python3.10/dist-packages/torchvision/io/video.py", line 134, in write_video
container.mux(packet)
File "av/container/output.pyx", line 211, in av.container.output.OutputContainer.mux
File "av/container/output.pyx", line 217, in av.container.output.OutputContainer.mux_one
File "av/container/output.pyx", line 172, in av.container.output.OutputContainer.start_encoding
File "av/error.pyx", line 336, in av.error.err_check
av.error.FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "/content/drive/MyDrive/dreamtalk/app.py", line 47, in infer
execute_command(command)
File "/content/drive/MyDrive/dreamtalk/app.py", line 27, in execute_command
subprocess.run(command, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py', '--wav_path=/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav', '--style_clip_path=data/style_clip/3DMM/M030_front_neutral_level1_001.mat', '--pose_path=data/pose/RichardShelby_front_neutral_level1_001.mat', '--image_path=/tmp/gradio/077f22810adc22e2aaf724be2c6e65713593a23d/cut_img.png', '--cfg_scale=1.0', '--max_gen_len=30', '--output_name=lipsynced_result_20240306024455']' returned non-zero exit status 1.
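
One possible cause of the av.error.FileNotFoundError above (an assumption, not confirmed in this thread) is that the directory holding silent_video_path does not exist when torchvision.io.write_video opens the output container. A minimal sketch that creates the parent directory first is shown below; ensure_parent_dir and the example path are hypothetical.

```python
import os
import tempfile


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if it is missing.

    Returns the absolute path of the parent directory.
    """
    parent = os.path.dirname(os.path.abspath(file_path))
    os.makedirs(parent, exist_ok=True)
    return parent


if __name__ == "__main__":
    # Hypothetical output location, used only for illustration.
    root = tempfile.mkdtemp()
    video_path = os.path.join(root, "output_video", "result-silent.mp4")
    ensure_parent_dir(video_path)
    print(os.path.isdir(os.path.dirname(video_path)))
```

Calling such a helper on the output path just before the write_video call would rule this cause in or out. If the directory already exists, the error has a different origin.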

Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']

Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data/audio/German4.wav':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:06.84, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.
Output #0, wav, to 'tmp/codeformer_sr_english@M030_front_neutral_level1_001@male_face/codeformer_sr_english@M030_front_neutral_level1_001@male_face_16K.wav':
Metadata:
ISFT : Lavf58.76.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.134.100 pcm_s16le
size= 214kB time=00:00:06.78 bitrate= 258.2kbits/s speed= 192x
video:0kB audio:214kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.035636%
Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']

  • This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    2024-02-25 18:04:43.131416: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
    /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:4193: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
    warnings.warn(

Error loading audio file: failed to open file.

python inference_for_demo_video.py --wav_path data/audio/acknowledgement_english.m4a --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat --image_path data/src_img/uncropped/male_face.png --cfg_scale 1.0 --max_gen_len 30 --output_name acknowledgement_english@M030_front_neutral_level1_001@male_face1
ffmpeg: error while loading shared libraries: libopenh264.so.5: cannot open shared object file: No such file or directory
Some weights of the model checkpoint at ./models/jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']

  • This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    formats: can't open input file `tmp/acknowledgement_english@M030_front_neutral_level1_001@male_face1/acknowledgement_english@M030_front_neutral_level1_001@male_face1_16K.wav': No such file or directory
    Traceback (most recent call last):
    File "inference_for_demo_video.py", line 189, in
    speech_array, sampling_rate = torchaudio.load(wav_16k_path)
    File "/opt/conda/envs/dreamtalk/lib/python3.7/site-packages/torchaudio/backend/sox_io_backend.py", line 151, in load
    filepath, frame_offset, num_frames, normalize, channels_first, format)
    RuntimeError: Error loading audio file: failed to open file.
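
In the log above, `ffmpeg` exits before writing anything (`libopenh264.so.5` cannot be loaded), so the 16 kHz WAV never exists and `torchaudio.load` fails afterwards. A minimal sketch of checking the conversion step explicitly, assuming the script resamples its input with an `ffmpeg` call; `build_ffmpeg_cmd` and `convert_to_16k` are hypothetical helpers, not functions from the repository, and the exact flags the script uses are not shown in the thread:

```python
import os
import shutil
import subprocess


def build_ffmpeg_cmd(src_path, dst_path, sample_rate=16000):
    """Build an ffmpeg command that resamples audio to a mono WAV file."""
    return [
        "ffmpeg", "-y",            # overwrite the output if it already exists
        "-i", src_path,
        "-ac", "1",                # mono (assumption, matches the log output)
        "-ar", str(sample_rate),   # target sample rate in Hz
        dst_path,
    ]


def convert_to_16k(src_path, dst_path):
    """Run ffmpeg and fail loudly instead of handing a missing file downstream."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    result = subprocess.run(
        build_ffmpeg_cmd(src_path, dst_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not os.path.exists(dst_path):
        raise RuntimeError(f"ffmpeg conversion failed:\n{result.stderr}")
    return dst_path
```

With a check like this, a broken `ffmpeg` install surfaces as the first error rather than as a later "failed to open file" message.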

gradio ?

I have used the Gradio version on Google Colab. Why doesn't the local installation support it?

av.codec.codec.UnknownCodecError: libx264

Traceback (most recent call last):
File "/content/drive/MyDrive/dreamtalk/inference_for_demo_video.py", line 230, in
render_video(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/content/drive/MyDrive/dreamtalk/generators/utils.py", line 112, in render_video
torchvision.io.write_video(silent_video_path, transformed_imgs.cpu(), fps)
File "/usr/local/lib/python3.10/dist-packages/torchvision/io/video.py", line 91, in write_video
stream = container.add_stream(video_codec, rate=fps)
File "av/container/output.pyx", line 67, in av.container.output.OutputContainer.add_stream
File "av/codec/codec.pyx", line 185, in av.codec.codec.Codec.cinit
File "av/codec/codec.pyx", line 194, in av.codec.codec.Codec._init
av.codec.codec.UnknownCodecError: libx264
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "/content/drive/MyDrive/dreamtalk/app.py", line 47, in infer
execute_command(command)
File "/content/drive/MyDrive/dreamtalk/app.py", line 27, in execute_command
subprocess.run(command, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'inference_for_demo_video.py', '--wav_path=/tmp/gradio/178d11976f94d691ff57dd90a34c7603b4309ea3/acknowledgement_english.wav', '--style_clip_path=data/style_clip/3DMM/M030_fron
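
The first traceback shows `torchvision.io.write_video` asking PyAV for `libx264`, which the installed PyAV/FFmpeg build does not provide. A minimal sketch of probing for a usable codec before writing, assuming PyAV is the installed backend; `pick_video_codec` and `available_codecs` are hypothetical helpers, and `mpeg4` is only one possible fallback:

```python
def pick_video_codec(available, preferred=("libx264", "mpeg4")):
    """Return the first preferred codec name present in `available`, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


def available_codecs():
    """Return the codec names PyAV reports, or an empty set if PyAV is missing."""
    try:
        import av  # PyAV, which torchvision.io.write_video uses to encode
    except ImportError:
        return set()
    # Note: this set mixes encoders and decoders, so a hit is a hint, not a guarantee.
    return set(av.codecs_available)
```

The chosen name could then be passed as the `video_codec` argument of `torchvision.io.write_video`, whose default is `"libx264"`. Installing a PyAV or FFmpeg build that includes `libx264` is the other route.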

License

Hi,
Thanks for releasing this amazing program! I saw that it's licensed under the MIT license, which allows commercial use. However, the footer states:

This method is intended for RESEARCH/NON-COMMERCIAL USE ONLY.
Are there any plans to remove this restriction?

Might it be possible to remove this?

Thank you!

What config is required to run it in real time?

Is there any possibility of getting this to run in real time?
What are the GPU and memory requirements for running its best version?
I'm thinking of sending an audio tensor as the audio input.

network issues

Due to network issues, I am unable to directly download the model from the blip-image-captioning-large folder on Hugging Face. After manually downloading it, where should I place it?

RuntimeError: Sizes of tensors must match except in dimension 2. Got 31 and 30 (The offending index is 0)

Traceback (most recent call last):
File "inference_for_demo_video.py", line 238, in
no_move=False,
File "/home/yaohs/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/yaohs/Work/dreamtalk/generators/utils.py", line 102, in render_video
output_dict = net_G(cur_src_img, win_exp)
File "/home/yaohs/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yaohs/Work/dreamtalk/generators/face_model.py", line 35, in forward
output = self.warpping_net(input_image, descriptor)
File "/home/yaohs/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yaohs/Work/dreamtalk/generators/face_model.py", line 94, in forward
output = self.hourglass(input_image, descriptor)
File "/home/yaohs/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yaohs/Work/dreamtalk/generators/base_function.py", line 39, in forward
return self.decoder(self.encoder(x, z), z)
File "/home/yaohs/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yaohs/Work/dreamtalk/generators/base_function.py", line 89, in forward
out = torch.cat([out, x.pop()], 1) if self.skip_connect else out
RuntimeError: Sizes of tensors must match except in dimension 2. Got 31 and 30 (The offending index is 0)
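
The failing line concatenates a decoder feature map with an encoder skip connection, and the two differ by one (31 versus 30). One possible cause, not confirmed in this thread, is an input image whose side length is not divisible by a high enough power of two. A minimal sketch that simulates the sizes, assuming each encoder level halves the resolution with floor division and each decoder level doubles it; `find_skip_mismatch` is a hypothetical helper and the number of levels in the real network is not stated here:

```python
def find_skip_mismatch(size, levels):
    """Simulate `levels` 2x downsamples followed by 2x upsamples.

    Returns (skip_size, upsampled_size) for the first decoder level where the
    two differ, or None if every skip connection lines up.
    """
    sizes = [size]
    for _ in range(levels):
        sizes.append(sizes[-1] // 2)   # floor division mimics a 2x downsample
    current = sizes.pop()               # bottleneck size
    while sizes:
        current *= 2                    # 2x upsampling in the decoder
        skip = sizes.pop()              # matching encoder feature size
        if skip != current:
            return skip, current
    return None


print(find_skip_mismatch(256, 4))  # None
print(find_skip_mismatch(248, 4))  # (31, 30)
```

Under these assumptions a 248-pixel side reproduces the "31 and 30" pair, while 256 passes, so resizing or cropping the source image to a size such as 256x256 may be worth trying.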

What is this parameter "pose_path"?

I read the documentation and saw that pose_path specifies the head pose, but I don't clearly understand what "specifies head pose" means. Is it an image or a video, and what effect does it have? Thanks.

No audio I/O backend is available

Some weights of the model checkpoint at jonatasgrosman/wav2vec2-large-xlsr-53-english were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']

  • This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Traceback (most recent call last):
    File "inference_for_demo_video.py", line 178, in
    speech_array, sampling_rate = torchaudio.load(wav_16k_path)
    File "D:\python\Anaconda\envs\dreamtalk\lib\site-packages\torchaudio\backend\no_backend.py", line 20, in load
    raise RuntimeError('No audio I/O backend is available.')
    RuntimeError: No audio I/O backend is available.
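
This error means torchaudio found no library to read audio with. On Windows, torchaudio releases of that period generally relied on the `soundfile` package, so installing it is likely the first thing to try. As an alternative, here is a minimal sketch that reads the converted file with the standard library, assuming it is 16-bit mono PCM as in the ffmpeg log earlier on this page; `load_wav_pcm16` is a hypothetical helper, not part of the repository:

```python
import array
import sys
import wave


def load_wav_pcm16(path):
    """Read a 16-bit mono PCM WAV file; return (samples in [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav_file.getnchannels() != 1:
            raise ValueError("expected mono audio")
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h", raw)    # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()             # WAV sample data is little-endian
    return [s / 32768.0 for s in samples], sample_rate
```

The returned list would still need to be converted to whatever array or tensor type the rest of the script expects.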

how to redownload Voxceleb2

The paper says "Since Voxceleb2 official videos are of low resolution, we redownload the original YouTube videos and re-crop the videos", but where can the YouTube links for these videos be found?

RuntimeError: Expected 3-dimensional

I installed dreamtalk locally on WSL to test it, but when I run 'inference_for_demo_video.py' I get the following error:

Traceback (most recent call last):
File "inference_for_demo_video.py", line 198, in <module>
inputs.input_values.to(device), return_dict=False
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1306, in forward
extract_features = self.feature_extractor(input_values)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 453, in forward
hidden_states = conv_layer(hidden_states)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 325, in forward
hidden_states = self.conv(hidden_states)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 263, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/USER/miniconda3/envs/dreamtalk/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 10], but got 4-dimensional input of size [1, 1, 2, 732160] instead
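
The reported input size `[1, 1, 2, 732160]` contains an extra dimension of length 2, which suggests the loaded audio has two channels, while the wav2vec2 feature extractor expects a single waveform per batch item. A minimal sketch of averaging the channels into one track, assuming stereo input is indeed the cause; `downmix_to_mono` is a hypothetical helper written with plain lists for illustration:

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample into one mono track."""
    if not channels:
        raise ValueError("no channels given")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("channels differ in length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]
```

For a tensor of shape `(channels, time)` as returned by `torchaudio.load`, taking the mean over dimension 0 (`speech_array.mean(dim=0)`) performs the same downmix. Converting the input file to mono beforehand is another option.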
