badtobest / echomimic

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Home Page: https://badtobest.github.io/echomimic.html

License: Apache License 2.0

Python 100.00%
audio-driven-talking-face talking-head audio-driven-portrait-animations

echomimic's People

Contributors

joefannie, lymhust, octavianchen, yuange250, zhtjtcz


echomimic's Issues

ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

(echomimic) D:\AI\EchoMimic>python -u infer_audio2vid.py
Traceback (most recent call last):
File "infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "D:\AI\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Users\Renel\anaconda3\envs\echomimic\lib\site-packages\diffusers\models\embeddings.py)

How can I solve this error?

Traceback (most recent call last):
File "/home/tom/fssd/EchoMimic/infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "/home/tom/fssd/EchoMimic/src/models/unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py)
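
A common cause is a diffusers version mismatch: the environment listings in these issues show diffusers 0.24.0, where PositionNet still exists, while newer diffusers releases renamed the class. A minimal, hedged workaround sketch (the renamed import below is an assumption about newer diffusers, not something confirmed by this repo):

    # Hedged sketch: either pin the older release (pip install diffusers==0.24.0, the version
    # shown in the environment listings below) or guard the import in
    # src/models/unet_2d_condition.py. GLIGENTextBoundingboxProjection is assumed to be the
    # renamed class in newer diffusers releases.
    try:
        from diffusers.models.embeddings import PositionNet
    except ImportError:
        from diffusers.models.embeddings import (
            GLIGENTextBoundingboxProjection as PositionNet,
        )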

Any way to prevent the image crop?

Hi
Great work! One of the best I have tried in comparison to Sadtalker, Hallo, Musepose etc.
I just want to know if there's a way to get the video with the same framing as the input image (without cropping in on the face)?
Thanks for open sourcing :)

Running python -u infer_audio2vid.py raises ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

Traceback (most recent call last):
File "E:\ai3\EchoMimic\infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "E:\ai3\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (E:\ai3\EchoMimic\venv\Lib\site-packages\diffusers\models\embeddings.py)

python -u webgui.py --server_port=3000 reports No module named 'moviepy', even though moviepy is installed

Running python -u webgui.py --server_port=3000 raises ModuleNotFoundError: No module named 'moviepy', even though it is already installed. What is going on?
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
File "webgui.py", line 24, in
from moviepy.editor import VideoFileClip, AudioFileClip
ModuleNotFoundError: No module named 'moviepy'
(echomimic) PS C:\Users\italk\EchoMimic> pip install moviepy
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: moviepy in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (1.0.3)
Requirement already satisfied: decorator<5.0,>=4.0.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (4.4.2)
Requirement already satisfied: tqdm<5.0,>=4.11.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (4.66.4)
Requirement already satisfied: requests<3.0,>=2.8.1 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (2.32.3)
Requirement already satisfied: proglog<=1.0.0 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (0.1.10)
Requirement already satisfied: numpy>=1.17.3 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (1.24.4)
Requirement already satisfied: imageio<3.0,>=2.5 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (2.34.2)
Requirement already satisfied: imageio-ffmpeg>=0.2.0 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (0.5.1)
Requirement already satisfied: pillow>=8.3.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from imageio<3.0,>=2.5->moviepy) (10.4.0)
Requirement already satisfied: setuptools in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from imageio-ffmpeg>=0.2.0->moviepy) (69.5.1)
Requirement already satisfied: charset-normalizer<4,>=2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (2024.7.4)
Requirement already satisfied: colorama in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from tqdm<5.0,>=4.11.2->moviepy) (0.4.6)
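
A likely cause is that pip and the interpreter running webgui.py resolve to different environments. A quick diagnostic sketch:

    # Hedged diagnostic sketch: confirm the interpreter running webgui.py is the same one
    # pip installed moviepy into.
    import sys

    print(sys.executable)       # the Python actually running
    import moviepy              # raises ModuleNotFoundError if this interpreter can't see it
    print(moviepy.__file__)     # where the import resolves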

Sample Rate Test (Gradio Configuration)

8000 Sample Rate

8000.mp4

16000

16000Sampling.mp4

32000

32000.mp4

48000

48000.mp4

There is not much difference between the sample rates from 8000 to 48000.
Generation time is also almost the same regardless of the sample rate.
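
For anyone who wants to reproduce this sweep, clips can be resampled before uploading them to the Gradio demo. A minimal sketch using ffmpeg-python (listed in the project requirements); the file names and target rate are placeholders:

    # Hedged sketch: resample an input clip to a target sample rate with ffmpeg-python.
    import ffmpeg

    (
        ffmpeg
        .input("input.wav")
        .output("input_16000.wav", ar=16000)   # ar = output audio sample rate in Hz
        .overwrite_output()
        .run(quiet=True)
    )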


The torchvision::nms issue has been resolved; now 'float' object has no attribute 'rint' appears

accelerate 0.32.1
antlr4-python3-runtime 4.9.3
av 11.0.0
certifi 2024.7.4
charset-normalizer 3.3.2
colorama 0.4.6
decorator 4.4.2
diffusers 0.24.0
einops 0.4.1
facenet-pytorch 2.5.0
ffmpeg-python 0.2.0
filelock 3.15.4
fsspec 2024.6.1
future 1.0.0
huggingface-hub 0.23.4
idna 3.7
imageio 2.34.2
imageio-ffmpeg 0.5.1
importlib_metadata 8.0.0
intel-openmp 2021.4.0
Jinja2 3.1.4
lightning-utilities 0.11.3.post0
MarkupSafe 2.1.5
mkl 2021.4.0
moviepy 1.0.3
mpmath 1.3.0
networkx 3.3
numpy 1.26.4
omegaconf 2.3.0
opencv-python 4.10.0.84
packaging 24.1
pillow 10.4.0
pip 24.0
proglog 0.1.10
psutil 6.0.0
PyYAML 6.0.1
regex 2024.5.15
requests 2.32.3
safetensors 0.4.3
setuptools 69.5.1
sympy 1.13.0
tbb 2021.13.0
tokenizers 0.19.1
torch 2.3.1+cu118
torchaudio 2.3.1
torchmetrics 1.4.0.post0
torchtyping 0.1.4
torchvision 0.18.1
tqdm 4.66.4
transformers 4.42.3
typeguard 4.3.0
typing_extensions 4.12.2
urllib3 2.2.2
wheel 0.43.0
zipp 3.19.2

torch.cuda.is_available() returns True.
I also tried torch 2.0.1; CUDA is available in both cases, but running the program still produces the following:
PS H:\AIaudio_Live\EchoMimic> python .\infer_audio2vid.py
C:\Users\Administrator.conda\envs\echo\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
C:\Users\Administrator.conda\envs\echo\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
please download ffmpeg-static and export to FFMPEG_PATH.
For example: export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
File "H:\AIaudio_Live\EchoMimic\infer_audio2vid.py", line 243, in
main()
File "H:\AIaudio_Live\EchoMimic\infer_audio2vid.py", line 186, in main
det_bboxes, probs = face_detector.detect(face_img)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\facenet_pytorch\models\mtcnn.py", line 313, in detect
batch_boxes, batch_points = detect_face(
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\facenet_pytorch\models\utils\detect_face.py", line 79, in detect_face
pick = batched_nms(boxes_scale[:, :4], boxes_scale[:, 4], image_inds_scale, 0.5)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 75, in batched_nms
return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torch\jit_trace.py", line 1254, in wrapper
return fn(*args, **kwargs)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 94, in batched_nms_coordinate_trick
keep = nms(boxes_for_nms, scores, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 41, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torch_ops.py", line 854, in call
return self
._op(*args, **(kwargs or {}))
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
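
For anyone still hitting the torchvision::nms CUDA error above: the pip listing shows torch 2.3.1+cu118 but torchvision 0.18.1 without a CUDA tag, which often indicates a CPU-only torchvision build. A quick compatibility check sketch:

    # Hedged diagnostic sketch: a CUDA-enabled torch paired with a CPU-only torchvision is the
    # usual cause of 'torchvision::nms' being unavailable on the CUDA backend.
    import torch
    import torchvision

    print(torch.__version__, torch.version.cuda)   # e.g. 2.3.1+cu118, 11.8
    print(torchvision.__version__)                 # should carry a matching CUDA build tag
    print(torch.cuda.is_available())

If the builds do not match, reinstalling torch and torchvision together from the same CUDA index (as in the pip command quoted in the "Bugs found" issue below) usually resolves it.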

Benchmarks

what is the inference time per frame on a low/mid level card?

Can multi-GPU inference be provided?

Could adaptive multi-GPU inference be supported? I am running on a four-GPU machine, but only the first GPU is used, so the speed is not ideal.
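
Until that is supported, a common stopgap is to run one inference process per GPU and split the inputs between them; a minimal sketch (CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, not something this repo documents):

    # Hedged workaround sketch: pin this process to one card before torch is imported,
    # then launch additional processes pinned to the other cards.
    import os

    os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # second GPU; use "0", "2", "3" in the others

    import torch
    print(torch.cuda.device_count())           # should report 1 visible device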

Pose-Driven Algo Inference not working

I followed the instructions and it is not working

  1. First, download the checkpoints with the '_pose.pth' postfix from Hugging Face
  2. Edit driver_video and ref_image to your path in demo_motion_sync.py, then run
    I left it as it is, linking to the sample
  3. python -u demo_motion_sync.py
    Output
    https://youtu.be/1JsPRYPiQso

  4. python -u infer_audio2vid_pose.py [with draw_mouse=True]
    No output produced, no error in console

(echomimic) C:\sd\EchoMimic>  python -u demo_motion_sync.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1720949138.076016    3284 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
288

(echomimic) C:\sd\EchoMimic>python -u infer_audio2vid_pose.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
video in 24 FPS, audio idx in 50FPS
latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160

(echomimic) C:\sd\EchoMimic>
  5. python -u infer_audio2vid_pose.py [with draw_mouse=False]
    No output produced, no error in console
(echomimic) C:\sd\EchoMimic>python -u infer_audio2vid_pose.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
video in 24 FPS, audio idx in 50FPS
latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160

(echomimic) C:\sd\EchoMimic>

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel

The following warning appears at runtime. Does this matter?

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
[0, 0, 1342, 1342]

Comfyui nodes? and safetensors?

Are there any ComfyUI nodes for this, and are there safetensors versions of the models used? In this day and age we have to be so careful with models; there have been a fair few bad actors in the AI community.
On paper this really does look like the answer to my prayers. I have a ton of audio files that I need an avatar (driven by a single image) to speak for a project I am working on. He is a storyteller and bio reader for our guild. All the nodes I have looked at so far either need a video to drive the animation (somewhat defeating the purpose) or are not installable.

add ffmpeg to path

I am getting this error

In infer_audio2vid.py
ffmpeg_path = os.getenv('FFMPEG_PATH')

C:\sd>echo %FFMPEG_PATH%
C:\sd\ffmpeg-4.4-amd64-static

I still see this error

(echomimic) C:\tut\EchoMimic>python infer_audio2vid.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
add ffmpeg to path
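
Note that FFMPEG_PATH has to be visible in the same conda prompt that runs the script (e.g. set FFMPEG_PATH=C:\sd\ffmpeg-4.4-amd64-static in that session); the warning is printed when the variable is unset. A hedged sketch of the kind of check the quoted line implies (the real logic in infer_audio2vid.py may differ):

    # Hedged sketch of the FFMPEG_PATH check implied by the quoted line and warning.
    import os

    ffmpeg_path = os.getenv("FFMPEG_PATH")
    if ffmpeg_path is None:
        print("add ffmpeg to path")            # the warning seen above
    elif ffmpeg_path not in os.environ.get("PATH", ""):
        os.environ["PATH"] = ffmpeg_path + os.pathsep + os.environ["PATH"]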

"OpenSSL appears to be unavailable" Error

After applying the command
conda create -n echomimic python=3.8

I get the following error in the command prompt window:

Collecting package metadata (current_repodata.json): failed

CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to
download and install packages.

Exception: HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/win-64/current_repodata.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))

Any idea how to solve this please?

ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

(echomimic) C:\ALLWEBUI\EchoMimic\EchoMimic>python -u infer_audio2vid.py
Traceback (most recent call last):
File "infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "C:\ALLWEBUI\EchoMimic\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Users\admin.conda\envs\echomimic\lib\site-packages\diffusers\models\embeddings.py)

Supports Russian speech?

Hello and thanks for this wonderful work

Before installing, I would like to know whether this works with Russian voice audio.

PS: sorry for my english

Pink Artifacts Around Image in Video Rendering

Description:
I am experiencing an issue with video rendering where pink artifacts appear around the images. This happens when I use a specific image to generate a video. The pink color appears prominently around the edges of the image, causing an unwanted visual effect.

Steps to Reproduce:

  1. Use the attached image and audio as input.
  2. Generate a video using the standard rendering settings.
  3. Observe the pink artifacts around the edges of the image in the output video.

Expected Behavior:
The image should appear in the video without any pink artifacts or unwanted colors around the edges.

Actual Behavior:
The image is rendered with pink artifacts around the edges, which affects the visual quality of the video.

Attachments:

  1. Input Image
test1
  1. Output Video with Pink Artifacts
output_video_with_audio.mp4

Environment:

  • Operating System: Ubuntu 22.04.4 LTS
  • CUDA 12.2
  • A100 80G
  • python=3.11.9

Additional Information:

  • I have verified that the input image does not have any transparent or pink regions.
  • This issue occurs consistently with this image, which has a white background, but not necessarily with others.

Potential Solutions:

  • Ensuring the input image has a solid background without transparency.
  • Checking and adjusting the video encoding settings to handle color space correctly.
  • Preprocessing the image to eliminate any edge effects before rendering.

Any assistance or guidance on how to resolve this issue would be greatly appreciated. Thank you!
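
For the first and third potential solutions above, a minimal preprocessing sketch with Pillow (present in the environment listings); the file names are placeholders:

    # Hedged sketch: flatten any alpha channel onto a solid white background before using
    # the image as the reference, so edge transparency cannot bleed into the render.
    from PIL import Image

    img = Image.open("test1.png").convert("RGBA")
    background = Image.new("RGB", img.size, (255, 255, 255))
    background.paste(img, mask=img.split()[-1])   # use the alpha channel as the paste mask
    background.save("test1_flat.png")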


Trouble with running EchoMimic -- The system cannot find the file specified

I followed the installation instructions for installing EchoMimic on my PC. When I run the command

python webgui.py --server_port=3000

I get the following error message:

File "C:\Users\frc39\echomimic\EchoMimic\src\pipelines\pipeline_echo_mimic.py", line 395, in call
whisper_feature = self.audio_guider.audio2feat(audio_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\audio2feature.py", line 100, in audio2feat
result = self.model.transcribe(audio_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\transcribe.py", line 85, in transcribe
mel = log_mel_spectrogram(audio)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\audio.py", line 111, in log_mel_spectrogram
audio = load_audio(audio)
^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\audio.py", line 44, in load_audio
.run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\site-packages\ffmpeg_run.py", line 313, in run
process = run_async(
^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\site-packages\ffmpeg_run.py", line 284, in run_async
return subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\subprocess.py", line 1026, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
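
WinError 2 here comes from subprocess failing to spawn the ffmpeg executable, so ffmpeg is not on PATH for the environment running webgui.py. A quick check sketch:

    # Hedged diagnostic sketch: verify ffmpeg is resolvable from the same process.
    import shutil

    print(shutil.which("ffmpeg"))   # None means ffmpeg is not on PATH for this interpreter

Installing ffmpeg, or adding its bin directory to PATH (or setting FFMPEG_PATH as described in the "add ffmpeg to path" issue above) before launching the web UI, should clear it.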

python -u infer_audio2vid.py: task gets Killed

python -u infer_audio2vid.py
Adding FFMPEG_PATH to PATH
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Killed
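
A bare "Killed" right after the checkpoint loads is typically the Linux OOM killer ending the process when system RAM runs out. A small check sketch using psutil (which appears in the environment listings earlier in these issues):

    # Hedged diagnostic sketch: print available system memory before launching inference;
    # very low headroom here makes an OOM kill the likely explanation.
    import psutil

    vm = psutil.virtual_memory()
    print(f"total={vm.total / 1e9:.1f} GB, available={vm.available / 1e9:.1f} GB")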

Has anyone set up the environment successfully?

conda create -n echomimic python=3.8
conda activate echomimic
pip install -r requirements.txt

The above commands do not work for me, and the following error is produced. Can anyone help, or provide a Docker image for others?


Traceback (most recent call last):
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/utils/import_utils.py", line 808, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/autoencoders/__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/autoencoders/autoencoder_asym_kl.py", line 21, in <module>
    from ..modeling_outputs import AutoencoderKLOutput
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/modeling_outputs.py", line 7, in <module>
    class AutoencoderKLOutput(BaseOutput):
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/utils/outputs.py", line 61, in __init_subclass__
    import torch.utils._pytree
ModuleNotFoundError: No module named 'torch.utils._pytree'
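
torch.utils._pytree ships inside torch itself, so this failure points at the torch install in that environment rather than at EchoMimic. A quick check sketch:

    # Hedged diagnostic sketch: confirm which torch the interpreter resolves and whether the
    # submodule imports; a very old or broken torch install is the usual culprit.
    import torch

    print(torch.__version__, torch.__file__)
    import torch.utils._pytree   # re-raises ModuleNotFoundError if the install is old/broken

Reinstalling a recent torch build inside the echomimic environment (for example the 2.2.2 pin suggested in the "Bugs found" issue below) is the usual fix.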

Slow Inference

I followed the instructions in README.md and ran it on Colab, but the elapsed time per timestep is about 3 minutes for a 19-second audio clip. Have you ever tested on Colab?

CFG Test (Gradio Configuration)

1.1 CFG: this is the lowest setting that works in my test with no error

1.1CFG.mp4

1.5

1.5.mp4

2

2.mp4

2.5

2.5CFG.mp4

3

3.mp4

4

4.mp4

5

5.mp4

7

7.mp4

10

10.mp4

Is this suitable for lipsync?

Hi, this project seems to be animating a reference image with driving audio and/or face pose. Is this suitable for lipsync?

Also, this seems to use video diffusion methods. How do you generate such a long video without running into resource constraints? Does it use a convolutional method instead of temporal attention to handle the temporal dimension?

Bugs found

Here are the things that were probably stopping you from running it, as they did for me. I hope this helps if you hit the same problems.

  1. I am using Windows and Python 3.10.11. The latest torch version (2.3.0) won't work, but the previous version 2.2.2 is okay. To install the previous versions, check the torch installation instructions, e.g. pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
  2. To use a pose reference, the mediapipe package needs to be installed: pip install mediapipe
  3. The file denoising_unet-pose.pth needs to be renamed to denoising_unet_pose.pth (a rename sketch follows the requirements below)

The first two can be fixed by modifying the requirements.txt file as follows:

torch>=2.0.1,<=2.2.2
torchvision>=0.15.2,<=0.17.2
torchaudio>=2.0.2,<=2.2.2
mediapipe
transformers>=4.38.2
diffusers==0.24.0
torchmetrics
torchtyping
tqdm
ffmpeg-python==0.2.0
facenet_pytorch==2.5.0
moviepy==1.0.3
einops==0.4.1
omegaconf==2.3.0
opencv-python
av==11.0.0
gradio
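
For the third item (the checkpoint rename), a minimal sketch, assuming the weights sit in a pretrained_weights/ directory:

    # Hedged sketch: rename the downloaded pose checkpoint to the name the inference scripts
    # expect; the pretrained_weights/ directory is an assumption about the local layout.
    import os

    src = "pretrained_weights/denoising_unet-pose.pth"
    dst = "pretrained_weights/denoising_unet_pose.pth"
    if os.path.exists(src) and not os.path.exists(dst):
        os.rename(src, dst)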

Hope the authors can fix them 🔜

Train code Release?

Congrats on this great work!! When will the train code be released? Thanks!

Generated frame rate too high?

Thanks for open-sourcing this. I built a ComfyUI plugin based on your method, but at the default 24 fps, using the default approach, both the saved frames and the streamed-out character frames (which I composite afterwards) feel like the frame rate is too high. I have to drop to 12 fps for the character's motion to look natural, but then I lose the smoothness of 24 fps.
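
For reference, when assembling saved frames outside ComfyUI, the playback rate can be chosen at encode time with ffmpeg-python (listed in the requirements); the frame-name pattern and the 12 fps target are placeholders:

    # Hedged sketch: interpret a directory of saved frames at 12 fps when encoding the video.
    import ffmpeg

    (
        ffmpeg
        .input("frames/%04d.png", framerate=12)   # read the image sequence at 12 fps
        .output("output_12fps.mp4", pix_fmt="yuv420p")
        .overwrite_output()
        .run(quiet=True)
    )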
