badtobest / echomimic

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Home Page: https://badtobest.github.io/echomimic.html

License: Apache License 2.0

Python 100.00%
audio-driven-talking-face talking-head audio-driven-portrait-animations

echomimic's People

Contributors

joefannie, lymhust, octavianchen, yuange250, zhtjtcz


echomimic's Issues

ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

(echomimic) D:\AI\EchoMimic>python -u infer_audio2vid.py
Traceback (most recent call last):
File "infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "D:\AI\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Users\Renel\anaconda3\envs\echomimic\lib\site-packages\diffusers\models\embeddings.py)

How can I solve this error?

Traceback (most recent call last):
File "/home/tom/fssd/EchoMimic/infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "/home/tom/fssd/EchoMimic/src/models/unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py)
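
A common cause is a diffusers version mismatch: the environment listings in these issues show diffusers 0.24.0, where PositionNet still exists, while newer diffusers releases renamed the class. A minimal, hedged workaround sketch (the renamed import below is an assumption about newer diffusers, not something confirmed by this repo):

    # Hedged sketch: either pin the older release (pip install diffusers==0.24.0, the version
    # shown in the environment listings below) or guard the import in
    # src/models/unet_2d_condition.py. GLIGENTextBoundingboxProjection is assumed to be the
    # renamed class in newer diffusers releases.
    try:
        from diffusers.models.embeddings import PositionNet
    except ImportError:
        from diffusers.models.embeddings import (
            GLIGENTextBoundingboxProjection as PositionNet,
        )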

Any way to prevent the image crop?

Hi
Great work! One of the best I have tried in comparison to Sadtalker, Hallo, Musepose etc.
I just want to know if there's a way to get the video with the same framing as the input image (without cropping in on the face)?
Thanks for open sourcing :)

Running python -u infer_audio2vid.py raises ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

Traceback (most recent call last):
File "E:\ai3\EchoMimic\infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "E:\ai3\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (E:\ai3\EchoMimic\venv\Lib\site-packages\diffusers\models\embeddings.py)

python -u webgui.py --server_port=3000 reports No module named 'moviepy', even though moviepy is installed

Running python -u webgui.py --server_port=3000 raises ModuleNotFoundError: No module named 'moviepy', even though it is already installed. What is going on?
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
File "webgui.py", line 24, in
from moviepy.editor import VideoFileClip, AudioFileClip
ModuleNotFoundError: No module named 'moviepy'
(echomimic) PS C:\Users\italk\EchoMimic> pip install moviepy
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: moviepy in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (1.0.3)
Requirement already satisfied: decorator<5.0,>=4.0.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (4.4.2)
Requirement already satisfied: tqdm<5.0,>=4.11.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (4.66.4)
Requirement already satisfied: requests<3.0,>=2.8.1 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (2.32.3)
Requirement already satisfied: proglog<=1.0.0 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (0.1.10)
Requirement already satisfied: numpy>=1.17.3 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (1.24.4)
Requirement already satisfied: imageio<3.0,>=2.5 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (2.34.2)
Requirement already satisfied: imageio-ffmpeg>=0.2.0 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (0.5.1)
Requirement already satisfied: pillow>=8.3.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from imageio<3.0,>=2.5->moviepy) (10.4.0)
Requirement already satisfied: setuptools in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from imageio-ffmpeg>=0.2.0->moviepy) (69.5.1)
Requirement already satisfied: charset-normalizer<4,>=2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (2024.7.4)
Requirement already satisfied: colorama in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from tqdm<5.0,>=4.11.2->moviepy) (0.4.6)
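
A likely cause is that pip and the interpreter running webgui.py resolve to different environments. A quick diagnostic sketch:

    # Hedged diagnostic sketch: confirm the interpreter running webgui.py is the same one
    # pip installed moviepy into.
    import sys

    print(sys.executable)       # the Python actually running
    import moviepy              # raises ModuleNotFoundError if this interpreter can't see it
    print(moviepy.__file__)     # where the import resolves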

Sample Rate Test (Gradio Configuration)

8000 Sample Rate

8000.mp4

16000

16000Sampling.mp4

32000

32000.mp4

48000

48000.mp4

There is not much difference between the sample rates from 8000 to 48000.
Generation time is also almost the same regardless of the sample rate.
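
For anyone who wants to reproduce this sweep, clips can be resampled before uploading them to the Gradio demo. A minimal sketch using ffmpeg-python (listed in the project requirements); the file names and target rate are placeholders:

    # Hedged sketch: resample an input clip to a target sample rate with ffmpeg-python.
    import ffmpeg

    (
        ffmpeg
        .input("input.wav")
        .output("input_16000.wav", ar=16000)   # ar = output audio sample rate in Hz
        .overwrite_output()
        .run(quiet=True)
    )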


The torchvision::nms issue has been resolved; now 'float' object has no attribute 'rint' appears

accelerate 0.32.1
antlr4-python3-runtime 4.9.3
av 11.0.0
certifi 2024.7.4
charset-normalizer 3.3.2
colorama 0.4.6
decorator 4.4.2
diffusers 0.24.0
einops 0.4.1
facenet-pytorch 2.5.0
ffmpeg-python 0.2.0
filelock 3.15.4
fsspec 2024.6.1
future 1.0.0
huggingface-hub 0.23.4
idna 3.7
imageio 2.34.2
imageio-ffmpeg 0.5.1
importlib_metadata 8.0.0
intel-openmp 2021.4.0
Jinja2 3.1.4
lightning-utilities 0.11.3.post0
MarkupSafe 2.1.5
mkl 2021.4.0
moviepy 1.0.3
mpmath 1.3.0
networkx 3.3
numpy 1.26.4
omegaconf 2.3.0
opencv-python 4.10.0.84
packaging 24.1
pillow 10.4.0
pip 24.0
proglog 0.1.10
psutil 6.0.0
PyYAML 6.0.1
regex 2024.5.15
requests 2.32.3
safetensors 0.4.3
setuptools 69.5.1
sympy 1.13.0
tbb 2021.13.0
tokenizers 0.19.1
torch 2.3.1+cu118
torchaudio 2.3.1
torchmetrics 1.4.0.post0
torchtyping 0.1.4
torchvision 0.18.1
tqdm 4.66.4
transformers 4.42.3
typeguard 4.3.0
typing_extensions 4.12.2
urllib3 2.2.2
wheel 0.43.0
zipp 3.19.2

torch.cuda.is_available() returns True.
I also tried torch 2.0.1; CUDA is available in both cases, but running the program still produces the following:
PS H:\AIaudio_Live\EchoMimic> python .\infer_audio2vid.py
C:\Users\Administrator.conda\envs\echo\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
C:\Users\Administrator.conda\envs\echo\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
please download ffmpeg-static and export to FFMPEG_PATH.
For example: export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
File "H:\AIaudio_Live\EchoMimic\infer_audio2vid.py", line 243, in
main()
File "H:\AIaudio_Live\EchoMimic\infer_audio2vid.py", line 186, in main
det_bboxes, probs = face_detector.detect(face_img)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\facenet_pytorch\models\mtcnn.py", line 313, in detect
batch_boxes, batch_points = detect_face(
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\facenet_pytorch\models\utils\detect_face.py", line 79, in detect_face
pick = batched_nms(boxes_scale[:, :4], boxes_scale[:, 4], image_inds_scale, 0.5)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 75, in batched_nms
return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torch\jit_trace.py", line 1254, in wrapper
return fn(*args, **kwargs)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 94, in batched_nms_coordinate_trick
keep = nms(boxes_for_nms, scores, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 41, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torch_ops.py", line 854, in call
return self
._op(*args, **(kwargs or {}))
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
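
For anyone still hitting the torchvision::nms CUDA error above: the pip listing shows torch 2.3.1+cu118 but torchvision 0.18.1 without a CUDA tag, which often indicates a CPU-only torchvision build. A quick compatibility check sketch:

    # Hedged diagnostic sketch: a CUDA-enabled torch paired with a CPU-only torchvision is the
    # usual cause of 'torchvision::nms' being unavailable on the CUDA backend.
    import torch
    import torchvision

    print(torch.__version__, torch.version.cuda)   # e.g. 2.3.1+cu118, 11.8
    print(torchvision.__version__)                 # should carry a matching CUDA build tag
    print(torch.cuda.is_available())

If the builds do not match, reinstalling torch and torchvision together from the same CUDA index (as in the pip command quoted in the "Bugs found" issue below) usually resolves it.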

Benchmarks

what is the inference time per frame on a low/mid level card?

Can multi-GPU inference be provided?

Could adaptive multi-GPU inference be supported? I am running on a four-GPU machine, but only the first GPU is used, so the speed is not ideal.
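
Until that is supported, a common stopgap is to run one inference process per GPU and split the inputs between them; a minimal sketch (CUDA_VISIBLE_DEVICES is a standard CUDA environment variable, not something this repo documents):

    # Hedged workaround sketch: pin this process to one card before torch is imported,
    # then launch additional processes pinned to the other cards.
    import os

    os.environ["CUDA_VISIBLE_DEVICES"] = "1"   # second GPU; use "0", "2", "3" in the others

    import torch
    print(torch.cuda.device_count())           # should report 1 visible device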

Pose-Driven Algo Inference not working

I followed the instructions and it is not working

  1. First, download the checkpoints with the '_pose.pth' postfix from Hugging Face
  2. Edit driver_video and ref_image to your path in demo_motion_sync.py, then run
    I left it as it is, linking to the sample
  3. python -u demo_motion_sync.py
    Output
    https://youtu.be/1JsPRYPiQso

  4. python -u infer_audio2vid_pose.py [with draw_mouse=True]
    No output produced, no error in console

(echomimic) C:\sd\EchoMimic>  python -u demo_motion_sync.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1720949138.076016    3284 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
288

(echomimic) C:\sd\EchoMimic>python -u infer_audio2vid_pose.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
video in 24 FPS, audio idx in 50FPS
latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160

(echomimic) C:\sd\EchoMimic>
  5. python -u infer_audio2vid_pose.py [with draw_mouse=False]
    No output produced, no error in console
(echomimic) C:\sd\EchoMimic>python -u infer_audio2vid_pose.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
video in 24 FPS, audio idx in 50FPS
latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160

(echomimic) C:\sd\EchoMimic>

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel

The following warning appears at runtime. Does this matter?

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
[0, 0, 1342, 1342]

Comfyui nodes? and safetensors?

Are there any ComfyUI nodes for this, and are there safetensors versions of the models used? In this day and age we have to be so careful with models; there have been a fair few bad actors in the AI community.
On paper this really does look like the answer to my prayers. I have a ton of audio files that I need an avatar (driven by a single image) to speak for a project I am working on. He is a storyteller and bio reader for our guild. All the nodes I have looked at so far either need a video to drive the animation (somewhat defeating the purpose) or are not installable.

add ffmpeg to path

I am getting this error

In infer_audio2vid.py
ffmpeg_path = os.getenv('FFMPEG_PATH')

C:\sd>echo %FFMPEG_PATH%
C:\sd\ffmpeg-4.4-amd64-static

I still see this error

(echomimic) C:\tut\EchoMimic>python infer_audio2vid.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
add ffmpeg to path
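
Note that FFMPEG_PATH has to be visible in the same conda prompt that runs the script (e.g. set FFMPEG_PATH=C:\sd\ffmpeg-4.4-amd64-static in that session); the warning is printed when the variable is unset. A hedged sketch of the kind of check the quoted line implies (the real logic in infer_audio2vid.py may differ):

    # Hedged sketch of the FFMPEG_PATH check implied by the quoted line and warning.
    import os

    ffmpeg_path = os.getenv("FFMPEG_PATH")
    if ffmpeg_path is None:
        print("add ffmpeg to path")            # the warning seen above
    elif ffmpeg_path not in os.environ.get("PATH", ""):
        os.environ["PATH"] = ffmpeg_path + os.pathsep + os.environ["PATH"]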

"OpenSSL appears to be unavailable" Error

After applying the command
conda create -n echomimic python=3.8

I get the following error in the command prompt window:

Collecting package metadata (current_repodata.json): failed

CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to
download and install packages.

Exception: HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/win-64/current_repodata.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))

Any idea how to solve this please?

ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings'

(echomimic) C:\ALLWEBUI\EchoMimic\EchoMimic>python -u infer_audio2vid.py
Traceback (most recent call last):
File "infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "C:\ALLWEBUI\EchoMimic\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Users\admin.conda\envs\echomimic\lib\site-packages\diffusers\models\embeddings.py)

Supports Russian speech?

Hello and thanks for this wonderful work

Before installing, I would like to know whether this works with Russian voice audio.

PS: sorry for my english

Pink Artifacts Around Image in Video Rendering

Description:
I am experiencing an issue with video rendering where pink artifacts appear around the images. This happens when I use a specific image to generate a video. The pink color appears prominently around the edges of the image, causing an unwanted visual effect.

Steps to Reproduce:

  1. Use the attached image and audio as input.
  2. Generate a video using the standard rendering settings.
  3. Observe the pink artifacts around the edges of the image in the output video.

Expected Behavior:
The image should appear in the video without any pink artifacts or unwanted colors around the edges.

Actual Behavior:
The image is rendered with pink artifacts around the edges, which affects the visual quality of the video.

Attachments:

  1. Input Image
test1
  1. Output Video with Pink Artifacts
output_video_with_audio.mp4

Environment:

  • Operating System: Ubuntu 22.04.4 LTS
  • CUDA 12.2
  • A100 80G
  • python=3.11.9

Additional Information:

  • I have verified that the input image does not have any transparent or pink regions.
  • This issue occurs consistently with this image, which has a white background, but not necessarily with others.

Potential Solutions:

  • Ensuring the input image has a solid background without transparency.
  • Checking and adjusting the video encoding settings to handle color space correctly.
  • Preprocessing the image to eliminate any edge effects before rendering.

Any assistance or guidance on how to resolve this issue would be greatly appreciated. Thank you!
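
For the first and third potential solutions above, a minimal preprocessing sketch with Pillow (present in the environment listings); the file names are placeholders:

    # Hedged sketch: flatten any alpha channel onto a solid white background before using
    # the image as the reference, so edge transparency cannot bleed into the render.
    from PIL import Image

    img = Image.open("test1.png").convert("RGBA")
    background = Image.new("RGB", img.size, (255, 255, 255))
    background.paste(img, mask=img.split()[-1])   # use the alpha channel as the paste mask
    background.save("test1_flat.png")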


Trouble with running EchoMimic -- The system cannot find the file specified

I followed the installation instructions for installing EchoMimic on my PC. When I run the command

python webgui.py --server_port=3000

I get the following error message:

File "C:\Users\frc39\echomimic\EchoMimic\src\pipelines\pipeline_echo_mimic.py", line 395, in call
whisper_feature = self.audio_guider.audio2feat(audio_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\audio2feature.py", line 100, in audio2feat
result = self.model.transcribe(audio_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\transcribe.py", line 85, in transcribe
mel = log_mel_spectrogram(audio)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\audio.py", line 111, in log_mel_spectrogram
audio = load_audio(audio)
^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\audio.py", line 44, in load_audio
.run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\site-packages\ffmpeg_run.py", line 313, in run
process = run_async(
^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\site-packages\ffmpeg_run.py", line 284, in run_async
return subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\subprocess.py", line 1026, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
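
WinError 2 here comes from subprocess failing to spawn the ffmpeg executable, so ffmpeg is not on PATH for the environment running webgui.py. A quick check sketch:

    # Hedged diagnostic sketch: verify ffmpeg is resolvable from the same process.
    import shutil

    print(shutil.which("ffmpeg"))   # None means ffmpeg is not on PATH for this interpreter

Installing ffmpeg, or adding its bin directory to PATH (or setting FFMPEG_PATH as described in the "add ffmpeg to path" issue above) before launching the web UI, should clear it.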

python -u infer_audio2vid.py: task gets Killed

python -u infer_audio2vid.py
Adding FFMPEG_PATH to PATH
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Killed
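
A bare "Killed" right after the checkpoint loads is typically the Linux OOM killer ending the process when system RAM runs out. A small check sketch using psutil (which appears in the environment listings earlier in these issues):

    # Hedged diagnostic sketch: print available system memory before launching inference;
    # very low headroom here makes an OOM kill the likely explanation.
    import psutil

    vm = psutil.virtual_memory()
    print(f"total={vm.total / 1e9:.1f} GB, available={vm.available / 1e9:.1f} GB")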

Has anyone set up the environment successfully?

conda create -n echomimic python=3.8
conda activate echomimic
pip install -r requirements.txt

The above commands do not work for me, and the following error is produced. Can anyone help, or provide a Docker image for others?


Traceback (most recent call last):
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/utils/import_utils.py", line 808, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/autoencoders/__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/autoencoders/autoencoder_asym_kl.py", line 21, in <module>
    from ..modeling_outputs import AutoencoderKLOutput
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/modeling_outputs.py", line 7, in <module>
    class AutoencoderKLOutput(BaseOutput):
  File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/utils/outputs.py", line 61, in __init_subclass__
    import torch.utils._pytree
ModuleNotFoundError: No module named 'torch.utils._pytree'
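
torch.utils._pytree ships inside torch itself, so this failure points at the torch install in that environment rather than at EchoMimic. A quick check sketch:

    # Hedged diagnostic sketch: confirm which torch the interpreter resolves and whether the
    # submodule imports; a very old or broken torch install is the usual culprit.
    import torch

    print(torch.__version__, torch.__file__)
    import torch.utils._pytree   # re-raises ModuleNotFoundError if the install is old/broken

Reinstalling a recent torch build inside the echomimic environment (for example the 2.2.2 pin suggested in the "Bugs found" issue below) is the usual fix.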

Slow Inference

I followed the instructions in README.md and ran it on Colab, but the elapsed time per timestep is about 3 minutes for a 19-second audio clip. Have you ever tested on Colab?

CFG Test (Gradio Configuration)

1.1 CFG: this is the lowest setting that works in my test with no error

1.1CFG.mp4

1.5

1.5.mp4

2

2.mp4

2.5

2.5CFG.mp4

3

3.mp4

4

4.mp4

5

5.mp4

7

7.mp4

10

10.mp4

Is this suitable for lipsync?

Hi, this project seems to be animating a reference image with driving audio and/or face pose. Is this suitable for lipsync?

Also, this seems to use video diffusion methods. How do you generate such a long video without running into resource constraints? Does it use a convolutional method instead of temporal attention to handle the temporal dimension?

Bugs found

Here are the things that were probably stopping you from running it, as they did for me. I hope this helps if you hit the same problems.

  1. I am using Windows and Python 3.10.11. The latest torch version (2.3.0) won't work, but the previous version 2.2.2 is okay. To install the previous versions, check the torch installation instructions, e.g. pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
  2. To use a pose reference, the mediapipe package needs to be installed: pip install mediapipe
  3. The file denoising_unet-pose.pth needs to be renamed to denoising_unet_pose.pth (a rename sketch follows the requirements below)

The first two can be fixed by modifying the requirements.txt file as follows:

torch>=2.0.1,<=2.2.2
torchvision>=0.15.2,<=0.17.2
torchaudio>=2.0.2,<=2.2.2
mediapipe
transformers>=4.38.2
diffusers==0.24.0
torchmetrics
torchtyping
tqdm
ffmpeg-python==0.2.0
facenet_pytorch==2.5.0
moviepy==1.0.3
einops==0.4.1
omegaconf==2.3.0
opencv-python
av==11.0.0
gradio
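
For the third item (the checkpoint rename), a minimal sketch, assuming the weights sit in a pretrained_weights/ directory:

    # Hedged sketch: rename the downloaded pose checkpoint to the name the inference scripts
    # expect; the pretrained_weights/ directory is an assumption about the local layout.
    import os

    src = "pretrained_weights/denoising_unet-pose.pth"
    dst = "pretrained_weights/denoising_unet_pose.pth"
    if os.path.exists(src) and not os.path.exists(dst):
        os.rename(src, dst)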

Hope the authors can fix them 🔜

Train code Release?

Congrats on this great work!! When will the train code be released? Thanks!

Generated frame rate too high?

Thanks for open-sourcing this. I built a ComfyUI plugin based on your method, but at the default 24 fps, using the default approach, both the saved frames and the streamed-out character frames (which I composite afterwards) feel like the frame rate is too high. I have to drop to 12 fps for the character's motion to look natural, but then I lose the smoothness of 24 fps.
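
For reference, when assembling saved frames outside ComfyUI, the playback rate can be chosen at encode time with ffmpeg-python (listed in the requirements); the frame-name pattern and the 12 fps target are placeholders:

    # Hedged sketch: interpret a directory of saved frames at 12 fps when encoding the video.
    import ffmpeg

    (
        ffmpeg
        .input("frames/%04d.png", framerate=12)   # read the image sequence at 12 fps
        .output("output_12fps.mp4", pix_fmt="yuv420p")
        .overwrite_output()
        .run(quiet=True)
    )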
