badtobest / echomimic
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Home Page: https://badtobest.github.io/echomimic.html
License: Apache License 2.0
(echomimic) D:\AI\EchoMimic>python -u infer_audio2vid.py
Traceback (most recent call last):
File "infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "D:\AI\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Users\Renel\anaconda3\envs\echomimic\lib\site-packages\diffusers\models\embeddings.py)
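This import error typically means the repo is being run against a newer diffusers release than it targets; the pip listings later in this thread show diffusers 0.24.0, where PositionNet still exists in diffusers.models.embeddings. A minimal workaround sketch (hedged, not an official fix; the renamed class below is an assumption about newer diffusers versions):

# Option 1: pin the version the project appears to target:
#     pip install diffusers==0.24.0
# Option 2: guard the import in src/models/unet_2d_condition.py.
try:
    from diffusers.models.embeddings import PositionNet
except ImportError:
    # Newer diffusers releases renamed PositionNet; adjust if your version differs.
    from diffusers.models.embeddings import (
        GLIGENTextBoundingboxProjection as PositionNet,
    )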
Could you provide a way to drive the full image, or a parameter that animates only the head region? Right now the edges of the 512*512 crop also move, so users cannot composite the result back into the original full-size video: it won't align and there is a visible seam.
Traceback (most recent call last):
File "/home/tom/fssd/EchoMimic/infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "/home/tom/fssd/EchoMimic/src/models/unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (/opt/conda/lib/python3.10/site-packages/diffusers/models/embeddings.py)
Hi
Great work! One of the best I have tried in comparison to Sadtalker, Hallo, Musepose etc.
I just want to know if there's a way to get the video with the same framing as the input image (without cropping in on the face)?
Thanks for open sourcing :)
Traceback (most recent call last):
File "E:\ai3\EchoMimic\infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "E:\ai3\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (E:\ai3\EchoMimic\venv\Lib\site-packages\diffusers\models\embeddings.py)
Dear author, it seems this work is using the same audio features as the MuseTalk repo.
https://github.com/BadToBest/EchoMimic/blob/main/src/models/whisper/whisper/model.py#L143-L155
According to this, it seems the audio features are just the output of two Conv1d layers, plus a permute of the dimensions and the positional encodings, ignoring all the transformer layers in Whisper. Am I correct in my understanding?
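For readers following along, here is a rough sketch of the front-end that comment describes. It paraphrases Whisper's convolutional stem; the argument names are hypothetical and this is not the repo's literal code:

import torch.nn.functional as F

def audio_frontend(mel, conv1, conv2, positional_embedding):
    # Two Conv1d layers with GELU (the second with stride 2), a permute to
    # (batch, frames, channels), then added positional encodings; per the
    # comment above, Whisper's transformer blocks are not applied.
    x = F.gelu(conv1(mel))              # (batch, d_model, n_frames)
    x = F.gelu(conv2(x))                # stride-2 conv halves the frame count
    x = x.permute(0, 2, 1)              # (batch, n_frames, d_model)
    return x + positional_embedding[: x.shape[1]]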
Running
python -u webgui.py --server_port=3000
raises the error ModuleNotFoundError: No module named 'moviepy', but moviepy is already installed. What is going on?
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
D:\ProgramData\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
File "webgui.py", line 24, in
from moviepy.editor import VideoFileClip, AudioFileClip
ModuleNotFoundError: No module named 'moviepy'
(echomimic) PS C:\Users\italk\EchoMimic> pip install moviepy
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: moviepy in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (1.0.3)
Requirement already satisfied: decorator<5.0,>=4.0.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (4.4.2)
Requirement already satisfied: tqdm<5.0,>=4.11.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (4.66.4)
Requirement already satisfied: requests<3.0,>=2.8.1 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (2.32.3)
Requirement already satisfied: proglog<=1.0.0 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (0.1.10)
Requirement already satisfied: numpy>=1.17.3 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (1.24.4)
Requirement already satisfied: imageio<3.0,>=2.5 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (2.34.2)
Requirement already satisfied: imageio-ffmpeg>=0.2.0 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from moviepy) (0.5.1)
Requirement already satisfied: pillow>=8.3.2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from imageio<3.0,>=2.5->moviepy) (10.4.0)
Requirement already satisfied: setuptools in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from imageio-ffmpeg>=0.2.0->moviepy) (69.5.1)
Requirement already satisfied: charset-normalizer<4,>=2 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from requests<3.0,>=2.8.1->moviepy) (2024.7.4)
Requirement already satisfied: colorama in d:\programdata\miniconda3\envs\echomimic\lib\site-packages (from tqdm<5.0,>=4.11.2->moviepy) (0.4.6)
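When pip reports the package as already installed but the script still cannot import it, the usual cause is that webgui.py is being run with a different interpreter than the one pip installed into (for example, a base Python instead of the echomimic env). A quick diagnostic sketch, not part of the repo:

import sys
print(sys.executable)     # should point inside ...\envs\echomimic\...
import moviepy            # raises ModuleNotFoundError if this interpreter lacks it
print(moviepy.__file__)   # should live under the same env's site-packages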
accelerate 0.32.1
antlr4-python3-runtime 4.9.3
av 11.0.0
certifi 2024.7.4
charset-normalizer 3.3.2
colorama 0.4.6
decorator 4.4.2
diffusers 0.24.0
einops 0.4.1
facenet-pytorch 2.5.0
ffmpeg-python 0.2.0
filelock 3.15.4
fsspec 2024.6.1
future 1.0.0
huggingface-hub 0.23.4
idna 3.7
imageio 2.34.2
imageio-ffmpeg 0.5.1
importlib_metadata 8.0.0
intel-openmp 2021.4.0
Jinja2 3.1.4
lightning-utilities 0.11.3.post0
MarkupSafe 2.1.5
mkl 2021.4.0
moviepy 1.0.3
mpmath 1.3.0
networkx 3.3
numpy 1.26.4
omegaconf 2.3.0
opencv-python 4.10.0.84
packaging 24.1
pillow 10.4.0
pip 24.0
proglog 0.1.10
psutil 6.0.0
PyYAML 6.0.1
regex 2024.5.15
requests 2.32.3
safetensors 0.4.3
setuptools 69.5.1
sympy 1.13.0
tbb 2021.13.0
tokenizers 0.19.1
torch 2.3.1+cu118
torchaudio 2.3.1
torchmetrics 1.4.0.post0
torchtyping 0.1.4
torchvision 0.18.1
tqdm 4.66.4
transformers 4.42.3
typeguard 4.3.0
typing_extensions 4.12.2
urllib3 2.2.2
wheel 0.43.0
zipp 3.19.2
torch.cuda.is_available() returns True.
I also tried torch 2.0.1; CUDA is available either way, but running the program still produces the error below.
PS H:\AIaudio_Live\EchoMimic> python .\infer_audio2vid.py
C:\Users\Administrator.conda\envs\echo\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
C:\Users\Administrator.conda\envs\echo\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
please download ffmpeg-static and export to FFMPEG_PATH.
For example: export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, down_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, down_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, 
up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight, 
mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.weight, mid_block.attentions.0.transformer_blocks.0.attn2.to_out.0.bias, mid_block.attentions.0.transformer_blocks.0.norm2.weight, mid_block.attentions.0.transformer_blocks.0.norm2.bias, conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
File "H:\AIaudio_Live\EchoMimic\infer_audio2vid.py", line 243, in
main()
File "H:\AIaudio_Live\EchoMimic\infer_audio2vid.py", line 186, in main
det_bboxes, probs = face_detector.detect(face_img)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\facenet_pytorch\models\mtcnn.py", line 313, in detect
batch_boxes, batch_points = detect_face(
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\facenet_pytorch\models\utils\detect_face.py", line 79, in detect_face
pick = batched_nms(boxes_scale[:, :4], boxes_scale[:, 4], image_inds_scale, 0.5)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 75, in batched_nms
return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torch\jit_trace.py", line 1254, in wrapper
return fn(*args, **kwargs)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 94, in batched_nms_coordinate_trick
keep = nms(boxes_for_nms, scores, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torchvision\ops\boxes.py", line 41, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
File "C:\Users\Administrator.conda\envs\echo\lib\site-packages\torch_ops.py", line 854, in call
return self._op(*args, **(kwargs or {}))
NotImplementedError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, Meta, QuantizedCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
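This error usually means a CPU-only torchvision wheel is installed alongside a CUDA build of torch, so the CUDA kernel for torchvision::nms simply is not present. A hedged sanity check and reinstall sketch (versions taken from the pip list above; the index URL is the standard PyTorch CUDA 11.8 wheel index):

import torch, torchvision
print(torch.__version__)          # e.g. 2.3.1+cu118 per the pip list above
print(torchvision.__version__)    # should also carry a +cu build tag
print(torch.cuda.is_available())
# If torchvision is a CPU wheel, reinstalling matched CUDA wheels may help:
#     pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 \
#         --index-url https://download.pytorch.org/whl/cu118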
As the title says.
what is the inference time per frame on a low/mid level card?
Could you provide adaptive multi-GPU inference? I am running on a four-GPU machine but only the first GPU is used, and the speed is not ideal.
Is the maximum length only 48 seconds? If it exceeds 48 seconds, the image will stop moving.
I just tried it with a half-body photo and the results are decent. Does EchoMimic support full-body photos?
I followed the instructions and it is not working
python -u demo_motion_sync.py
Output
https://youtu.be/1JsPRYPiQso
python -u infer_audio2vid_pose.py [with draw_mouse=True]
No output produced, no error in console
(echomimic) C:\sd\EchoMimic> python -u demo_motion_sync.py
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1720949138.076016 3284 face_landmarker_graph.cc:174] Sets FaceBlendshapesGraph acceleration to xnnpack by default.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
288
(echomimic) C:\sd\EchoMimic>python -u infer_audio2vid_pose.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
video in 24 FPS, audio idx in 50FPS
latents shape:torch.Size([1, 4, 160, 64, 64]), video_length:160
(echomimic) C:\sd\EchoMimic>
When using this project I noticed many configurable options, but I don't know what they are for or what they affect. What is the best way to tune them? Testing them all myself runs into compute limitations and takes too much time.
The following warning appears at runtime. Does it matter?
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
[Same list of unused attn2 cross-attention, norm2, conv_norm_out and conv_out weights as in the UNet2DConditionModel warning quoted earlier in this thread.]
[0, 0, 1342, 1342]
Are there any ComfyUI nodes for this, and are there safetensors versions of the models used? In this day and age we have to be so careful with models; there have been a fair few bad actors in the AI community.
On paper this really does look like the answer to my prayers. I have a ton of audio files that I need an avatar (from a single image) to speak for a project I am working on: he is a storyteller and bio reader for our guild. All the nodes I have looked at so far either need a video to drive the animation (somewhat defeating the purpose) or are not installable.
I am getting this error
In infer_audio2vid.py
ffmpeg_path = os.getenv('FFMPEG_PATH')
C:\sd>echo %FFMPEG_PATH%
C:\sd\ffmpeg-4.4-amd64-static
I still see this error:
(echomimic) C:\tut\EchoMimic>python infer_audio2vid.py
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
C:\Users\nitin\miniconda3\envs\echomimic\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
add ffmpeg to path
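Note that infer_audio2vid.py reads FFMPEG_PATH from the environment of the shell it is launched from, so setting or echoing it in a separate cmd window (the C:\sd> prompt above) does not help the conda shell running the script. A rough sketch of the kind of check and PATH update involved (a paraphrase, not the repo's exact code):

import os

ffmpeg_path = os.getenv("FFMPEG_PATH")
if ffmpeg_path is None:
    print("please download ffmpeg-static and export to FFMPEG_PATH.")
elif ffmpeg_path not in os.environ["PATH"]:
    # Prepend so later subprocess calls can locate the ffmpeg binary.
    os.environ["PATH"] = ffmpeg_path + os.pathsep + os.environ["PATH"]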
After running the command
conda create -n echomimic python=3.8
I get the following error in the command prompt window:
Collecting package metadata (current_repodata.json): failed
CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to
download and install packages.
Exception: HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/win-64/current_repodata.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
Any idea how to solve this please?
need this demo page to check the example script
(echomimic) C:\ALLWEBUI\EchoMimic\EchoMimic>python -u infer_audio2vid.py
Traceback (most recent call last):
File "infer_audio2vid.py", line 23, in
from src.models.unet_2d_condition import UNet2DConditionModel
File "C:\ALLWEBUI\EchoMimic\EchoMimic\src\models\unet_2d_condition.py", line 18, in
from diffusers.models.embeddings import (
ImportError: cannot import name 'PositionNet' from 'diffusers.models.embeddings' (C:\Users\admin.conda\envs\echomimic\lib\site-packages\diffusers\models\embeddings.py)
[Comparison clips rendered at 100 steps, 50 steps (the configured default), 20 steps, and 30 steps.]
Is there any way to minimize the eye glitch at the end?
This is the best talking avatar I have ever used, but it takes 7 minutes for a 5-second video on a 3080. I had given up trying to install this, but now it is working. This is way better than Hedra and LivePortrait.
I even asked for help here and still got so many errors with no solution. Thank you so much for the tutorial below.
Hello and thanks for this wonderful work
Before installing, I would like to know: does this work with Russian-language voice audio?
PS: sorry for my english
Description:
I am experiencing an issue with video rendering where pink artifacts appear around the images. This happens when I use a specific image to generate a video. The pink color appears prominently around the edges of the image, causing an unwanted visual effect.
Expected Behavior:
The image should appear in the video without any pink artifacts or unwanted colors around the edges.
Actual Behavior:
The image is rendered with pink artifacts around the edges, which affects the visual quality of the video.
Any assistance or guidance on how to resolve this issue would be greatly appreciated. Thank you!
Download ffmpeg-static
Download and decompress ffmpeg-static, then
export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
I followed the installation instructions for EchoMimic on my PC. When I run the command
python webgui.py --server_port=3000
I get the following error message:
File "C:\Users\frc39\echomimic\EchoMimic\src\pipelines\pipeline_echo_mimic.py", line 395, in call
whisper_feature = self.audio_guider.audio2feat(audio_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\audio2feature.py", line 100, in audio2feat
result = self.model.transcribe(audio_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\transcribe.py", line 85, in transcribe
mel = log_mel_spectrogram(audio)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\audio.py", line 111, in log_mel_spectrogram
audio = load_audio(audio)
^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\echomimic\EchoMimic\src\models\whisper\whisper\audio.py", line 44, in load_audio
.run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\site-packages\ffmpeg_run.py", line 313, in run
process = run_async(
^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\site-packages\ffmpeg_run.py", line 284, in run_async
return subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\subprocess.py", line 1026, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\frc39\miniconda3\envs\echomimic\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
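WinError 2 here means the ffmpeg executable itself could not be found when the bundled whisper code spawned it via subprocess, so ffmpeg must be resolvable on PATH for the process running webgui.py. A quick check sketch:

import shutil
# None reproduces the WinError 2 above; a path means ffmpeg is resolvable.
print(shutil.which("ffmpeg"))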
python -u infer_audio2vid.py
Adding FFMPEG_PATH to PATH
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
[Same list of unused attn2 cross-attention, norm2, conv_norm_out and conv_out weights as in the UNet2DConditionModel warning quoted earlier in this thread.]
Killed
Can someone please share a .bat file to open EchoMimic with one click? Thanks in advance.
conda create -n echomimic python=3.8
conda activate echomimic
pip install -r requirements.txt
The above commands do not work for me, and the following error is raised. Can anyone help, or provide a Docker image for others?
Traceback (most recent call last):
File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/utils/import_utils.py", line 808, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/autoencoders/__init__.py", line 1, in <module>
from .autoencoder_asym_kl import AsymmetricAutoencoderKL
File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/autoencoders/autoencoder_asym_kl.py", line 21, in <module>
from ..modeling_outputs import AutoencoderKLOutput
File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/models/modeling_outputs.py", line 7, in <module>
class AutoencoderKLOutput(BaseOutput):
File "/ssdcache/jtmeng/miniconda3/envs/echomimic/lib/python3.8/site-packages/diffusers/utils/outputs.py", line 61, in __init_subclass__
import torch.utils._pytree
ModuleNotFoundError: No module named 'torch.utils._pytree'
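diffusers imports torch.utils._pytree at import time, so this failure points at an old or broken torch build in the echomimic env rather than at diffusers itself; the pinned versions suggested later in this thread (torch>=2.0.1,<=2.2.2) ship that module. A minimal check sketch:

import torch
print(torch.__version__)
import torch.utils._pytree  # raises ModuleNotFoundError on the broken install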
I followed the instructions in README.md and ran it on Colab, but the elapsed time per timestep is about 3 minutes for 19 seconds of audio. Have you ever tested on Colab?
[Comparison clips at CFG 1.1 (the lowest setting that worked without error in my test), 1.5, 2, 2.5, 3, 4, 5, 7, and 10.]
Unzip the file and place it in the EchoMimic root directory:
gradio demo and jupyter notebook.zip
Hi, this project seems to be animating a reference image with driving audio and/or face pose. Is this suitable for lipsync?
Also, this seems to use video diffusion methods. How do you generate such a long video without running into resource constraints? Does it use a convolutional method instead of temporal attention to handle the temporal dimension?
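Not an answer from the authors, but for context: audio-driven video diffusion pipelines commonly keep memory bounded on long clips by denoising overlapping windows of frames and blending the overlapping frames, rather than attending over the full temporal length at once. A generic sketch of that windowing idea (an assumption about the technique in general, not a description of this repo's implementation):

def window_indices(num_frames, window=16, overlap=4):
    # Split a long frame sequence into overlapping windows; each window is
    # denoised separately and the overlapping frames are blended afterwards.
    step = window - overlap
    starts = range(0, max(num_frames - overlap, 1), step)
    return [(s, min(s + window, num_frames)) for s in starts]

print(window_indices(160))  # 160 latent frames, as in the logs above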
Here are the things that were probably stopping you from running it, as they stopped me. I hope this helps if you run into the same problems.
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
pip install mediapipe
denoising_unet-pose.pth
needs to be renamed to denoising_unet_pose.pth
The first two can be fixed by modifying the requirements.txt file as follows:
torch>=2.0.1,<=2.2.2
torchvision>=0.15.2,<=0.17.2
torchaudio>=2.0.2,<=2.2.2
mediapipe
transformers>=4.38.2
diffusers==0.24.0
torchmetrics
torchtyping
tqdm
ffmpeg-python==0.2.0
facenet_pytorch==2.5.0
moviepy==1.0.3
einops==0.4.1
omegaconf==2.3.0
opencv-python
av==11.0.0
gradio
Hope the authors can fix them 🔜
Congrats on this great work!! When will the train code be released? Thanks!
Can this run on a Mac with an Apple Silicon (M-series) chip? Is support planned for later?
That would be great.