guoqincode / open-animateanyone
Unofficial Implementation of Animate Anyone
Two papers report better results with the 'thicker' style of conditioning images:
https://arxiv.org/pdf/2308.03610.pdf (avatarverse)
https://arxiv.org/pdf/2311.16498.pdf (magicanimate)
Any reason you chose DWpose?
Hi, @guoqincode, thanks for your effort in reimplementing this! Could you show some video results as demonstration?
Could you please provide the file "configs/prompts/animation_stage_1.yaml" for the stage 1 animation test?
Operations such as cfg_random_null_text or cfg_random_null_ref were not used during the training phase, but guidance_scale: 7.5 and do_classifier_free_guidance=True were set during inference. Is this as expected?
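For context, classifier-free guidance usually relies on the model having seen dropped (null) conditions during training. Below is a minimal sketch of that dropout step in plain Python, so it runs without the repo. The helper name `maybe_drop_condition`, the `drop_prob` value, and the list stand-ins for embeddings are illustrative assumptions, not the repo's code.

```python
import random

def maybe_drop_condition(cond, null_cond, drop_prob, rng):
    """Return null_cond with probability drop_prob, otherwise cond."""
    return null_cond if rng.random() < drop_prob else cond

rng = random.Random(0)
ref_embedding = [0.3, -1.2, 0.8]             # stand-in for a reference-image embedding
null_embedding = [0.0] * len(ref_embedding)  # stand-in for the "null" condition

# Apply the dropout to 1000 training samples with an assumed 10% drop rate.
batch = [maybe_drop_condition(ref_embedding, null_embedding, 0.1, rng)
         for _ in range(1000)]
dropped = sum(1 for c in batch if c is null_embedding)
print(f"dropped {dropped} of {len(batch)} conditions")
```

If no condition is ever dropped during training, the unconditional branch used at inference is something the model has never been trained on, which is presumably what this question is getting at.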
I tried to run demo.gradio_animate, but it failed with the error below. I could not find hack_poseguider under the models folder.
Traceback (most recent call last):
File "/home/work/diffuser-env/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/work/diffuser-env/python/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/work/AnimateAnyone-unofficial/demo/gradio_animate.py", line 8, in <module>
from demo.animate import AnimateAnyone
File "/home/work/AnimateAnyone-unofficial/demo/animate.py", line 21, in <module>
from models.hack_poseguider import Hack_PoseGuider as PoseGuider
ModuleNotFoundError: No module named 'models.hack_poseguider'
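A quick way to confirm which module names actually exist in a checkout before importing them is sketched below. `models.hack_poseguider` is the name from the traceback; `models.PoseGuider` is only an assumed fallback, based on the models/PoseGuider.py path that appears in another log in this thread, and `first_importable` is a hypothetical helper.

```python
import importlib
import importlib.util

def first_importable(candidates):
    """Return the first module name in candidates that can be located, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "models") is itself missing.
            continue
    return None

# Run this from the repository root so the "models" package is on sys.path.
name = first_importable(["models.hack_poseguider", "models.PoseGuider"])
if name is None:
    print("No pose guider module found; check the models folder and your checkout.")
else:
    module = importlib.import_module(name)
    print(f"Using {name}")
```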
Can you provide the file "configs/prompts/animation_stage_1.yaml"?
In the TikTok dataset, there is a masks file. Maybe the foreground is trained separately. Have you taken this into account?
unet = DDP(unet, device_ids=[local_rank], output_device=local_rank)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 551, in __init__
self._log_and_throw(
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 686, in _log_and_throw
raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
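The error above means the module handed to DDP has no parameter with requires_grad=True, for example because it was frozen before wrapping. One possible guard is sketched here, duck-typed so it runs without torch. `has_trainable_params`, `wrap_if_trainable`, and the `FakeModule` stand-in are hypothetical names; real code would pass torch.nn.Module instances and a callable that builds DistributedDataParallel.

```python
from types import SimpleNamespace

def has_trainable_params(module):
    """True if any parameter of the module requires a gradient."""
    return any(p.requires_grad for p in module.parameters())

def wrap_if_trainable(module, wrap):
    """Apply wrap (e.g. a DDP constructor) only to modules that will be trained."""
    return wrap(module) if has_trainable_params(module) else module

# Stand-in objects so the sketch runs without torch.
class FakeModule:
    def __init__(self, flags):
        self._params = [SimpleNamespace(requires_grad=f) for f in flags]

    def parameters(self):
        return iter(self._params)

frozen = FakeModule([False, False])
trainable = FakeModule([False, True])

wrapped = wrap_if_trainable(trainable, lambda m: ("DDP", m))    # gets wrapped
untouched = wrap_if_trainable(frozen, lambda m: ("DDP", m))     # left as-is
```

If the unet is supposed to be trained at this point, the more likely fix is to check where its parameters were frozen rather than to skip the wrapper.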
latents_pose = poseguider(pose_condition)
# latents_pose = rearrange(latents_pose, "(b f) c h w -> b c f h w", f=video_length)
if do_classifier_free_guidance: latents_pose = latents_pose.repeat(2,1,1,1) # b c h w
Here, instead of repeating, would it be more appropriate to pass zeros through the poseguider and then concatenate?
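To make the difference between the two options concrete, here is a toy calculation using the standard guidance formula, uncond + scale * (cond - uncond). The linear `toy_eps` "denoiser" and its coefficients are assumptions chosen only for illustration; the real networks are not linear, and the zeros in the question go through the poseguider rather than replacing its output.

```python
def guided(uncond, cond, scale):
    """Standard classifier-free guidance combination."""
    return uncond + scale * (cond - uncond)

def toy_eps(pose, ref, a=2.0, b=3.0):
    """Toy linear 'denoiser': output depends additively on pose and reference."""
    return a * pose + b * ref

pose, ref, scale = 1.0, 1.0, 7.5

# Option A: pose features repeated for both branches, only the reference is nulled.
out_repeat = guided(toy_eps(pose, 0.0), toy_eps(pose, ref), scale)

# Option B: pose features zeroed in the unconditional branch as well.
out_zero = guided(toy_eps(0.0, 0.0), toy_eps(pose, ref), scale)

print(out_repeat)  # pose term unscaled, reference term scaled by `scale`
print(out_zero)    # both terms scaled by `scale`
```

In this toy setting, repeating the pose features means guidance only amplifies the reference condition, while zeroing them also amplifies the pose condition. Which behavior is preferable for the actual model is an open question in this thread.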
As the title mentions, has this crossed your mind?
Some weights of the model checkpoint were not used when initializing ReferenceNet:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight, up_blocks.3.attentions.2.proj_out.bias, up_blocks.3.attentions.2.proj_out.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight']
Is this correct? The training loss is not decreasing. Result:
The pose condition is invalid.
Really nice sharing!
It seems that DDP isn't used in train.py.
Thank you for your work.
During the second stage of training, I keep getting out-of-memory errors. I have 80 GB of GPU memory. The same error occurs on a single card and on multiple cards, even with --train_batch_size set to 1. What went wrong?
Error message:
Traceback (most recent call last):
File "/home/work/animate-anyone/train_2nd_stage.py", line 919, in <module>
main(args)
File "/home/work/animate-anyone/train_2nd_stage.py", line 823, in main
model_pred = unet(
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 632, in forward
return model_forward(*args, **kwargs)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 620, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/work/animate-anyone/animate_anyone/models/unet_3d_condition.py", line 1011, in forward
sample = upsample_block(
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/animate-anyone/animate_anyone/models/unet_3d_blocks.py", line 901, in forward
hidden_states = resnet(hidden_states, temb, scale=lora_scale)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/animate-anyone/animate_anyone/models/resnet.py", line 340, in forward
hidden_states = self.norm1(hidden_states)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 810.00 MiB (GPU 0; 79.35 GiB total capacity; 76.87 GiB already allocated; 64.19 MiB free; 77.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
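The numbers in the message above can be read off mechanically. The sketch below (with a hypothetical helper `parse_oom`) extracts them and computes the gap between reserved and allocated memory, which is the quantity the max_split_size_mb hint refers to.

```python
import re

def parse_oom(message):
    """Extract GiB figures from a PyTorch CUDA OOM message."""
    fields = {
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "reserved": r"([\d.]+) GiB reserved",
    }
    out = {}
    for key, pattern in fields.items():
        match = re.search(pattern, message)
        out[key] = float(match.group(1)) if match else None
    return out

msg = ("CUDA out of memory. Tried to allocate 810.00 MiB (GPU 0; 79.35 GiB total "
       "capacity; 76.87 GiB already allocated; 64.19 MiB free; 77.66 GiB reserved "
       "in total by PyTorch)")
stats = parse_oom(msg)
slack = stats["reserved"] - stats["allocated"]  # cached by PyTorch but not used by tensors
print(f"reserved - allocated = {slack:.2f} GiB")
```

Here the gap is under 1 GiB, so reserved memory is not much larger than allocated memory. That suggests the card is filled by live tensors rather than by fragmentation, and tuning max_split_size_mb alone is unlikely to be enough.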
In the AnimateAnyone paper, attention1 is responsible for the spatial-attention operation, and encoder_hidden_states is embedded in attention2. Is it correct to apply classifier-free guidance to attention2? Or, for the image, is it only necessary to set the unconditional image_embeddings to 0 before feeding them to the unet?
I trained stage 1, but the loss does not decrease. Is this correct?
I tried the 8-bit Adam optimizer and can train stage one on a 40 GB A100. I think it helps reduce VRAM usage, but I don't know whether it will hurt model performance. What do you think? Did you try 8-bit Adam?
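A rough estimate of where the saving comes from: Adam keeps two moment buffers per trainable parameter, so storing them in 8 bits instead of 32 cuts that part of the memory by about 4x. The parameter count below is an assumption (roughly the size of a Stable Diffusion 1.x UNet); if several networks are trained together, sum their counts.

```python
def adam_state_gib(num_params, bytes_per_value):
    """Memory for Adam's two moment buffers (exp_avg and exp_avg_sq), in GiB."""
    return 2 * num_params * bytes_per_value / 2**30

# Assumed parameter count; substitute the real count of the modules you train.
num_params = 860_000_000

fp32_states = adam_state_gib(num_params, 4)
int8_states = adam_state_gib(num_params, 1)
print(f"32-bit Adam states: {fp32_states:.2f} GiB")
print(f"8-bit Adam states:  {int8_states:.2f} GiB (plus small quantization overhead)")
```

This only covers optimizer state. Weights, gradients, and activations are unaffected, so the total saving is smaller than 4x.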
Thank you for your contributions! There are two questions below:
Hi,
Thanks for sharing your implementation. It really helps the community reproduce animate-anyone. When I try to train the network with your code, I find that in referencenet_attention the hidden state size of the Stable Diffusion unet is 768, while the CLIP image feature extracted from clip-vit-large-patch14 is 1024, which causes a size mismatch in the network forward pass (the hidden size of clip-vit-base-patch32, however, is 768). Your config yaml file used clip-vit-base-patch32 and was recently changed to clip-vit-large-patch14, and you mentioned in another issue that you use clip-vit-large-patch14. Could you elaborate on how your code works with clip-vit-large-patch14? I encountered errors when I ran your training code directly with clip-vit-large-patch14.
Looking forward to your reply! Thanks again for your effort.
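A mismatch like this can be caught before the first training step with a simple width check. The sketch below uses the two widths reported above (1024 and 768); `check_cross_attention_dims` is a hypothetical helper, not part of the repo or of diffusers.

```python
def check_cross_attention_dims(image_embed_dim, cross_attention_dim):
    """Raise a readable error if encoder features cannot feed the attention K/V projections."""
    if image_embed_dim != cross_attention_dim:
        raise ValueError(
            f"CLIP feature width {image_embed_dim} does not match the UNet "
            f"cross_attention_dim {cross_attention_dim}."
        )

# Widths as reported in the issue above.
check_cross_attention_dims(768, 768)      # passes silently
try:
    check_cross_attention_dims(1024, 768)
except ValueError as err:
    print(err)
```

This matches the kind of failure seen elsewhere in this thread, "mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320)", where 1024-wide features meet a projection that expects 768.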
Thanks for sharing this repo @guoqincode.
While trying to run stage 2 training, I get this error on line 550 of train.py: AttributeError: 'PoseGuider' object has no attribute 'module'. Did you mean: 'modules'?
Do you happen to know why I might be getting this error?
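An error like this typically shows up when code accesses `.module` on a model that was never wrapped (for example, when running without DDP). A small helper that works in both cases is sketched below. `unwrap` and the stand-in classes are hypothetical, and whether this is the actual cause at line 550 is an assumption.

```python
def unwrap(model):
    """Return model.module if the model is wrapped (e.g. by DDP), else the model itself."""
    return getattr(model, "module", model)

# Stand-ins so the sketch runs without torch.
class PoseGuider:
    def state_dict(self):
        return {"weight": 1.0}

class Wrapper:
    def __init__(self, module):
        self.module = module

plain = PoseGuider()
wrapped = Wrapper(plain)

# Both calls reach the same underlying object.
print(unwrap(plain).state_dict())
print(unwrap(wrapped).state_dict())
```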
I am training the first stage with 8×A800 80G. However, the max batch size can only be set to 1 on each GPU. Is that normal?
I have training results for both stages 1 and 2. Stage 1 inference works but creates a video that repeats the same frame for one second; stage 2 inference is not working. I tried python -m pipelines.animation_stage_2 --config configs/prompts/animation_stage_2.yaml. I set the config values correctly. It threw an import error, which I then fixed. Now I get this error:
from diffusers.pipeline_utils import DiffusionPipeline
loaded temporal unet's pretrained weights from outputs/train_stage_2-2023-12-22T08-59-53
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/AnimateAnyone-unofficial/pipelines/animation_stage_2.py", line 244, in <module>
run(args)
File "/workspace/AnimateAnyone-unofficial/pipelines/animation_stage_2.py", line 233, in run
main(args)
File "/workspace/AnimateAnyone-unofficial/pipelines/animation_stage_2.py", line 70, in main
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_motion_unet_path, subfolder=None, unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs), specific_model=config.specific_motion_unet_model)
File "/workspace/AnimateAnyone-unofficial/models/unet.py", line 457, in from_pretrained_2d
raise RuntimeError(f"{config_file} does not exist")
RuntimeError: outputs/train_stage_2-2023-12-22T08-59-53/config.json does not exist
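The loader in the traceback expects a config.json inside the directory it is given. A quick pre-flight check along those lines is sketched below. `missing_files` is a hypothetical helper, and only config.json is confirmed as required by the error above; any other file names would be assumptions.

```python
from pathlib import Path
import tempfile

def missing_files(checkpoint_dir, required=("config.json",)):
    """List the required file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in required if not (root / name).is_file()]

# Demonstration on a temporary directory standing in for the stage 2 output folder.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_files(tmp)                      # config.json not there yet
    (Path(tmp) / "config.json").write_text("{}")
    after = missing_files(tmp)                       # now present

print(before, after)
```

If the training output folder only contains weight checkpoints, pointing pretrained_motion_unet_path at a directory that does contain the UNet config may be what the loader needs, but that depends on how the repo saves its outputs.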
File "train_th.py", line 637, in <module>
main(name=name, launcher=args.launcher, use_wandb=args.wandb, **config)
File "train_th.py", line 460, in main
latents_pose = poseguider(mask_image)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/cto_labs/hongfating/workspace/src/AnimateAnyone-unofficial/models/PoseGuider.py", line 78, in forward
x = self.conv_layers(x)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
return F.batch_norm(
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/functional.py", line 2450, in batch_norm
return torch.batch_norm(
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/fx/traceback.py", line 57, in format_stack
return traceback.format_stack()
It should be here: File "/cto_labs/hongfating/workspace/src/AnimateAnyone-unofficial/models/PoseGuider.py", line 78, in forward
x = self.conv_layers(x)
I have no idea about that.
While reading the code I saw that the standard BasicTransformerBlock from diffusers has been replaced with a modified version that utilizes a new class called SparseCausalAttention2D for the attn1 layer. Could you specify where this class is defined? Or maybe, were you able to successfully train the model without using this class (replacing it with a different one)?
It would probably take me more than ten days to run this.
How can I implement this in Comfy UI?
I saw you added an inference cmd to the readme.
Do you have any preliminary results?
In the README, you mentioned that you would optimize the training code using DeepSpeed and Accelerate. However, as far as I know, the DeepSpeed functionality integrated into the Accelerate library does not support multi-model training. Do you have any suggestions?
Urgent, please help!
Great work. I have a question about the attention modules (spatial attention, cross-attention, and temporal attention): is the spatial attention that combines the ReferenceNet latent features with the denoising UNet latent features missing? (Quoting the paper: "we replace the self-attention layer with spatial-attention layer. Given a feature map x1∈R^{t×h×w×c} from denoising UNet and x2∈R^{h×w×c} from ReferenceNet, we first copy x2 by t times and concatenate it with x1 along w dimension")
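The concatenation described in the quoted passage can be illustrated with plain nested lists. This is only a shape-level sketch of the quoted sentence, not the repo's attention code; `zeros`, `shape_of`, and `concat_reference_along_w` are hypothetical helpers.

```python
def zeros(*shape):
    """Nested-list tensor of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]

def shape_of(x):
    """Shape of a regular nested list, as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def concat_reference_along_w(x1, x2):
    """x1: [t][h][w][c] from the denoising UNet, x2: [h][w][c] from ReferenceNet.

    x2 is reused for each of the t frames and appended along the w axis.
    """
    return [[row1 + row2 for row1, row2 in zip(frame, x2)] for frame in x1]

t, h, w, c = 4, 2, 3, 5
x1 = zeros(t, h, w, c)
x2 = zeros(h, w, c)
joined = concat_reference_along_w(x1, x2)
print(shape_of(joined))  # w doubles, other axes unchanged
```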
I modified the paths in the configuration file to point to my local directories (UBC Fashion Video dataset) and started the training process. However, an error occurred during the process.
/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
warn(f"Failed to load image Python extension: {e}")
### Train Info: train stage 1: image pretrain ###
Some weights of the model checkpoint were not used when initializing ReferenceNet:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight, up_blocks.3.attentions.2.proj_out.bias, up_blocks.3.attentions.2.proj_out.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight']
12/20/2023 01:40:44 - INFO - root - ***** Running training *****
12/20/2023 01:40:44 - INFO - root - Num examples = 500
12/20/2023 01:40:44 - INFO - root - Num Epochs = 480
12/20/2023 01:40:44 - INFO - root - Instantaneous batch size per device = 4
12/20/2023 01:40:44 - INFO - root - Total train batch size (w. parallel, distributed & accumulation) = 4
12/20/2023 01:40:44 - INFO - root - Gradient Accumulation steps = 1
12/20/2023 01:40:44 - INFO - root - Total optimization steps = 60000
0%| | 0/60000 [00:00<?, ?it/s]
Steps: 0%| | 0/60000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train.py", line 629, in <module>
main(name=name, launcher=args.launcher, use_wandb=args.wandb, **config)
File "train.py", line 492, in main
referencenet(latents_ref_img, ref_timesteps, encoder_hidden_states)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/AnimateAnyone-unofficial/models/ReferenceNet.py", line 1005, in forward
sample, res_samples = downsample_block(
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 1086, in forward
hidden_states = attn(
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 315, in forward
hidden_states = block(
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/AnimateAnyone-unofficial/models/ReferenceNet_attention.py", line 199, in hacked_basic_transformer_inner_forward
attn_output = self.attn2(
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 417, in forward
return self.processor(
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 1023, in __call__
key = attn.to_k(encoder_hidden_states, scale=scale)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/lora.py", line 224, in forward
out = super().forward(hidden_states)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320)
Steps: 0%| | 0/60000 [00:05<?, ?it/s]
[2023-12-20 01:40:55,416] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 412807) of binary: /home/user/miniconda3/envs/animateanyone-unofficial/bin/python
Traceback (most recent call last):
File "/home/user/miniconda3/envs/animateanyone-unofficial/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-12-20_01:40:55
host : gpuserver
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 412807)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Hello, first of all thanks for your work.
I have some questions. During the second stage of training, in the train_stage_2.yaml file,
poseguider_checkpoint_path: ""
referencenet_checkpoint_path: ""
What should these two values be? Should the model trained in the first stage go in referencenet_checkpoint_path?
Or something else? I hope to get your reply.
Per the title I've been a little perplexed to see that what was denoised well at 30 inference timesteps @ 60k training steps, requires 70 steps @ 100k training steps.
My implementation is slightly different than yours so there could be quite a few things going on. Just curious if you noticed any similar behaviors since you're in the middle of training these days.
Thank you
In pipelines/animation_stage_1.py, the unet parameters are loaded from config.pretrained_model_path, not from config.pretrained_unet_path.
I noticed that you changed beta_schedule from linear to scaled_linear. Is it because the training results are better when using the latter?
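For readers unfamiliar with the two options: "linear" spaces the betas evenly, while "scaled_linear" spaces their square roots evenly and then squares them. A plain-Python sketch of both follows. The beta_start, beta_end, and step count are assumed values typical of Stable Diffusion scheduler configs, not values confirmed from this repo.

```python
import math

def linear_betas(beta_start, beta_end, num_steps):
    """Betas spaced evenly between beta_start and beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def scaled_linear_betas(beta_start, beta_end, num_steps):
    """Square roots spaced evenly, then squared (the 'scaled_linear' schedule)."""
    roots = linear_betas(math.sqrt(beta_start), math.sqrt(beta_end), num_steps)
    return [r * r for r in roots]

# Assumed values, typical of Stable Diffusion scheduler configs.
beta_start, beta_end, num_steps = 0.00085, 0.012, 1000
lin = linear_betas(beta_start, beta_end, num_steps)
scaled = scaled_linear_betas(beta_start, beta_end, num_steps)
print(f"midpoint beta: linear={lin[500]:.5f}, scaled_linear={scaled[500]:.5f}")
```

Both schedules share the same endpoints, but scaled_linear adds less noise in the middle of the schedule. If the pretrained weights were trained with one schedule, matching it during fine-tuning is a plausible reason for the change, though only the author can confirm.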
In the README, you mentioned that you would optimize the training code using DeepSpeed and Accelerate. However, as far as I know, the DeepSpeed functionality integrated into the Accelerate library does not support multi-model training. Do you have any suggestions on using DeepSpeed to reduce memory usage?
Magicanimate doesn't seem to have it in their pretrained directory. Is it the same as "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" ?
In the official paper, the authors say
While ReferenceNet introduces a comparable number of parameters to the denoising UNet, in diffusion-based video generation, all video frames undergo denoising multiple times, whereas ReferenceNet only needs to extract features once throughout the entire process
But in your inference implementation, the forward pass of ReferenceNet is performed multiple times.
Have you considered fixing the timestep of ReferenceUnet?
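The idea in the quoted passage, extracting reference features once and reusing them at every denoising step, can be sketched as below. The toy functions are placeholders, and the approach is only valid under the assumption that the reference features do not depend on the current timestep (for example, with a fixed timestep as suggested above).

```python
def run_denoising(reference_fn, denoise_fn, ref_image, latents, timesteps):
    """Extract reference features once, then reuse them at every denoising step."""
    ref_features = reference_fn(ref_image)
    for t in timesteps:
        latents = denoise_fn(latents, t, ref_features)
    return latents

calls = {"reference": 0, "denoise": 0}

def toy_reference(image):
    """Placeholder for a ReferenceNet forward pass; counts its own calls."""
    calls["reference"] += 1
    return image * 2.0

def toy_denoise(latents, t, ref_features):
    """Placeholder for one denoising step that consumes the cached features."""
    calls["denoise"] += 1
    return latents - 0.1 * ref_features

result = run_denoising(toy_reference, toy_denoise, 1.0, 10.0, range(20, 0, -1))
print(calls)
```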
Hi, it seems that you train the 2D unet, referencenet, and poseguider during the first stage,
but you don't save parameters of 2D unet.
My config:
train_data:
csv_path: ../TikTok_info.csv
video_folder: ../TikTok_dataset/TikTok_dataset
sample_size: 512
sample_stride: 4
sample_n_frames: 16
clip_model_path: openai/clip-vit-base-patch32
gradient_accumulation_steps: 128
batch_size: 1
I use 1 V100, with optimizer = torch.optim.SGD(trainable_params, lr=learning_rate / gradient_accumulation_steps, momentum=0.9)
Result: the result after 20,000 steps is shown.
Could it be because the 20,000 steps I ran are only equivalent to a little more than 300 steps at a batch size of 64? Or are there other reasons?
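The arithmetic behind that estimate can be checked directly. The sketch assumes the 20,000 reported steps count micro-batches of size 1 (rather than optimizer updates); both helper names are hypothetical.

```python
def equivalent_steps(steps, per_step_batch, reference_batch):
    """Number of steps at reference_batch that see the same number of samples."""
    return steps * per_step_batch / reference_batch

def optimizer_updates(micro_steps, gradient_accumulation_steps):
    """Weight updates performed when each update accumulates several micro-batches."""
    return micro_steps // gradient_accumulation_steps

# Assumption: the 20,000 reported steps count micro-batches of size 1.
print(equivalent_steps(20_000, 1, 64))      # 312.5
print(optimizer_updates(20_000, 128))       # 156
```

Under that assumption the model has seen about as many samples as 312 steps at batch size 64, and has received only 156 weight updates, so limited progress at this point would not be surprising.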