
cfld's Introduction

CFLD arXiv

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jian-Huang Lai
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), June 17-21, 2024, Seattle, USA

[Figure: qualitative comparison results]

TL;DR

If you want to cite and compare with our method, please download the generated images from Google Drive here (including 256x176 and 512x352 on DeepFashion, and 128x64 on Market-1501).

[Figure: overall pipeline]

News 🔥🔥🔥

  • 2024/02/27  Our paper titled "Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis" is accepted by CVPR 2024.
  • 2024/02/28  We release the code and upload the arXiv preprint.
  • 2024/03/09  The checkpoints on the DeepFashion dataset are released on Google Drive.
  • 2024/03/09  We note that the file naming used by different open-source codebases can be extremely confusing. To facilitate future work, we have organized the generated images of several methods that we used for qualitative comparisons in the paper. They were uniformly resized to 256x176 or 512x352, stored as PNG files, and given the same naming format. Enjoy! 🤗
  • 2024/03/20  We upload the Jupyter notebook for inference. You can modify it as you want, e.g. replacing the conditional image with your own and randomly sampling a target pose from the test dataset.
  • 2024/04/05  Our paper is accepted as a CVPR 2024 Highlight!
  • 2024/04/10  The camera-ready version is now available on arXiv. The supplementary material with more discussions and results has been added.

Preparation

Install Environment

conda env create -f environment.yaml
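
Then activate the environment before running any of the commands below (the environment name CFLD is inferred from the env paths that appear in the issue tracebacks further down; double-check the name defined in environment.yaml):

conda activate CFLD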

Download DeepFashion Dataset

  • Download Img/img_highres.zip from the In-shop Clothes Retrieval Benchmark of DeepFashion and unzip it under the ./fashion directory. (A password is required; please contact the authors of DeepFashion (not us!) for permission.)
  • Download the train/test pairs and keypoints from DPTN and put them under the ./fashion directory.
  • Make sure the tree of the ./fashion directory is as follows:
    fashion
    ├── fashion-resize-annotation-test.csv
    ├── fashion-resize-annotation-train.csv
    ├── fashion-resize-pairs-test.csv
    ├── fashion-resize-pairs-train.csv
    ├── MEN
    ├── test.lst
    ├── train.lst
    └── WOMEN
    
  • Run generate_fashion_datasets.py with Python, for example:
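    python generate_fashion_datasets.py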

Download Pre-trained Models

Training

For multi-GPU training, run the following command by default.

bash scripts/multi_gpu/pose_transfer_train.sh 0,1,2,3,4,5,6,7

For single-GPU training, run the following command by default.

bash scripts/single_gpu/pose_transfer_train.sh 0

For ablation studies, specify the config file as in the following example.

bash scripts/multi_gpu/pose_transfer_train.sh 0,1,2,3,4,5,6,7 --config_file configs/ablation_study/no_app.yaml

Inference

For multi-GPU inference, specify the checkpoint path as in the following example.

bash scripts/multi_gpu/pose_transfer_test.sh 0,1,2,3,4,5,6,7 MODEL.PRETRAINED_PATH checkpoints

For single-GPU inference, specify the checkpoint path as in the following example.

bash scripts/single_gpu/pose_transfer_test.sh 0 MODEL.PRETRAINED_PATH checkpoints

Citation

@inproceedings{lu2024coarse,
  title={Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis},
  author={Lu, Yanzuo and Zhang, Manlin and Ma, Andy J and Xie, Xiaohua and Lai, Jian-Huang},
  booktitle={CVPR},
  year={2024}
}


cfld's Issues

About the checkpoint

May I ask if you could provide a trained model that I can use for inference?

Pose2Video

Hi,
Thanks for releasing this amazing program!
Do you have any plans to extend the UNet from 2D to 3D and enable pose-to-video generation, like Animate Anyone?

Questions about metric calculation

Thanks for your great efforts in this work. Could you release a simple metric-calculation script that only takes a folder of generated images and a folder of ground-truth images as input? Thanks a lot.
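
A minimal sketch of such a standalone script (not the authors' evaluation code; it assumes the torchmetrics package and placeholder folder names gen_imgs / gt_imgs):

    import os
    import torch
    from PIL import Image
    from torchvision import transforms
    from torchmetrics.image.fid import FrechetInceptionDistance
    from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

    def load_folder(path, size=(256, 176)):
        # Load every PNG in a folder as a float tensor in [0, 1], resized to a common size.
        to_tensor = transforms.Compose([transforms.Resize(size), transforms.ToTensor()])
        names = sorted(f for f in os.listdir(path) if f.lower().endswith(".png"))
        return torch.stack([to_tensor(Image.open(os.path.join(path, f)).convert("RGB")) for f in names])

    gen = load_folder("gen_imgs")  # generated images (placeholder folder name)
    gt = load_folder("gt_imgs")    # ground-truth images (placeholder folder name)

    fid = FrechetInceptionDistance(normalize=True)  # normalize=True expects float inputs in [0, 1]
    fid.update(gt, real=True)
    fid.update(gen, real=False)

    lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
    print("FID:", fid.compute().item())
    print("LPIPS:", lpips(gen, gt).item())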

Why can the Perception-Refined Decoder extract specific information like gender, hairstyle, and so on?

Hi, I see you wrote in your paper: "By revisiting how people perceive a person image, we find several common characteristics, i.e., human body parts, age, gender, hairstyle, clothing, and so on, as demonstrated in Fig. 1(a)." But how do you make sure that these transformer blocks extract exactly the information you want and not something else? The shape of the hidden_states that the decoder outputs is [batchsize, 8, 768]. I want to know how these eight kinds of information are decoupled from the irrelevant information.

Thanks very much!

Code question about decoder.

In your paper, the Perception-Refined Decoder uses the source image encoder. So I thought the appearance encoder should be used, but in your code you use 'down_block_additional_residuals', which comes from the pose encoder. Why is that?
    def forward(self, batched_inputs):
        mask = batched_inputs["mask"] if "mask" in batched_inputs else None
        x, features = self.backbone(batched_inputs["img_cond"], mask=mask)
        up_block_additional_residuals = self.appearance_encoder(features)

        bsz = x.shape[0]
        if self.training:
            bsz = bsz * 2
            down_block_additional_residuals = self.pose_encoder(torch.cat([batched_inputs["pose_img_src"], batched_inputs["pose_img_tgt"]]))
            up_block_additional_residuals = {k: torch.cat([v, v]) for k, v in up_block_additional_residuals.items()}
            # why does self.decoder use the pose_encoder output?
            c = self.decoder(x, features, down_block_additional_residuals)

Issue with inference: AttributeError on 'FrozenDict'

Hello!

I am a college student interested in your work and am currently attempting to use your published model for my course project. However, while trying to run playground.ipynb (I also tried from the command line and got the same error), I encountered the error below.

Please check it and help me solve this problem. Thank you very much!


AttributeError Traceback (most recent call last)
Cell In[10], line 18
16 inputs = torch.cat([noisy_latents, noisy_latents, noisy_latents], dim=0)
17 inputs = noise_scheduler.scale_model_input(inputs, timestep=t)
---> 18 noise_pred = unet(sample=inputs, timestep=t, encoder_hidden_states=c_new,
19 down_block_additional_residuals=copy.deepcopy(down_block_additional_residuals),
20 up_block_additional_residuals=copy.deepcopy(up_block_additional_residuals))
22 noise_pred_uc, noise_pred_down, noise_pred_full = noise_pred.chunk(3)
23 noise_pred = noise_pred_uc +
24 cfg.TEST.DOWN_BLOCK_GUIDANCE_SCALE * (noise_pred_down - noise_pred_uc) +
25 cfg.TEST.FULL_GUIDANCE_SCALE * (noise_pred_full - noise_pred_down)

File ~/anaconda3/envs/CFLD/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File ~/anaconda3/envs/CFLD/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /home/dlpj/CFLD/models/unet.py:1946, in UNet.forward(self, sample, timestep, **kwargs)
1945 def forward(self, sample, timestep, **kwargs):
-> 1946 return self.model(sample, timestep, **kwargs).sample

File ~/anaconda3/envs/CFLD/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File ~/anaconda3/envs/CFLD/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /home/dlpj/CFLD/models/unet.py:1684, in ResidualUNet2DConditionModel.forward(self, sample, timestep, encoder_hidden_states, class_labels, timestep_cond, attention_mask, cross_attention_kwargs, added_cond_kwargs, down_block_additional_residuals, mid_block_additional_residual, up_block_additional_residuals, encoder_attention_mask, return_dict)
1681 encoder_attention_mask = encoder_attention_mask.unsqueeze(1)
1683 # 0. center input if necessary
-> 1684 if self.config.center_input_sample:
1685 sample = 2 * sample - 1.0
1687 # 1. time

AttributeError: 'FrozenDict' object has no attribute 'center_input_sample'
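
This error pattern is consistent with (an assumption, not a confirmed diagnosis) the installed diffusers version differing from the one the repository was developed against, so that the loaded config (a FrozenDict) simply lacks the center_input_sample key. A quick check is to compare the installed version against whatever is pinned in environment.yaml:

pip show diffusers | grep Version
grep -i diffusers environment.yaml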

Classifier-free Guidance for training

Hi authors. Thanks again for your great work! I have noticed that your methodology incorporates the use of classifier-free guidance both during the training and testing phases. The configuration parameter 'MODEL.U_COND_DOWN_BLOCK_GUIDANCE' has been set to False, which seemingly implies that the model has not undergone training in an unconditional pose setting. I would greatly appreciate clarification on your approach in this regard.


Reproduce question on test split

Hi.
When I use the provided checkpoint and run pose_transfer_test.sh with the default hyperparameters, I cannot reproduce the pictures shown in the README, and the metrics are also worse than those computed on the images you released.
So I want to know whether any hyperparameters were changed from the defaults when you generated the released result images.

The code of the loss function

Thanks for your great work and released code!

I have two questions about the code in pose_transfer_train.py:

  1. The losses described in the paper are a reconstruction loss and an MSE loss, but there is only one loss line in the implementation.

  2. Why are pose_img_src and pose_img_tgt concatenated for the pose encoder? And why are img_src and img_tgt concatenated for the input?

About inference

How can I run inference with my own local images? It seems the script tests the whole test dataset.

Inference image resolution 256x176

Hi @YanzuoLu, thanks for sharing this great work!

I want to run inference at the 256x176 image resolution. Could you share the instructions and the config file for this setting? In playground.ipynb, the latent output size is 64x64 and the output of the VAE decoder is 512x512 at the moment.

Thank you.

About the build_pose_img function's output

In your code:

    def build_pose_img(self, img_path):
        string = self.annotation_file.loc[os.path.basename(img_path)]
        array = load_pose_cords_from_strings(string['keypoints_y'], string['keypoints_x'])
        pose_map = torch.tensor(cords_to_map(array, tuple(self.pose_img_size), (256, 176)).transpose(2, 0, 1), dtype=torch.float32)
        pose_img = torch.tensor(draw_pose_from_cords(array, tuple(self.pose_img_size), (256, 176)).transpose(2, 0, 1) / 255., dtype=torch.float32)
        pose_img = torch.cat([pose_img, pose_map], dim=0)
        return pose_img

I am curious about the design choice in the build_pose_img function where it concatenates pose_img and pose_map, resulting in a tensor with 21 channels. My initial expectation was that the function would directly return the pose_img with 3 channels. I am interested in understanding the rationale behind using 21 channels instead.

What is the purpose of concatenating pose_img with pose_map, and how does it benefit the overall model or application?
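
For reference, the 21 channels are consistent with a 3-channel skeleton drawing plus one heatmap per keypoint, assuming the standard 18-keypoint OpenPose layout of the DPTN annotations (an assumption, not confirmed by the repo). A shape-only sketch:

    import torch
    # Illustrative sizes only: draw_pose_from_cords yields an RGB skeleton image (3 channels),
    # cords_to_map yields one heatmap per keypoint (18 channels under the assumed layout).
    pose_img = torch.zeros(3, 256, 256)
    pose_map = torch.zeros(18, 256, 256)
    print(torch.cat([pose_img, pose_map], dim=0).shape)  # torch.Size([21, 256, 256]) -> 3 + 18 = 21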

Another question: what is the difference between these two images (img_src and img_cond)? Which one is used for training?

    return_dict = {
        "img_src": img_src,
        "img_tgt": img_tgt,
        "img_cond": img_cond,
        "pose_img_src": pose_img_src,
        "pose_img_tgt": pose_img_tgt
    }

About the pose_encoder

I want to replace the OpenPose keypoints with DWPose. Can I delete the pose_encoder weights like this and retrain?

    del state_dict['pose_encoder.conv_in.weight']
    del state_dict['pose_encoder.conv_in.bias']
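
A minimal sketch of that idea, under the assumption (not confirmed by the authors) that only the pose_encoder input convolution depends on the keypoint format; the checkpoint filename below is hypothetical:

    import torch

    # `model` is assumed to be built beforehand, e.g. via build_model(cfg) as in playground.ipynb.
    state_dict = torch.load("checkpoints/pytorch_model.bin", map_location="cpu")  # hypothetical path
    for k in ("pose_encoder.conv_in.weight", "pose_encoder.conv_in.bias"):
        state_dict.pop(k, None)  # drop the input conv so it is re-initialized for the new keypoint format

    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print("re-initialized parameters:", missing)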

Issues running playground.ipynb

There is an error at the line "unet = UNet(cfg).eval().requires_grad_(False).cuda()" when running playground.ipynb. The error message is below.


ValueError Traceback (most recent call last)
Cell In[4], line 4
2 vae = VariationalAutoencoder(pretrained_path="pretrained_models/vae").eval().requires_grad_(False).cuda()
3 model = build_model(cfg).eval().requires_grad_(False).cuda()
----> 4 unet = UNet(cfg).eval().requires_grad_(False).cuda()

File /data/CFLD/models/unet.py:1913, in UNet.init(self, cfg)
1910 def init(self, cfg):
1911 super().init()
-> 1913 self.model = ResidualUNet2DConditionModel.from_pretrained(
1914 cfg.MODEL.UNET_CONFIG.PRETRAINED_PATH, use_safetensors = False)
1915 self.model.requires_grad_(False)
1916 self.model.enable_xformers_memory_efficient_attention()

File /usr/local/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py:114, in validate_hf_hub_args.._inner_fn(*args, **kwargs)
111 if check_use_auth_token:
112 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.name, has_token=has_token, kwargs=kwargs)
--> 114 return fn(*args, **kwargs)

File /usr/local/lib/python3.9/site-packages/diffusers/models/modeling_utils.py:650, in ModelMixin.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
647 if low_cpu_mem_usage:
648 # Instantiate model with empty weights
649 with accelerate.init_empty_weights():
--> 650 model = cls.from_config(config, **unused_kwargs)
652 # if device_map is None, load the state dict and move the params from meta device to the cpu
653 if device_map is None:

File /usr/local/lib/python3.9/site-packages/diffusers/configuration_utils.py:260, in ConfigMixin.from_config(cls, config, return_unused_kwargs, **kwargs)
258 # Return model and optionally state and/or unused_kwargs
259 print("init_dict: ", init_dict, flush=True)
--> 260 model = cls(**init_dict)
262 # make sure to also save config parameters that might be used for compatible classes
263 # update _class_name
264 if "_class_name" in hidden_dict:

File /usr/local/lib/python3.9/site-packages/diffusers/configuration_utils.py:654, in register_to_config..inner_init(self, *args, **kwargs)
652 new_kwargs = {**config_init_kwargs, **new_kwargs}
653 getattr(self, "register_to_config")(**new_kwargs)
--> 654 init(self, *args, **init_kwargs)

File /data/CFLD/models/unet.py:1564, in ResidualUNet2DConditionModel.init(self, sample_size, in_channels, out_channels, center_input_sample, flip_sin_to_cos, freq_shift, down_block_types, mid_block_type, up_block_types, only_cross_attention, block_out_channels, layers_per_block, downsample_padding, mid_block_scale_factor, act_fn, norm_num_groups, norm_eps, cross_attention_dim, transformer_layers_per_block, encoder_hid_dim, encoder_hid_dim_type, attention_head_dim, num_attention_heads, dual_cross_attention, use_linear_projection, class_embed_type, addition_embed_type, addition_time_embed_dim, num_class_embeds, upcast_attention, resnet_time_scale_shift, resnet_skip_time_act, resnet_out_scale_factor, time_embedding_type, time_embedding_dim, time_embedding_act_fn, timestep_post_act, time_cond_proj_dim, conv_in_kernel, conv_out_kernel, projection_class_embeddings_input_dim, class_embeddings_concat, mid_block_only_cross_attention, cross_attention_norm, addition_embed_type_num_heads)
1561 else:
1562 add_upsample = False
-> 1564 up_block = get_residual_up_block(
1565 up_block_type,
1566 num_layers=reversed_layers_per_block[i] + 1,
1567 transformer_layers_per_block=reversed_transformer_layers_per_block[i],
1568 in_channels=input_channel,
1569 out_channels=output_channel,
1570 prev_output_channel=prev_output_channel,
1571 temb_channels=blocks_time_embed_dim,
1572 add_upsample=add_upsample,
1573 resnet_eps=norm_eps,
1574 resnet_act_fn=act_fn,
1575 resnet_groups=norm_num_groups,
1576 cross_attention_dim=reversed_cross_attention_dim[i],
1577 num_attention_heads=reversed_num_attention_heads[i],
1578 dual_cross_attention=dual_cross_attention,
1579 use_linear_projection=use_linear_projection,
1580 only_cross_attention=only_cross_attention[i],
1581 upcast_attention=upcast_attention,
1582 resnet_time_scale_shift=resnet_time_scale_shift,
1583 resnet_skip_time_act=resnet_skip_time_act,
1584 resnet_out_scale_factor=resnet_out_scale_factor,
1585 cross_attention_norm=cross_attention_norm,
1586 attention_head_dim=attention_head_dim[i] if attention_head_dim[i] is not None else output_channel,
1587 )
1588 self.up_blocks.append(up_block)
1589 prev_output_channel = output_channel

File /data/CFLD/models/unet.py:1061, in get_residual_up_block(up_block_type, num_layers, in_channels, out_channels, prev_output_channel, temb_channels, add_upsample, resnet_eps, resnet_act_fn, transformer_layers_per_block, num_attention_heads, resnet_groups, cross_attention_dim, dual_cross_attention, use_linear_projection, only_cross_attention, upcast_attention, resnet_time_scale_shift, resnet_skip_time_act, resnet_out_scale_factor, cross_attention_norm, attention_head_dim, upsample_type)
1059 if cross_attention_dim is None:
1060 raise ValueError("cross_attention_dim must be specified for CrossAttnUpBlock2D")
-> 1061 return ResidualCrossAttnUpBlock2D(
1062 num_layers=num_layers,
1063 transformer_layers_per_block=transformer_layers_per_block,
1064 in_channels=in_channels,
1065 out_channels=out_channels,
1066 prev_output_channel=prev_output_channel,
1067 temb_channels=temb_channels,
1068 add_upsample=add_upsample,
1069 resnet_eps=resnet_eps,
1070 resnet_act_fn=resnet_act_fn,
1071 resnet_groups=resnet_groups,
1072 cross_attention_dim=cross_attention_dim,
1073 num_attention_heads=num_attention_heads,
1074 dual_cross_attention=dual_cross_attention,
1075 use_linear_projection=use_linear_projection,
1076 only_cross_attention=only_cross_attention,
1077 upcast_attention=upcast_attention,
1078 resnet_time_scale_shift=resnet_time_scale_shift,
1079 )
1080 elif up_block_type == "SimpleCrossAttnUpBlock2D":
1081 if cross_attention_dim is None:

File /data/CFLD/models/unet.py:890, in ResidualCrossAttnUpBlock2D.init(self, in_channels, out_channels, prev_output_channel, temb_channels, dropout, num_layers, transformer_layers_per_block, resnet_eps, resnet_time_scale_shift, resnet_act_fn, resnet_groups, resnet_pre_norm, num_attention_heads, cross_attention_dim, output_scale_factor, add_upsample, dual_cross_attention, use_linear_projection, only_cross_attention, upcast_attention)
874 resnets.append(
875 ResidualResnetBlock2D(
876 in_channels=resnet_in_channels + res_skip_channels,
(...)
886 )
887 )
888 if not dual_cross_attention:
889 attentions.append(
--> 890 ResidualTransformer2DModel(
891 num_attention_heads,
892 out_channels // num_attention_heads,
893 in_channels=out_channels,
894 num_layers=transformer_layers_per_block,
895 cross_attention_dim=cross_attention_dim,
896 norm_num_groups=resnet_groups,
897 use_linear_projection=use_linear_projection,
898 only_cross_attention=only_cross_attention,
899 upcast_attention=upcast_attention,
900 )
901 )
902 else:
903 attentions.append(
904 DualTransformer2DModel(
905 num_attention_heads,
(...)
911 )
912 )

File /usr/local/lib/python3.9/site-packages/diffusers/configuration_utils.py:654, in register_to_config..inner_init(self, *args, **kwargs)
652 new_kwargs = {**config_init_kwargs, **new_kwargs}
653 getattr(self, "register_to_config")(**new_kwargs)
--> 654 init(self, *args, **init_kwargs)

File /data/CFLD/models/unet.py:502, in ResidualTransformer2DModel.init(self, num_attention_heads, attention_head_dim, in_channels, out_channels, num_layers, dropout, norm_num_groups, cross_attention_dim, attention_bias, sample_size, num_vector_embeds, patch_size, activation_fn, num_embeds_ada_norm, use_linear_projection, only_cross_attention, upcast_attention, norm_type, norm_elementwise_affine)
479 @register_to_config
480 def init(
481 self,
(...)
500 norm_elementwise_affine: bool = True,
501 ):
--> 502 super(Transformer2DModel, self).init()
503 self.use_linear_projection = use_linear_projection
504 self.num_attention_heads = num_attention_heads

File /usr/local/lib/python3.9/site-packages/diffusers/configuration_utils.py:654, in register_to_config..inner_init(self, *args, **kwargs)
652 new_kwargs = {**config_init_kwargs, **new_kwargs}
653 getattr(self, "register_to_config")(**new_kwargs)
--> 654 init(self, *args, **init_kwargs)

File /usr/local/lib/python3.9/site-packages/diffusers/models/transformers/transformer_2d.py:151, in Transformer2DModel.init(self, num_attention_heads, attention_head_dim, in_channels, out_channels, num_layers, dropout, norm_num_groups, cross_attention_dim, attention_bias, sample_size, num_vector_embeds, patch_size, activation_fn, num_embeds_ada_norm, use_linear_projection, only_cross_attention, double_self_attention, upcast_attention, norm_type, norm_elementwise_affine, norm_eps, attention_type, caption_channels, interpolation_scale)
146 raise ValueError(
147 f"Cannot define both num_vector_embeds: {num_vector_embeds} and patch_size: {patch_size}. Make"
148 " sure that either num_vector_embeds or num_patches is None."
149 )
150 elif not self.is_input_continuous and not self.is_input_vectorized and not self.is_input_patches:
--> 151 raise ValueError(
152 f"Has to define in_channels: {in_channels}, num_vector_embeds: {num_vector_embeds}, or patch_size:"
153 f" {patch_size}. Make sure that in_channels, num_vector_embeds or num_patches is not None."
154 )
156 # 2. Define input layers
157 if self.is_input_continuous:

ValueError: Has to define in_channels: None, num_vector_embeds: None, or patch_size: None. Make sure that in_channels, num_vector_embeds or num_patches is not None.

Training GPU

Hello, we are very interested in your project and we would like to try training with your code. What type of GPU did you use?

Thank you.

Is the pose map required? I want to run inference on poses from my own dataset

Is the pose map (keypoint coordinates) strictly required? I want to run inference on poses from my own dataset.

    def build_pose_img(annotation_file, img_path):
        string = annotation_file.loc[os.path.basename(img_path)]
        array = load_pose_cords_from_strings(string['keypoints_y'], string['keypoints_x'])
        pose_map = torch.tensor(cords_to_map(array, (256, 256), (256, 176)).transpose(2, 0, 1), dtype=torch.float32)
        pose_img = torch.tensor(draw_pose_from_cords(array, (256, 256), (256, 176)).transpose(2, 0, 1) / 255., dtype=torch.float32)
        pose_img = torch.cat([pose_img, pose_map], dim=0)
        return pose_img
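
For what it's worth, these helpers only need a keypoint array, so the pose input can in principle be built without the annotation CSV. A rough sketch under that assumption (the import path and the 18-keypoint (y, x) layout with -1 for missing joints are guesses, not confirmed by the repo):

    import numpy as np
    import torch
    # Same helpers as in build_pose_img above; their module path here is assumed, not taken from the repo.
    from pose_utils import cords_to_map, draw_pose_from_cords

    # 18 OpenPose-style joints as (y, x) in the original 256x176 frame; -1 marks a missing joint.
    array = np.full((18, 2), -1, dtype=int)
    array[0] = (30, 88)   # e.g. nose
    array[1] = (60, 88)   # e.g. neck

    pose_map = torch.tensor(cords_to_map(array, (256, 256), (256, 176)).transpose(2, 0, 1), dtype=torch.float32)
    pose_img = torch.tensor(draw_pose_from_cords(array, (256, 256), (256, 176)).transpose(2, 0, 1) / 255., dtype=torch.float32)
    pose_input = torch.cat([pose_img, pose_map], dim=0)  # same 21-channel layout as build_pose_img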


Issue with Loading Pre-trained Weights for Fine-tuning

Hello,

I hope this message finds you well. I am a senior student deeply interested in your work and currently attempting to leverage your published model for my academic project.

While trying to load the pre-trained weights for fine-tuning on my dataset, I encountered an error, which I am struggling to resolve. I have attached a screenshot to illustrate the issue more clearly.

The process I followed is based on the instructions in your documentation: load the pre-trained weights, then fine-tune the model on my data. However, upon execution, I encountered an error (shown in the attached screenshots).

I would greatly appreciate it if you could take a moment to look into this matter and provide any guidance or suggestions that might help me resolve this issue.

Thank you very much for your time and assistance. Your work is highly inspiring, and I am eager to apply it to my project successfully.
