
nvlabs / affordance_diffusion


Code for "Affordance Diffusion: Synthesizing Hand-Object Interactions"

Home Page: https://github.com/NVlabs/affordance_diffusion/blob/master

Languages: Python 99.78%, Shell 0.22%
Topics: diffusion-models, vision

affordance_diffusion's Introduction

Affordance Diffusion: Synthesizing Hand-Object Interactions

Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, Sifei Liu

in CVPR 2023

TL;DR: Given a single RGB image of an object, hallucinate plausible ways for a human to interact with it.

[Project Page] [Video] [Arxiv] [Data Generation]

Installation

See install.md

Inference

HOI synthesis

python inference.py data.data_dir='docs/demo/*.*g' test_num=3

The inference script first synthesizes test_num HOI images in batch and then extracts 3D hand poses.

[Figure: input image, synthesized HOI images, and extracted 3D hand pose]
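
As a side note, the quoted glob pattern '*.*g' picks up any file whose extension ends in "g", so it covers .png, .jpg, and .jpeg inputs alike. A quick way to preview which demo files will be processed:

```python
import glob

# '*.*g' matches extensions ending in 'g': .png, .jpg, .jpeg, etc.
for path in sorted(glob.glob("docs/demo/*.*g")):
    print(path)
```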

Interpolation

The script takes the layout parameters of the index-th example predicted by inference.py and smoothly interpolates the HOI synthesis toward the horizontally flipped parameters. To run the demo:

python -m scripts.interpolate dir=docs/demo_inter

This should give results similar to:

[Figure: input image, interpolated layouts, and output sequence]
Additional parameters:

```
python -m scripts.interpolate dir=\${output}/release/layout/cascade index=0000_00_s0
```

  • interpolation.len: length of an interpolation sequence
  • interpolation.num: number of interpolation sequences
  • interpolation.test_name: subfolder to save the output
  • interpolation.orient: whether to horizontally flip the approaching direction
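
For intuition about what the interpolation does, the sketch below linearly blends a layout toward its horizontal mirror. The parameter names (x, y, size, angle) are illustrative assumptions for this toy example, not the repo's actual LayoutNet encoding:

```python
import numpy as np

def interpolate_layouts(layout, num_steps=8):
    """Toy linear blend from a layout to its horizontally flipped twin.

    Assumed toy encoding: x/y are normalized palm coordinates and angle
    is the approaching direction in radians; the real layout
    parameterization in this repo may differ.
    """
    flipped = dict(layout)
    flipped["x"] = 1.0 - layout["x"]            # mirror position
    flipped["angle"] = np.pi - layout["angle"]  # mirror approach direction
    for t in np.linspace(0.0, 1.0, num_steps):
        yield {k: (1 - t) * layout[k] + t * flipped[k] for k in layout}

for step in interpolate_layouts({"x": 0.3, "y": 0.5, "size": 0.2, "angle": 0.4}):
    print({k: round(v, 3) for k, v in step.items()})
```

Each intermediate layout can then be fed to the synthesis stage, which is what produces the smooth sequences shown above.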

Heatmap Guidance

The following command runs guided generation with the keypoints in docs/demo_kpts:

python inference.py  mode=hijack data.data_dir='docs/demo_kpts/*.png' test_name=hijack

This should give results similar to:

[Figure: two input/output pairs]
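
Keypoint-based guidance of this kind typically starts by splatting each 2D keypoint into a Gaussian heatmap that the sampler can follow. Below is a minimal numpy sketch of that splatting step; the resolution and sigma are illustrative, not the repo's settings:

```python
import numpy as np

def keypoint_to_heatmap(x, y, size=64, sigma=2.0):
    """Splat one 2D keypoint (pixel coordinates) into a Gaussian heatmap."""
    xs, ys = np.meshgrid(np.arange(size), np.arange(size))
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

heatmap = keypoint_to_heatmap(20, 40)
print(heatmap.shape, heatmap.max())  # (64, 64); peaks at 1.0 on the keypoint
```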

Training

Data Preprocessing

We provide the script to generate the HO3Pair dataset. Please see preprocess/.

Train your own models

  • LayoutNet:

python -m models.base -m --config-name=train \
  expname=reproduce/\${model.module} \
  model=layout

  • ContentNet-GLIDE:

python -m models.base -m --config-name=train \
  expname=reproduce/\${model.module} \
  model=content_glide

  • ContentNet-LDM: first download the off-the-shelf pretrained model from here and put it at ${environment.pretrain}/stable/inpaint.ckpt, the path specified by resume_ckpt in configs/model/content_ldm.yaml, then run:

python -m models.base -m --config-name=train \
  expname=reproduce/\${model.module} \
  model=content_ldm
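
Before launching the ContentNet-LDM run, it can save time to confirm that the downloaded checkpoint actually sits where resume_ckpt points. A small sanity check; the resolved value of ${environment.pretrain} below ("pretrain") is an assumption for illustration:

```python
import os
import torch

ckpt_path = "pretrain/stable/inpaint.ckpt"  # assumed resolution of ${environment.pretrain}
assert os.path.exists(ckpt_path), f"missing checkpoint: {ckpt_path}"

ckpt = torch.load(ckpt_path, map_location="cpu")
# Latent-diffusion style checkpoints usually nest weights under 'state_dict'.
state = ckpt.get("state_dict", ckpt)
print(f"loaded {len(state)} entries from {ckpt_path}")
```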

Split and test images

Per-category HOI4D instance splits (not used in the paper) and test images for HOI4D and EPIC-KITCHENS (VISOR) can be downloaded here.

License

This project is licensed under CC-BY-NC-SA-4.0. Redistribution and use should follow this license.

Acknowledgement

Affordance Diffusion leverages many amazing open-source projects shared by the research community.

Citation

If you find this work helpful, please consider citing:

@inproceedings{ye2023affordance,
  title={Affordance Diffusion: Synthesizing Hand-Object Interactions},
  author={Yufei Ye and Xueting Li and Abhinav Gupta and Shalini De Mello and Stan Birchfield and Jiaming Song and Shubham Tulsiani and Sifei Liu},
  year={2023},
  booktitle={CVPR},
}

affordance_diffusion's People

Contributors

judyye


affordance_diffusion's Issues

Inconsistency in the environment

I noticed that in the preprocessing phase the PyTorch version is 1.9, but in environment.yaml the PyTorch version is 1.1. Which version should I choose to run the project?

Errors in data generation

Hi, I've encountered an issue similar to the one previously documented as issue #14.

Specifically, in the Text2ImUNet model, the in_channel parameter is set to 3, whereas in the provided checkpoint, the in_channel appears to be 7. I'm uncertain if my approach to resolving this inconsistency is correct. While my adjustments did address the initial model loading problem, they have unfortunately led to a new issue.

Here is my modification: I just changed the checkpoint name in load_base():

if args.base_ckpt is None:
    # model.load_state_dict(load_checkpoint('base-inpaint', device))
    model.load_state_dict(load_checkpoint('base', device))

The dimension mismatch error has been resolved, but I've now encountered a different issue within generate_data:

File "/public/home/v-liuym/projects/affordance_diffusion/preprocess/../glide_text2im/gaussian_diffusion.py", line 413, in p_sample_loop
    for sample in self.p_sample_loop_progressive(
  File "/public/home/v-liuym/projects/affordance_diffusion/preprocess/../glide_text2im/gaussian_diffusion.py", line 465, in p_sample_loop_progressive
    out = self.p_sample(
  File "/public/home/v-liuym/projects/affordance_diffusion/preprocess/../glide_text2im/gaussian_diffusion.py", line 364, in p_sample
    out = self.p_mean_variance(
  File "/public/home/v-liuym/projects/affordance_diffusion/preprocess/../glide_text2im/respace.py", line 116, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/public/home/v-liuym/projects/affordance_diffusion/preprocess/../glide_text2im/gaussian_diffusion.py", line 258, in p_mean_variance
    model_output = model(x, t, **model_kwargs)
  File "/public/home/v-liuym/projects/affordance_diffusion/preprocess/../glide_text2im/respace.py", line 146, in __call__
    return self.model(x, new_ts, **kwargs)
  File "generate_data.py", line 161, in model_fn
    model_out = model(combined, ts, **kwargs)
  File "/public/home/v-liuym/.conda/envs/afford_diff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'inpaint_image'

I think I've narrowed down the problem to the setup args for the diffusion model, at least that's what it looks like from the definition here. But no luck fixing it yet. It would be awesome if you could give me a hand with this! @JudyYe
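
For context, a 7-vs-3 channel mismatch on input_blocks.0.0.weight usually means the UNet was built without its inpainting input stem: in the upstream glide-text2im code, the 'base-inpaint' checkpoint expects the noisy image (3 channels) concatenated with the masked conditioning image (3) and its mask (1). Rather than switching to the plain 'base' weights, the usual fix is to enable the inpaint option at model-creation time. A sketch following the upstream inpainting notebook, assuming the vendored glide_text2im mirrors it:

```python
import torch
from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

options = model_and_diffusion_defaults()
options["inpaint"] = True  # builds the 7-channel input stem 'base-inpaint' expects
model, diffusion = create_model_and_diffusion(**options)
model.load_state_dict(load_checkpoint("base-inpaint", device))
model.eval().to(device)
```

With the inpaint stem in place, kwargs such as inpaint_image and inpaint_mask are consumed by the model's forward pass instead of raising the TypeError above.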

RuntimeError: Error(s) in loading state_dict for Text2ImUNet: size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([192, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 3, 3, 3]).

Your project is really impressive, but I encountered some issues while trying to reproduce it and hope to get your help. The following error occurred during the inpainting stage of data generation.

[Error message]:
Traceback (most recent call last):
  File "/affordance_diffusion/preprocess/generate_data.py", line 495, in <module>
    batch_main(args)
  File "/affordance_diffusion/preprocess/generate_data.py", line 301, in batch_main
    glide['base'] = load_base()
  File "/affordance_diffusion/preprocess/generate_data.py", line 88, in load_base
    model.load_state_dict(load_checkpoint('base-inpaint', device))
  File "miniconda3/envs/afford_diff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Text2ImUNet:
size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([192, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([192, 3, 3, 3]).

Process finished with exit code 1

Could you please help me understand the cause of this? It has been bothering me for a while. By the way, I also wanted to ask about the first stage of data generation, the decoding phase. I couldn't find any data in the output path. Is this normal? I hope to receive your response. Thank you!

Ckpt Loading error

Hi all,

Thanks for your work! I am trying to fine-tune the layout model you provide on my data, but I get the following warning while launching the fine-tuning command:

$ python -m models.base -m  --config-name=train \
  expname=reproduce/\${model.module} \
  model=layout 
[...]
[2023-07-31 12:27:02,222][root][WARNING] - Checkpoint misses key splat_to_mask.template
[2023-07-31 12:27:02,222][root][WARNING] - Checkpoint misses key splat_to_mask.ndcTloll
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key proj_in_param_img.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key proj_in_param_img.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.norm.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.norm.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.qkv.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.qkv.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.encoder_kv.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.encoder_kv.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.proj_out.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_img.0.proj_out.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_txt.0.norm.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_txt.0.norm.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_txt.0.qkv.weight
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_txt.0.qkv.bias
[2023-07-31 12:27:02,223][root][WARNING] - Checkpoint misses key spatial_txt.0.encoder_kv.weight
[2023-07-31 12:27:02,224][root][WARNING] - Checkpoint misses key spatial_txt.0.encoder_kv.bias
[2023-07-31 12:27:02,224][root][WARNING] - Checkpoint misses key spatial_txt.0.proj_out.weight
[2023-07-31 12:27:02,224][root][WARNING] - Checkpoint misses key spatial_txt.0.proj_out.bias
[2023-07-31 12:27:02,224][root][WARNING] - Checkpoint misses key proj_out_param.weight
[2023-07-31 12:27:02,224][root][WARNING] - Checkpoint misses key proj_out_param.bias

Apparently, some of the weights are not loaded correctly. Is this expected?
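
Warnings of this shape typically come from non-strict checkpoint loading: the loader tolerates parameters that exist in the model but not in the checkpoint and reports them, leaving those modules at their fresh initialization, which is a common pattern when fine-tuning an extended architecture. A generic PyTorch illustration of the mechanism, not this repo's actual loader:

```python
import torch
from torch import nn

# Toy setup: the "checkpoint" comes from a smaller model than the one we build.
old = nn.Linear(4, 4)
new = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))  # extra untrained head

# strict=False reports, rather than raises on, the mismatch.
missing, unexpected = new.load_state_dict(
    {f"0.{k}": v for k, v in old.state_dict().items()}, strict=False
)
for key in missing:
    print(f"Checkpoint misses key {key}")  # 1.weight, 1.bias stay randomly initialized
```

If the missing keys correspond to modules added after the released checkpoint was trained, such warnings are typically expected.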

Module import error

Hi, thanks for sharing this wonderful work!
While following your instructions to execute the inference.py script, I encountered an issue within the three_d_metric method. Specifically, the demo_handmocap function attempts to import a module named jutils. However, this action triggers an error:

ModuleNotFoundError: No module named 'jutils'

I am wondering whether the jutils referenced here corresponds to the same jutils module utilized within the affordance_diffusion project. Could you please confirm whether they are identical, or whether additional steps are required to resolve this import error?

Thanks!

Interpolation error

When I try to run the demo python -m scripts.interpolate dir=docs/demo_inter, it raises FileNotFoundError: No such file: '/home/chen/Projects/affordance_diffusion/docs/demo_inter/inter/superres/0000_01_s0_00_s0.png'.

ImportError: cannot import name 'Ego_Centric_HOI_Detector' from 'handmocap.hand_bbox_detector'

Thank you for sharing such fantastic work! I am interested in the hand contact evaluation shown in inference. However, when running the code, an error appears: cannot import name 'Ego_Centric_HOI_Detector' from 'handmocap.hand_bbox_detector'.

Is it possible to give some hint on how to fix this problem or some details about how to evaluate the contact recall? Thank you so much for your time!

rm error

In the preprocessing stage, I met an error:

rm: cannot remove 'data/hoi4d/HOI4D_release/ZY20210800004/H4/C14/N21/S174/s02/T1/align_frames/': No such file or directory
rm -r data/hoi4d/HOI4D_release/ZY20210800004/H4/C14/N21/S174/s02/T1/align_frames/

rm: cannot remove 'data/hoi4d/HOI4D_release/ZY20210800004/H4/C14/N21/S174/s02/T1/align_frames/*': No such file or directory

My dataset folder structure is preprocess/data/hoi4d/HOI4D_annotations, and my --data_dir is data/hoi4d/. Is the align_frames folder missing, or is something wrong with my setup? Have you met this error before?

manopth module

ModuleNotFoundError: No module named 'manopth'. Where can I download the manopth module? Is it OK to just download the manopth project from GitHub and put it under jutils, or is a step missing from the documentation?

Is config.yaml missed?

After running python inference.py data.data_dir='docs/demo/*.*g' test_num=3, I get FileNotFoundError: [Errno 2] No such file or directory: '/data/PycharmProjects/affordance_diffusion/output/release/layout/config.yaml'. Do you know how to solve this problem? The same error occurs when running python -m scripts.interpolate dir=docs/demo_inter. Looking forward to your reply, thanks.

error in preprocessing

When running python generate_data.py --data_dir data/ --save_dir output/ --inpaint (I put the HOI4D_release and HOI4D_annotations datasets under the data folder), I get the error No such file or directory: data/HOI4D_release/ZY20210800001/.../align_frames/xxx.png. Do you know how to rectify this?
