aim-uofa / genpercept

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models

Home Page: https://huggingface.co/spaces/guangkaixu/GenPercept

License: BSD 2-Clause "Simplified" License

Topics: depth-estimation, dichotomous-image-segmentation, human-pose-estimation, image-matting, monocular-depth-estimation, surface-normals, semantic-segmentation

genpercept's Introduction

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models

Guangkai Xu,   Yongtao Ge,   Mingyu Liu,   Chengxiang Fan,   Kangyang Xie,   Zhiyue Zhao,   Hao Chen,   Chunhua Shen,  

Zhejiang University

🔥 Fine-tune diffusion models for perception tasks, and inference with only one step! ✈️


📢 News

  • 2024.4.30: Release checkpoint weights of surface normal and dichotomous image segmentation.
  • 2024.4.7: Add HuggingFace App demo.
  • 2024.4.6: Release inference code and depth checkpoint weight of GenPercept in the GitHub repo.
  • 2024.3.15: Release arXiv v2 paper, with supplementary material.
  • 2024.3.10: Release arXiv v1 paper.

🖥️ Dependencies

conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .

🚀 Inference

Download the pre-trained models genpercept_ckpt_v1.zip from BaiduNetDisk (Extract code: g2cm), HuggingFace, or Rec Cloud Disk (To be uploaded). Please unzip the package and put the checkpoints under ./weights/v1/.

Then, place images in the ./input/$TASK_TYPE directory and run the following script. The outputs will be saved in ./output/$TASK_TYPE. $TASK_TYPE can be one of depth, normal, and dis.

sh scripts/inference_depth.sh
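The layout described above can be prepared with a few commands. A minimal sketch, assuming the directory names from the instructions (the image-copying step is left as a comment since the source image path depends on your setup):

```shell
#!/bin/sh
# Prepare the directory layout expected by the inference scripts.
TASK_TYPE=depth                 # one of: depth, normal, dis
mkdir -p weights/v1             # unzip genpercept_ckpt_v1.zip here
mkdir -p "input/${TASK_TYPE}"   # put your input images here
# cp /path/to/your/images/*.jpg "input/${TASK_TYPE}/"
# Then run the matching script, e.g. for depth:
# sh scripts/inference_depth.sh
# Results are written to output/${TASK_TYPE}/
```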

For surface normal estimation and dichotomous image segmentation, run the following scripts:

bash scripts/inference_normal.sh
bash scripts/inference_dis.sh

Thanks to our one-step perception paradigm, inference runs much faster than multi-step diffusion pipelines (around 0.4 s per image on an A800 GPU).

📖 Recommended Works

  • Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. arXiv, GitHub.
  • GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arXiv, GitHub.
  • FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. arXiv, GitHub.

🏅 Results in Paper

Depth and Surface Normal


Dichotomous Image Segmentation


Image Matting


Human Pose Estimation


🎫 License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

🎓 Citation

@article{xu2024diffusion,
  title={Diffusion Models Trained with Large Data Are Transferable Visual Models},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}

genpercept's People

Contributors

cshen, guangkaixu, yongtaoge


genpercept's Issues

Installation failed

I am not able to download: genpercept_ckpt_v1.zip

The Baidu page is in Chinese, and I don't want to install their client software for safety reasons; a download from Hugging Face is also not possible.
A "weights/v1" folder does not appear after installing the repo, but you say: "put the checkpoints under ./weights/v1/."

Quantitative result mismatch

Hello, and thank you for your excellent work.
I noticed that the quantitative results from tables 1, 9, 10, and 12 differ.
For example, the KITTI-AbsRel scores for GenPercept (or Baseline) are 0.099, 0.140, 0.145, and 0.140, respectively.
Could you help me understand what I might be missing?

Normal checkpoint

Hi, thanks for your excellent work! I was wondering when you will release the normal checkpoint?

Local Installation Explanation

Some additional local installation tips would be appreciated. I can't get your project running.

It seems this checkpoint is needed:
https://pan.baidu.com/s/1n6FlqrOTZqHX-F6OhcvNyA?pwd=g2cm

But the website is in Chinese, and it requires downloading a separate application? And you have to make an account within this app?

You offer Hugging Face and another option to download the model, but I don't see the 9.14 GB genpercept_ckpt_v1 file on Hugging Face.

The third option, "Rec Cloud Disk (To be uploaded)", just links to the GitHub project page, not to a checkpoint (ckpt) file.

Specific data amount on hypersim & v-kitti for depth prediction

Dear authors, excellent work; it could be a milestone for the CV community!

I have a simple question: what is the specific amount of Hypersim and Virtual KITTI data used to train the depth estimator?

For instance, for Hypersim, did you use only the training split (54K) or the entire dataset (74K)?

Thanks so much!

How to encode pose?

Thank you for the code. How do you encode the 17-channel heatmaps to 3 channels to fit the VAE?

Question about GenPercept performance

Hi there,

Thank you for your great work on GenPercept! The idea of one-step estimation is very promising.

I noticed that the performance of GenPercept seems to be lower than Marigold and GeoWizard. I'm wondering if this difference is due to:

  • The number of estimation steps (multi-step vs. one-step)?
  • The amount of training data used?
  • A combination of both factors?

Any insights you could provide would be greatly appreciated!
