aim-uofa / genpercept

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models

Home Page: https://huggingface.co/spaces/guangkaixu/GenPercept

License: BSD 2-Clause "Simplified" License

Topics: depth-estimation, dichotomous-image-segmentation, human-pose-estimation, image-matting, monocular-depth-estimation, surface-normals, semantic-segmentation

genpercept's Introduction

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models

Guangkai Xu,   Yongtao Ge,   Mingyu Liu,   Chengxiang Fan,   Kangyang Xie,   Zhiyue Zhao,   Hao Chen,   Chunhua Shen,  

Zhejiang University

🔥 Fine-tune diffusion models for perception tasks, and inference with only one step! ✈️


📢 News

  • 2024.4.30: Release checkpoint weights of surface normal and dichotomous image segmentation.
  • 2024.4.7: Add HuggingFace App demo.
  • 2024.4.6: Release inference code and depth checkpoint weight of GenPercept in the GitHub repo.
  • 2024.3.15: Release arXiv v2 paper, with supplementary material.
  • 2024.3.10: Release arXiv v1 paper.

🖥️ Dependencies

conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .

🚀 Inference

Download the pre-trained models genpercept_ckpt_v1.zip from BaiduNetDisk (Extract code: g2cm), HuggingFace, or Rec Cloud Disk (To be uploaded). Please unzip the package and put the checkpoints under ./weights/v1/.

Then, place images in the ./input/$TASK_TYPE directory and run the following script. The outputs will be saved in ./output/$TASK_TYPE. $TASK_TYPE can be one of depth, normal, and dis.

sh scripts/inference_depth.sh
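The layout described above can be prepared with a few commands. A minimal sketch, assuming the directory names from the instructions (the image-copying step is left as a comment since the source image path depends on your setup):

```shell
#!/bin/sh
# Prepare the directory layout expected by the inference scripts.
TASK_TYPE=depth                 # one of: depth, normal, dis
mkdir -p weights/v1             # unzip genpercept_ckpt_v1.zip here
mkdir -p "input/${TASK_TYPE}"   # put your input images here
# cp /path/to/your/images/*.jpg "input/${TASK_TYPE}/"
# Then run the matching script, e.g. for depth:
# sh scripts/inference_depth.sh
# Results are written to output/${TASK_TYPE}/
```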

For surface normal estimation and dichotomous image segmentation, run the following scripts:

bash scripts/inference_normal.sh
bash scripts/inference_dis.sh

Thanks to our one-step perception paradigm, inference runs much faster than multi-step diffusion pipelines (around 0.4 s per image on an A800 GPU).

📖 Recommended Works

  • Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. arXiv, GitHub.
  • GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arXiv, GitHub.
  • FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. arXiv, GitHub.

🏅 Results in Paper

Depth and Surface Normal


Dichotomous Image Segmentation


Image Matting


Human Pose Estimation


🎫 License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

🎓 Citation

@article{xu2024diffusion,
  title={Diffusion Models Trained with Large Data Are Transferable Visual Models},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}

genpercept's People

Contributors

cshen, guangkaixu, yongtaoge


genpercept's Issues

Installation failed

I am not able to download: genpercept_ckpt_v1.zip

The Baidu page is in Chinese, and I don't want to install their client software for safety reasons; a download from Hugging Face is also not possible.
A "weights/v1" folder does not appear after installing the repo, but you say: "put the checkpoints under ./weights/v1/."

Quantitative result mismatch

Hello, and thank you for your excellent work.
I noticed that the quantitative results from tables 1, 9, 10, and 12 differ.
For example, the KITTI-AbsRel scores for GenPercept (or Baseline) are 0.099, 0.140, 0.145, and 0.140, respectively.
Could you help me understand what I might be missing?

Normal checkpoint

Hi, thanks for your excellent work! I was wondering when you will release the normal checkpoint?

Local Installation Explanation

Some additional local installation tips would be appreciated. I can't get your project running.

It seems this checkpoint is needed:
https://pan.baidu.com/s/1n6FlqrOTZqHX-F6OhcvNyA?pwd=g2cm

But the website is in Chinese, and it requires downloading a separate application? And you have to make an account within this app?

You offer Hugging Face and another option to download the model, but I don't see the 9.14 GB genpercept_ckpt_v1 file on Hugging Face.

The third option, "Rec Cloud Disk (To be uploaded)", just links to the GitHub project page, not to a checkpoint (ckpt) file.

Specific data amount on hypersim & v-kitti for depth prediction

Dear authors, excellent work; it could be a milestone for the CV community!

I have a simple question: what is the specific amount of Hypersim and Virtual KITTI data used to train the depth estimator?

For instance, for Hypersim, did you use only the training split (54K) or the entire dataset (74K)?

Thanks so much!

How to encode pose?

Thank you for the code. How do you encode the 17-channel heatmaps to 3 channels to fit the VAE?

Question about GenPercept performance

Hi there,

Thank you for your great work on GenPercept! The idea of one-step estimation is very promising.

I noticed that the performance of GenPercept seems to be lower than Marigold and GeoWizard. I'm wondering if this difference is due to:

  • The number of estimation steps (multi-step vs. one-step)?
  • The amount of training data used?
  • A combination of both factors?

Any insights you could provide would be greatly appreciated!
