aiyb1314 / genpercept

This project forked from aim-uofa/genpercept

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models

Home Page: https://huggingface.co/spaces/guangkaixu/GenPercept

License: Creative Commons Zero v1.0 Universal

Guangkai Xu, Yongtao Ge, Mingyu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen

Zhejiang University

🔥 Fine-tune diffusion models for perception tasks, and run inference in only one step! ✈️

image

📢 News

  • 2024.3.10: Release arXiv v1 paper.
  • 2024.3.15: Release arXiv v2 paper, with supplementary material.
  • 2024.4.6: Release inference code and depth checkpoint weight of GenPercept in the GitHub repo.
  • 2024.4.7: Add HuggingFace App demo.

🖥️ Dependencies

conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .

🚀 Inference

Download the pre-trained depth model depth_v1.zip from BaiduNetDisk (extraction code: z938) or Rec Cloud Disk. Put the archive under ./weights/ and unzip it; the checkpoint will then be stored under ./weights/depth_v1/.
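The unzip step can also be done from Python, which makes it easy to confirm that the checkpoint ended up where the scripts expect it. The following is a minimal sketch, not part of the repository: `extract_checkpoint` is a hypothetical helper, and it assumes the archive unpacks into a folder named after itself (depth_v1.zip into depth_v1/), as described above.

```python
import zipfile
from pathlib import Path


def extract_checkpoint(archive: Path, weights_dir: Path) -> Path:
    """Unzip `archive` into `weights_dir` and return the extracted folder."""
    weights_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(weights_dir)
    # Assumption: depth_v1.zip unpacks into a top-level depth_v1/ folder.
    target = weights_dir / archive.stem
    if not target.is_dir():
        raise FileNotFoundError(f"expected {target} after extraction")
    return target


if __name__ == "__main__":
    archive = Path("weights/depth_v1.zip")
    if archive.exists():
        print("checkpoint ready at", extract_checkpoint(archive, Path("weights")))
    else:
        print("download depth_v1.zip into ./weights/ first")
```

If the archive layout differs from this assumption, the helper raises instead of silently leaving the weights in the wrong place.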

Then, place images in the ./input/ directory and run the following script. The output depth maps will be saved in ./output/.

sh scripts/inference_depth.sh

For surface normal estimation, run the following script:

bash scripts/inference_normal.sh
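When the scripts are called from a larger pipeline, it can help to check that ./input/ actually contains images before launching them. Below is a rough sketch of such a wrapper. The helper names (`list_input_images`, `run_inference`) and the list of image extensions are assumptions made for illustration; the scripts themselves decide which files they read.

```python
import subprocess
from pathlib import Path

# Assumption: common image extensions; the repo's loader may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

# Commands exactly as given in this README.
SCRIPTS = {
    "depth": ["sh", "scripts/inference_depth.sh"],
    "normal": ["bash", "scripts/inference_normal.sh"],
}


def list_input_images(input_dir: Path) -> list[Path]:
    """Return the image files in `input_dir`, sorted by name."""
    if not input_dir.is_dir():
        return []
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_inference(task: str, input_dir: Path = Path("input")) -> int:
    """Run the README script for `task`; return its exit code (1 if no input)."""
    images = list_input_images(input_dir)
    if not images:
        print(f"no images found in {input_dir}/, nothing to do")
        return 1
    print(f"running {task} inference on {len(images)} image(s)")
    return subprocess.run(SCRIPTS[task], check=False).returncode
```

Calling `run_inference("depth")` from the repository root is then equivalent to running the depth script by hand, with an early exit when the input directory is empty.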

Thanks to our one-step perception paradigm, inference is fast: around 0.4 s per image on an A800 GPU.
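To check per-image latency on your own hardware, a simple timing loop is usually enough. The sketch below is illustrative only: `mean_latency` is a hypothetical helper, and `predict` stands in for whatever callable runs the model on one image. For GPU inference, `predict` should block until the result is ready (for example by synchronizing the device), otherwise the measured time only covers kernel launch.

```python
import time
from statistics import mean
from typing import Callable, Iterable


def mean_latency(
    predict: Callable[[object], object],
    images: Iterable[object],
    warmup: int = 1,
) -> float:
    """Return the mean wall-clock seconds per call of `predict` over `images`."""
    items = list(images)
    if not items:
        raise ValueError("need at least one image to time")
    # Warm-up calls are excluded so one-off setup cost does not skew the mean.
    for item in items[:warmup]:
        predict(item)
    timings = []
    for item in items:
        start = time.perf_counter()
        predict(item)
        timings.append(time.perf_counter() - start)
    return mean(timings)
```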

📖 Recommended Works

  • Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. arXiv, GitHub.
  • GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arXiv, GitHub.
  • FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. arXiv, GitHub.

🏅 Results in Paper

Depth and Surface Normal

image

Dichotomous Image Segmentation

image

Image Matting

image

Human Pose Estimation

image

🎫 License

For non-commercial use, this code is released under the LICENSE. For commercial use, please contact Chunhua Shen.

🎓 Citation

@article{xu2024diffusion,
  title={Diffusion Models Trained with Large Data Are Transferable Visual Models},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}
