This project forked from baegwangbin/dsine

[CVPR 2024 - Oral] Rethinking Inductive Biases for Surface Normal Estimation

Home Page: https://baegwangbin.github.io/DSINE/

License: Other

Rethinking Inductive Biases for Surface Normal Estimation

Official implementation of the paper

Rethinking Inductive Biases for Surface Normal Estimation

CVPR 2024 [oral]

Gwangbin Bae and Andrew J. Davison

[paper.pdf] [arXiv] [youtube] [project page]

Abstract

Despite the growing demand for accurate surface normal estimation models, existing methods use general-purpose dense prediction models, adopting the same inductive biases as other tasks. In this paper, we discuss the inductive biases needed for surface normal estimation and propose to (1) utilize the per-pixel ray direction and (2) encode the relationship between neighboring surface normals by learning their relative rotation. The proposed method can generate crisp, yet piecewise smooth, predictions for challenging in-the-wild images of arbitrary resolution and aspect ratio. Compared to a recent ViT-based state-of-the-art model, our method shows a stronger generalization ability, despite being trained on an orders of magnitude smaller dataset.

Getting Started

Start by installing the dependencies.

conda create --name DSINE python=3.10
conda activate DSINE

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
python -m pip install geffnet
python -m pip install glob2

Then, download the model weights from this link and save them under ./checkpoints/.

Test on images

  • Run python test.py to generate predictions for the images under ./samples/img/. The result will be saved under ./samples/output/.
  • Our model assumes known camera intrinsics, but providing approximate intrinsics still gives good results. For some images in ./samples/img/, the corresponding camera intrinsics (fx, fy, cx, cy, assuming a perspective camera with no distortion) are provided as a .txt file. If such a file does not exist, the intrinsics are approximated by assuming a $60^\circ$ field of view.
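The fallback described above can be sketched as follows. This is a minimal illustration, not the code in test.py: the helper names `approx_intrinsics` and `pixel_ray` are hypothetical, and the sketch assumes that the $60^\circ$ field of view spans the longer image side and that the principal point lies at the image centre. The second helper shows how intrinsics turn a pixel into a unit ray direction in the (right, down, front) frame.

```python
import math


def approx_intrinsics(width, height, fov_deg=60.0):
    """Approximate (fx, fy, cx, cy) for a distortion-free perspective camera.

    Assumptions (not taken from the repository): the field of view spans
    the longer image side, and the principal point is the image centre.
    """
    half_extent = max(width, height) / 2.0
    focal = half_extent / math.tan(math.radians(fov_deg) / 2.0)
    return focal, focal, width / 2.0, height / 2.0


def pixel_ray(u, v, fx, fy, cx, cy):
    """Unit ray through pixel (u, v), with (X, Y, Z) = (right, down, front)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return x / norm, y / norm, 1.0 / norm


fx, fy, cx, cy = approx_intrinsics(640, 480)
print(f"fx={fx:.2f} fy={fy:.2f} cx={cx:.1f} cy={cy:.1f}")
print(pixel_ray(cx, cy, fx, fy, cx, cy))  # ray through the principal point
```

If the true intrinsics are known, writing them to the .txt file next to the image is preferable to relying on an approximation like this one.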

Additional instructions

If you want to make contributions to this repo, please make a pull request and add instructions in the following format.

Using torch hub to predict normal (contribution by hugoycj)
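import argparse

import cv2
import numpy as np
import torch

# Parse the input and output image paths from the command line
parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True, help="path to the input image")
parser.add_argument("--output", required=True, help="path for the saved normal map")
args = parser.parse_args()

# Load the normal predictor model from torch hub
normal_predictor = torch.hub.load("hugoycj/DSINE-hub", "DSINE", trust_repo=True)

# Load the input image using OpenCV (BGR channel order)
image = cv2.imread(args.input, cv2.IMREAD_COLOR)
if image is None:
    raise FileNotFoundError(f"Could not read image: {args.input}")

# Use the model to infer the normal map from the input image
with torch.inference_mode():
    # The transpose below implies a channel-first (3, H, W) tensor here
    normal = normal_predictor.infer_cv2(image)[0]
    normal = (normal + 1) / 2  # Convert values from [-1, 1] to [0, 1]

# Convert the normal map to a displayable (H, W, 3) uint8 image
normal = (normal * 255).cpu().numpy().astype(np.uint8).transpose(1, 2, 0)
normal = cv2.cvtColor(normal, cv2.COLOR_RGB2BGR)

# Save the output normal map to a file
cv2.imwrite(args.output, normal)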

If the weights cannot be downloaded because the network is unavailable, you can load local weights through torch hub as shown below:

normal_predictor = torch.hub.load("hugoycj/DSINE-hub", "DSINE", local_file_path='./checkpoints/dsine.pt', trust_repo=True)

Generating ground truth surface normals

We provide the code used to generate the ground truth surface normals from ground truth depth maps. See data/d2n/README.md for more detail.

About the coordinate system

We use the right-handed coordinate system with (X, Y, Z) = (right, down, front). An important thing to note is that both the ground truth normals and our prediction are the outward normals. For example, in the case of a fronto-parallel wall facing the camera, the normals would be (0, 0, 1), not (0, 0, -1). If you instead need to use the inward normals, please do normals = -normals.
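
As a small illustration of the sign convention, the sketch below builds the fronto-parallel wall from the example as a NumPy array and flips it to inward normals. The helper name `to_inward` and the (H, W, 3) array layout are assumptions made for this example only.

```python
import numpy as np


def to_inward(normals):
    """Flip outward normals to inward normals (hypothetical helper).

    `normals` is assumed to be an (H, W, 3) array of unit vectors in the
    (X, Y, Z) = (right, down, front) frame described above.
    """
    return -np.asarray(normals, dtype=np.float32)


# A 2x2 patch of the fronto-parallel wall from the example: every normal
# is (0, 0, 1) under the outward convention.
wall = np.zeros((2, 2, 3), dtype=np.float32)
wall[..., 2] = 1.0

inward = to_inward(wall)
print(wall[0, 0], inward[0, 0])
```

Negation keeps each vector at unit length, so no renormalization is needed after the flip.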

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{bae2024dsine,
    title={Rethinking Inductive Biases for Surface Normal Estimation},
    author={Gwangbin Bae and Andrew J. Davison},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}
