xheon / panoptic-reconstruction

Official implementation of the NeurIPS 2021 paper "Panoptic 3D Scene Reconstruction from a Single RGB Image"

Home Page: https://manuel-dahnert.com/research/panoptic-reconstruction

License: Other

Languages: Python 75.18%, C 1.68%, C++ 3.14%, CUDA 20.00%
Topics: neurips-2021, computer-vision, deep-learning, python, pytorch, 3d-reconstruction, reconstruction, 3d-scene-reconstruction

panoptic-reconstruction's Introduction


Panoptic 3D Scene Reconstruction from a Single RGB Image
Manuel Dahnert, Ji Hou, Matthias Nießner, Angela Dai
Neural Information Processing Systems (NeurIPS) - 2021

If you find this work useful for your research, please consider citing

@inproceedings{dahnert2021panoptic,
  title={Panoptic 3D Scene Reconstruction From a Single RGB Image},
  author={Dahnert, Manuel and Hou, Ji and Nie{\ss}ner, Matthias and Dai, Angela},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}

Abstract

Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as for robotics, motion planning, or augmented reality. Existing works in 3D perception from a single RGB image tend to focus on geometric reconstruction only, or geometric reconstruction with semantic segmentation or instance segmentation. Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction - from a single RGB image, predicting the complete geometric reconstruction of the scene in the camera frustum of the image, along with semantic and instance segmentations. We thus propose a new approach for holistic 3D scene understanding from a single RGB image which learns to lift and propagate 2D features from an input image to a 3D volumetric scene representation. We demonstrate that this holistic view of joint scene reconstruction, semantic, and instance segmentation is beneficial over treating the tasks independently, thus outperforming alternative approaches.
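
At the heart of this lifting step is a back-projection of per-pixel image features into the camera frustum. The following minimal sketch illustrates such an unprojection, assuming a pinhole intrinsic matrix and a dense depth map; the function name, shapes, and grid conventions are purely illustrative and do not reflect the repository's actual implementation.

import numpy as np

def lift_features_to_voxels(features, depth, intrinsic, voxel_size=0.03, grid_dim=256):
    """Back-project per-pixel features into a voxel grid using a depth map.

    features:  (H, W, C) per-pixel feature map
    depth:     (H, W) depth in meters
    intrinsic: (3, 3) pinhole camera matrix
    Returns a dense (grid_dim, grid_dim, grid_dim, C) feature volume.
    """
    h, w, c = features.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates

    # Unproject pixels to camera-space points: X = depth * K^-1 * [u, v, 1]^T
    pixels = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3)
    points = (np.linalg.inv(intrinsic) @ pixels.T).T * depth.reshape(-1, 1)

    # Discretize into voxel indices (x/y centered on the camera, depth along z)
    offset = np.array([grid_dim / 2, grid_dim / 2, 0.0])
    voxels = np.floor(points / voxel_size + offset).astype(int)

    # Scatter features into the volume, keeping only voxels inside the grid
    volume = np.zeros((grid_dim, grid_dim, grid_dim, c), dtype=features.dtype)
    valid = np.all((voxels >= 0) & (voxels < grid_dim), axis=1) & (depth.reshape(-1) > 0)
    volume[voxels[valid, 0], voxels[valid, 1], voxels[valid, 2]] = features.reshape(-1, c)[valid]
    return volume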

Environment

The code was tested with the following configuration:

  • Ubuntu 20.04
  • Python 3.8
  • PyTorch 1.7.1
  • CUDA 10.2
  • Minkowski Engine 0.5.1 (fork)
  • Mask RCNN Benchmark
  • NVIDIA GeForce RTX 2080 Ti, 11 GB

Installation

# Basic conda environment: Creates a new conda environment `panoptic`
conda env create --file environment.yaml
conda activate panoptic

MaskRCNN Benchmark

Follow the official instructions to install the maskrcnn-benchmark repo.

Minkowski Engine (fork, custom)

Follow the instructions to compile our forked Minkowski Engine version from source.

Compute library

Finally, compile this library.

# Install library
cd lib/csrc/
python setup.py install

Inference

To run the method on a 3D-Front sample, run python tools/test_net_single_image.py with a pre-trained checkpoint (see the table below).

python tools/test_net_single_image.py -i <path_to_input_image> -o <output_path>

Datasets

3D-FRONT [1]

The 3D-FRONT indoor dataset consists of 6,813 furnished apartments.
We use BlenderProc [2] to render photo-realistic images of individual rooms. We use the 2020-06-14 version of the data.

Download:

We provide the preprocessed 3D-Front data; see the following table for links to the main zips.
Extract the downloaded data into data/front3d/ or adjust the root data path in lib/config/paths_catalog.py.
By downloading our work derived from the original 3D-Front data, you accept their original Terms of Use.

File | Description | Num. Samples | Size | Version | Link
front3d.zip | All files for all 2D-3D pairs used for this project: color, depth, 2D & 3D segmentation, 3D geometry & weighting masks. | 134,389 | 144G | 2022-04-28 | link
panoptic-front3d.pth | Pre-trained weights for 3D-Front data. | 1 | 106M | 2022-04-28 | link
panoptic-front3d-mask_depth_r18.pth | Pre-trained depth + mask weights for 3D-Front 2D data. | 1 | 69M | 2022-08-23 | link
front3d-2d.zip | Only RGB, depth, and 2D segmentation (semantic & instances) with the 11-class set. | 134,389 | 39G | 2022-04-28 | link
front3d-3d_geometry.zip | Only 3D geometry as truncated (unsigned) distance fields at 3cm voxel resolution. | 134,389 | 100G | 2022-04-28 | link
front3d-3d_segmentation.zip | Only 3D segmentations (semantic & instance) with the 11-class set. | 134,389 | 2G | 2022-04-28 | link
front3d-3d_weighting.zip | Only precomputed 3D weighting masks. | 134,389 | 4G | 2022-04-28 | link
front3d-2d_normals.zip | Additional: normal maps for each sample. | 134,389 | | 2022-04-28 |
front3d-camposes.zip | Additional: camera information for each sample (camera pose, intrinsics, assigned room id) and room mapping. | 134,389 | 100M | 2022-04-28 | link
front3d-additional_samples.zip | Additional: samples which were excluded, e.g. due to an inconsistent number of instances between the 2D image and the 3D frustum. (May not include all files per sample.) | 62,963 | 86G | 2022-04-28 | link
front3d-room_meshes.zip | Additional: preprocessed room meshes, used as a replacement for the original room geometry (walls, floor, ceiling) to obtain closed rooms. | 6,723 scenes, 49,142 rooms in total | 406M | 2022-04-28 | link
front3d-tos.pdf | The official 3D-Front Terms of Use. | 1 | 60KB | 2020-06-18 | link

Modifications:

  • We replace all walls and ceilings and "re-draw" them in order to close holes in the walls, e.g. empty door frames or windows.
    For the ceiling we use the same geometry as the floor plane to obtain a closed room envelope.
  • We remove the following mesh categories: "WallOuter", "WallBottom", "WallTop", "Pocket", "SlabSide", "SlabBottom", "SlabTop", "Front", "Back", "Baseboard", "Door", "Window", "BayWindow", "Hole", "WallInner", "Beam".
  • During rendering, we only render geometry which is assigned to the current room.
  • We sample each individual (non-empty) room (see the sketch after this list):
    • maximum number of tries: 50,000
    • maximum number of samples per room: 50
  • Camera:
    • We fix the camera height at 0.75m and choose a forward-looking camera angle (similar to the original frames in 3D-Front).
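
For illustration, a rough sketch of the sampling loop described above; the helper functions and the exact acceptance criteria are assumptions, and the actual BlenderProc-based setup differs in its details.

import random

MAX_TRIES = 50_000            # maximum number of tries per room
MAX_SAMPLES_PER_ROOM = 50     # maximum number of samples per room
CAMERA_HEIGHT = 0.75          # meters, forward-looking camera

def sample_room_cameras(room, is_valid_view):
    """Sample up to MAX_SAMPLES_PER_ROOM forward-looking camera poses inside a room."""
    poses = []
    for _ in range(MAX_TRIES):
        if len(poses) >= MAX_SAMPLES_PER_ROOM:
            break
        x, y = room.sample_point_on_floor()   # hypothetical helper: random point inside the room
        yaw = random.uniform(0.0, 360.0)      # random heading, fixed (forward-looking) pitch
        pose = (x, y, CAMERA_HEIGHT, yaw)
        if is_valid_view(pose):               # hypothetical check, e.g. enough visible geometry
            poses.append(pose)
    return poses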

Structure

<scene_id>/            
    ├── rgb_<frame_id>.png                  # Color image: 320x240x3
    ├── depth_<frame_id>.exr                # Depth image: 320x240x1
    ├── segmap_<frame_id>.mapped.npz        # 2D Segmentation: 320x240x2, with 0: pre-mapped semantics, 1: instances
    ├── geometry_<frame_id>.npz             # 3D Geometry: 256x256x256x1, truncated (unsigned) distance field at 3cm voxel resolution and 12 voxel truncation
    ├── segmentation_<frame_id>.mapped.npz  # 3D Segmentation: 256x256x256x2, with 0: pre-mapped semantics, 1: instances
    ├── weighting_<frame_id>.mapped.npz     # 3D Weighting mask: 256x256x256x1
    ...
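
A minimal sketch for loading one sample with NumPy and Pillow; the key names and internal layout of the .npz archives (dense vs. sparse storage) are not spelled out here, so the snippet simply takes the first array stored in each archive - inspect .files on the loaded archive to confirm. Reading the depth .exr additionally requires OpenEXR support (e.g. via imageio).

import numpy as np
from PIL import Image

def load_sample(scene_dir, frame_id):
    """Load the color image and the .npz arrays of one 2D-3D pair."""
    color = np.asarray(Image.open(f"{scene_dir}/rgb_{frame_id}.png"))  # 320x240x3

    archives = {
        "segmap_2d": np.load(f"{scene_dir}/segmap_{frame_id}.mapped.npz"),             # 2D semantics + instances
        "geometry": np.load(f"{scene_dir}/geometry_{frame_id}.npz"),                   # truncated unsigned DF
        "segmentation_3d": np.load(f"{scene_dir}/segmentation_{frame_id}.mapped.npz"), # 3D semantics + instances
        "weighting": np.load(f"{scene_dir}/weighting_{frame_id}.mapped.npz"),          # 3D weighting mask
    }
    # Take the first stored array of each archive (check archive.files for the actual key names)
    arrays = {name: archive[archive.files[0]] for name, archive in archives.items()}
    return color, arrays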

In total, we generate 197,352 frames, of which 134,389 were used for this project. We filter out frames which have an inconsistent number of 2D and 3D instances.

For the 3D generation, we use a custom C++ pipeline which loads the sampled camera poses, the room layout mesh, and the scene objects. The geometry is cropped to the camera frustum, so that only geometry within the frustum contributes to the distance field calculation. The pipeline generates 3D unsigned distance fields at 3cm resolution together with 3D semantic and instance segmentations.
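
As a reference point only, the following sketch computes a similar truncated unsigned distance field from a mesh in Python using trimesh (an assumption - the actual pipeline is the custom C++ implementation described above, additionally crops geometry to the camera frustum, and runs at the full 256^3 resolution).

import numpy as np
import trimesh

VOXEL_SIZE = 0.03   # 3 cm voxels
TRUNCATION = 12     # truncation distance in voxels
GRID_DIM = 64       # reduced from 256 to keep this illustrative example tractable

def compute_tudf(mesh: trimesh.Trimesh, origin: np.ndarray) -> np.ndarray:
    """Compute a truncated unsigned distance field on a regular voxel grid."""
    # Voxel-center coordinates in world space
    idx = np.arange(GRID_DIM)
    grid = np.stack(np.meshgrid(idx, idx, idx, indexing="ij"), axis=-1).reshape(-1, 3)
    centers = origin + (grid + 0.5) * VOXEL_SIZE

    # Unsigned distance from every voxel center to the closest point on the mesh surface
    _, distances, _ = trimesh.proximity.closest_point(mesh, centers)

    # Truncate and express the distances in voxel units
    df = np.minimum(distances / VOXEL_SIZE, TRUNCATION)
    return df.reshape(GRID_DIM, GRID_DIM, GRID_DIM)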

Change Log

  • 2022-04-28:
    • Move data to new location with better connectivity. Please use the links in the table above to download the data.
    • Provide additional 3D-Front data samples, which were generated but not used for this project.
    • Rename 3D-Front checkpoint from panoptic_front3d_v2.pth to panoptic_front3d.pth. Note: This checkpoint evaluates to ~43% PRQ, compared to the 46.77% PRQ reported in the paper - we are currently investigating this gap.
    • Some 3D segmentation samples broke during conversion to the provided npz format due to an integer overflow of the labels - sorry for any inconvenience caused.
    • Add evaluation code.
    • Fix a voxel shift in the backprojection layer.
  • 2022-04-05: Add initial Matterport release (dataloader, file lists, example sample).
  • 2022-03-03: Add script to evaluate a single image.
  • 2022-02-12: Bug fixes
  • 2021-12-22: Initial commit of cleaned up code.

References

  1. Fu et al. - 3D-FRONT: 3D Furnished Rooms with Layouts and Semantics
  2. Denninger et al. - BlenderProc

panoptic-reconstruction's People

Contributors

watarungurunnn, xheon


panoptic-reconstruction's Issues

About matterport dataset

Thanks for such amazing work! I am currently working on semantic scene completion and was wondering whether you could share your processed Matterport3D data.

Data Generation

Hello,

I am really impressed with your paper and thank you for making it available on github.

I am having trouble generating training data for new datasets; could you please share the data generation code?
I saw issue #7; I guess I need to send an email?

Thank you.

Visualizing output generated by the network

Thank you for sharing your awesome code. I loved the images in your paper. Do you happen to have any code that I can use to create similar pictures from the .ply files generated by the network?

Front3D dataset required

Hi Manuel, I'm interested in your recent work "Panoptic 3D Scene Reconstruction from a Single RGB Image" and want to try your released code. However, I cannot find certain files referenced in the code, such as "resources/front3d/train_list_2d.txt". I know the dataset may be big; could you provide a link to the processed dataset on Dropbox or Google Drive, or the preprocessing code mentioned in your paper? I'm looking forward to your reply. Thanks a lot.
@xheon

About 2D PRETRAIN

Excuse me, where can I find the 2D pre-trained model?

Or, how can I train the 2D model myself?

In front3d_train_3d.yaml,

MODEL:
    FIX2D: True
    PRETRAIN: "/cluster_HDD/gondor/mdahnert/panoptic-release/front3d-mask_depth_r18.pth"

I'd appreciate it if you could provide some guidance on this problem. Thank you!

Installation Help

Hey guys,

I just installed this library, but it wasn't as simple as I would have hoped. Here is my process; it might help someone else in the future:

First, you download the GitHub repo:

git clone https://github.com/xheon/panoptic-reconstruction.git
cd panoptic-reconstruction

Depending on the graphics card you use, you may have to adapt a few things. I used an RTX 3070 for testing this code, so I needed at least CUDA 11. That means I changed the following lines in environment.yaml; I also used a more recent Python version:

-    - python>=3.8
+    - python>=3.9
-    - pytorch>=1.7.1
+    - pytorch=1.10.0
-    - cudatoolkit=10.2
+    - cudatoolkit=11.3

Before installing the environment, you might need to make sure that you have support for OpenEXR; on Ubuntu 20.04 this can be installed via:

sudo apt-get install libopenexr-dev
sudo apt-get install openexr

After that we can just go forward and create the environment:

conda env create --file environment.yaml
conda activate panoptic

Now you need to install some third-party libs; we start with the cocodataset API:

cd .. && mkdir 3rdparty && cd 3rdparty
conda install ipython pip
pip install ninja yacs cython matplotlib tqdm opencv-python
export INSTALL_DIR=$PWD
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install

Afterwards, you need to install cityscapesScripts:

cd $INSTALL_DIR
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install

Now it is time to install apex, which is the most difficult of this group to install; if it fails, there is a suggested fix for Ubuntu 20.04 below:

cd $INSTALL_DIR
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

This might fail because your conda environment does not provide NVIDIA's compiler nvcc; on Ubuntu 20.04 you can change this via:

sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt update
sudo apt install cuda-11-3 cuda-runtime-11-3 cuda-demo-suite-11-3 cuda-drivers cuda-drivers-510 nvidia-dkms-510 nvidia-driver-510 nvidia-kernel-common-510 libnvidia-extra-510

After this CUDA install, you need to reboot and afterwards set CUDA_HOME and add nvcc to your PATH:

export CUDA_HOME="/usr/local/cuda-11.3"
export PATH=$PATH:/usr/local/cuda-11.3/bin

Now you can run the command again; make sure that you first go to your 3rdparty folder:

cd apex
python setup.py install --cuda_ext --cpp_ext

After installing all these packages, you can now install the maskrcnn-benchmark:

cd ..
git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
cd maskrcnn-benchmark
python setup.py build develop

Finally, you need to add the MinkowskiEngine:

cd ..
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas=openblas --force_cuda

After you have installed that package, you only need to install the panoptic package itself:

# Install library
cd ../panoptic-reconstruction/
cd lib/csrc/
python setup.py install

I hope this helps someone.

PS: I am not the maintainer of this package and I cannot help you if this doesn't work for you.

Best,
Max

About Training Parallelism

Hi. I wanted to train the model on my own dataset, but I found that my CUDA memory runs out when processing the occupancy_256 prediction. I tried nn.DataParallel to run the model on multiple GPUs, but it raises the following error:

AttributeError: 'MinkowskiConvolution' object has no attribute 'dimension'

I searched for this error and found that it is an unresolved issue of MinkowskiEngine (link here). I wonder how you trained the model on your machine; could you please suggest other possible solutions to make it work? Thank you!

Visual Results in Papers

Excuse me, could you please provide me with the sample IDs corresponding to the images in your paper? I am a Master's student interested in your work, but while reading the paper I could not find any information regarding the specific sample IDs associated with each image.


cannot use the code

My conda environment is CUDA 11.1 + PyTorch 1.9.0.
When running the command python test_net_single_image.py,
I get the error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
I took a look at test_net_single_image.py and there seems to be nothing wrong with the code. I don't know how to solve it; I would be very grateful if anyone could help me.

Configs Provided are for training 3D network

Dear Author,

The config file you provided, 'front3d_train_3d.yaml', is for training the 3D network only.
Could you please share the full training configs for the pipeline, i.e. for both the 2D and 3D networks?

AttributeError: 'Tensor' object has no attribute 'depth_map'

Hi. I wanted to train the depth2d part of the model, but it raises the following error:

Traceback (most recent call last):
  File "/home/student/.conda/envs/panoptic_cp/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/student/.conda/envs/panoptic_cp/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/student/data/student/zhouyingquan/panoptic-reconstruction/tools/train_net.py", line 56, in <module>
    main()
  File "/data/student/data/student/zhouyingquan/panoptic-reconstruction/tools/train_net.py", line 51, in main
    trainer.do_train()
  File "/data/student/data/student/zhouyingquan/panoptic-reconstruction/lib/engine/trainer.py", line 91, in do_train
    losses, results = self.model(images, targets)
  File "/home/student/.conda/envs/panoptic_cp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/student/data/student/zhouyingquan/panoptic-reconstruction/lib/modeling/panoptic_reconstruction.py", line 49, in forward
    depth_losses, depth_results = self.depth2d(image_features["blocks"], depth_targets)
  File "/home/student/.conda/envs/panoptic_cp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/student/data/student/zhouyingquan/panoptic-reconstruction/lib/modeling/depth/depth_prediction.py", line 57, in forward
    valid_masks = torch.stack([(depth.depth_map != 0.0).bool() for depth in depth_target], dim=0)
  File "/data/student/data/student/zhouyingquan/panoptic-reconstruction/lib/modeling/depth/depth_prediction.py", line 57, in <listcomp>
    valid_masks = torch.stack([(depth.depth_map != 0.0).bool() for depth in depth_target], dim=0)
AttributeError: 'Tensor' object has no attribute 'depth_map'

I inspected the code and found the following code in lib/modeling/depth/depth_prediction.py:

    def forward(self, features, depth_target) -> ModuleResult:
        depth_pred, depth_feature = self.model(features)
        depth_return = [DepthMap(p_[0].cpu(), t_.get_intrinsic()) for p_, t_ in zip(depth_pred, depth_target)]
        depth_target = torch.stack([target.get_tensor() for target in depth_target]).float().to(config.MODEL.DEVICE).unsqueeze(1)
        ...
        if self.training:
            valid_masks = torch.stack([(depth.depth_map != 0.0).bool() for depth in depth_target], dim=0)

If my observation is correct, by the third line of the function depth_target has been overwritten with a plain tensor, so the attribute depth_map is no longer available. Is this a bug? How can I resolve it?

Depth pre-processing

Hi! Thanks so much for sharing the code of this amazing work!

I am trying to train some other models on your dataset. I noticed that in the data pre-processing step you flip the depth map here (the [::-1, ::-1] operation). However, when I visualize the color and depth inputs, I find that they are not aligned after the depth is flipped. Is there a particular reason for applying this flipping operation?

Also, could you provide the names of the semantic labels in your dataset? Thank you!

Missing function in sparse projection module

In lib/modeling/projection/sparse_projection.py, line 38, the function self.compute_camera2frustum_transform is called, but there is no corresponding function within the class definition. Is there somewhere else in the repo where I can find this code?

Edit: is this function intended to compute the extrinsic matrix?

panoptic_front3d.pth

Hi Manuel, I'm a newcomer who only recently started running code and I am interested in your recent work "Panoptic 3D Scene Reconstruction from a Single RGB Image"; I want to try your released code. However, I cannot find the pre-trained model, such as "panoptic_front3d.pth". Could you provide me with a link to the pre-trained model on Google Drive or Baidu Cloud? I'm looking forward to your reply. Thanks a lot. @xheon

Link Download issue

Hi. I wanted to download the model from the link in the table, but I cannot open the link for the .pth file.
Could you please update the link?

generated training data for other methods

The segmentation data (e.g. segmentation_0007.mapped.npz) consists of all-zero matrices, so I cannot use it to train other models such as Total3DUnderstanding. I would be very grateful if anyone could solve this problem. 👀🙏🙏

Regarding the 3D weighting mask

I have downloaded the 3D weighting masks. In each of the 6,081 scenes there are several .npz files, and I cannot tell which item in the scene each of them represents. Are they related to the individual rooms or not? The number of .npz files differs from the number of items in each scene.

Thank you so much in advance.

Some questions about using the code

I am new to computer vision and found your project very interesting. I followed the steps to install the environment and used PyCharm to open the code, but I have a very basic question: how can I run this project? I ran train_net.py directly, but there are some errors.
I know this is very basic, but I would be very grateful if you could help me.

Camera viewpoint

Excuse me, could you please open-source the code for obtaining the camera parameters used to capture this image (the same viewpoint as the input image)?

Tool for data generation

Hi

I was wondering if you could release the code for generating the unsigned distance fields, so that your repository can be tried on new datasets?

Thank you

Request for evaluation code

Hi, thanks for your amazing work! I am wondering where I can find the evaluation code for your new metric.
