lijx10 / deepi2p

DeepI2P: Image-to-Point Cloud Registration via Deep Classification. CVPR 2021

License: MIT License

point-cloud deep-learning computer-vision registration image-processing 3d-vision neural-network cnn localization

deepi2p's Introduction

# DeepI2P: Image-to-Point Cloud Registration via Deep Classification

Summary

PyTorch implementation of our CVPR 2021 paper DeepI2P. DeepI2P solves the problem of cross-modality registration, i.e., estimating the relative rotation R and translation t between the camera and the LiDAR.

DeepI2P: Image-to-Point Cloud Registration via Deep Classification
Jiaxin Li¹, Gim Hee Lee²
¹ByteDance, ²National University of Singapore

Method

The intuition is to perform the Inverse Camera Projection, as illustrated in the overview figures (overview_1, overview_2).
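To unpack that intuition: the network classifies each point as inside or outside the camera frustum, and the pose is then recovered by searching for the frustum pose that best agrees with those labels. Below is a minimal sketch of the frustum test that gets inverted, assuming a standard pinhole camera with intrinsics K; the names are illustrative, not the repo's API.

    import numpy as np

    def in_frustum(pc_lidar, R, t, K, img_w, img_h):
        # pc_lidar: (N, 3) points in the lidar frame.
        # R, t: pose mapping lidar coordinates into the camera frame.
        # K: 3x3 pinhole intrinsics.
        pc_cam = pc_lidar @ R.T + t                 # lidar frame -> camera frame
        in_front = pc_cam[:, 2] > 0                 # discard points behind the camera
        uvw = pc_cam @ K.T                          # pinhole projection
        uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
        in_image = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &
                    (uv[:, 1] >= 0) & (uv[:, 1] < img_h))
        return in_front & in_image

Inverse Camera Projection then searches for the (R, t) whose frustum mask best matches the network's per-point inside/outside predictions; evaluation/frustum_reg solves this with Gauss-Newton.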

Repo Structure

  • data: Generate and process datasets
  • evaluation: Registration code, including the Inverse Camera Projection, ICP, and PnP
    • frustum_reg: C++ code for the Inverse Camera Projection, using Gauss-Newton optimization. It requires the Ceres Solver. Install with:
    python evaluation/frustum_reg/setup.py install
    
    • icp: code for ICP (Iterative Closest Point)
    • registration_lsq.py: Python code for the Inverse Camera Projection, which utilizes the per-point coarse classification prediction and the frustum_reg solver.
    • registration_pnp.py: Python code for a PnP solver utilizing the per-point fine classification prediction (see the PnP sketch after this list).
  • kitti: Training code for KITTI
  • nuscenes: Training code for nuScenes
  • oxford: Training code for the Oxford RobotCar dataset
  • models: Networks and layers
    • index_max_ext: a custom operation from SO-Net, which is the backbone of our network. Install with:
    python models/index_max_ext/setup.py install
    
    • networks_img.py: Network to process images. It has a ResNet-like structure.
    • networks_pc.py: Network to process point clouds; it is from SO-Net.
    • network_united.py: Network to fuse information between point clouds and images.
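For orientation, the role of registration_pnp.py can be approximated as follows: roughly, the fine classification associates each in-frustum point with an image location, which yields 3D-2D correspondences that a standard PnP solver can consume. A hypothetical sketch with OpenCV and simulated correspondences (the intrinsics and all data below are made up; this is not the repo's code):

    import cv2
    import numpy as np

    # Illustrative KITTI-like intrinsics; placeholder values only.
    K = np.array([[718.856, 0.0, 607.19],
                  [0.0, 718.856, 185.22],
                  [0.0, 0.0, 1.0]])

    # Simulate "fine classification": 3D points in front of the camera and
    # the pixel each one is predicted to project to.
    pts3d = np.random.uniform([-10, -2, 5], [10, 2, 40], size=(200, 3))
    uvw = pts3d @ K.T
    pts2d = uvw[:, :2] / uvw[:, 2:3]

    # Standard RANSAC PnP recovers the camera pose from the correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    # Here the simulated pose is the identity, so R ~ I and tvec ~ 0.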

Dataset and Models

deepi2p's People

Contributors: lijx10

deepi2p's Issues

training time

Could you tell me your training time and GPU configuration? Please also provide the Ceres version.

Different versions of open3d used in data preprocess of KITTI

In kitti_pc_bin_to_npz_in_img_frame.py and kitti_pc_bin_to_npy_with_downsample_sn.py, open3d.geometry.voxel_down_sample(), open3d.geometry.estimate_normals(), and open3d.geometry.orient_normals_to_align_with_direction() are used, but these APIs have since been replaced by pcd.voxel_down_sample(), pcd.estimate_normals(), and pcd.orient_normals_to_align_with_direction(), which may cause errors if you install Open3D with pip install open3d. However, in frame_accumulation.py the newest API is used.
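For reference, the API difference being described looks roughly like this (a sketch; the legacy function-style calls existed in older Open3D releases, and the exact version boundary may differ):

    import numpy as np
    import open3d as o3d

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.random.rand(1000, 3))

    # Legacy function-style API, as used in the two KITTI scripts:
    #   pcd = o3d.geometry.voxel_down_sample(pcd, voxel_size=0.1)
    #   o3d.geometry.estimate_normals(pcd, search_param=...)
    #   o3d.geometry.orient_normals_to_align_with_direction(pcd, ...)

    # Current method-style API (what a recent `pip install open3d` provides):
    pcd = pcd.voxel_down_sample(voxel_size=0.1)
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))
    pcd.orient_normals_to_align_with_direction(
        orientation_reference=np.array([0.0, 0.0, 1.0]))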

Results for KITTI

@lijx10 Hello! Thanks for the excellent work.
Could you provide the inference results for KITTI data, e.g., P_gt_all_np, P_pred_all, and the RTE/RRE lists used to make the histogram in the paper?

I read the past issue #2 and understand that the pretrained model cannot be provided. So I tried to reproduce the result myself, but the model achieves much worse results than reported (the average RTE is over 9 meters).

I cannot determine which code didn't work well in my environment. So I'd be glad if you could provide a detailed version of the results, such as P_gt, P_pred, and RTE/RRE.

Range of scene

Dear Jiaxin,
Recently I read DeepI2P, and regarding the experiments I wonder about the range of the scene you used. For example, for KITTI you write a range from -1 to 80 m in 'option.py', but in the paper it seems you used -80 to 80 m.
Looking forward to your reply.

In the code, the package was not found.

Hello, I encountered some issues while reproducing the code. In the data->oxford->build_dataset.py file, the package robotcar_dataset_sdk is missing in the line from data.oxford.robotcar_dataset_sdk.python.build_pointcloud import build_pointcloud. Could you please provide this package? Thank you for reading my question; I look forward to your response.

Enquiry about the KITTI dataloader file, line 296

In kitti_pc_img_pose_loader.py, line 296, the transformation matrix is written as:
Pc = np.dot(self.calib_helper.get_matrix(seq, img_key), self.calib_helper.get_matrix(seq, 'Tr'))
In my view, this matrix is only used in the function "search_for_accumulation".
To transform point clouds from timestamp j to timestamp i, it seems a little roundabout to:

  1. transform the point clouds into camera 0,
  2. translate from camera 0 to camera i,
  3. apply the pose transform,
  4. ...
Since camera i is parallel to camera 0 there is no problem in your code, but why not just drop the translation step (2)?
It is confusing...
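As context for the question, here is how those matrices compose under the usual KITTI odometry calibration conventions (a sketch with placeholder values, not the repo's code: Tr maps velodyne coordinates to the rectified camera-0 frame, and P_img is the projection for the chosen camera):

    import numpy as np

    Tr = np.eye(4)      # velodyne frame -> rectified camera-0 frame (placeholder)
    P_img = np.eye(4)   # camera-0 frame -> chosen image (P2/P3, includes baseline)
    T_ji = np.eye(4)    # ego-motion from timestamp j to timestamp i

    # The matrix on line 296: velodyne points straight to the image plane.
    Pc = P_img @ Tr

    # Accumulating a cloud from timestamp j into frame i, then projecting:
    P_accumulated = P_img @ T_ji @ Tr
    # Since the rectified cameras are parallel to camera 0 (pure baseline
    # translation), detouring through camera i and back cancels out, which is
    # why the extra translation step is harmless even if it looks redundant.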

About the Transformation from velo to image

Hi,
I really appreciate your work.
I'm reading the part of your code that calculates the transformation from the Velodyne to the camera image plane,
but I am confused.
For convenience, I copied the relevant part of the code below:

        P_cam_nwu = np.asarray([[0, -1, 0, 0], 
                                [0, 0, -1, 0], 
                                [1, 0, 0, 0], 
                                [0, 0, 0, 1]], dtype=pc_np.dtype)
        P_nwu_cam = np.linalg.inv(P_cam_nwu)

        # now the point cloud is in CAMERA coordinate
        Pr_Pcamnwu = np.dot(Pr, P_cam_nwu)
        pc_np = transform_pc_np(Pr_Pcamnwu, pc_np)
        sn_np = transform_pc_np(Pr_Pcamnwu, sn_np)

        # assemble P. P * pc will get the point cloud in the camera image coordinate
        PcPnwucamPrinv = np.dot(Pc, np.dot(P_nwu_cam, Pr_inv))
        P = np.dot(Pji, PcPnwucamPrinv)  # 4x4
  • According to my understanding:
    • P_cam_nwu is actually close to Tr in calib.txt, representing the coordinate transformation from velo to cam.
    • Pr is the random rotation matrix.
    • Pc is P2 @ Tr or P3 @ Tr, representing the transformation from the velo coordinate frame to the camera image plane (2 or 3).
    • Pji is the relative pose, j'th frame wrt. i'th frame.
  • Question:
    • P = Pji @ Pc @ P_nwu_cam @ Pr_inv is equivalent to P = Pji @ (P2 @ Tr) @ (Tr_inv) @ Pr_inv = Pji @ P2 @ Pr_inv.
      • The assembled P can transform the i'th frame in the Velodyne coordinate frame into the j'th frame in the camera image plane (including the random rotation).
      • But doesn't it then lack the coordinate transformation from velo to cam (Tr)?
      • I mean, shouldn't the ground truth be P = Pji @ P2 @ Tr @ Pr_inv?
    • pc_np = transform_pc_np(Pr_Pcamnwu, pc_np) is equivalent to pc_np = Pr @ P_cam_nwu @ pc_np.
      • So is Pr the random transform on points in the camera coordinate frame rather than the NWU frame? (annotation at Line 352)
      • This actually doesn't affect the results; it's just for rigor.

May I ask if there is something wrong in my understanding? If so, could you please give me a brief explanation?
Thank you in advance; I look forward to your reply.
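One way to settle the algebra numerically: build random rigid transforms, apply the two products from the snippet above, and check that the net effect on the stored (already-transformed) cloud is Pji @ Pc, i.e. the random Pr cancels while the transform inside Pc is still applied exactly once. A self-contained sketch (illustrative, not the repo's code):

    import numpy as np

    def random_se3():
        # Random rigid transform as a 4x4 homogeneous matrix.
        Q, _ = np.linalg.qr(np.random.randn(3, 3))
        Q *= np.sign(np.linalg.det(Q))     # force a proper rotation (det = +1)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = Q, np.random.randn(3)
        return T

    P_cam_nwu = np.array([[0, -1, 0, 0],
                          [0, 0, -1, 0],
                          [1, 0, 0, 0],
                          [0, 0, 0, 1]], dtype=np.float64)
    P_nwu_cam = np.linalg.inv(P_cam_nwu)
    Pr = random_se3(); Pr_inv = np.linalg.inv(Pr)
    Pc, Pji = random_se3(), random_se3()

    pc = np.random.randn(4, 100); pc[3] = 1.0     # homogeneous points
    pc_stored = Pr @ P_cam_nwu @ pc               # the cloud after augmentation
    P = Pji @ Pc @ P_nwu_cam @ Pr_inv             # the assembled P

    # Net effect on the stored cloud: P @ pc_stored == Pji @ Pc @ pc.
    assert np.allclose(P @ pc_stored, Pji @ Pc @ pc)

So, assuming Pc = P2 @ Tr as stated in the question, the velodyne-to-camera transform is not lost: it is carried by Pc, while P_nwu_cam @ Pr_inv only undoes the augmentation that was applied when the cloud was stored.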

Unable to find the CMakeLists.txt error

I tried running the repo following the steps mentioned in the README. When running the following command, it throws an error:
python evaluation/frustum_reg/setup.py install

It is unable to find a CMakeLists.txt. Please see the attached screenshot of the error below.

[screenshot of the error]

How to solve the environment problems

Hello, I'm sorry to disturb you, but I ran into a problem.
When I use kitti_pc_bin_to_npy_with_downsample_sn.py to process raw KITTI data, I installed the newest open3d package with "pip install open3d" (python = 3.10, cuda = 11.3, torch = 1.12.1+cu113, torchvision = 0.13.1+cu113, and the newest opencv-python).
But I got the error "module 'open3d.cuda.pybind.geometry' has no attribute 'voxel_down_sample'". Could you please tell me the detailed environment packages, or the version of open3d?

About the experimental results in Table 1

Sorry to bother you. I'm very interested in this excellent work, DeepI2P: Image-to-Point Cloud Registration via Deep Classification. I ran the code you released on the Oxford dataset. However, I found that the point clouds are not randomly rotated and translated during the test phase. Is the released code not consistent with the experimental settings in the paper? Or are the experimental results in Table 1 obtained on point clouds without random rotation and translation? Thank you very much!

How to prepare dataset for training?

Hi Jiaxin,

Maybe I should call you Teacher Li, as I was your student in the SHENLAN College 3D point cloud processing class ~~~

Thanks a lot for sharing these valuable codes; I would like to do some research based on your work. However, I cannot find where to download the proper training/validation datasets, or which are the right pre-processing scripts.
Besides, I can only find the pretrained model for the Oxford dataset. Would it be possible to provide a pretrained model for the KITTI dataset?
Also, I cannot find the scripts that generate the classification results for pose optimization. Are these scripts available?

By the way, your class at SHENLAN College is really good.

Looking forward to your reply, and thanks for your help.

How long is the training time

Hello, thanks for sharing your code. I want to follow up on your work but may not have enough equipment. Which GPU are you using, and how long is the training time on the two datasets?

The point cloud saving code in data.oxford.build_dataset.py

Thank you for sharing the code of DeepI2P.
There is a code line below:
#310 pointcloud = np.dot(np.dot(G_camera_image_inv, G_camera_posesource), pointcloud)
I don't know why the point cloud in the camera coordinate frame needs to be multiplied by the inverse of the intrinsics.
And after checking the later code up to the metric evaluation, I think this is a bug in the project.
Can you explain that, or fix the bug?
Thanks

How to process my data

Thank you very much for your outstanding work. Now I want to use your code, but I don't quite understand how to use evaluation/*.py; do they have to be run in a particular order? I don't need to retrain the model, just use the model you have trained in advance to process my data. I would be grateful if you could give me some help.

The KITTI dataset

Thank you for sharing the code of DeepI2P. I tried to run the model on the KITTI dataset, but I found that the input of the model comes from a preprocessed dataset rather than the official odometry dataset. Could you share the preprocessed KITTI dataset? Thank you very much.
