
Comments (5)

sujithvemi avatar sujithvemi commented on May 26, 2024 1

@i-chaochen I really wish I could help you here. But I really don't know much about LIDAR and was not able to fully understand what the paper said.

You can check the supplemental material provided here; it might help you.

from desire.

stratomaster31 avatar stratomaster31 commented on May 26, 2024

I'm working on this model. I've coded the CVAE and I get good results in the training phase, but not in the test phase...
What are the inputs to decoder1? The paper does not specify them...

i-chaochen avatar i-chaochen commented on May 26, 2024

I think this work can only handle the Stanford Drone Dataset? Do you know how to process the KITTI dataset?

The original paper says the following:

As the dataset does not provide semantic labels for 3D points (which we need for scene context), we first perform semantic segmentations of images and project Velodyne laser scans onto the image plane using the provided camera matrix to label 3D points. The semantically labeled 3D points are then registered into the world coordinates using GPS-IMU tags. Finally we create top-down view feature maps I of size H × W × C.

If I understood correctly, they did the following:

  1. First, run semantic segmentation on all images to get a mask for each one.

  2. Project the laser data onto the 2D image and put the mask data on this 2D image. // i.e., use OpenCV's projectPoints() to do the projection?

    2.1 Since KITTI stores scans in bin format, we need to convert them to PCD first.

    2.2 Register all PCD files to fuse them into a global frame. Then we can use the camera matrix (provided by KITTI) and the extrinsic matrix (calculated from GPS-IMU) to convert it to a 2D image, and we also project the segmentation mask from step 1 onto this projected 2D image.
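
As a rough illustration of the projection step, here is a minimal NumPy sketch. It relies on the fact that a KITTI Velodyne .bin file is a flat array of float32 values in groups of four (x, y, z, reflectance), so it can be read directly without going through PCD. The function names load_velodyne_bin and project_to_image, and the parameters velo_to_cam and cam_proj, are hypothetical; the two matrices are assumed to have been assembled from the KITTI calibration files beforehand, with any rectification already folded into velo_to_cam. This is a sketch under those assumptions, not the paper's implementation.

```python
import numpy as np


def load_velodyne_bin(path):
    """Read a KITTI Velodyne scan: float32 records of (x, y, z, reflectance)."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)


def project_to_image(points_xyz, velo_to_cam, cam_proj, image_size):
    """Project N x 3 LiDAR points onto the image plane.

    velo_to_cam: 4 x 4 transform from LiDAR to camera coordinates
                 (assumption: rectification is already included).
    cam_proj:    3 x 4 camera projection matrix.
    image_size:  (width, height) in pixels.
    Returns the integer pixel coordinates (u, v) of the points that land
    inside the image, and the indices of those points in points_xyz.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])
    cam = homo @ velo_to_cam.T

    # Points behind the camera would project to mirrored pixels, so drop them.
    idx = np.nonzero(cam[:, 2] > 0.1)[0]

    uvw = cam[idx] @ cam_proj.T
    uv = uvw[:, :2] / uvw[:, 2:3]  # perspective division
    px = np.floor(uv).astype(int)

    width, height = image_size
    inside = (
        (px[:, 0] >= 0) & (px[:, 0] < width)
        & (px[:, 1] >= 0) & (px[:, 1] < height)
    )
    return px[inside], idx[inside]
```

The returned indices make it possible to attach a label to the original 3D point rather than to the pixel, which is what the later world-registration step needs.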

Can anyone correct me if I'm wrong? Thanks in advance!

sujithvemi avatar sujithvemi commented on May 26, 2024

@i-chaochen

I don't work with LiDAR data, so I can't comment on the bin and PCD formats. But the approach you are taking sounds fine to me. To summarize, this is my understanding:

  • Project the Velodyne 3D laser scan onto the 2D image plane
  • Every 3D point that falls on a given pixel of the 2D image plane gets the label that semantic segmentation assigned to that pixel
  • The labeled points are then converted to the world coordinate frame
  • Build a BEV 3D matrix whose third dimension is a one-hot vector for the semantic class (the cropping of this feature map can be done before building it)
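
The labeling and BEV steps in this list could be sketched roughly as follows. The names label_points and build_bev_map and all of their parameters are hypothetical, and the grid extent and cell size are assumptions to be chosen per scene. One simplification to be aware of: if points of different classes fall into the same cell, several channels of that cell are set, so the result is strictly one-hot only where a cell sees a single class.

```python
import numpy as np


def label_points(pixels, seg_mask):
    """Look up the class id under each projected point.

    pixels:   M x 2 integer array of (u, v) = (column, row).
    seg_mask: H x W integer array of class ids from semantic segmentation.
    """
    return seg_mask[pixels[:, 1], pixels[:, 0]]


def build_bev_map(points_xy, labels, num_classes, x_range, y_range, cell_size):
    """Rasterize labeled points into a top-down H x W x C map.

    points_xy: M x 2 array of ground-plane coordinates in metres.
    labels:    M integer class ids, one per point.
    x_range, y_range: (min, max) extent of the map in metres.
    cell_size: edge length of one grid cell in metres.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    width = int(round((x_max - x_min) / cell_size))
    height = int(round((y_max - y_min) / cell_size))
    bev = np.zeros((height, width, num_classes), dtype=np.float32)

    cols = np.floor((points_xy[:, 0] - x_min) / cell_size).astype(int)
    rows = np.floor((points_xy[:, 1] - y_min) / cell_size).astype(int)

    # Cropping: points outside the chosen extent are simply discarded.
    keep = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    bev[rows[keep], cols[keep], labels[keep]] = 1.0
    return bev
```

If a strict one-hot encoding is required, a per-cell majority vote over the class counts would be one way to resolve cells that receive mixed labels.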

Feel free to correct me if I am wrong anywhere, so we can both understand this better. Thanks in advance.

i-chaochen avatar i-chaochen commented on May 26, 2024

@sujithvemi Thanks for the feedback. I am not sure I fully understand what the original paper means by projecting Velodyne laser scans onto the image plane. What does this image plane look like? Does it look like this one?

Screenshot 2019-11-08 at 00 48 51

Also, since they already project the 3D scans onto the 2D image plane, why do they need to register the 3D scans to world coordinates using GPS-IMU tags? The 2D image coordinates could be used for the prediction anyway.

If they want to register to world coordinates, I think they will need the intrinsic and extrinsic matrices (can the latter be provided by GPS-IMU?) instead of GPS-IMU.
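
One possible reading of the quoted passage is that the image plane is only used to borrow a semantic label for each point; the points themselves stay 3D, and the GPS-IMU pose of each frame is what lets scans taken from different vehicle positions be fused into a single top-down map. The sketch below illustrates that idea with a simplified planar pose (x, y, yaw). The names pose_from_xy_yaw, to_world and accumulate_scans are hypothetical, and converting raw GPS-IMU readings into metric poses is assumed to happen elsewhere.

```python
import numpy as np


def pose_from_xy_yaw(x, y, yaw):
    """Build a 4 x 4 vehicle-to-world transform from a planar pose.

    Assumption: roll, pitch and altitude are ignored for simplicity.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    pose = np.eye(4)
    pose[:2, :2] = [[c, -s], [s, c]]
    pose[0, 3] = x
    pose[1, 3] = y
    return pose


def to_world(points_xyz, pose):
    """Transform N x 3 points from the vehicle frame to the world frame."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    return (homo @ pose.T)[:, :3]


def accumulate_scans(scans, poses):
    """Fuse per-frame scans into one world-frame point cloud."""
    return np.vstack([to_world(s, p) for s, p in zip(scans, poses)])
```

With this, a static object observed at different relative positions in successive frames ends up at one world location, which a per-frame image-plane representation would not give.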
