Comments (12)

ap229997 commented on May 24, 2024

The simulator runs at 20 fps, whereas the rotation frequency of the LiDAR sensor is 10 Hz (in accordance with the official leaderboard framework), so each simulation step captures only half a sweep. This is why you notice the flickering. I agree that the rotation frequency should be set to 20 Hz to get a full LiDAR sweep per frame, but we had to keep it at 10 Hz to match the leaderboard framework. Generally, there isn't much difference between consecutive frames, so this should still work fine.
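
For reference, a minimal sketch of where this setting lives in the CARLA 0.9.10 Python API (the host/port below are illustrative assumptions, not the leaderboard's exact setup):

import carla

# Connect to a running CARLA server (assumed local, default port).
client = carla.Client('localhost', 2000)
world = client.get_world()

lidar_bp = world.get_blueprint_library().find('sensor.lidar.ray_cast')
# Leaderboard setting: one full rotation every 0.1 s while the simulator
# steps at 20 fps, so each step only sees half a sweep (the flickering).
lidar_bp.set_attribute('rotation_frequency', '10')
# The change discussed in this thread would be one rotation per step:
# lidar_bp.set_attribute('rotation_frequency', '20')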

In my experiments, I noticed that removing the LiDAR input (which makes the model equivalent to AIM) leads to worse performance. Can you tell me which setting you used to test the performance without the LiDAR input?

HongYegg commented on May 24, 2024

I use late fusion. I think the difference between consecutive frames is very big, because consecutive frames contain the front and rear halves of the point cloud, and once processed they become very different network inputs. For example, the first frame is normal, while the second frame contains the point cloud behind the car; after the code below runs, that frame becomes basically completely black.

lidar_processed = list()
# transform the lidar point clouds to the local coordinate frame of the
# current ego pose
ego_theta = self.input_buffer['thetas'][-1]
ego_x, ego_y = self.input_buffer['gps'][-1]
for i, lidar_point_cloud in enumerate(self.input_buffer['lidar']):
    curr_theta = self.input_buffer['thetas'][i]
    curr_x, curr_y = self.input_buffer['gps'][i]
    lidar_point_cloud[:, 1] *= -1  # flip the y axis
    lidar_transformed = transform_2d_points(
        lidar_point_cloud,
        np.pi / 2 - curr_theta, -curr_x, -curr_y,
        np.pi / 2 - ego_theta, -ego_x, -ego_y)
    lidar_transformed = torch.from_numpy(
        lidar_to_histogram_features(
            lidar_transformed, crop=self.config.input_resolution)).unsqueeze(0)
    lidar_processed.append(lidar_transformed.to('cuda', dtype=torch.float32))
encoding.append(self.net.lidar_encoder(lidar_processed))

Only if we modify this line of code does the point cloud behind the vehicle in the second frame display normally, and even then it is still unreasonable.
Wait a moment; I will send you a video via email.

HongYegg commented on May 24, 2024

To make the point cloud behind the vehicle in the second frame display normally, this line of code has to be modified: lidar_point_cloud[:, 1] *= -1  # flip the y axis

ap229997 commented on May 24, 2024

Can you tell me which CARLA version you are using?

HongYegg commented on May 24, 2024

CARLA 0.9.10.1

HongYegg commented on May 24, 2024

Sorry, it is inconvenient to record a video right now. The effect is roughly this: if you visualize the variable lidar_transformed = transform_2d_points(lidar_point_cloud, np.pi/2-curr_theta, -curr_x, -curr_y, np.pi/2-ego_theta, -ego_x, -ego_y), one frame is normal and the next is basically black, so I think it is unreasonable to train the network on such input.
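
A minimal sketch of the kind of visualization described here (the helper name and the (C, H, W) feature layout are assumptions based on the output of lidar_to_histogram_features):

import matplotlib.pyplot as plt

# Hypothetical helper: render the BEV histogram produced by
# lidar_to_histogram_features so the alternating normal/black
# frames are easy to spot.
def show_lidar_features(features, step):
    # features: (C, H, W) array of per-cell point counts
    img = features.sum(axis=0)  # collapse the height bins
    plt.imshow(img, cmap='gray')
    plt.title(f'LiDAR BEV histogram, step {step}')
    plt.savefig(f'lidar_step_{step:04d}.png')
    plt.close()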

ap229997 commented on May 24, 2024

Are you visualizing the LiDAR input at every frame or only at every 10th frame (we save the data only every 10th frame)?

HongYegg commented on May 24, 2024

I'm not quite sure what you mean specifically. I just added a few lines of visualization code to the part of the code I mentioned above, so it displays at 20 fps.

HongYegg commented on May 24, 2024

But at every run_step, the input that the network receives really is unreasonable; I think this is the key to the problem.

ap229997 commented on May 24, 2024

In our code, the LiDAR is processed so as to give the front half of the point cloud at every frame, and this leads to alternating normal and black frames. However, when we generate the data for training, we save data only every 10th frame, i.e., 2 frames per second, even though the input stream consists of 20 frames per second (we don't store every frame in the training dataset).

if self.step % 10 == 0 and self.save_path is not None:
    self.save(far_node, near_command, steer, throttle, brake, target_speed, data)

So, even though the input alternates between normal and black, our training dataset only contains the normal frames.
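
A small sketch of why the saved steps all land on the same phase (that the front/rear alternation follows step parity is an assumption consistent with the description above):

# With a 10 Hz LiDAR in a 20 fps simulation, each step holds half a
# sweep, and the half alternates with step parity. Steps divisible by
# 10 are all even, so every saved frame falls on the same half.
for step in range(40):
    half = 'front' if step % 2 == 0 else 'rear'
    if step % 10 == 0:
        print(f'step {step:2d}: saved ({half} half)')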

I agree that at runtime the network input is unreasonable. The best solution would be to set the rotation frequency of the LiDAR to 20 Hz (as you pointed out), so that each frame captures a full sweep.

HongYegg commented on May 24, 2024

Very good. What I was worried about is whether the trained model would have this problem. In that case, I can adjust it myself. Thank you very much for your answer; if I have more questions, I will ask again.

ap229997 commented on May 24, 2024

Thanks for pointing this out. It'll be interesting to visualize the attention maps of the transfuser in these 'blank' frames.
