Giter Site home page Giter Site logo

Comments (4)

waveleaf27 avatar waveleaf27 commented on May 28, 2024 1

Hi shikongzxz,
The deviation in height is caused by voxelization and the lack of translation in the z-axis. This means that the height of an object, after aggregating all frames, may be larger than its true value.
As we do not have access to the original data, this deviation is considered acceptable in this competition. However, we plan to collaborate with the official team of nuscene to address the pose issue after the competition.

from cvpr2023-3d-occupancy-prediction.

waveleaf27 avatar waveleaf27 commented on May 28, 2024

Can you further explain why the height of an object is calculated using the number of pixels in an image? Objects of the same height in 3D space may appear at different vertical position in the image due to projection.

For your question, check the projecting process. It should be noted that the voxel ground truth is in the vehicle coordinate. The following pseudo code example shows how to project a 3D point onto an image.
reference_points_cam = torch.matmul(torch.matmul(lidar2img.to(torch.float32), ego2lidar.to(torch.float32)), points.to(torch.float32))

There are also two extra reasons that can cause deviation between point clouds and images pixels:

  1. Voxel size. In this challenge, we provided a voxel size of 0.4m to reduce the GPU memory requirements. As you mentioned, the voxel center is actually not the actual point location due to discretization error. When adopting a smaller voxel size, we have better alignment between point clouds and images, as shown in the figure below.
  2. Nuscene (issues-721) lacks translation in the z-axis, which makes it difficult to achieve accurate 6D localization and can lead to misalignment of point clouds when accumulating them over the entire scene.

image

from cvpr2023-3d-occupancy-prediction.

shikongzxz avatar shikongzxz commented on May 28, 2024

I am not counting the pixels, but the voxels which are projected to the image. If you doubt that, you can visualize the GT file mini/gts/scene-0061/ca9a282c9e77460f8360f564131a8af5/labels.npz, it gives this image:
image

the pink voxels, which stands for a person, it occupies 7 voxels.

P.S. I used the following code to project the center of the voxels to the image:

    meta_file = "annotations.json"
    with open(meta_file, 'r') as f:
        meta = json.load(f)

    scene_name = meta["train_split"][0]
    scene_info = meta["scene_infos"][scene_name]

    color_map = get_color_map_nuscenes_seg()

    frame_info = next(iter(scene_info.values()))
    timestamp = frame_info["timestamp"]
    ts_save_path = os.path.join(save_dir, str(timestamp))
    os.makedirs(ts_save_path, exist_ok=True)

    voxel_gt_file = os.path.join(data_root, frame_info["gt_path"])
    voxel_gt = np.load(voxel_gt_file)
    voxel_semantics = voxel_gt["semantics"]
    mask_lidar = voxel_gt["mask_lidar"]
    mask_camera = voxel_gt["mask_camera"]
    grid_shape = voxel_semantics.shape
    voxel_center, voxel_semantics = voxel2points(
        voxel_semantics, [voxel_size] * 3)
    voxel_center = voxel_center.numpy()
    voxel_semantics = voxel_semantics.numpy()
    print(voxel_center.shape)
    print(voxel_semantics.shape)

    # rander each image view
    for cam_name, cam_info in frame_info["camera_sensor"].items():
        img = cv2.imread("imgs/"+cam_info["img_path"])
        print(cam_info["img_path"])
        img_h, img_w = img.shape[:2]
        K = np.array(cam_info["intrinsics"])
        # get camera horizontal fov, if not given
        fov_half = get_horizontal_fov(img_w, K)
        print(f"Half of fov of {cam_name} is {fov_half}")

        extrinsic = cam_info["extrinsic"]
        T_ego2camera = transform_matrix(
            np.array(extrinsic['translation']),
            Quaternion(extrinsic['rotation']),
            inverse=True,
        )
        pts = np.concatenate([voxel_center,
                              np.ones((voxel_center.shape[0], 1))], axis=-1)
        pts = np.squeeze(np.matmul(T_ego2camera[None, :, :],
                                   pts[:, :, None]))
        pts = pts[:, :3]
        alpha = np.abs(np.arctan2(pts[:, 0], pts[:, 2]))
        view_mask = alpha < fov_half
        pts = pts[view_mask]
        view_sematics = voxel_semantics[view_mask]

        pts = np.squeeze(np.matmul(K[None, :, :],
                                   pts[:, :, None]))
        pts /= pts[:, [2]]
        pts = pts.astype(int)
        for i in range(pts.shape[0]):
            if 0 < pts[i, 0] < img_w and 0 < pts[i, 1] < img_h:
                color = [int(color_map[view_sematics[i], j]) for j in
                         range(2, -1, -1)]
                cv2.circle(img, (pts[i, 0], pts[i, 1]), 1, color, -1)

        cv2.imwrite(os.path.join(ts_save_path, f"{cam_name}.jpg"), img)

from cvpr2023-3d-occupancy-prediction.

shikongzxz avatar shikongzxz commented on May 28, 2024

Thanks for replying. Looking forward to a better version of the data :)

from cvpr2023-3d-occupancy-prediction.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.