Hello Siyuan,
First of all, thanks so much for your work. I learned a lot from reading your paper and code.
My understanding is that each 3D bounding box is parameterized by 3 basis vectors, 3 coefficients, and a 3D centroid. These parameters define the 3D bounding box in the world coordinate system. The extrinsic camera matrix R is the transformation from the world coordinate system to the camera coordinate system. Therefore, from p_homo = K * R * P, we can recover the homogeneous 2D image coordinates p_homo of a bounding box corner P given in world space.
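To make that parameterization concrete, here is a minimal sketch that builds the 8 corners of a box from (basis, coeffs, centroid) and projects them with p_homo = K * R * P. It assumes the rows of `basis` are unit axes and `coeffs` are half-extents. The helper names `box_corners` and `project` are hypothetical, and the corner ordering is arbitrary, so it is NOT necessarily the ordering used by `get_corners_of_bb3d_no_index`. Like the formula above, it ignores any camera translation.

```python
import numpy as np

def box_corners(basis, coeffs, centroid):
    """Return the 8 corners of an oriented box as an (8, 3) array.

    Hypothetical helper: assumes each row of `basis` is a unit axis and
    `coeffs` holds the half-extent along that axis. The corner ordering
    is arbitrary (not the repository's ordering).
    """
    corners = []
    for s0 in (-1.0, 1.0):
        for s1 in (-1.0, 1.0):
            for s2 in (-1.0, 1.0):
                offset = (s0 * coeffs[0] * basis[0]
                          + s1 * coeffs[1] * basis[1]
                          + s2 * coeffs[2] * basis[2])
                corners.append(centroid + offset)
    return np.array(corners)

def project(K, R, points_world):
    """Project (N, 3) world points to (N, 2) pixels via p_homo = K @ R @ P."""
    p_homo = (K @ R @ points_world.T).T     # (N, 3) homogeneous image coordinates
    return p_homo[:, :2] / p_homo[:, 2:3]   # divide by depth to get pixels
```

For example, with K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]] and R = I, the world point (1, 0, 2) projects to pixel (570, 240).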
If my understanding is correct, then when we flip images during dataset preprocessing, we have to flip the 3D bounding box labels in the camera coordinate system, not in the world coordinate system. However, at this line and this line, it appears to me that you are doing it directly in the world coordinate system.
This sometimes leads to errors. From my observation, changing the logic to the following can reduce such errors:
```python
import numpy as np

# helpers used below (not shown here): yaw_pitch_row_from_r,
# get_rotation_matrix_from_yaw_pitch_roll, get_corners_of_bb3d_no_index

# read camera parameters
K = self.meta['K'][idx]
R = self.meta['R'][idx]
yaw, pitch, roll = yaw_pitch_row_from_r(R)
if flip:
    R_old = R  # original world -> camera rotation, used to move boxes into camera space
    R = get_rotation_matrix_from_yaw_pitch_roll(-yaw, pitch, roll)  # mirrored camera: negate yaw
else:
    R = get_rotation_matrix_from_yaw_pitch_roll(yaw, pitch, roll)

# read 3D bounding boxes
num_boxes = len(self.meta['boxes'][idx])
raw_basis = np.array([self.meta['boxes'][idx][i]['basis'] for i in range(num_boxes)])
raw_coeffs = np.array([self.meta['boxes'][idx][i]['coeffs'] for i in range(num_boxes)])
raw_centroid = np.array([self.meta['boxes'][idx][i]['centroid'] for i in range(num_boxes)])
if flip:
    for i in range(num_boxes):
        # get 3D corners in the world space
        corners3d = get_corners_of_bb3d_no_index(raw_basis[i],
                                                 raw_coeffs[i],
                                                 raw_centroid[i])
        # get 3D corners in the camera space (original rotation)
        corners3d = np.matmul(R_old, corners3d.transpose()).transpose()
        # flip x axis in camera space (horizontal image flip)
        corners3d[:, 0] = -corners3d[:, 0]
        # get 3D corners back in world space (mirrored rotation)
        corners3d = np.matmul(R.transpose(), corners3d.transpose()).transpose()
        # extract centroid, basis, and coeffs from 3D corners
        raw_centroid[i] = corners3d.mean(axis=0)
        b0_with_scale = (corners3d[1] - corners3d[0]) / 2
        c0 = np.linalg.norm(b0_with_scale)
        b0 = b0_with_scale / c0
        b1_with_scale = (corners3d[1] - corners3d[2]) / 2
        c1 = np.linalg.norm(b1_with_scale)
        b1 = b1_with_scale / c1
        b2_with_scale = (corners3d[1] - corners3d[5]) / 2
        c2 = np.linalg.norm(b2_with_scale)
        b2 = b2_with_scale / c2
        raw_basis[i, 0] = -b0  # flip basis 0
        raw_basis[i, 1] = b1
        # keep b2 as [0, -1, 0] to avoid numerical issues
        raw_coeffs[i] = [-c0, c1, c2]
```
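As a sanity check on the camera-space argument, here is a small self-contained sketch. It assumes a pinhole K with no camera translation (matching p_homo = K * R * P) and uses a hypothetical `rot_y` rotation about the camera's vertical axis, which may not match the repository's yaw convention. Negating x in camera space mirrors every projected u about the principal point cx (u' = 2 * cx - u, which equals W - u only when cx = W / 2), whereas negating world x while keeping R fixed generally does not.

```python
import numpy as np

def project_cam(K, points_cam):
    """Project (N, 3) camera-space points to (N, 2) pixel coordinates."""
    p = (K @ points_cam.T).T
    return p[:, :2] / p[:, 2:3]

def rot_y(theta):
    """Hypothetical world -> camera rotation about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = rot_y(0.4)
P_world = np.array([[0.5, 0.2, 3.0], [-0.7, -0.1, 4.0]])  # points in front of the camera

# Reference: original projections mirrored horizontally about cx
uv = project_cam(K, (R @ P_world.T).T)
uv_expected = uv.copy()
uv_expected[:, 0] = 2 * K[0, 2] - uv[:, 0]

# (a) Flip in camera space: rotate into the camera, then negate x
P_cam = (R @ P_world.T).T
P_cam[:, 0] *= -1
uv_cam_flip = project_cam(K, P_cam)

# (b) Flip in world space: negate world x, keep R unchanged
P_world_flip = P_world.copy()
P_world_flip[:, 0] *= -1
uv_world_flip = project_cam(K, (R @ P_world_flip.T).T)
```

Here (a) reproduces the mirrored image exactly, while (b) only does so when R does not mix the world x axis with depth (e.g. R = I).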
Looking forward to discussing this with you!