duplicateWithKeys
in cuda_rasterizer/rasterizer_impl.cu
I quantized a pre-trained 3D Gaussian cloud with 8-bit and rendered it. And I found that after duplicateWithKeys
, some bad values come forth, which raises:
RuntimeError: CUDA error: an illegal memory access was encountered
As shown in below, to find the bug, I print the the key (tile|depth) and value (gaussian_id) after duplicateWithKeys
:
Code
uint64_t *host_keys;
host_keys = (uint64_t *)malloc(num_rendered * sizeof(uint64_t));
CHECK_CUDA(cudaMemcpy(host_keys, binningState.point_list_keys_unsorted, num_rendered * sizeof(uint64_t), cudaMemcpyDeviceToHost), debug)
printf("unsorted keys:\n");
for (int i = 0; i < num_rendered; i++) {
uint64_t key_val = *(host_keys + i);
uint32_t currtile = key_val >> 32;
if (currtile > tile_grid.x * tile_grid.y) {
printf("ERROR, host check, currtile: %u, idx: %d\n", currtile, i);
}
}
Output
unsorted keys:
ERROR, host check, currtile: 1076426179, idx: 631887
ERROR, host check, currtile: 1076426179, idx: 658578
ERROR, host check, currtile: 1075539383, idx: 688927
ERROR, host check, currtile: 1061872036, idx: 740076
ERROR, host check, currtile: 1074785821, idx: 749138
ERROR, host check, currtile: 1076426179, idx: 751560
ERROR, host check, currtile: 1076426179, idx: 808032
ERROR, host check, currtile: 1056177442, idx: 819421
ERROR, host check, currtile: 1075539383, idx: 843349
ERROR, host check, currtile: 1074785821, idx: 883928
ERROR, host check, currtile: 1075539383, idx: 897679
ERROR, host check, currtile: 1071215373, idx: 899628
ERROR, host check, currtile: 1076426179, idx: 911535
ERROR, host check, currtile: 1071288477, idx: 911622
ERROR, host check, currtile: 1075539383, idx: 914914
The value of tile_id
is far more bigger than the max value that a tile_id
can reach. Here the max value refers to the tile_grid.x * tile_grid.y
.
Besides, I also check the tile_id
value inside the duplicateWithKeys
, such overflow values do not appear. Non of the tile_id exceeds the max value (tile_grid.x * tile_grid.y
):
__global__ void duplicateWithKeys(
int P,
const float2* points_xy,
const float* depths,
const uint32_t* offsets,
uint64_t* gaussian_keys_unsorted,
uint32_t* gaussian_values_unsorted,
int* radii,
dim3 grid)
{
auto idx = cg::this_grid().thread_rank();
if (idx >= P)
return;
// printf("idx-%d radius-%d\n", idx, *(radii + idx));
// Generate no key/value pair for invisible Gaussians
if (radii[idx] > 0)
{
// int tbd = 0;
// if (radii[idx] > 0) tbd = 1;
// printf("!!!here: %d, radius: %d, big: %d\n", idx, *(radii + idx), tbd);
// Find this Gaussian's offset in buffer for writing keys/values.
uint32_t off = (idx == 0) ? 0 : offsets[idx - 1];
uint2 rect_min, rect_max;
getRect(points_xy[idx], radii[idx], rect_min, rect_max, grid);
// For each tile that the bounding rect overlaps, emit a
// key/value pair. The key is | tile ID | depth |,
// and the value is the ID of the Gaussian. Sorting the values
// with this key yields Gaussian IDs in a list, such that they
// are first sorted by tile and then by depth.
for (int y = rect_min.y; y < rect_max.y; y++)
{
for (int x = rect_min.x; x < rect_max.x; x++)
{
uint64_t key = y * grid.x + x;
if (key > grid.x * grid.y) {
printf("ERROR, duplicateWithKeys, key: %u\n", key);
}
key <<= 32;
key |= *((uint32_t*)&depths[idx]);
gaussian_keys_unsorted[off] = key;
gaussian_values_unsorted[off] = idx;
uint32_t tile_id = gaussian_keys_unsorted[off] >> 32;
if (tile_id > grid.x * grid.y) {
printf("ERROR, duplicateWithKeys, tile id: %u\n", tile_id);
}
off++;
}
}
if (off != offsets[idx]) {
printf("ERROR, duplicateWithKeys, off: %u < offsets[idx]: %u \n", off, offsets[idx]);
}
}
}
I am completely confused now, why are there a batch of incorrect keys (tile_id
) appearing after depulicateKeys while there is not error keys (tile_id
) happens when running the depulicateKeys
function?
Can anyone tell me how to deal with this bug? Thanks a lot, god bless you!!!