graphdeco-inria / diff-gaussian-rasterization

License: Other

CMake 1.28% C 0.57% Cuda 69.88% C++ 18.06% Python 10.21%

diff-gaussian-rasterization's Introduction

Differential Gaussian Rasterization

Used as the rasterization engine for the paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering". If you can make use of it in your own research, please be so kind as to cite us.

BibTeX

@Article{kerbl3Dgaussians,
      author       = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
      title        = {3D Gaussian Splatting for Real-Time Radiance Field Rendering},
      journal      = {ACM Transactions on Graphics},
      number       = {4},
      volume       = {42},
      month        = {July},
      year         = {2023},
      url          = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/}
}

diff-gaussian-rasterization's Issues

Equirectangular Projection in Rendering

Hello,
As far as I know, this repo only implements perspective projection for rendering.
I was wondering whether equirectangular projection is possible. If so, are you planning to implement it? If not, can you give me some hints on how to do it myself?
Thanks.
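For readers unfamiliar with the mapping being asked about, here is a minimal sketch of an equirectangular point projection. The axis convention (+z forward, +x right, +y down) and the function name equirectProject are assumptions made for illustration, not something taken from this repo. A differentiable rasterizer would also need a matching Jacobian for the covariance projection and different tile bounds, which this sketch does not cover.

```cpp
#include <cassert>
#include <cmath>

struct Pixel { float u, v; };

// Map a camera-space direction to equirectangular pixel coordinates.
// Assumed convention (hypothetical, not from the repo): +z forward, +x right,
// +y down; longitude spans the image width, latitude spans the height.
inline Pixel equirectProject(float x, float y, float z, int width, int height) {
    const float pi = 3.14159265358979323846f;
    float r = std::sqrt(x * x + y * y + z * z);
    assert(r > 0.0f);                 // the direction must not be the zero vector
    float lon = std::atan2(x, z);     // in [-pi, pi], 0 means straight ahead
    float lat = std::asin(y / r);     // in [-pi/2, pi/2], positive below the horizon
    return { (lon / (2.0f * pi) + 0.5f) * width,
             (lat / pi + 0.5f) * height };
}
```

Under these assumptions, a point straight ahead lands in the image center and a point directly to the right lands three quarters of the way across the width.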

Single pixel color

Hello @kwea123 @Snosixtyboo @grgkopanas,
I am interested in getting the final color at a particular location for different view directions. My inputs would be a 3DGS point cloud, the COLMAP camera poses, and the target to track, given either as a pixel location in the first fully rasterized image (rendered with the first pose of the COLMAP sequence) or as a 3D location in the scene (how do I find it?).
Can you please guide me on how to achieve this?

Questions about the camera and minimum render distance

Hi, I have two questions:

1. Is an orthographic camera available, in addition to the perspective camera?
2. Currently the splats disappear when they are close to the camera (which makes sense for your use case). Would it be possible to disable this? Of course, this would only make sense with an orthographic camera.

Sorry if this is the wrong place to ask.

Open-source license for repo?

Is there a top-level LICENSE that could be added to the repository? I see references to LICENSE.md; I guess they refer to the one in https://github.com/graphdeco-inria/gaussian-splatting

"[WinError 2] The system cannot find the file specified" when building

Not sure if I should be posting this here or in the gaussian splatting repo, but when setting up the conda environment, building this package fails, with only the following message to go on:
"running build_ext
error: [WinError 2] The system cannot find the file specified"
This is incredibly perplexing, and it persists no matter which method I use to build it. My OS is Windows 11 22H2, with CUDA 11 on an RTX 3060, if this helps.

`duplicateWithKeys` generates bad values when rendering 8-bit quantized Gaussians.

duplicateWithKeys in cuda_rasterizer/rasterizer_impl.cu

I quantized a pre-trained 3D Gaussian cloud to 8 bits and rendered it. After duplicateWithKeys, some bad values appear, which raises:

RuntimeError: CUDA error: an illegal memory access was encountered

To find the bug, I print the key (tile|depth) and value (gaussian_id) after duplicateWithKeys, as shown below:

Code

uint64_t *host_keys = (uint64_t *)malloc(num_rendered * sizeof(uint64_t));

// Copy the unsorted keys back to the host so the tile IDs can be inspected.
CHECK_CUDA(cudaMemcpy(host_keys, binningState.point_list_keys_unsorted, num_rendered * sizeof(uint64_t), cudaMemcpyDeviceToHost), debug)
printf("unsorted keys:\n");
for (int i = 0; i < num_rendered; i++) {
    uint64_t key_val = host_keys[i];
    uint32_t currtile = key_val >> 32;  // upper 32 bits hold the tile ID
    // Valid tile IDs are 0 .. tile_grid.x * tile_grid.y - 1.
    if (currtile >= tile_grid.x * tile_grid.y) {
        printf("ERROR, host check, currtile: %u, idx: %d\n", currtile, i);
    }
}
free(host_keys);

Output

unsorted keys:
ERROR, host check, currtile: 1076426179, idx: 631887
ERROR, host check, currtile: 1076426179, idx: 658578
ERROR, host check, currtile: 1075539383, idx: 688927
ERROR, host check, currtile: 1061872036, idx: 740076
ERROR, host check, currtile: 1074785821, idx: 749138
ERROR, host check, currtile: 1076426179, idx: 751560
ERROR, host check, currtile: 1076426179, idx: 808032
ERROR, host check, currtile: 1056177442, idx: 819421
ERROR, host check, currtile: 1075539383, idx: 843349
ERROR, host check, currtile: 1074785821, idx: 883928
ERROR, host check, currtile: 1075539383, idx: 897679
ERROR, host check, currtile: 1071215373, idx: 899628
ERROR, host check, currtile: 1076426179, idx: 911535
ERROR, host check, currtile: 1071288477, idx: 911622
ERROR, host check, currtile: 1075539383, idx: 914914

The tile_id values are far larger than the maximum a tile_id can reach, i.e. tile_grid.x * tile_grid.y.

Besides, I also checked the tile_id values inside duplicateWithKeys, and no such out-of-range values appear there. None of the tile_ids exceeds the maximum (tile_grid.x * tile_grid.y):

__global__ void duplicateWithKeys(
	int P,
	const float2* points_xy,
	const float* depths,
	const uint32_t* offsets,
	uint64_t* gaussian_keys_unsorted,
	uint32_t* gaussian_values_unsorted,
	int* radii,
	dim3 grid)
{
	auto idx = cg::this_grid().thread_rank();
	if (idx >= P)
		return;
	
	// printf("idx-%d radius-%d\n", idx, *(radii + idx));
	// Generate no key/value pair for invisible Gaussians
	if (radii[idx] > 0)
	{
		// int tbd = 0;
		// if (radii[idx] > 0) tbd = 1;
		// printf("!!!here: %d, radius: %d, big: %d\n", idx, *(radii + idx), tbd);

		// Find this Gaussian's offset in buffer for writing keys/values.
		uint32_t off = (idx == 0) ? 0 : offsets[idx - 1];
		uint2 rect_min, rect_max;

		getRect(points_xy[idx], radii[idx], rect_min, rect_max, grid);

		// For each tile that the bounding rect overlaps, emit a 
		// key/value pair. The key is |  tile ID  |      depth      |,
		// and the value is the ID of the Gaussian. Sorting the values 
		// with this key yields Gaussian IDs in a list, such that they
		// are first sorted by tile and then by depth. 
		for (int y = rect_min.y; y < rect_max.y; y++)
		{
			for (int x = rect_min.x; x < rect_max.x; x++)
			{
				uint64_t key = y * grid.x + x;
				if (key >= grid.x * grid.y) {
					// key is 64-bit, so print it with a matching format specifier
					printf("ERROR, duplicateWithKeys, key: %llu\n", (unsigned long long)key);
				}
				key <<= 32;
				key |= *((uint32_t*)&depths[idx]);
				gaussian_keys_unsorted[off] = key;
				gaussian_values_unsorted[off] = idx;
				uint32_t tile_id = gaussian_keys_unsorted[off] >> 32;
				if (tile_id > grid.x * grid.y) {
					printf("ERROR, duplicateWithKeys, tile id: %u\n", tile_id);
				}
				off++;
			}
		}
		if (off != offsets[idx]) {
			printf("ERROR, duplicateWithKeys, off: %u < offsets[idx]: %u \n", off, offsets[idx]);
		} 
	} 
}

I am completely confused now: why does a batch of incorrect keys (tile_id) appear after duplicateWithKeys, when no bad keys (tile_id) show up while the duplicateWithKeys function itself runs?

Can anyone tell me how to deal with this bug? Thanks a lot, god bless you!!!
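As a host-side aid for this kind of check, here is a minimal sketch of the |tile ID|depth| key layout described in the kernel comment. The helper names (packKey, tileOf, depthOf, bitsAsFloat, tileInRange) are hypothetical and not part of the repo. One observation, offered only as a hint and not as a diagnosis: the reported value 1076426179, reinterpreted as IEEE-754 float bits, is roughly 2.64, so those slots may hold float data or may never have been written by the kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack a tile index and a float depth into one 64-bit sort key:
// tile in the upper 32 bits, raw depth bits in the lower 32 bits.
inline uint64_t packKey(uint32_t tile, float depth) {
    uint32_t depthBits;
    std::memcpy(&depthBits, &depth, sizeof(depthBits));  // well-defined bit copy
    return (static_cast<uint64_t>(tile) << 32) | depthBits;
}

inline uint32_t tileOf(uint64_t key) { return static_cast<uint32_t>(key >> 32); }

inline float depthOf(uint64_t key) {
    uint32_t depthBits = static_cast<uint32_t>(key & 0xFFFFFFFFu);
    float depth;
    std::memcpy(&depth, &depthBits, sizeof(depth));
    return depth;
}

// Reinterpret a suspicious 32-bit "tile id" as a float to see what it might be.
inline float bitsAsFloat(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Valid tile ids are 0 .. tileCount - 1, so the range check uses <.
inline bool tileInRange(uint64_t key, uint32_t tileCount) {
    assert(tileCount > 0);
    return tileOf(key) < tileCount;
}
```

If the bad entries all decode to plausible depths or other float quantities, it may be worth checking whether num_rendered matches the number of key/value pairs the kernel actually wrote.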

Segmentation fault when trying to replace float with tensor

Hi, I want to enable gradients for the tan_fovx and tan_fovy variables in this library.
I changed their types in rasterize_points.cu from float to const torch::Tensor&, and passed them on as tan_fovx.contiguous().data() so they can be used as const float* elsewhere. In the actual computations, I dereference them with *tan_fovx.
I was able to build the library, but it segfaults at run time. What am I doing wrong? Thanks.

How to draw the Gaussian geometry in the image?

Does this support drawing the Gaussian geometry in the image?

Can it support CUDA 12?

Is it possible to use this without CUDA, i.e. CPU only?

About the low-pass filter

diff-gaussian-rasterization/cuda_rasterizer/forward.cu

Lines 108 to 112 in 59f5f77

    
	// Apply low-pass filter: every Gaussian should be at least
	// one pixel wide/high. Discard 3rd row and column.
	cov[0][0] += 0.3f;
	cov[1][1] += 0.3f;
	return { float(cov[0][0]), float(cov[0][1]), float(cov[1][1]) };

According to equation 33 in EWA Splatting, this code applies a low-pass filter by modifying the covariance matrix. EWA Splatting says the low-pass filter is represented by a Gaussian with an identity covariance matrix.
So why add 0.3 here? And how is 0.3 related to one pixel?

Thanks!
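To make the effect of the snippet concrete, here is a small sketch of what adding a constant to the diagonal of a 2D covariance does. Adding h to both diagonal entries is equivalent to convolving with an isotropic Gaussian of variance h, so every eigenvalue grows by h and the thinnest axis ends up with a standard deviation of at least sqrt(0.3), about 0.55 pixels. The struct and function names are hypothetical, and the sketch does not explain why 0.3 was chosen over 1.0; the source does not say.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Symmetric 2x2 covariance [[a, b], [b, c]] in pixel^2 units.
struct Cov2D { float a, b, c; };

// Add an isotropic variance h to the diagonal (the snippet uses h = 0.3).
inline Cov2D dilate(Cov2D cov, float h) {
    assert(h >= 0.0f);
    return { cov.a + h, cov.b, cov.c + h };
}

// Smallest eigenvalue, i.e. the variance along the thinnest axis.
inline float minEigenvalue(Cov2D cov) {
    float mid = 0.5f * (cov.a + cov.c);
    float det = cov.a * cov.c - cov.b * cov.b;
    float disc = std::sqrt(std::max(0.0f, mid * mid - det));
    return mid - disc;
}
```

For a degenerate splat with zero extent along one axis, minEigenvalue(dilate(cov, 0.3f)) comes out at 0.3, so the footprint can no longer collapse below the pixel grid.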

Debugging the project

I want to debug the CUDA kernels. I've prepared a data checkpoint for debugging. However, I'm not sure how to configure CMake to make it work.

For now, I tried adding a main.cpp which includes rasterize_points.h, and used the following CMake file:

cmake_minimum_required(VERSION 3.20)
project(TestProject LANGUAGES CXX CUDA)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)

# List your CUDA source files here
set(CUDA_SOURCES
    cuda_rasterizer/backward.h
    cuda_rasterizer/backward.cu
    cuda_rasterizer/forward.h
    cuda_rasterizer/forward.cu
    cuda_rasterizer/auxiliary.h
    cuda_rasterizer/rasterizer_impl.cu
    cuda_rasterizer/rasterizer_impl.h
    cuda_rasterizer/rasterizer.h)

# Create an executable that includes both your main.cpp and CUDA source files
add_executable(test_executable main.cpp ${CUDA_SOURCES})

# Specify CUDA architectures (adjust as needed)
set_target_properties(test_executable PROPERTIES CUDA_ARCHITECTURES "70;75;86")

# Include directories, if you have any
target_include_directories(test_executable PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/cuda_rasterizer)
target_include_directories(test_executable PRIVATE third_party/glm ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
target_include_directories(test_executable PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})

# Link with CUDA libraries if needed
target_link_libraries(test_executable PRIVATE cuda)

I create a build directory and run cmake .., but get an error:

CUDA_ARCHITECTURES is empty for target "cmTC_73c6d".

I assume I missed some additional flags, since I'm able to install the project via pip.

Pixel Rendering

Great work! I am following your released diff-gaussian-rasterization library, and would like to ask whether you will release the pixel rendering module which returns the color of each single ray. Thank you in advance!

Is the largest supported number of channels 41?

How can I add more channels? When I tried 42 channels, I got errors. Is splitting them my only option?

Feature suggestion: backface-culling for the rasterizer

Hi there,
Large splats which represent the scene background often get in front of the camera when zooming-out of the area of interest.

The training procedure produces multiple properties per splat (xyz, opacity, rot_i, scale_i, f_dc_i, spherical-harmonics f_rest_i) as well as a normal (nx, ny, nz) per splat.

It would be great if the rasterizer could take normals into account to do backface culling, i.e. only add a Gaussian splat's contribution to screen pixels during rasterization if the splat faces the camera. Some boolean test along these lines: dot(normal, ray) < 0.
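The proposed test can be sketched on the host as follows. This is only an illustration of the dot(normal, ray) < 0 idea from the suggestion; the Vec3 type and the facesCamera helper are hypothetical, and the rasterizer itself does not currently take normals as input.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// True when the splat's normal points toward the camera.
// The ray runs from the camera to the splat center, so a facing normal opposes it.
inline bool facesCamera(Vec3 splatCenter, Vec3 normal, Vec3 cameraPos) {
    Vec3 ray = sub(splatCenter, cameraPos);
    assert(dot(ray, ray) > 0.0f);  // the camera must not sit exactly on the splat
    return dot(normal, ray) < 0.0f;
}
```

A splat in front of the camera whose normal points back along the viewing ray passes the test, while the same splat with a flipped normal would be culled.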

Verify the camera

Hi,

Is there any way I can verify that the camera is loaded correctly by the rendering code? I use a preset camera and follow the COLMAP definition. It projects the input point clouds to 2D without problems. However, the model is not converging. I'm wondering if I can verify the camera directly from the code.

How can I see the progressive growing of Gaussians in real time?

Hello, I'd like to watch the Gaussians grow progressively in real time. Which parts do I need to modify for that?
Like the picture below

About calculating p_hom using p_orig and projmatrix

Thanks for your great work!
I have a question about calculating p_hom using p_orig and projmatrix.

diff-gaussian-rasterization/cuda_rasterizer/forward.cu

Line 197 in 59f5f77

    
           float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };

I don't understand why the points can be used directly in world coordinates. I think p_orig should first be transformed into the camera coordinate system with viewmatrix, and only then projected. Looking forward to your
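One likely explanation, as far as I can tell from the gaussian-splatting training code, is that the projmatrix argument already holds the combined view and projection transform (the camera's full_proj_transform), so applying it to a world-space point is the same as applying the view matrix first and the projection second. The sketch below checks that equivalence with column-major 4x4 matrices; Mat4, Vec4, apply, and mul are hypothetical helpers written for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major: element (row r, col c) at [c * 4 + r]
struct Vec4 { float x, y, z, w; };

// Apply a column-major 4x4 matrix to a homogeneous point.
inline Vec4 apply(const Mat4& m, Vec4 p) {
    return {
        m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12] * p.w,
        m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13] * p.w,
        m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14] * p.w,
        m[3] * p.x + m[7] * p.y + m[11] * p.z + m[15] * p.w };
}

// Matrix product a * b, i.e. b is applied first and a second.
inline Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k)
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    return out;
}

inline bool nearlyEqual(Vec4 a, Vec4 b, float eps = 1e-4f) {
    assert(eps > 0.0f);
    return std::fabs(a.x - b.x) < eps && std::fabs(a.y - b.y) < eps &&
           std::fabs(a.z - b.z) < eps && std::fabs(a.w - b.w) < eps;
}
```

With full = mul(proj, view), apply(full, p) and apply(proj, apply(view, p)) agree for any point p, which is why a pre-multiplied matrix can take world-space input directly.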

How do I render data for more channels

I set shs to None before rendering, concatenated colors_precomp to the dimension I needed, and also changed #define NUM_CHANNELS to 6.
But the rendered result is wrong. May I ask which step I may have gotten wrong?

my_radius calculation

https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/cuda_rasterizer/forward.cu#L232C1-L233C1

I'm trying to understand the code snippet above. If it's the 3-sigma rule, shouldn't it be:

float my_radius = ceil(sqrt(3.0f * max(lambda1, lambda2)));
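A short sketch may help compare the two formulas. It assumes that lambda1 and lambda2 are the eigenvalues of the 2D covariance, i.e. variances in pixel^2. Under that assumption the standard deviation is sqrt(lambda), so a 3-sigma radius is 3 * sqrt(lambda), whereas sqrt(3 * lambda) is only about 1.73 sigma. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Largest eigenvalue of the symmetric 2x2 covariance [[a, b], [b, c]].
inline float maxEigenvalue(float a, float b, float c) {
    float mid = 0.5f * (a + c);
    float det = a * c - b * b;
    return mid + std::sqrt(std::max(0.0f, mid * mid - det));
}

// 3-sigma radius: sigma is the square root of the variance, then scaled by 3.
inline float radius3Sigma(float a, float b, float c) {
    float lambda = maxEigenvalue(a, b, c);
    assert(lambda >= 0.0f);
    return std::ceil(3.0f * std::sqrt(lambda));
}

// The variant proposed in the question, kept for comparison only.
inline float radiusProposed(float a, float b, float c) {
    return std::ceil(std::sqrt(3.0f * maxEigenvalue(a, b, c)));
}
```

For a covariance with variances 16 and 4 (sigma of 4 px along the long axis), radius3Sigma gives 12 px while radiusProposed gives 7 px, which would cut the splat off well inside its visible extent.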

Question regarding equation for calculating J

Hi everyone! Thanks for the wonderful work and for releasing it!

I had a problem when reading the paper and looking at the code. In the paper "EWA Volume Splatting", equation (29) computes J like this:

However, in the code (cuda_rasterizer/forward.cu), it is implemented like this:

This is different from the paper. I understand why you are multiplying by the focal length, but I do not understand the row of zeros. Also, since there is a row of zeros, this matrix is not invertible, which would mean that cov is not invertible, which would mean it is not a valid covariance matrix, which I imagine would cause some problems.

Is there anything I missed? Why did the authors implement the code in this manner?

Thanks for the help!!
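One way to look at the row of zeros, sketched below under simplifying assumptions (identity view rotation, hypothetical helper names): the zero third row of J only zeroes the third row and column of J * Sigma * J^T. The code comment elsewhere on this page says the third row and column are discarded, so only the top-left 2x2 block is used as the screen-space covariance, and that block can still be invertible even though the padded 3x3 matrix is singular.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<float, 3>, 3>;  // row-major: m[row][col]

inline Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 out{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k)
                out[r][c] += a[r][k] * b[k][c];
    return out;
}

inline Mat3 transpose(const Mat3& a) {
    Mat3 out{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out[c][r] = a[r][c];
    return out;
}

// Jacobian of (x, y, z) -> (fx * x / z, fy * y / z), padded with a zero
// third row so that it stays a 3x3 matrix.
inline Mat3 jacobian(float fx, float fy, float x, float y, float z) {
    assert(z > 0.0f);
    Mat3 J{};
    J[0][0] = fx / z;  J[0][2] = -fx * x / (z * z);
    J[1][1] = fy / z;  J[1][2] = -fy * y / (z * z);
    return J;
}

// J * Sigma * J^T; only the top-left 2x2 block is the 2D covariance.
inline Mat3 projectCovariance(const Mat3& J, const Mat3& sigma) {
    return mul(mul(J, sigma), transpose(J));
}
```

With a diagonal Sigma and a point on the optical axis, the result has a positive top-left 2x2 block and zeros everywhere in the third row and column.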

Will renderCUDA encounter resource contention?

In renderCUDA(), each pixel accumulates the alpha-weighted color of the Gaussians received by the current tile.
All pixels in the same tile share the collected Gaussians, sorted by depth.
BLOCK_SIZE (= 256) Gaussians are collected first, then the block syncs, then they are used, and then the next 256 Gaussians are collected.
But there is no sync between the use and the next collection.
My question:
Some threads might still be using the current 256 Gaussians while other threads have already finished and are writing Gaussians from the next batch into the shared-memory array collected_id. That would be a bug if it can happen. Is my understanding correct?

Merge `fast_culling` into `main` branch

Hello!

I see that the C++ renderer in GitLab uses the fast_culling branch of this repository, while the PyTorch training code uses the main branch.

I see that the fast_culling version gives a very reasonable speedup when rendering.

Is there a reason why this branch cannot be merged into main?

I can't open 'gaussian-splatting-main/submodules/simple-knn'

When I install the environment, I get this error:
Pip subprocess error:
ERROR: Directory 'submodules/simple-knn' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.

failed

CondaEnvException: Pip failed

is the 128 here for alignment?

diff-gaussian-rasterization/cuda_rasterizer/rasterizer_impl.h

Line 72 in 59f5f77

return ((size_t)size) + 128;
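If the buffer carving in rasterizer_impl.h rounds each pointer up to a 128-byte boundary, as it appears to, then the extra 128 bytes would cover the padding lost when the chunk's start address is not already aligned. The sketch below shows the usual round-up arithmetic; alignUp and maxPadding are hypothetical names for illustration, not functions from the repo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round an address up to the next multiple of `alignment` (a power of two).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
}

// Worst-case number of bytes skipped when aligning an arbitrary start address.
inline std::size_t maxPadding(std::size_t alignment) {
    assert(alignment != 0);
    return alignment - 1;
}
```

Since aligning an arbitrary start address skips at most 127 bytes for a 128-byte alignment, reserving 128 extra bytes is enough to absorb that initial shift.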

In frustum question

Hi, I love your work and am trying to understand every bit of it. I am currently trying to understand this bit that seems wrong to me:

__forceinline__ __device__ bool in_frustum(int idx,
	const float* orig_points,
	const float* viewmatrix,
	const float* projmatrix,
	bool prefiltered,
	float3& p_view)
{
	float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };

	// Bring points to screen space
	float4 p_hom = transformPoint4x4(p_orig, projmatrix);
	float p_w = 1.0f / (p_hom.w + 0.0000001f);
	float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
	p_view = transformPoint4x3(p_orig, viewmatrix);

	if (p_view.z <= 0.2f)// || ((p_proj.x < -1.3 || p_proj.x > 1.3 || p_proj.y < -1.3 || p_proj.y > 1.3)))
	{
		if (prefiltered)
		{
			printf("Point is filtered although prefiltered is set. This shouldn't happen!");
			__trap();
		}
		return false;
	}
	return true;
}

You first apply the projection transformation and then the viewing transformation. But in your paper you use the equation $\Sigma' = JW\Sigma W^TJ^T$ to convert to screen space, where $J$ is the projection and $W$ is the viewing transformation. The order is the opposite here; why? Since everything works, I assume there must be a reason this is also a valid representation, but I can't see it. Also, in the final version you discard Gaussians that are too near to the camera regardless of their x-y location, as the commented-out part shows. Any reason for that? I guess it must be empirical; what happened when you changed the criterion?

Diagonal scaling matrix gets incorrectly initialized to a full-one matrix

Hi guys,

I suspect that the S matrix defined in the computeCov3D functions, for both the forward and backward computations, is incorrectly initialized to a full-one matrix:

diff-gaussian-rasterization/cuda_rasterizer/forward.cu

Line 121 in 59f5f77

glm::mat3 S = glm::mat3(1.0f);

If I understand correctly, here we expect an identity matrix instead:

glm::mat3 S = glm::mat3(
    1.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f,
    0.0f, 0.0f, 1.0f
);

Fortunately, such a full-one matrix with a scaled main diagonal is usually positive definite and symmetric. Thus, the reparameterized $R^TSSR$ is still a covariance matrix.

I have roughly checked that this issue is not so influential to the PSNR performance. If required and agreed, I'm more than glad to submit a hotfix PR for it.

Confusing loop variable naming in backward::renderCUDA

diff-gaussian-rasterization/cuda_rasterizer/backward.cu

Lines 476 to 477 in 59f5f77

    
	for (int i = 0; i < C; i++)
		collected_colors[i * BLOCK_SIZE + block.thread_rank()] = colors[coll_id * C + i];

	for (int i = 0; i < rounds; i++, toDo -= BLOCK_SIZE)
	{
		...
			for (int i = 0; i < C; i++)  // <--
				collected_colors[i * BLOCK_SIZE + block.thread_rank()] = colors[coll_id * C + i];

i is used for both the inner and the outer loop here. Is this intended exactly as written? I think it would be good to rename the inner i to ch, as in the other loops inside this outer loop, to avoid confusion.
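For reference, reusing the name is legal in C++: the inner declaration shadows the outer one only within the inner loop's scope, so the outer counter is unaffected. The small sketch below (hypothetical function name) demonstrates this, which suggests the code behaves as intended even though the naming is confusing.

```cpp
#include <cassert>

// Count how often the inner body runs when the inner loop reuses the name `i`.
inline int countInnerIterations(int rounds, int channels) {
    assert(rounds >= 0 && channels >= 0);
    int count = 0;
    for (int i = 0; i < rounds; i++) {
        for (int i = 0; i < channels; i++) {  // shadows the outer i in this loop only
            count++;
        }
        // Here `i` refers to the outer counter again, untouched by the inner loop.
    }
    return count;
}
```

The outer loop still runs `rounds` times, so the total is rounds * channels; compiling with a shadowing warning such as -Wshadow would flag the inner declaration.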
