
diff-gaussian-rasterization's Introduction

Differential Gaussian Rasterization

Used as the rasterization engine for the paper "3D Gaussian Splatting for Real-Time Rendering of Radiance Fields". If you can make use of it in your own research, please be so kind to cite us.

BibTeX

@Article{kerbl3Dgaussians,
      author       = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George},
      title        = {3D Gaussian Splatting for Real-Time Radiance Field Rendering},
      journal      = {ACM Transactions on Graphics},
      number       = {4},
      volume       = {42},
      month        = {July},
      year         = {2023},
      url          = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/}
}

diff-gaussian-rasterization's People

Contributors

snosixtyboo


diff-gaussian-rasterization's Issues

Equirectangular Projection in Rendering

Hello,
As far as I know, only perspective projection has been implemented for rendering in this repo.
I was just wondering whether equirectangular projection is possible. If so, are you thinking of implementing it? If not, can you give me some hints on how to do it?
Thanks.
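For reference, here is a minimal sketch (not part of this repo; the function name and everything in it are hypothetical) of the forward mapping an equirectangular camera would need, taking a view-space direction to pixel coordinates. The harder part would be replacing the perspective Jacobian J in computeCov2D with the Jacobian of this mapping:

__device__ float2 equirect_project(float3 dir, int W, int H)
{
	const float PI = 3.14159265358979f;
	// Longitude from the x/z components, latitude from the normalized y.
	float len = sqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z) + 1e-7f;
	float lon = atan2f(dir.x, dir.z);   // [-pi, pi]
	float lat = asinf(dir.y / len);     // [-pi/2, pi/2]
	// Map longitude/latitude to pixel coordinates of a W x H panorama.
	float u = (lon / (2.0f * PI) + 0.5f) * W;
	float v = (lat / PI + 0.5f) * H;
	return make_float2(u, v);
}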

Single pixel color

Hello @kwea123 @Snosixtyboo @grgkopanas,
I am interested in obtaining the final color at a particular location for different view directions, given a 3DGS point cloud, the location of a target pixel to track in the first fully rasterized image (using the first camera pose from a COLMAP pose sequence) or its 3D location in the scene (how do I find it?), and the COLMAP camera poses.
Can you please guide me to achieve this?

Questions about the camera and minimum render distance

Hi, I have two questions:

  1. Is an orthographic camera available, in addition to the perspective camera?
  2. Currently the splats disappear when they are close to the camera (which makes sense for your use case). Would it be possible to disable this? Of course, this would only make sense with an orthographic camera.

Sorry if this is the wrong place to ask.
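Regarding the second question, a pointer rather than an official answer: the disappearance comes from the near-plane test in in_frustum in cuda_rasterizer/auxiliary.h, quoted in full in a later issue on this page:

// In in_frustum(), Gaussians closer than 0.2 view-space units are rejected:
if (p_view.z <= 0.2f)
	return false;
// Lowering this constant (e.g. to 0.001f; an untested assumption) would keep
// near splats, but the perspective math in computeCov2D still assumes a
// strictly positive t.z, so this only makes sense with care.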

" [WinError 2] The system cannot find the file specified" when building

Not sure if I should be posting this here or in the gaussian splatting repo, but when setting up the conda environment, the build of this package fails with the following message to go on:
"running build_ext
error: [WinError 2] The system cannot find the file specified"
This is incredibly perplexing, and it persists no matter which method I use to build it. My OS is Windows 11 22H2, with CUDA 11 on an RTX 3060, in case that helps.

`duplicateWithKeys` generates bad values when rendering 8-bit quantized Gaussians.

duplicateWithKeys in cuda_rasterizer/rasterizer_impl.cu

I quantized a pre-trained 3D Gaussian cloud to 8 bits and rendered it. I found that after duplicateWithKeys, some bad values appear, which raises:

RuntimeError: CUDA error: an illegal memory access was encountered

As shown below, to find the bug I printed the key (tile|depth) and value (gaussian_id) after duplicateWithKeys:

Code

uint64_t *host_keys = (uint64_t *)malloc(num_rendered * sizeof(uint64_t));

CHECK_CUDA(cudaMemcpy(host_keys, binningState.point_list_keys_unsorted, num_rendered * sizeof(uint64_t), cudaMemcpyDeviceToHost), debug)
printf("unsorted keys:\n");
for (int i = 0; i < num_rendered; i++) {
    uint64_t key_val = host_keys[i];
    uint32_t currtile = key_val >> 32;
    if (currtile > tile_grid.x * tile_grid.y) {
        printf("ERROR, host check, currtile: %u, idx: %d\n", currtile, i);
    }
}
free(host_keys);

Output

unsorted keys:
ERROR, host check, currtile: 1076426179, idx: 631887
ERROR, host check, currtile: 1076426179, idx: 658578
ERROR, host check, currtile: 1075539383, idx: 688927
ERROR, host check, currtile: 1061872036, idx: 740076
ERROR, host check, currtile: 1074785821, idx: 749138
ERROR, host check, currtile: 1076426179, idx: 751560
ERROR, host check, currtile: 1076426179, idx: 808032
ERROR, host check, currtile: 1056177442, idx: 819421
ERROR, host check, currtile: 1075539383, idx: 843349
ERROR, host check, currtile: 1074785821, idx: 883928
ERROR, host check, currtile: 1075539383, idx: 897679
ERROR, host check, currtile: 1071215373, idx: 899628
ERROR, host check, currtile: 1076426179, idx: 911535
ERROR, host check, currtile: 1071288477, idx: 911622
ERROR, host check, currtile: 1075539383, idx: 914914

These tile_id values are far larger than the maximum a tile_id can reach, namely tile_grid.x * tile_grid.y.

Besides, I also checked the tile_id values inside duplicateWithKeys itself, and no such overflow appears there. None of the tile_ids exceeds the maximum (tile_grid.x * tile_grid.y):

__global__ void duplicateWithKeys(
	int P,
	const float2* points_xy,
	const float* depths,
	const uint32_t* offsets,
	uint64_t* gaussian_keys_unsorted,
	uint32_t* gaussian_values_unsorted,
	int* radii,
	dim3 grid)
{
	auto idx = cg::this_grid().thread_rank();
	if (idx >= P)
		return;
	
	// printf("idx-%d radius-%d\n", idx, *(radii + idx));
	// Generate no key/value pair for invisible Gaussians
	if (radii[idx] > 0)
	{
		// int tbd = 0;
		// if (radii[idx] > 0) tbd = 1;
		// printf("!!!here: %d, radius: %d, big: %d\n", idx, *(radii + idx), tbd);

		// Find this Gaussian's offset in buffer for writing keys/values.
		uint32_t off = (idx == 0) ? 0 : offsets[idx - 1];
		uint2 rect_min, rect_max;

		getRect(points_xy[idx], radii[idx], rect_min, rect_max, grid);

		// For each tile that the bounding rect overlaps, emit a 
		// key/value pair. The key is |  tile ID  |      depth      |,
		// and the value is the ID of the Gaussian. Sorting the values 
		// with this key yields Gaussian IDs in a list, such that they
		// are first sorted by tile and then by depth. 
		for (int y = rect_min.y; y < rect_max.y; y++)
		{
			for (int x = rect_min.x; x < rect_max.x; x++)
			{
				uint64_t key = y * grid.x + x;
				if (key > grid.x * grid.y) {
					printf("ERROR, duplicateWithKeys, key: %u\n", key);
				}
				key <<= 32;
				key |= *((uint32_t*)&depths[idx]);
				gaussian_keys_unsorted[off] = key;
				gaussian_values_unsorted[off] = idx;
				uint32_t tile_id = gaussian_keys_unsorted[off] >> 32;
				if (tile_id > grid.x * grid.y) {
					printf("ERROR, duplicateWithKeys, tile id: %u\n", tile_id);
				}
				off++;
			}
		}
		if (off != offsets[idx]) {
			printf("ERROR, duplicateWithKeys, off: %u < offsets[idx]: %u \n", off, offsets[idx]);
		} 
	} 
}

I am completely confused now: why does a batch of incorrect keys (tile_ids) appear after duplicateWithKeys, while no erroneous keys (tile_ids) show up while the duplicateWithKeys function itself is running?

Can anyone tell me how to deal with this bug? Thanks a lot, god bless you!!!
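A debugging sketch (the cause suggested here is an assumption, not a confirmed diagnosis): keys that are valid inside the kernel but invalid afterwards are consistent with key slots that were never written at all, e.g. if quantization makes the rect computed in duplicateWithKeys cover fewer tiles than tiles_touched counted during preprocessing; unwritten slots then keep stale buffer memory. Pre-filling the unsorted key buffer with a sentinel, just before the duplicateWithKeys launch in rasterizer_impl.cu, makes such slots easy to spot:

// Fill the unsorted key buffer with 0xFF... before duplicateWithKeys.
const uint64_t SENTINEL = 0xFFFFFFFFFFFFFFFFull;
CHECK_CUDA(cudaMemset(binningState.point_list_keys_unsorted, 0xFF,
	num_rendered * sizeof(uint64_t)), debug)
// ... launch duplicateWithKeys as before, copy the keys back to host_keys,
// then: any host_keys[i] == SENTINEL is a slot the kernel never wrote.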

Segmentation fault when trying to replace float with tensor

Hi, I want to enable gradients for the tan_fovx and tan_fovy variables in this library.
I changed their types in rasterize_points.cu from float to const torch::Tensor&, and referenced them with tan_fovx.contiguous().data() so they can be used as const float* elsewhere. In the actual computations, I dereference them with *tan_fovx.
I was able to get the library to build, but it segfaults at run time. What am I doing wrong? Thanks.
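Two likely failure modes, offered as assumptions rather than a diagnosis of this exact code: the data pointer of a CUDA tensor is a device pointer, so dereferencing it with *tan_fovx in host code segfaults; and calling .contiguous() inline creates a temporary tensor whose storage is freed at the end of the statement, leaving a dangling pointer. A sketch of the safer pattern:

#include <torch/extension.h>

void example(const torch::Tensor& tan_fovx)
{
	// Dangerous: the temporary returned by contiguous() dies at the end of
	// this statement, and data_ptr() is a device pointer for a CUDA tensor.
	const float* dangling = tan_fovx.contiguous().data_ptr<float>();
	(void)dangling;

	// Safer: keep the contiguous tensor alive for the whole rasterizer call,
	// and read host-side scalar values via .cpu().item<float>().
	torch::Tensor tan_fovx_c = tan_fovx.contiguous();
	float tan_fovx_val = tan_fovx_c.cpu().item<float>();
	(void)tan_fovx_val;
	// Only a __global__ kernel may dereference the device pointer directly.
}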

About the low-pass filter

// Apply low-pass filter: every Gaussian should be at least
// one pixel wide/high. Discard 3rd row and column.
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;
return { float(cov[0][0]), float(cov[0][1]), float(cov[1][1]) };

According to equation (33) in the EWA Splatting paper, this code applies the low-pass filter by modifying the covariance matrix, and the paper represents the low-pass filter by a Gaussian with an identity covariance matrix.
So why is 0.3 added here? And how does 0.3 relate to one pixel?

Thanks!
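One reading, offered as an interpretation rather than the authors' answer: equation (33) convolves the projected Gaussian with the low-pass kernel by adding the kernel's variance to the projected covariance, and this code simply uses a kernel variance of 0.3 px² instead of the identity:

$$\hat{V} = V + \sigma_h^2 I, \qquad \sigma_h^2 = 0.3 \;\Rightarrow\; \sigma_h \approx 0.55 \text{ px}$$

so every splat ends up with a standard deviation of at least about half a pixel per axis, i.e. roughly one pixel wide/high, matching the code comment. Why 0.3 rather than 1 appears to be an empirical choice.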

Debugging the project

I want to debug the CUDA kernels. I've prepared a data checkpoint for debugging. However, I'm not sure how to configure CMake to make it work.

For now, I tried adding a main.cpp that includes rasterize_points.h, with the following CMakeLists.txt:

cmake_minimum_required(VERSION 3.20)
project(TestProject LANGUAGES CXX CUDA)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CUDA_STANDARD 17)

# List your CUDA source files here
set(CUDA_SOURCES
    cuda_rasterizer/backward.h
    cuda_rasterizer/backward.cu
    cuda_rasterizer/forward.h
    cuda_rasterizer/forward.cu
    cuda_rasterizer/auxiliary.h
    cuda_rasterizer/rasterizer_impl.cu
    cuda_rasterizer/rasterizer_impl.h
    cuda_rasterizer/rasterizer.h)

# Create an executable that includes both your main.cpp and CUDA source files
add_executable(test_executable main.cpp ${CUDA_SOURCES})

# Specify CUDA architectures (adjust as needed)
set_target_properties(test_executable PROPERTIES CUDA_ARCHITECTURES "70;75;86")

# Include directories, if you have any
target_include_directories(test_executable PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/cuda_rasterizer)
target_include_directories(test_executable PRIVATE third_party/glm ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
target_include_directories(test_executable PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})

# Link with CUDA libraries if needed
target_link_libraries(test_executable PRIVATE cuda)

I create a build directory and call cmake .., but get an error:

CUDA_ARCHITECTURES is empty for target "cmTC_73c6d".

I assume I missed some additional flags, since I'm able to install the project via pip.
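A hedged pointer (an assumption based on the cmTC_* name, which is CMake's internal try_compile target): the error occurs during CUDA compiler detection, which in recent CMake versions needs CMAKE_CUDA_ARCHITECTURES to be set before the CUDA language is enabled. A minimal sketch:

cmake_minimum_required(VERSION 3.20)
# Set before project()/enable_language(CUDA) so compiler detection sees it;
# alternatively pass -DCMAKE_CUDA_ARCHITECTURES=75 on the command line or
# set the CUDAARCHS environment variable.
set(CMAKE_CUDA_ARCHITECTURES 70 75 86)
project(TestProject LANGUAGES CXX CUDA)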


Pixel Rendering

Great work! I am following your released diff-gaussian-rasterization library and would like to ask whether you will release the pixel rendering module that returns the color of each single ray. Thank you in advance!

Feature suggestion: backface-culling for the rasterizer

Hi there,
Large splats that represent the scene background often end up in front of the camera when zooming out of the area of interest.

The training procedure produces multiple properties per splat (xyz, opacity, rot_i, scale_i, f_dc_i, spherical harmonics f_rest_i) as well as a normal (nx, ny, nz) per splat.

It would be great if the rasterizer could take the normals into account to do backface culling: only adding a Gaussian splat's contribution to screen pixels during rasterization if the splat faces the camera, i.e. some boolean test along the lines of dot(normal, ray) < 0.
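A minimal sketch of the proposed test as it might look inside preprocessCUDA (hypothetical: normals would be a new per-Gaussian input; p_orig and cam_pos already exist there):

// Cull a splat whose stored normal faces away from the camera.
float3 n = { normals[3 * idx], normals[3 * idx + 1], normals[3 * idx + 2] };
float3 ray = { p_orig.x - cam_pos->x, p_orig.y - cam_pos->y, p_orig.z - cam_pos->z };
if (n.x * ray.x + n.y * ray.y + n.z * ray.z >= 0.0f)
{
	// Same convention as frustum culling: radius 0 means the Gaussian
	// emits no key/value pairs and is skipped during rasterization.
	radii[idx] = 0;
	return;
}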

Verify the camera

Hi,

Is there any way I can verify that the camera is loaded correctly with the rendering code? I use a preset camera and follow the COLMAP definition. It can project the input point clouds to 2D without problems. However, the model is not converging. I'm wondering if I can verify the camera directly from the code.
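A minimal host-side check, sketched under the assumption that (as in the training code) projmatrix is the pre-multiplied full view-projection matrix in the column-major layout that transformPoint4x4 in cuda_rasterizer/auxiliary.h expects: project a known 3D point and compare the pixel with COLMAP's own projection of the same point.

#include <cstdio>

void check_camera(const float* projmatrix, float x, float y, float z, int W, int H)
{
	// Mirrors transformPoint4x4: column-major 4x4 matrix times (x, y, z, 1).
	float px = projmatrix[0] * x + projmatrix[4] * y + projmatrix[8]  * z + projmatrix[12];
	float py = projmatrix[1] * x + projmatrix[5] * y + projmatrix[9]  * z + projmatrix[13];
	float pw = projmatrix[3] * x + projmatrix[7] * y + projmatrix[11] * z + projmatrix[15];
	// NDC in [-1, 1] -> pixel coordinates, mirroring ndc2Pix.
	float u = ((px / pw + 1.0f) * W - 1.0f) * 0.5f;
	float v = ((py / pw + 1.0f) * H - 1.0f) * 0.5f;
	printf("projected pixel: (%f, %f)\n", u, v);
}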

About calculating p_hom using p_orig and projmatrix

Thanks for your great work!
I have a question about calculating p_hom using p_orig and projmatrix.

float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };

I don't understand why it is possible to use the points directly in the world coordinate system. I think viewmatrix should be used to transform p_orig into the camera coordinate system first, and only then should the projected coordinates be computed. Looking forward to your reply.
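One possible explanation, assuming this code follows the 3D Gaussian Splatting training code, where the projmatrix handed to the rasterizer is full_proj_transform, the product of the view matrix and the projection matrix: the view transform is already baked in,

$$p_{hom} = P_{full}\, p_{orig}, \qquad P_{full} = P_{proj} \cdot W,$$

so transforming p_orig directly is equivalent to first transforming into camera coordinates and then projecting.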

How do I render data with more channels

I set shs to None before rendering, concatenated colors_precomp to the dimension I needed, and also changed #define NUM_CHANNELS to 6.
But the rendered result is wrong. May I ask which step I might have done incorrectly?

Question regarding equation for calculating J

Hi everyone! Thanks for the wonderful work and for releasing it!

I ran into a problem reading the paper and looking at the code. In the paper "EWA Volume Splatting", equation (29) computes J like this:

$$J_k = \begin{pmatrix} 1/t_{k,2} & 0 & -t_{k,0}/t_{k,2}^2 \\ 0 & 1/t_{k,2} & -t_{k,1}/t_{k,2}^2 \\ t_{k,0}/\lVert t_k \rVert & t_{k,1}/\lVert t_k \rVert & t_{k,2}/\lVert t_k \rVert \end{pmatrix}$$

However, in the code (cuda_rasterizer/forward.cu) it is implemented like this:

	glm::mat3 J = glm::mat3(
		focal_x / t.z, 0.0f, -(focal_x * t.x) / (t.z * t.z),
		0.0f, focal_y / t.z, -(focal_y * t.y) / (t.z * t.z),
		0, 0, 0);

This is different from the paper. I understood why you are multiplying by the focal length, but I did not understand the row of zeros. Also, since there is a row of zeros, this matrix is not invertible, which would mean that cov is not invertible, which would mean it is not a valid covariance matrix, which, I imagine, would cause some problems.

Is there anything I missed? Why did the authors implement the code in this manner?

Thanks for the help!!
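A note that may resolve the invertibility worry, assuming computeCov2D works the way its return value suggests: the full 3×3 product is computed, but only the upper-left 2×2 block is kept as the 2D screen-space covariance,

$$\Sigma' = J W \Sigma W^T J^T, \qquad \text{kept: } \begin{pmatrix} \Sigma'_{11} & \Sigma'_{12} \\ \Sigma'_{12} & \Sigma'_{22} \end{pmatrix},$$

so the zero third row only affects entries that are discarded anyway. It is this 2×2 block that must be invertible, and the low-pass term added to its diagonal keeps it so.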

Will renderCUDA encounter resource contention?

In renderCUDA(), each pixel accumulates the alpha-weighted colors of the Gaussians assigned to the current tile, and all pixels in the same tile share the collected Gaussians, sorted by depth. BLOCK_SIZE (= 256) Gaussians are collected first, then a sync, then they are used, and then the next 256 Gaussians are collected. But there is no sync between the use and the next collection.
My question: could some threads still be using the current 256 Gaussians while other threads, having already finished with them, write Gaussians from the next batch into the shared-memory collected_id? If that can happen, it is a bug. Is my understanding correct?
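One reading that would resolve this, assuming the batching in forward.cu works as its structure suggests: the __syncthreads_count(done) at the top of each round is itself a block-wide barrier, so no thread can start the next collective fetch until every thread has left the loop over the current batch. A self-contained toy kernel reproducing the pattern (names here are illustrative, not the repo's; launch with blockDim.x == BLOCK_SIZE):

#include <cstdio>

#define BLOCK_SIZE 256

__global__ void batched_sum(const int* __restrict__ in, int n, int* out)
{
	__shared__ int collected[BLOCK_SIZE];
	bool done = false;
	int acc = 0;
	int rounds = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
	for (int i = 0, toDo = n; i < rounds; i++, toDo -= BLOCK_SIZE)
	{
		// Barrier + vote, as in renderCUDA: no thread passes this point
		// while any thread still reads the previous batch, so the fetch
		// below cannot overwrite shared memory that is still in use.
		if (__syncthreads_count(done) == blockDim.x)
			break;
		// Collectively fetch the next batch into shared memory.
		int progress = i * BLOCK_SIZE + threadIdx.x;
		if (progress < n)
			collected[threadIdx.x] = in[progress];
		__syncthreads(); // fetch complete before anyone reads 'collected'
		// Use the batch; in renderCUDA a thread may set done = true early
		// (transmittance saturated) and then just waits at the top barrier.
		for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)
			acc += collected[j];
	}
	out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}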

Merge `fast_culling` into `main` branch

Hello!

I see that the C++ renderer on GitLab uses the fast_culling branch of the repository, while the PyTorch training code uses the main branch.

I see that the fast_culling version gives a very reasonable speedup when rendering.

Is there a reason why this branch cannot be merged into main?

In frustum question

Hi, I love your work and am trying to understand every bit of it. I am currently trying to understand this bit that seems wrong to me:

__forceinline__ __device__ bool in_frustum(int idx,
	const float* orig_points,
	const float* viewmatrix,
	const float* projmatrix,
	bool prefiltered,
	float3& p_view)
{
	float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };

	// Bring points to screen space
	float4 p_hom = transformPoint4x4(p_orig, projmatrix);
	float p_w = 1.0f / (p_hom.w + 0.0000001f);
	float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
	p_view = transformPoint4x3(p_orig, viewmatrix);

	if (p_view.z <= 0.2f)// || ((p_proj.x < -1.3 || p_proj.x > 1.3 || p_proj.y < -1.3 || p_proj.y > 1.3)))
	{
		if (prefiltered)
		{
			printf("Point is filtered although prefiltered is set. This shouldn't happen!");
			__trap();
		}
		return false;
	}
	return true;
}

You first apply the projection transformation and then the viewing transformation. But in your paper you use the equation $\Sigma' = JW\Sigma W^TJ^T$ to convert to screen space, where $J$ is the projection and $W$ the viewing transformation. The order here is the opposite; why? Since everything works, I assume there must be a reason this is also a valid formulation, but I can't understand why. Also, in the final version you discard Gaussians that are too near to the camera while ignoring their x-y location, as the commented-out part shows. Any reason for that? I guess it must be empirical; what happened when you changed the criterion?
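A possible resolution, assuming projmatrix here is the pre-multiplied full view-projection transform (as discussed in the p_hom issue above): the two transforms are not composed in sequence; p_proj and p_view are computed independently from the same p_orig,

$$p_{view} = W\, p_{orig}, \qquad p_{hom} = (P_{proj}\, W)\, p_{orig},$$

so the order is not actually reversed relative to $\Sigma' = JW\Sigma W^TJ^T$.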

Diagonal scaling matrix gets incorrectly initialized to an all-ones matrix

Hi guys,

I suspect that the S matrix defined in the computeCov3D functions for both the forward and backward computations is incorrectly initialized to an all-ones matrix:

glm::mat3 S = glm::mat3(1.0f);

If I understand correctly, here we expect an identity matrix instead:

glm::mat3 S = glm::mat3(
    1.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f,
    0.0f, 0.0f, 1.0f
);

Fortunately, such an all-ones matrix with a scaled main diagonal is usually positive definite and symmetric. Thus, the reparameterized $R^TSSR$ is still a covariance matrix.

I have roughly checked that this issue does not noticeably affect PSNR. If required and agreed, I'm more than glad to submit a hotfix PR for it.
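For reference: in GLM, the single-scalar mat3 constructor builds a diagonal matrix, so glm::mat3(1.0f) already constructs the identity rather than an all-ones matrix; the explicit nine-value form above is equivalent. A quick check:

#include <cstdio>
#include <glm/glm.hpp>

int main()
{
	glm::mat3 S = glm::mat3(1.0f);
	// GLM is column-major: S[col][row]. Prints 1 on the diagonal, 0 elsewhere.
	for (int r = 0; r < 3; r++)
		printf("%.0f %.0f %.0f\n", S[0][r], S[1][r], S[2][r]);
	return 0;
}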

Confusing loop variable naming in backward::renderCUDA

for (int i = 0; i < C; i++)
collected_colors[i * BLOCK_SIZE + block.thread_rank()] = colors[coll_id * C + i];

	for (int i = 0; i < rounds; i++, toDo -= BLOCK_SIZE)
	{
		...
			for (int i = 0; i < C; i++)  // <--
				collected_colors[i * BLOCK_SIZE + block.thread_rank()] = colors[coll_id * C + i];

i is used for both the inner and the outer loop here. Is that intended exactly as written? It might be good to rename the inner i to ch, as in the other loops inside this outer loop, to avoid confusion.

Batch Implementation

Hi,

Thanks for open-sourcing this awesome work! I wonder whether it would be possible to perform batch rendering, assuming we have the same number of Gaussians in each frame. If not, what changes would be required?
