The rgbd-to-mesh from pandinosaurus

Surface Mesh Reconstruction from RGBD Images

Final Project for Patrick Cozzi's CIS 565, Fall 2013

Click for video

NOTE:

This project requires a CUDA-capable graphics card, as well as Boost, OpenNI and an Xbox Kinect to run a live demonstration.

BACKGROUND:

Previous work has demonstrated the diverse capabilities of RGBD cameras, from generating highly accurate 3D surface models to reliable 3D pose estimation. However, many algorithms attempt to store the generated environment as a RGB 3D point cloud, which is not easily adaptable to dynamic environments, requires very large quantities of memory to store large environments, and provides no intuition to higher perception processes about distinct objects beyond a volumetric approximation. Other approaches have been able to store and merge the surface data more efficiently, but still regard the environement as a unified whole rather than discrete objects. By extracting meaningful geometry from the RGB-D in the form of triangle meshes instead, a large number of advantages can be realized.

High storage efficiency
Natural low level object segmentation
Easy to manipulate, modify, and render in real time
Efficient and easy to process intuition of geometry that higher cognitive functions can use for object recognition and manipulation tasks.
Straightforward tradeoff between simplicity and accuracy with mesh resolution

CODE TOUR

The overall image processing line is shown below. First, an RGB frame and a depth frame are pulled from the Kinect and pushed to the GPU for processing. A world-space point cloud is then generated from the RGBD data, and a neighborhood-based estimate of the point normals is then extracted for later processing. Finally, the point cloud is triangulated and the generated mesh is passed to OpenGL where a variety of rendering options are implemented.

The underlying architecture is very modular, and can be easily extended to handle input RGBD streams other than the Kinect (as demonstrated in the implementation of log streams). A generic RGBD frame format is used, allowing computation and visualization to be performed without regard to how the data was obtained.

A more detailed view of the program flow is shown below. Note that after the RGB and depth frames are synchronized and shipped to the GPU (the purple arrow), all computation and rendering is performed on the GPU, enhancing performance and allowing the CPU to be free for other tasks. The ComputeNormalsFast kernel supplants an earlier iteration, ComputeNormals, which was written for estimation quality at the cost of a significant performance penalty. The new implementation is much faster thanks to shared memory optimization (see performance section below).

Finally, the following is a more detailed view of the OpenGL rendering pipeline. The rendering pipeline is also written in a very modular manner, allowing both for rapid code modification to experiment with different visualazation techniques, as well as hooks (note the black diamonds) for keypresses to completely change the render output on-the-fly.

CONTROLS

Click and drag the mouse to look around in 3D views

Keypress	Function
w a s d q z	Move Camera in 3D views Forward/Left/Back/Right/Up/Down
W A S D Q Z	Move Camera slowly in 3D views
x	Reset camera view
r	Reload GLSL shaders
p	Restart playback at beginning of log (only for LogDevice input)
=	Increase playback speed (only for LogDevice input)

| Decrease playback speed (only for LogDevice input) F | Increase camera FOV f | Decrease camera FOV ESC | Exit h | Toggle normal hair display in 3D point cloud mode v | Toggle wireframe mode for 3D mesh M | Increase maximum triangle edge length by 1cm m | Increase maximum triangle edge length by 1mm N | Decrease maximum triangle edge length by 1cm n | Decrease maximum triangle edge length by 1mm b | Toggle fast/slow normal computation algorithm B | Toggle normal computation on/off 1 | Display depth overlaid on color input image 2 | Display depth only 3 | Display color only 4 | Display depth, color, and overlay on same screen 5 | Display point cloud buffer debug views 6 | Display 3D point cloud rendered from VBO 7 | Display 3D mesh reconstruction 8 | Display side by side comparison of color input image and 3D mesh reconstruction 9 | Display side by side comparison of 3D point cloud and 3D mesh reconstruction 0 | Display both input images, 3D point, and 3D mesh

PERFORMANCE EVALUATION

Our point normals kernel was implemented as follows. A window radius is first specified as an algorithm parameter. For each point, we loop through its neighboring points in screen space in the square window specified by the radius, and pair it with a screen-space orthogonal point at the same radius. If both points are within a specified radius from the center point in world space, we take the cross product to compute the normal, which is then flipped if pointing away from the camera. If sufficiently many valid normals are found, we average them to produce the final normal estimate, otherwise we discard the point.

To improve the runtime of the point normals kernel, we reimplemented the algorithm using shared memory. In the shared memory implementation, all points in given thread block are first loaded into shared memory, along with the points lying within the specified neighborhood radius of the edges of the thread block, and the distance and cross product calculations are then performed using shared memory access. The results of the shared memory optimization on kernel runtime are shown for a range of window radii using a thread block size of 8x8.

As demonstrated, the shared memory optimization reduced the kernel runtime by approximately a factor of 2. The impact on the overall FPS was less dramatic, though still pronounced, due to the time spent in the rendering pipeline.

All testing for this project was conducted on the following hardware: