cupcl's Issues

------------checking CUDA VoxelGrid---------------- Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700 Aborted

When I filter with CUDA VoxelGrid and then prepare the filtered data for clustering, the program aborts as soon as the cudaExtractCluster object is constructed, with this error:

------------checking CUDA VoxelGrid----------------
Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700
Aborted

The CUDA VoxelGrid function works fine when no cudaExtractCluster object is created. This is my first time programming with this library. Could you tell me the reason for this error, and how to combine several of the library's functions in one program? Thanks, I appreciate your help. Here is one version of the code that I have tried several times.

void testCUDA(pcl::PointCloud<pcl::PointXYZ>::Ptr cloudSrc,
              pcl::PointCloud<pcl::PointXYZ>::Ptr cloudDst) {
  std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
  std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::ratio<1, 1000>> time_span =
      std::chrono::duration_cast<
          std::chrono::duration<double, std::ratio<1, 1000>>>(t2 - t1);
  cudaStream_t stream = NULL;
  cudaStreamCreate(&stream);
  unsigned int nCount = cloudSrc->width * cloudSrc->height;
  float *inputData = (float *)cloudSrc->points.data();
  cloudDst->width = nCount;
  cloudDst->height = 1;
  cloudDst->resize(cloudDst->width * cloudDst->height);
  float *outputData = (float *)cloudDst->points.data();
  memset(outputData, 0, sizeof(float) * 4 * nCount);
  std::cout << "\n------------checking CUDA ---------------- " << std::endl;
  std::cout << "CUDA Loaded " << cloudSrc->width * cloudSrc->height
            << " data points from PCD file with the following fields: "
            << pcl::getFieldsList(*cloudSrc) << std::endl;
  float *input = NULL;
  cudaMallocManaged(&input, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync(stream, input);
  cudaMemcpyAsync(input, inputData, sizeof(float) * 4 * nCount,
                  cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);
  float *output = NULL;
  cudaMallocManaged(&output, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync(stream, output);
  cudaStreamSynchronize(stream);
  cudaFilter filterTest;
  FilterParam_t setP;
  FilterType_t type;
  unsigned int countLeft = 0;
  std::cout << "\n------------checking CUDA VoxelGrid---------------- "
            << std::endl;
  type = VOXELGRID;
  setP.type = type;
  setP.voxelX = 0.02;
  setP.voxelY = 0.02;
  setP.voxelZ = 0.02;
  filterTest.set(setP);
  int status = 0;
  cudaDeviceSynchronize();
  t1 = std::chrono::steady_clock::now();
  status = filterTest.filter(output, &countLeft, input, nCount);
  cudaDeviceSynchronize();
  t2 = std::chrono::steady_clock::now();
  if (status != 0)
    return;
  time_span = std::chrono::duration_cast<
      std::chrono::duration<double, std::ratio<1, 1000>>>(t2 - t1);
  std::cout << "CUDA VoxelGrid by Time: " << time_span.count() << " ms."
            << std::endl;
  std::cout << "CUDA VoxelGrid before filtering: " << nCount << std::endl;
  std::cout << "CUDA VoxelGrid after filtering: " << countLeft << std::endl;
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloudNew(
      new pcl::PointCloud<pcl::PointXYZ>);
  cloudNew->width = countLeft;
  cloudNew->height = 1;
  cloudNew->points.resize(cloudNew->width * cloudNew->height);
  int check = 0;
  for (std::size_t i = 0; i < cloudNew->size(); ++i) {
    cloudNew->points[i].x = output[i * 4 + 0];
    cloudNew->points[i].y = output[i * 4 + 1];
    cloudNew->points[i].z = output[i * 4 + 2];
  }
  pcl::io::savePCDFileASCII("after-cuda-VoxelGrid.pcd", *cloudNew);
  {
    cudaStream_t stream2;
    cudaStreamCreate(&stream2);
    float *input2Data = (float *)cloudNew->points.data();
    float *input2 = NULL;
    cudaMallocManaged(&input2, sizeof(float) * 4 * nCount, cudaMemAttachHost);
    cudaStreamAttachMemAsync(stream2, input2);
    cudaMemcpyAsync(input2, input2Data, sizeof(float) * 4 * nCount,
                    cudaMemcpyHostToDevice, stream2);
    cudaStreamSynchronize(stream2);
    float *output2 = NULL;
    cudaMallocManaged(&output2, sizeof(float) * 4 * nCount, cudaMemAttachHost);
    cudaStreamAttachMemAsync(stream2, output2);
    cudaStreamSynchronize(stream2);
    cudaExtractCluster cudaec;
    extractClusterParam_t ecp;
    ecp.minClusterSize = 100;
    ecp.maxClusterSize = 2500000;
    ecp.voxelX = 0.05;
    ecp.voxelY = 0.05;
    ecp.voxelZ = 0.05;
    ecp.countThreshold = 20;
    cudaec.set(ecp);
    unsigned int *indexEC = NULL;
    cudaMallocManaged(&indexEC, sizeof(float) * 4 * nCount, cudaMemAttachHost);
    cudaStreamAttachMemAsync(stream2, indexEC);
    cudaMemsetAsync(indexEC, 0, sizeof(float) * 4 * nCount, stream2);
    cudaStreamSynchronize(stream2);
  }
  cudaFree(input);
  cudaFree(output);
  cudaStreamDestroy(stream);
}
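
One detail in the code above is worth checking regardless of what triggers the failure inside the library: the second stage copies sizeof(float) * 4 * nCount bytes out of cloudNew, which holds only countLeft points. The following CPU-only sketch shows a size-checked copy. It does not explain error 700 itself, and pointBufferBytes and copyPoints are hypothetical helpers, not part of cuPCL.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Each point occupies 4 floats (x, y, z, padding), as in the code above.
constexpr std::size_t kFloatsPerPoint = 4;

// Bytes needed for `pointCount` packed points.
inline std::size_t pointBufferBytes(std::size_t pointCount) {
  return sizeof(float) * kFloatsPerPoint * pointCount;
}

// Hypothetical helper: copies `pointCount` points out of `src`,
// refusing to read past the end of the source buffer.
inline std::vector<float> copyPoints(const std::vector<float>& src,
                                     std::size_t pointCount) {
  const std::size_t floats = kFloatsPerPoint * pointCount;
  if (floats > src.size()) {
    throw std::out_of_range("copyPoints: source holds fewer points than requested");
  }
  return std::vector<float>(src.begin(),
                            src.begin() + static_cast<std::ptrdiff_t>(floats));
}
```

Applied to the code above, the second stage would size its buffers and copies from countLeft rather than nCount.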

What point types are supported?

According to your demo code, each point takes 4 floats. Does that mean the only supported point types are PointXYZ, PointXYZRGB, and PointXYZI? Could you add support for PointXYZINormal in the future?
Thanks
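
Until the maintainers answer, one way to feed a richer point type into a 4-float layout is to repack it explicitly instead of casting points.data(). The sketch below is an assumption-laden illustration: PointXYZI here is a plain hypothetical struct (not pcl::PointXYZI, which may carry extra padded fields), and whether the library reads or ignores the fourth float is not documented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain point type used only for this illustration.
struct PointXYZI {
  float x, y, z, intensity;
};

// Packs points into a flat buffer with a stride of 4 floats per point.
// The fourth slot carries intensity here; the library may ignore it.
inline std::vector<float> packPoints(const std::vector<PointXYZI>& pts) {
  std::vector<float> packed;
  packed.reserve(pts.size() * 4);
  for (const PointXYZI& p : pts) {
    packed.push_back(p.x);
    packed.push_back(p.y);
    packed.push_back(p.z);
    packed.push_back(p.intensity);
  }
  return packed;
}
```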

How to display the pointcloud?

When I run "./demo [*.pcd]" in the cuda-icp directory, I only get this result:
GPU has cuda devices: 1
----device id: 0 info----
GPU : Xavier
Capbility: 7.2
Global memory: 31918MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Loaded 7000 data points for P with the following fields: x y z
Loaded 7000 data points for Q with the following fields: x y z
iter.Maxiterate 0
iter.threshold 1e-12
iter.acceptrate 1

Target rigid transformation : cloud_in -> cloud_icp
Rotation matrix :
    | 0.923880 -0.382683 0.000000 |
R = | 0.382683 0.923880 0.000000 |
    | 0.000000 0.000000 1.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

------------checking CUDA ICP(GPU)----------------
CUDA ICP by Time: 0.797888 ms.
CUDA ICP fitness_score: 0.777453
matrix_icp calculated Matrix by Class ICP
Rotation matrix :
    | 1.000000 0.000000 -0.000000 |
R = | -0.000000 1.000000 0.000000 |
    | -0.000000 0.000000 1.000000 |
Translation vector :
t = < -0.000000, 0.000000, -0.000000 >

------------checking PCL ICP(CPU)----------------
PCL icp.align Time: 38.2758 ms.
has converged: 1 score: 0.651369
CUDA ICP fitness_score: 0.651369
transformation_matrix:
0.999905 0.00279406 0.0134922 0.0161865
-0.00265722 0.999945 -0.010151 0.00527596
-0.0135198 0.0101141 0.999858 0.0133578
0 0 0 1

------------checking PCL GICP(CPU)----------------
PCL Gicp.align Time: 144.663 ms.
has converged: 1 score: 0.541552
transformation_matrix:
0.99874 0.00468762 0.0499603 -0.0427716
-0.00344507 0.999683 -0.0249281 0.0265501
-0.0500613 0.0247246 0.99844 0.148036
0 0 0 1

So I would really like to know how to display the result, as in https://developer.nvidia.com/zh-cn/blog/cuda-pcl-1-0-jetson/

Why doesn't cuNDT work on x86 Ubuntu 20.04?

~/cuPCL/cuNDT$ ./demo

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA A800 80GB PCIe
Capbility: 8.0
Global memory: 81085MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Loaded 7000 data points for P with the following fields: x y z
Loaded 7000 data points for Q with the following fields: x y z
Target rigid transformation : cloud_P -> cloud_Q
Rotation matrix :
    | 0.923880 -0.382683 0.000000 |
R = | 0.382683 0.923880 0.000000 |
    | 0.000000 0.000000 1.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

------------checking PCL NDT(CPU)----------------
PCL align Time: 27.1937 ms.
Normal Distributions Transform has converged: 1 score: 0.648334
Rotation matrix :
    | 0.999894 0.004857 0.013688 |
R = | -0.004680 0.999905 -0.012931 |
    | -0.013750 0.012865 0.999823 |
Translation vector :
t = < 0.015418, 0.056840, 0.078443 >

------------checking CUDA NDT(GPU)----------------
CUDA NDT by Time: 0.777725 ms.
CUDA NDT fitness_score: 0.349491
Rotation matrix :
    | 0.000000 0.000000 0.000000 |
R = | 0.000000 0.000000 0.000000 |
    | 0.000000 0.000000 0.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.000000 >

cuda failure

------------checking CUDA PassThrough ----------------
Cuda failure: no kernel image is available for execution on the device at line 102 in file cudaFilter.cpp error status: 209
Aborted (core dumped)

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7851MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

------------checking CUDA ----------------
CUDA Loaded 119978 data points from PCD file with the following fields: x y z

------------checking CUDA PassThrough ----------------
Cuda failure: no kernel image is available for execution on the device at line 102 in file cudaFilter.cpp error status: 209
Aborted (core dumped)

Source code missing?

This looks interesting, but there seems to be no source included in the repo, only .so files.

Any plans on releasing the sources?

Could you add some comments to the headers?

It would be clearer with more comments (as in cuda-pcl).

cuda-octree is harder to follow.
I found that the distance returned by approxNearestSearch is 1e9 times the real sqr_distance,
whereas radiusSearch returns the result directly.

Segmentation Fault depending on the point cloud in `cuCluster`

Hello,

I get a segmentation fault only for some point clouds, and only with small voxel size parameters passed to cuCluster. It does not seem related to the point cloud size. If I make the voxel sizes larger, however, the segmentation fault disappears. I thought maybe the number of voxels needed for the volume exceeded INT_MAX and caused an integer overflow, but my calculation shows the required number of voxels is nowhere near the limit.

I could provide you with some point cloud data that produces the error, if you wish to reproduce it.

Because the library is not open source, I cannot really debug it. It looks like the cudaExtractClusterImpl::extract function tries to write into the output array, but the position it tries to write to is out of bounds.

// call stack inside the library
libcudacluster.so!cudaExtractClusterImpl::extract(float*, int, float*, unsigned int*)
libcudacluster.so!cudaExtractCluster::extract(float*, int, float*, unsigned int*)
// The limits of the  point cloud
// is [0m,0m,0m] to ~[3m, 2m, 6m] meters.
// I see segmentation faults for voxel size 0.05.
// for 0.1, I do not.
ecp.voxelX = 0.05;
ecp.voxelY = 0.05;
ecp.voxelZ = 0.05;
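
The overflow estimate mentioned above can be reproduced with a few lines of arithmetic. This sketch assumes a dense grid over the axis-aligned bounding box, which is only a guess at how the closed-source library lays out its voxels. Lengths are given in millimetres so the division stays exact, and cellsAlong and denseVoxelCount are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Number of voxels needed to cover `extentMm` along one axis (rounded up).
inline std::int64_t cellsAlong(std::int64_t extentMm, std::int64_t voxelMm) {
  return (extentMm + voxelMm - 1) / voxelMm;
}

// Total voxels in a dense grid covering the whole bounding box.
// 64-bit arithmetic avoids overflowing while checking for 32-bit overflow.
inline std::int64_t denseVoxelCount(std::int64_t xMm, std::int64_t yMm,
                                    std::int64_t zMm, std::int64_t voxelMm) {
  return cellsAlong(xMm, voxelMm) * cellsAlong(yMm, voxelMm) *
         cellsAlong(zMm, voxelMm);
}
```

For the reported 3 m x 2 m x 6 m volume, a 0.05 m voxel gives 60 x 40 x 120 = 288,000 cells and a 0.1 m voxel gives 36,000. Both are far below INT_MAX (2,147,483,647), which agrees with the reporter's conclusion that a simple voxel-count overflow is unlikely.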

Voxeldownsampling and Cluster Filter - “Warp out of Range Address”

Problem Explanation

Hello there! I am trying to use the cuPCL repository (https://github.com/NVIDIA-AI-IOT/cuPCL) to preprocess the point cloud with a voxel downsampling filter before running the provided clusterer. The program runs smoothly without the voxel downsampling, but the problem appears as soon as I merely create an instance of the filter, as shown below:

cudaExtractCluster cudaec(stream);
cudaFilter filterTest(stream);

So using just one of them works, but using both produces: Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700

After some debugging with CUDA-GDB and CUDA MEMCHECK I arrived at the following results, but I am not sure they can be acted on, since the classes are implemented in precompiled .so files:

  • Both classes invoke cudaFillVoxelGirdKernel, and the error occurs on the first function call that launches the kernel:
Thread 1 "collision_avoid" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 0, grid 6, block (3,0,0), thread (160,0,0), device 0, sm 6, warp 4, lane 0]
0x0000555555d50eb0 in cudaFillVoxelGirdKernel(float4*, int4*, int4*, float4*, unsigned int, float, float, float) ()
  • The thread is trying to write 4 bytes to a global memory address (CUDA MEMCHECK):
Invalid __global__ write of size 4
  • And from debugging:
Illegal access to address (@global)0x8007b0800c60 detected
(cuda-gdb) print *0x8007b0800c60
Error: Failed to read local memory at address 0x8007b0800c60 on device 0 sm 0 warp 9 lane 0, error=CUDBG_ERROR_INVALID_MEMORY_ACCESS(0x8).
  • Moreover, the following CUDA API error is returned:
warning: Cuda API error detected: cuGetProcAddress returned (0x1f4)

This indicates that a named symbol was not found. Examples of symbols are global/constant variable names, driver function names, texture names, and surface names.

What I do not understand is why, from the thread's scope, the address is treated as a local address when it actually seems to be a global one. I also do not know whether the CUDA API error is a lead of some sort.

Note that cudaMallocManaged (UVM) was used for the memory transfers, and switching to explicit memory transfers did not solve the issue.

Another attempt to solve the issue was to set every CUDA limit to match the device limits, as follows:

  size_t limit = 0;
  cudaDeviceGetLimit(&limit, cudaLimitStackSize);
  std::cout << "Stack limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitStackSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitPrintfFifoSize);
  std::cout << "cudaLimitPrintfFifoSize limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitPrintfFifoSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitMallocHeapSize);
  std::cout << "cudaLimitMallocHeapSize limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitDevRuntimeSyncDepth);
  std::cout << "cudaLimitDevRuntimeSyncDepth limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, limit);

  cudaDeviceGetLimit(&limit, cudaLimitDevRuntimePendingLaunchCount);
  std::cout << "cudaLimitDevRuntimePendingLaunchCount limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, limit);

  cudaDeviceGetLimit(&limit, cudaLimitMaxL2FetchGranularity);
  std::cout << "cudaLimitMaxL2FetchGranularity limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitMaxL2FetchGranularity, limit);

But this changed nothing.

Device Info

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   56C    P8    18W /  N/A |    123MiB /  7982MiB |     32%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
  Dev PCI Bus/Dev ID  Name Description                                   SM Type  
*   0  01:00.0        NVIDIA GeForce RTX 2080 Super with Max-Q Design     TU104-A   
SMs    Warps/SM Lanes/Warp Max Regs/Lane    Active SMs Mask 
sm_75  48       32         32           256 0x00000000000000000000ffffffffffff

Using ROS Noetic and Ubuntu 20.04

Cuda PCL with ROS

Is it possible to use CUDA-PCL with ROS? If so, how?
Thanks in advance.

Any plan to add thrust support?

Hi,

Thrust gives us C++ STL-style CUDA programming, which makes CUDA programming easier and safer.

Will Thrust support be added to this project?

Thanks!

When I run "./demo [*.pcd]" in the directory of cuda-segmentation and cuda-filter

GPU has cuda devices: 1
----device id: 0 info----
GPU : Xavier
Capbility: 7.2
Global memory: 31918MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

[pcl::PCDReader::readHeader] Could not find file '[.pcd]'.
Error:can not open the file: [
.pcd]

I cannot find where the problem is. Can you help me?

We found a bug when using this function: "ndtTest.ndt((float *)PUVM, nP, (float *)QUVM, nQ, guess, cudaMatrix, stream);"

localization_node: /usr/include/eigen3/Eigen/src/SVD/SVDBase.h:85: const MatrixUType& Eigen::SVDBase::matrixU() const [with Derived = Eigen::JacobiSVD<Eigen::Matrix<float, 3, 3>, 2>; Eigen::SVDBase::MatrixUType = Eigen::Matrix<float, 3, 3>; typename Eigen::internal::traits::MatrixType::Scalar = float]: Assertion `m_isInitialized && "SVD is not initialized."' failed.

cuda version : 11.4
pcl version : 1.10
system : ubuntu 20.04
eigen version : 3.3.7
platform: Orin

cuFilter: apply constraints on all 3 axes at once

How can I apply filter constraints on x, y, and z at the same time? The following code only applies the filter to the most recently set axis.

FilterParam_t setPx, setPy, setPz; // filter parameters for each axis

FilterType_t type = PASSTHROUGH; // only the passthrough filter is implemented in the cuPCL library for now

// these filter constraints end up being applied to only one axis
setPx.type = type;
setPx.dim = 0; // dim is 0, 1, 2 for the x, y, z axes
setPx.upFilterLimits = 1.5;
setPx.downFilterLimits = -1.5;
setPx.limitsNegative = false;
filterTest.set(setPx);

setPy.type = type;
setPy.dim = 1; // y axis
setPy.upFilterLimits = 1.5;
setPy.downFilterLimits = -1.5;
setPy.limitsNegative = false;
filterTest.set(setPy);

setPz.type = type;
setPz.dim = 2; // z axis
setPz.upFilterLimits = 2.0;
setPz.downFilterLimits = 0.0;
setPz.limitsNegative = false;
filterTest.set(setPz);

filterTest.filter(output, &countLeft, input, nCount);
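
The snippet above calls set() three times but filter() only once, and the reported behaviour is that only the last axis takes effect. One possible workaround, which is an assumption and not confirmed by the library authors, is to run one pass per axis and feed each pass's output into the next. The CPU-only sketch below shows that chaining. Point4, AxisLimits, passThrough and passThroughAll are hypothetical stand-ins, not cuPCL calls.

```cpp
#include <cassert>
#include <vector>

struct Point4 {
  float x, y, z, w;  // 4 floats per point, matching the demo layout
};

struct AxisLimits {
  int dim;      // 0, 1, 2 select x, y, z
  float lower;  // inclusive lower bound
  float upper;  // inclusive upper bound
};

// Keeps the points whose coordinate on one axis lies inside the limits.
inline std::vector<Point4> passThrough(const std::vector<Point4>& in,
                                       const AxisLimits& lim) {
  std::vector<Point4> out;
  for (const Point4& p : in) {
    const float v = (lim.dim == 0) ? p.x : (lim.dim == 1) ? p.y : p.z;
    if (v >= lim.lower && v <= lim.upper) out.push_back(p);
  }
  return out;
}

// Applies each axis filter to the result of the previous one.
inline std::vector<Point4> passThroughAll(std::vector<Point4> cloud,
                                          const std::vector<AxisLimits>& limits) {
  for (const AxisLimits& lim : limits) cloud = passThrough(cloud, lim);
  return cloud;
}
```

With the library itself, the equivalent would presumably be a set() followed by a filter() for each axis, using the previous countLeft as the next point count.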

Cuda filter demo, cuda-pcl is worse than pcl if MemCpyTime(4.6285ms) is considered.

Hi everyone,

I added a time-span measurement around the input and output memory allocation.

This kind of memory allocation is needed before every CUDA function call.

The code is below.

  t1 = std::chrono::steady_clock::now();
  cudaMallocManaged(&input, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync (stream, input );
  cudaMemcpyAsync(input, inputData, sizeof(float) * 4 * nCount, cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);

  float *output = NULL;
  cudaMallocManaged(&output, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync (stream, output );
  cudaStreamSynchronize(stream);
  t2 = std::chrono::steady_clock::now();
  auto time_span1 = std::chrono::duration_cast<std::chrono::duration<double, std::ratio<1, 1000>>>(t2 - t1);

Here is my test result: MemCpy by Time 4.6285 ms.

So, looking at the real per-frame cost of the passthrough filter, cuda-pcl (4.6285 + 0.456927 = 5.085427 ms) is no better than PCL (4.25133 ms).

So what is the best practice for programming with cuda-pcl?

Thanks.
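
The measured 4.6285 ms covers allocating both managed buffers as well as the copy. If the point count per frame is bounded, one common pattern (an assumption here, not advice from the cuPCL authors) is to allocate once and reuse the buffers across frames, so that only the copy remains on the per-frame path. The sketch below illustrates the idea on the CPU, with std::vector standing in for a cudaMallocManaged buffer. FrameBuffer is a hypothetical name, and how much time this saves on a given device is not measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reusable storage for packed points (4 floats each).
class FrameBuffer {
 public:
  // Returns storage for `pointCount` points, reallocating only when the
  // request exceeds the capacity reached so far.
  float* acquire(std::size_t pointCount) {
    const std::size_t needed = 4 * pointCount;
    if (needed > storage_.size()) {
      storage_.resize(needed);
      ++allocations_;
    }
    return storage_.data();
  }

  // Number of times the underlying storage had to grow.
  std::size_t allocations() const { return allocations_; }

 private:
  std::vector<float> storage_;
  std::size_t allocations_ = 0;
};
```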

initialization error

The same or a similar error appears in all applications.
Configuration:

  • Jetson tx2
  • ubuntu 18.04 (in docker)
  • Jetpack 4.6
  • pcl 1.12.1
root@linux:/cuda-pcl/cuda-icp# ./cuda_segmentation test_Q.pcd

GPU has cuda devices: 1
----device id: 0 info----
  GPU : NVIDIA Tegra X2 
  Capbility: 6.2
  Global memory: 3833MB
  Const memory: 64KB
  SM in a block: 48KB
  warp size: 32
  threads in a block: 1024
  block dim: (1024,1024,64)
  grid dim: (2147483647,65535,65535)

Loaded 7000 data points for P with the following fields: x y z
Loaded 7000 data points for Q with the following fields: x y z
 iter.Maxiterate 0
 iter.threshold 1e-12
 iter.acceptrate 1

Target rigid transformation : cloud_in -> cloud_icp
Rotation matrix :
    | 0.923880 -0.382683 0.000000 | 
R = | 0.382683 0.923880 0.000000 | 
    | 0.000000 0.000000 1.000000 | 
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

------------checking CUDA ICP(GPU)---------------- 
Cuda failure: initialization error at line 189 in file cudaICP.cpp error status: 3
Aborted (core dumped)

Running on JP 4.4.0

Just curious whether something fundamentally different prevents this from running on JP 4.4, or whether it has simply only been tested on JP 4.4.1, per the README.md.

Could you help us recompile the library for Jetson TX2?

Hi! I am trying to use CUDA-PCL on Jetson TX2 (with Jetpack 4.5, CUDA-10.2 and PCL 1.8.1), but I have encountered a CUDA failure problem. It would be great if you can help me out. Here is the output when I run ./demo in cuda-pcl/cuda-segmentation:

nvidia@nvidia-tx2:~/Downloads/cuda-pcl/cuda-segmentation$ ./demo sample.pcd

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7850MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Cuda failure: no kernel image is available for execution on the device at line 310 in file cudaSegmentation.cpp error status: 209
Aborted (core dumped)

I guess the lib*.so binaries were not compiled for sm_62, so they cannot run on the Jetson TX2. I would appreciate it if you could fix this for us.
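
The guess above can be expressed as a small check: does the list of SM architectures a binary was built for contain the device's compute capability? This sketch is deliberately conservative and only tests for an exact match. It ignores embedded PTX that the driver could JIT-compile, and any compatibility between minor revisions. hasExactImage is a hypothetical helper, and the architecture list would have to come from inspecting the library (for example with a tool such as cuobjdump), since the thread does not state it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if `builtSms` (e.g. {53, 62, 72}) contains the device's
// compute capability, given as major.minor (e.g. 6.2 for Tegra X2).
inline bool hasExactImage(const std::vector<int>& builtSms, int major,
                          int minor) {
  const int device = major * 10 + minor;
  return std::find(builtSms.begin(), builtSms.end(), device) != builtSms.end();
}
```

For example, a library built only for sm_72 would fail this check on a 6.2 device, which would be consistent with the "no kernel image is available" message in the log.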

Segmentation fault (program cc1plus) with statistical_outlier_removal header

Hi all,
I am trying to build cuda-filter with make, but g++ crashes. I figured out that statistical_outlier_removal.h was causing the error, so I commented it out. Do you know why this is happening? I had a similar issue in another program, where pcl/common/transforms.h caused the same problem, and I had to work around it using an alternative library from ROS tf2.
I have a Jetson Nano Developer Kit with 4 GB of memory and 2 GB of swap (I have already tried increasing the swap size).

Cuda failure: no kernel image is available

Hi, I tried to run your demo, but something is wrong with it. The printed output is below:

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7850MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

------------checking CUDA ----------------
CUDA Loaded 119978 data points from PCD file with the following fields: x y z

------------checking CUDA PassThrough ----------------
Cuda failure: no kernel image is available for execution on the device at line 102 in file cudaFilter.cpp error status: 209
Aborted (core dumped)

What should I do? Please help me.

Will this work in a 64-bit Ubuntu 18.04 computer?

I'm getting the following error when I try to make the cuda-filter code:

(base) ➜  cuda-filter make
USE Default CUDA DIR: /usr/local/cuda
TARGET_ARCH: x86_64
CUDA_VERSION: 10020
SMS: 30 35 50 53 60 61 70 72 
g++  -I/usr/local/cuda/include -I/include -I/usr/local/include -I/usr/include/eigen3/ -I/usr/include/pcl-1.8/ -I/usr/include/vtk-6.3/ -D_REENTRANT -std=c++11 -O2 -fPIC -o obj/main.o -c main.cpp
g++ -D_REENTRANT -std=c++11  -O2 -o demo obj/main.o  -L/usr/lib -L/usr/local/lib -L/usr/local/cuda/lib64 -lcudart_static -lrt -ldl -lpthread -lcudart -L/lib64 -lcudnn -lpthread -L/usr/lib/aarch64-linux-gnu/ -lboost_system -lpcl_common -lpcl_io -lpcl_recognition -lpcl_features -lpcl_sample_consensus -lpcl_octree -lpcl_search -lpcl_filters -lpcl_kdtree -lpcl_segmentation -lpcl_visualization ./lib/libcudafilter.so
./lib/libcudafilter.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
Makefile:173: recipe for target 'demo' failed
make: *** [demo] Error 1

I wonder if it has anything to do with my computer architecture or am I missing something I need to do?

Thanks!
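
The linker's "File in wrong format" message usually means the object was built for a different architecture than the one being linked. In the log above TARGET_ARCH is x86_64 while the link line also references /usr/lib/aarch64-linux-gnu/, which suggests (though the thread does not confirm it) that the prebuilt .so targets aarch64. A minimal sketch for checking this reads the e_machine field from the ELF header. elfMachine is a hypothetical helper, and only the two architectures relevant here are decoded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Decodes the target architecture from the first 20 bytes of an ELF file.
// e_machine is a 16-bit field at offset 18; EI_DATA at offset 5 gives the
// byte order (1 = little endian, 2 = big endian).
inline std::string elfMachine(const std::vector<unsigned char>& header) {
  if (header.size() < 20 || header[0] != 0x7f || header[1] != 'E' ||
      header[2] != 'L' || header[3] != 'F') {
    return "not ELF";
  }
  const bool littleEndian = (header[5] == 1);
  const unsigned lo = littleEndian ? header[18] : header[19];
  const unsigned hi = littleEndian ? header[19] : header[18];
  const unsigned machine = lo | (hi << 8);
  switch (machine) {
    case 62:  return "x86_64";   // EM_X86_64
    case 183: return "aarch64";  // EM_AARCH64
    default:  return "other";
  }
}
```

To use it, read the first 20 bytes of ./lib/libcudafilter.so (for instance with std::ifstream in binary mode) and pass them in. The `file` command reports the same information.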

Cuda-segmentation: NvMapReserveOp failed on Jetson Xavier 32gb

Hello.

Thank you for your efforts in providing CUDA-optimized PCL algorithms.

With the Cuda-segmentation demo, I get a CUDA error in the cudaExtractCluster part, while the cudaSegmentation part runs fine.
Other demos, such as cuda-filter, run fine.

NvMapReserveOp 0x80000003 failed [22]
NvMapReserveOp 0x80000001 failed [22]
NvMapReserveOp 0x80000000 failed [22]

Do you know why this happens?
And can you help me use the Cuda-segmentation?

Thank you!

Jetson AGX Xavier Developer Kit
PCL 1.8
CUDA 10.2

cluster size always zero using cuda-cluster

Using cuda-cluster on a down-sampled RoboSense point cloud, cudaExtractCluster always returns zero clusters, while the regular pcl::EuclideanClusterExtraction works fine.

env:
jetson xavier nx
jetpack 4.6
ubuntu18.04
cuda10.2

sample pcd file: sample_pcd.zip

Run the cuda-cluster demo with a modified 'extractClusterParam' in testCUDA:

  ecp.minClusterSize = 5;
  ecp.maxClusterSize = 2500000;
  ecp.voxelX = 0.2;
  ecp.voxelY = 0.2;
  ecp.voxelZ = 0.2;
  ecp.countThreshold = 20;

I tried several different cluster parameters, and none succeeded. :(

Linker Error

I keep getting linker errors for the libcudasegmentation file, with the error "/usr/bin/ld: ./lib/libcudasegmentation.so: error adding symbols: file in wrong format".
What could be causing this?

Unwanted output LINE:178 of cuCluster

Hello, thanks for the repository.

Whenever I execute the extract method of cuCluster, it prints the following:
LINE:178 18696
The number after 178 changes across calls. This happens with the demo code and in other places as well. What does this output mean, and how can I get rid of it? The output of the demo code on my Jetson AGX Xavier is below. I checked out the jp5.x branch, which is compatible with the JetPack version I have.

GPU has cuda devices: 1
----device id: 0 info----
GPU : Xavier
Capbility: 7.2
Global memory: 14907MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

-------------- test CUDA lib -----------
-------------- cudaExtractCluster -----------
LINE:178 18696
CUDA extract by Time: 14.91 ms.
PointCloud representing the Cluster: 162152 data points.
PointCloud representing the Cluster: 7098 data points.
PointCloud representing the Cluster: 1263 data points.
PointCloud representing the Cluster: 257 data points.

-------------- test PCL lib -----------
PCL(CPU) cluster kd-tree by Time: 92.0657 ms.
PCL(CPU) cluster extracted by Time: 5042.35 ms.
PointCloud cluster_indices: 4.
PointCloud representing the Cluster: 166789 data points.
PointCloud representing the Cluster: 7410 data points.
PointCloud representing the Cluster: 1318 data points.
PointCloud representing the Cluster: 427 data points.

cuda-pcl source

If possible, I would like to get the cuda-pcl source code for study. Best wishes.

No description of how cuda-cluster operates

I am trying to use clustering, but without documentation or any description it is nearly impossible to tune the parameters. There should be at least a note in the documentation so that users know what each parameter does.

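Ground points flicker when segmenting the ground from a dynamic point cloud

I modified cuPCL's segmentation code to add a ROS interface that subscribes to a point cloud bag being played back and performs ground segmentation. Sometimes the ground points are removed well, and sometimes they are not removed at all. My lidar platform is stationary relative to the ground, yet the ground points flicker at a certain frequency. I would like to know what is going wrong.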
