nvidia-ai-iot / cupcl

A project demonstrating how to use the libs of cuPCL.

License: MIT License

Makefile 29.20% C++ 70.80%

cupcl's Issues

No need to memset output memory space in pcl testing block.

https://github.com/NVIDIA-AI-IOT/cuda-pcl/blob/3ca9dba5351f7f8eb35f1b73fa1022c1296f5be5/cuda-filter/main.cpp#L207

and

https://github.com/NVIDIA-AI-IOT/cuda-pcl/blob/3ca9dba5351f7f8eb35f1b73fa1022c1296f5be5/cuda-filter/main.cpp#L232

Thanks.

library implementation

Is the .so library in cuda-pcl a wrapper around the PCL CUDA implementation?
https://github.com/PointCloudLibrary/pcl/blob/master/cuda/filters/include/pcl/cuda/filters/passthrough.h

------------checking CUDA VoxelGrid---------------- Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700 Aborted

When I filter with CUDA VoxelGrid and then prepare the filtered data for clustering, constructing the cudaExtractCluster object fails with:
------------checking CUDA VoxelGrid----------------
Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700
Aborted
The CUDA VoxelGrid function works fine when no cudaExtractCluster object is created. This is my first time programming with this library, so please tell me the reason for this error and how I can combine several library functions in one program. Thanks, I appreciate your help. Here is one version of the code, which I have tried several times.

void testCUDA(pcl::PointCloud<pcl::PointXYZ>::Ptr cloudSrc,
              pcl::PointCloud<pcl::PointXYZ>::Ptr cloudDst) {
  std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
  std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::ratio<1, 1000>> time_span =
      std::chrono::duration_cast<
          std::chrono::duration<double, std::ratio<1, 1000>>>(t2 - t1);
  cudaStream_t stream = NULL;
  cudaStreamCreate(&stream);
  unsigned int nCount = cloudSrc->width * cloudSrc->height;
  float *inputData = (float *)cloudSrc->points.data();
  cloudDst->width = nCount;
  cloudDst->height = 1;
  cloudDst->resize(cloudDst->width * cloudDst->height);
  float *outputData = (float *)cloudDst->points.data();
  memset(outputData, 0, sizeof(float) * 4 * nCount);
  std::cout << "\n------------checking CUDA ---------------- " << std::endl;
  std::cout << "CUDA Loaded " << cloudSrc->width * cloudSrc->height
            << " data points from PCD file with the following fields: "
            << pcl::getFieldsList(*cloudSrc) << std::endl;
  float *input = NULL;
  cudaMallocManaged(&input, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync(stream, input);
  cudaMemcpyAsync(input, inputData, sizeof(float) * 4 * nCount,
                  cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);
  float *output = NULL;
  cudaMallocManaged(&output, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync(stream, output);
  cudaStreamSynchronize(stream);
  cudaFilter filterTest;
  FilterParam_t setP;
  FilterType_t type;
  unsigned int countLeft = 0;
  std::cout << "\n------------checking CUDA VoxelGrid---------------- "
            << std::endl;
  type = VOXELGRID;
  setP.type = type;
  setP.voxelX = 0.02;
  setP.voxelY = 0.02;
  setP.voxelZ = 0.02;
  filterTest.set(setP);
  int status = 0;
  cudaDeviceSynchronize();
  t1 = std::chrono::steady_clock::now();
  status = filterTest.filter(output, &countLeft, input, nCount);
  cudaDeviceSynchronize();
  t2 = std::chrono::steady_clock::now();
  if (status != 0)
    return;
  time_span = std::chrono::duration_cast<
      std::chrono::duration<double, std::ratio<1, 1000>>>(t2 - t1);
  std::cout << "CUDA VoxelGrid by Time: " << time_span.count() << " ms."
            << std::endl;
  std::cout << "CUDA VoxelGrid before filtering: " << nCount << std::endl;
  std::cout << "CUDA VoxelGrid after filtering: " << countLeft << std::endl;
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloudNew(
      new pcl::PointCloud<pcl::PointXYZ>);
  cloudNew->width = countLeft;
  cloudNew->height = 1;
  cloudNew->points.resize(cloudNew->width * cloudNew->height);
  int check = 0;
  for (std::size_t i = 0; i < cloudNew->size(); ++i) {
    cloudNew->points[i].x = output[i * 4 + 0];
    cloudNew->points[i].y = output[i * 4 + 1];
    cloudNew->points[i].z = output[i * 4 + 2];
  }
  pcl::io::savePCDFileASCII("after-cuda-VoxelGrid.pcd", *cloudNew);
  {
    cudaStream_t stream2;
    cudaStreamCreate(&stream2);
    float *input2Data = (float *)cloudNew->points.data();
    float *input2 = NULL;
    cudaMallocManaged(&input2, sizeof(float) * 4 * nCount, cudaMemAttachHost);
    cudaStreamAttachMemAsync(stream2, input2);
    cudaMemcpyAsync(input2, input2Data, sizeof(float) * 4 * nCount,
                    cudaMemcpyHostToDevice, stream2);
    cudaStreamSynchronize(stream2);
    float *output2 = NULL;
    cudaMallocManaged(&output2, sizeof(float) * 4 * nCount, cudaMemAttachHost);
    cudaStreamAttachMemAsync(stream2, output2);
    cudaStreamSynchronize(stream2);
    cudaExtractCluster cudaec;
    extractClusterParam_t ecp;
    ecp.minClusterSize = 100;
    ecp.maxClusterSize = 2500000;
    ecp.voxelX = 0.05;
    ecp.voxelY = 0.05;
    ecp.voxelZ = 0.05;
    ecp.countThreshold = 20;
    cudaec.set(ecp);
    unsigned int *indexEC = NULL;
    cudaMallocManaged(&indexEC, sizeof(float) * 4 * nCount, cudaMemAttachHost);
    cudaStreamAttachMemAsync(stream2, indexEC);
    cudaMemsetAsync(indexEC, 0, sizeof(float) * 4 * nCount, stream2);
    cudaStreamSynchronize(stream2);
  }
  cudaFree(input);
  cudaFree(output);
  cudaStreamDestroy(stream);
}
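
One detail in the code above may be worth checking independently of the library: cloudNew holds only countLeft points, yet the second-stage copy reads sizeof(float) * 4 * nCount bytes from it, which runs past the end of the host buffer whenever the filter removed points. The sketch below sizes second-stage buffers from the filtered count instead. The names kFloatsPerPoint, pointBufferBytes and copyFilteredPoints are hypothetical helpers, and this is not confirmed to be the cause of the reported error.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The demos pass each point as x, y, z plus one padding float.
constexpr std::size_t kFloatsPerPoint = 4;

// Bytes needed for `pointCount` points in the 4-float layout.
std::size_t pointBufferBytes(std::size_t pointCount) {
  return sizeof(float) * kFloatsPerPoint * pointCount;
}

// Copy only the points the filter kept (countLeft), never the original nCount.
std::vector<float> copyFilteredPoints(const float* filterOutput,
                                      std::size_t countLeft) {
  assert(filterOutput != nullptr || countLeft == 0);
  if (countLeft == 0) {
    return {};
  }
  return std::vector<float>(filterOutput,
                            filterOutput + kFloatsPerPoint * countLeft);
}
```

With helpers like these, the second-stage allocation and copy would use pointBufferBytes(countLeft) rather than sizeof(float) * 4 * nCount.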

What point types are supported?

According to your demo code, each point takes 4 floats. Does this mean the only supported point types are PointXYZ, PointXYZRGB and PointXYZI? Could you add support for PointXYZINormal in the future?
Thanks
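
For context on the layout question: in PCL, pcl::PointXYZ is stored as four floats (x, y, z and one padding float), which matches the 4-float stride in the demos, whereas pcl::PointXYZI and pcl::PointXYZRGB normally occupy more than 16 bytes per point. A minimal sketch of repacking XYZI data into a 4-float stride follows. PointXYZIHost and packXyziToStride4 are hypothetical names, and whether cuPCL preserves or ignores the fourth float is an assumption that would need checking against the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side point with an intensity channel.
struct PointXYZIHost {
  float x, y, z, intensity;
};

// Pack points as x, y, z, intensity so each point occupies exactly four floats.
std::vector<float> packXyziToStride4(const std::vector<PointXYZIHost>& cloud) {
  std::vector<float> packed;
  packed.reserve(cloud.size() * 4);
  for (const PointXYZIHost& p : cloud) {
    packed.push_back(p.x);
    packed.push_back(p.y);
    packed.push_back(p.z);
    // Assumption: the library leaves the fourth slot untouched.
    packed.push_back(p.intensity);
  }
  assert(packed.size() == cloud.size() * 4);
  return packed;
}
```

The packed vector's data() pointer could then be copied to the device in place of the cast pcl::PointXYZ buffer used in the demos.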

How to display the pointcloud?

When I run "./demo [*.pcd]" in the cuda-icp directory, I just get this result:
GPU has cuda devices: 1
----device id: 0 info----
GPU : Xavier
Capbility: 7.2
Global memory: 31918MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Loaded 7000 data points for P with the following fields: x y z
Loaded 7000 data points for Q with the following fields: x y z
iter.Maxiterate 0
iter.threshold 1e-12
iter.acceptrate 1

Target rigid transformation : cloud_in -> cloud_icp
Rotation matrix :
| 0.923880 -0.382683 0.000000 |
R = | 0.382683 0.923880 0.000000 |
| 0.000000 0.000000 1.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

------------checking CUDA ICP(GPU)----------------
CUDA ICP by Time: 0.797888 ms.
CUDA ICP fitness_score: 0.777453
matrix_icp calculated Matrix by Class ICP
Rotation matrix :
| 1.000000 0.000000 -0.000000 |
R = | -0.000000 1.000000 0.000000 |
| -0.000000 0.000000 1.000000 |
Translation vector :
t = < -0.000000, 0.000000, -0.000000 >

------------checking PCL ICP(CPU)----------------
PCL icp.align Time: 38.2758 ms.
has converged: 1 score: 0.651369
CUDA ICP fitness_score: 0.651369
transformation_matrix:
0.999905 0.00279406 0.0134922 0.0161865
-0.00265722 0.999945 -0.010151 0.00527596
-0.0135198 0.0101141 0.999858 0.0133578
0 0 0 1

------------checking PCL GICP(CPU)----------------
PCL Gicp.align Time: 144.663 ms.
has converged: 1 score: 0.541552
transformation_matrix:
0.99874 0.00468762 0.0499603 -0.0427716
-0.00344507 0.999683 -0.0249281 0.0265501
-0.0500613 0.0247246 0.99844 0.148036
0 0 0 1

So I would really like to know how to display the result, as in https://developer.nvidia.com/zh-cn/blog/cuda-pcl-1-0-jetson/

Why does cuNDT not work on x86 Ubuntu 20.04?

~/cuPCL/cuNDT$ ./demo

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA A800 80GB PCIe
Capbility: 8.0
Global memory: 81085MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Loaded 7000 data points for P with the following fields: x y z
Loaded 7000 data points for Q with the following fields: x y z
Target rigid transformation : cloud_P -> cloud_Q
Rotation matrix :
| 0.923880 -0.382683 0.000000 |
R = | 0.382683 0.923880 0.000000 |
| 0.000000 0.000000 1.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

------------checking PCL NDT(CPU)----------------
PCL align Time: 27.1937 ms.
Normal Distributions Transform has converged: 1 score: 0.648334
Rotation matrix :
| 0.999894 0.004857 0.013688 |
R = | -0.004680 0.999905 -0.012931 |
| -0.013750 0.012865 0.999823 |
Translation vector :
t = < 0.015418, 0.056840, 0.078443 >

------------checking CUDA NDT(GPU)----------------
CUDA NDT by Time: 0.777725 ms.
CUDA NDT fitness_score: 0.349491
Rotation matrix :
| 0.000000 0.000000 0.000000 |
R = | 0.000000 0.000000 0.000000 |
| 0.000000 0.000000 0.000000 |
Translation vector :
t = < 0.000000, 0.000000, 0.000000 >

cuda failure

------------checking CUDA PassThrough ----------------
Cuda failure: no kernel image is available for execution on the device at line 102 in file cudaFilter.cpp error status: 209
Aborted (core dumped)

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7851MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

------------checking CUDA ----------------
CUDA Loaded 119978 data points from PCD file with the following fields: x y z

Source code missing?

This looks interesting, but there seems to be no source included in the repo, only .so files.

Any plans on releasing them?

Could you add some comments to the headers?

It would be clearer with more comments (as in cuda-pcl).

In cuda-octree it is harder to follow.
I found that the distance returned by approxNearestSearch is 1e9 times the real sqr_distance,
while radiusSearch returns the real value directly.

Cuda filter demo, cuda-pcl is worse than pcl when I use the VoxelGrid

cuda-pcl performs better than pcl in PassThrough, but not in VoxelGrid.

Undefined reference to log@glibc_2.29 and pow

As in the title: the error occurs while compiling cuda-segmentation and cuda-ndt.
The makefile was generated by CMake.
Configuration:

ubuntu 18.04
pcl 1.12.1
jetpack 4.6
Jetson tx2

Segmentation Fault depending on the point cloud in `cuCluster`

Hello,

I get a segmentation fault only for some point clouds and only with small voxel size parameters for cuCluster. It does not seem related to the point cloud size. If I make the voxel sizes larger, however, the segmentation fault disappears. I thought the number of voxels needed for the volume might exceed INT_MAX and cause integer overflow, but my calculation shows the required number of voxels is nowhere near the limit.

I could provide you with some point cloud data that produce the error, if you wish to reproduce.

Because the library is not open source, I cannot really debug. It looks like the cudaExtractClusterImpl::extract function tries to write into the output array, however, the position it tries to write is out of bounds.

// call stack inside the library
libcudacluster.so!cudaExtractClusterImpl::extract(float*, int, float*, unsigned int*)
libcudacluster.so!cudaExtractCluster::extract(float*, int, float*, unsigned int*)

// The limits of the  point cloud
// is [0m,0m,0m] to ~[3m, 2m, 6m] meters.
// I see segmentation faults for voxel size 0.05.
// for 0.1, I do not.
ecp.voxelX = 0.05;
ecp.voxelY = 0.05;
ecp.voxelZ = 0.05;
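
The overflow estimate mentioned above can be reproduced with a few lines. The sketch below computes the number of voxels covering an axis-aligned extent in 64-bit arithmetic and compares it with the signed 32-bit limit. voxelGridCells and exceedsInt32 are hypothetical helper names, and the library's internal grid layout is unknown, so this only bounds a dense grid.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Number of voxels needed to cover an axis-aligned extent. The product is
// taken in 64 bits so it cannot wrap the way a 32-bit int could.
std::uint64_t voxelGridCells(double extentX, double extentY, double extentZ,
                             double voxelX, double voxelY, double voxelZ) {
  assert(extentX >= 0.0 && extentY >= 0.0 && extentZ >= 0.0);
  assert(voxelX > 0.0 && voxelY > 0.0 && voxelZ > 0.0);
  const auto cellsX = static_cast<std::uint64_t>(std::ceil(extentX / voxelX));
  const auto cellsY = static_cast<std::uint64_t>(std::ceil(extentY / voxelY));
  const auto cellsZ = static_cast<std::uint64_t>(std::ceil(extentZ / voxelZ));
  return cellsX * cellsY * cellsZ;
}

// True when the cell count would not fit in a signed 32-bit index.
bool exceedsInt32(std::uint64_t cells) {
  return cells > static_cast<std::uint64_t>(
                     std::numeric_limits<std::int32_t>::max());
}
```

For the extents quoted above (about 3 m x 2 m x 6 m) and a 0.05 m voxel, this gives roughly 60 x 40 x 120 = 288,000 cells, far below INT_MAX, which is consistent with the conclusion that plain index overflow is an unlikely explanation.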

Voxeldownsampling and Cluster Filter - “Warp out of Range Address”

Problem Explanation

Hello there! I am trying to use the cuPCL repository (https://github.com/NVIDIA-AI-IOT/cuPCL) to preprocess the point cloud with a voxel downsampling filter before running the provided clusterer. The program runs smoothly without the voxel downsampling, but the problem appears as soon as I merely create an instance of the filter, as shown below:

cudaExtractCluster cudaec(stream);
cudaFilter filterTest(stream);

Using either one alone works, but creating both produces: Cuda failure: an illegal memory access was encountered at line 138 in file cudaFilter.cpp error status: 700

After some debugging with CUDA-GDB and CUDA-MEMCHECK I arrived at the following results, but I am not sure they can be acted on, because the classes are implemented in precompiled .so files:

Both classes invoke cudaFillVoxelGirdKernel, and the error occurs at the kernel launch in whichever function call launches it first:

Thread 1 "collision_avoid" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 0, grid 6, block (3,0,0), thread (160,0,0), device 0, sm 6, warp 4, lane 0]
0x0000555555d50eb0 in cudaFillVoxelGirdKernel(float4*, int4*, int4*, float4*, unsigned int, float, float, float) ()

The Thread is trying to write 4 bytes into some Global Memory address (CUDA MEMCHECK):

Invalid __global__ write of size 4

And from debugging:

Illegal access to address (@global)0x8007b0800c60 detected
(cuda-gdb) print *0x8007b0800c60
Error: Failed to read local memory at address 0x8007b0800c60 on device 0 sm 0 warp 9 lane 0, error=CUDBG_ERROR_INVALID_MEMORY_ACCESS(0x8).

Moreover the following CUDA API Error is Returned:

warning: Cuda API error detected: cuGetProcAddress returned (0x1f4)

This indicates that a named symbol was not found. Examples of symbols are global/constant variable names, driver function names, texture names, and surface names.

What I do not understand is why, from the thread's scope, the address is treated as a local address when it actually seems to be a global one, and whether the CUDA API error could be a lead of some sort.

Note that cudaMallocManaged (unified memory) was used for memory transfer, and switching to explicit memory transfers did not solve the issue.

Another attempt to solve the issue was to set every CUDA limit explicitly to the value the device reports, as follows:

  size_t limit = 0;
  cudaDeviceGetLimit(&limit, cudaLimitStackSize);
  std::cout << "Stack limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitStackSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitPrintfFifoSize);
  std::cout << "cudaLimitPrintfFifoSize limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitPrintfFifoSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitMallocHeapSize);
  std::cout << "cudaLimitMallocHeapSize limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitMallocHeapSize, limit);

  cudaDeviceGetLimit(&limit, cudaLimitDevRuntimeSyncDepth);
  std::cout << "cudaLimitDevRuntimeSyncDepth limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, limit);

  cudaDeviceGetLimit(&limit, cudaLimitDevRuntimePendingLaunchCount);
  std::cout << "cudaLimitDevRuntimePendingLaunchCount limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, limit);

  cudaDeviceGetLimit(&limit, cudaLimitMaxL2FetchGranularity);
  std::cout << "cudaLimitMaxL2FetchGranularity limit is: " << limit << std::endl;
  cudaDeviceSetLimit(cudaLimitMaxL2FetchGranularity, limit);

But this changed nothing.

Device Info

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   56C    P8    18W /  N/A |    123MiB /  7982MiB |     32%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

  Dev PCI Bus/Dev ID  Name Description                                   SM Type  
*   0  01:00.0        NVIDIA GeForce RTX 2080 Super with Max-Q Design     TU104-A

SMs    Warps/SM Lanes/Warp Max Regs/Lane    Active SMs Mask 
sm_75  48       32         32           256 0x00000000000000000000ffffffffffff

Using Ros Noetic and Ubuntu 20.04

Any plans to release source code?

There are some errors when I use it.
I want to change some of the source code to suit my program.
Thanks.

Cuda failure: the launch timed out and was terminated at line 59 in file cudaICP.cpp error status: 702

When I run the cuda-icp.cpp example with PCD data, the output is:

Cuda failure: the launch timed out and was terminated at line 59 in file cudaICP.cpp error status: 702
Aborted (core dumped)

How can I fix this?
Thank you.

Cuda PCL with ROS

Is it possible to use the cuda PCL with ros? And if so, how?
Thanks in Advance.

Any plan to add thrust support?

Hi,

Thrust gives us C++ STL-style CUDA programming, which makes CUDA programming easier and safer.

Will thrust support be added to this project?

Thanks!

Cuda failure: invalid argument at line 126 in file cudaFilter.cpp error status: 1

Cuda failure: invalid argument at line 126 in file cudaFilter.cpp error status: 1. Has anyone encountered this problem? The fault appears when I use my camera to generate the point cloud. If you have some free time and happen to see my question, I would appreciate it if you could tell me how to solve it.
Best wishes!

Use with pcl::PointXYZI

I need to use this library with the above point type. Is that possible?

how to apply cuPCL on dynamic pointcloud bag file, instead of pcd file?

When I run "./demo [*.pcd]" in the directory of cuda-segmentation and cuda-filter

GPU has cuda devices: 1
----device id: 0 info----
GPU : Xavier
Capbility: 7.2
Global memory: 31918MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

[pcl::PCDReader::readHeader] Could not find file '[.pcd]'.
Error:can not open the file: [.pcd]

I cannot find where the problem is. Can you help me?

We found the bug when using this function: "ndtTest.ndt((float *)PUVM, nP, (float *)QUVM, nQ, guess, cudaMatrix, stream);"

localization_node: /usr/include/eigen3/Eigen/src/SVD/SVDBase.h:85: const MatrixUType& Eigen::SVDBase::matrixU() const [with Derived = Eigen::JacobiSVD<Eigen::Matrix<float, 3, 3>, 2>; Eigen::SVDBase::MatrixUType = Eigen::Matrix<float, 3, 3>; typename Eigen::internal::traits::MatrixType::Scalar = float]: Assertion `m_isInitialized && "SVD is not initialized."' failed.

cuda version : 11.4
pcl version : 1.10
system : ubuntu 20.04
eigen version : 3.3.7
platform: Orin

How to add a point cloud intensity (strength) field

Request for .so library in AMD64 Platform

Thank you for providing CUDA implementation with PCL.

Can you please share the .so library in AMD64 platform?

On the x86 code branch, the process dies when the demo runs

cuda version : 10.02
pcl version : 1.8
system : ubuntu 18.04

@Haoyu-NV

cuFilter: applying constraints on all 3 axes at once

How can I apply filter constraints on x, y and z at the same time? The following code only applies the filter for the most recently set axis.

FilterParam_t setPx, setPy, setPz; // filter parameters for each axis

FilterType_t type = PASSTHROUGH; // only passthrough filter implemented in cuPCL library for now

// Each block below sets the constraint for a single axis.
setPx.type = type;
setPx.dim = 0; // dim is 0, 1, 2 for the x, y, z axes
setPx.upFilterLimits = 1.5;
setPx.downFilterLimits = -1.5;
setPx.limitsNegative = false;
filterTest.set(setPx);


setPy.type = type;
setPy.dim = 1; // y axis
setPy.upFilterLimits = 1.5;
setPy.downFilterLimits = -1.5;
setPy.limitsNegative = false;
filterTest.set(setPy);


setPz.type = type;
setPz.dim = 2; // z axis
setPz.upFilterLimits = 2.0;
setPz.downFilterLimits = 0.0;
setPz.limitsNegative = false;
filterTest.set(setPz);

// Only the last set() call (z axis) is in effect when filter() runs.
filterTest.filter(output, &countLeft, input, nCount);
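
For reference, the intended result of constraining all three axes can be stated as a chain of single-axis passes, each consuming the previous pass's output. The sketch below does this on the CPU over the same 4-float point layout. AxisLimit, passThroughCpu and cropBoxCpu are hypothetical names, the inclusive bounds are an assumption, and it is not confirmed that chaining set and filter calls this way is how cudaFilter is meant to be used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One axis constraint, mirroring dim / downFilterLimits / upFilterLimits.
struct AxisLimit {
  int dim;      // 0, 1, 2 for x, y, z
  float lower;  // inclusive lower bound (assumption)
  float upper;  // inclusive upper bound (assumption)
};

// CPU stand-in for a single PassThrough call on a 4-float-stride buffer.
std::vector<float> passThroughCpu(const std::vector<float>& points,
                                  const AxisLimit& limit) {
  assert(points.size() % 4 == 0);
  assert(limit.dim >= 0 && limit.dim <= 2);
  std::vector<float> kept;
  kept.reserve(points.size());
  for (std::size_t i = 0; i + 3 < points.size(); i += 4) {
    const float value = points[i + static_cast<std::size_t>(limit.dim)];
    if (value >= limit.lower && value <= limit.upper) {
      for (std::size_t k = 0; k < 4; ++k) {
        kept.push_back(points[i + k]);
      }
    }
  }
  return kept;
}

// Apply the limits one axis at a time, feeding each result into the next pass.
std::vector<float> cropBoxCpu(std::vector<float> points,
                              const std::vector<AxisLimit>& limits) {
  for (const AxisLimit& limit : limits) {
    points = passThroughCpu(points, limit);
  }
  return points;
}
```

The GPU analogue would presumably call set and filter once per axis, feeding each pass's output and countLeft into the next; the CPU version can then serve as a check on that result.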

Cuda filter demo, cuda-pcl is worse than pcl if MemCpyTime(4.6285ms) is considered.

Hi everyone,

I added a time-span measurement for the input and output memory allocation.

This kind of memory allocation is needed before every CUDA function call.

The code is below.

  t1 = std::chrono::steady_clock::now();
  cudaMallocManaged(&input, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync (stream, input );
  cudaMemcpyAsync(input, inputData, sizeof(float) * 4 * nCount, cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);

  float *output = NULL;
  cudaMallocManaged(&output, sizeof(float) * 4 * nCount, cudaMemAttachHost);
  cudaStreamAttachMemAsync (stream, output );
  cudaStreamSynchronize(stream);
  t2 = std::chrono::steady_clock::now();
  auto time_span1 = std::chrono::duration_cast<std::chrono::duration<double, std::ratio<1, 1000>>>(t2 - t1);

And here is my test result: MemCpy by Time 4.6285 ms.

So, counting the allocation and copy time, the real per-frame cost of the PassThrough filter is:

cuda-pcl (4.6285 + 0.456927 = 5.085427 ms) is not better than pcl (4.25133 ms).

So what is the best practice for programming with cuda-pcl?

Thanks.
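
One common way to reduce the per-frame cost measured above is to allocate buffers once and reuse them across frames, so that only the copy remains inside the loop. The sketch below shows that pattern with host memory standing in for cudaMallocManaged. ReusableBuffer is a hypothetical helper, and whether this closes the gap with PCL on a given device is not something the measurements above establish.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Grows only when a frame needs more room than any earlier frame did.
class ReusableBuffer {
 public:
  // Returns storage for at least `count` floats.
  float* ensure(std::size_t count) {
    assert(count > 0);
    if (count > capacity_) {
      // Stand-in for cudaFree followed by cudaMallocManaged.
      data_.reset(new float[count]);
      capacity_ = count;
      ++allocations_;
    }
    return data_.get();
  }

  // How many times the underlying storage was (re)allocated.
  std::size_t allocations() const { return allocations_; }

 private:
  std::unique_ptr<float[]> data_;
  std::size_t capacity_ = 0;
  std::size_t allocations_ = 0;
};
```

In a streaming loop, ensure(4 * nCount) would be called once per frame and would allocate only when a frame is larger than every earlier one, so the allocation share of the 4.6285 ms would be paid rarely rather than on every call.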

initialization error

The same or a similar error appears in all applications.
Configuration:

Jetson tx2
ubuntu 18.04 (in docker)
Jetpack 4.6
pcl 1.12.1

root@linux:/cuda-pcl/cuda-icp# ./cuda_segmentation test_Q.pcd

GPU has cuda devices: 1
----device id: 0 info----
  GPU : NVIDIA Tegra X2 
  Capbility: 6.2
  Global memory: 3833MB
  Const memory: 64KB
  SM in a block: 48KB
  warp size: 32
  threads in a block: 1024
  block dim: (1024,1024,64)
  grid dim: (2147483647,65535,65535)

Loaded 7000 data points for P with the following fields: x y z
Loaded 7000 data points for Q with the following fields: x y z
 iter.Maxiterate 0
 iter.threshold 1e-12
 iter.acceptrate 1

Target rigid transformation : cloud_in -> cloud_icp
Rotation matrix :
    | 0.923880 -0.382683 0.000000 | 
R = | 0.382683 0.923880 0.000000 | 
    | 0.000000 0.000000 1.000000 | 
Translation vector :
t = < 0.000000, 0.000000, 0.200000 >

------------checking CUDA ICP(GPU)---------------- 
Cuda failure: initialization error at line 189 in file cudaICP.cpp error status: 3
Aborted (core dumped)

Merge work into PCL proper

https://discourse.ros.org/t/10x-acceleration-for-point-cloud-processing-with-cuda-pcl-on-jetson/18800/2?u=smac

Running on JP 4.4.0

Just curious whether something fundamentally different prevents it from running on JP 4.4, or whether it has simply only been tested on JP 4.4.1, per the README.md.

Support for multiple filters simultaneously?

Is there support for filtering X,Y, and Z simultaneously to extract a bounding box/cube from a point cloud?

For instance, in PCL lib this is achieved either using CropBox or ConditionOr settings as shown here:
https://stackoverflow.com/questions/45790828/remove-points-outside-defined-3d-box-inside-pcl-visualizer

Does similar functionality exist for this? The example only shows filtering 1 dimension ("X") at a time.

Could you help us recompile the library for Jetson TX2?

Hi! I am trying to use CUDA-PCL on Jetson TX2 (with Jetpack 4.5, CUDA-10.2 and PCL 1.8.1), but I have encountered a CUDA failure problem. It would be great if you can help me out. Here is the output when I run ./demo in cuda-pcl/cuda-segmentation:

nvidia@nvidia-tx2:~/Downloads/cuda-pcl/cuda-segmentation$ ./demo sample.pcd

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7850MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

Cuda failure: no kernel image is available for execution on the device at line 310 in file cudaSegmentation.cpp error status: 209
Aborted (core dumped)

I guess the lib*.so binaries were not compiled for sm_62, so they cannot be executed on Jetson TX2. I would appreciate it if you could fix this for us.

[QST] Can this lib be used on Drive AGX Orin?

Drive AGX Orin is the Tegra platform specifically for autonomous driving.

Can this lib be used on Drive AGX Orin?

Thanks.

Cuda failure: invalid argument at line 123 in file cudaFilter.cpp error status: 1

The file cudaFilter.cpp is not provided. It seems difficult to locate this issue.

Segmentation fault (program cc1plus) with statistical_outlier_removal header

Hi all,
I am trying to build cuda-filter with make, but g++ crashes. I figured out that statistical_outlier_removal.h was causing the error, so I commented it out. Do you know why this is happening? I had a similar issue in another program, where pcl/common/transforms.h caused the same problem. I had to work around it using an alternative library from ROS, tf2.
I have Jetson Nano Developer Kit 4GB memory with 2 GB of swap size (I have already tried increasing the swap size).

Cuda failure: no kernel image is available

Hi, I tried to run your demo, but something went wrong. The printed output is below:

GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA Tegra X2
Capbility: 6.2
Global memory: 7850MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

------------checking CUDA ----------------
CUDA Loaded 119978 data points from PCD file with the following fields: x y z

What should I do? Please help me.

Will this work in a 64-bit Ubuntu 18.04 computer?

I'm getting the following error when I try to make the cuda-filter code:

(base) ➜  cuda-filter make
USE Default CUDA DIR: /usr/local/cuda
TARGET_ARCH: x86_64
CUDA_VERSION: 10020
SMS: 30 35 50 53 60 61 70 72 
g++  -I/usr/local/cuda/include -I/include -I/usr/local/include -I/usr/include/eigen3/ -I/usr/include/pcl-1.8/ -I/usr/include/vtk-6.3/ -D_REENTRANT -std=c++11 -O2 -fPIC -o obj/main.o -c main.cpp
g++ -D_REENTRANT -std=c++11  -O2 -o demo obj/main.o  -L/usr/lib -L/usr/local/lib -L/usr/local/cuda/lib64 -lcudart_static -lrt -ldl -lpthread -lcudart -L/lib64 -lcudnn -lpthread -L/usr/lib/aarch64-linux-gnu/ -lboost_system -lpcl_common -lpcl_io -lpcl_recognition -lpcl_features -lpcl_sample_consensus -lpcl_octree -lpcl_search -lpcl_filters -lpcl_kdtree -lpcl_segmentation -lpcl_visualization ./lib/libcudafilter.so
./lib/libcudafilter.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
Makefile:173: recipe for target 'demo' failed
make: *** [demo] Error 1

I wonder if it has anything to do with my computer architecture or am I missing something I need to do?

Thanks!

Cuda-segmentation: NvMapReserveOp failed on Jetson Xavier 32gb

Hello.

Thank you for efforts in providing CUDA-optimized PCL algorithms.

With the cuda-segmentation demo, I get a CUDA error in the cudaExtractCluster part, while the cudaSegmentation part runs fine.
Other demos such as cuda-filter run fine.

NvMapReserveOp 0x80000003 failed [22]
NvMapReserveOp 0x80000001 failed [22]
NvMapReserveOp 0x80000000 failed [22]

Do you know why this happens?
And can you help me use the Cuda-segmentation?

Thank you!

Jetson AGX Xavier Developer Kit
PCL 1.8
CUDA 10.2

[FEA] Introducing ros wrapper perception_cuda_pcl.

Hi, the repo perception_cuda_pcl is the ROS interface for cuda_pcl. It is inspired by perception_open3d and depends on perception_pcl.

So anyone who uses cuda_pcl through ROS C++ can have a look.

Thanks.

cluster size always zero using cuda-cluster

Using cuda-cluster on a down-sampled RoboSense point cloud, cudaExtractCluster always returns zero clusters, while the standard pcl::EuclideanClusterExtraction works fine.

env:
jetson xavier nx
jetpack 4.6
ubuntu18.04
cuda10.2

sample pcd file: sample_pcd.zip

run the cuda-cluster demo with modified 'extractClusterParam' in testCUDA

  ecp.minClusterSize = 5;
  ecp.maxClusterSize = 2500000;
  ecp.voxelX = 0.2;
  ecp.voxelY = 0.2;
  ecp.voxelZ = 0.2;
  ecp.countThreshold = 20;

I tried several different cluster parameters; none succeeded. :(

cuda failure: initialization error at line 196 in file cudaICP.cpp error status:3

I can build the cuICP project and generate the demo, but when I run the demo, this issue appears:

---------checking cUDA ICP(GPU)----------------
cuda failure: initialization error at line 196 in file cudaICP.cpp error status:3

My operating environment:
Xavier AGX 8G
cuda 10.2
Jetpack 4.5.1
pcl 1.8
vtk 6.3
eigen3

Linker Error

I keep getting linker errors for the libcudasegmentation file.
With the error "/usr/bin/ld: ./lib/libcudasegmentation.so: error adding symbols: file in wrong format"
What could be causing this?

Is JetPack 4.3 supported?

Unwanted output LINE:178 of cuCluster

Hello, thanks for the repository.

Whenever I execute the extract method of cuCluster, it prints the following:
LINE:178 18696
The number after 178 changes across calls. This happens with the demo code and also other places. What is the meaning of this output and how can I get rid of it? The output of the demo code in my Jetson AGX Xavier is below. I checked out jp5.x branch, which is compatible with the jetpack version I have.

GPU has cuda devices: 1
----device id: 0 info----
GPU : Xavier
Capbility: 7.2
Global memory: 14907MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)

-------------- test CUDA lib -----------
-------------- cudaExtractCluster -----------
LINE:178 18696
CUDA extract by Time: 14.91 ms.
PointCloud representing the Cluster: 162152 data points.
PointCloud representing the Cluster: 7098 data points.
PointCloud representing the Cluster: 1263 data points.
PointCloud representing the Cluster: 257 data points.

-------------- test PCL lib -----------
PCL(CPU) cluster kd-tree by Time: 92.0657 ms.
PCL(CPU) cluster extracted by Time: 5042.35 ms.
PointCloud cluster_indices: 4.
PointCloud representing the Cluster: 166789 data points.
PointCloud representing the Cluster: 7410 data points.
PointCloud representing the Cluster: 1318 data points.
PointCloud representing the Cluster: 427 data points.
