
Compiling Issues about gr-clenabled (closed)

ghostop14 commented on July 26, 2024
Compiling Issues

from gr-clenabled.

Comments (8)

ghostop14 commented on July 26, 2024

Hi Lucas! I haven't seen that one before. I'm running it on Ubuntu 16.04 as well. There's a doc in the setup_help directory with some tips for installing on Ubuntu. It definitely sounds like an OpenCL header file issue (possibly an old version). Want to take a look at the Ubuntu setup doc first and see if anything looks out of sync? It sounds potentially related to getting cl.h into the standard /usr/include/CL/ path. I rebuilt it on a clean Ubuntu install to verify, and it built okay here.

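The header-path suggestion above is easy to check mechanically. A minimal sketch (the search directories are typical Ubuntu/CUDA locations I'm assuming, not taken from the setup doc):

```python
import os

# Assumed typical locations for OpenCL headers on Ubuntu (not exhaustive).
SEARCH_DIRS = [
    "/usr/include",
    "/usr/local/include",
    "/usr/local/cuda/include",
]

def find_opencl_headers(search_dirs):
    """Return (directory, header) pairs for any CL/cl.h, CL/cl.hpp, or CL/cl2.hpp found."""
    found = []
    for d in search_dirs:
        for header in ("cl.h", "cl.hpp", "cl2.hpp"):
            path = os.path.join(d, "CL", header)
            if os.path.isfile(path):
                found.append((d, header))
    return found

if __name__ == "__main__":
    hits = find_opencl_headers(SEARCH_DIRS)
    if not hits:
        print("No OpenCL headers found -- install opencl-headers or an SDK.")
    for d, h in hits:
        print(f"Found {h} under {d}/CL/")
```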

racerxdl commented on July 26, 2024

I will try that :D


racerxdl commented on July 26, 2024

It only works if I force the Intel SDK headers into /usr/include/CL. The CUDA headers load by default because Ubuntu's default OpenCL package doesn't ship the .hpp header (only .h, so CMake doesn't find it), but CUDA only supports OpenCL up to 1.1.

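One way to confirm which OpenCL version a given cl.h declares is to look for its CL_VERSION_x_y macros. A hedged sketch (the sample header fragment below is illustrative, not copied from any real SDK):

```python
import re

def declared_cl_versions(header_text):
    """Return sorted (major, minor) tuples for every CL_VERSION_x_y macro defined."""
    versions = set()
    for m in re.finditer(r"#define\s+CL_VERSION_(\d+)_(\d+)\b", header_text):
        versions.add((int(m.group(1)), int(m.group(2))))
    return sorted(versions)

# Illustrative fragment only -- a CUDA-shipped cl.h of that era stops at 1.1.
sample = """
#define CL_VERSION_1_0 1
#define CL_VERSION_1_1 1
"""

print(declared_cl_versions(sample))  # [(1, 0), (1, 1)]
```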

racerxdl commented on July 26, 2024

Sadly, the GTX 980 here doesn't support OpenCL 1.2; I just saw that gr-clenabled only supports 1.2 :((


ghostop14 commented on July 26, 2024

You should actually be okay once it compiles. I ran it on a GTX 970, so it should run fine on a 980. I think the 1.2 setting was just for the way some functions were defined in the latest headers.

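To see what version each device actually reports, `clinfo` (or any tool that queries CL_DEVICE_VERSION) is the usual check. Since no OpenCL runtime may be available, here is a hedged sketch that just parses clinfo-style text; the sample lines are illustrative, not captured from real hardware:

```python
import re

def parse_device_versions(clinfo_text):
    """Map each 'Device Name' in clinfo-style output to its 'Device Version' string."""
    devices = {}
    name = None
    for line in clinfo_text.splitlines():
        m = re.match(r"\s*Device Name\s{2,}(.+)", line)
        if m:
            name = m.group(1).strip()
            continue
        m = re.match(r"\s*Device Version\s{2,}(.+)", line)
        if m and name:
            devices[name] = m.group(1).strip()
    return devices

# Illustrative clinfo-style fragment (hypothetical values).
sample = """\
  Device Name          GeForce GTX 980M
  Device Version       OpenCL 1.1 CUDA
"""
print(parse_device_versions(sample))
```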

racerxdl commented on July 26, 2024

Hmm, that's true. It just gave preference to the CPU OpenCL platform; when I disabled CPU OpenCL it ran fine on my 980:

┌─[lucas@nblucas] - [/media/ELTN/Works/gr-clenabled/build] - [Sáb Set 09, 22:01]
└─[$] <git:(master*)> ./lib/test-clenabled
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by ./lib/test-clenabled)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by ./lib/test-clenabled)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /media/ELTN/Works/gr-clenabled/build/lib/libgnuradio-clenabled-1.0.0git.so.0.0.0)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /media/ELTN/Works/gr-clenabled/build/lib/libgnuradio-clenabled-1.0.0git.so.0.0.0)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /usr/lib/x86_64-linux-gnu/libclFFT.so.2)
----------------------------------------------------------
Testing no-action kernel (return only) constant operation to measure OpenCL overhead
This value represent the 'floor' on the selected platform.  Any CPU operations have to be slower than this to even be worthy of OpenCL consideration unless you're just looking to offload.
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op Const building kernel with __constant params...
Max constant items: 8192
OpenCL INFO: Math Op Const building kernel with __constant params...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000097 s  (84172104.000000 sps)

----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op Const building kernel with __constant params...
Max constant items: 8192
OpenCL INFO: Math Op Const building kernel with __constant params...
Testing kernel that simply copies in[index]->out[index] 8192 items...
OpenCL Context: GPU
OpenCL Run Time:      0.000098 s  (83953656.000000 sps)


----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op Const building kernel with __constant params...
OpenCL INFO: Math Op Const building kernel with __constant params...
Testing complex Multiply/Add Const performance with 8192 items...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000103 s  (79827440.000000 sps)

CPU-Only Run Time:      0.000002 s  (3703871232.000000 sps)


----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: Log10 Const building kernel with __constant params...
OpenCL INFO: Log10 Const building kernel with __constant params...
Testing Log10 float performance with 8192 items...
Note: gnuradio log10 uses the following calculation: 'out[i] = n * log10(std::max(in[i], (float) 1e-18)) + k';
the extra max() function adds extra time to the call versus a straight log10.
OpenCL Context: GPU
OpenCL Run Time:      0.000041 s  (197957792.000000 sps)

CPU-only Run Time:      0.000382 s  (21464218.000000 sps)

----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: SNR Helper building kernel with __constant params...
OpenCL INFO: SNR Helper building kernel with __constant params...
Testing SNR Helper float performance with 8192 items...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000063 s  (129135336.000000 sps)

CPU-only Run Time:      0.000075 s  (109455960.000000 sps)


----------------------------------------------------------
Testing Costas Loop performance with 8192 items...
OpenCL: using NVIDIA CUDA
OpenCL INFO: Costas Loop building kernel with __constant params...
Max constant items: 8192
OpenCL INFO: Costas Loop building kernel with __constant params...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.020422 s  (401128.750000 sps)

CPU-only Run Time:      0.000499 s  (16430617.000000 sps)



----------------------------------------------------------
Testing Complex Signal Source
OpenCL: using NVIDIA CUDA
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000032 s  (257837968.000000 sps)

CPU-only Run Time:      0.000024 s  (339898560.000000 sps)

OpenCL: using NVIDIA CUDA
OpenCL: using NVIDIA CUDA
maximum error OpenCL versus gnuradio table lookup cos/sin: 0.000009/0.000008

----------------------------------------------------------
Testing complex operation (add/multiply/complex conj/mult conj) performance with 8192 items...
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op - too many items for constant memory.  Building kernel with __global params...
Max constant items: 4096
OpenCL INFO: Math Op - too many items for constant memory.  Building kernel with __global params...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000063 s  (130618856.000000 sps)

CPU-only Run Time:      0.000009 s  (880531712.000000 sps)



----------------------------------------------------------
Testing Complex to mag
OpenCL: using NVIDIA CUDA
OpenCL INFO: ComplexToMag building kernel with __constant params...
OpenCL INFO: ComplexToMag building kernel with __constant params...
OpenCL Context: GPU
OpenCL Run Time:      0.000048 s  (171729776.000000 sps)

CPU-only Run Time:      0.000002 s  (3389507840.000000 sps)



----------------------------------------------------------
Testing Complex to Mag and Phase
OpenCL: using NVIDIA CUDA
OpenCL INFO: ComplexToMag Const building kernel with __constant params...
OpenCL INFO: ComplexToMag Const building kernel with __constant params...
OpenCL Context: GPU
OpenCL Run Time:      0.000072 s  (113782992.000000 sps)

CPU-only Run Time:      0.000041 s  (201778720.000000 sps)



----------------------------------------------------------
Testing Complex to Arg
OpenCL: using NVIDIA CUDA
OpenCL INFO: MComplexToArg building kernel with __constant params...
OpenCL INFO: ComplexToArg using default output buffer of 8192...
OpenCL INFO: MComplexToArg building kernel with __constant params...
OpenCL INFO: ComplexToArg using default output buffer of 8192...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000055 s  (148719424.000000 sps)

CPU-only Run Time:      0.000034 s  (241757664.000000 sps)



----------------------------------------------------------
Testing Mag and Phase to Complex
OpenCL: using NVIDIA CUDA
OpenCL INFO: MagPhaseToComplex building kernel with __constant params...
OpenCL INFO: MagPhaseToComplex building kernel with __constant params...
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000073 s  (112014192.000000 sps)

CPU-only Run Time:      0.000069 s  (119564856.000000 sps)



----------------------------------------------------------
Testing Quadrature Demodulation (used for FSK)
OpenCL: using NVIDIA CUDA
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000048 s  (171133264.000000 sps)

CPU-only Run Time:      0.000036 s  (230039920.000000 sps)



----------------------------------------------------------
Testing Forward FFT size of 2048 and 8192 data points.
OpenCL: using NVIDIA CUDA
gr::fft: can't import wisdom from /home/lucas/.gr_fftw_wisdom
First few points of input signal
input[0]: 0.000000,1.000000j
input[1]: 0.000767,1.000000j
input[2]: 0.001534,0.999999j
input[3]: 0.002301,0.999997j
OpenCL Context: GPU
Running on: NVIDIA CUDA

OpenCL Run Time:      0.000230 s  (35542740.000000 sps)

CPU-only Run Time:      0.000013 s  (628660032.000000 sps)


----------------------------------------------------------
Testing Reverse FFT
OpenCL: using NVIDIA CUDA
First few points of FWD->Rev FFT
output[0]: 0.000105,-0.000090j
output[1]: 0.000165,-0.000230j
output[2]: 0.000033,-0.000304j
output[3]: 0.000106,-0.000137j
OpenCL Context: GPU
OpenCL Run Time:      0.000256 s  (31972274.000000 sps)

CPU-only Run Time:      0.000014 s  (569532800.000000 sps)

Btw, thanks!

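The sps figures in the log above are just the item count divided by the run time. A sketch reproducing that arithmetic with the Costas Loop numbers from the log (8192-item block):

```python
def samples_per_second(n_items, run_time_s):
    """Throughput in samples/sec: items processed divided by run time."""
    return n_items / run_time_s

# Costas Loop figures from the log above (8192-item block).
gpu = samples_per_second(8192, 0.020422)
cpu = samples_per_second(8192, 0.000499)
print(f"GPU: {gpu:,.0f} sps, CPU: {cpu:,.0f} sps ({cpu / gpu:.0f}x)")
```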

racerxdl commented on July 26, 2024

Funny though, almost all the CPU runs are faster here: i7-6820HK vs GTX 980M.

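One plausible reading (an assumption on my part, not measured in this thread) is that each OpenCL call carries a fixed launch/transfer overhead, so small 8192-item blocks favor the CPU and the GPU only wins once there is enough work per call. A toy model of that crossover, with purely illustrative numbers:

```python
def gpu_wins(n_items, gpu_overhead_s, gpu_rate_sps, cpu_rate_sps):
    """True if fixed-overhead GPU time beats pure CPU time for n_items."""
    t_gpu = gpu_overhead_s + n_items / gpu_rate_sps
    t_cpu = n_items / cpu_rate_sps
    return t_gpu < t_cpu

# Illustrative numbers only: ~100 us launch overhead, GPU 10x the CPU per-item rate.
overhead, gpu_rate, cpu_rate = 100e-6, 1e9, 1e8
for n in (8192, 1 << 20):
    print(f"{n} items -> GPU wins: {gpu_wins(n, overhead, gpu_rate, cpu_rate)}")
```

With these made-up constants the CPU wins at 8192 items but loses at about a million, which matches the pattern in the log qualitatively, not quantitatively.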

ghostop14 commented on July 26, 2024

There's definitely some variation by hardware and block size. If you look in the docs directory, there's a paper I wrote on each of the blocks and their performance across a GTX 1070, a GTX 970, a 1000M, and CPU-only. Also check out the SNR helper block: I took what you had in your LRIT flowgraph and combined a number of blocks into one GPU-accelerated block that always runs faster than the CPU. For something like your LRIT flowgraph, what I typically do is use the GPU SNR helper (you can only use one GPU block per card at a time, since it wants full throughput on the card), then use the gr-lfast project's faster Costas Loop (about 50-70% faster on the CPU due to code optimizations), its AGC (about 10% faster), and its FFT filter wrappers (since at the tap sizes in the flowgraph the frequency-domain filters are faster than the time-domain ones), and finally use the gr-grnet project's TCP sink to keep a TCP sink available in case the built-in ones go away.

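The frequency-domain vs time-domain filter point can be made concrete with a rough complexity estimate: a time-domain FIR costs about one multiply-accumulate per tap per output sample, while an overlap-save FFT filter costs roughly a constant times log2(N) per sample for a suitably sized FFT. A sketch with illustrative constants (not benchmarked, and real crossover points depend on the implementation):

```python
import math

def td_cost_per_sample(num_taps):
    """Time-domain FIR: one multiply-accumulate per tap per output sample."""
    return num_taps

def fd_cost_per_sample(num_taps, c=4.0):
    """Overlap-save FFT filter: ~c*log2(N) ops per sample.
    c lumps the FFT/IFFT constant factors and is an illustrative guess."""
    n_fft = 4 * num_taps  # common rule of thumb for the FFT size
    return c * math.log2(n_fft)

for taps in (16, 64, 256, 1024):
    td, fd = td_cost_per_sample(taps), fd_cost_per_sample(taps)
    winner = "FD" if fd < td else "TD"
    print(f"{taps:5d} taps: TD ~{td:.0f}, FD ~{fd:.1f} ops/sample -> {winner}")
```

Even with guessed constants, the model shows why large tap counts favor the frequency-domain filters the comment mentions.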

