Comments (8)
Hi Lucas! I haven't seen that one before. I'm running it on Ubuntu 16.04 as well. There's a doc in the setup_help directory with some tips for installing on Ubuntu. It definitely sounds like an OpenCL header file issue (possibly an old version). Want to take a look at the Ubuntu setup doc first and see if anything looks out of sync? It sounds potentially related to getting cl.h into the standard /usr/include/CL/ path. I just rebuilt on a clean Ubuntu install to verify, and it built okay here.
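If it helps, a quick way to see which OpenCL headers a build could actually find is to probe the standard include paths. A minimal sketch (the include directories and header names below are just the usual Linux defaults, not paths the project mandates):

```python
import os

def find_cl_headers(include_dirs=("/usr/include", "/usr/local/include")):
    """Report which OpenCL headers (CL/cl.h and CL/cl.hpp) exist under each include dir."""
    found = {}
    for base in include_dirs:
        for header in ("CL/cl.h", "CL/cl.hpp"):
            path = os.path.join(base, header)
            if os.path.isfile(path):
                # Record every location so duplicate/stale copies are visible too.
                found.setdefault(header, []).append(path)
    return found

print(find_cl_headers())
```

If CL/cl.hpp is missing entirely, that matches the symptom below where CMake falls back to whichever SDK does ship it.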
from gr-clenabled.
I will try that :D
It only works if I force the Intel SDK headers at /usr/include/CL. The CUDA one loads as the default because Ubuntu's default OpenCL package doesn't ship the .hpp header (only .h, so CMake doesn't find it), but CUDA only supports OpenCL up to 1.1.
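For what it's worth, on Linux the usual ocl-icd loader discovers platforms through the .icd registry files in /etc/OpenCL/vendors, which is why one vendor's platform can end up as the "default" and why moving an .icd file out of that directory effectively disables that platform. A small sketch of that discovery step (the directory path is the common ocl-icd default, not something gr-clenabled controls):

```python
import os

def list_icd_vendors(vendors_dir="/etc/OpenCL/vendors"):
    """Return the ICD registry files the OpenCL loader would discover.

    Each .icd file names a vendor driver library; disabling a platform
    amounts to moving its .icd file out of this directory.
    """
    if not os.path.isdir(vendors_dir):
        return []
    # Only *.icd files count; anything else in the directory is ignored.
    return sorted(f for f in os.listdir(vendors_dir) if f.endswith(".icd"))

print(list_icd_vendors())
```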
Sadly, the GTX 980 here doesn't support OpenCL 1.2; I just saw that gr-clenabled only supports 1.2 :((
You should actually be okay once it compiles. I ran it on a GTX 970, so it should run fine on a 980. I think the 1.2 setting was just for the way some functions were defined in the latest headers.
Hmm, that's true. It just gave preference to the CPU OpenCL; when I disabled the CPU OpenCL it ran fine on my 980:
┌─[lucas@nblucas] - [/media/ELTN/Works/gr-clenabled/build] - [Sat Sep 09, 22:01]
└─[$] <git:(master*)> ./lib/test-clenabled
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by ./lib/test-clenabled)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by ./lib/test-clenabled)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /media/ELTN/Works/gr-clenabled/build/lib/libgnuradio-clenabled-1.0.0git.so.0.0.0)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /media/ELTN/Works/gr-clenabled/build/lib/libgnuradio-clenabled-1.0.0git.so.0.0.0)
./lib/test-clenabled: /usr/local/cuda-8.0/lib64/libOpenCL.so.1: no version information available (required by /usr/lib/x86_64-linux-gnu/libclFFT.so.2)
----------------------------------------------------------
Testing no-action kernel (return only) constant operation to measure OpenCL overhead
This value represent the 'floor' on the selected platform. Any CPU operations have to be slower than this to even be worthy of OpenCL consideration unless you're just looking to offload.
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op Const building kernel with __constant params...
Max constant items: 8192
OpenCL INFO: Math Op Const building kernel with __constant params...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000097 s (84172104.000000 sps)
----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op Const building kernel with __constant params...
Max constant items: 8192
OpenCL INFO: Math Op Const building kernel with __constant params...
Testing kernel that simply copies in[index]->out[index] 8192 items...
OpenCL Context: GPU
OpenCL Run Time: 0.000098 s (83953656.000000 sps)
----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op Const building kernel with __constant params...
OpenCL INFO: Math Op Const building kernel with __constant params...
Testing complex Multiply/Add Const performance with 8192 items...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000103 s (79827440.000000 sps)
CPU-Only Run Time: 0.000002 s (3703871232.000000 sps)
----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: Log10 Const building kernel with __constant params...
OpenCL INFO: Log10 Const building kernel with __constant params...
Testing Log10 float performance with 8192 items...
Note: gnuradio log10 uses the following calculation: 'out[i] = n * log10(std::max(in[i], (float) 1e-18)) + k';
the extra max() function adds extra time to the call versus a straight log10.
OpenCL Context: GPU
OpenCL Run Time: 0.000041 s (197957792.000000 sps)
CPU-only Run Time: 0.000382 s (21464218.000000 sps)
----------------------------------------------------------
OpenCL: using NVIDIA CUDA
OpenCL INFO: SNR Helper building kernel with __constant params...
OpenCL INFO: SNR Helper building kernel with __constant params...
Testing SNR Helper float performance with 8192 items...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000063 s (129135336.000000 sps)
CPU-only Run Time: 0.000075 s (109455960.000000 sps)
----------------------------------------------------------
Testing Costas Loop performance with 8192 items...
OpenCL: using NVIDIA CUDA
OpenCL INFO: Costas Loop building kernel with __constant params...
Max constant items: 8192
OpenCL INFO: Costas Loop building kernel with __constant params...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.020422 s (401128.750000 sps)
CPU-only Run Time: 0.000499 s (16430617.000000 sps)
----------------------------------------------------------
Testing Complex Signal Source
OpenCL: using NVIDIA CUDA
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000032 s (257837968.000000 sps)
CPU-only Run Time: 0.000024 s (339898560.000000 sps)
OpenCL: using NVIDIA CUDA
OpenCL: using NVIDIA CUDA
maximum error OpenCL versus gnuradio table lookup cos/sin: 0.000009/0.000008
----------------------------------------------------------
Testing complex operation (add/multiply/complex conj/mult conj) performance with 8192 items...
OpenCL: using NVIDIA CUDA
OpenCL INFO: Math Op - too many items for constant memory. Building kernel with __global params...
Max constant items: 4096
OpenCL INFO: Math Op - too many items for constant memory. Building kernel with __global params...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000063 s (130618856.000000 sps)
CPU-only Run Time: 0.000009 s (880531712.000000 sps)
----------------------------------------------------------
Testing Complex to mag
OpenCL: using NVIDIA CUDA
OpenCL INFO: ComplexToMag building kernel with __constant params...
OpenCL INFO: ComplexToMag building kernel with __constant params...
OpenCL Context: GPU
OpenCL Run Time: 0.000048 s (171729776.000000 sps)
CPU-only Run Time: 0.000002 s (3389507840.000000 sps)
----------------------------------------------------------
Testing Complex to Mag and Phase
OpenCL: using NVIDIA CUDA
OpenCL INFO: ComplexToMag Const building kernel with __constant params...
OpenCL INFO: ComplexToMag Const building kernel with __constant params...
OpenCL Context: GPU
OpenCL Run Time: 0.000072 s (113782992.000000 sps)
CPU-only Run Time: 0.000041 s (201778720.000000 sps)
----------------------------------------------------------
Testing Complex to Arg
OpenCL: using NVIDIA CUDA
OpenCL INFO: MComplexToArg building kernel with __constant params...
OpenCL INFO: ComplexToArg using default output buffer of 8192...
OpenCL INFO: MComplexToArg building kernel with __constant params...
OpenCL INFO: ComplexToArg using default output buffer of 8192...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000055 s (148719424.000000 sps)
CPU-only Run Time: 0.000034 s (241757664.000000 sps)
----------------------------------------------------------
Testing Mag and Phase to Complex
OpenCL: using NVIDIA CUDA
OpenCL INFO: MagPhaseToComplex building kernel with __constant params...
OpenCL INFO: MagPhaseToComplex building kernel with __constant params...
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000073 s (112014192.000000 sps)
CPU-only Run Time: 0.000069 s (119564856.000000 sps)
----------------------------------------------------------
Testing Quadrature Demodulation (used for FSK)
OpenCL: using NVIDIA CUDA
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000048 s (171133264.000000 sps)
CPU-only Run Time: 0.000036 s (230039920.000000 sps)
----------------------------------------------------------
Testing Forward FFT size of 2048 and 8192 data points.
OpenCL: using NVIDIA CUDA
gr::fft: can't import wisdom from /home/lucas/.gr_fftw_wisdom
First few points of input signal
input[0]: 0.000000,1.000000j
input[1]: 0.000767,1.000000j
input[2]: 0.001534,0.999999j
input[3]: 0.002301,0.999997j
OpenCL Context: GPU
Running on: NVIDIA CUDA
OpenCL Run Time: 0.000230 s (35542740.000000 sps)
CPU-only Run Time: 0.000013 s (628660032.000000 sps)
----------------------------------------------------------
Testing Reverse FFT
OpenCL: using NVIDIA CUDA
First few points of FWD->Rev FFT
output[0]: 0.000105,-0.000090j
output[1]: 0.000165,-0.000230j
output[2]: 0.000033,-0.000304j
output[3]: 0.000106,-0.000137j
OpenCL Context: GPU
OpenCL Run Time: 0.000256 s (31972274.000000 sps)
CPU-only Run Time: 0.000014 s (569532800.000000 sps)
Btw, thanks!
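(For anyone reading the numbers above: the sps figures are just items divided by the printed run time, so any line can be sanity-checked by hand. The binary times with more precision than the six decimal places it prints, so the results only match roughly.)

```python
def throughput_sps(num_items, run_time_s):
    """Samples-per-second as printed by test-clenabled: items / elapsed time."""
    return num_items / run_time_s

# e.g. the no-action kernel: 8192 items in ~0.000097 s is roughly 84 Msps,
# in line with the "84172104 sps" line from the log above.
print(round(throughput_sps(8192, 0.000097)))
```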
Funny though, almost all CPU runs are faster here: i7-6820HK vs GTX 980M.
There's definitely some variation across hardware and block sizes. If you look in the docs directory, there's a paper I wrote covering each of the blocks and their performance across a GTX 1070, a GTX 970, a 1000M, and CPU-only. Also check out the SNR helper block: I took what you had in your LRIT flowgraph and combined a number of blocks into one GPU-accelerated block that always runs faster than the CPU. For something like your LRIT flowgraph, what I typically do is use the GPU SNR helper (you can only use one GPU block per card at a time, since each block wants full throughput on the card), then from the gr-lfast project use its faster Costas Loop (about 50-70% faster on the CPU due to code optimizations), AGC (about 10% faster), and FFT filter wrappers (since at the tap sizes in that flowgraph the frequency-domain filters are faster than the time-domain ones), and finally use the gr-grnet project's TCP sink to keep a TCP sink available in case the built-in ones go away.
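The FD-vs-TD filter point follows from a back-of-the-envelope cost model: a time-domain FIR costs one multiply-accumulate per tap per output sample, while an overlap-save FFT filter costs roughly a constant times log2(FFT size) per sample. The constants below are illustrative assumptions, not measurements from gr-lfast:

```python
import math

def td_cost_per_sample(num_taps):
    """Time-domain FIR: one multiply-accumulate per tap per output sample."""
    return num_taps

def fd_cost_per_sample(num_taps, fft_mult=4.0):
    """Rough overlap-save model: with FFT size N ~= 4 * taps, the per-sample
    cost scales like fft_mult * log2(N) (forward FFT, pointwise multiply, and
    inverse FFT amortised over the valid output samples). fft_mult is an
    assumed constant for illustration."""
    n = 4 * num_taps
    return fft_mult * math.log2(n)

# Past a few dozen taps the FFT filter's per-sample cost drops below the FIR's.
for taps in (8, 64, 512):
    print(taps, td_cost_per_sample(taps), round(fd_cost_per_sample(taps), 1))
```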