Comments (542)
It's strange that Google ditched the open OpenCL standard for proprietary CUDA.
from tensorflow.
I would be interested in extending TensorFlow with OpenCL, as we have already released OpenCL Caffe: https://github.com/amd/OpenCL-caffe. Hopefully it can be integrated in a lightweight way. Is anyone interested in working together on this?
👍
My apologies for not contributing more to this discussion recently, my plate has been more than full these past 2 weeks.
I'll be coordinating the OpenCL effort on the TensorFlow side. Our current thinking is:
- TensorFlow relies on C++11 and has taken a "single source" approach, so SYCL seems like a great fit.
- We don't have a lot of OpenCL experience in house, so we're collaborating closely with Codeplay to bridge this gap. In particular, Codeplay is currently leading the effort to add support for SYCL to the Eigen tensor library.
- TensorFlow relies on the cuDNN library to compute convolutions on NVidia GPUs. If somebody is interested in contributing an OpenCL equivalent, we'd be happy to help.
In order to help structure the effort, I created a mailing list: [email protected].
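To make concrete what an "OpenCL equivalent of cuDNN" must compute, here is a hedged, single-channel reference sketch (plain C++, my own illustration rather than anything from cuDNN or TensorFlow): a 2D "valid" convolution, really a cross-correlation as in most DNN frameworks. An OpenCL port would map the two outer loops onto the NDRange and keep the inner reduction per work-item.

```cpp
#include <cstddef>
#include <vector>

// Reference 2D "valid" convolution: out is (ih-kh+1) x (iw-kw+1),
// all buffers row-major. Single channel, no stride or padding.
std::vector<float> conv2d_valid(const std::vector<float>& in, std::size_t ih, std::size_t iw,
                                const std::vector<float>& k, std::size_t kh, std::size_t kw) {
  const std::size_t oh = ih - kh + 1, ow = iw - kw + 1;
  std::vector<float> out(oh * ow, 0.0f);
  for (std::size_t y = 0; y < oh; ++y)       // candidate OpenCL global id 0
    for (std::size_t x = 0; x < ow; ++x) {   // candidate OpenCL global id 1
      float acc = 0.0f;
      for (std::size_t ky = 0; ky < kh; ++ky)
        for (std::size_t kx = 0; kx < kw; ++kx)
          acc += in[(y + ky) * iw + (x + kx)] * k[ky * kw + kx];
      out[y * ow + x] = acc;
    }
  return out;
}
```

The hard part cuDNN solves is not this loop nest but making it fast (implicit GEMM, Winograd, FFT); any contributed OpenCL library would need similar specialized paths per architecture.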
At the very least, the Eigen library would have to support OpenCL.
Hi all,
Just to keep you posted, we are still investigating how we can change the Eigen interface to better fit the SYCL/OpenCL 1.2 programming model.
Once we come up with a reasonable approach that targets heterogeneous programming models (not only OpenCL/SYCL) we will create a proposal.
Thanks,
Luke
Please keep me updated. I developed OpenCL Caffe for AMD. I am also looking at
TensorFlow.
Thanks.
Junli
On Dec 8, 2015 10:19 AM, "Luke Iwanski" [email protected] wrote:
Hi all,
Just to keep you posted, we are still investigating how we can change the
Eigen interface to better fit the SYCL/OpenCL 1.2 programming model.
Once we come up with a reasonable approach we will create a proposal.
Thanks,
Luke
thumbs up and all that.
The website http://opencl.org was created to support open source porting projects just like these! We're currently installing all the necessary tools on the website and have space for repositories at https://github.com/OpenCL/. Later on we're adding build servers to test on several types of hardware, and we can provide our expertise in how to write code that runs at full speed on a wide range of hardware.
We're launching a porting initiative for GEGL next week, but we're happy to also support you.
Hi all,
Here at Codeplay we are looking into running Eigen's tensors on GPU using SYCL (a modern C++ layer on top of OpenCL). From what we have gathered so far, the GPU tensor design is very closely coupled with CUDA, and supporting another programming model, particularly a SYCL and OpenCL 1.2 version, will require interface changes.
If anyone is interested in digging deeper / helping out, we are most certainly interested in contributing.
Thanks,
Luke
👍 I can help code some OpenCL/SYCL if someone makes a plan and divides the work into tasks. I recommend using Boost.Compute as a wrapper for OpenCL (it makes running kernels, testing, and templating easier).
+1.
I have an AMD GPU and an Intel GPU in my laptop. I think both have OpenCL drivers, and AMD's support seems to be much better. I would get higher performance because I have two OpenCL devices. I hope you make it scale across OpenCL devices.
/cc @ptillet @gongzg Is there any interest in this from Intel? I really hope that we don't fragment OpenCL here like in Caffe, where we have an AMD fork, unmerged Intel PRs, another semi-unofficial AMD PR, and a long-staging user PR (plus two old abandoned OpenCL efforts). Anybody interested in the history can take a look at the BVLC/caffe#2610 comments.
👍
Hi all,
We will coordinate the effort of porting Eigen's tensor module to SYCL for OpenCL, as we already have something mostly working, but it's not ready for review yet.
We are in favour of this approach as it will be less invasive to the code base: SYCL supports the single-source templated C++ model that Eigen already uses.
The road map design is in progress, so it shouldn't be too long now.
Thanks,
Luke
Hi all,
Thanks for the interest!
At this point we are getting our testing infrastructure set up to make sure that nothing that we do introduces regression.
We are in touch with @benoitsteiner to make sure we are in sync with what he's done so far.
We are still compiling a road map for the integration process; it should be done in a couple of weeks, as there are a couple of business details to clarify.
Our goal is to bring OpenCL to TensorFlow via Eigen by the end of this year.
Thanks,
👍
👍
+1
@bhack We do have interest in this. Thanks for letting me know. If there is a proposal for Eigen's OpenCL/SYCL implementation, we will see what we can do from Intel side.
+1
Hope the people working on it manage to overcome the cuDNN-alternative problem by the time TensorFlow gets close to 1.0.
@martinwicke why is this issue closed?
I don't think your commit fixes this.
Oh GitHub
would be great.
👍
👍
@bhack We are in contact with @benoitsteiner, but we will discuss our proposal with the upstream maintainers before we invest too much effort.
@DanMcLaughlin , @ville-k We are developing our implementation of SYCL, ComputeCpp (https://www.codeplay.com/products/computecpp). For more information, can you please contact me off-list via the email address on my profile?
@benoitsteiner Thank you for the update. It would be wonderful if all the partners involved in @KhronosGroup (Google, NVIDIA, AMD, Intel, Codeplay, Xilinx, etc.) would promote a cuDNN-like API in a standardized way. Something like the Khronos OpenVX computer vision standardization effort, but for deep learning.
@bhack Which new Google group?
Other than that, OpenCL and CUDA are two very different programming approaches. CUDA works the way it does because one company has full control over everything, so it can embed binary blobs and who knows what in the final executable. This cannot be done with OpenCL, unless one goes down the SYCL path (I have my concerns...) and the SYCL compiler vendor has full control over all possible target architectures (unlikely or impossible in practice). Overall, my opinion is that a good OpenCL-enabled library needs more than just a few tweaks here and there. Probably not what you wanted to hear, but you asked for my opinion :-)
@gujunli Nice to see AMD here. /cc @naibaf7 @lunochod
/cc @lukeiwanski for Eigen/OpenCL/SYCL
@gujunli Certainly would be interested in contributing. Please let me know when you plan to start.
@lukeiwanski Thank you for the feedback. I think that @benoitsteiner worked on the tensor extension part of Eigen.
An interesting initiative at https://github.com/ptillet/isaac, even if here we rely on the Eigen tensor extension.
I also would like to contribute. @benoitsteiner can you organize it?
This was included in the Roadmap but also tagged as contribution so a direction/bootstrap could be really useful.
I can help organize it. Who is responsible for OpenCL support in
TensorFlow now?
Thanks a lot.
Junli
On Tue, Jan 19, 2016 at 7:50 AM, bhack [email protected] wrote:
This was included in the Roadmap but also tagged as contribution so a
direction/bootstrap could be really useful.
Junli Gu--谷俊丽
Coordinated Science Lab
University of Illinois at Urbana-Champaign
I just assumed Benoit because he self assigned the feature, but I think you've got it Junli! Maybe start with an email or forum thread of interested parties?
@benoitsteiner knows more about interested parties that may not have shown
up in this thread (or this issue). I'd wait for him to coordinate to make
sure we avoid duplicating work.
On Tue, Jan 19, 2016 at 11:42 AM Dan McLaughlin [email protected]
wrote:
I just assumed Benoit because he self assigned the feature, but I think
you've got it Junli! Maybe start with an email or forum thread of
interested parties?
I'm interested. Is there any roadmap?
On Jan 19, 2016, at 11:46 AM, Martin Wicke [email protected] wrote:
@benoitsteiner knows more about interested parties that may not have shown up in this thread (or this issue). I'd wait for him to coordinate to make sure we avoid duplicating work.
Is there a list of the CUDA-dependent libraries that TensorFlow relies on?
That would help us see whether there are immediate OpenCL alternatives.
@hsaputra
There is clFFT, clBLAS (alternatively ViennaCL). The random number generator is a bit more tricky (no curand); either use a CPU generator and transfer to the GPU, or use another existing kernel for the RNG.
The biggest pitfall will again be efficient convolution implementations (something like cuDNN).
There is experience about such issues here:
BVLC/caffe#2610
BVLC/caffe#2195
https://github.com/amd/OpenCL-caffe
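The "another existing kernel for RNG" option above can be sketched as a stateless, counter-based generator in the spirit of Philox/Threefry (here a simple Wang-hash mix of my own choosing, illustrative only and not cryptographic). Because each value depends only on (seed, index), every OpenCL work-item can compute its own stream without curand-style global state; the same body compiles unchanged as OpenCL C with `uint` in place of `uint32_t`.

```cpp
#include <cstdint>

// Wang-style 32-bit integer hash: mixes the bits of x.
inline uint32_t wang_hash(uint32_t x) {
  x = (x ^ 61u) ^ (x >> 16);
  x *= 9u;
  x = x ^ (x >> 4);
  x *= 0x27d4eb2du;
  x = x ^ (x >> 15);
  return x;
}

// Uniform float in [0, 1) for element `idx` of stream `seed`.
// Keeping only the top 24 bits makes the float conversion exact,
// so the result is guaranteed to stay strictly below 1.0f.
inline float counter_uniform(uint32_t seed, uint32_t idx) {
  return (wang_hash(seed * 0x9e3779b9u + idx) >> 8) * (1.0f / 16777216.0f);
}
```

In an OpenCL kernel, `idx` would typically be `get_global_id(0)`, giving each work-item an independent deterministic value.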
TensorFlow uses the tensor extension upstreamed to Eigen, so I think OpenCL/SYCL support in Eigen is needed. See this thread.
Thanks @naibaf7. Yeah, I don't think there is a viable alternative for cuDNN for OpenCL right now.
@bhack from that thread and here it seems like @lukeiwanski is looking into it. I think we have enough willing people to work on it, we just need @benoitsteiner, @lukeiwanski or @gujunli to coordinate. Benoit has been quiet, maybe he's on holiday.
I would love to help contribute with this initiative.
@lukeiwanski Are you working with, or in contact with, upstream? Do you think it will be accepted upstream in Eigen?
+1
Great news @lukeiwanski, let us know of any help you need.
I'll guess you are using your own implementation of SYCL - will that be available for developers/researchers? On what platforms?
@lukeiwanski SYCL seems like the right way to go given the amount of template metaprogramming involved with Eigen. I'm an experienced c++ developer with OpenCL experience gained from developing my own neural nets and linear algebra library. I'd love to help with this effort and get started developing with SYCL.
@lukeiwanski is there any update/estimate regarding plans?
interested. would love to contribute.
OK, so it actually seems to be an effort by Codeplay with some kind of sync with Google internally. What is the role of the AMD and Intel subscribers here?
/cc @keryell if you have any interest on this from SYCL/FPGA universe
@bhack sure I have some interest for high-end C++ on FPGA :-)
TensorFlow sounds like a good validation use-case for triSYCL too.
By the way, if some people here are looking for some internships on this subject, I have some positions. It looks like Codeplay is looking for some people too, if I trust their web site.
I'm really interested in @karlrupp and @hughperkins opinions. I hope they want to join in the discussion on the new google group.
@karlrupp See #22 (comment) at the end for the google group.
I asked for your opinion because you have great experience with ViennaCL, interfacing an algebra library with multiple backends (CPU, GPU, MIC). TensorFlow relies on the Eigen library and its new tensor extension, contributed upstream by Google (but only with a CUDA backend). I think they haven't yet run into all the pitfalls you have already encountered over these years of ViennaCL development.
@bhack We are currently at the face-to-face meeting in Seattle this week, but of course I cannot say whether we are talking about DNN libraries or not... :-)
@keryell Try to push the cause in Seattle ;)
@karlrupp You are right, OpenCL and CUDA are two very different programming approaches. The single-source aspect found, for example, in CUDA and OpenMP 4.5 is extremely powerful from a software engineering perspective. This is why there is the SYCL standard for real C++ programmers. SYCL can be seen as CUDA on steroids without any language extension and with some OpenMP aspects (the tasks). A typical SYCL device compiler is expected to generate SPIR-V kernels.
Your concerns about portability are less of an issue with the SPIR-V standard (a kind of portable equivalent of NVIDIA PTX/AMD IL/... in the Vulkan and OpenCL world), which is mandatory to accept in OpenCL 2.1 and Vulkan. So the beauty is that if you have a front-end that generates SPIR-V, you do not need special knowledge of the very details of the hardware to run on. There is a Khronos open-source bidirectional translator between LLVM IR and SPIR-V, so it opens quite new territories.
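For readers unfamiliar with the single-source model being discussed, here is a minimal sketch, assuming a SYCL 1.2 implementation such as ComputeCpp or triSYCL: host and device code live in one C++ translation unit, the kernel is an ordinary lambda, and data movement is driven by buffer/accessor lifetimes. The kernel name `VecAdd` and sizes are my own illustrative choices.

```cpp
#include <CL/sycl.hpp>
#include <vector>

int main() {
  constexpr size_t n = 1024;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
  {
    using namespace cl::sycl;
    buffer<float, 1> ba(a.data(), range<1>(n));
    buffer<float, 1> bb(b.data(), range<1>(n));
    buffer<float, 1> bc(c.data(), range<1>(n));
    queue q;  // selects a default device
    q.submit([&](handler& cgh) {
      auto A = ba.get_access<access::mode::read>(cgh);
      auto B = bb.get_access<access::mode::read>(cgh);
      auto C = bc.get_access<access::mode::write>(cgh);
      // The lambda body is compiled for the device by the SYCL toolchain.
      cgh.parallel_for<class VecAdd>(range<1>(n), [=](id<1> i) {
        C[i] = A[i] + B[i];
      });
    });
  }  // buffer destruction synchronizes and copies results back into c
  return c[0] == 3.0f ? 0 : 1;
}
```

Because the kernel is plain templated C++, this is the mechanism by which Eigen's existing expression templates could, in principle, be retargeted without a separate kernel language.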
@keryell I agree that SPIR-V is a step forward. However, it does not address all issues of exhaustive jitting.
you do not need special knowledge of the very details of the hardware to run on
Is this a copy&paste from OpenCL 1.0 marketing, which claimed exactly the same? You will always need to go down to the details of the underlying hardware if you aim for maximum performance. This is especially the case in the context of fast tensor contractions.
...as @scott-gray demonstrated with neon
Is this a copy&paste from OpenCL 1.0 marketing, which claimed exactly the same?
Haha. :-)
You will always need to go down to the details of the underlying hardware if you aim for maximum performance. This is especially the case in the context of fast tensor contractions.
Of course, but before playing with second-order optimization, it is useful to have the bulk of the templated C++ code running in some accelerated way.
For the optimization, either you stitch in your optimized binary kernels à la NervanaSys or, since SYCL is pure C++, you can use asm("...") in it with a lot of #ifdef to test for the target architecture. :-) That said, SPIR-V is itself extensible, and I cannot see why we could not put inline VHDL or Verilog in it at some point. :-)
More concretely, the recent introduction of sub-group operations should help achieve good performance in a portable way, and using simple ad-hoc built-in functions may help.
C++ adds interesting metaprogramming features that allow replacing most of the code generators used in clBLAS and other frameworks to generate code better adapted to this or that hardware.
Also, N4355 in C++17 could enter the game sooner or later.
@karlrupp, @bhack The TensorFlow approach is to rely on a hardware abstraction (the tensor module) for the majority of the operations needed by a typical neural network, while relying on specialized libraries (such as cuDNN) for the few operations that are really critical performance-wise. The hardware abstraction enables us to implement most TensorFlow operations once and have them run on an accelerator with more than good enough performance.
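The abstraction described above can be sketched in a few lines. This is a hedged, CPU-only illustration of the pattern (the names `AddOp`, `CpuDevice`, and `run` are my own, not TensorFlow's): the operation is written once as a generic functor, and each "device" supplies only an execution strategy. In TensorFlow the analogous roles are played by Eigen tensor expressions and per-device evaluators (CUDA today, SYCL proposed).

```cpp
#include <cstddef>
#include <vector>

// The operation is defined once, independent of any device.
struct AddOp {
  template <typename T>
  T operator()(T a, T b) const { return a + b; }
};

// One possible CPU evaluator; a GPU evaluator would enqueue the same
// functor as a kernel instead of looping on the host.
struct CpuDevice {
  template <typename Op, typename T>
  void apply(Op op, const std::vector<T>& a, const std::vector<T>& b,
             std::vector<T>& out) const {
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = op(a[i], b[i]);
  }
};

// Dispatch: the same op runs on whichever device is passed in.
template <typename Device, typename Op, typename T>
void run(const Device& d, Op op, const std::vector<T>& a,
         const std::vector<T>& b, std::vector<T>& out) {
  d.apply(op, a, b, out);
}
```

Only the handful of performance-critical ops (convolution, pooling) fall outside this pattern and need the cuDNN-style specialized library.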
@bhack Yes, I love multidimensional arrays. Also, in our domain of interest there is SG14 in the C++ committee, which tries to get all the people interested in these issues to converge in the standard.
https://groups.google.com/a/isocpp.org/forum/#!forum/sg14
Of course SYCL is in the discussions. :-)
@benoitsteiner Mainly on cuDNN for pooling and convolution. I think that if every vendor produces an API for these operations for its own hardware, with its own binary assembly, it will not be a very scalable approach. That is why I think some performance-crucial API calls would be better standardized in some way.
@keryell There are really interesting topics for Matrix/Tensor in the new SG14 C++ group, especially on the vector/SIMD calls agenda. But it seems that nobody has talked about convolution, pooling, and other useful "stabilized" deep learning interfaces. It also seems that this specific standardization subgroup includes people from NVIDIA, Intel, AMD, Codeplay, etc., but not from Google, although Google is present in other groups.
👍
@bhack Yes, there is no machine-learning-style proposal in SG14 yet. But participation is open, so you can send some proposals. :-) Perhaps SG6 (numerics topics) is more relevant, though; I do not think they have their own mailing list/forum yet.
@gujunli Does OpenCL Caffe run on Android? Sorry for asking this here, but I didn't find anywhere else to ask it :) It would be great to have a deep learning library that runs on Android devices and can use the GPU, but it seems there are none at the moment. (Correct me if I'm wrong!)
@krikru
The official (but experimental) OpenCL Caffe branch can be made to run on Android GPUs, however the performance at the moment is far from optimal. See sh1r0/caffe-android-lib#23 and https://github.com/BVLC/caffe/tree/opencl.
A real alternative to cuDNN could be the extension of OpenVX standard objects with support for Tensor, NdConvolution, and NdPooling operators, and (probably) some other operators that could be considered standardizable.
Also, the cuDNN team needs to make choices about which new APIs and operators to introduce in every release. Of course a standard cannot move as fast as cuDNN releases, but I think some operations and objects have enough "citation history" to be standardized.
@hughperkins At the moment, I haven't tried any deep learning library; I'm just doing some scouting to see which library I could potentially use. Have you tried cltorch and DeepCL on Android? I just assumed cltorch worked on Android, since there is an implementation of Torch dedicated specifically to Android. And why would you have such an implementation if there already was one that both worked on Android and used OpenCL, right? But maybe I should have known better.
@hughperkins For some reason I imagined that torch-android was an official Torch implementation for Android, meaning that no other Torch implementation (at least not official) was likely to run smoothly on Android, including cltorch. I don't know why I thought that, it of course doesn't make any sense.
Well... Soumith kind of coordinates torch development. He works at Facebook AI Research. So, since torch-android repo belongs to Soumith, I would say it's fairly close to official. But it maybe is not part of core for some reason. I guess you can ask the question as an issue in that repo, or in https://groups.google.com/forum/#!forum/torch7 Actually, since Soumith is kind of the main person that handles the requests in https://groups.google.com/forum/#!forum/torch7 , I reckon you probably want to post your question there.
meaning that no other Torch implementation (at least not official) was likely to run smoothly on Android, including cltorch
Note that cltorch is not an implementation of torch. It's a plugin that provides OpenCL. You need both.
Note that cltorch is not an implementation of torch. It's a plugin that provides OpenCL. You need both.
Ah, thanks for the clarification.
@naibaf7 Do the OpenCL Caffe branch and AMD's OpenCL Caffe implementation have anything more in common besides the name? Have you compared the two, or do you know if there is any difference in performance? You write that the OpenCL branch is far from optimal performance. What does that mean, and what would be necessary to improve it? It would be interesting to try it on Android.
We are going off topic
@bhack Yeah, sorry for hijacking this thread. I just didn't know where to ask the question.
@krikru
please raise an issue about it on the Caffe branch, flag it with Android and OpenCL. Then we can discuss this further. Thanks.
@keryell It seems that the next f2f SG14 meeting in March will be hosted by Google. Will any TensorFlow people be there?
/cc @jfbastien
Perhaps @benoitsteiner could drop by, since he is local.
But before this event there is the full C++ F2F at the end of the month in Jacksonville, Florida.
https://isocpp.org/files/papers/N4568.pdf
Unfortunately I will not be able to attend any of them.
I don't know if the CppCon 2015 talk "C++ Multi-dimensional Arrays for Computational Physics and Applied Mathematics" generated any paper follow-up.
+1
@bhack Thank you for pointing out the talk on multi-dimensional arrays. It is interesting and addresses the real issues, but it looks too ad hoc to be ratified in C++ as-is. Personally, I use Boost.MultiArray and would be more confident in a polished version of Boost.MultiArray.
There are also some papers at WG21. As you can see, @jfbastien at Google has some activity at WG21 and also helped host the SG14 f2f meeting at Google in March.
@bhack @keryell I think it would be worth taking this discussion to the SG14 mailing list as the details aren't related to OpenCL / tensorflow.
Yes, probably it is no longer so strictly on-topic here with all the details. Other than Eigen/SYCL support, is there a plan for the cuDNN calls?
+1, very interesting topic. Hope it comes soon.
This thread is very interesting. I've been trying to get Caffe to work on Android. The results seem surprising: Caffe running on the Mali GPU seems to be 2-3x slower than the CPU, but about 4-5x more energy efficient. The test was run on a Galaxy S6 (Mali T760, peak performance 200 GFLOPS).
Since GEMM is the core of convolution in Caffe, I decided to profile its performance on Android. It seems that ViennaCL is not as efficient as some simple hand-written kernels. Now I am able to get the GPU to run as fast as the CPU for large matrices (2k x 2k). This is still counter-intuitive, since normally we expect GPUs to be much faster.
See:
https://github.com/strin/mocha-profile
The kernel implementations can be found here:
OpenCL kernels for GEMM: https://github.com/strin/gemm-android
Any thoughts?
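For reference, the computation such a GEMM kernel performs can be restated in plain C++ (this is my own naive restatement of the general scheme, not a copy of the kernels linked above): one output element C[row][col] per OpenCL work-item, C = A(MxK) * B(KxN), row-major. On Mali-class GPUs the usual next steps are vectorized loads (float4) and register blocking, rather than the local-memory tiling that helps on desktop GPUs.

```cpp
#include <cstddef>
#include <vector>

// Naive row-major GEMM: C = A * B, A is MxK, B is KxN, C is MxN.
// In the OpenCL version, `row` and `col` become get_global_id(0/1)
// and each work-item runs only the inner k-loop.
void gemm_naive(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t M, std::size_t N, std::size_t K) {
  for (std::size_t row = 0; row < M; ++row)
    for (std::size_t col = 0; col < N; ++col) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
        acc += A[row * K + k] * B[k * N + col];
      C[row * N + col] = acc;
    }
}
```

Memory bandwidth, not arithmetic, usually dominates this naive form, which is consistent with a mobile GPU failing to beat the CPU on it.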
@strin Have you already followed this thread https://community.arm.com/thread/4935?
@bhack Thanks for sharing. This thread looks very interesting. I tried turning off DVFS as suggested, but no significant performance gain was seen for sgemm in ViennaCL.
+1
@strin Have you tried the latest sgemm version in the Mali SDK?
TensorFlow is late! Haha.
https://gist.github.com/jarutis/ff28bca8cfb9ce0c8b1a
This will have an impact on the strategy: http://lists.llvm.org/pipermail/llvm-dev/2016-March/096576.html?
EDIT:
"StreamExecutor is currently used as the runtime for the vast majority of Google's internal GPGPU applications, and a snapshot of it is included in the open-source TensorFlow_ project, where it serves as the GPGPU runtime."
You can't always use the same commit comments in different repositories ;) tensorflow/skflow#22