
Comments (45)

DuncanMcBain commented on May 30, 2024

So it doesn't like the maths functions? Huh, I wonder if we've changed something there...

So I found some AMD hardware and tried the Gaussian myself. It failed. Turns out, when I made changes to it last, I added this:

const auto pi = std::atan(1) * 4;

to calculate the value of pi as a const. The return type of the function is... double. This hardware doesn't support double (as I suspect is also true of yours), and the driver simply crashes when it encounters one. By changing the declaration to const float, the code compiles and the test passes. I'll push that fix.
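As a minimal illustration of the deduction trap (plain C++, no SYCL needed; the variable names below are mine):

```cpp
#include <cmath>
#include <type_traits>

// `auto` deduces double here: std::atan on an integral argument returns
// double, so this constant drags fp64 into the device code.
const auto pi_as_double = std::atan(1) * 4;
static_assert(
    std::is_same<std::remove_const<decltype(pi_as_double)>::type, double>::value,
    "atan(1) * 4 is a double expression");

// The fix: stay in single precision end to end.
const float pi_as_float = std::atan(1.0f) * 4.0f;
```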

It's possible that somewhere in Tensorflow, we're using doubles - I don't really know the code at all, and I thought we were quite careful about that sort of thing. I can't look at this more tonight but might be able to look through the kernels on Monday to see if it's the same thing happening in there! Thanks for your help tracking this down.

(If you're interested, you can use the "extract-ir" script in the SDK to see the SPIR code, which is where I tracked down the use of double).

from computecpp-sdk.

DuncanMcBain commented on May 30, 2024

As I mentioned, unfortunately on this older AMD driver that we have to use it straight up crashes when you use double, anywhere in the code, even if it's not in the kernel you're trying to run. There's no way we can recover from that unfortunately 😄

It might be useful to warn at the compiler level but honestly I think it'd be far too noisy. Lots of hardware supports it, too, and lots of drivers don't segfault but instead report an error!

lukeiwanski commented on May 30, 2024

Doubles and halves should be optional in TF.
@guoyejun started the process of making that happen in tensorflow/tensorflow#11545, as beignet exposed this problem.

However, it seems the mentioned PR is stuck.

@jwlawson and I are introducing a config option that enables/disables half and double here: jwlawson/tensorflow@5ec5964

It should be in https://github.com/lukeiwanski/tensorflow/tree/dev/amd_gpu soon-ish.

After that happens, someone needs to go through all registered Ops and use the TF_CALL_SYCL_NUMBER_TYPES macro for registration.
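For anyone unfamiliar with that pattern, here is a hedged sketch of how such a type-dispatch macro can work - the config symbol and helper names below are made up for illustration, not TensorFlow's actual ones:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch in the spirit of TF_CALL_SYCL_NUMBER_TYPES: expand
// `m` once per supported type, gating double behind a build-configuration
// symbol so no fp64 kernel is ever instantiated when configure disables it.
#define CALL_SYCL_FLOAT(m) m(float)
#ifdef HYPOTHETICAL_SYCL_HAS_DOUBLE
#define CALL_SYCL_NUMBER_TYPES(m) CALL_SYCL_FLOAT(m) m(double)
#else
#define CALL_SYCL_NUMBER_TYPES(m) CALL_SYCL_FLOAT(m)
#endif

// Example "registration": record the size of each type the build would
// instantiate kernels for.
inline std::vector<std::size_t> registered_type_sizes() {
    std::vector<std::size_t> sizes;
#define RECORD_TYPE(T) sizes.push_back(sizeof(T));
    CALL_SYCL_NUMBER_TYPES(RECORD_TYPE)
#undef RECORD_TYPE
    return sizes;
}
```

With the (made-up) fp64 symbol undefined, only the float kernel would be registered.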

@mirh would you like to give it a go? :)

mirh commented on May 30, 2024

Aaaand fixed. Thank you all.
(I mean, it still segfaults, but it's a totally different deal and stack trace)

DuncanMcBain commented on May 30, 2024

Hi @mirh,
One of my colleagues had a good idea for what this issue might be - historically, some OpenCL implementations had issues with kernel names that were really long, like the ones found in Eigen. To that end, I think you could try checking out the tip of dev/amd_gpu and applying the following patch:

diff --git a/third_party/sycl/crosstool/CROSSTOOL.tpl b/third_party/sycl/crosstool/CROSSTOOL.tpl
index 3078b5b534..c62b5b93e1 100755
--- a/third_party/sycl/crosstool/CROSSTOOL.tpl
+++ b/third_party/sycl/crosstool/CROSSTOOL.tpl
@@ -175,6 +175,7 @@ toolchain {
   cxx_flag: "-DEIGEN_HAS_CXX11_MATH=1"
   cxx_flag: "-Wno-unused-variable"
   cxx_flag: "-Wno-unused-const-variable"
+  cxx_flag: "-sycl-compress-name"
 
   unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
   unfiltered_cxx_flag: "-D__DATE__=\"redacted\""

This patch will make the compiler output kernel names that are hashed versions of the "true" kernel name, which means they will have fixed length. Looking at the repo at the commits you mention, this does seem to be consistently one of the changes. Thanks so much for persevering with this!
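Conceptually (an illustrative sketch, not ComputeCpp's actual implementation), the compression boils down to something like:

```cpp
#include <functional>
#include <iomanip>
#include <sstream>
#include <string>

// Replace an arbitrarily long mangled kernel name with a fixed-length
// digest, so a driver that chokes on very long symbols only ever sees
// short ones. std::hash is used only for illustration; a real compiler
// would use a hash that is stable across toolchains.
std::string compress_kernel_name(const std::string& mangled) {
    std::ostringstream out;
    out << "K" << std::hex << std::setw(16) << std::setfill('0')
        << std::hash<std::string>{}(mangled);
    return out.str();
}
```

However long the Eigen kernel name, the output is always the same fixed length, and the same input always maps to the same name.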

lukeiwanski commented on May 30, 2024

@Rbiessy could you take a look at this? I believe we need to add a check around the double and half cases based on the ./configure options.

lukeiwanski commented on May 30, 2024

@mirh can you give it a spin at lukeiwanski/tensorflow@1e0dd42 ?

mirh commented on May 30, 2024

It did it! (together with --sycl-compress-name)

DuncanMcBain commented on May 30, 2024

That's a good idea. We'll try to add it to the FAQ on our website - this issue is more likely to affect TensorFlow code, because of the frankly huge kernel names, but it could strike anywhere.

DuncanMcBain commented on May 30, 2024

Hi @mirh, thanks for the report. Unfortunately without more information it's hard to say what's going wrong, essentially all this tells us is that when AMD's OpenCL implementation is trying to build the program it is crashing. I know that at least one version of the driver crashes when you try to use doubles on a device that does not support them, but I am sure there could be other reasons. What does the device code look like in this example?
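(For reference, detecting fp64 support usually comes down to checking whether the device advertises cl_khr_fp64. A hedged sketch, with a made-up helper name, over the extensions string that clGetDeviceInfo reports for CL_DEVICE_EXTENSIONS:)

```cpp
#include <string>

// Hypothetical helper: a device supports double only if its extensions
// string (CL_DEVICE_EXTENSIONS from clGetDeviceInfo) lists cl_khr_fp64.
// Real code would query the OpenCL device first; here we just inspect
// the string.
bool device_supports_fp64(const std::string& extensions) {
    return extensions.find("cl_khr_fp64") != std::string::npos;
}
```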

mirh commented on May 30, 2024

It's the example you recommend here, nothing more.

znmeb commented on May 30, 2024

@mirh what's your operating system and OpenCL library? On Linux, the Mesa OpenCL implementation "Clover" is quite buggy - I can only get OpenCL to work with the proprietary AMD code.

mirh commented on May 30, 2024

Yeah, I absolutely know.
I'm using fglrx 15.9 here (which is the last one available for my gpu - lucky you).

mirh commented on May 30, 2024

Great news, everyone!
Seems like I'm getting the very same "ending" crash trace for your gaussian-blur test (and only that, so I guess it's not that bad).
Could that help?

DuncanMcBain commented on May 30, 2024

Ending? Could you elaborate, please?

mirh commented on May 30, 2024

I meant, the instructions "on the top" of the stack trace were the same.
(i.e. aclCompile followed by clBuildProgram)

DuncanMcBain commented on May 30, 2024

Hi @mirh, sorry it's taken me a while to get back to you. Getting back up to speed, you get build failures in Tensorflow, and likewise in the gaussian blur sample? Some small changes were made over the holidays, could you maybe try them?

mirh commented on May 30, 2024

Yes, exactly. No difference though.

DuncanMcBain commented on May 30, 2024

Hmmmm... Depending on how much time you want to spend on this, you could start removing code from the kernel in the Gaussian Blur sample, to see if it ever passes the compilation... that would at least let us know what sort of construct is causing the failures. I can understand if you'd rather not do that though, as it is something of a slog.

mirh commented on May 30, 2024

Commenting this out, then replacing elem in L129 with 10, made the program execute (with a blank image in the end eventually, but still)

DuncanMcBain commented on May 30, 2024

OK, pushed here: 19be0cf

mirh commented on May 30, 2024

Cool! Now all ctests work!
Then, the lack of double precision is just a matter of my/our ancient hardware (and I think crashing outright, instead of reporting an error or something, should really count as a problem in the driver).
But you shouldn't need to have to change programs, I guess?
Either because binary64 is found not to bring any actually meaningful improvement and could just be 'converted' to binary32, or because computecpp stops (or at least warns!) you at compilation.

mirh commented on May 30, 2024

You are right.
Maybe you should just make a wiki page, with every error symptom "linked" to its reported cause.
EDIT: for what it's worth, this might give some other ideas

In other news
Back to our original problem with TF, I don't *think* that code is supposed to have doubles at all.
Soo.. I dunno, I guess you could check its samples yourself? 😃

mirh commented on May 30, 2024

I guess so.
But given all the time I need to compile, I'd first want some degree of confidence in the thing working 🙃

DuncanMcBain commented on May 30, 2024

In the end, while I intended to look at the sycl files to see if double turned up in them, I didn't really have time. I still might be able to investigate but certainly not today or tomorrow.

mirh commented on May 30, 2024
#17 0x00007fffb49190e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007fffe0e8382f in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007fffe15fdc28 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#20 0x00007fffe15fdf62 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#21 0x00007fffe157b326 in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, bool) () from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#22 0x00007fffe157e12f in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#23 0x00007fffe85c3ea4 in cl::sycl::program cl::sycl::program::create_program_for_kernel<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::context) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007fffe85c3670 in void cl::sycl::handler::parallel_for_impl<Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true>, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> >(cl::sycl::detail::nd_range_base const&, Eigen::TensorSycl::ExecExprFunctorKernel<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> >, utility::tuple::Tuple<utility::tuple::Tuple<Eigen::DSizes<long, 1> > > >, utility::tuple::Tuple<Eigen::RangeAccess<(cl::sycl::access::mode)2>, Eigen::RangeAccess<(cl::sycl::access::mode)0> >, true> const&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#25 0x00007fffe85c25bf in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const
    () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#26 0x00007fffe85c237f in cl::sycl::event cl::sycl::detail::command_group::submit_handler<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue>const&, cl::sycl::detail::standard_handler_tag) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#27 0x00007fffe85c219f in cl::sycl::event cl::sycl::queue::submit<Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}>(Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&)::{lambda(cl::sycl::handler&)#1}) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#28 0x00007fffe85c2021 in Eigen::TensorSycl::SYCLExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::SyclDevice, false>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorConversionOp<float, Eigen::TensorMap<Eigen::Tensor<unsigned char const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::SyclDevice const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#29 0x00007fffe85c1e50 in std::_Function_handler<void (tensorflow::OpKernelContext*, tensorflow::Tensor const&, tensorflow::Tensor*), tensorflow::GetSyclCastFromUint8(tensorflow::DataType)::$_20>::_M_invoke(std::_Any_data const&, tensorflow::OpKernelContext*&&, tensorflow::Tensor const&, tensorflow::Tensor*&&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#30 0x00007fffe83e66bc in tensorflow::CastOpBase::Compute(tensorflow::OpKernelContext*) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#31 0x00007fffe229d06d in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) () from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#32 0x00007fffe229dc58 in std::_Function_handler<void (), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(tensorflow::gtl::InlinedVector<tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, 8> const&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::$_1>::_M_invoke(std::_Any_data const&) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
#33 0x00007fffe22f32f2 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so

Aaaand.. It's here again (computecpp 0.7.0, tensorflow 1.8). I'll try to downgrade some stuff and see what comes out. 0.6.1 doesn't compile with that.

mirh commented on May 30, 2024

Ok so.. lukeiwanski/tensorflow@0fc77bd with 0.6.1 still segfaults...
While an older build I had of lukeiwanski/tensorflow@9b6db88 is fine. I'll try to recompile this, to see whether the magic is in some code change on my side or not.

mirh commented on May 30, 2024

Well, FML.
lukeiwanski/tensorflow@9b6db88 crashes in libamdocl64 (I don't have a stack trace, but I guess it's not very dissimilar from the reported issue).
And so does the slightly older lukeiwanski/tensorflow@d7bc636.

But.. lukeiwanski/tensorflow@591d829 actually runs? Without even lukeiwanski/tensorflow#205 ?!
I mean, it fails (not a segfault!) after a good few minutes, complaining about "Tensor had NaN values", but that seems a total cakewalk to take care of by comparison.

I'll now try to bisect the last handful of commits, but I can't understand what changed.. Did build dependencies change in this month or so, altering code behavior? Same commits, different results.
EDIT: mhh, on reflection, *I* did make one actual change myself: this time I have been disabling all possible goddamn configure switches, to speed up compilation.
Also going to check that, I guess..

mirh commented on May 30, 2024

I went further with bisecting, and indeed lukeiwanski/tensorflow@3cc8566 was what did the trick.
On the other hand, when I tried the tip of dev/amd_gpu (plus that line) I was still getting crashes.

So.. progressing for whatever else might be.

EDIT: fuck, I just noticed I put it in the *wrong* lines (local abi, instead of cross_target)

mirh commented on May 30, 2024

Well, I just put it on both, and it still didn't work (with the tip).
So I guess I'll have to bisect, very slowly, to find wherever the hell that stopped working in turn?

DuncanMcBain commented on May 30, 2024

If you have the time, that would be very useful! It's a shame that didn't work, I really thought we were onto something there...

mirh commented on May 30, 2024

I'm still doing my testing, but are you sure that option goes as a cxx_flag, rather than a compiler_flag or linker_flag or something?

DuncanMcBain commented on May 30, 2024

cxx_flags are passed to the compiler when compiling C++ source files (I say this because the compiler could be, for example, gcc -x c++, in some cases). It's definitely not a linker flag. In fact, that particular flag is specific to compute++, so no other compiler would understand what it means.

I'd be surprised if there were issues, because lots of flags get added in the same way by bazel, and if they didn't get added you would see many more compile errors (or sometimes runtime errors - stderr messages about "missing kernels" would indicate that this had happened). Technically there's a way to check - but it's a little bit involved (a nontrivial change, with some thought attached, I'm afraid).

If you're willing to keep bisecting to find the time when it broke, that might be our new best chance!

mirh commented on May 30, 2024

Ok so, updates..
For the moment, after quite some WTFing, I found it should be something between lukeiwanski/tensorflow@8410038 and lukeiwanski/tensorflow@8fd87cd that makes the holy -sycl-compress-name flag stop working.

mirh commented on May 30, 2024

lukeiwanski/tensorflow@f850c60 is the breaking commit

DuncanMcBain commented on May 30, 2024

Oh - maybe it's the half-type code. @lukeiwanski half support is a configurable option, right? Maybe we can turn it off...

lukeiwanski commented on May 30, 2024

hmm, that's a good point - disabling halves might not be enough.. let me try something.

Rbiessy commented on May 30, 2024

That should do the trick: lukeiwanski/tensorflow#245
So now we no longer compile the CastOp kernels with half or double when they are disabled. There may be other places where we make a similar mistake, though.

DuncanMcBain commented on May 30, 2024

Haha, finally! Well done everyone & @Rbiessy, and thanks for sticking with it @mirh!

mirh commented on May 30, 2024

together with --sycl-compress-name

So.. are you doing anything about that?

DuncanMcBain commented on May 30, 2024

What do you mean? This option makes the compiler hash the name of the kernel and output that instead because of some buggy drivers, that's all. It makes debugging harder though so we tend not to enable it by default.

mirh commented on May 30, 2024

Makes sense.
Could it be auto-enabled for, I dunno, say only AMD's CL_DRIVER_VERSION below 2000?
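Purely to illustrate the suggestion - a hypothetical version check over the CL_DRIVER_VERSION string; the helper name and the parsing assumption (a leading numeric build number) are mine:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parse the string reported via
// clGetDeviceInfo(..., CL_DRIVER_VERSION, ...), e.g. "1912.5 (VM)",
// and flag AMD builds older than 2000. Non-numeric strings parse to 0
// and are treated as unknown rather than legacy.
bool is_legacy_amd_driver(const std::string& driver_version) {
    const long build = std::strtol(driver_version.c_str(), nullptr, 10);
    return build > 0 && build < 2000;
}
```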

DuncanMcBain commented on May 30, 2024

Since it's a compiler flag, there's no way for the ComputeCpp runtime to identify which driver it is running on and change the flag accordingly.

mirh commented on May 30, 2024

Lol, right.
Could you at least have a FAQ entry for this issue?
I don't know, something like "python segmentation fault in libamdocl64.so"?

mirh commented on May 30, 2024

--cxxopt="-sycl-compress-name" on the command line also does it.
