ARM cross-compilation (tl;dr: use the proper SPIR target), about codeplaysoftware/computecpp-sdk

Comments (52)

Rbiessy commented on May 30, 2024

Ohh ok gotcha so that's why you said

Rebuilding with Snapshot included

from computecpp-sdk.

lissyx commented on May 30, 2024

Ok, I don't think there's anything needed here. The remainder of the discussion is really tailored to the vc4 driver so far, and it is happening on their repo. I'll file new issues if moving forward with the driver reveals issues on the ComputeCpp side.

lissyx commented on May 30, 2024

For context, I'm using current https://github.com/lukeiwanski/tensorflow/tree/integration/1.8

lissyx commented on May 30, 2024

Okay, first mistake on my side: it looks like I should use --config=sycl_arm instead of --config=sycl?

lissyx commented on May 30, 2024

Using --config=sycl_arm fails like that: ERROR: Toolchain identifier '' for cpu 'armeabi' is illegal (does not match '[a-zA-Z_][\.\- \w]*')

lissyx commented on May 30, 2024

Okay, so it seems TF_SYCL_CROSS_TOOLCHAIN and TF_SYCL_CROSS_TOOLCHAIN_NAME are not being put into the action_env, and that's why sycl_configure.bzl fails to identify cross-compilation. I'm now able to start the build with proper SYCL-level cross-compilation. It seems I still have to pass a ComputeCpp SDK that can run on my system, and not the ARM one.

Also, the proper values are TF_SYCL_CROSS_TOOLCHAIN_NAME=arm-linux-gnueabihf, because that should be a valid clang target triple, and TF_SYCL_CROSS_TOOLCHAIN=gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/. Now waiting for the build to finish :)
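
A minimal pre-flight sketch, assuming a Linaro binary toolchain (which ships its compilers as bin/<triple>-gcc): it checks that both variables are exported and that the triple in TF_SYCL_CROSS_TOOLCHAIN_NAME actually matches a compiler inside TF_SYCL_CROSS_TOOLCHAIN, before a long Bazel build is started. The function name check_sycl_cross_env is hypothetical and not part of TensorFlow's configure scripts.

```shell
#!/bin/sh
# Hypothetical pre-flight check, meant to run after exporting the two
# variables and before `bazel build --config=sycl_arm`.
check_sycl_cross_env() {
  # An empty toolchain name is presumably what leads to
  # "Toolchain identifier '' for cpu 'armeabi' is illegal".
  if [ -z "${TF_SYCL_CROSS_TOOLCHAIN:-}" ] || [ -z "${TF_SYCL_CROSS_TOOLCHAIN_NAME:-}" ]; then
    echo "export TF_SYCL_CROSS_TOOLCHAIN and TF_SYCL_CROSS_TOOLCHAIN_NAME first" >&2
    return 1
  fi
  # Assumes the Linaro layout <toolchain>/bin/<triple>-gcc.
  # The trailing slash is stripped so "dir/" and "dir" both work.
  gcc_path="${TF_SYCL_CROSS_TOOLCHAIN%/}/bin/${TF_SYCL_CROSS_TOOLCHAIN_NAME}-gcc"
  if [ ! -x "$gcc_path" ]; then
    echo "not found: $gcc_path (is TF_SYCL_CROSS_TOOLCHAIN_NAME a target triple?)" >&2
    return 1
  fi
}
```

With TF_SYCL_CROSS_TOOLCHAIN=gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/ and TF_SYCL_CROSS_TOOLCHAIN_NAME=arm-linux-gnueabihf exported, the check passes; a CPU name such as armeabi in place of the triple makes it fail, since there is no armeabi-gcc in the toolchain.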

lissyx commented on May 30, 2024

Looks like clang sees different things when cross-compiling. I'm currently stuck with these errors in protobuf, although I never had any issue cross-compiling for armv7 and armv8 with GCC 4.9.4 and GCC 7.2.1:

$ grep 'error: ' build.log 
external/protobuf_archive/src/google/protobuf/arena_impl.h:280:55: error: cast from pointer to smaller type 'google::protobuf::internal::AtomicWord' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:142:9: error: cast from pointer to smaller type 'intptr_t' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:142:9: error: cast from pointer to smaller type 'intptr_t' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:76:15: error: no matching member function for call to 'PtrValue'
external/protobuf_archive/src/google/protobuf/metadata_lite.h:137:12: error: cast from pointer to smaller type 'intptr_t' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:137:12: error: cast from pointer to smaller type 'intptr_t' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:142:9: error: cast from pointer to smaller type 'intptr_t' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:142:9: error: cast from pointer to smaller type 'intptr_t' (aka 'int') loses information
external/protobuf_archive/src/google/protobuf/metadata_lite.h:76:15: error: no matching member function for call to 'PtrValue'

lissyx commented on May 30, 2024

To sum up:

Is it expected that I should use COMPUTECPP_TOOLKIT_PATH=ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64/ when cross-compiling for ARM?
Is it expected that the action_env flags TF_SYCL_CROSS_TOOLCHAIN and TF_SYCL_CROSS_TOOLCHAIN_NAME are not passed on from configure, and that I need to specify them myself?
Following the steps I described, should I expect the build to complete?

Rbiessy commented on May 30, 2024

Hello lissyx,

Our cross-compilation is still experimental and is indeed not documented anywhere yet. Thank you for your interest in it.

You seem to be really close to getting it working. Here are a few tips that should help you:

You should create a new computecpp folder that contains everything from ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64 except for libComputeCpp.so, which should be replaced by the one from ComputeCpp-CE-0.7.0-Ubuntu-16.04-ARM_64.
You have to export TF_SYCL_CROSS_TOOLCHAIN and TF_SYCL_CROSS_TOOLCHAIN_NAME to your environment before running configure.
You will also probably have to export CC_OPT_FLAGS="-march=armv8-a".
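
The first tip can be scripted roughly as follows. This is a sketch, assuming the usual ComputeCpp package layout with the runtime at lib/libComputeCpp.so in both packages; make_hybrid_computecpp is a hypothetical helper name. The host (x86_64) package supplies the compiler and headers, which have to run on the build machine, and only the runtime library that gets linked into the final binary is swapped for the target one.

```shell
#!/bin/sh
# Hypothetical helper: assemble a ComputeCpp folder for cross-compiling.
#   $1: host SDK,   e.g. ComputeCpp-CE-0.7.0-Ubuntu-16.04-x86_64
#   $2: target SDK, e.g. ComputeCpp-CE-0.7.0-Ubuntu-16.04-ARM_64
#   $3: output folder to create (must not exist yet)
make_hybrid_computecpp() {
  host_sdk=$1
  target_sdk=$2
  out=$3
  # Refuse to reuse a folder: `cp -R src existing_dir` would nest src inside it.
  if [ -e "$out" ]; then
    echo "$out already exists" >&2
    return 1
  fi
  # Compiler, tools and headers come from the host package...
  cp -R "$host_sdk" "$out" || return 1
  # ...but the runtime linked into the final binary must match the target
  # (assumes the runtime lives in lib/ in both packages).
  cp "$target_sdk/lib/libComputeCpp.so" "$out/lib/libComputeCpp.so"
}
```

COMPUTECPP_TOOLKIT_PATH would then point at the new folder, alongside the exported TF_SYCL_CROSS_TOOLCHAIN, TF_SYCL_CROSS_TOOLCHAIN_NAME and CC_OPT_FLAGS, before running configure.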

On a side note, if you are using the tip of dev/amd_gpu you will need to update to ComputeCpp CE 0.8.0, but that is not needed for cross-compiling.

Hope this helps!

lissyx commented on May 30, 2024

Thanks @Rbiessy for your feedback. So with my current setup it should work up to the link stage (because my libComputeCpp.so is currently amd64 and not armv7).

I did pass TF_SYCL_CROSS_TOOLCHAIN and TF_SYCL_CROSS_TOOLCHAIN_NAME when running ./configure, but they were not picked up. Is it possible that this results in a misconfiguration?

I'm targeting ARMv7 and not ARMv8, so I guess I need to use -march=armv7-a, as we already do for our other targets :)

lissyx commented on May 30, 2024

BTW, I'm using lukeiwanski/integration/1.8; is that the right branch for ComputeCpp v0.7.0?

Rbiessy commented on May 30, 2024

Yes, it should fail to link.
I am not sure what you mean by "not being picked by configure"? But since you specified --config=sycl_arm, it would have failed immediately if these were wrong.
Oh right, it should (cross-)compile, but I would recommend CE 0.8.0 for integration/1.8.

lissyx commented on May 30, 2024

Would you be able to share which toolchain you are using on your side @Rbiessy ? And which ARM target ?

lissyx commented on May 30, 2024

@Rbiessy Well, when I say "not being picked up by configure", I mean that I run it with those in the environment:

TF_SYCL_CROSS_TOOLCHAIN=xxx TF_SYCL_CROSS_TOOLCHAIN_NAME=yyy ./configure

But then they are not written into .tf_configure.bazelrc, so they are missing at the bazel build step. If you export them, they will probably be seen at bazel build time, but it's a bit weird since other variables don't need that :-). Not a huge issue IMHO.
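
One possible workaround, sketched under the assumption that ./configure records its answers as `build --action_env NAME="value"` lines in .tf_configure.bazelrc: append the two cross-compilation variables in that same form after running configure. The helper name persist_sycl_cross_env is hypothetical, and simply exporting the variables before building may well be enough on its own.

```shell
#!/bin/sh
# Hypothetical workaround: persist the SYCL cross-compilation variables in
# .tf_configure.bazelrc so that `bazel build` sees them without exporting.
persist_sycl_cross_env() {
  rc=${1:-.tf_configure.bazelrc}
  # Validate both variables before writing anything.
  for var in TF_SYCL_CROSS_TOOLCHAIN TF_SYCL_CROSS_TOOLCHAIN_NAME; do
    # POSIX-sh indirect expansion: runs val=${TF_SYCL_CROSS_TOOLCHAIN:-}, etc.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "$var is not set" >&2
      return 1
    fi
  done
  for var in TF_SYCL_CROSS_TOOLCHAIN TF_SYCL_CROSS_TOOLCHAIN_NAME; do
    eval "val=\${$var}"
    # Same line format as configure's other entries (assumption).
    printf 'build --action_env %s="%s"\n' "$var" "$val" >> "$rc"
  done
}
```

Since ./configure presumably regenerates .tf_configure.bazelrc, this would have to be re-run after each configure run.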

I'll give ComputeCpp 0.8.0 a try, but I'm not sure it will help. It feels like there's some mess in the include directories. I'm trying this with GCC 4.9.4 from Linaro: https://releases.linaro.org/components/toolchain/binaries/4.9-2017.01/arm-linux-gnueabihf/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf.tar.xz

Rbiessy commented on May 30, 2024

I tried the cross-compilation with TF_SYCL_CROSS_TOOLCHAIN=gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu and TF_SYCL_CROSS_TOOLCHAIN_NAME=aarch64-linux-gnu (64-bit ARMv8). Also, the Mali OpenCL driver requires spirv64, which can be specified with TF_SYCL_BITCODE_TARGET if needed.
That reminds me that you will have some issues with Python. You have to make sure that all the TF Python dependencies installed on your machine are also installed for the target architecture. Then create a symlink in your toolchain like so: ln -s /usr/include/aarch64-linux-gnu/python2.7/ path/to/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc/usr/include/aarch64-linux-gnu/ (for my setup).
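
The symlink step can be parameterised for other targets. A small sketch, assuming (as in the setup described above) that the target's Python headers live on the build machine under /usr/include/<triple>/python2.7 and that the toolchain sysroot is <toolchain>/<triple>/libc, as in the Linaro packages used in this thread. The helper name link_target_python_headers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: expose the target's Python headers inside the
# cross-toolchain sysroot, equivalent to the `ln -s` command above.
#   $1: sysroot, e.g. gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc
#   $2: target triple, e.g. aarch64-linux-gnu
#   $3: Python include dir name (defaults to python2.7)
link_target_python_headers() {
  sysroot=$1
  triple=$2
  pyver=${3:-python2.7}
  dest_dir="$sysroot/usr/include/$triple"
  mkdir -p "$dest_dir" || return 1
  # The link points at the target-architecture headers on the build machine.
  ln -s "/usr/include/$triple/$pyver" "$dest_dir/$pyver"
}
```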

I see. If I remember correctly, configure doesn't record them because we didn't want to introduce even more ComputeCpp-specific questions in configure.

Finally, it is pretty much guaranteed that you will run into more issues specific to 32-bit architectures. We plan to work on that at some point.

lissyx commented on May 30, 2024

Thanks! We make no use of Python, so we should be fine on that side. I'll verify the build with armv8 as well, to check whether I'm doing something wrong or whether that is just the current state of build support :)

lissyx commented on May 30, 2024

As expected, using 0.8.0 does not help :-). I'll try targeting ARM64.

lissyx commented on May 30, 2024

@Rbiessy So, using 0.8.0-ubuntu-14.04-x86_64, gcc 6.3.1 aarch64, I'm facing another build error, with this head:

commit 7a0ef1665355e9378206258f25a5e5463b9f4b86 (HEAD, lukeiwanski/integration/1.8)
Author: Luke Iwanski <[email protected]>
Date:   Thu May 17 12:31:10 2018 +0100

    [Temp] Fixes compilation issue that happens when using ComputeCpp 0.8

  (cd DeepSpeech/BazelCache/output_sycl_arm/execroot/org_tensorflow && \
  exec env - \
    COMPUTECPP_TOOLKIT_PATH=DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/ \
    PATH=/home/alexandre/bin:/home/alexandre/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_DOWNLOAD_CLANG=0 \
    TF_NEED_COMPUTECPP=1 \
    TF_NEED_CUDA=0 \
    TF_NEED_OPENCL_SYCL=1 \
    TF_SYCL_BITCODE_TARGET=spir64 \
  DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin/compute -target aarch64-linux-gnu '--gcc-toolchain=DeepSpeech/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu' '--sysroot=DeepSpeech/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc' -ffunction-sections -fdata-sections -fPIE -fno-omit-frame-pointer -Wall -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -DARM_NON_MOBILE -DNO_LOCAL_MEM '-DDISABLE_SKINNY=1' '-fvisibility=hidden' -DCTC_DISABLE_OMP '-std=c++11' -isystem DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include -fsycl-ih-last -sycl-driver -Xclang -cl-denorms-are-zero -Xclang -cl-fp32-correctly-rounded-divide-sqrt -Xclang -cl-mad-enable -sycl-target spir64 '-DTENSORFLOW_USE_SYCL=1' '-DEIGEN_USE_SYCL=1' '-DEIGEN_HAS_C99_MATH=1' '-DEIGEN_HAS_CXX11_MATH=1' -Wno-unused-variable -Wno-unused-const-variable '-DTENSORFLOW_SYCL_NO_HALF=1' '-DTENSORFLOW_SYCL_NO_DOUBLE=1' -MD -MF bazel-out/armeabi-opt/bin/tensorflow/core/_objs/lib_internal_impl/tensorflow/core/lib/bfloat16/bfloat16.pic.d '-frandom-seed=bazel-out/armeabi-opt/bin/tensorflow/core/_objs/lib_internal_impl/tensorflow/core/lib/bfloat16/bfloat16.pic.o' -fPIC -D__CLANG_SUPPORT_DYN_ANNOTATION__ -DEIGEN_MPL2_ONLY -DTENSORFLOW_USE_ABSL -DTF_USE_SNAPPY -iquote . 
-iquote bazel-out/armeabi-opt/genfiles -iquote external/com_google_absl -iquote bazel-out/armeabi-opt/genfiles/external/com_google_absl -iquote external/bazel_tools -iquote bazel-out/armeabi-opt/genfiles/external/bazel_tools -iquote external/nsync -iquote bazel-out/armeabi-opt/genfiles/external/nsync -iquote external/protobuf_archive -iquote bazel-out/armeabi-opt/genfiles/external/protobuf_archive -iquote external/eigen_archive -iquote bazel-out/armeabi-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/armeabi-opt/genfiles/external/local_config_sycl -iquote external/opencl_headers -iquote bazel-out/armeabi-opt/genfiles/external/opencl_headers -iquote external/double_conversion -iquote bazel-out/armeabi-opt/genfiles/external/double_conversion -iquote external/gif_archive -iquote bazel-out/armeabi-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/armeabi-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/armeabi-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/armeabi-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/armeabi-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/armeabi-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/armeabi-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/armeabi-opt/genfiles/external/zlib_archive -iquote external/snappy -iquote bazel-out/armeabi-opt/genfiles/external/snappy -Ibazel-out/armeabi-opt/bin/external/opencl_headers/_virtual_includes/OpenCL-Headers -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/nsync/public -isystem bazel-out/armeabi-opt/genfiles/external/nsync/public -isystem external/protobuf_archive/src -isystem bazel-out/armeabi-opt/genfiles/external/protobuf_archive/src -isystem external/eigen_archive -isystem 
bazel-out/armeabi-opt/genfiles/external/eigen_archive -isystem external/local_config_sycl/sycl -isystem bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl -isystem external/local_config_sycl/sycl/include -isystem bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl/include -isystem external/double_conversion -isystem bazel-out/armeabi-opt/genfiles/external/double_conversion -isystem external/gif_archive/lib -isystem bazel-out/armeabi-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/armeabi-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/armeabi-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/armeabi-opt/genfiles/external/zlib_archive -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -DTENSORFLOW_MONOLITHIC_BUILD -pthread -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -no-canonical-prefixes -c tensorflow/core/lib/bfloat16/bfloat16.cc -o bazel-out/armeabi-opt/bin/tensorflow/core/_objs/lib_internal_impl/tensorflow/core/lib/bfloat16/bfloat16.pic.o)
In file included from tensorflow/core/lib/bfloat16/bfloat16.cc:18:
In file included from ./third_party/eigen3/Eigen/Core:1:
In file included from external/eigen_archive/Eigen/Core:446:
external/eigen_archive/Eigen/src/Core/arch/SYCL/PacketMath.h:373:10: error: no matching function for call to 'select'
  return cl::sycl::select(thenPacket, elsePacket, condition);
         ^~~~~~~~~~~~~~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/type_traits.h:31:45: note: candidate template ignored: disabled by 'enable_if' [with T1 = cl::sycl::vec<double, 2>, T2 = cl::sycl::vec<int, 2>]
using enable_if_t = typename std::enable_if<B, T>::type;
                                            ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/type_traits.h:31:45: note: candidate template ignored: disabled by 'enable_if' [with T1 = cl::sycl::vec<double, 2>, T2 = cl::sycl::vec<int, 2>]
In file included from tensorflow/core/lib/bfloat16/bfloat16.cc:18:
In file included from ./third_party/eigen3/Eigen/Core:1:
In file included from external/eigen_archive/Eigen/Core:322:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl.hpp:20:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl_builtins.h:23:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/cpp_to_cl_cast.h:13:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/type_traits.h:21:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec.h:23:
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
using __sycl_vector __attribute__((ext_vector_type(kElems))) = dataT;
                                   ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:319:18: note: in instantiation of template type alias '__sycl_vector' requested here
  inline detail::__sycl_vector<dataT, kElems> get_data() const;
                 ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:414:35: note: in instantiation of template class 'cl::sycl::detail::mem_container_storage<cl::sycl::vec<int, 4>, 4>' requested here
class mem_container_base : public mem_container_storage<dataT, kElems> {};
                                  ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:433:14: note: in instantiation of template class 'cl::sycl::detail::mem_container_base<cl::sycl::vec<int, 4>, 4>' requested here
    : public mem_container_base<dataT, kElems> {
             ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:459:48: note: in instantiation of template class 'cl::sycl::detail::mem_container<cl::sycl::vec<int, 4>, 4, 1>' requested here
class mem_container<dataT, kElems, 2> : public mem_container<dataT, kElems, 1> {
                                               ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:543:48: note: in instantiation of template class 'cl::sycl::detail::mem_container<cl::sycl::vec<int, 4>, 4, 2>' requested here
class mem_container<dataT, kElems, 3> : public mem_container<dataT, kElems, 2> {
                                               ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:990:48: note: (skipping 1 context in backtrace; use -ftemplate-backtrace-limit=0 to see all)
class mem_container<dataT, kElems, 4> : public mem_container<dataT, kElems, 3> {
                                               ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:9439:20: note: in instantiation of template class 'cl::sycl::detail::mem_container<cl::sycl::vec<int, 4>, 4, 4>' requested here
class vec : public detail::mem_container<dataT, kElems, kElems> {
                   ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:87:8: note: in instantiation of template class 'cl::sycl::vec<cl::sycl::vec<int, 4>, 4>' requested here
       srcVectT::width == destVecT::width);
       ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:9572:49: note: in instantiation of template class 'cl::sycl::detail::is_valid_vec_convert_conversion<cl::sycl::vec<cl::sycl::vec<int, 4>, 4>, cl::sycl::vec<float, 4> >' requested here
                typename std::enable_if<detail::is_valid_vec_convert_conversion<
                                                ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:9574:25: note: in instantiation of default argument for 'convert<cl::sycl::vec<int, 4>, cl::sycl::rounding_mode::automatic>' required here
  vec<convertT, kElems> convert() const;
                        ^~~~~~~~~~~~~~~
external/eigen_archive/Eigen/src/Core/arch/SYCL/TypeCasting.h:38:22: note: while substituting deduced template arguments into function template 'convert' [with convertT = cl::sycl::vec<int, 4>, roundingMode = cl::sycl::rounding_mode::automatic, $2 = (no value)]
  return a. template convert<cl::sycl::cl_int4, cl::sycl::rounding_mode::automatic>();
                     ^
In file included from tensorflow/core/lib/bfloat16/bfloat16.cc:18:
In file included from ./third_party/eigen3/Eigen/Core:1:
In file included from external/eigen_archive/Eigen/Core:322:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl.hpp:20:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl_builtins.h:23:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/cpp_to_cl_cast.h:13:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/type_traits.h:21:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec.h:23:
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
using __sycl_vector __attribute__((ext_vector_type(kElems))) = dataT;
                                   ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_common.h:44:36: error: invalid vector element type 'cl::sycl::vec<int, 4>'
In file included from tensorflow/core/lib/bfloat16/bfloat16.cc:18:
In file included from ./third_party/eigen3/Eigen/Core:1:
In file included from external/eigen_archive/Eigen/Core:322:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl.hpp:20:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl_builtins.h:23:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/cpp_to_cl_cast.h:13:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/type_traits.h:21:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec.h:24:
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10754:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:87:8: note: in instantiation of template class 'cl::sycl::vec<cl::sycl::vec<int, 4>, 4>' requested here
       srcVectT::width == destVecT::width);
       ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:9572:49: note: in instantiation of template class 'cl::sycl::detail::is_valid_vec_convert_conversion<cl::sycl::vec<cl::sycl::vec<int, 4>, 4>, cl::sycl::vec<float, 4> >' requested here
                typename std::enable_if<detail::is_valid_vec_convert_conversion<
                                                ^
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:9574:25: note: in instantiation of default argument for 'convert<cl::sycl::vec<int, 4>, cl::sycl::rounding_mode::automatic>' required here
  vec<convertT, kElems> convert() const;
                        ^~~~~~~~~~~~~~~
external/eigen_archive/Eigen/src/Core/arch/SYCL/TypeCasting.h:38:22: note: while substituting deduced template arguments into function template 'convert' [with convertT = cl::sycl::vec<int, 4>, roundingMode = cl::sycl::rounding_mode::automatic, $2 = (no value)]
  return a. template convert<cl::sycl::cl_int4, cl::sycl::rounding_mode::automatic>();
                     ^
In file included from tensorflow/core/lib/bfloat16/bfloat16.cc:18:
In file included from ./third_party/eigen3/Eigen/Core:1:
In file included from external/eigen_archive/Eigen/Core:322:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl.hpp:20:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/sycl_builtins.h:23:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/cpp_to_cl_cast.h:13:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/type_traits.h:21:
In file included from DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec.h:24:
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10792:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10830:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10868:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10906:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10944:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:10982:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:11020:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:11058:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include/SYCL/vec_impl.h:11096:64: error: no type named 'type' in 'cl::sycl::detail::vec_ops::logical_return<16>'
  vec<typename detail::vec_ops::logical_return<sizeof(dataT)>::type, kElems>
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
INFO: Elapsed time: 3.328s, Critical Path: 3.04s
FAILED: Build did NOT complete successfully

lissyx commented on May 30, 2024

I'm getting the same with TF_USE_DOUBLE_SYCL=1 TF_USE_HALF_SYCL=1. Am I just unlucky, and the current integration/1.8 I'm trying is simply broken for Aarch64 cross-compilation? If it's that experimental, it would not be surprising :-)

Rbiessy commented on May 30, 2024

I think this is because integration/1.8 uses an old version of Eigen. I'd suggest bumping the Eigen version to commit 410527dff31d (have a look at lukeiwanski/tensorflow@0c833af to see how to do that).
You should also stick with TF_USE_HALF_SYCL=0 for now.
We should be able to cross-compile this branch for aarch64 but this hasn't been tested yet.

lissyx commented on May 30, 2024

Right, thanks, I totally understand it might be broken. But so far, it seems like updating Eigen does the trick :-). So at least I'm now fairly sure I'm doing it right, and I know failures are more than expected. I'll try to move forward on the ARM build for the RPi3, hopefully getting something in the end :-).

EDIT: After making sure that I replaced libComputeCpp.so with the ARM64 one, the final link stage completes properly. So the setup works here for ARM64 :-)

lissyx commented on May 30, 2024

Successful build as well with:

ComputeCpp 0.8.0 Ubuntu 14.04
Eigen @ 410527dff31d
GCC 4.9.4 Aarch64 https://releases.linaro.org/components/toolchain/binaries/4.9-2017.01/aarch64-linux-gnu/gcc-linaro-4.9.4-2017.01-x86_64_aarch64-linux-gnu.tar.xz

lissyx commented on May 30, 2024

Same setup, it's failing:

ComputeCpp 0.8.0 Ubuntu 14.04
Eigen @ 410527dff31d
GCC 4.9.4 ARM

DuncanMcBain commented on May 30, 2024

So the Aarch64 build works, but the arm32 fails? Just to clarify that's what's happening.

lissyx commented on May 30, 2024

@DuncanMcBain That's exactly it: aarch64 works, but arm32 does not. I just re-verified on a tensorflow r1.8 branch, cross-compiling for the RPi3 with GCC 4.9.4, and that also builds properly. I have a feeling that the clang in the middle is doing funny things?

DuncanMcBain commented on May 30, 2024

Possible! Apologies if you've posted it, but do you have a build log for the 32-bit failure? Scanning upthread I only saw the Aarch64 one (which I assume is now fixed).

lissyx commented on May 30, 2024

Yeah, there was one above, but it's not really meaningful anymore. I'm making a clean one, and attaching that.

lissyx commented on May 30, 2024

Here is an up-to-date log, @DuncanMcBain: build.log

DuncanMcBain commented on May 30, 2024

Thanks! I can immediately see that there are errors relating to the size of size_t and some pointers. Could you try setting TF_SYCL_BITCODE_TARGET to spir, not spir64? It should make a difference, I hope...
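
The rule behind the tl;dr can be written as a tiny lookup: the SPIR flavour passed through TF_SYCL_BITCODE_TARGET should have the same pointer width as the host target triple. Otherwise the device pass presumably disagrees with the 32-bit ARM headers about the size of pointers, size_t and intptr_t, which would explain the "cast from pointer to smaller type 'intptr_t' (aka 'int')" errors seen earlier. This is a sketch: the function name sycl_bitcode_target_for and the set of triples it handles are illustrative assumptions covering the targets discussed in this thread plus x86.

```shell
#!/bin/sh
# Hypothetical helper: choose the SPIR bitcode target whose pointer width
# matches the host target triple.
sycl_bitcode_target_for() {
  case "$1" in
    aarch64-*|arm64-*|x86_64-*) echo spir64 ;;  # 64-bit pointers
    arm*-*|i?86-*)              echo spir ;;    # 32-bit pointers (e.g. RPi3 armhf)
    *)
      echo "unknown triple: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `export TF_SYCL_BITCODE_TARGET="$(sycl_bitcode_target_for "$TF_SYCL_CROSS_TOOLCHAIN_NAME")"` yields spir for arm-linux-gnueabihf and spir64 for aarch64-linux-gnu. The driver can still constrain the choice, as with the Mali driver requiring spirv64 mentioned earlier in the thread.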

lissyx commented on May 30, 2024

Oh, nice catch. I should have thought about that. Build is going much further now, let's see how much :-)

lissyx commented on May 30, 2024

Okay, further along there's a narrowing error:

ERROR: /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tensorflow-lissyx/tensorflow/core/kernels/BUILD:1789:1: C++ compilation of rule '//tensorflow/core/kernels:tensor_array_ops' failed (Exit 1): compute failed: error executing command 
  (cd /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/BazelCache/output_sycl_arm/execroot/org_tensorflow && \
  exec env - \
    COMPUTECPP_TOOLKIT_PATH=/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/ \
    PATH=/home/alexandre/bin:/home/alexandre/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_DOWNLOAD_CLANG=0 \
    TF_NEED_COMPUTECPP=1 \
    TF_NEED_CUDA=0 \
    TF_NEED_OPENCL_SYCL=1 \
    TF_SYCL_BITCODE_TARGET=spir \
  /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin/compute -target arm-linux-gnueabihf '--gcc-toolchain=/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf' '--sysroot=/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc' -ffunction-sections -fdata-sections -fPIE -fno-omit-frame-pointer -Wall -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -DRASPBERRY_PI -DNO_LOCAL_MEM '-fvisibility=hidden' -DCTC_DISABLE_OMP -v '-std=c++11' -isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include -fsycl-ih-last -sycl-driver -Xclang -cl-denorms-are-zero -Xclang -cl-fp32-correctly-rounded-divide-sqrt -Xclang -cl-mad-enable -sycl-target spir '-DTENSORFLOW_USE_SYCL=1' '-DEIGEN_USE_SYCL=1' '-DEIGEN_HAS_C99_MATH=1' '-DEIGEN_HAS_CXX11_MATH=1' -Wno-unused-variable -Wno-unused-const-variable '-DTENSORFLOW_SYCL_NO_HALF=1' '-DTENSORFLOW_SYCL_NO_DOUBLE=1' -MD -MF bazel-out/armeabi-opt/bin/tensorflow/core/kernels/_objs/tensor_array_ops/tensorflow/core/kernels/tensor_array_ops.pic.d '-frandom-seed=bazel-out/armeabi-opt/bin/tensorflow/core/kernels/_objs/tensor_array_ops/tensorflow/core/kernels/tensor_array_ops.pic.o' -fPIC -DEIGEN_MPL2_ONLY -D__CLANG_SUPPORT_DYN_ANNOTATION__ -DTENSORFLOW_USE_ABSL -DTF_USE_SNAPPY -iquote . 
-iquote bazel-out/armeabi-opt/genfiles -iquote external/nsync -iquote bazel-out/armeabi-opt/genfiles/external/nsync -iquote external/bazel_tools -iquote bazel-out/armeabi-opt/genfiles/external/bazel_tools -iquote external/eigen_archive -iquote bazel-out/armeabi-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/armeabi-opt/genfiles/external/local_config_sycl -iquote external/opencl_headers -iquote bazel-out/armeabi-opt/genfiles/external/opencl_headers -iquote external/com_google_absl -iquote bazel-out/armeabi-opt/genfiles/external/com_google_absl -iquote external/gif_archive -iquote bazel-out/armeabi-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/armeabi-opt/genfiles/external/jpeg -iquote external/protobuf_archive -iquote bazel-out/armeabi-opt/genfiles/external/protobuf_archive -iquote external/com_googlesource_code_re2 -iquote bazel-out/armeabi-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/armeabi-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/armeabi-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/armeabi-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/armeabi-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/armeabi-opt/genfiles/external/zlib_archive -iquote external/double_conversion -iquote bazel-out/armeabi-opt/genfiles/external/double_conversion -iquote external/snappy -iquote bazel-out/armeabi-opt/genfiles/external/snappy -Ibazel-out/armeabi-opt/bin/external/opencl_headers/_virtual_includes/OpenCL-Headers -isystem external/nsync/public -isystem bazel-out/armeabi-opt/genfiles/external/nsync/public -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/eigen_archive -isystem bazel-out/armeabi-opt/genfiles/external/eigen_archive -isystem external/local_config_sycl/sycl -isystem 
bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl -isystem external/local_config_sycl/sycl/include -isystem bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl/include -isystem external/gif_archive/lib -isystem bazel-out/armeabi-opt/genfiles/external/gif_archive/lib -isystem external/protobuf_archive/src -isystem bazel-out/armeabi-opt/genfiles/external/protobuf_archive/src -isystem external/farmhash_archive/src -isystem bazel-out/armeabi-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/armeabi-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/armeabi-opt/genfiles/external/zlib_archive -isystem external/double_conversion -isystem bazel-out/armeabi-opt/genfiles/external/double_conversion -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -DTENSORFLOW_MONOLITHIC_BUILD -pthread -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -no-canonical-prefixes -c tensorflow/core/kernels/tensor_array_ops.cc -o bazel-out/armeabi-opt/bin/tensorflow/core/kernels/_objs/tensor_array_ops/tensorflow/core/kernels/tensor_array_ops.pic.o)
Codeplay ComputeCpp - CE 0.8.0 Device Compiler - clang version 3.9.0 ([email protected]:sycl/clang.git 16de864c53d4ce86287b1b7f7254f76d847eaa9c) ([email protected]:sycl/llvm.git 05239d6794411875ac5542dd337663a22e266f07) (based on LLVM 3.9.0svn)
Target: arm--linux-gnueabihf
Thread model: posix
InstalledDir: /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin
Found candidate GCC installation: /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4
Selected GCC installation: /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4
Candidate multilib: .;@m32
Selected multilib: .;@m32
 "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin/compute" -cc1 -triple armv6kz--linux-gnueabihf -aux-triple spir-unknown-unknown -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -discard-value-names -main-file-name tensor_array_ops.cc -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu arm1176jzf-s -target-feature +strict-align -target-abi aapcs-linux -mfloat-abi hard -v -dwarf-column-info -debugger-tuning=gdb -ffunction-sections -fdata-sections -coverage-file /proc/self/cwd/bazel-out/armeabi-opt/bin/tensorflow/core/kernels/_objs/tensor_array_ops/tensorflow/core/kernels/tensor_array_ops.pic.o -resource-dir /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin/../lib/clang/3.9.0 -dependency-file bazel-out/armeabi-opt/bin/tensorflow/core/kernels/_objs/tensor_array_ops/tensorflow/core/kernels/tensor_array_ops.pic.d -MT bazel-out/armeabi-opt/bin/tensorflow/core/kernels/_objs/tensor_array_ops/tensorflow/core/kernels/tensor_array_ops.pic.o -sys-header-deps -isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include -iquote . 
-iquote bazel-out/armeabi-opt/genfiles -iquote external/nsync -iquote bazel-out/armeabi-opt/genfiles/external/nsync -iquote external/bazel_tools -iquote bazel-out/armeabi-opt/genfiles/external/bazel_tools -iquote external/eigen_archive -iquote bazel-out/armeabi-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/armeabi-opt/genfiles/external/local_config_sycl -iquote external/opencl_headers -iquote bazel-out/armeabi-opt/genfiles/external/opencl_headers -iquote external/com_google_absl -iquote bazel-out/armeabi-opt/genfiles/external/com_google_absl -iquote external/gif_archive -iquote bazel-out/armeabi-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/armeabi-opt/genfiles/external/jpeg -iquote external/protobuf_archive -iquote bazel-out/armeabi-opt/genfiles/external/protobuf_archive -iquote external/com_googlesource_code_re2 -iquote bazel-out/armeabi-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/armeabi-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/armeabi-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/armeabi-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/armeabi-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/armeabi-opt/genfiles/external/zlib_archive -iquote external/double_conversion -iquote bazel-out/armeabi-opt/genfiles/external/double_conversion -iquote external/snappy -iquote bazel-out/armeabi-opt/genfiles/external/snappy -isystem external/nsync/public -isystem bazel-out/armeabi-opt/genfiles/external/nsync/public -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/eigen_archive -isystem bazel-out/armeabi-opt/genfiles/external/eigen_archive -isystem external/local_config_sycl/sycl -isystem bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl -isystem external/local_config_sycl/sycl/include -isystem 
bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl/include -isystem external/gif_archive/lib -isystem bazel-out/armeabi-opt/genfiles/external/gif_archive/lib -isystem external/protobuf_archive/src -isystem bazel-out/armeabi-opt/genfiles/external/protobuf_archive/src -isystem external/farmhash_archive/src -isystem bazel-out/armeabi-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/armeabi-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/armeabi-opt/genfiles/external/zlib_archive -isystem external/double_conversion -isystem bazel-out/armeabi-opt/genfiles/external/double_conversion -D NDEBUG -D GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK -D RASPBERRY_PI -D NO_LOCAL_MEM -D CTC_DISABLE_OMP -D TENSORFLOW_USE_SYCL=1 -D EIGEN_USE_SYCL=1 -D EIGEN_HAS_C99_MATH=1 -D EIGEN_HAS_CXX11_MATH=1 -D TENSORFLOW_SYCL_NO_HALF=1 -D TENSORFLOW_SYCL_NO_DOUBLE=1 -D EIGEN_MPL2_ONLY -D __CLANG_SUPPORT_DYN_ANNOTATION__ -D TENSORFLOW_USE_ABSL -D TF_USE_SNAPPY -I bazel-out/armeabi-opt/bin/external/opencl_headers/_virtual_includes/OpenCL-Headers -D EIGEN_AVOID_STL_ARRAY -I external/gemmlowp -D TENSORFLOW_MONOLITHIC_BUILD -D "__DATE__=\"redacted\"" -D "__TIMESTAMP__=\"redacted\"" -D "__TIME__=\"redacted\"" -isysroot /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc -internal-isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4/../../../../arm-linux-gnueabihf/include/c++/4.9.4 -internal-isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4/../../../../arm-linux-gnueabihf/include/c++/4.9.4/arm-linux-gnueabihf -internal-isystem 
/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4/../../../../arm-linux-gnueabihf/include/c++/4.9.4/backward -internal-isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/usr/local/include -internal-isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin/../lib/clang/3.9.0/include -internal-externc-isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/include -internal-externc-isystem /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/usr/include -O2 -Wall -Wno-unused-variable -Wno-unused-const-variable -Wno-sign-compare -Wno-builtin-macro-redefined -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /proc/self/cwd -ftemplate-depth 900 -ferror-limit 19 -fmessage-length 0 -fvisibility hidden -pthread -fallow-half-arguments-and-returns -fno-signed-char -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -sycl -Rsycl-serial-memop -fsycl-ih-last -sycl-ih /tmp/tensor_array_ops-5ec247.sycl -fdiagnostics-show-option -vectorize-loops -vectorize-slp -cl-denorms-are-zero -cl-fp32-correctly-rounded-divide-sqrt -cl-mad-enable -o /tmp/tensor_array_ops-dd419f.bc -x c++ tensorflow/core/kernels/tensor_array_ops.cc
clang -cc1 version 3.9.0 based upon LLVM 3.9.0svn default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "external/gemmlowp"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/nsync"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/bazel_tools"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/eigen_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/local_config_sycl"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/opencl_headers"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/com_google_absl"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/gif_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/protobuf_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/com_googlesource_code_re2"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/farmhash_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/fft2d"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/highwayhash"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/zlib_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/double_conversion"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/nsync/public"
ignoring nonexistent directory "external/bazel_tools/tools/cpp/gcc3"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/eigen_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/local_config_sycl/sycl/include"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/gif_archive/lib"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/protobuf_archive/src"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/farmhash_archive/src"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/zlib_archive"
ignoring nonexistent directory "bazel-out/armeabi-opt/genfiles/external/double_conversion"
ignoring nonexistent directory "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/usr/local/include"
ignoring nonexistent directory "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/include"
#include "..." search starts here:
 .
 bazel-out/armeabi-opt/genfiles
 external/nsync
 external/bazel_tools
 external/eigen_archive
 external/local_config_sycl
 external/opencl_headers
 external/com_google_absl
 external/gif_archive
 external/jpeg
 bazel-out/armeabi-opt/genfiles/external/jpeg
 external/protobuf_archive
 external/com_googlesource_code_re2
 external/farmhash_archive
 external/fft2d
 external/highwayhash
 external/png_archive
 bazel-out/armeabi-opt/genfiles/external/png_archive
 external/zlib_archive
 external/double_conversion
 external/snappy
 bazel-out/armeabi-opt/genfiles/external/snappy
#include <...> search starts here:
 bazel-out/armeabi-opt/bin/external/opencl_headers/_virtual_includes/OpenCL-Headers
 /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64//include
 external/nsync/public
 external/eigen_archive
 external/local_config_sycl/sycl
 external/local_config_sycl/sycl/include
 external/gif_archive/lib
 external/protobuf_archive/src
 external/farmhash_archive/src
 external/png_archive
 bazel-out/armeabi-opt/genfiles/external/png_archive
 external/zlib_archive
 external/double_conversion
 /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4/../../../../arm-linux-gnueabihf/include/c++/4.9.4
 /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4/../../../../arm-linux-gnueabihf/include/c++/4.9.4/arm-linux-gnueabihf
 /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/lib/gcc/arm-linux-gnueabihf/4.9.4/../../../../arm-linux-gnueabihf/include/c++/4.9.4/backward
 /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/ComputeCpp-CE-0.8.0-Ubuntu-14.04-x86_64/bin/../lib/clang/3.9.0/include
 /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/arm-linux-gnueabihf/libc/usr/include
End of search list.
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
In file included from tensorflow/core/kernels/tensor_array_ops.cc:25:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:105:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h:289:167: error: non-constant-expression cannot be narrowed from type 'long long' to 'int' in initializer list [-Wc++11-narrowing]
  EIGEN_STRONG_INLINE explicit DSizes(DenseIndex firstDimension, DenseIndex secondDimension, IndexTypes... otherDimensions) : Base({{firstDimension, secondDimension, otherDimensions...}}) {
                                                                                                                                                                      ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1229:41: note: in instantiation of function template specialization 'Eigen::DSizes<int, 3>::DSizes<long long>' requested here
    Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, 1,
                                        ^
tensorflow/core/kernels/tensor_array_ops.cc:1140:12: note: in instantiation of member function 'tensorflow::TensorArrayUnpackOrScatterOp<Eigen::ThreadPoolDevice, long long, true>::Compute' requested here
  explicit TensorArrayUnpackOrScatterOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1287:19: note: in instantiation of member function 'tensorflow::TensorArrayUnpackOrScatterOp<Eigen::ThreadPoolDevice, long long, true>::TensorArrayUnpackOrScatterOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SCATTER_AND_UNPACK);
                  ^
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h:289:167: note: insert an explicit cast to silence this issue
  EIGEN_STRONG_INLINE explicit DSizes(DenseIndex firstDimension, DenseIndex secondDimension, IndexTypes... otherDimensions) : Base({{firstDimension, secondDimension, otherDimensions...}}) {
                                                                                                                                                                      ^~~~~~~~~~~~~~~
                                                                                                                                                                      static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, long long>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, long long>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, int>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, int>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, unsigned short>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, unsigned short>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, short>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, short>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, unsigned char>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, unsigned char>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, signed char>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, signed char>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, Eigen::half>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, Eigen::half>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: error: non-constant-expression cannot be narrowed from type 'int64' (aka 'long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1366:12: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, tensorflow::bfloat16>::Compute' requested here
  explicit TensorArraySplitOp(OpKernelConstruction* context)
           ^
tensorflow/core/kernels/tensor_array_ops.cc:1497:19: note: in instantiation of member function 'tensorflow::TensorArraySplitOp<Eigen::ThreadPoolDevice, tensorflow::bfloat16>::TensorArraySplitOp' requested here
TF_CALL_ALL_TYPES(REGISTER_SPLIT);
                  ^
tensorflow/core/kernels/tensor_array_ops.cc:1454:54: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> indices{0, previous_length, 0};
                                                     ^~~~~~~~~~~~~~~
                                                     static_cast<int>( )
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: error: non-constant-expression cannot be narrowed from type 'Scalar' (aka 'const long long') to 'int' in initializer list [-Wc++11-narrowing]
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
tensorflow/core/kernels/tensor_array_ops.cc:1455:52: note: insert an explicit cast to silence this issue
      Eigen::DSizes<Eigen::DenseIndex, 3> sizes{1, tensor_lengths_t(i),
                                                   ^~~~~~~~~~~~~~~~~~~
                                                   static_cast<int>(  )
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
INFO: Elapsed time: 21.085s, Critical Path: 20.62s
FAILED: Build did NOT complete successfully

Now, those seem to be Eigen-specific. Given that this is already a patched version of Eigen, and that these errors relate to sizes that look like 64 bits, are they just side effects of the current focus on aarch64? Adding --copt=-Wno-c++11-narrowing lets me get to the end of the build :)
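
For illustration, here is a minimal standalone C++ sketch of why these errors only show up on the 32-bit ARM build, and what the explicit cast suggested by clang looks like. It assumes, as in Eigen's default configuration, that `Eigen::DenseIndex` is `std::ptrdiff_t`, which is 32 bits wide on armhf; `Index3` and `make_indices` are hypothetical stand-ins, not TensorFlow code. Note that `-Wno-c++11-narrowing` only silences the diagnostic and keeps the implicit truncation, whereas an explicit cast makes it visible and can be paired with a range check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for Eigen::DSizes<Eigen::DenseIndex, 3>, assuming
// Eigen::DenseIndex == std::ptrdiff_t (32 bits wide on arm-linux-gnueabihf).
using Index3 = std::array<std::ptrdiff_t, 3>;

Index3 make_indices(std::int64_t previous_length) {
  // Index3 bad{0, previous_length, 0};
  // ^ ill-formed on 32-bit targets: narrowing int64_t -> ptrdiff_t inside a
  //   braced initializer, which clang reports under -Wc++11-narrowing.

  // Check (in debug builds) that the value fits before truncating it.
  assert(previous_length >= std::numeric_limits<std::ptrdiff_t>::min() &&
         previous_length <= std::numeric_limits<std::ptrdiff_t>::max());
  // The explicit cast makes the possibly lossy conversion intentional.
  return Index3{0, static_cast<std::ptrdiff_t>(previous_length), 0};
}
```

The same pattern applies to the `sizes{1, tensor_lengths_t(i), ...}` initializer: cast the 64-bit value explicitly instead of relying on brace initialization to convert it.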

lissyx commented on May 30, 2024

Update: I could finish the build. A first attempt failed, as expected, because vc4 requires root access for now:

pi@rpi3-opencl-20180518:~/deepspeech $ ./deepspeech ~/tmp/deepspeech/models/tf14.frozen.494_e120.LSTM.ldc93s1.pb ~/tmp/deepspeech/models/alphabet.txt ~/tmp/deepspeech/audio/ -t
TensorFlow: v1.8.0-rc1-1904-g9989353054
DeepSpeech: v0.2.0-alpha.5-0-g7cc8382
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
[VC4CL] can't open /dev/mem
[VC4CL] This program should be run as root. Try prefixing command with: sudo
terminate called after throwing an instance of 'std::system_error'
  what():  Failed to open /dev/mem: Permission denied
Aborted

Running it under sudo, it gets further:

pi@rpi3-opencl-20180518:~/deepspeech $ sudo ./deepspeech ~/tmp/deepspeech/models/tf14.frozen.494_e120.LSTM.ldc93s1.pb ~/tmp/deepspeech/models/alphabet.txt ~/tmp/deepspeech/audio/ -t
TensorFlow: v1.8.0-rc1-1904-g9989353054
DeepSpeech: v0.2.0-alpha.5-0-g7cc8382
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-23 13:54:53.988212: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-05-23 13:54:53.988803: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: VideoCore IV GPU, vendor: Broadcom, profile: EMBEDDED_PROFILE
Running on directory /home/pi/tmp/deepspeech/audio/
> /home/pi/tmp/deepspeech/audio//2830-3980-0043.wav
2018-05-23 13:54:54.205643: W ./tensorflow/core/framework/allocator.cc:108] Allocation of 11713728 exceeds 10% of system memory.
2018-05-23 13:54:54.227152: W ./tensorflow/core/framework/allocator.cc:108] Allocation of 11713728 exceeds 10% of system memory.
2018-05-23 13:54:54.279121: W ./tensorflow/core/framework/allocator.cc:108] Allocation of 11713728 exceeds 10% of system memory.
2018-05-23 13:54:54.300364: W ./tensorflow/core/framework/allocator.cc:108] Allocation of 11713728 exceeds 10% of system memory.
2018-05-23 13:54:54.322776: W ./tensorflow/core/framework/allocator.cc:108] Allocation of 11713728 exceeds 10% of system memory.

Now waiting to see the output. The deepspeech process is consuming lots of CPU; let's see.

DuncanMcBain commented on May 30, 2024

Wow, that's great! I've seen the C++11 narrowing thing before; I think Eigen doesn't really care about 32-bit builds (which is a shame). I'm also surprised that it looks like -Werror is turned on (normally I wouldn't do that for cross-compile builds, which always tend to be a bit more warning-heavy). Here's hoping it works!

lissyx commented on May 30, 2024

It's been running for two hours, and nothing :'(. Not even an error or anything.

DuncanMcBain commented on May 30, 2024

Oh. Presumably you'd have expected a failure, or at least some kind of output, by now? I'll keep an eye on the issue you've opened in the VC4CL repo. If nothing else, I have a Pi at home that doesn't do a lot! Would be cool to put a project on there :)

lissyx commented on May 30, 2024

Yes. I let it run overnight, and nothing. As you just said, I'd expect some output or some error. Even increasing the TensorFlow logging reveals no further activity (log attached):
deepspeech_mmap.log

lissyx commented on May 30, 2024

@DuncanMcBain Do we have a way to dump the OpenCL code somehow? This might help doe300 identify what's going on.

DuncanMcBain commented on May 30, 2024

It's... tricky. SYCL as an API doesn't really provide a way to do this right now. It's not really OpenCL C either; it's an intermediate format, which might be the problem? Perhaps VC4CL can deal with it directly, I don't know.

I think the problem might be that the kernels are large. That said, if nothing happened overnight... that does seem rather extreme. I will put together some instructions that will hopefully let you cross-compile the SDK so you can test with that (it's much, much smaller, so it might be easier to reproduce and debug if it goes wrong too). I'll try to get those to you this afternoon.

lissyx commented on May 30, 2024

@DuncanMcBain Not sure if you saw the latest developments on the issue, but it looks like VC4CL has nothing more to do. I don't really know how to enable more debug output for that; TF_CPP_MIN_VLOG_LEVEL=3 is the best I could get so far.

Can we dump the kernels that are being passed to VC4CL on your side?

DuncanMcBain commented on May 30, 2024

It'd be rather tricky to do so from TensorFlow, I'm afraid. It looks like you might have managed to reproduce this hanging issue purely from the VC4CL side of things, right? It would be easier to get our hands on the kernels from the SDK, largely because intercepting the inner workings of the TensorFlow build process is challenging. It's much easier with CMake.

We have a script called "spir_extract" which will emit the SPIR code for a given integration header. You can then disassemble it or whatever you need to do. It's in the tools directory of the SDK.

If you'd like to cross-compile the SDK and look at the kernels in there, let me know.

lissyx commented on May 30, 2024

It feels more like something is locked up.

Does this gdb stack help? I got it by running deepspeech under gdb, waiting through the long, seemingly infinite condition-variable wait, and then pressing CTRL+C:

Thread 1 "deepspeech" received signal SIGINT, Interrupt.
0x6fd4b94c in __pthread_cond_wait (cond=0x705d3730 <cl::sycl::detail::scheduler::m_workPending>, mutex=0x705d3760 <cl::sycl::detail::scheduler::m_transList_mutex>) at pthread_cond_wait.c:186
186	pthread_cond_wait.c: No such file or directory.
(gdb) bt
#0  0x6fd4b94c in __pthread_cond_wait (cond=0x705d3730 <cl::sycl::detail::scheduler::m_workPending>, mutex=0x705d3760 <cl::sycl::detail::scheduler::m_transList_mutex>) at pthread_cond_wait.c:186
#1  0x6febdea0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#2  0x7057b49c in cl::sycl::detail::scheduler::wait_for_empty_queue(unsigned long long) () from /home/pi/deepspeech/libComputeCpp.so
#3  0x70574d1a in cl::sycl::queue::wait_and_throw() () from /home/pi/deepspeech/libComputeCpp.so
#4  0x74879d98 in tensorflow::SYCLDeviceContext::CopyCPUTensorToDevice(tensorflow::Tensor const*, tensorflow::Device*, tensorflow::Tensor*, std::function<void (tensorflow::Status const&)>) const () from /home/pi/deepspeech/libdeepspeech.so
#5  0x74877684 in tensorflow::SYCLDevice::MakeTensorFromProto(tensorflow::TensorProto const&, tensorflow::AllocatorAttributes, tensorflow::Tensor*) () from /home/pi/deepspeech/libdeepspeech.so
#6  0x707846c8 in tensorflow::ConstantOp::ConstantOp(tensorflow::OpKernelConstruction*) () from /home/pi/deepspeech/libdeepspeech.so
#7  0x70786360 in tensorflow::$_1::__invoke () from /home/pi/deepspeech/libdeepspeech.so
#8  0x74a7bb94 in tensorflow::CreateOpKernel(tensorflow::DeviceType, tensorflow::DeviceBase*, tensorflow::Allocator*, tensorflow::FunctionLibraryRuntime*, tensorflow::NodeDef const&, int, tensorflow::OpKernel**) ()
   from /home/pi/deepspeech/libdeepspeech.so
#9  0x749a5e24 in tensorflow::CreateNonCachedKernel(tensorflow::Device*, tensorflow::FunctionLibraryRuntime*, tensorflow::NodeDef const&, int, tensorflow::OpKernel**) () from /home/pi/deepspeech/libdeepspeech.so
#10 0x749b98cc in tensorflow::FunctionLibraryRuntimeImpl::CreateKernel(tensorflow::NodeDef const&, tensorflow::FunctionLibraryDefinition const*, tensorflow::OpKernel**) () from /home/pi/deepspeech/libdeepspeech.so
#11 0x749b9374 in tensorflow::FunctionLibraryRuntimeImpl::CreateKernel(tensorflow::NodeDef const&, tensorflow::OpKernel**) () from /home/pi/deepspeech/libdeepspeech.so
#12 0x70776b9c in std::_Function_handler<tensorflow::Status (tensorflow::NodeDef const&, tensorflow::OpKernel**), tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys> >*, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_delete<tensorflow::DirectSession::FunctionInfo> >*, tensorflow::DirectSession::RunStateArgs*)::$_6>::_M_invoke(std::_Any_data const&, tensorflow::NodeDef const&, tensorflow::OpKernel**) () from /home/pi/deepspeech/libdeepspeech.so
#13 0x749a2eac in tensorflow::(anonymous namespace)::ExecutorImpl::Initialize() () from /home/pi/deepspeech/libdeepspeech.so
#14 0x749a1b60 in tensorflow::NewLocalExecutor(tensorflow::LocalExecutorParams const&, std::unique_ptr<tensorflow::Graph const, std::default_delete<tensorflow::Graph const> >, tensorflow::Executor**) () from /home/pi/deepspeech/libdeepspeech.so
#15 0x70767708 in tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::default_delete<tensorflow::DirectSession::ExecutorsAndKeys> >*, std::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::default_delete<tensorflow::DirectSession::FunctionInfo> >*, tensorflow::DirectSession::RunStateArgs*) () from /home/pi/deepspeech/libdeepspeech.so
#16 0x707616f4 in tensorflow::DirectSession::GetOrCreateExecutors(tensorflow::gtl::ArraySlice<std::string>, tensorflow::gtl::ArraySlice<std::string>, tensorflow::gtl::ArraySlice<std::string>, tensorflow::DirectSession::ExecutorsAndKeys**, tensorflow::DirectSession::RunStateArgs*) () from /home/pi/deepspeech/libdeepspeech.so
#17 0x7075f8c4 in tensorflow::DirectSession::Run(tensorflow::RunOptions const&, std::vector<std::pair<std::string, tensorflow::Tensor>, std::allocator<std::pair<std::string, tensorflow::Tensor> > > const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tensorflow::RunMetadata*) () from /home/pi/deepspeech/libdeepspeech.so
#18 0x7075d2c8 in tensorflow::DirectSession::Run(std::vector<std::pair<std::string, tensorflow::Tensor>, std::allocator<std::pair<std::string, tensorflow::Tensor> > > const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*) () from /home/pi/deepspeech/libdeepspeech.so
#19 0x707399d8 in DeepSpeech::Model::infer(float*, int, int) () from /home/pi/deepspeech/libdeepspeech.so
#20 0x000111b2 in LocalDsSTT(DeepSpeech::Model&, short const*, unsigned int, int) ()
#21 0x00011634 in ProcessFile(DeepSpeech::Model&, char const*, bool) ()
#22 0x00011830 in main ()
(gdb)

lissyx commented on May 30, 2024

Okay, so I've somehow found a place inside VC4's Buffer.cpp that matches the last moment of activity, and I hit 'next' in gdb. This got me inside cl::sycl::detail::scheduler::scheduler_loop() () from /home/pi/deepspeech/libComputeCpp.so, and from there it looks like I'm in some kind of infinite loop. The whole log is attached:
debug.txt

Any attempt to 'cont' and then CTRL+C gives me the same backtrace as above.

Does that shed any light, @DuncanMcBain?

DuncanMcBain commented on May 30, 2024

Yes, kind of! As far as I can tell, TensorFlow is successfully submitting the copy to the device queue, then Eigen calls synchronize(), which I believe calls down into cl::sycl::queue::wait_and_throw(). I think, but can't be sure, that this is the prelude to the backtrace you are seeing.

This might be helpful to the developer of VC4CL, as it looks like it hangs at the very first copy-on. You've done a great job here, excellent investigation! FWIW, wait_and_throw() will never return until all events on the queue have finished, so it looks like the CL implementation somehow never completes the copy. Maybe the VC4CL developer can help from here?
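
The blocking behaviour described above (and the `std::condition_variable::wait` frame under `wait_for_empty_queue` in the earlier backtrace) can be modelled with a small C++ sketch. `PendingQueue` is hypothetical and is not ComputeCpp's scheduler: `wait()` only returns once every submitted command has been marked complete, so a device that never completes a command blocks it forever. The `wait_for()` variant is an assumed debugging aid that turns such a hang into a detectable timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical model (NOT ComputeCpp's code) of a queue whose wait blocks
// until every submitted command has been reported complete.
class PendingQueue {
 public:
  // Called when a command (e.g. a host-to-device copy) is enqueued.
  void submit() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++pending_;
  }

  // Called by the "device" when a command finishes.
  void complete() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(pending_ > 0 && "complete() without a matching submit()");
      --pending_;
    }
    cv_.notify_all();
  }

  // Like wait_and_throw(): returns only once nothing is pending. If the
  // device never calls complete(), this blocks forever.
  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }

  // Debugging variant: returns false if work is still pending after
  // `timeout`, making a hang visible instead of silent.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return pending_ == 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t pending_ = 0;
};
```

In this model, the symptom in the thread corresponds to `submit()` being called for the copy while `complete()` is never called by the OpenCL implementation.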

lissyx commented on May 30, 2024

So we enqueue commands, and then there's a flush. This is a link to the current implementation of flush: https://github.com/doe300/VC4CL/blob/7d5d906c8e2e69ff94ae605cdbfe1f7a32c87833/src/CommandQueue.cpp#L103-L107

vc4cl::Event* vc4cl::Buffer::createBufferActionEvent(vc4cl::CommandQueue*, vc4cl::CommandType, cl_uint, _cl_event* const*, cl_int*) const => /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/Buffer.cpp:597
T* vc4cl::newOpenCLObject(Args ...) [with T = vc4cl::Event; Args = {vc4cl::Context*, int, vc4cl::CommandType}]:206
T* vc4cl::newOpenCLObject(Args ...) [with T = vc4cl::Event; Args = {vc4cl::Context*, int, vc4cl::CommandType}]:208
[VC4CL] Tracking live-time of object: cl_event
T* vc4cl::newOpenCLObject(Args ...) [with T = vc4cl::Event; Args = {vc4cl::Context*, int, vc4cl::CommandType}]:210
vc4cl::Event* vc4cl::Buffer::createBufferActionEvent(vc4cl::CommandQueue*, vc4cl::CommandType, cl_uint, _cl_event* const*, cl_int*) const => /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/Buffer.cpp:602
cl_int vc4cl::CommandQueue::enqueueEvent(vc4cl::Event*) => /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:111
cl_int vc4cl::CommandQueue::enqueueEvent(vc4cl::Event*) => /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:125
cl_int vc4cl::CommandQueue::enqueueEvent(vc4cl::Event*) => /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:134
[Switching to Thread 0x6aaff3e0 (LWP 18460)]

Thread 11 "deepspeech" hit Breakpoint 1, vc4cl::CommandQueue::flush (this=0x28ff70) at /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:154
154	/home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp: No such file or directory.
(gdb) bt
#0  vc4cl::CommandQueue::flush (this=0x28ff70) at /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:154
#1  0x6e1c82fe in VC4CL_clFlush (command_queue=0x28ff90) at /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:451
#2  0x70b7023c in cl::sycl::detail::queue::flush() const () from /home/pi/deepspeech/libComputeCpp.so
#3  0x70b5c054 in cl::sycl::detail::transaction::state<(cl::sycl::detail::trans_detail::status_t)3>::enter() () from /home/pi/deepspeech/libComputeCpp.so
#4  0x70b59730 in cl::sycl::detail::transaction::change_to(cl::sycl::detail::trans_detail::status_t, cl::sycl::detail::trans_detail::status_t) () from /home/pi/deepspeech/libComputeCpp.so
#5  0x70b598f8 in cl::sycl::detail::transaction::next_state() () from /home/pi/deepspeech/libComputeCpp.so
#6  0x70b8111e in cl::sycl::detail::scheduler::execute(std::unique_ptr<cl::sycl::detail::transaction, std::default_delete<cl::sycl::detail::transaction> >&&) () from /home/pi/deepspeech/libComputeCpp.so
#7  0x70b81400 in cl::sycl::detail::scheduler::scheduler_loop() () from /home/pi/deepspeech/libComputeCpp.so
#8  0x704c8d44 in ?? () from /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#9  0x70349fc4 in start_thread (arg=0x6aaff3e0) at pthread_create.c:335
Backtrace stopped: Cannot access memory at address 0x66
(gdb) n
cl_int vc4cl::CommandQueue::flush() => /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:154
157	in /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp
(gdb) 
158	in /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp
(gdb) 
VC4CL_clFlush (command_queue=0x28ff90) at /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp:452
452	in /home/alex/codaz/Mozilla/DeepSpeech/RPI3-GPU/VC4CL/src/CommandQueue.cpp
(gdb) 
0x70b7023c in cl::sycl::detail::queue::flush() const () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function _ZNK2cl4sycl6detail5queue5flushEv,
which has no line number information.
0x70b5c054 in cl::sycl::detail::transaction::state<(cl::sycl::detail::trans_detail::status_t)3>::enter() () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function _ZN2cl4sycl6detail11transaction5stateILNS1_12trans_detail8status_tE3EE5enterEv,
which has no line number information.
0x70b59730 in cl::sycl::detail::transaction::change_to(cl::sycl::detail::trans_detail::status_t, cl::sycl::detail::trans_detail::status_t) () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function _ZN2cl4sycl6detail11transaction9change_toENS1_12trans_detail8status_tES4_,
which has no line number information.
0x70b598f8 in cl::sycl::detail::transaction::next_state() () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function _ZN2cl4sycl6detail11transaction10next_stateEv,
which has no line number information.
0x70b8111e in cl::sycl::detail::scheduler::execute(std::unique_ptr<cl::sycl::detail::transaction, std::default_delete<cl::sycl::detail::transaction> >&&) () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function _ZN2cl4sycl6detail9scheduler7executeEOSt10unique_ptrINS1_11transactionESt14default_deleteIS4_EE,
which has no line number information.
0x70b81400 in cl::sycl::detail::scheduler::scheduler_loop() () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function _ZN2cl4sycl6detail9scheduler14scheduler_loopEv,
which has no line number information.
0x706f20c8 in pthread_mutex_unlock@plt () from /home/pi/deepspeech/libComputeCpp.so
(gdb) 
Single stepping until exit from function pthread_mutex_unlock@plt,
which has no line number information.
pthread_mutex_unlock (mutex=0x70bd8760 <cl::sycl::detail::scheduler::m_transList_mutex>) at forward.c:194
194	forward.c: No such file or directory.
(gdb) 
__GI___pthread_mutex_unlock (mutex=0x70bd8760 <cl::sycl::detail::scheduler::m_transList_mutex>) at pthread_mutex_unlock.c:324
324	pthread_mutex_unlock.c: No such file or directory.
(gdb) 
__pthread_mutex_unlock_usercnt (mutex=0x70bd8760 <cl::sycl::detail::scheduler::m_transList_mutex>, decr=1) at pthread_mutex_unlock.c:38
38	in pthread_mutex_unlock.c
(gdb)

DuncanMcBain commented on May 30, 2024

I'm not sure that's the issue: as far as I can tell, VC4CL has an empty flush method because commands are processed immediately when they are submitted to the queue, so there's nothing to flush.
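
To make the distinction concrete, here is a hypothetical C++ sketch (not VC4CL's actual classes) of two queue designs: a deferred queue that buffers commands until `flush()`, and an eager queue that starts each command at enqueue time, which leaves `flush()` with nothing to do.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical deferred design: commands are buffered and only run on flush().
class DeferredQueue {
 public:
  void enqueue(std::function<void()> cmd) { pending_.push_back(std::move(cmd)); }
  void flush() {
    for (auto& cmd : pending_) cmd();
    pending_.clear();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical eager design: work starts as soon as a command is enqueued,
// so there is never anything buffered for flush() to submit.
class EagerQueue {
 public:
  void enqueue(const std::function<void()>& cmd) { cmd(); }
  void flush() {}  // intentionally empty
};
```

With the eager design, an empty flush is expected, so a hang must come from the command itself (or its completion event) rather than from the flush.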

lissyx commented on May 30, 2024

Right. I'm not making much progress, except that it seems an event reaches VC4CL and is then injected. I'm not sure yet whether it's being used properly.

DuncanMcBain commented on May 30, 2024

If there's any further information we can provide for the developer of VC4CL, please let us know! This same code runs on all other OpenCL platforms we've tried, and is quite intense, so it is possible that some unusual code paths are being hit.

lissyx commented on May 30, 2024

@DuncanMcBain Okay, after fighting with some deadlocks and other problems in VC4CL, I've luckily managed to get something running. It failed this way (after removing the noise from my own output):

2018-05-25 14:48:20.680492: E ./tensorflow/core/common_runtime/executor.cc:662] Executor failed to create kernel. Not found: No registered 'Snapshot' OpKernel for SYCL devices compatible with node bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd = Snapshot[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:SYCL:0"](bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul, ^bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd/Enter)
	.  Registered:  <no registered kernels>
	 [[Node: bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd = Snapshot[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:SYCL:0"](bidirectional_rnn/bw/bw/while/basic_lstm_cell/MatMul, ^bidirectional_rnn/bw/bw/while/basic_lstm_cell/BiasAdd/Enter)]]
Error running session: Not found: FetchOutputs node logits_output_node: not found
cpu_time_overall=5.11148 cpu_time_mfcc=0.08749 cpu_time_infer=5.02399

Rebuilding with Snapshot included, I'm hitting a VC4C-level compilation error \o/

DuncanMcBain commented on May 30, 2024

That's odd: why wasn't Snapshot included before? Do we not build it as part of our SYCL port or something?

Rbiessy commented on May 30, 2024

What strikes me the most is that this operation is not even registered for the CPU according to your log. It should be registered for both CPU and SYCL, I have no idea what could have caused that.

lissyx commented on May 30, 2024

No need to worry about that, @DuncanMcBain and @Rbiessy: when we build libdeepspeech, we manually select the kernels to avoid wasting space on code we don't use :-).
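
As a rough model of why `Snapshot` showed `<no registered kernels>` even for the CPU, here is a hypothetical C++ sketch of a kernel registry filtered by a build-time allowlist. It is not TensorFlow's registration mechanism; the class and method names are made up for illustration. Ops left out of the allowlist are never registered for any device, so looking them up fails regardless of device type.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical, heavily simplified kernel registry with a build-time
// allowlist (NOT TensorFlow's actual registration machinery).
class KernelRegistry {
 public:
  explicit KernelRegistry(std::set<std::string> allowlist)
      : allowlist_(std::move(allowlist)) {}

  // Registration is silently skipped for ops outside the allowlist,
  // mimicking a build that only compiles the selected kernels.
  void Register(const std::string& op, const std::string& device) {
    if (allowlist_.count(op) != 0) kernels_[op].insert(device);
  }

  // True only if `op` was registered for `device`.
  bool HasKernel(const std::string& op, const std::string& device) const {
    auto it = kernels_.find(op);
    return it != kernels_.end() && it->second.count(device) != 0;
  }

 private:
  std::set<std::string> allowlist_;
  std::map<std::string, std::set<std::string>> kernels_;
};
```

This is why adding `Snapshot` to the selected kernels and rebuilding was enough to get past the error.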

lissyx commented on May 30, 2024

So, the update is that I now have to deal with compilation issues when running kernels, which is good.
