
oneAPI.jl

Julia support for the oneAPI programming toolkit.

oneAPI.jl provides support for working with the oneAPI unified programming model. The package is verified to work with the (currently) only implementation of this interface, which is part of the Intel Compute Runtime and only available on Linux.

Status

The current version of oneAPI.jl supports most of the oneAPI Level Zero interface and has good kernel programming capabilities; as a demonstration of that, it fully implements the GPUArrays.jl array interfaces. This results in a full-featured GPU array type.

However, the package has not been extensively tested, and performance issues might be present. The integration with vendor libraries like oneMKL or oneDNN is still in development, and as a result certain array operations may be unavailable or slow.

Quick start

You need to use Julia 1.8 or higher, and it is strongly advised to use the official binaries. For now, only Linux is supported. On Windows, you need to use the second generation Windows Subsystem for Linux (WSL2). If you're using Intel Arc GPUs (A580, A750, A770, etc.), you need at least Linux 6.2. For other hardware, any recent Linux distribution should work.

Once you have installed Julia, proceed by entering the package manager REPL mode by pressing ] and adding the oneAPI package:

pkg> add oneAPI

This installation will take a couple of minutes to download necessary binaries, such as the oneAPI loader, several SPIR-V tools, etc. For now, the oneAPI.jl package also depends on the Intel implementation of the oneAPI spec. That means you need compatible hardware; refer to the Intel documentation for more details.
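Equivalently, you can install the package from the regular Julia prompt using the standard Pkg API:

```julia
using Pkg
Pkg.add("oneAPI")   # same effect as `pkg> add oneAPI`
```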

Once you have oneAPI.jl installed, perform a smoke test by calling the versioninfo() function:

julia> using oneAPI

julia> oneAPI.versioninfo()
Binary dependencies:
- NEO_jll: 22.43.24595+0
- libigc_jll: 1.0.12504+0
- gmmlib_jll: 22.3.0+0
- SPIRV_LLVM_Translator_unified_jll: 0.2.0+0
- SPIRV_Tools_jll: 2022.1.0+0

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1

1 driver:
- 00000000-0000-0000-173d-d94201036013 (v1.3.24595, API v1.3.0)

2 devices:
- Intel(R) Graphics [0x56a0]
- Intel(R) HD Graphics P630 [0x591d]

If you have multiple compatible drivers or devices, use the driver! and device! functions to configure which one to use in the current task:

julia> devices()
ZeDevice iterator for 2 devices:
1. Intel(R) Graphics [0x56a0]
2. Intel(R) HD Graphics P630 [0x591d]

julia> device()
ZeDevice(GPU, vendor 0x8086, device 0x56a0): Intel(R) Graphics [0x56a0]

julia> device!(2)
ZeDevice(GPU, vendor 0x8086, device 0x591d): Intel(R) HD Graphics P630 [0x591d]

To ensure other functionality works as expected, you can run the test suite from the package manager REPL mode. Note that this will pull and run the test suite for GPUArrays, which takes quite some time:

pkg> test oneAPI
...
Testing finished in 16 minutes, 27 seconds, 506 milliseconds

Test Summary: | Pass  Total  Time
  Overall     | 4945   4945
    SUCCESS
     Testing oneAPI tests passed

Usage

The functionality of oneAPI.jl is organized as follows:

  • low-level wrappers for the Level Zero library
  • kernel programming capabilities
  • abstractions for high-level array programming

The Level Zero wrappers are available in the oneL0 submodule, and expose the full flexibility of the underlying APIs through user-friendly wrappers:

julia> using oneAPI, oneAPI.oneL0

julia> drv = first(drivers());

julia> ctx = ZeContext(drv);

julia> dev = first(devices(drv))
ZeDevice(GPU, vendor 0x8086, device 0x1912): Intel(R) Gen9

julia> compute_properties(dev)
(maxTotalGroupSize = 256, maxGroupSizeX = 256, maxGroupSizeY = 256, maxGroupSizeZ = 256, maxGroupCountX = 4294967295, maxGroupCountY = 4294967295, maxGroupCountZ = 4294967295, maxSharedLocalMemory = 65536, subGroupSizes = (8, 16, 32))

julia> queue = ZeCommandQueue(ctx, dev);

julia> execute!(queue) do list
         append_barrier!(list)
       end

Built on top of that are kernel programming capabilities for executing Julia code on oneAPI accelerators. For now, we reuse OpenCL intrinsics and compile to SPIR-V using Khronos' LLVM-to-SPIR-V translator:

julia> function kernel()
         barrier()
         return
       end

julia> @oneapi items=1 kernel()
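As a slightly more involved sketch (untested here; it assumes the OpenCL-style `get_global_id` intrinsic exposed by oneAPI.jl, which returns a 1-based index), each work-item can process one element of an array:

```julia
using oneAPI

# Each work-item squares one element of the array.
function square!(a)
    i = get_global_id(0)       # 1-based global work-item index
    @inbounds a[i] = a[i] * a[i]
    return
end

a = oneArray(Float32[1, 2, 3, 4])
@oneapi items=length(a) square!(a)
```

After the kernel completes, `Array(a)` should contain the squared values.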

Code reflection macros are available to see the generated code:

julia> @device_code_llvm @oneapi items=1 kernel()
;  @ REPL[18]:1 within `kernel'
define dso_local spir_kernel void @_Z17julia_kernel_3053() local_unnamed_addr {
top:
;  @ REPL[18]:2 within `kernel'
; ┌ @ oneAPI.jl/src/device/opencl/synchronization.jl:9 within `barrier' @ oneAPI.jl/src/device/opencl/synchronization.jl:9
; │┌ @ oneAPI.jl/src/device/opencl/utils.jl:34 within `macro expansion'
    call void @_Z7barrierj(i32 0)
; └└
;  @ REPL[18]:3 within `kernel'
  ret void
}
julia> @device_code_spirv @oneapi items=1 kernel()
; SPIR-V
; Version: 1.0
; Generator: Khronos LLVM/SPIR-V Translator; 14
; Bound: 9
; Schema: 0
               OpCapability Addresses
               OpCapability Kernel
          %1 = OpExtInstImport "OpenCL.std"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %4 "_Z17julia_kernel_3067"
               OpSource OpenCL_C 200000
               OpName %top "top"
       %uint = OpTypeInt 32 0
     %uint_2 = OpConstant %uint 2
     %uint_0 = OpConstant %uint 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
          %4 = OpFunction %void None %3
        %top = OpLabel
               OpControlBarrier %uint_2 %uint_2 %uint_0
               OpReturn
               OpFunctionEnd

Finally, the oneArray type makes it possible to use your oneAPI accelerator without writing custom kernels, thanks to Julia's high-level array abstractions:

julia> a = oneArray(rand(Float32, 2,2))
2×2 oneArray{Float32,2}:
 0.592979  0.996154
 0.874364  0.232854

julia> a .+ 1
2×2 oneArray{Float32,2}:
 1.59298  1.99615
 1.87436  1.23285
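Beyond simple broadcasts, much of the generic array functionality from Base and GPUArrays.jl should work as well; an illustrative (not exhaustive) sketch:

```julia
using oneAPI

a = oneArray(rand(Float32, 1024))
b = sin.(a) .+ a .^ 2   # fused broadcast, compiled into a single GPU kernel
s = sum(b)              # reduction executed on the device
c = Array(b)            # copy results back to a CPU Array
```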

Float64 support

Not all oneAPI GPUs support the Float64 datatype. You can test whether your GPU does using the following code:

julia> using oneAPI
julia> oneL0.module_properties(device()).fp64flags & oneL0.ZE_DEVICE_MODULE_FLAG_FP64 == oneL0.ZE_DEVICE_MODULE_FLAG_FP64
false

If your GPU doesn't, executing code that relies on Float64 values will result in an error:

julia> oneArray([1.]) .+ 1
┌ Error: Module compilation failed:
│
│ error: Double type is not supported on this platform.
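On such devices, a simple remedy is to keep all data and literals in Float32. A minimal sketch:

```julia
using oneAPI

xs = [1.0, 2.0, 3.0]        # Float64 on the CPU
a = oneArray(Float32.(xs))  # convert before uploading to the device
a .+ 1f0                    # use Float32 literals (1f0), not Float64 (1.0)
```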

Development

To work on oneAPI.jl, you just need to dev the package. In addition, you may need to build the binary support library that is used to interface with oneMKL and other C++ vendor libraries. This library is normally provided by the oneAPI_Support_jll.jl package; however, we only guarantee that this package is updated when releasing oneAPI.jl. You can build the library yourself by executing deps/build_local.jl.

To facilitate development, there are other things you may want to configure:

Enabling the oneAPI validation layer

The oneAPI Level Zero libraries feature a so-called validation layer, which validates the arguments to API calls. This can be useful to spot potential issues, and can be enabled by setting the following environment variables:

  • ZE_ENABLE_VALIDATION_LAYER=1
  • ZE_ENABLE_PARAMETER_VALIDATION=1
  • EnableDebugBreak=0 (this is needed to work around intel/compute-runtime#639)
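For example, in a shell you could export these variables before starting Julia:

```shell
export ZE_ENABLE_VALIDATION_LAYER=1
export ZE_ENABLE_PARAMETER_VALIDATION=1
export EnableDebugBreak=0    # work around intel/compute-runtime#639
# julia --project            # then start Julia with the validation layer active
```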

Using a debug toolchain

If you're experiencing an issue with the underlying toolchain (NEO, IGC, etc), you may want to use a debug build of these components, which also perform additional validation. This can be done simply by calling oneAPI.set_debug!(true) and restarting your Julia session. This sets a preference used by the respective JLL packages.

Using a local toolchain

To further debug the toolchain, you may need a custom build of these components, and to point oneAPI.jl towards it. This can also be done using preferences, overriding the paths to resources provided by the various JLLs that oneAPI.jl uses. A helpful script to automate this is provided in the res folder of this repository:

$ julia res/local.jl

Trying to find local IGC...
- found libigc at /usr/local/lib/libigc.so
- found libiga64 at /usr/local/lib/libiga64.so
- found libigdfcl at /usr/local/lib/libigdfcl.so
- found libopencl-clang at /usr/local/lib/libopencl-clang.so.11

Trying to find local gmmlib...
- found libigdgmm at /usr/local/lib/libigdgmm.so

Trying to find local NEO...
- found libze_intel_gpu.so.1 at /usr/local/lib/libze_intel_gpu.so.1
- found libigdrcl at /usr/local/lib/intel-opencl/libigdrcl.so

Trying to find local oneAPI loader...
- found libze_loader at /lib/x86_64-linux-gnu/libze_loader.so
- found libze_validation_layer at /lib/x86_64-linux-gnu/libze_validation_layer.so

Writing preferences...

The discovered paths will be written to a global preferences file, typically $HOME/.julia/environments/vX.Y/LocalPreferences.toml (where vX.Y refers to the Julia version you are using). You can modify this file, or remove it when you want to revert to the default set of binaries.


oneAPI.jl's Issues

How to select a device?

I have 2 Intel GPUs on my machine, an iGPU and a dGPU. How can I select the dGPU (the second device) instead of the iGPU, which is the default?

Output of versioninfo:

Binary dependencies:
- NEO_jll: 22.25.23529+0
- libigc_jll: 1.0.11378+0
- gmmlib_jll: 22.1.3+0
- SPIRV_LLVM_Translator_unified_jll: 0.2.0+0
- SPIRV_Tools_jll: 2022.1.0+0

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1


2 devices:
- Intel(R) Graphics [0x5693]
- Intel(R) Graphics [0x46a6]
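Using the `device!` function described in the Quick start section above, selecting the second device would look like this (a sketch; the device order matches the output of `devices()`):

```julia
using oneAPI

devices()    # list the available devices
device!(2)   # make the second device (the dGPU) active in the current task
```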

cannot open shared object file: No such file or directory

I am getting an error when I try to import oneAPI

julia> using oneAPI
[ Info: Precompiling oneAPI [8f75cd03-7ff8-4ecb-9b8f-daf728133b1b]
ERROR: LoadError: InitError: could not load library "/home/[redacted]/.julia/artifacts/769012422c18bf09c53376e964a7fb286d1799c8/lib/libLLVMSPIRVLib.so"
libLLVM-11jl.so: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(s::String, flags::UInt32)
   @ Base.Libc.Libdl ./libdl.jl:114
 [2] macro expansion
   @ ~/.julia/packages/JLLWrappers/bkwIo/src/products/library_generators.jl:54 [inlined]
 [3] __init__()
   @ SPIRV_LLVM_Translator_jll ~/.julia/packages/SPIRV_LLVM_Translator_jll/Zez7T/src/wrappers/x86_64-linux-gnu-cxx11.jl:9
 [4] top-level scope (repeats 2 times)
   @ none:1
during initialization of module SPIRV_LLVM_Translator_jll
in expression starting at /home/[redacted]/.julia/packages/oneAPI/zydrg/src/oneAPI.jl:1
ERROR: Failed to precompile oneAPI [8f75cd03-7ff8-4ecb-9b8f-daf728133b1b] to /home/felix/.julia/compiled/v1.6/oneAPI/jl_aWfrhe.
Stacktrace:
 [1] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY)
   @ Base ./loading.jl:1360

Error broadcasting a number

For information, I have an Intel® HD Graphics 520.

x = oneVector{Float32}(rand(10))
10-element oneVector{Float32, oneAPI.oneL0.DeviceBuffer}:
 0.63402575
 0.9193291
 0.69438887
 0.8417072
 0.7550055
 0.39492267
 0.21401107
 0.3663646
 0.21098158
 0.5470217
x .= zero(Float32)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#17(oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to !)
Stacktrace:
 [1] #getindex
   @ ~/Bureau/git/oneAPI.jl/src/device/quirks.jl:56
 [2] _broadcast_getindex
   @ ./broadcast.jl:617
 [3] _getindex
   @ ./broadcast.jl:667
 [4] _broadcast_getindex
   @ ./broadcast.jl:642
 [5] getindex
   @ ./broadcast.jl:597
 [6] broadcast_kernel
   @ ~/Bureau/git/GPUArrays.jl/src/host/broadcast.jl:57
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kb6yJ/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/kb6yJ/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/kb6yJ/src/driver.jl:416 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kb6yJ/src/utils.jl:83
  [6] (::oneAPI.var"#62#63"{GPUCompiler.CompilerJob{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64}}}})(ctx::LLVM.Context)
    @ oneAPI ~/Bureau/git/oneAPI.jl/src/compiler/execution.jl:163
  [7] JuliaContext(f::oneAPI.var"#62#63"{GPUCompiler.CompilerJob{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kb6yJ/src/driver.jl:76
  [8] zefunction_compile(job::GPUCompiler.CompilerJob)
    @ oneAPI ~/Bureau/git/oneAPI.jl/src/compiler/execution.jl:160
  [9] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(oneAPI.zefunction_compile), linker::typeof(oneAPI.zefunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kb6yJ/src/cache.jl:90
 [10] zefunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{oneAPI.oneKernelContext, oneDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(identity), Tuple{Float32}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ oneAPI ~/Bureau/git/oneAPI.jl/src/compiler/execution.jl:149
 [11] zefunction
    @ ~/Bureau/git/oneAPI.jl/src/compiler/execution.jl:142 [inlined]
 [12] macro expansion
    @ ~/Bureau/git/oneAPI.jl/src/compiler/execution.jl:51 [inlined]
 [13] #launch_heuristic#90
    @ ~/Bureau/git/oneAPI.jl/src/gpuarrays.jl:17 [inlined]
 [14] _copyto!
    @ ~/Bureau/git/GPUArrays.jl/src/host/broadcast.jl:63 [inlined]
 [15] materialize!
    @ ~/Bureau/git/GPUArrays.jl/src/host/broadcast.jl:41 [inlined]
 [16] materialize!(dest::oneVector{Float32, oneAPI.oneL0.DeviceBuffer}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Float32}})
    @ Base.Broadcast ./broadcast.jl:868
 [17] top-level scope
    @ REPL[16]:1

Compile time error

I am getting a compile time error with the following short program:

#################
#GPU parallel
using PyCall, Printf
using oneAPI
cv2 = pyimport("cv2")

function gpu_parallel_paint!(pixels, t, n)
    x = get_global_id(0) - 1
    y = get_global_id(1) - 1
    c = -0.8f0 + cos(t) * 0.2f0im
    z = (y/n - 1 + (x/n - 0.5f0)im) * 2.0f0
    iterations = 0
    while sqrt(abs2(z)) < 20 && iterations < 50
        z = z*z + c
        iterations += 1
    end
    @inbounds pixels[x][y] = 1.0f0 - iterations * 0.02f0
    return nothing
end

frames = 1000
n = 320
pixels = oneAPI.fill(0.0f0, n, 2*n)

ti = 0.0f0
for i = 1:frames
    t = i * 0.03f0
    global ti += @elapsed @oneapi items=(n, 2*n) gpu_parallel_paint!(pixels, t, n)
    cv2.imshow("Julia Set", pixels)
    cv2.waitKey(1)
end
fps = Int(floor(frames/ti))
println("FPS: ", fps)
cv2.destroyAllWindows()

################

Detailed error message says the kernel function returns a value of 'Union{}' even though I put "return nothing" in the kernel. I am new to Julia language, so it could be a mistake on my part.

Thanks,
Peng.


ERROR: LoadError: GPU compilation of kernel gpu_parallel_paint!(oneDeviceArray{Float32,2,1}, Float32, Int64) failed
KernelError: kernel returns a value of type Union{}

Make sure your kernel function ends in return, return nothing or nothing.
If the returned value is of type Union{}, your Julia code probably throws an exception.
Inspect the code with @device_code_warntype for more details.

Stacktrace:
[1] check_method(::GPUCompiler.CompilerJob) at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/validation.jl:18
[2] macro expansion at /home/peng/.julia/packages/TimerOutputs/dVnaw/src/TimerOutput.jl:206 [inlined]
[3] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/driver.jl:63
[4] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/driver.jl:39
[5] compile at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/driver.jl:35 [inlined]
[6] #zefunction_compile#42 at /home/peng/.julia/packages/oneAPI/Hn0Ia/src/compiler/execution.jl:123 [inlined]
[7] zefunction_compile(::GPUCompiler.FunctionSpec{typeof(gpu_parallel_paint!),Tuple{oneDeviceArray{Float32,2,1},Float32,Int64}}) at /home/peng/.julia/packages/oneAPI/Hn0Ia/src/compiler/execution.jl:120
[8] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{typeof(gpu_parallel_paint!),Tuple{oneDeviceArray{Float32,2,1},Float32,Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/cache.jl:40
[9] gpu_parallel_paint! at /home/peng/workspace/sycl-coding/Julia-Set/JuliaLang/fractal_gpu.jl:7 [inlined]
[10] cached_compilation(::Dict{UInt64,Any}, ::typeof(oneAPI.zefunction_compile), ::typeof(oneAPI.zefunction_link), ::GPUCompiler.FunctionSpec{typeof(gpu_parallel_paint!),Tuple{oneDeviceArray{Float32,2,1},Float32,Int64}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/cache.jl:0
[11] cached_compilation(::Dict{UInt64,Any}, ::Function, ::Function, ::GPUCompiler.FunctionSpec{typeof(gpu_parallel_paint!),Tuple{oneDeviceArray{Float32,2,1},Float32,Int64}}) at /home/peng/.julia/packages/GPUCompiler/ze8Ok/src/cache.jl:65
[12] zefunction(::Function, ::Type{T} where T; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/peng/.julia/packages/oneAPI/Hn0Ia/src/compiler/execution.jl:113
[13] zefunction(::Function, ::Type{T} where T) at /home/peng/.julia/packages/oneAPI/Hn0Ia/src/compiler/execution.jl:111
[14] macro expansion at /home/peng/.julia/packages/oneAPI/Hn0Ia/src/compiler/execution.jl:34 [inlined]
[15] macro expansion at ./timing.jl:233 [inlined]
[16] top-level scope at /home/peng/workspace/sycl-coding/Julia-Set/JuliaLang/fractal_gpu.jl:27
[17] include(::Function, ::Module, ::String) at ./Base.jl:380
[18] include(::Module, ::String) at ./Base.jl:368
[19] exec_options(::Base.JLOptions) at ./client.jl:296
[20] _start() at ./client.jl:506
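A likely cause (an assumption based on the error message, not a verified fix): `pixels[x][y]` indexes the 2D array with a single index and then tries to index the resulting scalar, which throws an exception and gives the kernel the inferred return type `Union{}`. Indexing with two indices, and keeping the 1-based work-item indices, should avoid this:

```julia
# Hypothetical corrected indexing inside the kernel:
x = get_global_id(0)   # keep the 1-based index, do not subtract 1
y = get_global_id(1)
@inbounds pixels[x, y] = 1.0f0 - iterations * 0.02f0
```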

Kernel compilation fail when using `sqrt` and similar function

MWE:

using oneAPI
function fff(a, s)
    a[1] = sqrt(s)
    # a[1] = oneAPI.sqrt(s)  # this one works
    return nothing
end
@oneapi items=1  fff(oneArray([0.0f0]), 1.23f0)

gives
InvalidFunctionCall: Unexpected llvm intrinsic: llvm.sqrt.f32 [Src: ../lib/SPIRV/SPIRVWriter.cpp:1491 ]

Also fails with abs, for example.

(feel free to change the issue title)

fill! doesn't work with Int16/Float16

By @troels:

For the oneAPI failures, there appears to be a Level Zero bug in zeCommandListAppendMemoryFill for filling 6 bytes and 10 bytes, and probably more with 2-byte content. I guess a @test_broken would be appropriate, though with this approach it's somewhat difficult to add that. Until the bug is fixed, not trying to fill with 2-byte types in oneAPI seems OK.
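As a hypothetical workaround until the underlying bug is fixed, a 2-byte element type could be filled on the CPU and uploaded, sidestepping zeCommandListAppendMemoryFill entirely:

```julia
using oneAPI

a = oneArray{Float16}(undef, 5)
copyto!(a, fill(Float16(1), 5))   # upload a CPU-filled buffer instead of fill!(a, ...)
```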

precompile error for SPIRV_LLVM_Translator_jll

I get an error precompiling oneAPI:

(oneapi) pkg> precompile
Precompiling project...
[ Info: Precompiling oneAPI [8f75cd03-7ff8-4ecb-9b8f-daf728133b1b]
ERROR: LoadError: InitError: could not load library "/home/bda/.julia/artifacts/6b3e339c94c460d95f35b5c6cc186f3d829b31de/lib/libLLVMSPIRVLib.so"
libLLVM-9jl.so: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String, ::UInt32) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] macro expansion at /home/bda/.julia/packages/JLLWrappers/KuIwt/src/products/library_generators.jl:61 [inlined]
 [3] __init__() at /home/bda/.julia/packages/SPIRV_LLVM_Translator_jll/E4bij/src/wrappers/x86_64-linux-gnu-cxx11.jl:9
 [4] top-level scope at none:2
 [5] eval at ./boot.jl:331 [inlined]
during initialization of module SPIRV_LLVM_Translator_jll
in expression starting at /home/bda/.julia/packages/oneAPI/UcBkX/src/oneAPI.jl:12

I see this error on Debian testing (with default clang 10) and on Arch linux (with default clang 11). Note that libLLVMSPIRVLib.so is a symlink to libLLVMSPIRVLib.so.9jl which does exist, but libLLVM-9jl.so does not.

oneMKL: copy broken

As noted in #242 (comment); reproduces quickly by executing the oneMKL tests multiple times:

❯ jl --project -L test/setup.jl

copy: Test Failed at /home/tim/Julia/pkg/oneAPI/test/onemkl.jl:15
  Expression: Array(A) == Array(B)
   Evaluated: ComplexF32[0.62028897f0 + 0.5680954f0im, 0.8470302f0 + 0.72753716f0im, 0.48795187f0 + 0.21962279f0im, 0.9410701f0 + 0.19438893f0im, 0.03132671f0 + 0.24299073f0im, 0.9167444f0 + 0.53888947f0im, 0.9014001f0 + 0.9751158f0im, 0.8477041f0 + 0.5251923f0im, 0.014673173f0 + 0.01942259f0im, 0.7496979f0 + 0.8727344f0im, 0.30134517f0 + 0.07368064f0im, 0.42843723f0 + 0.2052998f0im, 0.08305049f0 + 0.31514007f0im, 0.55817646f0 + 0.24732232f0im, 0.99573386f0 + 0.2845214f0im, 0.4044994f0 + 0.38024253f0im, 0.90844953f0 + 0.33899027f0im, 0.8413433f0 + 0.40553868f0im, 0.93793666f0 + 0.4310677f0im, 0.064192474f0 + 0.19298851f0im] == ComplexF32[0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im, 0.0f0 + 0.0f0im]

@Sarbojit2019 could you take a look?

Intel Arc hardware requires Linux 6.2

Specs

OS: Clear Linux 37450
Linux Kernel: 6.0.2-1201.native
CPU: 12th Gen Intel® Core™ i7-12700 × 20
GPU: Mesa Intel® Arc™ A750 Graphics (DG2)
Julia: 1.8.2
oneAPI.jl: 0.3.0

Mesa

*** MESA_GLSL_CACHE_DISABLE is deprecated; use MESA_SHADER_CACHE_DISABLE instead ***
client glx vendor string: Mesa Project and SGI
    Device: Mesa Intel(R) Arc(tm) A750 Graphics (DG2) (0x56a1)
OpenGL renderer string: Mesa Intel(R) Arc(tm) A750 Graphics (DG2)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 22.3.0-devel
OpenGL version string: 4.6 (Compatibility Profile) Mesa 22.3.0-devel

DRV

julia> drv
ZeDriver(00000000-0000-0000-171f-ad2b01030000, version 1.3.0)

Devices

julia> dev = first(devices(drv))
ZeDevice(GPU, vendor 0x8086, device 0x56a1): Intel(R) Graphics [0x56a1]

Issue

It seems that using Intel Arc GPUs leads to an "internal error" even when just creating an array (see below).

Error

julia> a = oneArray(rand(Float32, 2,2))
ERROR: ZeError: unknown or internal error (code 2147483646, ZE_RESULT_ERROR_UNKNOWN)
Stacktrace:
  [1] throw_api_error(res::oneAPI.oneL0._ze_result_t)
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/error.jl:102
  [2] macro expansion
    @ ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/oneL0.jl:20 [inlined]
  [3] zeCommandQueueExecuteCommandLists
    @ ~/.julia/packages/oneAPI/1AnU6/lib/utils/call.jl:24 [inlined]
  [4] execute!
    @ ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/cmdlist.jl:47 [inlined]
  [5] execute!(f::Function, queue::ZeCommandQueue, fence::Nothing; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ oneAPI.oneL0 ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/cmdlist.jl:60
  [6] execute! (repeats 2 times)
    @ ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/cmdlist.jl:58 [inlined]
  [7] unsafe_copyto!(ctx::ZeContext, dev::ZeDevice, dst::ZePtr{Float32}, src::Ptr{Float32}, N::Int64)
    @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/memory.jl:7
  [8] unsafe_copyto!
    @ ~/.julia/packages/oneAPI/1AnU6/src/array.jl:305 [inlined]
  [9] copyto!(dest::oneMatrix{Float32, oneAPI.oneL0.DeviceBuffer}, doffs::Int64, src::Matrix{Float32}, soffs::Int64, n::Int64)
    @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/array.jl:267
 [10] copyto!
    @ ~/.julia/packages/oneAPI/1AnU6/src/array.jl:271 [inlined]
 [11] oneArray
    @ ~/.julia/packages/oneAPI/1AnU6/src/array.jl:205 [inlined]
 [12] oneArray
    @ ~/.julia/packages/oneAPI/1AnU6/src/array.jl:209 [inlined]
 [13] oneArray(A::Matrix{Float32})
    @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/array.jl:218
 [14] top-level scope

Miscompilation due to conditional store moving across work group barrier

function kernel(::Val{n}, x::Core.LLVMPtr{T}) where {T, n}
    tx = get_local_id()

    @inbounds begin
        # initialize some local memory
        y = oneLocalArray(T, n)
        y[tx] = unsafe_load(x, tx)
        barrier()

        # show its contents
        if tx == 1
            for i in 1:n
                oneAPI.@println("y[$i] = $(y[i])")
            end
        end

        # write to it, but _after_ the display
        barrier()
        if tx > 1
            y[tx] = 0
        end
    end

    return
end

x = oneArray([10, 10])
ptr = reinterpret(Core.LLVMPtr{eltype(x),AS.Global}, pointer(x))
GC.@preserve x begin
    n = length(x)
    @oneapi items=n kernel(Val(n), ptr)
    synchronize()
end

Prints [10, 0], which is obviously wrong. Making the store unconditional results in the expected output.

Use DMA engine for large memory copies

We currently use a single global queue, but large memory transfers should probably use a special queue with FLAG_COPY set so that the DMA copy engines can be used. We'll probably need to order operations on that queue with respect to the global queue (using events?).

Interactive profiling

VTune is great, but it can currently only profile a whole application or attach and start/stop manually. It'd be great to be able to denote profile ranges a la CUDA.@profile, and in addition to be able to do so with an attached program for interactive profiling.

@vchuravy pointed me to Intel's Instrumentation and Tracing Technology (ITT), https://github.com/intel/ittapi, as an equivalent for NVIDIA's NVTX, which has similar APIs:

Apparently ITT is already being built as part of LLVM for the purpose of JIT registration, so we could take it from there.

Worryingly high memory use (and leak?) during testing

┌ Info: System information:
│ Binary dependencies:
│ - NEO_jll: 22.25.23529+0
│ - libigc_jll: 1.0.11378+0
│ - gmmlib_jll: 22.1.3+0
│ - SPIRV_LLVM_Translator_unified_jll: 0.2.0+0
│ - SPIRV_Tools_jll: 2022.1.0+0
│ 
│ Toolchain:
│ - Julia: 1.7.3
│ - LLVM: 12.0.1
│ 
│ 1 driver:
│ - 00000000-0000-0000-1707-a06e01030000 (v1.3.0, API v1.3.0)
│ 
│ 1 device:
└ - Intel(R) UHD Graphics P630 [0x3e96]
                                                  |          | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
array                                         (2) |     1.60 |   0.04 |  2.5 |     276.05 |   585.97 |
examples                                      (2) |    14.34 |   0.00 |  0.0 |       9.97 |   585.97 |
      From worker 2:	WARNING: Method definition #2699#kernel(Any) in module Main at /home/joto/.julia/packages/oneAPI/mHp15/test/execution.jl:293 overwritten at /home/joto/.julia/packages/oneAPI/mHp15/test/execution.jl:301.
execution                                     (2) |    42.62 |   2.01 |  4.7 |    7578.43 |   745.37 |
level-zero                                    (2) |     2.88 |   0.05 |  1.7 |     306.18 |   745.37 |
pointer                                       (2) |     0.21 |   0.00 |  0.0 |      12.31 |   745.37 |
device/intrinsics                             (2) |    63.98 |   3.81 |  5.9 |   12120.96 |  1145.53 |
gpuarrays/indexing scalar                     (2) |    18.04 |   0.98 |  5.4 |    3239.98 |  1261.28 |
gpuarrays/reductions/reducedim!               (2) |   151.24 |   8.27 |  5.5 |   26046.24 |  2344.89 |
gpuarrays/linalg                              (2) |    74.94 |   3.10 |  4.1 |   12603.09 |  3300.46 |
gpuarrays/math/power                          (2) |    34.64 |   1.88 |  5.4 |    6110.72 |  3511.20 |
gpuarrays/linalg/mul!/vector-matrix           (2) |    88.00 |   3.62 |  4.1 |   13997.27 |  4425.48 |
gpuarrays/indexing multidimensional           (2) |    42.98 |   2.14 |  5.0 |    7570.85 |  4710.65 |
gpuarrays/interface                           (2) |     5.09 |   0.19 |  3.8 |     689.76 |  4757.80 |
gpuarrays/reductions/any all count            (2) |    20.73 |   0.92 |  4.4 |    3385.59 |  4861.70 |
gpuarrays/reductions/minimum maximum extrema  (2) |   231.66 |  10.06 |  4.3 |   34656.54 |  6431.69 |
gpuarrays/uniformscaling                      (2) |     9.47 |   0.34 |  3.6 |    1257.00 |  6431.69 |
gpuarrays/linalg/mul!/matrix-matrix           (2) |   182.76 |   7.28 |  4.0 |   26287.22 |  8479.74 |
gpuarrays/math/intrinsics                     (2) |     4.16 |   0.19 |  4.6 |     685.35 |  8479.74 |
gpuarrays/linalg/norm                         (2) |   362.17 |  12.16 |  3.4 |   38170.87 |  9847.84 |
gpuarrays/statistics                          (2) |   135.30 |   5.44 |  4.0 |   18106.43 | 10524.75 |
gpuarrays/reductions/mapreduce                (2) |   495.31 |  17.34 |  3.5 |   57147.60 | 12348.84 |
gpuarrays/constructors                        (2) |    16.90 |   0.52 |  3.1 |    2413.53 | 12456.36 |
gpuarrays/random                              (2) |    32.64 |   1.39 |  4.3 |    5132.55 | 12456.36 |
gpuarrays/base                                (2) |    61.33 |   2.99 |  4.9 |   10806.98 | 12768.13 |
gpuarrays/reductions/== isequal               (2) |   253.17 |   4.77 |  1.9 |   17205.50 | 13480.29 |
gpuarrays/broadcasting                        (2) |   459.49 |  21.83 |  4.8 |   56494.34 | 17444.71 |
gpuarrays/reductions/mapreducedim!            (2) |   131.21 |   5.64 |  4.3 |   13080.49 | 18137.08 |
gpuarrays/reductions/reduce                   (2) |    39.97 |   0.42 |  1.0 |    2175.47 | 18137.08 |
gpuarrays/reductions/sum prod                 (2) |   590.88 |  34.96 |  5.9 |   42650.34 | 19826.98 |
Testing finished in 1 hour, 1 second, 627 milliseconds

Test Summary: | Pass  Total
  Overall     | 6703   6703
    SUCCESS
     Testing oneAPI tests passed 

shell> ps -o pid,user,vsz,rss,comm,args
    PID USER        VSZ   RSS COMMAND         COMMAND
2538755 joto   11604  5300 bash            -bash
2538807 joto   72336  3084 julia           julia
2538809 joto  1766440 303012 julia         /home/joto/.julia/juliaup/julia-1.7.3+0.x64/bin/julia
2561125 joto    9932  3468 bash            /bin/bash -c (ps -o 'pid,user,vsz,rss,comm,args') && true
2561129 joto   12100  3220 ps              ps -o pid,user,vsz,rss,comm,args

julia> GC.gc()

shell> ps -o pid,user,vsz,rss,comm,args
    PID USER        VSZ   RSS COMMAND         COMMAND
2538755 joto   11604  5300 bash            -bash
2538807 joto   72336  3084 julia           julia
2538809 joto  1670580 285872 julia         /home/joto/.julia/juliaup/julia-1.7.3+0.x64/bin/julia
2561650 joto    9932  3392 bash            /bin/bash -c (ps -o 'pid,user,vsz,rss,comm,args') && true
2561654 joto   12100  3268 ps              ps -o pid,user,vsz,rss,comm,args

InvalidFunctionCall: Unexpected llvm intrinsic: llvm.fabs.f64 [Src: ../lib/SPIRV/SPIRVWriter.cpp:1625 ]

I attempted to run the first example from https://github.com/SciML/DiffEqGPU.jl, only using oneArray instead of cu to allocate the arrays on the Intel GPU.

I got the following stack trace trying to execute:
@time sol = solve(prob,Tsit5())

InvalidFunctionCall: Unexpected llvm intrinsic: llvm.fabs.f64 [Src: ../lib/SPIRV/SPIRVWriter.cpp:1625  ]
Stack dump:
0.	Program arguments: /home/joto/.julia/artifacts/6b3e339c94c460d95f35b5c6cc186f3d829b31de/bin/llvm-spirv -o /tmp/jl_5uXC9h /tmp/jl_NKMykh 
1.	Running pass 'LLVMToSPIRV' on module '/tmp/jl_NKMykh'.
/usr/bin/../lib64/julia/libLLVM-9jl.so(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x2e)[0x7f0e5986688e]
/usr/bin/../lib64/julia/libLLVM-9jl.so(_ZN4llvm3sys17RunSignalHandlersEv+0x34)[0x7f0e59864ec4]
/usr/bin/../lib64/julia/libLLVM-9jl.so(+0x7bf002)[0x7f0e59865002]
/usr/bin/../lib64/libpthread.so.0(+0x14a90)[0x7f0e5c787a90]
/usr/bin/../lib64/libc.so.6(gsignal+0x145)[0x7f0e58bc59e5]
/usr/bin/../lib64/libc.so.6(abort+0x127)[0x7f0e58bae895]
/home/joto/.julia/artifacts/6b3e339c94c460d95f35b5c6cc186f3d829b31de/lib/libLLVMSPIRVLib.so.9jl(_ZN5SPIRV13SPIRVErrorLog10checkErrorEbNS_14SPIRVErrorCodeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPKcSB_j+0x358)[0x7f0e5c463bb8]
ERROR: failed process: Process(`/home/joto/.julia/artifacts/6b3e339c94c460d95f35b5c6cc186f3d829b31de/bin/llvm-spirv -o /tmp/jl_5uXC9h /tmp/jl_NKMykh`, ProcessSignaled(6)) [0]

Stacktrace:
 [1] pipeline_error at ./process.jl:525 [inlined]
 [2] run(::Cmd; wait::Bool) at ./process.jl:440
 [3] run at ./process.jl:438 [inlined]
 [4] #28 at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/spirv.jl:77 [inlined]
 [5] #2 at /home/joto/.julia/packages/JLLWrappers/KuIwt/src/runtime.jl:49 [inlined]
 [6] withenv(::JLLWrappers.var"#2#3"{GPUCompiler.var"#28#31"{String,String,LLVM.API.LLVMCodeGenFileType},String}, ::Pair{String,String}, ::Vararg{Pair{String,String},N} where N) at ./env.jl:161
 [7] withenv_executable_wrapper(::GPUCompiler.var"#28#31"{String,String,LLVM.API.LLVMCodeGenFileType}, ::String, ::String, ::String, ::Bool, ::Bool) at /home/joto/.julia/packages/JLLWrappers/KuIwt/src/runtime.jl:48
 [8] #invokelatest#1 at ./essentials.jl:710 [inlined]
 [9] invokelatest at ./essentials.jl:709 [inlined]
 [10] #llvm_spirv#7 at /home/joto/.julia/packages/JLLWrappers/KuIwt/src/products/executable_generators.jl:7 [inlined]
 [11] llvm_spirv(::Function) at /home/joto/.julia/packages/JLLWrappers/KuIwt/src/products/executable_generators.jl:7
 [12] (::GPUCompiler.var"#27#30"{String,LLVM.API.LLVMCodeGenFileType})(::String, ::IOStream) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/spirv.jl:71
 [13] mktemp(::GPUCompiler.var"#27#30"{String,LLVM.API.LLVMCodeGenFileType}, ::String) at ./file.jl:659
 [14] mktemp at ./file.jl:657 [inlined]
 [15] #26 at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/spirv.jl:70 [inlined]
 [16] mktemp(::GPUCompiler.var"#26#29"{LLVM.Module,LLVM.API.LLVMCodeGenFileType}, ::String) at ./file.jl:659
 [17] mktemp at ./file.jl:657 [inlined]
 [18] mcgen(::GPUCompiler.CompilerJob{GPUCompiler.SPIRVCompilerTarget,oneAPI.oneAPICompilerParams}, ::LLVM.Module, ::LLVM.Function, ::LLVM.API.LLVMCodeGenFileType) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/spirv.jl:65
 [19] macro expansion at /home/joto/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [20] macro expansion at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:254 [inlined]
 [21] macro expansion at /home/joto/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [22] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:248
 [23] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
 [24] compile at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [25] #zefunction_compile#53 at /home/joto/.julia/packages/oneAPI/UcBkX/src/compiler/execution.jl:128 [inlined]
 [26] zefunction_compile(::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12",Tuple{oneAPI.oneKernelContext,oneDeviceArray{Float64,1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(muladd),Tuple{Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1},Nothing,typeof(DiffEqBase.ODE_DEFAULT_NORM),Tuple{Base.Broadcast.Extruded{oneDeviceArray{Float64,1,1},Tuple{Bool},Tuple{Int64}},Float32}},Float64,Float64}},Int64}}) at /home/joto/.julia/packages/oneAPI/UcBkX/src/compiler/execution.jl:125
 [27] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12",Tuple{oneAPI.oneKernelContext,oneDeviceArray{Float64,1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(muladd),Tuple{Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1},Nothing,typeof(DiffEqBase.ODE_DEFAULT_NORM),Tuple{Base.Broadcast.Extruded{oneDeviceArray{Float64,1,1},Tuple{Bool},Tuple{Int64}},Float32}},Float64,Float64}},Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [28] broadcast_kernel at /home/joto/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:60 [inlined]
 [29] cached_compilation(::Dict{UInt64,Any}, ::typeof(oneAPI.zefunction_compile), ::typeof(oneAPI.zefunction_link), ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12",Tuple{oneAPI.oneKernelContext,oneDeviceArray{Float64,1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(muladd),Tuple{Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1},Nothing,typeof(DiffEqBase.ODE_DEFAULT_NORM),Tuple{Base.Broadcast.Extruded{oneDeviceArray{Float64,1,1},Tuple{Bool},Tuple{Int64}},Float32}},Float64,Float64}},Int64}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:0
 [30] cached_compilation(::Dict{UInt64,Any}, ::Function, ::Function, ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12",Tuple{oneAPI.oneKernelContext,oneDeviceArray{Float64,1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(muladd),Tuple{Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{1},Nothing,typeof(DiffEqBase.ODE_DEFAULT_NORM),Tuple{Base.Broadcast.Extruded{oneDeviceArray{Float64,1,1},Tuple{Bool},Tuple{Int64}},Float32}},Float64,Float64}},Int64}}) at /home/joto/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65
 [31] zefunction(::Function, ::Type{T} where T; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/joto/.julia/packages/oneAPI/UcBkX/src/compiler/execution.jl:118
 [32] macro expansion at /home/joto/.julia/packages/oneAPI/UcBkX/src/compiler/execution.jl:34 [inlined]
 [33] #gpu_call#81 at /home/joto/.julia/packages/oneAPI/UcBkX/src/gpuarrays.jl:29 [inlined]
 [34] #gpu_call#1 at /home/joto/.julia/packages/GPUArrays/ZxsKE/src/device/execution.jl:67 [inlined]
 [35] copyto! at /home/joto/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:68 [inlined]
 [36] copyto! at ./broadcast.jl:886 [inlined]
 [37] materialize! at ./broadcast.jl:848 [inlined]
 [38] materialize! at ./broadcast.jl:845 [inlined]
 [39] macro expansion at /home/joto/.julia/packages/DiffEqBase/V7P18/src/diffeqfastbc.jl:88 [inlined]
 [40] ode_determine_initdt(::oneArray{Float64,1}, ::Float32, ::Float32, ::Float32, ::Float64, ::Float64, ::typeof(DiffEqBase.ODE_DEFAULT_NORM), ::ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::OrdinaryDiffEq.ODEIntegrator{Tsit5,true,oneArray{Float64,1},Nothing,Float32,DiffEqBase.NullParameters,Float32,Float32,Float32,Array{oneArray{Float64,1},1},ODESolution{Float64,2,Array{oneArray{Float64,1},1},Nothing,Nothing,Array{Float32,1},Array{Array{oneArray{Float64,1},1},1},ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem},Tsit5,OrdinaryDiffEq.InterpolationData{ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Array{oneArray{Float64,1},1},Array{Float32,1},Array{Array{oneArray{Float64,1},1},1},OrdinaryDiffEq.Tsit5Cache{oneArray{Float64,1},oneArray{Float64,1},oneArray{Float64,1},OrdinaryDiffEq.Tsit5ConstantCache{Float64,Float32}}},DiffEqBase.DEStats},ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},OrdinaryDiffEq.Tsit5Cache{oneArray{Float64,1},oneArray{Float64,1},oneArray{Float64,1},OrdinaryDiffEq.Tsit5ConstantCache{Float64,Float32}},OrdinaryDiffEq.DEOptions{Float64,Float64,Float32,Float32,typeof(DiffEqBase.ODE_DEFAULT_NORM),typeof(opnorm),CallbackSet{Tuple{},Tuple{}},typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN),typeof(DiffEqBase.ODE_
DEFAULT_PROG_MESSAGE),typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK),DataStructures.BinaryHeap{Float32,DataStructures.LessThan},DataStructures.BinaryHeap{Float32,DataStructures.LessThan},Nothing,Nothing,Int64,Tuple{},Tuple{},Tuple{}},oneArray{Float64,1},Float64,Nothing,OrdinaryDiffEq.DefaultInit}) at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/initdt.jl:11
 [41] auto_dt_reset! at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/integrators/integrator_interface.jl:297 [inlined]
 [42] handle_dt!(::OrdinaryDiffEq.ODEIntegrator{Tsit5,true,oneArray{Float64,1},Nothing,Float32,DiffEqBase.NullParameters,Float32,Float32,Float32,Array{oneArray{Float64,1},1},ODESolution{Float64,2,Array{oneArray{Float64,1},1},Nothing,Nothing,Array{Float32,1},Array{Array{oneArray{Float64,1},1},1},ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem},Tsit5,OrdinaryDiffEq.InterpolationData{ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Array{oneArray{Float64,1},1},Array{Float32,1},Array{Array{oneArray{Float64,1},1},1},OrdinaryDiffEq.Tsit5Cache{oneArray{Float64,1},oneArray{Float64,1},oneArray{Float64,1},OrdinaryDiffEq.Tsit5ConstantCache{Float64,Float32}}},DiffEqBase.DEStats},ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},OrdinaryDiffEq.Tsit5Cache{oneArray{Float64,1},oneArray{Float64,1},oneArray{Float64,1},OrdinaryDiffEq.Tsit5ConstantCache{Float64,Float32}},OrdinaryDiffEq.DEOptions{Float64,Float64,Float32,Float32,typeof(DiffEqBase.ODE_DEFAULT_NORM),typeof(opnorm),CallbackSet{Tuple{},Tuple{}},typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN),typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE),typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK),DataStructures.BinaryHeap{Float32,DataStructures.LessThan},DataStructures.BinaryHeap{Float32,DataStructures.LessThan},Nothing,Nothing,Int64,Tuple{},Tuple{},Tuple{}},oneArray{Float64,1},Float64,Nothing,OrdinaryDiffEq.DefaultInit}) at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/solve.jl:453
 [43] __init(::ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::Tsit5, ::Tuple{}, ::Tuple{}, ::Tuple{}, ::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Bool, callback::Nothing, dense::Bool, calck::Bool, dt::Float32, dtmin::Nothing, dtmax::Float32, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, qoldinit::Rational{Int64}, fullnormalize::Bool, failfactor::Int64, beta1::Nothing, beta2::Nothing, maxiters::Int64, internalnorm::typeof(DiffEqBase.ODE_DEFAULT_NORM), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK), verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/solve.jl:416
 [44] __init(::ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::Tsit5, ::Tuple{}, ::Tuple{}, ::Tuple{}, ::Type{Val{true}}) at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/solve.jl:66 (repeats 5 times)
 [45] #__solve#391 at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/solve.jl:4 [inlined]
 [46] __solve at /home/joto/.julia/packages/OrdinaryDiffEq/VPJBD/src/solve.jl:4 [inlined]
 [47] solve_call(::ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::Tsit5; merge_callbacks::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/joto/.julia/packages/DiffEqBase/V7P18/src/solve.jl:92
 [48] solve_call at /home/joto/.julia/packages/DiffEqBase/V7P18/src/solve.jl:65 [inlined]
 [49] #solve_up#461 at /home/joto/.julia/packages/DiffEqBase/V7P18/src/solve.jl:114 [inlined]
 [50] solve_up at /home/joto/.julia/packages/DiffEqBase/V7P18/src/solve.jl:107 [inlined]
 [51] #solve#460 at /home/joto/.julia/packages/DiffEqBase/V7P18/src/solve.jl:102 [inlined]
 [52] solve(::ODEProblem{oneArray{Float64,1},Tuple{Float32,Float32},true,DiffEqBase.NullParameters,ODEFunction{true,typeof(f),UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},DiffEqBase.StandardODEProblem}, ::Tsit5) at /home/joto/.julia/packages/DiffEqBase/V7P18/src/solve.jl:100
 [53] top-level scope at ./timing.jl:174 [inlined]
 [54] top-level scope at ./REPL[8]:0
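Since the translator only rejects the 64-bit variant of the intrinsic, keeping all device data in Float32 may sidestep the failure. A minimal sketch (assuming the error is specific to `llvm.fabs.f64` and that the Float32 path is supported; this is a workaround hypothesis, not a verified fix):

```julia
using oneAPI

# abs on Float64 device data lowers to llvm.fabs.f64, which the bundled
# SPIR-V translator rejects (the error above); the Float32 variant compiles:
a = oneArray(rand(Float32, 4))
b = abs.(a)   # lowers to llvm.fabs.f32 instead

# For the DiffEqGPU example, this would mean converting u0 (and the timespan)
# to Float32 before constructing the ODEProblem -- an untested assumption.
```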

Abort during SYCL queue creation

As observed while running oneAPI tests:

┌ Info: System information:
│ Binary dependencies:
│ - NEO_jll: 22.35.24055+0
│ - libigc_jll: 1.0.11702+0
│ - gmmlib_jll: 22.1.3+0
│ - SPIRV_LLVM_Translator_unified_jll: 0.2.0+0
│ - SPIRV_Tools_jll: 2022.1.0+0
│
│ Toolchain:
│ - Julia: 1.8.5
│ - LLVM: 13.0.1
│
│ 1 driver:
│ - 00000000-0000-0000-173b-6a4501030000 (v1.3.0, API v1.3.0)
│
│ 1 device:
└ - Intel(R) Iris(R) Xe Graphics [0x9a49]
[ Info: Using oneAPI support library at /home/tim/Julia/depot/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/deps/lib/liboneapi_support.so
                                                  |          | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
      From worker 8:	terminate called after throwing an instance of 'sycl::_V1::runtime_error'
      From worker 8:	  what():  Native API failed. Native API returns: -33 (PI_ERROR_INVALID_DEVICE) -33 (PI_ERROR_INVALID_DEVICE)
      From worker 8:
      From worker 8:	signal (6): Aborted
      From worker 8:	in expression starting at /home/tim/Julia/pkg/oneAPI/test/sycl.jl:14
      From worker 8:	gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
      From worker 8:	abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
      From worker 8:	__verbose_terminate_handler at /workspace/srcdir/gcc-12.1.0/libstdc++-v3/libsupc++/vterminate.cc:95
      From worker 8:	__terminate at /workspace/srcdir/gcc-12.1.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
      From worker 8:	terminate at /workspace/srcdir/gcc-12.1.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
      From worker 8:	__cxa_throw at /workspace/srcdir/gcc-12.1.0/libstdc++-v3/libsupc++/eh_throw.cc:98
      From worker 8:	_ZNK4sycl3_V16detail6plugin13checkPiResultINS0_13runtime_errorEEEv10_pi_result at /home/tim/Julia/depot/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libsycl.so.6 (unknown line)
      From worker 8:	_ZN4sycl3_V16detail15make_queue_implEmRKNS0_7contextEP10_pi_devicebRKSt8functionIFvNS0_14exception_listEEENS0_7backendE at /home/tim/Julia/depot/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libsycl.so.6 (unknown line)
      From worker 8:	_ZN4sycl3_V16detail10make_queueEmRKNS0_7contextEPKNS0_6deviceEbRKSt8functionIFvNS0_14exception_listEEENS0_7backendE at /home/tim/Julia/depot/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libsycl.so.6 (unknown line)
      From worker 8:	_ZN4sycl3_V13ext6oneapi10level_zero10make_queueERKNS0_7contextEmb at /home/tim/Julia/depot/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/conda/lib/libsycl.so.6 (unknown line)
      From worker 8:	syclQueueCreate at /home/tim/Julia/depot/scratchspaces/8f75cd03-7ff8-4ecb-9b8f-daf728133b1b/deps/lib/liboneapi_support.so (unknown line)
      From worker 8:	syclQueueCreate at /home/tim/Julia/pkg/oneAPI/lib/sycl/libsycl.jl:49 [inlined]
      From worker 8:	syclQueue at /home/tim/Julia/pkg/oneAPI/lib/sycl/SYCL.jl:74 [inlined]
      From worker 8:	#43 at /home/tim/Julia/pkg/oneAPI/src/context.jl:88
      From worker 8:	get! at ./iddict.jl:178 [inlined]
      From worker 8:	sycl_queue at /home/tim/Julia/pkg/oneAPI/src/context.jl:87

This was on a loaded machine, which may be related. I included the versioninfo output to show that the system does have a valid device, and I have previously successfully executed code on it. In fact, during this very test execution other native tests (i.e., not using oneMKL/sycl, but using oneL0 from Julia directly) worked fine, so this looks like a specific issue with the SYCL/MKL integration.

Also, regardless of the actual issue, shouldn't we be catching C++ exceptions like this instead of aborting the process?

cc @pengtu

oneMKL: test failures with rotate and reflect

Hi @maleadt , I am facing the following errors when testing the latest master in my local setup.
It looks like a couple of the rotate/reflect primitives are failing for the ComplexF64 type at BLAS level 1.

~/Kali/2023/2002_fp16_rebase/oneAPI.jl$ $JULIA --project -L test/setup.jl test/onemkl.jl 
rotate: Test Failed at /home/kali/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:33
  Expression: testf(rotate!, rand(T, m), rand(T, m), rand(real(T)), rand(T))
Stacktrace:
 [1] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
 [2] macro expansion
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:33 [inlined]
 [3] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1363 [inlined]
 [4] macro expansion
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:32 [inlined]
 [5] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1439 [inlined]
 [6] macro expansion
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:12 [inlined]
 [7] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1363 [inlined]
 [8] top-level scope
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:12
reflect: Test Failed at /home/kali/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:38
  Expression: testf(reflect!, rand(T, m), rand(T, m), rand(real(T)), rand(T))
Stacktrace:
 [1] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
 [2] macro expansion
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:38 [inlined]
 [3] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1363 [inlined]
 [4] macro expansion
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:37 [inlined]
 [5] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1439 [inlined]
 [6] macro expansion
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:12 [inlined]
 [7] macro expansion
   @ ~/Kali/oneAPI.jl/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:1363 [inlined]
 [8] top-level scope
   @ ~/Kali/2023/2002_fp16_rebase/oneAPI.jl/test/onemkl.jl:12


Test Environment:
OS: "Ubuntu 20.04.5 LTS"
CPU: i9-9900K
GPU: Intel(R) UHD Graphics 630 [0x3e98]

Warning+Error thrown by oneAPI, use of "arguments" must be qualified

Trying out oneAPI.jl, the package seems to throw an error due to a naming conflict with LLVM? The same error is thrown when running the kernel example from this blogpost.

This is on Kubuntu 20.04 Linux.

julia> using oneAPI

julia> oneAPI.versioninfo()
Binary dependencies:
- NEO_jll: 21.31.20514+0
- libigc_jll: 1.0.8173+0
- gmmlib_jll: 21.2.1+0
- SPIRV_LLVM_Translator_jll: 11.0.0+2
- SPIRV_Tools_jll: 2021.2.0+0

Toolchain:
- Julia: 1.6.0
- LLVM: 11.0.1

1 driver:
- 00000000-0000-0000-16b8-d0b101010000 (v1.1.0, API v1.1.0)

1 device:
- Intel(R) UHD Graphics 620 [0x3ea0]


julia> function my_copy_transpose(dst, src, newaxes)
              i = get_global_id()
              idx_src = CartesianIndices(src)[i]
              idx_dest = CartesianIndex((idx_src[i] for i in newaxes)...)
              dst[idx_dest] = src[idx_src]
              return nothing
              end
my_copy_transpose (generic function with 1 method)

julia> src = oneArray(rand(2, 2, 2, 2, 2));

julia> dst = similar(src);

julia> @oneapi items=(2 ^ 5) my_copy_transpose(dst, src, (2, 1, 3, 4, 5))
WARNING: both oneL0 and LLVM export "arguments"; uses of it in module oneAPI must be qualified
ERROR: UndefVarError: arguments not defined
Stacktrace:
 [1] onecall(::oneAPI.oneL0.ZeKernel, ::Type, ::oneDeviceArray{Float64, 5, 1}, ::Vararg{Any, N} where N; groups::Int64, items::Int64, queue::oneAPI.oneL0.ZeCommandQueue)
   @ oneAPI ~/.julia/packages/oneAPI/PolvD/src/compiler/execution.jl:169
 [2] macro expansion
   @ ~/.julia/packages/oneAPI/PolvD/src/compiler/execution.jl:121 [inlined]                                                         
 [3] call(::oneAPI.HostKernel{my_copy_transpose, Tuple{oneDeviceArray{Float64, 5, 1}, oneDeviceArray{Float64, 5, 1}, Int64, Int64}}, ::oneDeviceArray{Float64, 5, 1}, ::oneDeviceArray{Float64, 5, 1}, ::Int64, ::Int64; call_kwargs::Base.Iterators.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:items,), Tuple{Int64}}})
   @ oneAPI ~/.julia/packages/oneAPI/PolvD/src/compiler/execution.jl:93
 [4] (::oneAPI.HostKernel{my_copy_transpose, Tuple{oneDeviceArray{Float64, 5, 1}, oneDeviceArray{Float64, 5, 1}, Int64, Int64}})(::oneArray{Float64, 5}, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:items,), Tuple{Int64}}})
   @ oneAPI ~/.julia/packages/oneAPI/PolvD/src/compiler/execution.jl:179
 [5] top-level scope
   @ ~/.julia/packages/oneAPI/PolvD/src/compiler/execution.jl:53

GPU vendor independent API

I'm really excited to see the maturity of GPU support for Julia, particularly the speed of a working oneAPI implementation while everyone else (RAJA, Kokkos, etc.) is still getting their implementations working. However, I am surprised that each GPU vendor library is separate and has its own slightly different API; this makes life more difficult in the exascale multi-vendor HPC landscape. Is a unified API being worked on or under consideration?

I've just started looking... so far the array types and broadcasting are nice and vendor independent, but the custom kernel and launch syntax varies a lot, in ways that I don't think are necessary. Sure, the underlying runtimes have different names, and there are some subtle differences between SYCL ids and CUDA ids, but HIP and CUDA are basically identical, and I doubt very many people care about the differences in SYCL id naming or functionality.
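For what it's worth, a vendor-agnostic kernel layer does exist in the Julia ecosystem: KernelAbstractions.jl lets one kernel definition target the CUDA, ROCm, and oneAPI backends. A rough sketch (API details may differ between package versions):

```julia
using KernelAbstractions

# One kernel definition, portable across backends:
@kernel function vadd!(c, a, b)
    i = @index(Global)
    @inbounds c[i] = a[i] + b[i]
end

# Launch on whatever backend the arrays live on, e.g. with oneAPI:
#   using oneAPI
#   a = oneArray(rand(Float32, 1024)); b = copy(a); c = similar(a)
#   vadd!(get_backend(c))(c, a, b; ndrange = length(c))
```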

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Check/mention permissions in initialization error message

I am using hardware which seems to be supported (a processor with the Coffee Lake codename):

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_NUM_PRECOMPILE_TASKS = 8

But I am getting the following error while trying to use oneAPI:

julia> using oneAPI
┌ Error: No compatible oneAPI driver implementation found.
│ Your hardware probably is not supported by any oneAPI driver.
│ 
│ oneAPI.jl currently only supports the Intel Compute runtime,
│ consult their README for a list of compatible hardware:
│ https://github.com/intel/compute-runtime#supported-platforms
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/BmdFb/lib/level-zero/oneL0.jl:95

Better error on JLL unavailability (Windows, etc)

(v1.7) pkg> add oneAPI
    Updating registry at `C:\Users\jackn\.julia\registries\General.toml`
   Resolving package versions...
    Updating `D:\Storage\Programming\julia\.julia\environments\v1.7\Project.toml`
  [8f75cd03] + oneAPI v0.2.4
    Updating `D:\Storage\Programming\julia\.julia\environments\v1.7\Manifest.toml`
  [8f75cd03] + oneAPI v0.2.4
  [700fe977] + NEO_jll v22.25.23529+0
  [85f0d8ed] + SPIRV_LLVM_Translator_unified_jll v0.2.0+0
  [6ac6d60f] + SPIRV_Tools_jll v2022.1.0+0
  [09858cae] + gmmlib_jll v22.1.3+0
  [94295238] + libigc_jll v1.0.11378+0
  [f4bc562b] + oneAPI_Level_Zero_Headers_jll v1.4.0+0
  [13eca655] + oneAPI_Level_Zero_Loader_jll v1.8.1+1

julia> oneAPI.versioninfo()
ERROR: UndefVarError: oneAPI not defined
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

julia> using oneAPI
ERROR: InitError: UndefVarError: libze_loader not defined
Stacktrace:
 [1] unsafe_zeInit
   @ C:\Users\jackn\.julia\packages\oneAPI\mHp15\lib\level-zero\libze.jl:1092 [inlined]
 [2] __init__()
   @ oneAPI.oneL0 C:\Users\jackn\.julia\packages\oneAPI\mHp15\lib\level-zero\oneL0.jl:94
 [3] _include_from_serialized(path::String, depmods::Vector{Any})
   @ Base .\loading.jl:768
 [4] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
   @ Base .\loading.jl:854
 [5] _require(pkg::Base.PkgId)
   @ Base .\loading.jl:1097
 [6] require(uuidkey::Base.PkgId)
   @ Base .\loading.jl:1013
 [7] require(into::Module, mod::Symbol)
   @ Base .\loading.jl:997
during initialization of module oneL0

ZeDriver(0) crashes

julia> ZeDriver(0)

signal (11): Segmentation fault
in expression starting at none:0
zeDriverGetProperties at /home/tim/Julia/depot/artifacts/e33fd0d676040934fd312785345ee3d02723a99c/lib/libze_intel_gpu.so.1 (unknown line)
show at /home/tim/Julia/pkg/oneAPI/lib/level-zero/driver.jl:16
unknown function (ip: 0x7f01300c3da8)

It shouldn't be possible to construct a ZeDriver object with a handle derived from an integer (0).

MethodError: unsafe_convert(::Type{PtrOrZePtr{Nothing}}, ::PtrOrZePtr{Nothing}) is ambiguous.

The following code (test.jl):

using BenchmarkTools
using oneAPI
using oneAPI.oneL0
drv = first(drivers());
dev = collect(devices(drv))[1]
device!(dev)

A = rand(2^9, 2^9)
A_d = oneArray(A)   <---- error here!!

@btime A * A
@btime A_d * A_d

function vadd!(c, a, b)
    i = get_global_id()
    @inbounds c[i] = a[i] + b[i]
    return
end

@oneapi items=length(A_d) vadd!(C_d, A_d, B_d)

results in the exception:

Exception has occurred: MethodError
MethodError: unsafe_convert(::Type{PtrOrZePtr{Nothing}}, ::PtrOrZePtr{Nothing}) is ambiguous. Candidates:
  unsafe_convert(::Type{T}, x::T) where T in Base at essentials.jl:414
  unsafe_convert(::Type{PtrOrZePtr{T}}, val) where T in oneAPI.oneL0 at /home/jfq/.julia/packages/oneAPI/1AnU6/lib/level-zero/pointer.jl:117
Possible fix, define
  unsafe_convert(::Type{PtrOrZePtr{T}}, ::PtrOrZePtr{T}) where T

Stacktrace:
 [1] zeContextMakeMemoryResident(hContext::ZeContext, hDevice::ZeDevice, ptr::oneAPI.oneL0.DeviceBuffer, size::Int64)
   @ oneAPI.oneL0 ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/libze.jl:2066
 [2] make_resident(ctx::ZeContext, dev::ZeDevice, buf::oneAPI.oneL0.DeviceBuffer, size::Int64)
   @ oneAPI.oneL0 ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/residency.jl:7
 [3] make_resident(ctx::ZeContext, dev::ZeDevice, buf::oneAPI.oneL0.DeviceBuffer)
   @ oneAPI.oneL0 ~/.julia/packages/oneAPI/1AnU6/lib/level-zero/residency.jl:7
 [4] allocate(#unused#::Type{oneAPI.oneL0.DeviceBuffer}, ctx::ZeContext, dev::ZeDevice, bytes::Int64, alignment::Int64)
   @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/pool.jl:5
 [5] oneMatrix{Float64, oneAPI.oneL0.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
   @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/array.jl:44
 [6] oneMatrix{Float64, oneAPI.oneL0.DeviceBuffer}(xs::Matrix{Float64})
   @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/array.jl:204
 [7] (oneMatrix{Float64})(xs::Matrix{Float64})
   @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/array.jl:209
 [8] oneArray(A::Matrix{Float64})
   @ oneAPI ~/.julia/packages/oneAPI/1AnU6/src/array.jl:218
 [9] top-level scope
   @ ~/src/juliatest/test.jl:9
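Until the package itself resolves the ambiguity, the disambiguating method suggested by the error message can be defined manually in the session. This is an untested sketch based purely on the "Possible fix" hint above (and it is type piracy, so it should only be a stop-gap; upgrading oneAPI.jl is the proper fix):

```julia
using oneAPI
import oneAPI.oneL0: PtrOrZePtr

# Identity conversion, exactly as suggested by the MethodError's
# "Possible fix" hint; resolves the ambiguity with Base's
# unsafe_convert(::Type{T}, x::T).
Base.unsafe_convert(::Type{PtrOrZePtr{T}}, x::PtrOrZePtr{T}) where {T} = x
```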

I tried this on a fresh install:

julia> VERSION
v"1.8.2"

julia> Pkg.status()
Status `~/.julia/environments/v1.8/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.1
  [8f75cd03] oneAPI v0.3.0

julia> oneAPI.versioninfo()
Binary dependencies:
- NEO_jll: 22.35.24055+0
- libigc_jll: 1.0.11702+0
- gmmlib_jll: 22.1.3+0
- SPIRV_LLVM_Translator_unified_jll: 0.2.0+0
- SPIRV_Tools_jll: 2022.1.0+0

Toolchain:
- Julia: 1.8.2
- LLVM: 13.0.1

1 driver:
- 00000000-0000-0000-1722-26f401030000 (v1.3.0, API v1.3.0)

1 device:
- Intel(R) Graphics [0x46a6]

Possible miscompilation on Tiger Lake Xe

Version: oneAPI v0.2.2 (edit: reproduced with master)
code:

using oneAPI
arr = oneArray(ones(UInt64, 260));
print(arr .!= 0)

result:
Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
The result is wrong at indices 29, 49, 69, 89, 109, 129, 149, 169, 189, 209, 229, and 249.
The same thing happens at exactly the same indices for similar code, such as

arr = oneArray(zeros(UInt64, 260));
print(arr .== 0)

or

arr = oneArray{UInt64}(1:260);
print(arr .!= 0)

This does not happen when 260 is changed to a smaller number.
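The failing positions follow a fixed stride, 29 + 20k for k = 0:11, which a small host-side check makes explicit (the device comparison is a sketch and requires a oneAPI-capable GPU; `failing` is a name introduced here for illustration):

```julia
# The wrong elements reported above follow a fixed stride: 29 + 20k, k = 0:11.
failing = collect(29:20:249)   # [29, 49, 69, ..., 229, 249]

# Sketch of a device-vs-host comparison (requires a oneAPI-capable GPU):
# using oneAPI
# arr_host = ones(UInt64, 260)
# bad = findall(Array(oneArray(arr_host) .!= 0) .!= (arr_host .!= 0))
# bad == failing   # reportedly true on the affected Tiger Lake system
```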

malloc and exceptions

These are the two remaining bits that will unlock a lot of functionality:

julia> exp.(oneArray(rand(ComplexF32, 2)))
ERROR: InvalidIRError: compiling kernel broadcast_kernel(oneAPI.oneKernelContext, oneDeviceVector{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(exp), Tuple{Base.Broadcast.Extruded{oneDeviceVector{ComplexF32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
  [1] malloc
    @ ~/Julia/pkg/GPUCompiler/src/runtime.jl:88
  [2] macro expansion
    @ ~/Julia/pkg/GPUCompiler/src/runtime.jl:185
  [3] box
    @ ~/Julia/pkg/GPUCompiler/src/runtime.jl:174
  [4] box_float32
    @ ~/Julia/pkg/GPUCompiler/src/runtime.jl:214
  [5] sincos_domain_error
    @ special/trig.jl:164
  [6] sincos
    @ special/trig.jl:181
  [7] exp
    @ complex.jl:655
  [8] _broadcast_getindex_evalf
    @ broadcast.jl:648
  [9] _broadcast_getindex
    @ broadcast.jl:621
 [10] getindex
    @ broadcast.jl:575
 [11] broadcast_kernel
    @ ~/Julia/depot/packages/GPUArrays/WV76E/src/host/broadcast.jl:62
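Until device-side malloc and exceptions are supported, one workaround is to broadcast a hand-written complex exponential that sidesteps Base's throwing `sincos` path. This is a sketch; whether plain `sin`/`cos` compile cleanly on a given driver is an assumption, and `cexp` is a name introduced here for illustration:

```julia
# Hypothetical workaround: a complex exponential built from real sin/cos,
# avoiding Base's sincos (whose domain-error path boxes values on the device).
cexp(z::Complex) = exp(real(z)) * Complex(cos(imag(z)), sin(imag(z)))

# On the CPU this agrees with Base.exp:
cexp(1.0f0 + 2.0f0im) ≈ exp(1.0f0 + 2.0f0im)   # true

# On a device (requires a working oneAPI setup):
# using oneAPI
# cexp.(oneArray(rand(ComplexF32, 2)))
```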

Double type is not supported on this platform

I'm trying oneAPI with Julia 1.6.0 on a Linux 5.4.0 machine with an Intel Core i7-1065G7 (with an Intel Iris Plus G7 iGPU). As suggested by the README, I ran the smoke test and everything worked well (I got ZeDevice(GPU, vendor 0x8086, device 0x8a52): Intel(R) Graphics [0x8a52]).

However, when I run ] test oneAPI I get the same error multiple times:

┌ Error: Module compilation failed:
│ 
│ error: double type is not supported on this platform
│ error: backend compiler failed build.
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/yKhRw/lib/level-zero/module.jl:49

The tests that fail with this error are the following:

  • execution
  • level-zero
  • device/intrinsics
  • gpuarrays/math
  • gpuarrays/indexing scalar
  • gpuarrays/value constructors
  • gpuarrays/indexing multidimensional
  • gpuarrays/iterator constructors
  • gpuarrays/linear algebra
  • gpuarrays/random
  • gpuarrays/base
  • gpuarrays/mapreduce essentials
  • gpuarrays/broadcasting

and probably more that I didn't wait to see.

This simple code also throws the same error:

julia> a = oneArray(rand(2,2))
2×2 oneArray{Float64, 2}:
 0.28361    0.367083
 0.0142196  0.783932

julia> a .+ 1
┌ Error: Module compilation failed:
│ 
│ error: double type is not supported on this platform
│ error: backend compiler failed build.
└ @ oneAPI.oneL0 ~/.julia/packages/oneAPI/yKhRw/lib/level-zero/module.jl:49
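Note that `rand(2,2)` produces Float64 values, which is what triggers the error on hardware without double support. Until the platform gains Float64 support (or the package emulates it), sticking to Float32 avoids the failure. A sketch, assuming the same hardware setup as above:

```julia
using oneAPI

# Float64 is unsupported on this iGPU; Float32 arrays work instead.
a = oneArray(rand(Float32, 2, 2))
a .+ 1f0   # no double type involved, so the kernel should compile
```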

InitError on macOS

I just wanted to try out this package (the tagged release 0.1.0), but I saw this InitError when using it.

julia> using oneAPI
[ Info: Precompiling oneAPI [8f75cd03-7ff8-4ecb-9b8f-daf728133b1b]
ERROR: InitError: UndefVarError: libze_loader not defined
Stacktrace:
 [1] unsafe_zeInit at /Users/crstnbr/.julia/packages/oneAPI/UcBkX/lib/level-zero/libze.jl:6 [inlined]
 [2] __init__() at /Users/crstnbr/.julia/packages/oneAPI/UcBkX/lib/level-zero/oneL0.jl:36
 [3] _include_from_serialized(::String, ::Array{Any,1}) at ./loading.jl:697
 [4] _require_from_serialized(::String) at ./loading.jl:749
 [5] _require(::Base.PkgId) at ./loading.jl:1040
 [6] require(::Base.PkgId) at ./loading.jl:928
 [7] require(::Module, ::Symbol) at ./loading.jl:923
during initialization of module oneL0

I'm on Mac OS Catalina 10.15.7 and here is the Julia info:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

Segfault due to early memory release?

This was noticed on the Intel DevCloud, with Julia 1.7-rc1 and the tb/onemkl branch.

julia> x1 = oneArray(rand(576, 64, 2));

julia> x2 = oneArray(rand(576, 64, 36));

julia> function trial(x1, x2)
         res = similar(x1, size(x1, 1), size(x2, 3), size(x1, 3))
         x1_p = permutedims(x1, (1, 3, 2))
         for i = 1:size(res, 1)
           res[i, :, :] = permutedims(x1_p[i, :, :] * x2[i, :, :], (2, 1))
         end
         res
       end

The MWE isn't great since it isn't written with GPUs in mind, but calling this function a couple of times in a REPL seems to OOM, or in another case, crash Julia.
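If the crash really is due to intermediate buffers being collected while the corresponding GPU work is still queued, explicitly rooting them for the duration of the loop is a cheap diagnostic. This sketch uses standard `GC.@preserve`; whether it actually avoids the segfault here is an assumption, and `trial_preserved` is a name introduced for illustration:

```julia
function trial_preserved(x1, x2)
    res = similar(x1, size(x1, 1), size(x2, 3), size(x1, 3))
    x1_p = permutedims(x1, (1, 3, 2))
    # Keep every array rooted until all enqueued kernels have run,
    # so the GC cannot free their buffers early.
    GC.@preserve x1 x2 x1_p res begin
        for i = 1:size(res, 1)
            res[i, :, :] = permutedims(x1_p[i, :, :] * x2[i, :, :], (2, 1))
        end
    end
    return res
end
```

The function is array-type generic, so it can be sanity-checked on plain CPU `Array`s before trying it on `oneArray`s.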

Stacktrace on crash
signal (11): Segmentation fault
in expression starting at REPL[21]:1
_ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_9SKLFamilyEE16processResidencyERKSt6vectorIPNS_18GraphicsAllocationESaIS5_EEj at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_9SKLFamilyEE13flushInternalERKNS_11BatchBufferERKSt6vectorIPNS_18GraphicsAllocationESaIS8_EE at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_9SKLFamilyEE5flushERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS7_EE at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO21CommandStreamReceiver17submitBatchBufferERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS5_EE at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN2L015CommandQueueImp17submitBatchBufferEmRSt6vectorIPN3NEO18GraphicsAllocationESaIS4_EEPv at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY12EE19executeCommandListsEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tb at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN9_pi_queue18executeCommandListEP25_ze_command_list_handle_tP18_ze_fence_handle_tP9_pi_eventbb at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libpi_level_zero.so (unknown line)
piEnqueueKernelLaunch at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libpi_level_zero.so (unknown line)
_ZN2cl4sycl6detail13ExecCGCommand24SetKernelParamsAndLaunchEPNS1_12CGExecKernelEP10_pi_kernelRNS1_8NDRDescTERSt6vectorIP9_pi_eventSaISB_EERSB_S9_IbSaIbEE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail13ExecCGCommand10enqueueImpEv at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail7Command7enqueueERNS1_14EnqueueResultTENS1_9BlockingTE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail9Scheduler5addCGESt10unique_ptrINS1_2CGESt14default_deleteIS4_EESt10shared_ptrINS1_10queue_implEE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl7handler8finalizeEv at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail10queue_impl11submit_implERKSt8functionIFvRNS0_7handlerEEERKSt10shared_ptrIS2_ERKNS1_13code_locationE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl5queue11submit_implESt8functionIFvRNS0_7handlerEEERKNS0_6detail13code_locationE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN6oneapi3mkl3gpu20launch_kernel_3D_usmEPiPN2cl4sycl5queueEPNS4_6kernelEP18mkl_gpu_argument_tPmSB_P20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu28mkl_blas_gpu_launch_s_nocopyEPiPN2cl4sycl5queueEPNS4_6kernelEPK17gpu_driver_info_tljbb17gpu_update_type_tllllPcSD_SD_mmmlllPfSE_bP20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu37mkl_blas_gpu_sgemm_nocopy_driver_syclEPiPN2cl4sycl5queueEP14blas_arg_usm_tP20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu30mkl_blas_gpu_sgemm_driver_syclEPiPN2cl4sycl5queueEP14blas_arg_usm_tP20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu10sgemm_syclEPN2cl4sycl5queueE10MKL_LAYOUT13MKL_TRANSPOSES7_lllfPKflS9_lfPflRKSt6vectorINS3_5eventESaISC_EElll at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl4blas5sgemmERN2cl4sycl5queueE10MKL_LAYOUTNS0_9transposeES7_lllfPKflS9_lfPflRKSt6vectorINS3_5eventESaISC_EE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl4blas12column_major4gemmERN2cl4sycl5queueENS0_9transposeES7_lllfPKflS9_lfPflRKSt6vectorINS4_5eventESaISC_EE at /glob/development-tools/versions/oneapi/2021.3/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
onemklSgemm at /home/u91328/.julia/packages/oneAPI/ufpSE/deps/liboneapilib.so (unknown line)
onemklSgemm at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/libonemkl.jl:11
unknown function (ip: 0x7f4b70681546)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
gemm! at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/wrappers.jl:57
gemm_dispatch! at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/linalg.jl:45
mul! at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/linalg.jl:54 [inlined]
mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:275 [inlined]
* at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:160 [inlined]
Dense at /home/u91328/.julia/packages/Flux/Zz9RI/src/layers/basic.jl:148
Dense at /home/u91328/.julia/packages/Flux/Zz9RI/src/layers/basic.jl:151 [inlined]
applychain at /home/u91328/.julia/packages/Flux/Zz9RI/src/layers/basic.jl:37 [inlined]
Chain at /home/u91328/.julia/packages/Flux/Zz9RI/src/layers/basic.jl:39
unknown function (ip: 0x7f4b706812a2)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1790 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229
#run_repl#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
#929 at ./client.jl:394
jfptr_YY.929_45627.clone_1 at /home/u91328/julia-1.7.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1790 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_29515.clone_1 at /home/u91328/julia-1.7.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1790 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/u91328/julia-1.7.0-rc1/bin/julia (unknown line)
Allocations: 109211129 (Pool: 109175009; Big: 36120); GC: 89
Segmentation fault
Stacktrace on crash
signal (11): Segmentation fault
in expression starting at REPL[6]:1
_ZN3NEO13DrmAllocation15makeBOsResidentEPNS_9OsContextEjPSt6vectorIPNS_12BufferObjectESaIS5_EEb at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_9SKLFamilyEE16processResidencyERKSt6vectorIPNS_18GraphicsAllocationESaIS5_EEj at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_9SKLFamilyEE13flushInternalERKNS_11BatchBufferERKSt6vectorIPNS_18GraphicsAllocationESaIS8_EE at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO24DrmCommandStreamReceiverINS_9SKLFamilyEE5flushERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS7_EE at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN3NEO21CommandStreamReceiver17submitBatchBufferERNS_11BatchBufferERSt6vectorIPNS_18GraphicsAllocationESaIS5_EE at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN2L015CommandQueueImp17submitBatchBufferEmRSt6vectorIPN3NEO18GraphicsAllocationESaIS4_EEPv at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN2L014CommandQueueHwIL14GFXCORE_FAMILY12EE19executeCommandListsEjPP25_ze_command_list_handle_tP18_ze_fence_handle_tb at /home/u91328/.julia/artifacts/0425153072013ce35c4d657144ad692b7bd55efb/lib64/libze_intel_gpu.so.1 (unknown line)
_ZN9_pi_queue18executeCommandListEP25_ze_command_list_handle_tP18_ze_fence_handle_tP9_pi_eventbb at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libpi_level_zero.so (unknown line)
piEnqueueKernelLaunch at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libpi_level_zero.so (unknown line)
_ZN2cl4sycl6detail13ExecCGCommand24SetKernelParamsAndLaunchEPNS1_12CGExecKernelEP10_pi_kernelRNS1_8NDRDescTERSt6vectorIP9_pi_eventSaISB_EERSB_S9_IbSaIbEE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail13ExecCGCommand10enqueueImpEv at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail7Command7enqueueERNS1_14EnqueueResultTENS1_9BlockingTE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail9Scheduler5addCGESt10unique_ptrINS1_2CGESt14default_deleteIS4_EESt10shared_ptrINS1_10queue_implEE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl7handler8finalizeEv at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl6detail10queue_impl11submit_implERKSt8functionIFvRNS0_7handlerEEERKSt10shared_ptrIS2_ERKNS1_13code_locationE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN2cl4sycl5queue11submit_implESt8functionIFvRNS0_7handlerEEERKNS0_6detail13code_locationE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/compiler/2021.3.0/linux/lib/libsycl.so.5 (unknown line)
_ZN6oneapi3mkl3gpu20launch_kernel_3D_usmEPiPN2cl4sycl5queueEPNS4_6kernelEP18mkl_gpu_argument_tPmSB_P20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu28mkl_blas_gpu_launch_d_nocopyEPiPN2cl4sycl5queueEPNS4_6kernelEPK17gpu_driver_info_tljbb17gpu_update_type_tllllPcSD_SD_mmmlllPdSE_bP20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu37mkl_blas_gpu_dgemm_nocopy_driver_syclEPiPN2cl4sycl5queueEP14blas_arg_usm_tP20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu30mkl_blas_gpu_dgemm_driver_syclEPiPN2cl4sycl5queueEP14blas_arg_usm_tP20mkl_gpu_event_list_t at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl3gpu10dgemm_syclEPN2cl4sycl5queueE10MKL_LAYOUT13MKL_TRANSPOSES7_llldPKdlS9_ldPdlRKSt6vectorINS3_5eventESaISC_EElll at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl4blas5dgemmERN2cl4sycl5queueE10MKL_LAYOUTNS0_9transposeES7_llldPKdlS9_ldPdlRKSt6vectorINS3_5eventESaISC_EE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
_ZN6oneapi3mkl4blas12column_major4gemmERN2cl4sycl5queueENS0_9transposeES7_llldPKdlS9_ldPdlRKSt6vectorINS4_5eventESaISC_EE at /glob/development-tools/versions/oneapi/2021.3.0/inteloneapi/mkl/2021.3.0/lib/intel64/libmkl_sycl.so.1 (unknown line)
onemklDgemm at /home/u91328/.julia/packages/oneAPI/ufpSE/deps/liboneapilib.so (unknown line)
onemklDgemm at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/libonemkl.jl:20
unknown function (ip: 0x7fe5719cf7d9)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
gemm! at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/wrappers.jl:57
gemm_dispatch! at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/linalg.jl:45
mul! at /home/u91328/.julia/packages/oneAPI/ufpSE/lib/mkl/linalg.jl:54 [inlined]
mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:275 [inlined]
* at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/matmul.jl:160 [inlined]
trial at ./REPL[3]:6
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1790 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:244
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:229
#run_repl#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:362
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:349
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
#929 at ./client.jl:394
jfptr_YY.929_45627.clone_1 at /home/u91328/julia-1.7.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1790 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_29515.clone_1 at /home/u91328/julia-1.7.0-rc1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2245 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2427
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1790 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/u91328/julia-1.7.0-rc1/bin/julia (unknown line)
Allocations: 65611811 (Pool: 65591135; Big: 20676); GC: 54
Segmentation fault

Wrong results when LHS variable is used on RHS with a barrier synchronization

Below is an MWE. The result should be [0.05, 0.0, 0.0, 0.0], whereas oneAPI returns [0.05, -0.05, 0.05, -0.05]. I checked the same code (after some porting) with CUDA.jl, where it yields the correct values.

Maybe I'm doing something wrong? Or is it a system issue?

My setup is Julia 1.6.5 on a i7-8700 with integrated graphics.

using oneAPI
using Test

n = 4
nblk = 1

r = [0.1, 0.1, 0.1, 0.1]

function foo_cpu(n::Int, r::Vector{Float64})
    # Solve L*x = r and store the result in r.

    @inbounds for j=1:n
        temp = r[j]/2.0
        for k=j+1:n
            r[k] = r[k] - 2.0*temp
        end
        r[j] = temp
    end
end

function foo_gpu(::Val{n}, r_) where {n}
     tx = get_local_id()
     bx = get_group_id()

     r = oneLocalArray(Float64, 4)

     r[tx] = r_[tx]

     barrier()

     @inbounds for j=1:n
          if tx == 1
               r[j] = r[j] / 2.0
          end
          barrier()

          if tx > j && tx <= 4
               r[tx] = r[tx] - 2.0*r[j]
          end
          barrier()
     end

     if bx == 1
          r_[tx] = r[tx]
     end

     barrier()

     return nothing
end

dr = oneArray(r)
@oneapi items=(n, nblk) groups=nblk foo_gpu(Val{n}(), dr)
hr = copy(r)
foo_cpu(n, hr)
@show Array(dr)
@show hr
@test all(Array(dr) .== hr)
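The expected result can be confirmed without a GPU by running the CPU reference alone (repeated here so the check is self-contained):

```julia
function foo_cpu(n::Int, r::Vector{Float64})
    # Solve L*x = r in place (same routine as in the MWE above).
    @inbounds for j = 1:n
        temp = r[j] / 2.0
        for k = j+1:n
            r[k] = r[k] - 2.0 * temp
        end
        r[j] = temp
    end
end

hr = [0.1, 0.1, 0.1, 0.1]
foo_cpu(4, hr)
hr  # == [0.05, 0.0, 0.0, 0.0], the value the GPU kernel should match
```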

Extremely slow reduction

On a 1024x1024 Float32 matrix:

julia> @benchmark sum($a)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  84.734 μs … 228.946 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     85.332 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   87.917 μs ±   7.545 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▃▂▄▁▁                                                      ▁
  ████████▇██▇▇█▇▆█████▇██████▇▆▇▆▆▅▆▆▆▆▆▆▆▆▅▅▅▆▆▄▇█▆▅▅▅▆▄▄▃▅▄ █
  84.7 μs       Histogram: log(frequency) by time       120 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum($d_a)
BenchmarkTools.Trial: 618 samples with 1 evaluation.
 Range (min … max):  6.966 ms …   9.740 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.047 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.079 ms ± 602.087 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁▁▄▃▄  ▁ ▁▇ ▅▄█▃▄            ▂ ▁ ▃▁▇▁ ▁ ▁▁  ▃           
  ▂▂▃▃▄█████▇▇█▇██▇█████▄▇▆▆▃█▅▇▇▆▆███████████▅█████▃▄▃▆▂▅▄▂▃ ▅
  6.97 ms         Histogram: frequency by time        9.28 ms <

 Memory estimate: 27.41 KiB, allocs estimate: 509.

It scales with the input size, so this is probably the reduction kernel being slow:

julia> d_a = oneArray(rand(Float32, 4096, 4096));

julia> a = rand(Float32, 4096, 4096);

julia> @benchmark sum($a)
BenchmarkTools.Trial: 1682 samples with 1 evaluation.
 Range (min … max):  2.918 ms …  3.185 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.964 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.967 ms ± 25.760 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▅  █      ▆  ▃                                    
  ▂▁▁▃▂▂▇▅▃▇█▃▃█▃▂██▃██▄▄█▇▃▄▇▂▃▇▃▂▅▅▂▃▄▂▂▃▂▂▃▃▂▂▃▂▁▃▂▂▂▂▁▁▂ ▃
  2.92 ms        Histogram: frequency by time        3.05 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum($d_a)
BenchmarkTools.Trial: 45 samples with 1 evaluation.
 Range (min … max):  112.776 ms … 113.728 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     113.151 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   113.186 ms ± 218.961 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁       ▁       ▁ ▄   ▄█▁ ▄▁       ▁▁    ▁                     
  █▁▁▆▆▁▁▁█▆▁▁▁▁▁▁█▆█▆▁▆███▁██▁▁▁▆▆▆▆██▁▁▁▆█▆▁▁▁▁▆▁▁▆▁▁▁▁▁▆▁▁▁▆ ▁
  113 ms           Histogram: frequency by time          114 ms <

 Memory estimate: 28.75 KiB, allocs estimate: 516.
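Putting the 4096×4096 numbers in perspective: the effective bandwidth of the device reduction is far below what even an iGPU can stream. Rough arithmetic from the median timings above (the timings themselves are machine-specific):

```julia
bytes = 4096 * 4096 * sizeof(Float32)   # 64 MiB, read once per reduction

cpu_bw = bytes / 2.964e-3 / 1e9         # ≈ 22.6 GB/s from the 2.964 ms median
gpu_bw = bytes / 113.151e-3 / 1e9       # ≈ 0.59 GB/s from the 113.151 ms median

(cpu_bw, gpu_bw, cpu_bw / gpu_bw)       # the GPU path is roughly 38x slower
```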

`res/local.jl` Inconsistency in command-line options with multiple vendor libraries installed

Thanks for this script.

When running it on a system with both Intel CPU and GPUs, I get the following output:

: CommandLine Error: Option 'enable-nonnull-arg-prop' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

signal (6): Aborted

This seems related to https://bugs.llvm.org/show_bug.cgi?id=30587

On such a system, you can trigger it by loading any runtime library:

julia -e 'using Libdl; Libdl.find_library("libigc")'   

Output:

: CommandLine Error: Option 'enable-nonnull-arg-prop' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

signal (6): Aborted
in expression starting at none:1
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /home/mschanen/julia/julia-1.6.5/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm18report_fatal_errorEPKcb at /home/mschanen/julia/julia-1.6.5/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN12_GLOBAL__N_117CommandLineParser9addOptionEPN4llvm2cl6OptionEPNS2_10SubCommandE at /home/mschanen/julia/julia-1.6.5/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm2cl6Option11addArgumentEv at /home/mschanen/julia/julia-1.6.5/bin/../lib/julia/libLLVM-11jl.so (unknown line)
unknown function (ip: 0x7feccab0e567)
call_init.part.0 at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_init at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_exception at /lib64/libc.so.6 (unknown line)
dl_open_worker at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_exception at /lib64/libc.so.6 (unknown line)
_dl_open at /lib64/ld-linux-x86-64.so.2 (unknown line)
dlopen_doit at /lib64/libdl.so.2 (unknown line)
_dl_catch_exception at /lib64/libc.so.6 (unknown line)
_dl_catch_error at /lib64/libc.so.6 (unknown line)
_dlerror_run at /lib64/libdl.so.2 (unknown line)
dlopen at /lib64/libdl.so.2 (unknown line)
jl_load_dynamic_library at /buildworker/worker/package_linux64/build/src/dlload.c:264
#dlopen#3 at ./libdl.jl:114
dlopen##kw at ./libdl.jl:114 [inlined]
find_library at ./libdl.jl:203
find_library at ./libdl.jl:211 [inlined]
find_library at ./libdl.jl:211
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:562
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:670
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:877
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:825
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:825
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:929
eval at ./boot.jl:360 [inlined]
exec_options at ./client.jl:261
_start at ./client.jl:485
jfptr__start_38548.clone_1 at /home/mschanen/julia/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:560
repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:702
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
Allocations: 2649 (Pool: 2639; Big: 10); GC: 0
[1]    50921 abort      julia -e 'using Libdl; Libdl.find_library("libigc")'

mapreduce failure with CartesianIndex values

The following findfirst-like kernel fails when using CartesianIndices:

using oneAPI

function doit(xs)
    # works
    indices, dummy_index = eachindex(xs), 1

    # fails
    #indices, dummy_index = CartesianIndices(xs), CartesianIndex{ndims(xs)}()

    # given two pairs of (istrue, index), return the one with the smallest index
    function findfirst_reduction(t1, t2)
        (x, i), (y, j) = t1, t2
        if i > j
            t1, t2 = t2, t1
            (x, i), (y, j) = t1, t2
        end
        x && return t1
        y && return t2
        return (false, dummy_index)
    end

    res = mapreduce(tuple, findfirst_reduction, xs, indices;
                    init = (false, dummy_index))
    return res[1] ? res[2] : nothing
end

function main()
    x = [false false; true false]
    @show i = doit(oneArray(x))
    @assert i !== nothing
    @assert x[i]
end
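For comparison, the same reduction works on a plain CPU Array with CartesianIndices, which suggests the failure is specific to the GPU mapreduce implementation. A minimal CPU-only check (no oneAPI required; findfirst_cartesian is a name I made up for this sketch):

```julia
# CPU-only reference for the kernel above: find the first `true` using
# CartesianIndices, the variant that fails on the GPU.
function findfirst_cartesian(xs)
    indices = CartesianIndices(xs)
    dummy_index = first(indices)  # any placeholder index works on the CPU

    function reduction(t1, t2)
        (x, i), (y, j) = t1, t2
        if i > j
            t1, t2 = t2, t1
            (x, i), (y, j) = t1, t2
        end
        x && return t1
        y && return t2
        return (false, dummy_index)
    end

    res = mapreduce(tuple, reduction, xs, indices; init=(false, dummy_index))
    return res[1] ? res[2] : nothing
end

x = [false false; true false]
@assert findfirst_cartesian(x) == CartesianIndex(2, 1)
```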

liboneapi_support: double pointer support required

@maleadt The liboneapi_support library automation needs an enhancement to support the case explained below.

If the prototype of a function in onemkl.h is the following,

void onemklSgemmBatched(syclQueue_t device_queue, onemklTranspose transa, onemklTranspose transb, int64_t m, int64_t n, int64_t k, float alpha, const float **a, int64_t lda, const float **b, int64_t ldb, float beta, float **c, int64_t ldc, int64_t group_count);

then when I try to generate liboneapi_support.jl using wrap.jl, I get the following output:

function onemklSgemmBatched(device_queue, transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, group_count)
    @ccall liboneapi_support.onemklSgemmBatched(device_queue::syclQueue_t, transa::onemklTranspose, transb::onemklTranspose, m::Int64, n::Int64, k::Int64, alpha::Cfloat, a::Ptr{Ptr{Cfloat}}, lda::Int64, b::Ptr{Ptr{Cfloat}}, ldb::Int64, beta::Cfloat, c::Ptr{Ptr{Cfloat}}, ldc::Int64, group_count::Int64)::Cvoid
end

Here a::Ptr{Ptr{Cfloat}} is generated, where ideally I need ZePtr{Ptr{Cfloat}}.
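Until the generator handles this, one stopgap could be to post-process the generated wrapper and rewrite double-pointer argument annotations. A minimal sketch, assuming every ::Ptr{Ptr{T}} annotation in the file really is a device array (the regex and function name are mine, not part of wrap.jl):

```julia
# Rewrite Ptr{Ptr{T}} argument annotations to ZePtr{Ptr{T}} in generated
# wrapper source. This is a blunt textual pass: it rewrites every
# double-pointer annotation, so it assumes all of them are device arrays.
function rewrite_double_pointers(src::AbstractString)
    return replace(src, r"::Ptr\{Ptr\{(\w+)\}\}" => s"::ZePtr{Ptr{\1}}")
end

sig = "a::Ptr{Ptr{Cfloat}}, lda::Int64, b::Ptr{Ptr{Cfloat}}"
rewrite_double_pointers(sig)
# "a::ZePtr{Ptr{Cfloat}}, lda::Int64, b::ZePtr{Ptr{Cfloat}}"
```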

ERROR: InvalidIRError: compiling kernel broadcast_kernel; Reason unsupported dynamic function invocation

Reason: unsupported dynamic function invocation (call to emit_printf(::Val{fmt}, argspec...) where fmt in oneAPI at /home/chriselrod/.julia/packages/oneAPI/bEvNc/src/device/opencl/printf.jl:27)

julia> using oneAPI

julia> a = oneArray(rand(2,2))
2×2 oneArray{Float64, 2}:
 0.94417   0.947858
 0.463884  0.31073

julia> a .+ 1
WARNING: both LLVM and ExprTools export "parameters"; uses of it in module oneAPI must be qualified
ERROR: InvalidIRError: compiling kernel broadcast_kernel(oneAPI.oneKernelContext, oneDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{oneDeviceMatrix{Float64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Int64}}, Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to emit_printf(::Val{fmt}, argspec...) where fmt in oneAPI at /home/chriselrod/.julia/packages/oneAPI/bEvNc/src/device/opencl/printf.jl:27)
Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/oneAPI/bEvNc/src/device/opencl/printf.jl:129
  [2] _print
    @ ~/.julia/packages/oneAPI/bEvNc/src/device/opencl/printf.jl:91
  [3] macro expansion
    @ ~/.julia/packages/oneAPI/bEvNc/src/device/opencl/printf.jl:178
  [4] throw_boundserror
    @ ~/.julia/packages/oneAPI/bEvNc/src/device/quirks.jl:3
  [5] getindex
    @ range.jl:702
  [6] _broadcast_getindex_evalf
    @ broadcast.jl:648
  [7] _broadcast_getindex
    @ broadcast.jl:621
  [8] #19
    @ broadcast.jl:1098
  [9] ntuple
    @ ntuple.jl:49
 [10] copy
    @ broadcast.jl:1098
 [11] materialize
    @ broadcast.jl:883
 [12] getindex
    @ multidimensional.jl:353
 [13] _getindex
    @ abstractarray.jl:1209
 [14] getindex
    @ abstractarray.jl:1170
 [15] macro expansion
    @ ~/.julia/packages/GPUArrays/8dzSJ/src/device/indexing.jl:81
 [16] broadcast_kernel
    @ ~/.julia/packages/GPUArrays/8dzSJ/src/host/broadcast.jl:58
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.SPIRVCompilerTarget, oneAPI.oneAPICompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#16", Tuple{oneAPI.oneKernelContext, oneDeviceMatrix{Float64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{oneDeviceMatrix{Float64, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Int64}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/QjFdA/src/validation.jl:111
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/QjFdA/src/driver.jl:319 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/ZQ0rt/src/TimerOutput.jl:236 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/QjFdA/src/driver.jl:317 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/QjFdA/src/utils.jl:62
  [6] zefunction_compile(job::GPUCompiler.CompilerJob)
    @ oneAPI ~/.julia/packages/oneAPI/bEvNc/src/compiler/execution.jl:131
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(oneAPI.zefunction_compile), linker::typeof(oneAPI.zefunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/QjFdA/src/cache.jl:89
  [8] zefunction(f::Function, tt::Type; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ oneAPI ~/.julia/packages/oneAPI/bEvNc/src/compiler/execution.jl:122
  [9] macro expansion
    @ ~/.julia/packages/oneAPI/bEvNc/src/compiler/execution.jl:34 [inlined]
 [10] #gpu_call#77
    @ ~/.julia/packages/oneAPI/bEvNc/src/gpuarrays.jl:29 [inlined]
 [11] gpu_call(::GPUArrays.var"#broadcast_kernel#16", ::oneArray{Float64, 2}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Extruded{oneArray{Float64, 2}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Int64}}, ::Int64; target::oneArray{Float64, 2}, total_threads::Nothing, threads::Int64, blocks::Int64, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/device/execution.jl:67
 [12] copyto!
    @ ~/.julia/packages/GPUArrays/8dzSJ/src/host/broadcast.jl:65 [inlined]
 [13] copyto!
    @ ./broadcast.jl:936 [inlined]
 [14] copy
    @ ~/.julia/packages/GPUArrays/8dzSJ/src/host/broadcast.jl:47 [inlined]
 [15] materialize(bc::Base.Broadcast.Broadcasted{oneAPI.oneArrayStyle{2}, Nothing, typeof(+), Tuple{oneArray{Float64, 2}, Int64}})
    @ Base.Broadcast ./broadcast.jl:883
 [16] top-level scope
    @ REPL[3]:1

julia> versioninfo()
Julia Version 1.6.3-pre.1
Commit 7c45ff0e94* (2021-07-16 20:20 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, tigerlake)
Environment:
  JULIA_NUM_THREADS = 8

julia> using oneAPI.oneL0

julia> drv = first(drivers());

julia> dev = first(devices(drv))
ZeDevice(GPU, vendor 0x8086, device 0x9a49): Intel(R) Iris(R) Xe Graphics [0x9a49]

julia> compute_properties(dev)
(maxTotalGroupSize = 512, maxGroupSizeX = 512, maxGroupSizeY = 512, maxGroupSizeZ = 512, maxGroupCountX = 4294967295, maxGroupCountY = 4294967295, maxGroupCountZ = 4294967295, maxSharedLocalMemory = 65536, subGroupSizes = (8, 16, 32))

julia> ctx = ZeContext(drv);

julia> queue = ZeCommandQueue(ctx, dev);

julia> execute!(queue) do list
         append_barrier!(list)
       end

julia> function kernel()
         barrier()
         return
       end
kernel (generic function with 1 method)

julia> @oneapi items=1 kernel()

julia> @device_code_llvm @oneapi items=1 kernel()
; CompilerJob of kernel kernel() for GPUCompiler.SPIRVCompilerTarget
define spir_kernel void @_Z17julia_kernel_4443() local_unnamed_addr {
entry:
;  @ REPL[18]:2 within `kernel'
; ┌ @ /home/chriselrod/.julia/packages/oneAPI/bEvNc/src/device/opencl/synchronization.jl:9 within `barrier' @ /home/chriselrod/.julia/packages/oneAPI/bEvNc/src/device/opencl/synchronization.jl:9
; │┌ @ /home/chriselrod/.julia/packages/oneAPI/bEvNc/src/device/utils.jl:47 within `macro expansion'
    call void @_Z7barrierj(i32 0)
    ret void
; └└
}
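The compute properties queried above (e.g. maxTotalGroupSize = 512) are what you would use to derive a launch configuration for larger kernels. A small helper sketch, where the function name and heuristic are mine:

```julia
# Pick a (groups, items-per-group) pair for a 1D launch of `n` work items,
# capped by the device's maximal work-group size.
function launch_configuration(n::Integer, max_group_size::Integer=512)
    items = min(n, max_group_size)
    groups = cld(n, items)  # ceiling division so every item is covered
    return (groups=groups, items=items)
end

launch_configuration(1000)  # (groups = 2, items = 512)
launch_configuration(100)   # (groups = 1, items = 100)
```

The resulting pair would be passed as the groups/items keywords of @oneapi, with an in-kernel bounds check for the overshoot in the last group.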

Build support library on Yggdrasil

Initial recipe:

using BinaryBuilder, Pkg

name = "oneAPISupport"
version = v"2022.1.0"

generic_sources = [
    GitSource("https://github.com/JuliaGPU/oneAPI.jl", "0b33abf16ef3893a96d49d9d12b54f661752c7b3")
]

platform_sources = Dict(
    # https://conda.anaconda.org/intel/linux-64
    Platform("x86_64", "linux"; libc="glibc") => [
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/dpcpp-cpp-rt-2022.1.0-intel_3768.tar.bz2",
            "1472da83f109dbead10835e49d204035272b9727eb71863e5a64688e13e6bacf";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/dpcpp_impl_linux-64-2022.1.0-intel_3768.tar.bz2",
            "4ff8f0a0c482aa6ffeb419fe9d0d38a697d2db8d86e65ca499f47d5d68747436";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/dpcpp_linux-64-2022.1.0-intel_3768.tar.bz2",
            "96a13c1fb673bcb0b6b0ddb6c436312113292d7ea21a55395a7efa34e70af0b1";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/icc_rt-2022.1.0-intel_3768.tar.bz2",
            "b81f4838a930d08edec2aab4d3eebd89ce3b321ca602792bcc9433926836da07";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/intel-cmplr-lib-rt-2022.1.0-intel_3768.tar.bz2",
            "8c86ea88d46cb13b3b537203e15fc6e6ec2d803b7bd0bde8561d347b18ba426e";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/intel-cmplr-lic-rt-2022.1.0-intel_3768.tar.bz2",
            "fd3b6a0e75f06b1bf22b070a7b61b09d2a3e9d9e01a64b60b746b35f45681acb";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/intel-opencl-rt-2022.1.0-intel_3768.tar.bz2",
            "f4086002b4d5699dea78659777e412ef6c6ea2fa1d3984d135848f0b75144b81";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/intel-openmp-2022.1.0-intel_3768.tar.bz2",
            "498dc37ce1bd513f591b633565151c4de8f11a12914814f2bf85afebbd35ee23";
        ),

        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/libgcc-ng-9.3.0-hdf63c60_101.tar.bz2",
            "bd735039588da538ecb09ab5dc1819d1bd4a8dedc520b85d5ff1ea2d94c42603";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/libstdcxx-ng-9.3.0-hdf63c60_101.tar.bz2",
            "63d1298a60509ad37ea6eba6d01e950ba53398ce013323a15cf9ca1811404665";
        ),

        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/mkl-2022.1.0-intel_223.tar.bz2",
            "31c225ce08d3dc129f0881e5d36a1ef0ba8dc9fdc0e168397c2ac144d5f0bf54";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/mkl-devel-2022.1.0-intel_223.tar.bz2",
            "4e014e6ac31e8961f09c937b66f53d2c0d75f074f39abfa9f378f4659ed2ecbb";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/mkl-devel-dpcpp-2022.1.0-intel_223.tar.bz2",
            "25e38a5466245ce289c77a4bb1c38d26d3a4ec762b0207f6f03af361a3529322";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/mkl-dpcpp-2022.1.0-intel_223.tar.bz2",
            "79af3aa775168128054d8e2cb04717fea55b1779885d3472286106e1f24d0fc4";
        ),
        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/mkl-include-2022.1.0-intel_223.tar.bz2",
            "704e658a9b25a200f8035f3d0a8f2e094736496a2169f87609f1cfed2e2eb0a9";
        ),

        ArchiveSource(
            "https://conda.anaconda.org/intel/linux-64/tbb-2021.6.0-intel_835.tar.bz2",
            "ce47c1d22829cdd8c04b050acf2003082607e5a954bcbf486a7639045b411e5e";
        ),
    ]
)

script = raw"""
install_license "licensing/compiler/Intel Developer Tools EULA"

mkdir -p ${libdir} ${includedir}
mv lib/clang/*/include/CL ${includedir}
rm -rf lib/clang
cp -r lib/* ${libdir}
cp -r include/* ${includedir}

cd oneAPI.jl/deps
sed -i "/CMAKE_CXX_COMPILER/d" CMakeLists.txt

CMAKE_FLAGS=()
cmake -B build -S . -GNinja ${CMAKE_FLAGS[@]}

ninja -C build -j ${nproc} install
"""

# The products that we will ensure are always built
products = [
    LibraryProduct(["liboneapi_support"], :liboneapi_support),
]

# Dependencies that must be installed before this package can be built
dependencies = [
    BuildDependency("oneAPI_Level_Zero_Headers_jll")
]

non_reg_ARGS = filter(arg -> arg != "--register", ARGS)
include("../../fancy_toys.jl")
filter!(platform_sources) do (platform, sources)
    should_build_platform(triplet(platform))
end

for (idx, (platform, sources)) in enumerate(platform_sources)
    # Use "--register" only on the last invocation of build_tarballs
    if idx < length(platform_sources)
        args = non_reg_ARGS
    else
        args = ARGS
    end
    build_tarballs(args, name, version, [generic_sources; sources], script, [platform],
                   products, dependencies; preferred_gcc_version = v"8")
end

Note that we're using GCC here instead of DPCPP, which we can't execute in the cross environment. Also, we're downloading too many dependencies, but I'm just mimicking what Conda does here for starters.

This runs into https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/oneMKL-declaration-of-layout-changes-meaning-of-layout/m-p/1257515/highlight/true:

[2/4] Building CXX object CMakeFiles/oneapilib.dir/sycl.cpp.o
ninja: job failed: /opt/bin/x86_64-linux-gnu-libgfortran5-cxx11/c++ -Doneapilib_EXPORTS  -fPIC -std=gnu++17 -MD -MT CMakeFiles/oneapilib.dir/onemkl.cpp.o -MF CMakeFiles/oneapilib.dir/onemkl.cpp.o.d -o CMakeFiles/oneapilib.dir/onemkl.cpp.o -c /workspace/srcdir/oneAPI.jl/deps/onemkl.cpp
ninja: subcommand failed
In file included from /workspace/x86_64-linux-gnu-libgfortran5-cxx11/destdir/include/oneapi/mkl.hpp:32,
                 from /workspace/srcdir/oneAPI.jl/deps/onemkl.cpp:4:
/workspace/x86_64-linux-gnu-libgfortran5-cxx11/destdir/include/oneapi/mkl/stats.hpp:71:29: error: declaration of ‘constexpr const oneapi::mkl::stats::layout layout’ changes meaning of ‘layout’ [-fpermissive]
   71 |     static constexpr layout layout = ObservationsLayout;
      |                             ^~~~~~
/workspace/x86_64-linux-gnu-libgfortran5-cxx11/destdir/include/oneapi/mkl/stats.hpp:36:12: note: ‘layout’ declared here as ‘enum class oneapi::mkl::stats::layout’
   36 | enum class layout : std::int64_t {
      |            ^~~~~~
/workspace/x86_64-linux-gnu-libgfortran5-cxx11/destdir/include/oneapi/mkl/stats.hpp:102:29: error: declaration of ‘constexpr const oneapi::mkl::stats::layout layout’ changes meaning of ‘layout’ [-fpermissive]
  102 |     static constexpr layout layout = ObservationsLayout;
      |                             ^~~~~~
/workspace/x86_64-linux-gnu-libgfortran5-cxx11/destdir/include/oneapi/mkl/stats.hpp:36:12: note: ‘layout’ declared here as ‘enum class oneapi::mkl::stats::layout’
   36 | enum class layout : std::int64_t {
      |            ^~~~~~
/workspace/srcdir/oneAPI.jl/deps/onemkl.cpp: In function ‘oneapi::mkl::transpose convert(onemklTranspose)’:
/workspace/srcdir/oneAPI.jl/deps/onemkl.cpp:19:1: warning: control reaches end of non-void function [-Wreturn-type]
   19 | }
      | ^
Previous command exited with 1

@pengtu You mentioned we could swap DPCPP for GCC; any thoughts? This fails on GCC 8-12 (where 8 is the minimum we require for C++17 support).

Package test fails on Intel UHD 620 with `nested task error: rmprocs: pids [2] not terminated after 30 seconds.`

(jl_jUUbkI) pkg> test oneAPI
     Testing oneAPI
      Status `/tmp/jl_yQPuYN/Project.toml`
  [79e6a3ab] Adapt v3.3.3
  [7a1cc6ca] FFTW v1.4.6
  [0c68f7d7] GPUArrays v8.3.2
  [90137ffa] StaticArrays v1.4.4
  [8f75cd03] oneAPI v0.2.3
  [ade2ca70] Dates `@stdlib/Dates`
  [8ba89e20] Distributed `@stdlib/Distributed`
  [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra`
  [de0858da] Printf `@stdlib/Printf`
  [3fa0cd96] REPL `@stdlib/REPL`
  [9a3f8284] Random `@stdlib/Random`
  [10745b16] Statistics `@stdlib/Statistics`
  [8dfed614] Test `@stdlib/Test`
      Status `/tmp/jl_yQPuYN/Manifest.toml`
  [621f4979] AbstractFFTs v1.1.0
  [79e6a3ab] Adapt v3.3.3
  [fa961155] CEnum v0.4.2
  [d360d2e6] ChainRulesCore v1.14.0
  [9e997f8a] ChangesOfVariables v0.1.3
  [34da2185] Compat v3.43.0
  [ffbed154] DocStringExtensions v0.8.6
  [e2ba6199] ExprTools v0.1.8
  [7a1cc6ca] FFTW v1.4.6
  [0c68f7d7] GPUArrays v8.3.2
  [61eb1bfa] GPUCompiler v0.14.1
  [3587e190] InverseFunctions v0.1.4
  [92d709cd] IrrationalConstants v0.1.1
  [692b3bcd] JLLWrappers v1.4.1
  [929cbde3] LLVM v4.11.1
  [2ab3a3ac] LogExpFunctions v0.3.15
  [21216c6a] Preferences v1.3.0
  [189a3867] Reexport v1.2.2
  [276daf66] SpecialFunctions v2.1.5
  [90137ffa] StaticArrays v1.4.4
  [a759f4b9] TimerOutputs v0.5.19
  [8f75cd03] oneAPI v0.2.3
  [f5851436] FFTW_jll v3.3.10+0
  [1d5cc7b8] IntelOpenMP_jll v2018.0.3+2
  [dad2f222] LLVMExtra_jll v0.0.16+0
  [856f044c] MKL_jll v2022.0.0+0
  [700fe977] NEO_jll v22.17.23034+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [85f0d8ed] SPIRV_LLVM_Translator_unified_jll v0.1.0+0
  [6ac6d60f] SPIRV_Tools_jll v2022.1.0+0
  [09858cae] gmmlib_jll v22.1.2+0
  [94295238] libigc_jll v1.0.11061+0
  [f4bc562b] oneAPI_Level_Zero_Headers_jll v1.3.7+1
  [13eca655] oneAPI_Level_Zero_Loader_jll v1.7.15+0
  [0dad84c5] ArgTools `@stdlib/ArgTools`
  [56f22d72] Artifacts `@stdlib/Artifacts`
  [2a0f44e3] Base64 `@stdlib/Base64`
  [ade2ca70] Dates `@stdlib/Dates`
  [8bb1440f] DelimitedFiles `@stdlib/DelimitedFiles`
  [8ba89e20] Distributed `@stdlib/Distributed`
  [f43a241f] Downloads `@stdlib/Downloads`
  [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils`
  [4af54fe1] LazyArtifacts `@stdlib/LazyArtifacts`
  [b27032c2] LibCURL `@stdlib/LibCURL`
  [76f85450] LibGit2 `@stdlib/LibGit2`
  [8f399da3] Libdl `@stdlib/Libdl`
  [37e2e46d] LinearAlgebra `@stdlib/LinearAlgebra`
  [56ddb016] Logging `@stdlib/Logging`
  [d6f4376e] Markdown `@stdlib/Markdown`
  [a63ad114] Mmap `@stdlib/Mmap`
  [ca575930] NetworkOptions `@stdlib/NetworkOptions`
  [44cfe95a] Pkg `@stdlib/Pkg`
  [de0858da] Printf `@stdlib/Printf`
  [3fa0cd96] REPL `@stdlib/REPL`
  [9a3f8284] Random `@stdlib/Random`
  [ea8e919c] SHA `@stdlib/SHA`
  [9e88b42a] Serialization `@stdlib/Serialization`
  [1a1011a3] SharedArrays `@stdlib/SharedArrays`
  [6462fe0b] Sockets `@stdlib/Sockets`
  [2f01184e] SparseArrays `@stdlib/SparseArrays`
  [10745b16] Statistics `@stdlib/Statistics`
  [fa267f1f] TOML `@stdlib/TOML`
  [a4e569a6] Tar `@stdlib/Tar`
  [8dfed614] Test `@stdlib/Test`
  [cf7118a7] UUIDs `@stdlib/UUIDs`
  [4ec0a83e] Unicode `@stdlib/Unicode`
  [e66e0078] CompilerSupportLibraries_jll `@stdlib/CompilerSupportLibraries_jll`
  [deac9b47] LibCURL_jll `@stdlib/LibCURL_jll`
  [29816b5a] LibSSH2_jll `@stdlib/LibSSH2_jll`
  [c8ffd9c3] MbedTLS_jll `@stdlib/MbedTLS_jll`
  [14a3606d] MozillaCACerts_jll `@stdlib/MozillaCACerts_jll`
  [4536629a] OpenBLAS_jll `@stdlib/OpenBLAS_jll`
  [05823500] OpenLibm_jll `@stdlib/OpenLibm_jll`
  [83775a58] Zlib_jll `@stdlib/Zlib_jll`
  [8e850b90] libblastrampoline_jll `@stdlib/libblastrampoline_jll`
  [8e850ede] nghttp2_jll `@stdlib/nghttp2_jll`
  [3f19e933] p7zip_jll `@stdlib/p7zip_jll`
Precompiling project...
  6 dependencies successfully precompiled in 18 seconds (35 already precompiled)
     Testing Running tests...
                                                  |          | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
array                                         (2) |    11.55 |   0.22 |  1.9 |     276.23 |   686.46 |
examples                                      (2) |    92.64 |   0.00 |  0.0 |      10.64 |   686.46 |
      From worker 2:	WARNING: Method definition #2492#kernel(Any) in module Main at /home/joto/.julia/packages/oneAPI/oAWyB/test/execution.jl:293 overwritten at /home/joto/.julia/packages/oneAPI/oAWyB/test/execution.jl:301.
execution                                     (2) |   234.14 |   8.95 |  3.8 |    7790.15 |   686.46 |
level-zero                                    (2) |    15.28 |   0.24 |  1.6 |     323.19 |   686.46 |
pointer                                       (2) |     1.06 |   0.00 |  0.0 |      12.30 |   686.46 |
device/intrinsics                             (2) |   353.08 |  18.26 |  5.2 |   12926.37 |   789.96 |
gpuarrays/indexing scalar                     (2) |   105.96 |   4.67 |  4.4 |    3481.50 |   837.07 |
gpuarrays/reductions/reducedim!               (2) |   844.81 |  37.31 |  4.4 |   26736.36 |  1256.86 |
gpuarrays/linalg                              (2) |   351.42 |  13.48 |  3.8 |   10783.73 |  1633.63 |
gpuarrays/math/power                          (2) |   201.09 |   9.42 |  4.7 |    6106.48 |  1636.37 |
gpuarrays/linalg/mul!/vector-matrix           (2) |   504.15 |  19.22 |  3.8 |   13991.48 |  1906.59 |
gpuarrays/indexing multidimensional           (2) |   253.78 |  10.33 |  4.1 |    7861.00 |  2093.59 |
gpuarrays/interface                           (2) |    29.79 |   0.99 |  3.3 |     751.43 |  2093.59 |
gpuarrays/reductions/any all count            (2) |   119.71 |   5.04 |  4.2 |    3615.71 |  2093.59 |
gpuarrays/reductions/minimum maximum extrema  (2) |  1354.93 |  51.36 |  3.8 |   37398.33 |  2643.75 |
gpuarrays/uniformscaling                      (2) |    54.28 |   1.64 |  3.0 |    1352.44 |  2643.75 |
gpuarrays/linalg/mul!/matrix-matrix           (2) |   973.37 |  32.60 |  3.3 |   26161.34 |  2901.64 |
gpuarrays/math/intrinsics                     (2) |    22.14 |   0.80 |  3.6 |     702.95 |  2901.64 |
gpuarrays/linalg/norm                         (2) |  1764.69 |  45.22 |  2.6 |   33672.32 |  3273.92 |
gpuarrays/statistics                          (2) |   715.68 |  24.09 |  3.4 |   17050.71 |  3535.25 |
gpuarrays/reductions/mapreduce                (2) |  2591.53 |  77.95 |  3.0 |   58664.09 |  4458.85 |
gpuarrays/constructors                        (2) |    92.73 |   2.74 |  3.0 |    2435.66 |  4458.85 |
gpuarrays/random                              (2) |   177.37 |   6.10 |  3.4 |    5204.63 |  4458.85 |
gpuarrays/base                                (2) |   178.65 |   6.21 |  3.5 |    5475.41 |  4521.52 |
gpuarrays/broadcasting                        (2) |  2475.30 | 161.76 |  6.5 |   55411.07 |  5216.58 |
gpuarrays/reductions/mapreducedim!            (2) |   686.13 |  46.13 |  6.7 |   13165.50 |  6216.85 |
gpuarrays/reductions/reduce                   (2) |   180.45 |   1.64 |  0.9 |    2234.24 |  6216.85 |
gpuarrays/reductions/sum prod                 (2) |  2870.19 | 169.21 |  5.9 |   43675.72 |  6856.17 |
ERROR: LoadError: TaskFailedException

    nested task error: rmprocs: pids [2] not terminated after 30 seconds.
    Stacktrace:
     [1] _rmprocs(pids::Vector{Int64}, waitfor::Int64)
       @ Distributed /usr/share/julia/stdlib/v1.7/Distributed/src/cluster.jl:1065
     [2] rmprocs(pids::Int64; waitfor::Int64)
       @ Distributed /usr/share/julia/stdlib/v1.7/Distributed/src/cluster.jl:1033
     [3] (::var"#recycle_worker#37")(p::Int64)
       @ Main ~/.julia/packages/oneAPI/oAWyB/test/runtests.jl:219
     [4] macro expansion
       @ ~/.julia/packages/oneAPI/oAWyB/test/runtests.jl:271 [inlined]
     [5] (::var"#32#38"{Dict{String, DateTime}, Task, var"#recycle_worker#37"})()
       @ Main ./task.jl:423
Stacktrace:
 [1] macro expansion
   @ task.jl:400 [inlined]
 [2] top-level scope
   @ ~/.julia/packages/oneAPI/oAWyB/test/runtests.jl:217
 [3] include(fname::String)
   @ Base.MainInclude ./client.jl:451
 [4] top-level scope
   @ none:6
in expression starting at /home/joto/.julia/packages/oneAPI/oAWyB/test/runtests.jl:187
ERROR: Package oneAPI errored during testing

(jl_jUUbkI) pkg> st
      Status `/tmp/jl_jUUbkI/Project.toml`
  [8f75cd03] oneAPI v0.2.3

I understand that old mobile GPUs are not a priority, but I thought I'd let you know. The laptop (i5-8250U) had adequate cooling.

Strange error `InvalidBitWidth: Invalid bit width in input: 96`

Julia Version 1.6.0-beta1
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz

[8f75cd03] oneAPI v0.1.0 https://github.com/JuliaGPU/oneAPI.jl.git#master

MWE:

using StaticArrays
using oneAPI

function f(a)
    x = SVector(0f0, 0f0) # does not fail with tuple instead of SVector
    v = MVector{3, Float32}(undef); # does not fail with MVector{2, Float32}
                                    # for MVector{4, Float32} gives "Invalid bit width in input: 128"
    for (i,_) in enumerate(x)
        v[i] = 1.0f0  ## does not fail with 0.0f0
    end
    a[1] = v[1]
    return nothing
end
@oneapi f(oneArray(zeros(1)))
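The widths in these errors (96 and 128 bits) are just 3 or 4 packed Float32 fields, which the SPIR-V translator rejects when they appear as odd-width integer types. A possible workaround, pending a proper fix, is to build the result as a plain tuple via ntuple instead of a MVector; a CPU-runnable sketch of the same computation (untested on device):

```julia
# Same computation as `f` above, but building the 3-element vector
# functionally with ntuple instead of mutating an MVector, so no
# odd-width stack allocation is needed.
function g(a)
    x = (0f0, 0f0)
    # positions covered by `x` get 1.0f0, the rest keep a default
    v = ntuple(i -> i <= length(x) ? 1.0f0 : 0.0f0, 3)
    a[1] = v[1]
    return nothing
end

a = zeros(Float32, 1)
g(a)
# a[1] == 1.0f0
```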

Invalid bit width in input

Reported by @michel2323:

InvalidBitWidth: Invalid bit width in input: 2
dspcg: Error During Test at /home/mschanen/git/ExaTronKernels.jl/test/oneAPI.jl:938
  Got exception outside of a @test
  Failed to translate LLVM code to SPIR-V.
  If you think this is a bug, please file an issue and attach /tmp/jl_dLIFvn.bc.
  Stacktrace:
    [1] error(s::String)

jl_QOvqhg.bc.zip

oneMKL tests are failing.

Following are snapshots of the level-1, -2, and -3 tests that are failing under tests/onemkl.jl after commit 63fa2c4 on master. @maleadt, please let me know if there is a change in the way we test them. I am investigating further on my end as well.

Level-1: (screenshot: level1_failures)

Level-2: (screenshot: level2_failures)

Level-3: (screenshot: level3_failures)

Pkg Installation problem: Unsatisfiable requirements

I'm trying to add the oneAPI package as per the instructions and I get:

(@v1.5) pkg> add oneAPI
  Resolving package versions...
ERROR: Unsatisfiable requirements detected for package GPUCompiler [61eb1bfa]:
 GPUCompiler [61eb1bfa] log:
 ├─possible versions are: [0.1.0, 0.2.0, 0.3.0, 0.4.0-0.4.1, 0.5.0-0.5.5, 0.6.0-0.6.1, 0.7.0-0.7.3, 0.8.0-0.8.3] or uninstalled
 ├─restricted by compatibility requirements with CUDA [052768ef] to versions: [0.3.0, 0.4.0-0.4.1, 0.5.0-0.5.5, 0.6.0-0.6.1, 0.7.0-0.7.3, 0.8.1-0.8.3]
 │ └─CUDA [052768ef] log:
 │   ├─possible versions are: [0.1.0, 1.0.0-1.0.2, 1.1.0, 1.2.0-1.2.1, 1.3.0-1.3.3, 2.0.0-2.0.2, 2.1.0, 2.2.0-2.2.1, 2.3.0] or uninstalled
 │   ├─restricted to versions * by an explicit requirement, leaving only versions [0.1.0, 1.0.0-1.0.2, 1.1.0, 1.2.0-1.2.1, 1.3.0-1.3.3, 2.0.0-2.0.2, 2.1.0, 2.2.0-2.2.1, 2.3.0]
 │   └─restricted by julia compatibility requirements to versions: [1.0.0-1.0.2, 1.1.0, 1.2.0-1.2.1, 1.3.0-1.3.3, 2.0.0-2.0.2, 2.1.0, 2.2.0-2.2.1, 2.3.0] or uninstalled, leaving only versions: [1.0.0-1.0.2, 1.1.0, 1.2.0-1.2.1, 1.3.0-1.3.3, 2.0.0-2.0.2, 2.1.0, 2.2.0-2.2.1, 2.3.0]
 ├─restricted by compatibility requirements with Enzyme [7da242da] to versions: [0.4.0-0.4.1, 0.7.0-0.7.3]
 │ └─Enzyme [7da242da] log:
 │   ├─possible versions are: [0.1.0, 0.2.0-0.2.1] or uninstalled
 │   └─restricted to versions * by an explicit requirement, leaving only versions [0.1.0, 0.2.0-0.2.1]
 └─restricted by compatibility requirements with oneAPI [8f75cd03] to versions: 0.8.2-0.8.3 — no versions left
   └─oneAPI [8f75cd03] log:
     ├─possible versions are: 0.1.0 or uninstalled
     └─restricted to versions * by an explicit requirement, leaving only versions 0.1.0

Running Julia 1.5.3

I'm thinking this is just incompatible with CUDA, is that correct?

Windows support

Trying

using oneAPI

Get the error:

ERROR: InitError: UndefVarError: libze_loader not defined
Stacktrace:
 [1] unsafe_zeInit
   @ C:\Users\alexeyc\.julia\packages\oneAPI\bEvNc\lib\level-zero\libze.jl:895 [inlined]
 [2] __init__()
   @ oneAPI.oneL0 C:\Users\alexeyc\.julia\packages\oneAPI\bEvNc\lib\level-zero\oneL0.jl:93
 [3] _include_from_serialized(path::String, depmods::Vector{Any})
   @ Base .\loading.jl:696
 [4] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
   @ Base .\loading.jl:782
 [5] _require(pkg::Base.PkgId)
   @ Base .\loading.jl:1020
 [6] require(uuidkey::Base.PkgId)
   @ Base .\loading.jl:936
 [7] require(into::Module, mod::Symbol)
   @ Base .\loading.jl:923
during initialization of module oneL0

Windows 10, Julia 1.6.2 (2021-07-14)

Conversions from float to integers fail

For example, this:

Int32.(ceil.(oneAPI.oneArray([1.2f0])))

fails with Reason: unsupported call to an unknown function (call to gpu_malloc)

As @maleadt said, the reason is:

we don't have a device-side malloc right now, which breaks exceptions that box their arguments
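Until device-side malloc lands, a possible workaround is to avoid the throwing conversion entirely: unsafe_trunc does not raise InexactError, so no exception argument needs to be boxed. A CPU-runnable sketch (whether a given kernel then compiles still depends on the rest of its code; the helper name is mine):

```julia
# Int32.(ceil.(xs)) can throw InexactError, whose construction boxes its
# argument; unsafe_trunc skips the check, so the exception path disappears.
safe_to_int32(x::Float32) = unsafe_trunc(Int32, ceil(x))

safe_to_int32(1.2f0)  # Int32(2)
```

On the device this would be broadcast, e.g. unsafe_trunc.(Int32, ceil.(xs)), at the cost of undefined results for out-of-range inputs.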

ERROR: /tmp/jl_p05O0j-download.gz Can not open the file as archive

add oneAPI caused the following error:

ERROR: /tmp/jl_p05O0j-download.gz
Can not open the file as archive

so running the tests now ends with:

ERROR: LoadError: InitError: could not load symbol "LLVMExtraInitializeAllTargets":
/home/slaus/julia-1.7.1/bin/julia: undefined symbol: LLVMExtraInitializeAllTargets
Stacktrace:
  [1] LLVMInitializeAllTargets
    @ ~/.julia/packages/LLVM/srSVa/lib/libLLVM_extra.jl:10 [inlined]
  [2] InitializeAllTargets
    @ ~/.julia/packages/LLVM/srSVa/src/init.jl:58 [inlined]
  [3] init()
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/GPUCompiler.jl:50
  [4] _include_from_serialized(path::String, depmods::Vector{Any})
    @ Base ./loading.jl:768
  [5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
    @ Base ./loading.jl:854
  [6] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1097
  [7] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1013
  [8] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:997
  [9] include
    @ ./Base.jl:418 [inlined]
 [10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::String)
    @ Base ./loading.jl:1318
 [11] top-level scope
    @ none:1
 [12] eval
    @ ./boot.jl:373 [inlined]
 [13] eval(x::Expr)
    @ Base.MainInclude ./client.jl:453
 [14] top-level scope
    @ none:1
during initialization of module GPUCompiler
in expression starting at /home/slaus/.julia/packages/oneAPI/zydrg/src/oneAPI.jl:1

What could be the reason?
I'm on Ubuntu 20.04, Julia 1.7.1, OpenCL 2.0.
Thanks in advance.
