JuliaSIMD / VectorizationBase.jl
Base library providing vectorization tools (i.e., SIMD) that other libraries are built on top of.
License: MIT License
As @chriselrod asked in #21, here's the output of
using Libdl
llvmlib = VERSION ≥ v"1.6.0-DEV.1429" ? Libdl.dlopen(Base.libllvm_path()) : Libdl.dlopen(only(filter(lib->occursin(r"LLVM\b", basename(lib)), Libdl.dllist())));
gethostcpufeatures = Libdl.dlsym(llvmlib, :LLVMGetHostCPUFeatures);
features_cstring = ccall(gethostcpufeatures, Cstring, ());
features = filter(ext -> (m = match(r"\d", ext); isnothing(m) ? true : m.offset != 2 ), split(unsafe_string(features_cstring), ','));
println(features)
on Julia 1.5.3 on a MacBook Pro with the Apple M1 (it should be the same on the MacBook Air, since the M1 is the same chip as far as I know):
SubString{String}["+sse2", "+cx16", "+sahf", "-tbm", "-avx512ifma", "-sha", "-gfni", "-fma4", "-vpclmulqdq", "-prfchw", "-bmi2", "-cldemote", "-fsgsbase", "-ptwrite", "-xsavec", "+popcnt", "-mpx", "+aes", "-avx512bitalg", "-movdiri", "-xsaves", "-avx512er", "-avx512vnni", "-avx512vpopcntdq", "-pconfig", "-clwb", "-avx512f", "-clzero", "-pku", "+mmx", "-lwp", "-rdpid", "-xop", "-rdseed", "-waitpkg", "-movdir64b", "-sse4a", "-avx512bw", "-clflushopt", "-xsave", "-avx512vbmi2", "-avx512vl", "-invpcid", "-avx512cd", "-avx", "-vaes", "+cx8", "-fma", "-rtm", "-bmi", "-enqcmd", "-rdrnd", "-mwaitx", "+sse4.1", "+sse4.2", "-avx2", "+fxsr", "-wbnoinvd", "+sse", "-lzcnt", "+pclmul", "-prefetchwt1", "-f16c", "+ssse3", "-sgx", "-shstk", "+cmov", "-avx512vbmi", "-avx512bf16", "-movbe", "-xsaveopt", "-avx512dq", "-adx", "-avx512pf", "+sse3"]
To be honest, I am not sure what to do with this, but I was asked to provide it. I can also run it on Julia nightly (1.6-DEV) if that helps.
I get the following error when using v0.21.18 on a 32 bit machine.
LoadError: MethodError: no method matching vload_transpose_quote(::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Bool, ::Int32, ::Int32, ::UInt64, ::Bool)
Closest candidates are:
vload_transpose_quote(::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Bool, ::Int32, ::Int32, ::UInt32, ::Bool) at /home/runner/.julia/packages/VectorizationBase/Q2q04/src/vecunroll/memory.jl:204
Running the same code on 64 bit causes no error.
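For context, here is a minimal sketch (hypothetical, not VectorizationBase's actual code) of why this class of error only bites on 32-bit builds: `UInt` is platform-dependent, so a signature written against one fixed width will not match values computed as `UInt` on the other platform.

```julia
# Hypothetical sketch: `UInt` aliases UInt32 on 32-bit Julia and UInt64 on
# 64-bit Julia, so a method that hard-codes one width fails to dispatch
# when given a platform-native `UInt` on the other word size.
f(mask::UInt64) = count_ones(mask)

mask = UInt(0xff)  # UInt32 on 32-bit builds, UInt64 on 64-bit builds
f(mask)            # MethodError on 32-bit; works on 64-bit
```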
I'm getting the following error when mixing dual numbers & `Vec`s:
julia> using LoopVectorization, ForwardDiff
julia> function grad!(𝛥x, 𝛥ℓ::AbstractArray{𝒯}, ℓ, x, 𝒶𝓍i=1:3, 𝒶𝓍j=1:3) where 𝒯
           begin
               𝜕x′ = ForwardDiff.Dual((zero)(𝒯), ((one)(𝒯), (zero)(𝒯)))
               𝜕x = ForwardDiff.Dual((zero)(𝒯), ((zero)(𝒯), (one)(𝒯)))
           end
           LoopVectorization.@avx for j in 𝒶𝓍j
               for i in 𝒶𝓍i
                   ℓ = 1 * (x[i] + 𝜕x) + 1000 * (x[j] + 𝜕x′)
                   𝛥x[j] = 𝛥x[j] + ForwardDiff.partials(ℓ, 1) * 𝛥ℓ[i, j]
                   𝛥x[i] = 𝛥x[i] + ForwardDiff.partials(ℓ, 2) * 𝛥ℓ[i, j]
               end
           end
           𝛥x
       end
grad! (generic function with 3 methods)
julia> grad!(zeros(3), ones(3,3), rand(3,3), [1,2,3.0])
ERROR: StackOverflowError:
Stacktrace:
[1] vfma_fast(a::ForwardDiff.Dual{Nothing, VectorizationBase.Vec{4, Float64}, 2}, b::ForwardDiff.Dual{Nothing, VectorizationBase.Vec{4, Float64}, 2}, c::ForwardDiff.Dual{Nothing, VectorizationBase.Vec{4, Float64}, 2})
@ VectorizationBase ~/.julia/packages/VectorizationBase/6EQU1/src/base_defs.jl:216
(@v1.7) pkg> st VectorizationBase
Status `~/.julia/environments/v1.7/Project.toml`
[3d5dd08c] VectorizationBase v0.15.5
This also happens with 0.15.1 but not with v0.14.12. Happens on Julia 1.5 too.
Found in CI here:
https://github.com/mcabbott/Tullio.jl/pull/57/checks?check_run_id=1743880859
For a program using LoopVectorization, in VS Code with Julia extension 1.4.3, I get the following when I start debug mode. It references line 340 of stridedpointers.jl (@generated function $f(p1::Core.LLVMPtr{T,0}, p2::Core.LLVMPtr{T,0}) where {T}).
Exception has occurred: ErrorException
this intrinsic must be compiled to be called
Stacktrace:
[1] vle(p1::Core.LLVMPtr{Float64, 0}, p2::Core.LLVMPtr{Float64, 0})
@ VectorizationBase C:\Users\drood\.julia\packages\VectorizationBase\OEl8L\src\strided_pointers\stridedpointers.jl:340
[2] vle(p1::Ptr{Float32}, p2::Ptr{Float32}, sp::VectorizationBase.OffsetPrecalc{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}, LayoutPointers.StridedPointer{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}}, Tuple{Nothing, Tuple{Int64, Int64, Int64}}})
@ VectorizationBase C:\Users\drood\.julia\packages\VectorizationBase\OEl8L\src\strided_pointers\stridedpointers.jl:346
[3] _turbo_!(#unused#::Val{(false, 0, 0, 0, false, 8, 32, 15, 64, 32768, 262144, 16777216, 0x0000000000000001)}, #unused#::Val{(:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000013, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0001, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000032, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0002, 0x02), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0003, 0x03), :numericconstant, Symbol("###reduction##zero###17###"), LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.constant, 0x0004, 0x00), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x00000000000000000000000000000132, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000100020004, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0004, 0x00), :LoopVectorization, :reduced_add, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000050003, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0003, 0x00), :LoopVectorization, :setindex!, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 
0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000006, 0x00000000000000000000000000000000, LoopVectorization.memstore, 0x0005, 0x03))}, #unused#::Val{(LoopVectorization.ArrayRefStruct{:B, Symbol("##vptr##_B")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000103, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:C, Symbol("##vptr##_C")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000302, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:A, Symbol("##vptr##_A")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000102, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101))}, #unused#::Val{(0, (), (), (), (), ((4, LoopVectorization.IntOrFloat),), ())}, #unused#::Val{(:i, :k, :j)}, #unused#::Val{Tuple{Tuple{CloseOpenIntervals.CloseOpen{Int64, Int64}, CloseOpenIntervals.CloseOpen{Int64, Int64}, CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}}, Tuple{LayoutPointers.GroupedStridedPointers{Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}}, (1, 1, 1), (0, 0, 0), ((1, 2), (1, 2), (1, 2)), ((1, 2), (3, 4), (5, 6)), Tuple{Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64}, NTuple{6, Static.StaticInt{0}}}}}}, var#arguments#::Tuple{Int64, Int64, Int64, Int64, Int64, Ptr{Float32}, Ptr{Float32}, Ptr{Float32}, Int64, Int64, Int64})
@ LoopVectorization C:\Users\drood\.julia\packages\LoopVectorization\kVenK\src\reconstruct_loopset.jl:713
[4] _turbo_!(#unused#::Val{(false, 0, 0, 0, false, 8, 32, 15, 64, 32768, 262144, 16777216, 0x0000000000000007)}, #unused#::Val{(:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000013, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0001, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000032, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0002, 0x02), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0003, 0x03), :numericconstant, Symbol("###reduction##zero###17###"), LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.constant, 0x0004, 0x00), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x00000000000000000000000000000132, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000100020004, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0004, 0x00), :LoopVectorization, :reduced_add, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000050003, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0003, 0x00), :LoopVectorization, :setindex!, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 
0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000006, 0x00000000000000000000000000000000, LoopVectorization.memstore, 0x0005, 0x03))}, #unused#::Val{(LoopVectorization.ArrayRefStruct{:B, Symbol("##vptr##_B")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000103, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:C, Symbol("##vptr##_C")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000302, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:A, Symbol("##vptr##_A")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000102, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101))}, #unused#::Val{(0, (), (), (), (), ((4, LoopVectorization.IntOrFloat),), ())}, #unused#::Val{(:i, :k, :j)}, #unused#::Val{Tuple{Tuple{CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}, CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}, CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}}, Tuple{LayoutPointers.GroupedStridedPointers{Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}}, (1, 1, 1), (0, 0, 0), ((1, 2), (1, 2), (1, 2)), ((1, 2), (3, 4), (5, 6)), Tuple{Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64}, NTuple{6, Static.StaticInt{0}}}}}}, var#arguments#::Tuple{Int64, Int64, Int64, Ptr{Float32}, Ptr{Float32}, Ptr{Float32}, Int64, Int64, Int64})
@ LoopVectorization C:\Users\drood\.julia\packages\LoopVectorization\kVenK\src\codegen\lower_threads.jl:652
[stack trace continues, referencing my code]
Results from typing status in the package manager:
[31c24e10] Distributions v0.25.32
[bdcacae8] LoopVectorization v0.12.99
[a2af1166] SortingAlgorithms v1.0.1
[37e2e46d] LinearAlgebra
[de0858da] Printf
[9a3f8284] Random
Currently, it uses a generic build script.
This script assumes:
const REGISTER_SIZE = 16
const REGISTER_COUNT = 16
const CACHELINE_SIZE = 64
const SIMD_NATIVE_INTEGERS = true
If any of these are violated, dependent libraries (e.g., LoopVectorization) are likely to produce suboptimal code. If these numbers undershoot, that would just mean some performance is left on the table, but it's likely to perform reasonably well.
If these numbers overshoot, performance consequences could be dire. Register spills galore.
I believe some ARM CPUs do not have SIMD Float64, so perhaps this should be handled somehow.
Ideally, we'd use a library like CpuId.jl to query hardware info, like we do for AMD and Intel.
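As a sketch of that direction (assuming CpuId.jl's documented query functions, which only work on x86), the hard-coded constants could be replaced by runtime queries:

```julia
# Sketch, assuming CpuId.jl's query API (x86 only). On supported CPUs these
# calls could replace the hard-coded build-script constants.
using CpuId

register_size  = simdbytes()      # SIMD register width in bytes, e.g. 32 with AVX2
cacheline_size = cachelinesize()  # typically 64 on x86
```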
I tracked down performance regressions observed in trixi-framework/Trixi.jl#509 to the following problem. On AVX2 (and non-AVX512) systems, the commit e2b6ccb "Fast integer ops shouldn't wrap" in VectorizationBase resulted in significant performance regressions when loading data from a PtrArray via getindex.
Using Julia v1.6.1 on an Intel i7 8700K yields the following results for a minimal working example. I'm using StrideArrays v0.1.6 with StrideArraysCore v0.1.5 and LoopVectorization v0.12.12 with the diff
~/.julia/dev/StrideArrays$ git diff
diff --git a/Project.toml b/Project.toml
index 206a58f..bb0e2b0 100644
--- a/Project.toml
+++ b/Project.toml
@@ -18,13 +18,11 @@ VectorizedRNG = "33b4df10-0173-11e9-2a0c-851a7edac40e"
[compat]
ArrayInterface = "3"
-LoopVectorization = "0.12.13"
Octavian = "0.2.3"
SLEEFPirates = "0.6.13"
Static = "0.2.4"
StrideArraysCore = "0.1.3"
ThreadingUtilities = "0.4"
-VectorizationBase = "0.19.32"
VectorizedRNG = "0.2.8"
julia = "1.5"
in StrideArrays to be able to check different versions of VectorizationBase.
~/.julia/dev/VectorizationBase$ git checkout cb8185d6a1b4b4ad84f0c539af7b4b39d5b7bb59
Previous HEAD position was e2b6ccb Fast integer ops shouldn't wrap
HEAD is now at cb8185d non small pow2 non-AVX512 mask fixes
(StrideArrays) pkg> up
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General`
No Changes to `~/.julia/dev/StrideArrays/Project.toml`
No Changes to `~/.julia/dev/StrideArrays/Manifest.toml`
Precompiling project...
9 dependencies successfully precompiled in 18 seconds (15 already precompiled)
julia> using StrideArrays, BenchmarkTools
julia> function foo(a::AbstractArray{T,N}) where {T,N}
idx = ntuple(_ -> 1, Val(N))
@inbounds res = a[idx...]
res
end
foo (generic function with 1 method)
julia> for N in 1:5
a_array = randn(ntuple(_ -> 4, N)...)
a_stride = StrideArray(a_array)
a_ptr = PtrArray(pointer(a_array), size(a_array))
a_ptr_static = PtrArray(pointer(a_array), map(StaticInt, size(a_array)))
@info "New round" N
@btime foo($a_array)
@btime foo($a_stride)
@btime foo($a_ptr)
@btime foo($a_ptr_static)
end
┌ Info: New round
└   N = 1
1.056 ns (0 allocations: 0 bytes)
1.079 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 2
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 3
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 4
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 5
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
~/.julia/dev/VectorizationBase$ git checkout e2b6ccbc9cea28e57b44b17dc864d46217cdd93d
Previous HEAD position was cb8185d non small pow2 non-AVX512 mask fixes
HEAD is now at e2b6ccb Fast integer ops shouldn't wrap
(StrideArrays) pkg> up
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General`
No Changes to `~/.julia/dev/StrideArrays/Project.toml`
No Changes to `~/.julia/dev/StrideArrays/Manifest.toml`
Precompiling project...
9 dependencies successfully precompiled in 18 seconds (15 already precompiled)
julia> using StrideArrays, BenchmarkTools
julia> function foo(a::AbstractArray{T,N}) where {T,N}
idx = ntuple(_ -> 1, Val(N))
@inbounds res = a[idx...]
res
end
foo (generic function with 1 method)
julia> for N in 1:5
a_array = randn(ntuple(_ -> 4, N)...)
a_stride = StrideArray(a_array)
a_ptr = PtrArray(pointer(a_array), size(a_array))
a_ptr_static = PtrArray(pointer(a_array), map(StaticInt, size(a_array)))
@info "New round" N
@btime foo($a_array)
@btime foo($a_stride)
@btime foo($a_ptr)
@btime foo($a_ptr_static)
end
┌ Info: New round
└   N = 1
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 2
1.056 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 3
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
1.057 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 4
1.057 ns (0 allocations: 0 bytes)
2.099 ns (0 allocations: 0 bytes)
2.099 ns (0 allocations: 0 bytes)
2.104 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 5
1.056 ns (0 allocations: 0 bytes)
2.099 ns (0 allocations: 0 bytes)
2.103 ns (0 allocations: 0 bytes)
2.099 ns (0 allocations: 0 bytes)
I'm not sure where this feature request belongs: Base Julia, CpuId.jl, Hwloc.jl, VectorizationBase.jl, etc. So I figured I'd open it here so we can discuss where it should be implemented.
Anyway, the feature request is this: I would like a way to figure out how many sockets my machine has.
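A sketch of one possibility, assuming Hwloc.jl's counting helpers (hwloc models each CPU socket as a "Package" object in its topology):

```julia
# Sketch, assuming Hwloc.jl's API: hwloc's Package objects correspond to
# CPU sockets, so counting them gives the socket count on most systems.
using Hwloc

num_sockets = num_packages()
println("This machine has $num_sockets socket(s)")
```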
I got the following error, any idea why?
julia> using OrdinaryDiffEq
[ Info: Precompiling OrdinaryDiffEq [1dea7af3-3e70-54e6-95c3-0bf5283fa5ed]
ERROR: LoadError: UndefVarError: UInt256 not defined
Stacktrace:
[1] mask_type
@ ~/.julia/packages/VectorizationBase/pTvQj/src/early_definitions.jl:97 [inlined]
[2] worker_type
@ ~/.julia/packages/Polyester/f3SSz/src/request.jl:5 [inlined]
[3] worker_pointer_type
@ ~/.julia/packages/Polyester/f3SSz/src/request.jl:6 [inlined]
[4] reserved(id::UInt32)
@ Polyester ~/.julia/packages/Polyester/f3SSz/src/request.jl:14
[5] _request_threads
@ ~/.julia/packages/Polyester/f3SSz/src/request.jl:36 [inlined]
[6] request_threads
@ ~/.julia/packages/Polyester/f3SSz/src/request.jl:69 [inlined]
[7] batch(::Polyester.var"#11#12", ::Tuple{Int64, Int64}, ::Static.StaticInt{1}, ::Static.StaticInt{1})
@ Polyester ~/.julia/packages/Polyester/f3SSz/src/batch.jl:182
Julia versioninfo
julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD EPYC 7702 64-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
Reproducible example:
julia --compiled-modules=no
using VectorizationBase
include(joinpath(pkgdir(VectorizationBase), "test", "runtests.jl"))
Sample error:
ERROR: LoadError: LoadError: MethodError: no method matching register_size()
The applicable method may be too new: running in world age 31504, while current world is 34093.
Closest candidates are:
register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:68 (method too new to be called from this world context.)
register_size(::Type{T}) where T<:Union{Signed, Unsigned} at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:3 (method too new to be called from this world context.)
register_size(::Type{T}) where T at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:2 (method too new to be called from this world context.)
Stacktrace:
[1] dynamic_integer_register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:38
[2] #s1160#30 at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:65 [inlined]
[3] #s1160#30(::Any) at ./none:0
[4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
[5] simd_integer_register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:70
[6] __pick_vector_width(::Int64, ::Int64, ::Any) at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:39
[7] _pick_vector_width(::Type{T} where T) at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:53
[8] #s1160#33 at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:73 [inlined]
[9] #s1160#33(::Any, ::Any) at ./none:0
[10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
[11] top-level scope at /home/chriselrod/.julia/dev/VectorizationBase/test/testsetup.jl:6
[12] include(::String) at ./client.jl:457
[13] top-level scope at /home/chriselrod/.julia/dev/VectorizationBase/test/runtests.jl:4
[14] include(::String) at ./client.jl:457
[15] top-level scope at REPL[2]:1
in expression starting at /home/chriselrod/.julia/dev/VectorizationBase/test/testsetup.jl:6
in expression starting at /home/chriselrod/.julia/dev/VectorizationBase/test/runtests.jl:4
Fixing this should hopefully solve
JuliaSIMD/LoopVectorization.jl#192
Or at least be a step towards it. Other libraries depending on VectorizationBase may be using a similar pattern that caused the above.
Not sure if this is a VectorizationBase.jl, LoopVectorization.jl, or Hwloc.jl bug.
L₁CACHE.linesize is nothing on my system:
julia> VectorizationBase.L₁CACHE
(size = nothing, depth = nothing, linesize = nothing, associativity = nothing, type = nothing)
This causes LoopVectorization.jl to fail to precompile.
My system is WSL2, the Windows Subsystem for Linux 2, running Ubuntu-20.04. The /proc/cpuinfo appears normal (happy to post on request); the CPU is an Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz.
Some more info:
julia> VectorizationBase.CACHE_COUNT
(0, 0, 0, 0)
julia> VectorizationBase.COUNTS
Dict{Symbol,Int64} with 19 entries:
:Package => 1
:Error => 0
:PU => 16
:OS_Device => 0
:L5Cache => 0
:L4Cache => 0
:I1Cache => 0
:L3Cache => 0
:Core => 8
:Machine => 1
:I3Cache => 0
:PCI_Device => 0
:L2Cache => 0
:NUMANode => 0
:Bridge => 0
:Group => 0
:Misc => 0
:L1Cache => 0
:I2Cache => 0
julia> VectorizationBase.TOPOLOGY
D0: L0 P0 Machine
D1: L0 P0 Package
D2: L0 P0 Core
D3: L0 P0 PU
D3: L1 P1 PU
D2: L1 P1 Core
D3: L2 P2 PU
D3: L3 P3 PU
D2: L2 P2 Core
D3: L4 P4 PU
D3: L5 P5 PU
D2: L3 P3 Core
D3: L6 P6 PU
D3: L7 P7 PU
D2: L4 P4 Core
D3: L8 P8 PU
D3: L9 P9 PU
D2: L5 P5 Core
D3: L10 P10 PU
D3: L11 P11 PU
D2: L6 P6 Core
D3: L12 P12 PU
D3: L13 P13 PU
D2: L7 P7 Core
D3: L14 P14 PU
D3: L15 P15 PU
I observed the following error in some recent CI tests using GitHub Actions:
Got exception outside of a @test
LoadError: StackOverflowError:
Stacktrace:
[1] vsub(a::UInt128, b::UInt128) (repeats 79984 times)
@ VectorizationBase ~/.julia/packages/VectorizationBase/czbgP/src/llvm_intrin/binary_ops.jl:90
We do not use VectorizationBase directly, only LoopVectorization. Sadly, I can't reproduce this error locally using the same test set. Nevertheless, the following does reproduce it:
julia> using Pkg; Pkg.activate(temp=true); Pkg.add("VectorizationBase")
[...]
[3d5dd08c] + VectorizationBase v0.19.27
[...]
julia> using VectorizationBase
julia> VectorizationBase.vsub(UInt128(1), UInt128(5))
ERROR: StackOverflowError:
Stacktrace:
[1] vsub(a::UInt128, b::UInt128) (repeats 79984 times)
@ VectorizationBase ~/.julia/packages/VectorizationBase/czbgP/src/llvm_intrin/binary_ops.jl:90
Hi Chris,
I have a question regarding the use of vmap!. I am wondering whether it is possible to use a function involving the ternary operator in vmap!. Below is an example. Thanks very much in advance!
x = rand(10^4); y = rand(10^4); z = similar(x)
f(x, y) = x == y ? 2.0 : 0.0
vmap!(f, z, x, y);
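Not an authoritative answer, but since `?:` compiles to a branch that SIMD lanes can't take independently, a branch-free `ifelse` formulation of the same function is the usual way to express this; a sketch:

```julia
# Sketch: `ifelse` evaluates both arms and selects per element, so it can
# vectorize where the branching ternary `?:` typically cannot.
using LoopVectorization

f(x, y) = ifelse(x == y, 2.0, 0.0)

x = rand(10^4); y = rand(10^4); z = similar(x)
vmap!(f, z, x, y)
```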
System information:
Julia Version: 1.5.3 (2020-11-09)
OS: Manjaro Linux x86_64
Host: MacBookPro11,4 1.0
Kernel: 5.10.2-2-MANJARO
CPU: Intel i7-4980HQ (8) @ 4.000GHz
GPU: Intel Crystal Well
First noticed the issue when trying to precompile DifferentialEquations.jl. Not sure if this is an issue with VectorizationBase.jl or with Hwloc... but the relevant line in /src/topology.jl is as follows:
const CACHE_LEVELS = something(findfirst(isequal(0), CACHE_COUNT) - 1, length(CACHE_COUNT) + 1)
In particular, findfirst(isequal(0), CACHE_COUNT) returns nothing if 0 does not occur in CACHE_COUNT, and subtracting 1 from nothing then throws an error.
Here is the output on my machine after adding the lines
@show CACHE_COUNT
@show findfirst(isequal(0), CACHE_COUNT)
to /src/topology.jl:
julia> import VectorizationBase
[ Info: Precompiling VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f]
CACHE_COUNT = (4, 4, 1, 1)
findfirst(isequal(0), CACHE_COUNT) = nothing
ERROR: LoadError: LoadError: MethodError: no method matching -(::Nothing, ::Int64)
Closest candidates are:
-(::BigInt, ::Union{Int16, Int32, Int64, Int8}) at gmp.jl:532
-(::Base.CoreLogging.LogLevel, ::Integer) at logging.jl:117
-(::Missing, ::Number) at missing.jl:115
...
Stacktrace:
[1] top-level scope at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/topology.jl:18
[2] include(::Function, ::Module, ::String) at ./Base.jl:380
[3] include at ./Base.jl:368 [inlined]
[4] include(::String) at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/VectorizationBase.jl:1
[5] top-level scope at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/VectorizationBase.jl:355
[6] include(::Function, ::Module, ::String) at ./Base.jl:380
[7] include(::Module, ::String) at ./Base.jl:368
[8] top-level scope at none:2
[9] eval at ./boot.jl:331 [inlined]
[10] eval(::Expr) at ./client.jl:467
[11] top-level scope at ./none:3
in expression starting at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/topology.jl:18
in expression starting at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/VectorizationBase.jl:352
ERROR: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /home/dipsticksupreme/.julia/compiled/v1.5/VectorizationBase/Dto5m_zM0pN.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
[3] _require(::Base.PkgId) at ./loading.jl:1030
[4] require(::Base.PkgId) at ./loading.jl:928
[5] require(::Module, ::Symbol) at ./loading.jl:923
Any quick fixes? Please lmk if there's any helpful information I could provide!
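If it helps, one possible fix (a sketch, not necessarily how upstream will want to do it) is to resolve the nothing case with `something` before subtracting, rather than subtracting from a possibly-nothing value:

```julia
# Sketch of a guard: apply `something` *before* the subtraction, so that
# `findfirst` returning `nothing` is handled instead of erroring.
CACHE_COUNT = (4, 4, 1, 1)  # the values from my machine, where precompilation fails

idx = findfirst(isequal(0), CACHE_COUNT)                    # `nothing` here
CACHE_LEVELS = something(idx, length(CACHE_COUNT) + 1) - 1  # 4 cache levels
```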
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue. I'll open a PR within a few hours, please be patient!
The error below happened when I added the package GpABC, which I suppose has a dependency on VectorizationBase. Do you have any suggestions?
PS: the Julia version is v1.4.2; XX in C:\Users\XX\.julia represents my user name, which is Chinese.
   Building VectorizationBase → `C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build.log`
┌ Error: Error building `VectorizationBase`:
│ ERROR: LoadError: LoadError: InitError: could not load library "C:\Users\XX\AppData\Local\Programs\Julia\Julia-1.4.2\bin\LLVM.dll"
│ The specified module could not be found.
│ Stacktrace:
│  [1] dlopen at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Libdl\src\Libdl.jl:109 [inlined] (repeats 2 times)
│  [2] (::LLVM.var"#14#cache_fptr!#3")() at C:\Users\XX\.julia\packages\LLVM\KITdB\src\util.jl:103
│  [3] macro expansion at C:\Users\XX\.julia\packages\LLVM\KITdB\src\util.jl:111 [inlined]
│  [4] runtime_version() at C:\Users\XX\.julia\packages\LLVM\KITdB\src\base.jl:9
│  [5] __init__() at C:\Users\XX\.julia\packages\LLVM\KITdB\src\LLVM.jl:77
│  [6] top-level scope at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build.jl:6
│  [7] top-level scope at none:5
│ during initialization of module LLVM
│ in expression starting at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build_x86.jl:1
│ in expression starting at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build.jl:4
└ @ Pkg.Operations D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Pkg\src\Operations.jl:899
   Building Conda ────────────→ `C:\Users\XX\.julia\packages\Conda\3rPhK\deps\build.log`
   Building FFTW ─────────────→ `C:\Users\XX\.julia\packages\FFTW\kcXL6\deps\build.log`
   Building SLEEFPirates ─────→ `C:\Users\XX\.julia\packages\SLEEFPirates\kmfoV\deps\build.log`
┌ Error: Error building `SLEEFPirates`:
│ ERROR: LoadError: "File C:\\Users\\XX\\.julia\\packages\\VectorizationBase\\LiMxH\\src\\cpu_info.jl does not exist. Please run `using Pkg; Pkg.build()`."
│ Stacktrace:
│  [1] top-level scope at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\src\VectorizationBase.jl:3
│  [2] top-level scope at none:2
│  [3] eval at .\boot.jl:331 [inlined]
│ in expression starting at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\src\VectorizationBase.jl:3
│ ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to C:\Users\XX\.julia\compiled\v1.4\VectorizationBase\Dto5m_13fSc.ji.
│ Stacktrace:
│  [1] top-level scope at none:5
│ in expression starting at C:\Users\XX\.julia\packages\SLEEFPirates\kmfoV\deps\build.jl:1
└ @ Pkg.Operations D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Pkg\src\Operations.jl:899
Hi! I get a precompilation error when importing VectorizationBase. Originally I got this error when trying to install DifferentialEquations.jl, but I can reproduce it just by adding VectorizationBase. I can reproduce it with Julia 1.7.2 or 1.6.5 on Ubuntu. Below is the stacktrace (for Julia 1.7.2; for 1.6.5 I get the same):
julia> using VectorizationBase
[ Info: Precompiling VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f]
ERROR: LoadError: UndefVarError: num_cache_levels not defined
Stacktrace:
[1] include
@ ./Base.jl:418 [inlined]
[2] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
@ Base ./loading.jl:1318
[3] top-level scope
@ none:1
[4] eval
@ ./boot.jl:373 [inlined]
[5] eval(x::Expr)
@ Base.MainInclude ./client.jl:453
[6] top-level scope
@ none:1
in expression starting at /home/ismael/.julia/packages/VectorizationBase/yDGcX/src/VectorizationBase.jl:1
ERROR: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /home/ismael/.julia/compiled/v1.7/VectorizationBase/jl_voXyGg.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
@ Base ./loading.jl:1466
[3] compilecache(pkg::Base.PkgId, path::String)
@ Base ./loading.jl:1410
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1120
[5] require(uuidkey::Base.PkgId)
@ Base ./loading.jl:1013
[6] require(into::Module, mod::Symbol)
@ Base ./loading.jl:997
Any ideas? Thanks in advance!
Hi. I'm trying to do something weird, not really sure how it works, but it seems this package is the only one where an issue appears, so I'm hoping I might at least learn a bit more about how things work.
I have a Julia application that runs in the cloud. My local development machine does not have the exact same Sys.CPU_NAME as the remote machine. I believe that as a result, even though I have my app precompiled in a Docker container, it needs to be precompiled again when it gets deployed.
I was hoping that setting julia -C core-avx2 or something like that might let me easily circumvent the issue. When I try that, though, I run into the error below during precompilation. Is there any workaround, or is it not really reasonable?
ERROR: LoadError: InitError: Evaluation into the closed module `HostCPUFeatures` breaks incremental compilation because the side effects will not be permanent. This is likely due to some other module mutating `HostCPUFeatures` with `eval` during precompilation - don't do this.
Stacktrace:
[1] eval
@ ./boot.jl:370 [inlined]
[2] setfeaturefalse(s::Symbol)
@ HostCPUFeatures ~/.julia/packages/HostCPUFeatures/9sAqs/src/cpu_info_x86.jl:36
[3] make_generic(target::String)
@ HostCPUFeatures ~/.julia/packages/HostCPUFeatures/9sAqs/src/cpu_info_x86.jl:73
[4] __init__()
@ HostCPUFeatures ~/.julia/packages/HostCPUFeatures/9sAqs/src/HostCPUFeatures.jl:45
[5] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
@ Base ./loading.jl:1074
[6] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
@ Base ./loading.jl:1020
[7] _tryrequire_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String)
@ Base ./loading.jl:1407
[8] _require(pkg::Base.PkgId, env::String)
@ Base ./loading.jl:1781
[9] _require_prelocked(uuidkey::Base.PkgId, env::String)
@ Base ./loading.jl:1625
[10] macro expansion
@ ./loading.jl:1613 [inlined]
[11] macro expansion
@ ./lock.jl:267 [inlined]
[12] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1576
[13] include
@ ./Base.jl:457 [inlined]
[14] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
@ Base ./loading.jl:2010
[15] top-level scope
@ stdin:2
during initialization of module HostCPUFeatures
in expression starting at /home/user/.julia/packages/VectorizationBase/e4FnQ/src/VectorizationBase.jl:1
in expression starting at stdin:2
ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to "/home/user/.julia/compiled/v1.9/VectorizationBase/jl_JLOYjx".
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
@ Base ./loading.jl:2260
[3] compilecache
@ ./loading.jl:2127 [inlined]
[4] _require(pkg::Base.PkgId, env::String)
@ Base ./loading.jl:1770
[5] _require_prelocked(uuidkey::Base.PkgId, env::String)
@ Base ./loading.jl:1625
[6] macro expansion
@ ./loading.jl:1613 [inlined]
[7] macro expansion
@ ./lock.jl:267 [inlined]
[8] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1576
[9] include
@ ./Base.jl:457 [inlined]
[10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
@ Base ./loading.jl:2010
[11] top-level scope
@ stdin:2
in expression starting at /home/user/.julia/packages/LoopVectorization/DDH6Z/src/LoopVectorization.jl:1
in expression starting at stdin:2
ERROR: LoadError: Failed to precompile LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890] to "/home/user/.julia/compiled/v1.9/LoopVectorization/jl_gmjndJ".
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
...
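As an aside on the `julia -C core-avx2` approach above: a possible alternative (a hedged configuration sketch, assuming Julia ≥ 1.9 pkgimages; the target names here are illustrative) is the `JULIA_CPU_TARGET` environment variable, which lets precompilation emit code for several microarchitectures so the cache remains usable on a machine whose `Sys.CPU_NAME` differs from the build machine:

```shell
# Sketch: multi-target precompilation inside the Docker image build.
# "generic" is the fallback; the clone_all targets are examples only.
export JULIA_CPU_TARGET="generic;skylake,clone_all;znver2,clone_all"
julia --project -e 'using Pkg; Pkg.precompile()'
```

Whether this avoids the `HostCPUFeatures` eval-during-precompilation error is a separate question, since that error comes from module initialization rather than codegen.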
I'm trying to come up with an MWE that accurately describes the issue we are running into with relocatability. Here's my first attempt.
The basic idea is that we want global constants that depend on the specific CPU architecture. As an example:
module Foo
import CpuId
struct IntelCpu end
struct OtherCpu end
const CPU_BRAND = if startswith(CpuId.cpubrand(), "Intel(R) ")
IntelCpu()
else
OtherCpu()
end
do_stuff() = do_stuff(CPU_BRAND)
do_stuff(::IntelCpu) = 1
do_stuff(::OtherCpu) = 1.0
end # module
Unfortunately, the above code is not relocatable. If I compile my package Foo.jl in a sysimage or app using a computer with an Intel CPU, and then I try to move my sysimage or app to a computer with a non-Intel CPU, bad things will happen.
So I figure that because the information I need about the CPU architecture is not available until runtime, I should move that logic into __init__
. So I try this instead:
module Foo
import CpuId
struct IntelCpu end
struct OtherCpu end
do_stuff() = do_stuff(CPU_BRAND)
do_stuff(::IntelCpu) = 1
do_stuff(::OtherCpu) = 1.0
function __init__()
if startswith(CpuId.cpubrand(), "Intel(R) ")
@eval const CPU_BRAND = IntelCpu()
else
@eval const CPU_BRAND = OtherCpu()
end
return nothing
end
end # module
Unfortunately, this will break precompilation. If I have a package Bar.jl that depends on Foo.jl, e.g. this:
module Bar
import Foo
end # module
When I try to do import Bar
, I get this error:
julia> import Bar
[ Info: Precompiling Bar [f4235cf3-1c45-4253-b7f4-6bb3fb59c5c4]
ERROR: LoadError: InitError: Evaluation into the closed module `Foo` breaks incremental compilation because the side effects will not be permanent. This is likely due to some other module mutating `Foo` with `eval` during precompilation - don't do this.
Stacktrace:
[1] eval
@ ./boot.jl:369 [inlined]
[2] __init__()
@ Foo ~/Downloads/MWE-eval/Foo.jl/src/Foo.jl:10
[3] _include_from_serialized(path::String, depmods::Vector{Any})
@ Base ./loading.jl:670
[4] _require_from_serialized(path::String)
@ Base ./loading.jl:723
[5] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1027
[6] require(uuidkey::Base.PkgId)
@ Base ./loading.jl:910
[7] require(into::Module, mod::Symbol)
@ Base ./loading.jl:897
[8] include
@ ./Base.jl:386 [inlined]
[9] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
@ Base ./loading.jl:1209
[10] top-level scope
@ none:1
[11] eval
@ ./boot.jl:369 [inlined]
[12] eval(x::Expr)
@ Base.MainInclude ./client.jl:453
[13] top-level scope
@ none:1
during initialization of module Foo
in expression starting at /Users/dilum/Downloads/MWE-eval/Bar.jl/src/Bar.jl:1
ERROR: Failed to precompile Bar [f4235cf3-1c45-4253-b7f4-6bb3fb59c5c4] to /Users/dilum/.julia/compiled/v1.7/Bar/jl_Q6bC82.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY)
@ Base ./loading.jl:1356
[3] compilecache(pkg::Base.PkgId, path::String)
@ Base ./loading.jl:1302
[4] _require(pkg::Base.PkgId)
@ Base ./loading.jl:1017
[5] require(uuidkey::Base.PkgId)
@ Base ./loading.jl:910
[6] require(into::Module, mod::Symbol)
@ Base ./loading.jl:897
Is this a correct description of what we are trying to accomplish here? That is, we need to have global constants like CPU_BRAND
, but we can't define those constants until runtime, and in fact the types of those constants are not known until runtime?
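For comparison, here is a minimal sketch of one precompile-safe alternative to `@eval const` in `__init__`: store the value in a `Ref` that is filled in at run time and read through a function barrier. This is only an illustration of the pattern, not VectorizationBase's actual approach; it uses `Sys.CPU_NAME` from Base instead of `CpuId.cpubrand()` so the sketch is dependency-free.

```julia
module Foo

struct IntelCpu end
struct OtherCpu end

# A Ref avoids eval-ing into a closed module; only its *contents* change
# at run time, so precompilation stays valid.
const CPU_BRAND = Ref{Union{IntelCpu,OtherCpu}}(OtherCpu())

function __init__()
    # Illustrative detection only; Sys.CPU_NAME reports a microarchitecture
    # name (e.g. "skylake"), not a vendor brand string.
    CPU_BRAND[] = occursin("intel", lowercase(Sys.CPU_NAME)) ? IntelCpu() : OtherCpu()
    return nothing
end

# Function barrier: dispatch happens on the run-time value, not a const.
do_stuff() = do_stuff(CPU_BRAND[])
do_stuff(::IntelCpu) = 1
do_stuff(::OtherCpu) = 1.0

end # module
```

The cost of this pattern is that `CPU_BRAND[]` is not inferable as a concrete type, so `do_stuff()` pays one dynamic dispatch at the barrier.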
I'm looking into providing support for multichannel colors in ImageFiltering. As you may know, JuliaImages provides real RGB types that encode the color of a pixel without adding an array dimension to do it. Naturally, these aren't natively supported by VectorizationBase. Obviously, I can reinterpret(reshape, Float32 #=or whatever=#, img)
, but everything gets a lot uglier if you have to add array dimensions. I am guessing that VecUnroll
is kind of like an SVector
, is that right? If so, what do the parameters "mean"? Or if that's not the case, is there a good solution for supporting the equivalent of an NTuple{N,T} where T<:NativeTypes
?
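To make the `reinterpret(reshape, ...)` workaround mentioned above concrete, here is a self-contained sketch; `RGBF32` is a stand-in defined here for illustration (the real `RGB{Float32}` lives in ColorTypes.jl). The channel dimension is added as a leading axis without copying:

```julia
# Illustrative isbits pixel type standing in for ColorTypes.RGB{Float32}.
struct RGBF32
    r::Float32
    g::Float32
    b::Float32
end

img = [RGBF32(1, 2, 3) RGBF32(4, 5, 6)]   # a 1×2 "image"
raw = reinterpret(reshape, Float32, img)  # 3×1×2 view sharing img's memory
@assert size(raw) == (3, 1, 2)
@assert raw[2, 1, 2] == 5.0f0             # green channel of img[1, 2]
```

This is exactly the "add an array dimension" step the question is trying to avoid, which is why a native multichannel element type would be nicer.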
ERROR: LoadError: UndefVarError: Contiguous not defined
Stacktrace:
[1] include(::Function, ::Module, ::String) at .\Base.jl:380
[2] include(::Module, ::String) at .\Base.jl:368
[3] top-level scope at none:2
[4] eval at .\boot.jl:331 [inlined]
[5] eval(::Expr) at .\client.jl:467
[6] top-level scope at .\none:3
in expression starting at C:\Users\kool7\.julia\packages\VectorizationBase\qmYqb\src\VectorizationBase.jl:4
I am not 100% sure this is the right place, but the same tests run fine on my older machine but fail on my new MacBook Pro with the M1 chip (so an ARM architecture). The error message reads
ERROR: LoadError: LoadError: type NullAttr has no field size
Stacktrace:
[1] getproperty(::Hwloc.NullAttr, ::Symbol) at ./Base.jl:33
[2] top-level scope at /Users/ronny/.julia/packages/VectorizationBase/26Yla/src/topology.jl:8
[3] include(::Function, ::Module, ::String) at ./Base.jl:380
[4] include at ./Base.jl:368 [inlined]
[5] include(::String) at /Users/ronny/.julia/packages/VectorizationBase/26Yla/src/VectorizationBase.jl:1
[6] top-level scope at /Users/ronny/.julia/packages/VectorizationBase/26Yla/src/VectorizationBase.jl:270
[7] include(::Function, ::Module, ::String) at ./Base.jl:380
[8] include(::Module, ::String) at ./Base.jl:368
[9] top-level scope at none:2
[10] eval at ./boot.jl:331 [inlined]
[11] eval(::Expr) at ./client.jl:467
[12] top-level scope at ./none:3
i.e. I get
ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /Users/ronny/.julia/compiled/v1.5/VectorizationBase/Dto5m_1dTjA.ji.
when actually trying to compile DiffEqBase
(6.44.3). So to recreate this: after ] add DiffEqBase
, running using DiffEqBase
causes this error, for example.
If I can provide any further information, let me know.
edit: Oh sorry, I only checked the ReadMe a little late. Maybe the question could also be: do you plan to support non-x86 architectures?
The goal of this is to improve compile and load times.
It would also be nice for libraries wanting to take on smaller dependencies.
Additionally, LoopVectorization itself might no longer depend on llvmcall
at all in the future, but may still want other parts.
The pieces:
@chriselrod , thanks for the quick response on the PR just now.
I'm getting an error that (I think?) is coming from VectorizationBase.jl
Custom logsumexp functions: Error During Test at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:41
Test threw exception
Expression: bmark β logsumexp!(z1, x)
MethodError: no method matching VectorizationBase.Pointer{Float64}(::Ptr{Float64})
Stacktrace:
[1] macro expansion at D:\libraries\julia\packages\VectorizationBase\KoDSv\src\vectorizable.jl:305 [inlined]
[2] vectorizable at D:\libraries\julia\packages\VectorizationBase\KoDSv\src\vectorizable.jl:297 [inlined]
[3] macro expansion at .\gcutils.jl:189 [inlined]
[4] macro expansion at D:\libraries\julia\packages\LoopVectorization\QozFw\src\LoopVectorization.jl:333 [inlined]
[5] macro expansion at D:\libraries\julia\dev\ShaleDrillingLikelihood\src\utilities\sum-functions.jl:94 [inlined]
[6] logsumexp!(::Array{Float64,1}, ::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}) at D:\libraries\julia\dev\ShaleDrillingLikelihood\src\utilities\sum-functions.jl:84
[7] top-level scope at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:41
[8] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Test\src\Test.jl:1113
[9] top-level scope at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:33
Custom logsumexp functions: Error During Test at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:41
The original code is:
"""
logsumexp!(r, x)
Compute `r` = softmax(x) and return `logsumexp(x)`.
Based on code from
https://arxiv.org/pdf/1412.8695.pdf eq 3.8 for p(y)
https://discourse.julialang.org/t/fast-logsumexp/22827/7?u=baggepinnen for stable logsumexp
"""
@generated function logsumexp!(r::AbstractArray{T}, x::AbstractArray{T}) where {T}
quote
n = length(x)
length(r) == n || throw(DimensionMismatch())
isempty(x) && return -T(Inf)
1 == stride1(r) == stride1(x) || throw(error("Arrays not strided"))
u = maximum(x) # max value used to re-center
abs(u) == Inf && return any(isnan, x) ? T(NaN) : u # check for non-finite values
s = zero(T)
@vectorize $T for i = 1:n
tmp = exp(x[i] - u)
r[i] = tmp
s += tmp
end
invs = inv(s)
r .*= invs
return log1p(s-1) + u
end
end
See screenshot
On aarch64 (NVIDIA Jetson Xavier NX), I get
julia> using VectorizationBase
[ Info: Precompiling VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f]
ERROR: LoadError: LoadError: syntax: "$" expression outside quote around /home/user/.julia/packages/VectorizationBase/GlxB2/src/cpu_info_generic.jl:5
julia> using Pkg; Pkg.activate(temp=true); Pkg.add(["StaticArrays", "VectorizationBase"])
Activating new project at `/tmp/jl_es7LWm`
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Updating `/tmp/jl_es7LWm/Project.toml`
[90137ffa] + StaticArrays v1.5.2
[3d5dd08c] + VectorizationBase v0.21.45
...
julia> using StaticArrays
julia> mutable struct HoldsAnSVector
x::SVector{1, Float64}
end
julia> foo = HoldsAnSVector(SVector(1.0))
HoldsAnSVector([1.0])
julia> foo.x = [2.0]
1-element Vector{Float64}:
2.0
julia> using VectorizationBase
julia> foo.x = [3.0]
ERROR: MethodError: no method matching _offset_index(::Tuple{}, ::Tuple{StaticInt{1}})
Closest candidates are:
_offset_index(::Tuple{}, ::Tuple{}) at ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:17
_offset_index(::Tuple{I1}, ::Tuple{I2}) where {I1, I2} at ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:22
_offset_index(::Tuple{I1, I2, Vararg}, ::Tuple{I3}) where {I1, I2, I3} at ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:20
Stacktrace:
[1] offset_index
@ ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:31 [inlined]
[2] linear_index
@ ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:32 [inlined]
[3] _vload
@ ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:41 [inlined]
[4] vload
@ ~/.julia/packages/VectorizationBase/sAHNI/src/llvm_intrin/memory_addr.jl:970 [inlined]
[5] getindex
@ ~/.julia/packages/VectorizationBase/sAHNI/src/special/misc.jl:197 [inlined]
[6] unroll_tuple(a::Vector{Float64}, #unused#::Length{1})
@ StaticArrays ~/.julia/packages/StaticArrays/8Dz3j/src/convert.jl:206
[7] convert
@ ~/.julia/packages/StaticArrays/8Dz3j/src/convert.jl:199 [inlined]
[8] setproperty!(x::HoldsAnSVector, f::Symbol, v::Vector{Float64})
@ Base ./Base.jl:43
[9] top-level scope
@ REPL[7]:1
This causes the CI failures in trixi-framework/Trixi.jl#1202 and trixi-framework/Trixi.jl#1149
From @chriselrod's comment here:
VectorizationBase.jl also needs a lot more tests. It has <50% coverage at the moment.
For context, here is the current code coverage:
In all versions of VectorizationBase since 0.20.13 I get the following InitError when porting the precompiled module between different machines:
ERROR: InitError: TypeError: non-boolean (Nothing) used in boolean context
Stacktrace:
[1] _define_cache(N::Int64, c::NamedTuple{(:size, :linesize, :associativity, :type, :inclusive), Tuple{Int64, Int64, Nothing, Nothing, Nothing}})
@ VectorizationBase ~/.julia/packages/VectorizationBase/VXa02/src/topology.jl:194
[2] redefine_cache(N::Int64)
@ VectorizationBase ~/.julia/packages/VectorizationBase/VXa02/src/topology.jl:220
[3] foreach
@ ./abstractarray.jl:2141 [inlined]
[4] __init__()
@ VectorizationBase ~/.julia/packages/VectorizationBase/VXa02/src/VectorizationBase.jl:391
[5] _include_from_serialized(path::String, depmods::Vector{Any})
@ Base ./loading.jl:674
[6] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
@ Base ./loading.jl:760
[7] _require(pkg::Base.PkgId)
@ Base ./loading.jl:998
[8] require(uuidkey::Base.PkgId)
@ Base ./loading.jl:914
[9] require(into::Module, mod::Symbol)
@ Base ./loading.jl:901
during initialization of module VectorizationBase
If you need more information on the hardware, just let me know.
I'm seeing MethodError: no method matching isless(::Val{UInt8}, ::Int64)
coming from line 167 of llvm_intrin/masks.jl
in VectorizationBase. For example (from https://github.com/mcabbott/Tullio.jl/runs/1542174891):
MethodError: no method matching isless(::Val{UInt8}, ::Int64)
Closest candidates are:
isless(!Matched::Missing, ::Any) at missing.jl:87
isless(!Matched::AbstractFloat, ::Real) at operators.jl:167
isless(!Matched::ForwardDiff.Dual{Tx,V,N} where N where V, ::Integer) where Tx at /home/runner/.julia/packages/ForwardDiff/qTmqf/src/dual.jl:139
...
Stacktrace:
[1] <(::Val{UInt8}, ::Int64) at ./operators.jl:277
[2] <=(::Val{UInt8}, ::Int64) at ./operators.jl:326
[3] mask_type(::Val{UInt8}) at /home/runner/.julia/packages/VectorizationBase/lAowq/src/llvm_intrin/masks.jl:167
[4] Mask at /home/runner/.julia/packages/VectorizationBase/lAowq/src/VectorizationBase.jl:145 [inlined]
[5] Mask{UInt8,U} where U<:Unsigned(::UInt8) at /home/runner/.julia/packages/VectorizationBase/lAowq/src/VectorizationBase.jl:150
[6] top-level scope at /home/runner/work/Tullio.jl/Tullio.jl/test/runtests.jl:217
[7] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[8] top-level scope at /home/runner/work/Tullio.jl/Tullio.jl/test/runtests.jl:217
[9] include(::String) at ./client.jl:457
[10] top-level scope at none:6
[11] eval(::Module, ::Any) at ./boot.jl:331
[12] exec_options(::Base.JLOptions) at ./client.jl:272
[13] _start() at ./client.jl:506
This was with:
[bdcacae8] LoopVectorization v0.9.6
[3d5dd08c] VectorizationBase v0.13.10
Currently, the global constants that are related to CPU topology (VectorizationBase.CACHE_COUNT
, etc.) are defined at module precompilation time. See e.g. https://github.com/chriselrod/VectorizationBase.jl/blob/master/src/topology.jl
Unfortunately, this will make it very difficult for us to allow users to override the values of these constants at run-time, which is required for e.g. the implementation of JuliaLinearAlgebra/Octavian.jl#7
I would describe the implementation of JuliaLinearAlgebra/Octavian.jl#7 as two steps:
1. Define VectorizationBase.CACHE_COUNT and other such global constants at import-time by defining them in the VectorizationBase.__init__() function.
2. Allow users to override the values of those constants at run-time.
@chriselrod If you can implement task 1, then I can implement task 2.
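A minimal sketch of how the two tasks could fit together, assuming the `Ref`-backed approach; all names here are illustrative, not VectorizationBase's real API:

```julia
# Task 1: the "constant" is a Ref whose contents are set at import time,
# so the precompiled module stays valid across machines.
const CACHE_COUNT = Ref{NTuple{4,Int}}((0, 0, 0, 0))

# Would be called from __init__(); the tuple here is a placeholder for
# real topology detection (e.g. via Hwloc).
function init_cache_count!()
    CACHE_COUNT[] = (1, 1, 1, 0)
    return nothing
end

# Task 2: a user-facing override hook for downstream packages like Octavian.
set_cache_count!(counts::NTuple{4,Int}) = (CACHE_COUNT[] = counts; nothing)
```

The trade-off is the same as with any run-time global: call sites must read `CACHE_COUNT[]` behind a function barrier, or accept that the value is not a compile-time constant.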