
VectorizationBase.jl's People

Contributors

astupidbear, brenhinkeller, carlolucibello, chriselrod, chriselrod-lilly, chrisrackauckas, dependabot[bot], dilumaluthge, giggleliu, github-actions[bot], haampie, jaakkor2, jeffreysarnoff, johnnychen94, juliatagbot, maj0e, moble, ranocha, timholy, tknopp, tokazama, yingboma, zentrik


VectorizationBase.jl's Issues

Help with Performance on Apple M1

As @chriselrod asked in #21, here's the output of

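using Libdl
# Open Julia's bundled LLVM library (how to locate it depends on the Julia version)
llvmlib = VERSION ≥ v"1.6.0-DEV.1429" ? Libdl.dlopen(Base.libllvm_path()) : Libdl.dlopen(only(filter(lib->occursin(r"LLVM\b", basename(lib)), Libdl.dllist())));
gethostcpufeatures = Libdl.dlsym(llvmlib, :LLVMGetHostCPUFeatures);
# Comma-separated feature string, e.g. "+sse2,+cx16,-tbm,..."
features_cstring = ccall(gethostcpufeatures, Cstring, ());
# Split it and drop entries whose first digit directly follows the +/- sign (e.g. "+3dnow")
features = filter(ext -> (m = match(r"\d", ext); isnothing(m) ? true : m.offset != 2 ), split(unsafe_string(features_cstring), ','));
println(features)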

on Julia 1.5.3 on the MacBook Pro Apple M1 (it should be the same on the MacBook Air, since as far as I know it has the same M1):

SubString{String}["+sse2", "+cx16", "+sahf", "-tbm", "-avx512ifma", "-sha", "-gfni", "-fma4", "-vpclmulqdq", "-prfchw", "-bmi2", "-cldemote", "-fsgsbase", "-ptwrite", "-xsavec", "+popcnt", "-mpx", "+aes", "-avx512bitalg", "-movdiri", "-xsaves", "-avx512er", "-avx512vnni", "-avx512vpopcntdq", "-pconfig", "-clwb", "-avx512f", "-clzero", "-pku", "+mmx", "-lwp", "-rdpid", "-xop", "-rdseed", "-waitpkg", "-movdir64b", "-sse4a", "-avx512bw", "-clflushopt", "-xsave", "-avx512vbmi2", "-avx512vl", "-invpcid", "-avx512cd", "-avx", "-vaes", "+cx8", "-fma", "-rtm", "-bmi", "-enqcmd", "-rdrnd", "-mwaitx", "+sse4.1", "+sse4.2", "-avx2", "+fxsr", "-wbnoinvd", "+sse", "-lzcnt", "+pclmul", "-prefetchwt1", "-f16c", "+ssse3", "-sgx", "-shstk", "+cmov", "-avx512vbmi", "-avx512bf16", "-movbe", "-xsaveopt", "-avx512dq", "-adx", "-avx512pf", "+sse3"]
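Each entry in the printed list is a feature name prefixed with `+` (enabled) or `-` (disabled). The helper below splits such a string into the two sets so you can check specific flags. It is a minimal sketch, and `split_features` is a hypothetical name, not part of VectorizationBase. For example, the output above enables `sse4.2` but not `avx`.

```julia
# Hypothetical helper: split an LLVM-style feature string ("+sse2,-avx,...")
# into the sets of enabled and disabled feature names.
function split_features(s::AbstractString)
    enabled = Set{String}()
    disabled = Set{String}()
    for entry in split(s, ',')
        isempty(entry) && continue
        name = String(entry[2:end])   # drop the leading '+' or '-'
        if first(entry) == '+'
            push!(enabled, name)
        elseif first(entry) == '-'
            push!(disabled, name)
        end
    end
    return enabled, disabled
end
```

With the `features` vector from above, `enabled, disabled = split_features(join(features, ','))` followed by `"avx2" in enabled` answers "does LLVM think this host has AVX2?".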

To be honest, I am not sure what to do with this, but I was asked to provide it. I can also run it on Julia nightly (1.6-DEV) if that helps.

No method matching on 32 bit machine

I get the following error when using v0.21.18 on a 32-bit machine.

LoadError: MethodError: no method matching vload_transpose_quote(::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Bool, ::Int32, ::Int32, ::UInt64, ::Bool)
  Closest candidates are:
    vload_transpose_quote(::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Int32, ::Bool, ::Int32, ::Int32, ::UInt32, ::Bool) at /home/runner/.julia/packages/VectorizationBase/Q2q04/src/vecunroll/memory.jl:204

Running the same code on a 64-bit machine causes no error.
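In the error, the only mismatch from the closest candidate is the 11th argument: the caller passes `::UInt64` and the method expects `::UInt32`. One plausible way this happens is a method declared with `::UInt`, which is `UInt32` on 32-bit Julia and `UInt64` on 64-bit Julia, receiving a value that is always computed as `UInt64`. The sketch below uses hypothetical function names, not VectorizationBase's code, to show that pitfall and two portable alternatives.

```julia
# Hypothetical kernel taking a mask. `UInt` is the platform word:
# UInt64 on 64-bit Julia, UInt32 on 32-bit Julia, so passing a UInt64
# here is a MethodError on 32-bit machines only.
lanes_set_strict(mask::UInt) = count_ones(mask)

# Portable option 1: accept any unsigned integer type.
lanes_set(mask::Unsigned) = count_ones(mask)

# Portable option 2: keep the strict signature, but truncate to the native
# word at the call site, e.g. lanes_set_strict(m % UInt).
```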

StackOverflowError on VectorizationBase v0.15

I'm getting the following error when mixing dual numbers & Vecs:

julia> using LoopVectorization, ForwardDiff

julia> function grad!(π›₯x, π›₯β„›::AbstractArray{𝒯}, β„›, x, 𝒢𝓍i=1:3, 𝒢𝓍j=1:3) where 𝒯
           begin
               πœ€xβ€² = ForwardDiff.Dual((zero)(𝒯), ((one)(𝒯), (zero)(𝒯)))
               πœ€x = ForwardDiff.Dual((zero)(𝒯), ((zero)(𝒯), (one)(𝒯)))
           end
           LoopVectorization.@avx for j in 𝒢𝓍j
               for i in 𝒢𝓍i
                   β„› = 1 * (x[i] + πœ€x) + 1000 * (x[j] + πœ€xβ€²)
                   π›₯x[j] = π›₯x[j] + ForwardDiff.partials(β„›, 1) * π›₯β„›[i, j]
                   π›₯x[i] = π›₯x[i] + ForwardDiff.partials(β„›, 2) * π›₯β„›[i, j]
               end
           end
           π›₯x
       end
grad! (generic function with 3 methods)

julia> grad!(zeros(3), ones(3,3), rand(3,3), [1,2,3.0])
ERROR: StackOverflowError:
Stacktrace:
 [1] vfma_fast(a::ForwardDiff.Dual{Nothing, VectorizationBase.Vec{4, Float64}, 2}, b::ForwardDiff.Dual{Nothing, VectorizationBase.Vec{4, Float64}, 2}, c::ForwardDiff.Dual{Nothing, VectorizationBase.Vec{4, Float64}, 2})
   @ VectorizationBase ~/.julia/packages/VectorizationBase/6EQU1/src/base_defs.jl:216

(@v1.7) pkg> st VectorizationBase
Status `~/.julia/environments/v1.7/Project.toml`
  [3d5dd08c] VectorizationBase v0.15.5

This also happens with v0.15.1, but not with v0.14.12. It happens on Julia 1.5 too.

Found in CI here:
https://github.com/mcabbott/Tullio.jl/pull/57/checks?check_run_id=1743880859
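A StackOverflowError whose trace points at a single method usually means that method ends up calling itself. I don't know what the actual definition at base_defs.jl:216 looks like. One common way this happens is a generic fallback that converts its arguments and re-dispatches, which loops forever when the conversion changes nothing (for example, three `Dual`s that are already the same type). The sketch below uses a hypothetical `fast_fma` (not a VectorizationBase function) to show the pitfall and a guarded fallback.

```julia
# Hypothetical fast path for hardware floats.
fast_fma(a::Float64, b::Float64, c::Float64) = fma(a, b, c)

# BUGGY pattern (shown only as a comment): if promotion leaves the types
# unchanged, this calls the very same method again, forever:
#     fast_fma(a, b, c) = fast_fma(promote(a, b, c)...)

# Guarded fallback: re-dispatch only when promotion actually changed the
# types; otherwise use generic arithmetic that any Number supports.
function fast_fma(a::Number, b::Number, c::Number)
    pa, pb, pc = promote(a, b, c)
    if typeof((pa, pb, pc)) === typeof((a, b, c))
        return muladd(a, b, c)   # no further call to fast_fma
    end
    return fast_fma(pa, pb, pc)
end
```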

"this intrinsic must be compiled to be called"

For a program using LoopVectorization, in VS Code with Julia extension 1.4.3, I get the following error when I start debug mode. It references line 340 of stridedpointers.jl (@generated function $f(p1::Core.LLVMPtr{T,0}, p2::Core.LLVMPtr{T,0}) where {T}):

Exception has occurred: ErrorException
this intrinsic must be compiled to be called

Stacktrace:
  [1] vle(p1::Core.LLVMPtr{Float64, 0}, p2::Core.LLVMPtr{Float64, 0})
    @ VectorizationBase C:\Users\drood\.julia\packages\VectorizationBase\OEl8L\src\strided_pointers\stridedpointers.jl:340
  [2] vle(p1::Ptr{Float32}, p2::Ptr{Float32}, sp::VectorizationBase.OffsetPrecalc{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}, LayoutPointers.StridedPointer{Float32, 2, 1, 0, (1, 2), Tuple{Static.StaticInt{4}, Int64}, Tuple{Static.StaticInt{0}, Static.StaticInt{0}}}, Tuple{Nothing, Tuple{Int64, Int64, Int64}}})
    @ VectorizationBase C:\Users\drood\.julia\packages\VectorizationBase\OEl8L\src\strided_pointers\stridedpointers.jl:346
  [3] _turbo_!(#unused#::Val{(false, 0, 0, 0, false, 8, 32, 15, 64, 32768, 262144, 16777216, 0x0000000000000001)}, #unused#::Val{(:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000013, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0001, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000032, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0002, 0x02), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0003, 0x03), :numericconstant, Symbol("###reduction##zero###17###"), LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.constant, 0x0004, 0x00), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x00000000000000000000000000000132, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000100020004, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0004, 0x00), :LoopVectorization, :reduced_add, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000050003, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0003, 0x00), :LoopVectorization, :setindex!, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 
0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000006, 0x00000000000000000000000000000000, LoopVectorization.memstore, 0x0005, 0x03))}, #unused#::Val{(LoopVectorization.ArrayRefStruct{:B, Symbol("##vptr##_B")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000103, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:C, Symbol("##vptr##_C")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000302, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:A, Symbol("##vptr##_A")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000102, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101))}, #unused#::Val{(0, (), (), (), (), ((4, LoopVectorization.IntOrFloat),), ())}, #unused#::Val{(:i, :k, :j)}, #unused#::Val{Tuple{Tuple{CloseOpenIntervals.CloseOpen{Int64, Int64}, CloseOpenIntervals.CloseOpen{Int64, Int64}, CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}}, Tuple{LayoutPointers.GroupedStridedPointers{Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}}, (1, 1, 1), (0, 0, 0), ((1, 2), (1, 2), (1, 2)), ((1, 2), (3, 4), (5, 6)), Tuple{Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64}, NTuple{6, Static.StaticInt{0}}}}}}, var#arguments#::Tuple{Int64, Int64, Int64, Int64, Int64, Ptr{Float32}, Ptr{Float32}, Ptr{Float32}, Int64, Int64, Int64})
    @ LoopVectorization C:\Users\drood\.julia\packages\LoopVectorization\kVenK\src\reconstruct_loopset.jl:713
  [4] _turbo_!(#unused#::Val{(false, 0, 0, 0, false, 8, 32, 15, 64, 32768, 262144, 16777216, 0x0000000000000007)}, #unused#::Val{(:LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000013, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0001, 0x01), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000032, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0002, 0x02), :LoopVectorization, :getindex, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.memload, 0x0003, 0x03), :numericconstant, Symbol("###reduction##zero###17###"), LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000000, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, LoopVectorization.constant, 0x0004, 0x00), :LoopVectorization, :vfmadd_fast, LoopVectorization.OperationStruct(0x00000000000000000000000000000132, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000100020004, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0004, 0x00), :LoopVectorization, :reduced_add, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000050003, 0x00000000000000000000000000000000, LoopVectorization.compute, 0x0003, 0x00), :LoopVectorization, :setindex!, LoopVectorization.OperationStruct(0x00000000000000000000000000000012, 
0x00000000000000000000000000000003, 0x00000000000000000000000000000000, 0x00000000000000000000000000000006, 0x00000000000000000000000000000000, LoopVectorization.memstore, 0x0005, 0x03))}, #unused#::Val{(LoopVectorization.ArrayRefStruct{:B, Symbol("##vptr##_B")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000103, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:C, Symbol("##vptr##_C")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000302, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101), LoopVectorization.ArrayRefStruct{:A, Symbol("##vptr##_A")}(0x00000000000000000000000000000101, 0x00000000000000000000000000000102, 0x00000000000000000000000000000000, 0x00000000000000000000000000000101))}, #unused#::Val{(0, (), (), (), (), ((4, LoopVectorization.IntOrFloat),), ())}, #unused#::Val{(:i, :k, :j)}, #unused#::Val{Tuple{Tuple{CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}, CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}, CloseOpenIntervals.CloseOpen{Static.StaticInt{0}, Int64}}, Tuple{LayoutPointers.GroupedStridedPointers{Tuple{Ptr{Float32}, Ptr{Float32}, Ptr{Float32}}, (1, 1, 1), (0, 0, 0), ((1, 2), (1, 2), (1, 2)), ((1, 2), (3, 4), (5, 6)), Tuple{Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64, Static.StaticInt{4}, Int64}, NTuple{6, Static.StaticInt{0}}}}}}, var#arguments#::Tuple{Int64, Int64, Int64, Ptr{Float32}, Ptr{Float32}, Ptr{Float32}, Int64, Int64, Int64})
    @ LoopVectorization C:\Users\drood\.julia\packages\LoopVectorization\kVenK\src\codegen\lower_threads.jl:652

[stack trace continues, referencing my code]

Results from typing status in the package manager:

[31c24e10] Distributions v0.25.32
[bdcacae8] LoopVectorization v0.12.99
[a2af1166] SortingAlgorithms v1.0.1
[37e2e46d] LinearAlgebra
[de0858da] Printf
[9a3f8284] Random
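The failing frame is a function on `Core.LLVMPtr` arguments, which suggests the comparison is done in hand-written LLVM IR. Such code only exists once it has been compiled, so an interpreter-based debugger has no Julia body to step through, which fits the error text. The sketch below shows the same kind of function with a hypothetical `llvm_add` that uses `Base.llvmcall`. It runs normally when compiled. A commonly suggested workaround is to let the debugger run such packages in compiled mode; check the VS Code extension's debugger settings for the exact option, since I can't confirm its name here.

```julia
# Hypothetical example: add two Float64 values with one LLVM IR instruction.
# Base.llvmcall splices this IR into the compiled method; %0 and %1 are the
# two arguments.
llvm_add(a::Float64, b::Float64) = Base.llvmcall(
    """
    %res = fadd double %0, %1
    ret double %res
    """,
    Float64, Tuple{Float64, Float64}, a, b)
```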

Improve cpu info for non-x86 architectures

Currently, non-x86 architectures use a generic build script.

This script assumes:

const REGISTER_SIZE = 16
const REGISTER_COUNT = 16
const CACHELINE_SIZE = 64
const SIMD_NATIVE_INTEGERS = true

If any of these are violated, dependent libraries (e.g., LoopVectorization) are likely to produce suboptimal code. If these numbers undershoot, that would just mean some performance is left on the table, but it's likely to perform reasonably well.
If these numbers overshoot, performance consequences could be dire. Register spills galore.

I believe some ARM CPUs do not have SIMD Float64, so perhaps this should be handled somehow.

Ideally, we'd use a library like CpuId.jl to query hardware info, like we do for AMD and Intel.
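These constants directly determine how wide the generated vectors are. With the generic defaults, a 16-byte register holds 2 `Float64` or 4 `Float32` lanes, and 16 registers give 256 bytes of vector register file in total. If the real registers are narrower or fewer, code unrolled for these numbers keeps more live values than fit, which is where the spills come from. The sketch below is a hypothetical lane-count helper, including a switch for CPUs assumed to lack SIMD `Float64`. It is not VectorizationBase's API.

```julia
# Hypothetical helper: SIMD lanes for element type T given a register size
# in bytes. `simd_float64 = false` models a CPU without SIMD Float64 support,
# in which case Float64 work falls back to scalar (1 lane).
function simd_lanes(register_size::Int, ::Type{T}; simd_float64::Bool = true) where {T}
    T === Float64 && !simd_float64 && return 1
    return max(1, register_size ÷ sizeof(T))
end

# Using the generic script's assumptions:
const REGISTER_SIZE = 16
const REGISTER_COUNT = 16
total_vector_bytes = REGISTER_SIZE * REGISTER_COUNT   # 256
```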

Commit "Fast integer ops shouldn't wrap" slowed down loading from PtrArrays with four or more dimensions

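I tracked down performance regressions observed in trixi-framework/Trixi.jl#509 to the following problem. On AVX2 (and other non-AVX512) systems, commit e2b6ccb "Fast integer ops shouldn't wrap" in VectorizationBase resulted in significant performance regressions when loading data from a PtrArray via getindex. In the example below, indexing StrideArrays and PtrArrays with four or more dimensions takes about 2.1 ns instead of about 1.06 ns, while plain Arrays are unaffected.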

Using Julia v1.6.1 on an Intel i7 8700K yields the following results for a minimal working example. I'm using StrideArrays v0.1.6 with StrideArraysCore v0.1.5 and LoopVectorization v0.12.12 with the diff

~/.julia/dev/StrideArrays$ git diff
diff --git a/Project.toml b/Project.toml
index 206a58f..bb0e2b0 100644
--- a/Project.toml
+++ b/Project.toml
@@ -18,13 +18,11 @@ VectorizedRNG = "33b4df10-0173-11e9-2a0c-851a7edac40e"
 
 [compat]
 ArrayInterface = "3"
-LoopVectorization = "0.12.13"
 Octavian = "0.2.3"
 SLEEFPirates = "0.6.13"
 Static = "0.2.4"
 StrideArraysCore = "0.1.3"
 ThreadingUtilities = "0.4"
-VectorizationBase = "0.19.32"
 VectorizedRNG = "0.2.8"
 julia = "1.5"

in StrideArrays to be able to check different versions of VectorizationBase.

Last good commit

~/.julia/dev/VectorizationBase$ git checkout cb8185d6a1b4b4ad84f0c539af7b4b39d5b7bb59
Previous HEAD position was e2b6ccb Fast integer ops shouldn't wrap
HEAD is now at cb8185d non small pow2 non-AVX512 mask fixes
(StrideArrays) pkg> up
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General`
  No Changes to `~/.julia/dev/StrideArrays/Project.toml`
  No Changes to `~/.julia/dev/StrideArrays/Manifest.toml`
Precompiling project...
  9 dependencies successfully precompiled in 18 seconds (15 already precompiled)

julia> using StrideArrays, BenchmarkTools

julia> function foo(a::AbstractArray{T,N}) where {T,N}
           idx = ntuple(_ -> 1, Val(N))
           @inbounds res = a[idx...]
           res
       end
foo (generic function with 1 method)

julia> for N in 1:5
           a_array = randn(ntuple(_ -> 4, N)...)
           a_stride     = StrideArray(a_array)
           a_ptr        = PtrArray(pointer(a_array), size(a_array))
           a_ptr_static = PtrArray(pointer(a_array), map(StaticInt, size(a_array)))
           @info "New round" N
           @btime foo($a_array)
           @btime foo($a_stride)
           @btime foo($a_ptr)
           @btime foo($a_ptr_static)
       end
┌ Info: New round
└   N = 1
  1.056 ns (0 allocations: 0 bytes)
  1.079 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 2
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 3
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 4
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 5
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)

First bad commit

~/.julia/dev/VectorizationBase$ git checkout e2b6ccbc9cea28e57b44b17dc864d46217cdd93d
Previous HEAD position was cb8185d non small pow2 non-AVX512 mask fixes
HEAD is now at e2b6ccb Fast integer ops shouldn't wrap
(StrideArrays) pkg> up
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General`
  No Changes to `~/.julia/dev/StrideArrays/Project.toml`
  No Changes to `~/.julia/dev/StrideArrays/Manifest.toml`
Precompiling project...
  9 dependencies successfully precompiled in 18 seconds (15 already precompiled)

julia> using StrideArrays, BenchmarkTools

julia> function foo(a::AbstractArray{T,N}) where {T,N}
           idx = ntuple(_ -> 1, Val(N))
           @inbounds res = a[idx...]
           res
       end
foo (generic function with 1 method)

julia> for N in 1:5
           a_array = randn(ntuple(_ -> 4, N)...)
           a_stride     = StrideArray(a_array)
           a_ptr        = PtrArray(pointer(a_array), size(a_array))
           a_ptr_static = PtrArray(pointer(a_array), map(StaticInt, size(a_array)))
           @info "New round" N
           @btime foo($a_array)
           @btime foo($a_stride)
           @btime foo($a_ptr)
           @btime foo($a_ptr_static)
       end
┌ Info: New round
└   N = 1
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
  1.056 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 2
  1.056 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 3
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
  1.057 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 4
  1.057 ns (0 allocations: 0 bytes)
  2.099 ns (0 allocations: 0 bytes)
  2.099 ns (0 allocations: 0 bytes)
  2.104 ns (0 allocations: 0 bytes)
┌ Info: New round
└   N = 5
  1.056 ns (0 allocations: 0 bytes)
  2.099 ns (0 allocations: 0 bytes)
  2.103 ns (0 allocations: 0 bytes)
  2.099 ns (0 allocations: 0 bytes)
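For context, this is the address arithmetic behind an N-dimensional load. The offset is a sum of `(index - 1) * stride` terms, one multiply-add per dimension. For the all-ones index used in `foo`, it is 0 for every N, so ideally the compiler removes it entirely. One possible reading of the roughly 2× jump at N ≥ 4 (not verified here) is that this simplification stopped happening once the integer ops changed. `linear_offset` is a hypothetical helper for illustration only.

```julia
# Hypothetical helper: column-major offset (in elements) of a 1-based index
# tuple, given per-dimension strides. One multiply-add per dimension.
linear_offset(idx::NTuple{N,Int}, strides::NTuple{N,Int}) where {N} =
    sum(ntuple(k -> (idx[k] - 1) * strides[k], Val(N)))
```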

Feature request: how many sockets does my machine have?

I'm not sure where this feature request should go: Base Julia, CpuId.jl, Hwloc.jl, VectorizationBase.jl, etc. So I figured I'd open it here, and we could discuss where this feature should be implemented.

Anyway, the feature request is this: I would like a way to figure out how many sockets my machine has.
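Until a library provides this, one Linux-only approach is to read `/proc/cpuinfo`. On x86, each logical CPU there lists the `physical id` of the socket it sits in, so the number of distinct ids is the number of sockets. hwloc calls sockets "Package" objects, which is what the `:Package` entry in VectorizationBase's hwloc-derived counts refers to. The sketch below uses hypothetical function names and only parses the file. It returns 0 when no `physical id` lines exist.

```julia
# Hypothetical helper: count distinct "physical id" values (sockets) in the
# text of /proc/cpuinfo. Returns 0 if no such lines are present.
function count_sockets(cpuinfo::AbstractString)
    ids = Set{String}()
    for line in eachline(IOBuffer(cpuinfo))
        m = match(r"^physical id\s*:\s*(\d+)", line)
        m === nothing || push!(ids, String(m.captures[1]))
    end
    return length(ids)
end

# Wrapper for the running machine; `nothing` if /proc/cpuinfo is unavailable.
count_sockets() = isfile("/proc/cpuinfo") ? count_sockets(read("/proc/cpuinfo", String)) : nothing
```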

UInt256 not defined

I got the following error; any idea why?

julia> using OrdinaryDiffEq
[ Info: Precompiling OrdinaryDiffEq [1dea7af3-3e70-54e6-95c3-0bf5283fa5ed]
ERROR: LoadError: UndefVarError: UInt256 not defined
Stacktrace:
  [1] mask_type
    @ ~/.julia/packages/VectorizationBase/pTvQj/src/early_definitions.jl:97 [inlined]
  [2] worker_type
    @ ~/.julia/packages/Polyester/f3SSz/src/request.jl:5 [inlined]
  [3] worker_pointer_type
    @ ~/.julia/packages/Polyester/f3SSz/src/request.jl:6 [inlined]
  [4] reserved(id::UInt32)
    @ Polyester ~/.julia/packages/Polyester/f3SSz/src/request.jl:14
  [5] _request_threads
    @ ~/.julia/packages/Polyester/f3SSz/src/request.jl:36 [inlined]
  [6] request_threads
    @ ~/.julia/packages/Polyester/f3SSz/src/request.jl:69 [inlined]
  [7] batch(::Polyester.var"#11#12", ::Tuple{Int64, Int64}, ::Static.StaticInt{1}, ::Static.StaticInt{1})
    @ Polyester ~/.julia/packages/Polyester/f3SSz/src/batch.jl:182

Julia versioninfo

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7702 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
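Base Julia's largest built-in unsigned integer is `UInt128`; there is no `UInt256`. The trace goes from Polyester's worker bookkeeping into `mask_type`, which presumably picks an unsigned type with one bit per element. A plausible reading is that more than 128 bits were requested on this machine: an AMD EPYC 7702 has 64 cores, or 128 hardware threads with SMT. This is unconfirmed. The sketch below uses a hypothetical `smallest_unsigned` and fails with a clear error at that boundary instead of referencing an undefined name.

```julia
# Hypothetical helper: smallest built-in unsigned type with at least `nbits` bits.
function smallest_unsigned(nbits::Integer)
    for T in (UInt8, UInt16, UInt32, UInt64, UInt128)
        nbits <= 8 * sizeof(T) && return T
    end
    # Base Julia stops at UInt128; more bits need e.g. a tuple of UInt64 words.
    throw(ArgumentError("no built-in unsigned type has $nbits bits (UInt128 is the largest)"))
end
```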

World age errors when using VectorizationBase with `--compiled-modules=no`

Reproducible example:

  1. Start Julia with julia --compiled-modules=no, then run:
using VectorizationBase
include(joinpath(pkgdir(VectorizationBase), "test", "runtests.jl"))

Sample error:

ERROR: LoadError: LoadError: MethodError: no method matching register_size()
The applicable method may be too new: running in world age 31504, while current world is 34093.
Closest candidates are:
  register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:68 (method too new to be called from this world context.)
  register_size(::Type{T}) where T<:Union{Signed, Unsigned} at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:3 (method too new to be called from this world context.)
  register_size(::Type{T}) where T at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:2 (method too new to be called from this world context.)
Stacktrace:
 [1] dynamic_integer_register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:38
 [2] #s1160#30 at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:65 [inlined]
 [3] #s1160#30(::Any) at ./none:0
 [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
 [5] simd_integer_register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:70
 [6] __pick_vector_width(::Int64, ::Int64, ::Any) at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:39
 [7] _pick_vector_width(::Type{T} where T) at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:53
 [8] #s1160#33 at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:73 [inlined]
 [9] #s1160#33(::Any, ::Any) at ./none:0
 [10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
 [11] top-level scope at /home/chriselrod/.julia/dev/VectorizationBase/test/testsetup.jl:6
 [12] include(::String) at ./client.jl:457
 [13] top-level scope at /home/chriselrod/.julia/dev/VectorizationBase/test/runtests.jl:4
 [14] include(::String) at ./client.jl:457
 [15] top-level scope at REPL[2]:1
in expression starting at /home/chriselrod/.julia/dev/VectorizationBase/test/testsetup.jl:6
in expression starting at /home/chriselrod/.julia/dev/VectorizationBase/test/runtests.jl:4

Fixing this should hopefully solve JuliaSIMD/LoopVectorization.jl#192, or at least be a step toward it. Other libraries depending on VectorizationBase may be using the same pattern that caused the error above.

`L1CACHE.linesize` is `nothing` on WSL2 Ubuntu making LoopVectorization.jl fail to precompile

Not sure if this is a VectorizationBase.jl, LoopVectorization.jl, or Hwloc.jl bug.

L1CACHE.linesize is nothing on my system:

julia> VectorizationBase.L₁CACHE
(size = nothing, depth = nothing, linesize = nothing, associativity = nothing, type = nothing)

This causes LoopVectorization.jl to fail to precompile.

My system is WSL2 (Windows Subsystem for Linux 2) running Ubuntu-20.04. The /proc/cpuinfo output appears normal (happy to post on request); the CPU is an Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz.

Some more info:

julia> VectorizationBase.CACHE_COUNT
(0, 0, 0, 0)

julia> VectorizationBase.COUNTS
Dict{Symbol,Int64} with 19 entries:
  :Package    => 1
  :Error      => 0
  :PU         => 16
  :OS_Device  => 0
  :L5Cache    => 0
  :L4Cache    => 0
  :I1Cache    => 0
  :L3Cache    => 0
  :Core       => 8
  :Machine    => 1
  :I3Cache    => 0
  :PCI_Device => 0
  :L2Cache    => 0
  :NUMANode   => 0
  :Bridge     => 0
  :Group      => 0
  :Misc       => 0
  :L1Cache    => 0
  :I2Cache    => 0

julia> VectorizationBase.TOPOLOGY
D0: L0 P0 Machine  
    D1: L0 P0 Package  
        D2: L0 P0 Core  
            D3: L0 P0 PU  
            D3: L1 P1 PU  
        D2: L1 P1 Core  
            D3: L2 P2 PU  
            D3: L3 P3 PU  
        D2: L2 P2 Core  
            D3: L4 P4 PU  
            D3: L5 P5 PU  
        D2: L3 P3 Core  
            D3: L6 P6 PU  
            D3: L7 P7 PU  
        D2: L4 P4 Core  
            D3: L8 P8 PU  
            D3: L9 P9 PU  
        D2: L5 P5 Core  
            D3: L10 P10 PU  
            D3: L11 P11 PU  
        D2: L6 P6 Core  
            D3: L12 P12 PU  
            D3: L13 P13 PU  
        D2: L7 P7 Core  
            D3: L14 P14 PU  
            D3: L15 P15 PU  
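Here hwloc reports no cache objects at all (`:L1Cache => 0`, `CACHE_COUNT = (0, 0, 0, 0)`), so every cache field is `nothing`. A defensive consumer can fall back in layers: first the value hwloc reported, then Linux sysfs, then the 64-byte line size that the generic build script assumes (`CACHELINE_SIZE = 64`). The sketch below uses hypothetical function names. The sysfs path is the standard Linux location, where `index0` is typically the L1 data cache.

```julia
# Fallback matching the generic build script's CACHELINE_SIZE assumption.
const DEFAULT_CACHELINE_SIZE = 64

# Hypothetical helper: read the L1 line size from Linux sysfs; `nothing` if
# the file is missing, unparsable, or reports a non-positive value.
function sysfs_l1_linesize()
    path = "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
    isfile(path) || return nothing
    v = tryparse(Int, strip(read(path, String)))
    return (v === nothing || v <= 0) ? nothing : v
end

# Prefer the reported value, then sysfs, then the default.
cache_linesize(reported) = something(reported, sysfs_l1_linesize(), DEFAULT_CACHELINE_SIZE)
```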

StackOverflowError: vsub(a::UInt128, b::UInt128)

I observed the following error in some recent CI tests using GitHub Actions:

  Got exception outside of a @test
  LoadError: StackOverflowError:
  Stacktrace:
   [1] vsub(a::UInt128, b::UInt128) (repeats 79984 times)
     @ VectorizationBase ~/.julia/packages/VectorizationBase/czbgP/src/llvm_intrin/binary_ops.jl:90

We do not use VectorizationBase directly, only LoopVectorization. Sadly, I can't reproduce this error locally using the same test set. Nevertheless, the following reproduces it in a fresh environment:

julia> using Pkg; Pkg.activate(temp=true); Pkg.add("VectorizationBase")
[...]

  [3d5dd08c] + VectorizationBase v0.19.27
[...]

julia> using VectorizationBase

julia> VectorizationBase.vsub(UInt128(1), UInt128(5))
ERROR: StackOverflowError:
Stacktrace:
 [1] vsub(a::UInt128, b::UInt128) (repeats 79984 times)
   @ VectorizationBase ~/.julia/packages/VectorizationBase/czbgP/src/llvm_intrin/binary_ops.jl:90
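"repeats 79984 times" on a single frame means `vsub(::UInt128, ::UInt128)` keeps calling itself. Whatever the fix, a regression test needs the expected value. For unsigned integers this is wrapping subtraction modulo 2^128. The sketch below is a reference oracle computed with `BigInt`, so it does not depend on the function under test. `reference_sub` is a hypothetical name. Base's own `-` on `UInt128` already wraps, which is why it agrees.

```julia
# Hypothetical oracle: wrapping (mod 2^128) subtraction of UInt128 values,
# computed via BigInt so it is independent of vsub.
function reference_sub(a::UInt128, b::UInt128)
    return UInt128(mod(big(a) - big(b), big(2)^128))
end
```

For the failing call above, `reference_sub(UInt128(1), UInt128(5))` is `typemax(UInt128) - 3`.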

A question regarding vmap!

Hi Chris,

I have a question regarding the use of vmap!: is it possible to use a function involving the ternary operator in vmap!? Below is an example. Thanks very much in advance!

using LoopVectorization
x = rand(10^4); y = rand(10^4); z = similar(x)
f(x, y) = x == y ? 2.0 : 0.0
vmap!(f, z, x, y)
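This sketch assumes vmap! calls `f` on SIMD vectors rather than scalars. In that case `x == y` yields a lane mask, not a `Bool`, and `?:` can only branch on a `Bool`. The usual branch-free replacement is `ifelse`, which evaluates both arms and selects per element. I'm assuming VectorizationBase provides an `ifelse` method for masks. The block runs with Base `map!` so it needs no packages, and `f_branchless` is a hypothetical name.

```julia
# Branch-free version of the ternary: both arms are evaluated and one is
# selected, which is the form that maps onto SIMD select instructions.
f_branchless(x, y) = ifelse(x == y, 2.0, 0.0)

x = rand(10^4); y = rand(10^4); z = similar(x)
x[1:10] .= y[1:10]           # make a few entries equal so both arms occur
map!(f_branchless, z, x, y)  # with LoopVectorization loaded: vmap!(f_branchless, z, x, y)
```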

Definition of `const CACHE_LEVELS` causing precompilation to fail on Manjaro Linux

System information:

Julia Version: 1.5.3 (2020-11-09)
OS: Manjaro Linux x86_64
Host: MacBookPro11,4 1.0
Kernel: 5.10.2-2-MANJARO
CPU: Intel i7-4980HQ (8) @ 4.000GHz
GPU: Intel Crystal Well

I first noticed the issue when trying to precompile DifferentialEquations.jl. I'm not sure whether this is an issue with VectorizationBase.jl or with Hwloc, but the relevant line in /src/topology.jl is as follows:

const CACHE_LEVELS = something(findfirst(isequal(0), CACHE_COUNT) - 1, length(CACHE_COUNT) + 1)

In particular, findfirst(isequal(0), CACHE_COUNT) returns nothing if 0 does not occur in CACHE_COUNT, and subtracting 1 from nothing then throws a MethodError.

Here is the output on my machine after adding the lines

@show CACHE_COUNT
@show findfirst(isequal(0), CACHE_COUNT)

to /src/topology.jl:

julia> import VectorizationBase
[ Info: Precompiling VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f]
CACHE_COUNT = (4, 4, 1, 1)
findfirst(isequal(0), CACHE_COUNT) = nothing
ERROR: LoadError: LoadError: MethodError: no method matching -(::Nothing, ::Int64)
Closest candidates are:
  -(::BigInt, ::Union{Int16, Int32, Int64, Int8}) at gmp.jl:532
  -(::Base.CoreLogging.LogLevel, ::Integer) at logging.jl:117
  -(::Missing, ::Number) at missing.jl:115
  ...
Stacktrace:
 [1] top-level scope at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/topology.jl:18
 [2] include(::Function, ::Module, ::String) at ./Base.jl:380
 [3] include at ./Base.jl:368 [inlined]
 [4] include(::String) at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/VectorizationBase.jl:1
 [5] top-level scope at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/VectorizationBase.jl:355
 [6] include(::Function, ::Module, ::String) at ./Base.jl:380
 [7] include(::Module, ::String) at ./Base.jl:368
 [8] top-level scope at none:2
 [9] eval at ./boot.jl:331 [inlined]
 [10] eval(::Expr) at ./client.jl:467
 [11] top-level scope at ./none:3
in expression starting at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/topology.jl:18
in expression starting at /home/dipsticksupreme/.julia/dev/VectorizationBase/src/VectorizationBase.jl:352
ERROR: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /home/dipsticksupreme/.julia/compiled/v1.5/VectorizationBase/Dto5m_zM0pN.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923

Any quick fixes? Please let me know if there's any other information I could provide!
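A possible quick fix, assuming the intended value is "the number of cache levels before the first zero count, or all of them if there is no zero". The `length(CACHE_COUNT) + 1` fallback suggests exactly that. The fix is to apply `something` before subtracting 1, so `nothing - 1` is never evaluated. Here `CACHE_COUNT = (4, 4, 1, 1)` has no zero, which presumably reflects this CPU's eDRAM L4 (Crystal Well), and the result is 4. `cache_levels` is a hypothetical name for the expression.

```julia
# Number of cache levels: index of the first zero count minus one, or all
# levels when no zero is present. `something` runs before the subtraction.
cache_levels(counts) = something(findfirst(isequal(0), counts), length(counts) + 1) - 1

# The corresponding constant would then read:
# const CACHE_LEVELS = something(findfirst(isequal(0), CACHE_COUNT), length(CACHE_COUNT) + 1) - 1
```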

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Win10, error with building VectorizationBase?

The error below happened when I added the package GpABC, which I suppose depends on VectorizationBase. Do you have any suggestions?
PS: My Julia version is v1.4.2; XX in C:\Users\XX\.julia stands for my user name, which is in Chinese.

   Building VectorizationBase → `C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build.log`
┌ Error: Error building `VectorizationBase`:
│ ERROR: LoadError: LoadError: InitError: could not load library "C:\Users\XX\AppData\Local\Programs\Julia\Julia-1.4.2\bin\LLVM.dll"
│ The specified module could not be found.
│ Stacktrace:
│  [1] dlopen at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Libdl\src\Libdl.jl:109 [inlined] (repeats 2 times)
│  [2] (::LLVM.var"#14#cache_fptr!#3")() at C:\Users\XX\.julia\packages\LLVM\KITdB\src\util.jl:103
│  [3] macro expansion at C:\Users\XX\.julia\packages\LLVM\KITdB\src\util.jl:111 [inlined]
│  [4] runtime_version() at C:\Users\XX\.julia\packages\LLVM\KITdB\src\base.jl:9
│  [5] __init__() at C:\Users\XX\.julia\packages\LLVM\KITdB\src\LLVM.jl:77
│  [6] top-level scope at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build.jl:6
│  [7] top-level scope at none:5
│ during initialization of module LLVM
│ in expression starting at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build_x86.jl:1
│ in expression starting at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\deps\build.jl:4
└ @ Pkg.Operations D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Pkg\src\Operations.jl:899
   Building Conda ────────────→ `C:\Users\XX\.julia\packages\Conda\3rPhK\deps\build.log`
   Building FFTW ─────────────→ `C:\Users\XX\.julia\packages\FFTW\kcXL6\deps\build.log`
   Building SLEEFPirates ─────→ `C:\Users\XX\.julia\packages\SLEEFPirates\kmfoV\deps\build.log`
┌ Error: Error building `SLEEFPirates`:
│ ERROR: LoadError: "File C:\\Users\\XX\\.julia\\packages\\VectorizationBase\\LiMxH\\src\\cpu_info.jl does not exist. Please run `using Pkg; Pkg.build()`."
│ Stacktrace:
│  [1] top-level scope at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\src\VectorizationBase.jl:3
│  [2] top-level scope at none:2
│  [3] eval at .\boot.jl:331 [inlined]
│ in expression starting at C:\Users\XX\.julia\packages\VectorizationBase\LiMxH\src\VectorizationBase.jl:3
│ ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to C:\Users\XX\.julia\compiled\v1.4\VectorizationBase\Dto5m_13fSc.ji.
│ Stacktrace:
│  [1] top-level scope at none:5
│ in expression starting at C:\Users\XX\.julia\packages\SLEEFPirates\kmfoV\deps\build.jl:1
└ @ Pkg.Operations D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.4\Pkg\src\Operations.jl:899

Precompiling VectorizationBase errors in Ubuntu, julia 1.7.2

Hi! I get a precompilation error when importing VectorizationBase. I originally hit it while trying to install DifferentialEquations.jl, but I can reproduce it just by adding VectorizationBase, with either Julia 1.7.2 or 1.6.5 on Ubuntu. Below is the stack trace for Julia 1.7.2 (1.6.5 gives the same):

julia> using VectorizationBase
[ Info: Precompiling VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f]
ERROR: LoadError: UndefVarError: num_cache_levels not defined
Stacktrace:
 [1] include
   @ ./Base.jl:418 [inlined]
 [2] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
   @ Base ./loading.jl:1318
 [3] top-level scope
   @ none:1
 [4] eval
   @ ./boot.jl:373 [inlined]
 [5] eval(x::Expr)
   @ Base.MainInclude ./client.jl:453
 [6] top-level scope
   @ none:1
in expression starting at /home/ismael/.julia/packages/VectorizationBase/yDGcX/src/VectorizationBase.jl:1
ERROR: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /home/ismael/.julia/compiled/v1.7/VectorizationBase/jl_voXyGg.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
   @ Base ./loading.jl:1466
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1410
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1120
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:1013
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:997

Any ideas? Thanks in advance!

Precompilation breaks for non-native target

Hi. I'm trying to do something unusual and I'm not really sure how it works, but this package seems to be the only one where an issue appears, so I hope to at least learn a bit more about how things work.

I have a Julia application that runs in the cloud. My local development machine does not have the same Sys.CPU_NAME as the remote machine. I believe that, as a result, even though my app is precompiled in a Docker container, it has to be precompiled again when it is deployed.

I was hoping that starting Julia with julia -C core-avx2, or something similar, would let me circumvent the issue easily. When I try that, though, I run into the error below during precompilation. Is there any workaround, or is this not really reasonable?

ERROR: LoadError: InitError: Evaluation into the closed module `HostCPUFeatures` breaks incremental compilation because the side effects will not be permanent. This is likely due to some other module mutating `HostCPUFeatures` with `eval` during precompilation - don't do this.
Stacktrace:
  [1] eval
    @ ./boot.jl:370 [inlined]
  [2] setfeaturefalse(s::Symbol)
    @ HostCPUFeatures ~/.julia/packages/HostCPUFeatures/9sAqs/src/cpu_info_x86.jl:36
  [3] make_generic(target::String)
    @ HostCPUFeatures ~/.julia/packages/HostCPUFeatures/9sAqs/src/cpu_info_x86.jl:73
  [4] __init__()
    @ HostCPUFeatures ~/.julia/packages/HostCPUFeatures/9sAqs/src/HostCPUFeatures.jl:45
  [5] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1074
  [6] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
    @ Base ./loading.jl:1020
  [7] _tryrequire_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String)
    @ Base ./loading.jl:1407
  [8] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:1781
  [9] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1625
 [10] macro expansion
    @ ./loading.jl:1613 [inlined]
 [11] macro expansion
    @ ./lock.jl:267 [inlined]
 [12] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1576
 [13] include
    @ ./Base.jl:457 [inlined]
 [14] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
    @ Base ./loading.jl:2010
 [15] top-level scope
    @ stdin:2
during initialization of module HostCPUFeatures
in expression starting at /home/user/.julia/packages/VectorizationBase/e4FnQ/src/VectorizationBase.jl:1
in expression starting at stdin:2
ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to "/home/user/.julia/compiled/v1.9/VectorizationBase/jl_JLOYjx".
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool)
    @ Base ./loading.jl:2260
  [3] compilecache
    @ ./loading.jl:2127 [inlined]
  [4] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:1770
  [5] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1625
  [6] macro expansion
    @ ./loading.jl:1613 [inlined]
  [7] macro expansion
    @ ./lock.jl:267 [inlined]
  [8] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1576
  [9] include
    @ ./Base.jl:457 [inlined]
 [10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
    @ Base ./loading.jl:2010
 [11] top-level scope
    @ stdin:2
in expression starting at /home/user/.julia/packages/LoopVectorization/DDH6Z/src/LoopVectorization.jl:1
in expression starting at stdin:2
ERROR: LoadError: Failed to precompile LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890] to "/home/user/.julia/compiled/v1.9/LoopVectorization/jl_gmjndJ".
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
...

Problem statement/MWE of the relocatability issue

I'm trying to come up with an MWE that accurately describes the issue we are running into with relocatability. Here's my first attempt.


The basic idea is that we want global constants that depend on the specific CPU architecture. As an example:

module Foo

import CpuId

struct IntelCpu end
struct OtherCpu end

const CPU_BRAND = if startswith(CpuId.cpubrand(), "Intel(R) ")
    IntelCpu()
else
    OtherCpu()
end

do_stuff() = do_stuff(CPU_BRAND)
do_stuff(::IntelCpu) = 1
do_stuff(::OtherCpu) = 1.0 

end # module

Unfortunately, the above code is not relocatable. If I compile my package Foo.jl into a sysimage or app on a computer with an Intel CPU and then move that sysimage or app to a computer with a non-Intel CPU, bad things will happen: CPU_BRAND still holds the value computed on the build machine.

So I figure that because the information I need about the CPU architecture is not available until runtime, I should move that logic into __init__. So I try this instead:

module Foo

import CpuId

struct IntelCpu end
struct OtherCpu end

do_stuff() = do_stuff(CPU_BRAND)
do_stuff(::IntelCpu) = 1
do_stuff(::OtherCpu) = 1.0 

function __init__()
    if startswith(CpuId.cpubrand(), "Intel(R) ")
        @eval const CPU_BRAND = IntelCpu()
    else
        @eval const CPU_BRAND = OtherCpu()
    end
    return nothing
end

end # module

Unfortunately, this will break precompilation. If I have a package Bar.jl that depends on Foo.jl, e.g. this:

module Bar

import Foo

end # module

When I try to do import Bar, I get this error:

julia> import Bar
[ Info: Precompiling Bar [f4235cf3-1c45-4253-b7f4-6bb3fb59c5c4]
ERROR: LoadError: InitError: Evaluation into the closed module `Foo` breaks incremental compilation because the side effects will not be permanent. This is likely due to some other module mutating `Foo` with `eval` during precompilation - don't do this.
Stacktrace:
  [1] eval
    @ ./boot.jl:369 [inlined]
  [2] __init__()
    @ Foo ~/Downloads/MWE-eval/Foo.jl/src/Foo.jl:10
  [3] _include_from_serialized(path::String, depmods::Vector{Any})
    @ Base ./loading.jl:670
  [4] _require_from_serialized(path::String)
    @ Base ./loading.jl:723
  [5] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1027
  [6] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:910
  [7] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:897
  [8] include
    @ ./Base.jl:386 [inlined]
  [9] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1209
 [10] top-level scope
    @ none:1
 [11] eval
    @ ./boot.jl:369 [inlined]
 [12] eval(x::Expr)
    @ Base.MainInclude ./client.jl:453
 [13] top-level scope
    @ none:1
during initialization of module Foo
in expression starting at /Users/dilum/Downloads/MWE-eval/Bar.jl/src/Bar.jl:1
ERROR: Failed to precompile Bar [f4235cf3-1c45-4253-b7f4-6bb3fb59c5c4] to /Users/dilum/.julia/compiled/v1.7/Bar/jl_Q6bC82.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY)
   @ Base ./loading.jl:1356
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1302
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1017
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:910
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:897

Is this a correct description of what we are trying to accomplish here? That is, we need global constants like CPU_BRAND, but we can't define them until runtime, and in fact their types are not known until runtime?
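For comparison, here is a minimal sketch of a common alternative to `@eval const` inside `__init__`: keep a `const` binding to a `Ref` and fill in its contents at load time, which does not evaluate new code into the closed module. It assumes `Sys.cpu_info()` as a dependency-free stand-in for `CpuId.cpubrand()`, and the `detect_brand` helper is hypothetical. Note the trade-off: `CPU_BRAND[]` is a runtime value, so `do_stuff()` picks its method at runtime (over a small `Union`) rather than seeing a true compile-time constant, which is exactly the part the question above is about.

```julia
module Foo

struct IntelCpu end
struct OtherCpu end

# A const binding to a mutable container; only its contents change at load time.
const CPU_BRAND = Ref{Union{IntelCpu,OtherCpu}}(OtherCpu())

# Hypothetical helper; Sys.cpu_info() stands in for CpuId.cpubrand() here.
function detect_brand()
    info = Sys.cpu_info()
    model = isempty(info) ? "" : info[1].model
    return startswith(model, "Intel") ? IntelCpu() : OtherCpu()
end

do_stuff() = do_stuff(CPU_BRAND[])   # method chosen from the value stored at load time
do_stuff(::IntelCpu) = 1
do_stuff(::OtherCpu) = 1.0

function __init__()
    # Mutating the Ref's contents is allowed; no new binding is eval'ed into Foo.
    CPU_BRAND[] = detect_brand()
    return nothing
end

end # module
```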

Docstring for VecUnroll?

I'm looking into providing support for multichannel colors in ImageFiltering. As you may know, JuliaImages provides real RGB types that encode the color of a pixel without adding an array dimension to do it. Naturally, these aren't natively supported by VectorizationBase. Obviously, I can reinterpret(reshape, Float32 #=or whatever=#, img), but everything gets a lot uglier if you have to add array dimensions. I am guessing that VecUnroll is kind of like a SVector, is that right? If so, what do the parameters "mean"? Or if that's not the case, is there a good solution for supporting the equivalent of an NTuple{N,T} where T<:NativeTypes?
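The reinterpret(reshape, ...) workaround mentioned in the question can be sketched as follows, to show what "adding an array dimension" means in practice. `MyRGB` is a hypothetical stand-in for a JuliaImages color type such as `RGB{Float32}`; this does not describe `VecUnroll` or its parameters.

```julia
# Hypothetical stand-in for a JuliaImages color type such as RGB{Float32}.
struct MyRGB{T<:AbstractFloat}
    r::T
    g::T
    b::T
end

img = [MyRGB(Float32(i), Float32(j), 0.5f0) for i in 1:2, j in 1:3]   # 2×3 "image"

# View the 12-byte pixels as 3 Float32 channels: a new leading dimension of
# length 3 is prepended, and no data is copied (requires Julia 1.6 or later).
chan = reinterpret(reshape, Float32, img)

size(chan)       # (3, 2, 3)
chan[:, 1, 2]    # channels of img[1, 2]: Float32[1.0, 2.0, 0.5]
```

Because `chan` is a view, writing to `chan[1, 1, 1]` changes `img[1, 1].r`; the cost, as the question notes, is that every index expression gains a channel dimension.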

Contiguous not defined error when Precompiling

ERROR: LoadError: UndefVarError: Contiguous not defined
Stacktrace:
 [1] include(::Function, ::Module, ::String) at .\Base.jl:380
 [2] include(::Module, ::String) at .\Base.jl:368
 [3] top-level scope at none:2
 [4] eval at .\boot.jl:331 [inlined]
 [5] eval(::Expr) at .\client.jl:467
 [6] top-level scope at .\none:3
in expression starting at C:\Users\kool7\.julia\packages\VectorizationBase\qmYqb\src\VectorizationBase.jl:4

LoadError: type NullAttr has no field size on Apple M1 (ARM) architecture

I am not 100% sure this is the right place, but the same tests that run fine on my older machine fail on my new MacBook Pro with the M1 chip (an ARM architecture). The error message reads

ERROR: LoadError: LoadError: type NullAttr has no field size
Stacktrace:
 [1] getproperty(::Hwloc.NullAttr, ::Symbol) at ./Base.jl:33
 [2] top-level scope at /Users/ronny/.julia/packages/VectorizationBase/26Yla/src/topology.jl:8
 [3] include(::Function, ::Module, ::String) at ./Base.jl:380
 [4] include at ./Base.jl:368 [inlined]
 [5] include(::String) at /Users/ronny/.julia/packages/VectorizationBase/26Yla/src/VectorizationBase.jl:1
 [6] top-level scope at /Users/ronny/.julia/packages/VectorizationBase/26Yla/src/VectorizationBase.jl:270
 [7] include(::Function, ::Module, ::String) at ./Base.jl:380
 [8] include(::Module, ::String) at ./Base.jl:368
 [9] top-level scope at none:2
 [10] eval at ./boot.jl:331 [inlined]
 [11] eval(::Expr) at ./client.jl:467
 [12] top-level scope at ./none:3

i.e. I have

ERROR: LoadError: Failed to precompile VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f] to /Users/ronny/.julia/compiled/v1.5/VectorizationBase/Dto5m_1dTjA.ji.

This happens when trying to precompile DiffEqBase (6.44.3). To reproduce it, run ] add DiffEqBase and then using DiffEqBase.

If I can provide any further information, let me know.

Edit: Sorry, I only checked the README a little late. Maybe the question should also be: do you plan to support non-x86 architectures?

Cut VectorizationBase into pieces

The goal of this is to improve compile and load times.
It would also help libraries that want to take on smaller dependencies.

Additionally, LoopVectorization itself might no longer depend on llvmcall at all in the future, but may still want other parts.

The pieces:

  • LLVMCalls
  • Hardware info
  • StridedPointers
  • VectorizationBase

no method matching VectorizationBase.Pointer{Float64}

@chriselrod, thanks for the quick response on the PR just now.

I'm getting an error that (I think?) is coming from VectorizationBase.jl:

Custom logsumexp functions: Error During Test at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:41
  Test threw exception
  Expression: bmark ≈ logsumexp!(z1, x)
  MethodError: no method matching VectorizationBase.Pointer{Float64}(::Ptr{Float64})
  Stacktrace:
   [1] macro expansion at D:\libraries\julia\packages\VectorizationBase\KoDSv\src\vectorizable.jl:305 [inlined]
   [2] vectorizable at D:\libraries\julia\packages\VectorizationBase\KoDSv\src\vectorizable.jl:297 [inlined]
   [3] macro expansion at .\gcutils.jl:189 [inlined]
   [4] macro expansion at D:\libraries\julia\packages\LoopVectorization\QozFw\src\LoopVectorization.jl:333 [inlined]
   [5] macro expansion at D:\libraries\julia\dev\ShaleDrillingLikelihood\src\utilities\sum-functions.jl:94 [inlined]
   [6] logsumexp!(::Array{Float64,1}, ::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}) at D:\libraries\julia\dev\ShaleDrillingLikelihood\src\utilities\sum-functions.jl:84
   [7] top-level scope at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:41
   [8] top-level scope at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Test\src\Test.jl:1113
   [9] top-level scope at D:\libraries\julia\dev\ShaleDrillingLikelihood\test\utilities\sum-functions.jl:33


The original code is:

"""
    logsumexp!(r, x)

Compute `r` = softmax(x) and return `logsumexp(x)`.

Based on code from
https://arxiv.org/pdf/1412.8695.pdf eq 3.8 for p(y)
https://discourse.julialang.org/t/fast-logsumexp/22827/7?u=baggepinnen for stable logsumexp
"""
@generated function logsumexp!(r::AbstractArray{T}, x::AbstractArray{T}) where {T}
    quote
        n = length(x)
        length(r) == n || throw(DimensionMismatch())
        isempty(x) && return -T(Inf)
        1 == stride1(r) == stride1(x) || throw(error("Arrays not strided"))

        u = maximum(x)                                       # max value used to re-center
        abs(u) == Inf && return any(isnan, x) ? T(NaN) : u   # check for non-finite values

        s = zero(T)
        @vectorize $T for i = 1:n
            tmp = exp(x[i] - u)
            r[i] = tmp
            s += tmp
        end

        invs = inv(s)
        r .*= invs

        return log1p(s-1) + u
    end
end

See screenshot

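For cross-checking results, the same algorithm the docstring describes (re-center on the maximum, accumulate `exp(x[i] - u)`, then normalize) can be sketched in plain Julia without LoopVectorization or VectorizationBase. This is only a reference sketch, not a fix for the `Pointer` MethodError, and the name `logsumexp_plain!` is hypothetical.

```julia
"""
    logsumexp_plain!(r, x)

Hypothetical plain-Julia reference: store softmax(x) in `r`, return logsumexp(x).
"""
function logsumexp_plain!(r::AbstractVector{T}, x::AbstractVector{T}) where {T<:AbstractFloat}
    n = length(x)
    length(r) == n || throw(DimensionMismatch("r and x must have the same length"))
    isempty(x) && return -T(Inf)

    u = maximum(x)                                    # max value used to re-center
    isinf(u) && return any(isnan, x) ? T(NaN) : u     # same non-finite check as the original

    s = zero(T)
    @inbounds @simd for i in eachindex(r, x)
        tmp = exp(x[i] - u)                           # every term is <= 1, so no overflow
        r[i] = tmp
        s += tmp
    end

    r .*= inv(s)                                      # normalize to get softmax(x)
    return log1p(s - 1) + u                           # s >= 1, since the max term contributes 1
end
```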

cpu_info_generic.jl seems broken on ARM

On aarch64 (NVIDIA Jetson Xavier NX), I get

julia> using VectorizationBase
[ Info: Precompiling VectorizationBase [3d5dd08c-fd9d-11e8-17fa-ed2836048c2f]
ERROR: LoadError: LoadError: syntax: "$" expression outside quote around /home/user/.julia/packages/VectorizationBase/GlxB2/src/cpu_info_generic.jl:5

VectorizationBase.jl breaks StaticArrays.jl

julia> using Pkg; Pkg.activate(temp=true); Pkg.add(["StaticArrays", "VectorizationBase"])
  Activating new project at `/tmp/jl_es7LWm`
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
    Updating `/tmp/jl_es7LWm/Project.toml`
  [90137ffa] + StaticArrays v1.5.2
  [3d5dd08c] + VectorizationBase v0.21.45
...

julia> using StaticArrays

julia> mutable struct HoldsAnSVector
           x::SVector{1, Float64}
       end

julia> foo = HoldsAnSVector(SVector(1.0))
HoldsAnSVector([1.0])

julia> foo.x = [2.0]
1-element Vector{Float64}:
 2.0

julia> using VectorizationBase

julia> foo.x = [3.0]
ERROR: MethodError: no method matching _offset_index(::Tuple{}, ::Tuple{StaticInt{1}})
Closest candidates are:
  _offset_index(::Tuple{}, ::Tuple{}) at ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:17
  _offset_index(::Tuple{I1}, ::Tuple{I2}) where {I1, I2} at ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:22
  _offset_index(::Tuple{I1, I2, Vararg}, ::Tuple{I3}) where {I1, I2, I3} at ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:20
Stacktrace:
 [1] offset_index
   @ ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:31 [inlined]
 [2] linear_index
   @ ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:32 [inlined]
 [3] _vload
   @ ~/.julia/packages/VectorizationBase/sAHNI/src/strided_pointers/stridedpointers.jl:41 [inlined]
 [4] vload
   @ ~/.julia/packages/VectorizationBase/sAHNI/src/llvm_intrin/memory_addr.jl:970 [inlined]
 [5] getindex
   @ ~/.julia/packages/VectorizationBase/sAHNI/src/special/misc.jl:197 [inlined]
 [6] unroll_tuple(a::Vector{Float64}, #unused#::Length{1})
   @ StaticArrays ~/.julia/packages/StaticArrays/8Dz3j/src/convert.jl:206
 [7] convert
   @ ~/.julia/packages/StaticArrays/8Dz3j/src/convert.jl:199 [inlined]
 [8] setproperty!(x::HoldsAnSVector, f::Symbol, v::Vector{Float64})
   @ Base ./Base.jl:43
 [9] top-level scope
   @ REPL[7]:1

This causes the CI failures in trixi-framework/Trixi.jl#1202 and trixi-framework/Trixi.jl#1149

InitError when porting precompiled module

In all versions of VectorizationBase since 0.20.13 I get the following InitError when porting the precompiled module between different machines:

ERROR: InitError: TypeError: non-boolean (Nothing) used in boolean context
Stacktrace:
 [1] _define_cache(N::Int64, c::NamedTuple{(:size, :linesize, :associativity, :type, :inclusive), Tuple{Int64, Int64, Nothing, Nothing, Nothing}})
   @ VectorizationBase ~/.julia/packages/VectorizationBase/VXa02/src/topology.jl:194
 [2] redefine_cache(N::Int64)
   @ VectorizationBase ~/.julia/packages/VectorizationBase/VXa02/src/topology.jl:220
 [3] foreach
   @ ./abstractarray.jl:2141 [inlined]
 [4] __init__()
   @ VectorizationBase ~/.julia/packages/VectorizationBase/VXa02/src/VectorizationBase.jl:391
 [5] _include_from_serialized(path::String, depmods::Vector{Any})
   @ Base ./loading.jl:674
 [6] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
   @ Base ./loading.jl:760
 [7] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:998
 [8] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:914
 [9] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:901
during initialization of module VectorizationBase

If you need more information on the hardware, just let me know.

Mask{UInt8}(::UInt8) throws a MethodError: no method matching isless(::Val{UInt8}, ::Int64)

I'm seeing MethodError: no method matching isless(::Val{UInt8}, ::Int64) coming from line 167 of llvm_intrin/masks.jl in VectorizationBase. For example (from https://github.com/mcabbott/Tullio.jl/runs/1542174891):

  MethodError: no method matching isless(::Val{UInt8}, ::Int64)
  Closest candidates are:
    isless(!Matched::Missing, ::Any) at missing.jl:87
    isless(!Matched::AbstractFloat, ::Real) at operators.jl:167
    isless(!Matched::ForwardDiff.Dual{Tx,V,N} where N where V, ::Integer) where Tx at /home/runner/.julia/packages/ForwardDiff/qTmqf/src/dual.jl:139
    ...
  Stacktrace:
   [1] <(::Val{UInt8}, ::Int64) at ./operators.jl:277
   [2] <=(::Val{UInt8}, ::Int64) at ./operators.jl:326
   [3] mask_type(::Val{UInt8}) at /home/runner/.julia/packages/VectorizationBase/lAowq/src/llvm_intrin/masks.jl:167
   [4] Mask at /home/runner/.julia/packages/VectorizationBase/lAowq/src/VectorizationBase.jl:145 [inlined]
   [5] Mask{UInt8,U} where U<:Unsigned(::UInt8) at /home/runner/.julia/packages/VectorizationBase/lAowq/src/VectorizationBase.jl:150
   [6] top-level scope at /home/runner/work/Tullio.jl/Tullio.jl/test/runtests.jl:217
   [7] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
   [8] top-level scope at /home/runner/work/Tullio.jl/Tullio.jl/test/runtests.jl:217
   [9] include(::String) at ./client.jl:457
   [10] top-level scope at none:6
   [11] eval(::Module, ::Any) at ./boot.jl:331
   [12] exec_options(::Base.JLOptions) at ./client.jl:272
   [13] _start() at ./client.jl:506

This was with:

  [bdcacae8] LoopVectorization v0.9.6
  [3d5dd08c] VectorizationBase v0.13.10

Define `VectorizationBase.CACHE_COUNT`, etc. in the module `__init__()` function

Currently, the global constants that are related to CPU topology (VectorizationBase.CACHE_COUNT, etc.) are defined at module precompilation time. See e.g. https://github.com/chriselrod/VectorizationBase.jl/blob/master/src/topology.jl

Unfortunately, this makes it very difficult for us to allow users to override the values of these constants at runtime, which is required for, e.g., the implementation of JuliaLinearAlgebra/Octavian.jl#7

I would describe the implementation of JuliaLinearAlgebra/Octavian.jl#7 as two steps:

  1. Define VectorizationBase.CACHE_COUNT and other such global constants at import-time by defining them in the VectorizationBase.__init__() function.
  2. Add the feature that allows users to spoof CPU features

@chriselrod If you can implement task 1, then I can implement task 2.
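A minimal sketch of the two steps, under stated assumptions: the module `TopologySketch`, the fixed `detect_cache_count()` result, and the `TOPOLOGYSKETCH_CACHE_COUNT` environment variable are all hypothetical, and a real implementation would query the hardware instead. A value stored in a `Ref` is only known at runtime, so code that needs it at compile time (for example inside a `@generated` function) cannot specialize on it; this sketch does not address that trade-off.

```julia
module TopologySketch

# Hypothetical stand-in for VectorizationBase.CACHE_COUNT: a Ref instead of a
# constant baked in at precompile time, so the value is decided at each load.
const CACHE_COUNT = Ref{Int}(0)

# Hypothetical detection; a real package would query the CPU topology here.
detect_cache_count() = 3

function __init__()
    # Step 1: compute the value at import time.
    # Step 2: let users spoof it, here via a hypothetical environment variable.
    override = get(ENV, "TOPOLOGYSKETCH_CACHE_COUNT", "")
    CACHE_COUNT[] = isempty(override) ? detect_cache_count() : parse(Int, override)
    return nothing
end

cache_count() = CACHE_COUNT[]

end # module
```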
