ThreadingUtilities.jl's People

Contributors

chriselrod, dependabot[bot], dilumaluthge, ranocha


ThreadingUtilities.jl's Issues

Does not play nicely with `Threads.@spawn` or `Threads.@threads`

Two problems:

  1. Possible race conditions. @tkf suggested a good fix here: add a "job started" state. Then, if the job hasn't started, `ThreadingUtilities.wait` can steal the task back (note that the overhead of this will be much higher than that of a proper work-stealing implementation, which would use jumps).
  2. ThreadingUtilities' tasks currently spend some time spinning before going to sleep. This gets in the way of `Threads.@spawn` and `Threads.@threads`, causing bad performance. It would be great if there were something the tasks could check so that they go to sleep immediately whenever someone uses `Threads.@spawn` or `Threads.@threads`.
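The first fix above can be sketched as a small atomic state machine. All names below (`TaskSlot`, `QUEUED`, etc.) are hypothetical illustrations, not ThreadingUtilities' actual internals: the worker CAS-es the slot from QUEUED to STARTED before running, and the waiter CAS-es QUEUED to STOLEN to take the job back.

```julia
# Hypothetical sketch of the "job started" state; names are illustrative,
# not ThreadingUtilities' actual internals.
const QUEUED  = 0
const STARTED = 1
const STOLEN  = 2
const DONE    = 3

mutable struct TaskSlot
    @atomic state::Int
    f::Function
end

# Worker side: run the job only if we win the race to mark it STARTED.
function worker_run!(slot::TaskSlot)
    _, ok = @atomicreplace slot.state QUEUED => STARTED
    ok || return false            # the waiter already stole it (or it ran)
    slot.f()
    @atomic slot.state = DONE
    return true
end

# Waiter side: steal the job if it has not started; otherwise spin until DONE.
function wait_or_steal!(slot::TaskSlot)
    _, ok = @atomicreplace slot.state QUEUED => STOLEN
    if ok
        slot.f()                  # run it on the waiting thread instead
        @atomic slot.state = DONE
    else
        while (@atomic slot.state) != DONE
            GC.safepoint()        # placeholder for a pause/backoff loop
        end
    end
    return nothing
end
```

Either side can win the CAS, so the job runs exactly once; a real implementation would avoid the spin loop and, as noted above, still pays more than a jump-based work stealer.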

overhead of each task still seems large (much better than @threads tho)

So I managed to implement the multithreaded version of subspace_mm: https://gist.github.com/Roger-luo/0748a53b58c55e4187b545632917e54a

It is indeed faster than the single-threaded version now (and much faster than Threads.@threads). However, when I benchmarked each task, there still seems to be significant overhead (roughly 50% of each task's running time). For example, I can manually run just one or a few of the tasks created (my thread pool is of size 12):

initialization
T = Float64
n = 20
S = rand(ComplexF64, 100, 1<<n);
U = rand(ComplexF64, 1<<3, 1<<3);
locs = Locations((1, 3, 5))
subspace = bsubspace(n, locs)
comspace = bcomspace(n, locs)
indices = StrideArray{Int}(undef, (StaticInt{length(comspace)}(), ))
@simd ivdep for i in eachindex(indices)
    indices[i] = comspace[i] + 1
end

D = ArrayInterface.static_length(indices)
U_re = StrideArray{T}(undef, (D, D))
U_im = StrideArray{T}(undef, (D, D))
@inbounds @simd ivdep for i in 1:length(U)
    U_re[i] = real(U[i])
    U_im[i] = imag(U[i])
end

subspace_ref = Ref(subspace)
total = length(subspace)
nthreads = 6
len, rem = divrem(total, nthreads)
f1, l1 = div_thread(1, len, rem)
f2, l2 = div_thread(2, len, rem)
f3, l3 = div_thread(3, len, rem)
f4, l4 = div_thread(4, len, rem)
f5, l5 = div_thread(5, len, rem)
f6, l6 = div_thread(6, len, rem)
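`div_thread` comes from the linked gist; a plausible sketch of the partitioning it performs (an assumption about its behavior, not the gist's actual code) is an even split where the first `rem` threads each take one extra item:

```julia
# Plausible sketch of `div_thread` (an assumption, not the gist's code):
# thread `tid` gets `len` items, with the first `rem` threads taking one
# extra; returns the (first, last) index range for that thread.
function div_thread(tid, len, rem)
    f = (tid - 1) * len + min(tid - 1, rem) + 1
    l = f + len - 1 + (tid <= rem ? 1 : 0)
    return f, l
end
```

Under this assumption the six `(f, l)` ranges tile `1:total` contiguously with no gaps or overlaps.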

@benchmark GC.@preserve S U_re U_im subspace_ref begin
    launch_subspace_mm(1, S, indices, U_re, U_im, subspace_ref, f1, l1)
    launch_subspace_mm(2, S, indices, U_re, U_im, subspace_ref, f2, l2)
    launch_subspace_mm(3, S, indices, U_re, U_im, subspace_ref, f3, l3)
    # launch_subspace_mm(4, S, indices, U_re, U_im, subspace_ref, f4, l4)
    # launch_subspace_mm(5, S, indices, U_re, U_im, subspace_ref, f5, l5)
    # launch_subspace_mm(6, S, indices, U_re, U_im, subspace_ref, f6, l6)
    ThreadingUtilities.wait(1)
    ThreadingUtilities.wait(2)
    ThreadingUtilities.wait(3)
    # ThreadingUtilities.wait(4)
    # ThreadingUtilities.wait(5)
    # ThreadingUtilities.wait(6)
end

One can comment out some of the tasks inside GC.@preserve to see how the overhead scales; I'll post the 6-thread scaling here. When I spawn 6 tasks, the total time is 2x that of a single task. I guess this is due to the overhead of spawning the tasks onto the thread pool, but it looks strange to me that it scales so significantly. I wonder if @chriselrod has any clue about this overhead. Is it due to some sort of cache contention in the implementation itself?

  • 1 launch_subspace_mm:

details
julia> @benchmark GC.@preserve S U_re U_im subspace_ref begin
           launch_subspace_mm(1, S, indices, U_re, U_im, subspace_ref, f1, l1)
           # launch_subspace_mm(2, S, indices, U_re, U_im, subspace_ref, f2, l2)
           # launch_subspace_mm(3, S, indices, U_re, U_im, subspace_ref, f3, l3)
           # launch_subspace_mm(4, S, indices, U_re, U_im, subspace_ref, f4, l4)
           # launch_subspace_mm(5, S, indices, U_re, U_im, subspace_ref, f5, l5)
           # launch_subspace_mm(6, S, indices, U_re, U_im, subspace_ref, f6, l6)
           ThreadingUtilities.wait(1)
           # ThreadingUtilities.wait(2)
           # ThreadingUtilities.wait(3)
           # ThreadingUtilities.wait(4)
           # ThreadingUtilities.wait(5)
           # ThreadingUtilities.wait(6)
       end
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     56.152 ms (0.00% GC)
  median time:      56.464 ms (0.00% GC)
  mean time:        56.552 ms (0.00% GC)
  maximum time:     60.565 ms (0.00% GC)
  --------------
  samples:          89
  evals/sample:     1
  • 2 launch_subspace_mm:
details
julia> @benchmark GC.@preserve S U_re U_im subspace_ref begin
           launch_subspace_mm(1, S, indices, U_re, U_im, subspace_ref, f1, l1)
           launch_subspace_mm(2, S, indices, U_re, U_im, subspace_ref, f2, l2)
           # launch_subspace_mm(3, S, indices, U_re, U_im, subspace_ref, f3, l3)
           # launch_subspace_mm(4, S, indices, U_re, U_im, subspace_ref, f4, l4)
           # launch_subspace_mm(5, S, indices, U_re, U_im, subspace_ref, f5, l5)
           # launch_subspace_mm(6, S, indices, U_re, U_im, subspace_ref, f6, l6)
           ThreadingUtilities.wait(1)
           ThreadingUtilities.wait(2)
           # ThreadingUtilities.wait(3)
           # ThreadingUtilities.wait(4)
           # ThreadingUtilities.wait(5)
           # ThreadingUtilities.wait(6)
       end
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  2
  --------------
  minimum time:     60.429 ms (0.00% GC)
  median time:      61.072 ms (0.00% GC)
  mean time:        61.196 ms (0.00% GC)
  maximum time:     65.528 ms (0.00% GC)
  --------------
  samples:          82
  evals/sample:     1
  • 4 launch_subspace_mm:
details
julia> @benchmark GC.@preserve S U_re U_im subspace_ref begin
           launch_subspace_mm(1, S, indices, U_re, U_im, subspace_ref, f1, l1)
           launch_subspace_mm(2, S, indices, U_re, U_im, subspace_ref, f2, l2)
           launch_subspace_mm(3, S, indices, U_re, U_im, subspace_ref, f3, l3)
           launch_subspace_mm(4, S, indices, U_re, U_im, subspace_ref, f4, l4)
           # launch_subspace_mm(5, S, indices, U_re, U_im, subspace_ref, f5, l5)
           # launch_subspace_mm(6, S, indices, U_re, U_im, subspace_ref, f6, l6)
           ThreadingUtilities.wait(1)
           ThreadingUtilities.wait(2)
           ThreadingUtilities.wait(3)
           ThreadingUtilities.wait(4)
           # ThreadingUtilities.wait(5)
           # ThreadingUtilities.wait(6)
       end
BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  4
  --------------
  minimum time:     85.298 ms (0.00% GC)
  median time:      86.338 ms (0.00% GC)
  mean time:        86.908 ms (0.00% GC)
  maximum time:     95.323 ms (0.00% GC)
  --------------
  samples:          58
  evals/sample:     1
  • 6 launch_subspace_mm:
details
julia> @benchmark GC.@preserve S U_re U_im subspace_ref begin
           launch_subspace_mm(1, S, indices, U_re, U_im, subspace_ref, f1, l1)
           launch_subspace_mm(2, S, indices, U_re, U_im, subspace_ref, f2, l2)
           launch_subspace_mm(3, S, indices, U_re, U_im, subspace_ref, f3, l3)
           launch_subspace_mm(4, S, indices, U_re, U_im, subspace_ref, f4, l4)
           launch_subspace_mm(5, S, indices, U_re, U_im, subspace_ref, f5, l5)
           launch_subspace_mm(6, S, indices, U_re, U_im, subspace_ref, f6, l6)
           ThreadingUtilities.wait(1)
           ThreadingUtilities.wait(2)
           ThreadingUtilities.wait(3)
           ThreadingUtilities.wait(4)
           ThreadingUtilities.wait(5)
           ThreadingUtilities.wait(6)
       end
BenchmarkTools.Trial: 
  memory estimate:  288 bytes
  allocs estimate:  6
  --------------
  minimum time:     122.614 ms (0.00% GC)
  median time:      123.734 ms (0.00% GC)
  mean time:        123.999 ms (0.00% GC)
  maximum time:     126.589 ms (0.00% GC)
  --------------
  samples:          41
  evals/sample:     1
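To quantify the scaling above: each launch covers the same 1/6 slice of the work, so ideal scaling would keep the total time constant, and the parallel efficiency is simply the 1-task minimum time divided by the k-task minimum time:

```julia
# Minimum times (ms) from the benchmarks above; each launched task does the
# same amount of work, so ideal scaling would keep the total time constant.
t = Dict(1 => 56.152, 2 => 60.429, 4 => 85.298, 6 => 122.614)
efficiency(k) = round(t[1] / t[k], digits = 2)
for k in (2, 4, 6)
    println("k = $k tasks: efficiency = ", efficiency(k))
end
```

That gives roughly 0.93, 0.66, and 0.46 for 2, 4, and 6 tasks, i.e. efficiency roughly halves by 6 tasks, matching the "2x of a single task" observation above.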

Missing atomics operation on v1.8

This was introduced on v1.8:

https://buildkite.com/julialang/ordinarydiffeq-dot-jl/builds/1225#0182b0ca-b9b0-4a29-bc10-90bbbab0036d/861-1887

| Got exception outside of a @test
  | MethodError: no method matching _atomic_store!(::Ptr{UInt16}, ::UInt64)
  | Closest candidates are:
  | _atomic_store!(::Ptr{UInt16}, ::UInt16) at /cache/julia-buildkite-plugin/depots/9c4ff4c4-1e2d-49a4-b1ab-2e8221967d27/packages/ThreadingUtilities/k3fsO/src/atomics.jl:10
  | _atomic_store!(::Ptr{UInt64}, ::UInt64) at /cache/julia-buildkite-plugin/depots/9c4ff4c4-1e2d-49a4-b1ab-2e8221967d27/packages/ThreadingUtilities/k3fsO/src/atomics.jl:10
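The error comes from storing a 64-bit value through a `Ptr{UInt16}` when no converting `_atomic_store!` method exists. Below is a minimal reproduction of the type mismatch plus one hedged fix, narrowing the value with `%` before the store; this is an illustration of the class of fix, not the actual ThreadingUtilities patch:

```julia
# Storing a UInt64 through a Ptr{UInt16} needs an explicit narrowing when
# the store function does not convert. Illustration only, not the patch.
r = Ref{UInt16}(0x0000)
GC.@preserve r begin
    p = Base.unsafe_convert(Ptr{UInt16}, r)
    x = UInt64(42)                 # a 64-bit value headed for a 16-bit slot
    unsafe_store!(p, x % UInt16)   # `x % UInt16` truncates to the pointee type
end
@assert r[] == 42
```

The equivalent change at the failing call site would be to truncate (or otherwise convert) the value to the pointee's type before calling `_atomic_store!`.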

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

ThreadingUtilities.jl is currently not relocatable

Consider this line:

https://github.com/chriselrod/ThreadingUtilities.jl/blob/23434f69c8712d2388088a24abcf9b3a071bfa07/src/threadtasks.jl#L6-L6

This line is evaluated at precompile time. Unfortunately, this means that if you precompile ThreadingUtilities.jl on one machine, you cannot then use it on a different machine with a different value for Sys.CPU_THREADS.

In other words, this line makes ThreadingUtilities not relocatable.

So, for example, you cannot use PackageCompiler.jl to make a relocatable app that uses ThreadingUtilities.
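The usual fix for this class of problem (a hedged sketch, not the actual ThreadingUtilities patch) is to leave a placeholder at precompile time and fill in `Sys.CPU_THREADS` from `__init__`, which runs on the machine that loads the package:

```julia
# Sketch of making a CPU-count-dependent global relocatable: the Ref is a
# placeholder baked into the precompile image; __init__ fills it in at load
# time on whatever machine actually runs the code.
module RelocatableExample

const CPU_THREADS = Ref(0)   # placeholder, set at load time

function __init__()
    CPU_THREADS[] = Sys.CPU_THREADS   # evaluated on the *using* machine
end

end # module
```

With this pattern the precompiled image no longer hard-codes the build machine's thread count, so PackageCompiler.jl apps would see the correct value on the target machine.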

Stable docs do not exist

Hi @chriselrod,

Just clicked on the stable docs link in the readme and they seem to not exist. Probably just remove the badge and the link and point to the dev docs instead.

Cheers, Tobi
