Comments (20)
which may explain why I'm seeing those invalidations and others may not?
You shouldn't when starting Julia with -t4.
If we want to fix these, the solution is to have those libraries / functions stop using num_threads().
I don't think the invalidations of num_threads are avoidable if we want it to maintain the current behavior.
from cpusummary.jl.
Again, I don't think this is an issue of primary importance, but it does seem worth keeping track of for a rainy day. The perform_step! recompilation is ~0.5s, so not entirely cheap (but not catastrophic either).
I strongly suspect this is enough to favor Threads.nthreads() there.
If you or @ChrisRackauckas have a benchmark I can run to confirm negligible runtime difference, I'll do that. I'll also try a few microbenchmarks.
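On the microbenchmark side, a minimal sketch (using BenchmarkTools; the function names here are invented purely to isolate the query cost) could be:

```julia
# Sketch: compare querying the thread count at runtime against a value
# baked in as a constant (roughly what a StaticInt-returning
# num_threads() buys you). Names are made up for illustration.
using BenchmarkTools

const NT = Threads.nthreads()        # fixed for the session

runtime_query() = Threads.nthreads() # looked up on each call
baked_in() = NT                      # constant-folded by the compiler

@btime runtime_query();
@btime baked_in();
```

Both should be only a few nanoseconds; the question is whether anything downstream relies on the result being a compile-time constant.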
Ah, yes. It doesn't actually use Sys.CPU_THREADS, instead preferring to use information from Hwloc (with adjustments for ARM Macs -- I think Hwloc might have more details to handle things better).
Presumably you won't get invalidations when starting with 12 threads.
Perhaps, in the case of disagreement, I should favor Sys.CPU_THREADS over the actual number of threads, assuming the disagreement is because of a deliberate user choice.
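That tie-break could be sketched as follows (the function name is invented; this is just the decision rule stated above, not CPUSummary's code):

```julia
# Hypothetical decision rule: trust the runtime thread count only when it
# matches Sys.CPU_THREADS; on disagreement, assume the user deliberately
# chose a different count and favor Sys.CPU_THREADS.
function preferred_thread_count()
    nt = Threads.nthreads()
    nt == Sys.CPU_THREADS && return nt
    return Sys.CPU_THREADS  # disagreement: favor the hardware count
end
```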
There's https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/.github/workflows/Downstream.yml which is a quick way to set up a bunch of integration tests on subsets of downstream package tests.
https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/test/runtests.jl#L5-L17
It just grabs the group and runs the subset of the tests.
For things like setting the number of threads, can LV basically do the same thing we do with LLVM multiversioning? @turbo could emit a block that starts with

@tturbo or @turbo threads=true could do something like that, but we probably only need the check vs 1.
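The "check vs 1" variant of that dispatch might look like this sketch (the kernel is a stand-in, not what @turbo actually emits):

```julia
# Sketch of a single "check vs 1" branch between a serial and a threaded
# body; everything here is illustrative.
function mysum(x)
    if Threads.nthreads() == 1
        # serial path: no task-spawning overhead
        s = zero(eltype(x))
        @inbounds @simd for i in eachindex(x)
            s += x[i]
        end
        return s
    else
        # threaded path: one task per chunk of the index range
        chunks = Iterators.partition(eachindex(x), cld(length(x), Threads.nthreads()))
        tasks = [Threads.@spawn sum(@view x[c]) for c in chunks]
        return sum(fetch.(tasks))
    end
end
```

The point is that only this one branch depends on the thread count, so nothing needs to specialize on the exact number of threads.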
See the good news in SciML/DifferentialEquations.jl#786 (emergency is over 🙂 ). We should still poke at this, but it isn't urgent.
I did just start Julia with julia -t4 and still saw those invalidations. I'm wondering if there's some compilation-state dependence, and in particular whether it matters whether you build CPUSummary via ] precompile or via using SomePackageThatForcesItToBuild.
With master for Julia and ] dev SnoopCompile SnoopCompileCore (or just "regular" when 2.8 comes out) it's pretty easy to check:
```julia
using SnoopCompileCore
invalidations = @snoopr using OrdinaryDiffEq, ModelingToolkit;
using SnoopCompile
trees = invalidation_trees(invalidations)
```

and then look for a num_threads tree.
Again, I don't think this is an issue of primary importance, but it does seem worth keeping track of for a rainy day. The perform_step! recompilation is ~0.5s, so not entirely cheap (but not catastrophic either).
```julia
using SnoopCompileCore
invalidations = @snoopr using OrdinaryDiffEq, ModelingToolkit;
using SnoopCompile, CPUSummary
trees = invalidation_trees(invalidations);
ctrees = filtermod(CPUSummary, trees)
```
I get

```julia
julia> ctrees = filtermod(CPUSummary, trees)
1-element Vector{SnoopCompile.MethodInvalidations}:
 inserting convert(S::Type{<:Union{Number, T}}, p::MultivariatePolynomials.AbstractPolynomialLike{T}) where T in MultivariatePolynomials at /home/chriselrod/.julia/packages/MultivariatePolynomials/vqcb5/src/conversion.jl:65 invalidated:
   mt_backedges: 1: signature Tuple{typeof(convert), Type{Hwloc.Attribute}, Any} triggered MethodInstance for CPUSummary.safe_topology_load!() (1 children)

julia> Threads.nthreads(), Sys.CPU_THREADS
(8, 8)
```
In another Julia session:

```julia
julia> ctrees = filtermod(CPUSummary, trees)
2-element Vector{SnoopCompile.MethodInvalidations}:
 inserting convert(S::Type{<:Union{Number, T}}, p::MultivariatePolynomials.AbstractPolynomialLike{T}) where T in MultivariatePolynomials at /home/chriselrod/.julia/packages/MultivariatePolynomials/vqcb5/src/conversion.jl:65 invalidated:
   mt_backedges: 1: signature Tuple{typeof(convert), Type{Hwloc.Attribute}, Any} triggered MethodInstance for CPUSummary.safe_topology_load!() (1 children)
 deleting num_threads() in CPUSummary at /home/chriselrod/.julia/packages/CPUSummary/dEmFX/src/topology.jl:42 invalidated:
   backedges: 1: superseding num_threads() in CPUSummary at /home/chriselrod/.julia/packages/CPUSummary/dEmFX/src/topology.jl:42 with MethodInstance for CPUSummary.num_threads() (2 children)

julia> Threads.nthreads(), Sys.CPU_THREADS
(1, 8)

(ode) pkg> st CPUSummary
Status `~/Documents/progwork/julia/env/ode/Project.toml`
  [2a0fbf3d] CPUSummary v0.1.2
```
So it appears to be working as intended for me.
I have

```shell
$ env | grep -i thread
JULIA_CPU_THREADS=4
```

Is that possibly problematic? This is on an Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz (6 physical cores).
At least on nightly, and when starting with -t4, the only thing that seems to be holding back good precompilation of LV-generated code is the redefinition of cache_size:
Line 62 in f5315fc
Line 34 in f5315fc
Not easy for me to fix because method redefinition like this is not "typical" (I recognize you do amazing, atypical things) and I don't know the motivations well enough to offer an alternative.
CC @Tokazama
I'd be happy to offer an integration test that checks for new inference when running the precompiled workload for a demo consumer of LoopVectorization. If you want it, just let me know which repo I should submit it to.
Not easy for me to fix because method redefinition like this is not "typical" (I recognize you do amazing, atypical things) and I don't know the motivations well enough to offer an alternative.
We should stop doing that.
I'd be happy to offer an integration test that checks for new inference when running the precompiled workload for a demo consumer of LoopVectorization. If you want it, just let me know which repo I should submit it to.
Which repo do you think would be best? LoopVectorization.jl itself, or something that depends on it like TriangularSolve.jl or RecursiveFactorization.jl?
Probably LV itself. The only issue to be aware of is that tracking down the origin of breakage might require a bit of hunting: if a PR to, say, this package breaks the integration test, then you won't know you've broken it until you next run the tests of LoopVectorization.jl. Unless you like the idea of running that specific test in several of LV's dependencies? You can see something similar to what I mean in CodeTracking, which exists to serve Revise:
- https://github.com/timholy/CodeTracking.jl/blob/426fc0e5af69ca410f3a1b77458db5fc5d68e864/test/runtests.jl#L160
- https://github.com/timholy/CodeTracking.jl/blob/426fc0e5af69ca410f3a1b77458db5fc5d68e864/.github/workflows/ci.yml#L45-L56
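In the same spirit, an inference-regression check might be sketched like this (the workload and threshold are placeholders; a real test would run the precompiled LV demo workload):

```julia
# Hypothetical inference-regression test: snoop on a representative
# workload and assert that little fresh inference occurs. The workload
# and threshold here are made up for illustration.
using SnoopCompileCore
tinf = @snoopi_deep begin
    A = rand(50, 50)
    sum(A * A)   # stand-in workload
end
using SnoopCompile, Test
# flatten(tinf) enumerates every MethodInstance inferred during the block
@test length(flatten(tinf)) <= 50
```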
Apologies I'm not sure what the original motivation for redefining it like this was.
method redefinition
We should stop doing that.
One thing to check: are you aware that you can use the precompilation process to your advantage? Your package can contain

```julia
const some_value_or_type_that_must_be_known_to_inference = begin
    # Some complicated computation, calling lots of functions, which may not be inferrable
end
```

and the only thing that gets written to the .ji cache file is some_value_or_type_that_must_be_known_to_inference itself. In other words, that block only runs at precompile time; it doesn't run when you load the package.
Of course, if you need to do some things in __init__, then this won't help.
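Concretely, that pattern might look like this (the module name and the computed table are made up):

```julia
# Hypothetical package body: the begin...end block runs once, during
# precompilation, and only its *result* is serialized into the .ji file.
module MyPkg

const LOOKUP = begin
    # stand-in for an expensive, possibly non-inferrable computation
    tbl = Dict{Int,Float64}()
    for i in 1:64
        tbl[i] = sin(i) / i
    end
    tbl
end

end # module
```

Loading the package then just deserializes LOOKUP rather than rerunning the loop.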
For things like setting the number of threads, can LV basically do the same thing we do with LLVM multiversioning? @turbo could emit a block that starts with

```julia
if Threads.nthreads() == 1
    # single-threaded implementation
elseif Threads.nthreads() == 6 # my laptop has 6 physical cores
    # 6-thread implementation
else
    @debug "Non-optimized implementation"
    # fallback
end
```
For users who might want to customize the default number (I typically use 4 threads to reserve a couple for something besides Julia) we could use Preferences.
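With Preferences.jl that could look like the following sketch (the "num_threads" key and the module are invented; this has to live inside a package with a UUID for the preference lookup to work):

```julia
# Hypothetical load-time preference for the default thread count.
module ThreadDefaults

using Preferences

# Falls back to the runtime thread count when no preference is set.
const DEFAULT_NTHREADS = @load_preference("num_threads", Threads.nthreads())

end # module

# A user reserving cores for other work could persist e.g. 4 via:
#   using Preferences
#   set_preferences!(ThreadDefaults, "num_threads" => 4)
```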
@ChrisRackauckas, do you have a link to whatever sits on the opposite side of that workflow? It looks useful but I wasn't sure how to trigger it.
https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/test/runtests.jl#L5-L17
It just grabs the group and runs the subset of the tests.
Not easy for me to fix because method redefinition like this is not "typical" (I recognize you do amazing, atypical things) and I don't know the motivations well enough to offer an alternative.
You give me too much credit!
I'd meant to start working on cache-based blocking in LoopVectorization, but started working on the rewrite instead.
This was added for that, under the theory that it's unlikely to change normally.
Then, more recently, I decided to start redefining L3 cache sizes based on how many threads we have, so code using it won't try to use more than its "share".
This causes invalidations, but is maybe helpful for packages like Octavian.
All that said, one fix was to remove it from LoopVectorization:
JuliaSIMD/LoopVectorization.jl@def5ad1
A second fix was to define the cache as cache per core:
e6f6461
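The per-core version amounts to a simple division (the values below are made-up stand-ins for what Hwloc reports):

```julia
# Sketch of "cache per core": report the shared L3 divided by the number
# of physical cores, instead of redefining the total per thread count.
l3_total_bytes = 9 * 2^20   # e.g. a 9 MiB shared L3 (made-up value)
physical_cores = 6          # e.g. from Hwloc (made-up value)

l3_per_core = l3_total_bytes ÷ physical_cores   # each core's "share"
```

Code sizing its blocking to `l3_per_core * threads_in_use` then never claims more than its share, without needing to redefine a constant (and invalidate its callers) when the thread count changes.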
Long term, I'm not overly concerned about this library.
The rewrite will get cache sizes via
https://llvm.org/doxygen/classllvm_1_1TargetTransformInfo.html#a11e8f29aef00ec6b5ffe4bfcc9e965f4
and should hopefully play well with whatever multi-versioning scheme we're using. But we'll see what issues arise when we get there; that's still a long way off at the moment.