I think (but I'm not 100% sure) that what Hwloc calls "Packages" are what we would call sockets.
So I think we should be able to get this info from Hwloc.jl.
from vectorizationbase.jl.
Loading the topology shows this:
julia> t = topology_load()
D0: L0 P0 Machine
D1: L0 P0 Package
D2: L0 P-1 L3Cache Cache{size=14417920,depth=3,linesize=64,associativity=11,type=Unified}
D3: L0 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L0 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L0 P0 Core
D6: L0 P0 PU
D6: L1 P10 PU
D3: L1 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L1 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L1 P1 Core
D6: L2 P1 PU
D6: L3 P11 PU
D3: L2 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L2 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L2 P2 Core
D6: L4 P2 PU
D6: L5 P12 PU
D3: L3 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L3 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L3 P3 Core
D6: L6 P3 PU
D6: L7 P13 PU
D3: L4 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L4 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L4 P4 Core
D6: L8 P4 PU
D6: L9 P14 PU
D3: L5 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L5 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L5 P8 Core
D6: L10 P5 PU
D6: L11 P15 PU
D3: L6 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L6 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L6 P9 Core
D6: L12 P6 PU
D6: L13 P16 PU
D3: L7 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L7 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L7 P10 Core
D6: L14 P7 PU
D6: L15 P17 PU
D3: L8 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L8 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L8 P11 Core
D6: L16 P8 PU
D6: L17 P18 PU
D3: L9 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L9 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L9 P12 Core
D6: L18 P9 PU
D6: L19 P19 PU
This computer (:Machine) has 1 CPU (:Package), and that CPU has 1 L3 cache (:L3Cache):
julia> t.type_
:Machine
julia> length(t.children)
1
julia> t.children[1].type_
:Package
julia> t.children[1].children[1].type_
:L3Cache
If I look at the hierarchy you posted, I see that each package also has a single L3 cache, but that there are 2 L3 caches in total because there are two packages.
I'm not 100% sure either, since "package" could mean something else, but that's how I'd implement a "number of sockets" function.
Of course, on a cluster, Hwloc would show multiple machines as well.
EDIT: @DilumAluthge
A processor Package is the physical package that usually gets inserted into a socket on the motherboard. It is also often called a physical processor or a CPU even if these names bring confusion with respect to cores and processing units. A processor package usually contains multiple cores (and may also be composed of multiple dies). hwloc Package objects were called Sockets up to hwloc 1.10.
https://www.open-mpi.org/projects/hwloc/doc/v2.3.0/a00346.php
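A minimal sketch of the "number of sockets" function described above, written so it runs without Hwloc.jl. TopoNode is a hypothetical stand-in that models only the two fields used in the REPL session (type_ and children), and the toy trees copy just the package/core shape of the two topologies posted in this thread. With Hwloc.jl, the same recursion would be applied to the object returned by topology_load(). Matching :Socket as well as :Package is an assumption based on the hwloc naming change quoted above.

```julia
# Hypothetical stand-in for an Hwloc.jl topology object; only the two
# fields used in the REPL session above are modelled.
struct TopoNode
    type_::Symbol
    children::Vector{TopoNode}
end
TopoNode(type_::Symbol) = TopoNode(type_, TopoNode[])

# Count the nodes anywhere in the tree whose type is one of `types`.
function count_type(node::TopoNode, types)
    n = node.type_ in types ? 1 : 0
    for child in node.children
        n += count_type(child, types)
    end
    return n
end

# Sockets = Package objects. Assumption: a libhwloc older than 1.11 would
# report them as :Socket, so both names are accepted.
num_sockets(root::TopoNode) = count_type(root, (:Package, :Socket))

# Toy trees with only the package/core shape of the two posted topologies.
toy_package(ncores) =
    TopoNode(:Package, [TopoNode(:L3Cache, [TopoNode(:Core) for _ in 1:ncores])])
single = TopoNode(:Machine, [toy_package(10)])                  # 10 cores, 1 package
dual   = TopoNode(:Machine, [toy_package(12), toy_package(12)]) # dual-socket Xeon

println(num_sockets(single))  # 1
println(num_sockets(dual))    # 2
```

Counting Package nodes, rather than L3 caches, is what keeps this from being fooled by CPUs that have several L3 caches per package.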
What's the value of VectorizationBase.CACHE_COUNT[3] on the dual-socket Xeon?
The value of course comes from Hwloc.jl. On its own, though, an L3 count of 2 is not distinguishable from the split L3 cache on many Ryzen/Epyc CPUs.
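To illustrate the ambiguity: a total L3 count of 2 arises both from two sockets with one L3 each and from one package whose L3 is split. A sketch that counts L3 caches per package separates the two cases. It again uses a hypothetical TopoNode stand-in (fields type_ and children, as in the REPL session), and the Ryzen-like tree is a made-up shape, not real Hwloc output.

```julia
# Hypothetical stand-in for an Hwloc.jl topology object.
struct TopoNode
    type_::Symbol
    children::Vector{TopoNode}
end
TopoNode(type_::Symbol) = TopoNode(type_, TopoNode[])

# Collect every node of type `t` in the subtree, depth-first.
function collect_type!(out, node::TopoNode, t::Symbol)
    node.type_ === t && push!(out, node)
    for c in node.children
        collect_type!(out, c, t)
    end
    return out
end
find_type(node, t) = collect_type!(TopoNode[], node, t)

# L3 caches under each package; the machine-wide total alone is ambiguous.
l3_per_package(root) = [length(find_type(p, :L3Cache)) for p in find_type(root, :Package)]

l3(ncores) = TopoNode(:L3Cache, [TopoNode(:Core) for _ in 1:ncores])
dual_socket = TopoNode(:Machine, [TopoNode(:Package, [l3(12)]), TopoNode(:Package, [l3(12)])])
split_l3    = TopoNode(:Machine, [TopoNode(:Package, [l3(4), l3(4)])])  # hypothetical Ryzen-like shape

for (name, t) in (("dual socket", dual_socket), ("split L3", split_l3))
    println(name, ": total L3 = ", length(find_type(t, :L3Cache)),
            ", L3 per package = ", l3_per_package(t))
end
```

Both trees report 2 L3 caches in total, but only the per-package breakdown ([1, 1] versus [2]) tells the two layouts apart.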
julia> VectorizationBase.CACHE_COUNT[3]
2
julia> VectorizationBase.CACHE_COUNT
(24, 24, 2, 0)
julia> Hwloc.topology_load()
D0: L0 P0 Machine
D1: L0 P0 Package
D2: L0 P-1 L3Cache Cache{size=20185088,depth=3,linesize=64,associativity=11,type=Unified}
D3: L0 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L0 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L0 P0 Core
D6: L0 P0 PU
D3: L1 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L1 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L1 P1 Core
D6: L1 P1 PU
D3: L2 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L2 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L2 P3 Core
D6: L2 P2 PU
D3: L3 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L3 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L3 P4 Core
D6: L3 P3 PU
D3: L4 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L4 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L4 P5 Core
D6: L4 P4 PU
D3: L5 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L5 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L5 P6 Core
D6: L5 P5 PU
D3: L6 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L6 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L6 P8 Core
D6: L6 P6 PU
D3: L7 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L7 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L7 P9 Core
D6: L7 P7 PU
D3: L8 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L8 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L8 P10 Core
D6: L8 P8 PU
D3: L9 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L9 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L9 P11 Core
D6: L9 P9 PU
D3: L10 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L10 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L10 P12 Core
D6: L10 P10 PU
D3: L11 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L11 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L11 P13 Core
D6: L11 P11 PU
D1: L1 P1 Package
D2: L1 P-1 L3Cache Cache{size=20185088,depth=3,linesize=64,associativity=11,type=Unified}
D3: L12 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L12 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L12 P0 Core
D6: L12 P12 PU
D3: L13 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L13 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L13 P1 Core
D6: L13 P13 PU
D3: L14 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L14 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L14 P3 Core
D6: L14 P14 PU
D3: L15 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L15 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L15 P4 Core
D6: L15 P15 PU
D3: L16 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L16 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L16 P5 Core
D6: L16 P16 PU
D3: L17 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L17 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L17 P6 Core
D6: L17 P17 PU
D3: L18 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L18 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L18 P8 Core
D6: L18 P18 PU
D3: L19 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L19 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L19 P9 Core
D6: L19 P19 PU
D3: L20 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L20 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L20 P10 Core
D6: L20 P20 PU
D3: L21 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L21 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L21 P11 Core
D6: L21 P21 PU
D3: L22 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L22 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L22 P12 Core
D6: L22 P22 PU
D3: L23 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L23 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L23 P13 Core
D6: L23 P23 PU
And here's the full lscpu output:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 1
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Stepping: 4
CPU MHz: 3299.987
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 5200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 19712K
NUMA node0 CPU(s): 0-11
NUMA node1 CPU(s): 12-23
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp
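lscpu reports the socket count directly, so it makes a useful cross-check on the two Package nodes Hwloc found. Below is a minimal sketch that pulls the relevant fields out of its "Key: value" text. parse_lscpu is a hypothetical helper, and it assumes the English-locale field names shown above. On a live system the text could come from read(`lscpu`, String).

```julia
# Hypothetical helper: split lscpu's "Key: value" lines into a Dict.
# Assumes the English field names shown above (lscpu output is localized).
function parse_lscpu(text::AbstractString)
    fields = Dict{String,String}()
    for line in split(text, '\n')
        i = findfirst(==(':'), line)
        i === nothing && continue                      # skip lines without a key
        key = String(strip(line[1:prevind(line, i)]))  # text before the first ':'
        fields[key] = String(strip(line[i+1:end]))     # text after it
    end
    return fields
end

# A few lines copied from the lscpu output above.
sample = """
CPU(s): 24
Thread(s) per core: 1
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
"""
info = parse_lscpu(sample)
sockets = parse(Int, info["Socket(s)"])
cores = sockets * parse(Int, info["Core(s) per socket"])
threads = cores * parse(Int, info["Thread(s) per core"])
println("sockets = $sockets, cores = $cores, threads = $threads")
```

For this machine that gives 2 sockets, 24 cores and 24 threads, matching both CPU(s): 24 and the 24 PUs in the Hwloc tree.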
Idk if it helps, but here is the output of numastat:
$ numastat
node0 node1
numa_hit 2236145352 1986775692
numa_miss 0 5776120
numa_foreign 5776120 0
interleave_hit 65823 65606
local_node 2236102907 1986719478
other_node 42445 5832334
And here is the output of numactl --hardware:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 96940 MB
node 0 free: 75997 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
node 1 size: 98304 MB
node 1 free: 94762 MB
node distances:
node 0 1
0: 10 21
1: 21 10
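The node 0 and node 1 CPU lists above line up with the two Hwloc packages (PUs 0-11 and 12-23). Below is a sketch that turns the "node N cpus:" lines of numactl --hardware into a CPU-to-node map, for example to pin one group of threads per socket. parse_numa_cpus is a hypothetical helper, and the line format is assumed from the output above.

```julia
# Hypothetical helper: map each NUMA node to its CPU list, using the
# "node N cpus: ..." lines of `numactl --hardware`.
function parse_numa_cpus(text::AbstractString)
    nodes = Dict{Int,Vector{Int}}()
    for line in split(text, '\n')
        m = match(r"^node (\d+) cpus:(.*)$", line)
        m === nothing && continue              # ignore size/free/distance lines
        nodes[parse(Int, m.captures[1])] = [parse(Int, s) for s in split(m.captures[2])]
    end
    return nodes
end

# Lines copied from the numactl --hardware output above.
sample = """
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 96940 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
node 1 size: 98304 MB
"""
nodes = parse_numa_cpus(sample)
# Invert the mapping: CPU id => NUMA node.
cpu_to_node = Dict(cpu => node for (node, cpus) in nodes for cpu in cpus)
println(sort(collect(keys(nodes))))              # [0, 1]
println(cpu_to_node[11], " ", cpu_to_node[12])   # 0 1
```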
hwloc Package objects were called Sockets up to hwloc 1.10.
Perfect!