Comments (8)

DilumAluthge avatar DilumAluthge commented on August 19, 2024 1

I think (but I'm not 100% sure) that what Hwloc calls "Packages" are what we would call sockets.

So I think we should be able to get this info from Hwloc.jl.

from vectorizationbase.jl.

chriselrod avatar chriselrod commented on August 19, 2024 1

The topology load shows it.

julia> t = topology_load()
D0: L0 P0 Machine
    D1: L0 P0 Package
        D2: L0 P-1 L3Cache  Cache{size=14417920,depth=3,linesize=64,associativity=11,type=Unified}
            D3: L0 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L0 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L0 P0 Core
                        D6: L0 P0 PU
                        D6: L1 P10 PU
            D3: L1 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L1 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L1 P1 Core
                        D6: L2 P1 PU
                        D6: L3 P11 PU
            D3: L2 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L2 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L2 P2 Core
                        D6: L4 P2 PU
                        D6: L5 P12 PU
            D3: L3 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L3 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L3 P3 Core
                        D6: L6 P3 PU
                        D6: L7 P13 PU
            D3: L4 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L4 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L4 P4 Core
                        D6: L8 P4 PU
                        D6: L9 P14 PU
            D3: L5 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L5 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L5 P8 Core
                        D6: L10 P5 PU
                        D6: L11 P15 PU
            D3: L6 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L6 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L6 P9 Core
                        D6: L12 P6 PU
                        D6: L13 P16 PU
            D3: L7 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L7 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L7 P10 Core
                        D6: L14 P7 PU
                        D6: L15 P17 PU
            D3: L8 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L8 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L8 P11 Core
                        D6: L16 P8 PU
                        D6: L17 P18 PU
            D3: L9 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L9 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L9 P12 Core
                        D6: L18 P9 PU
                        D6: L19 P19 PU


julia> t.type_
:Machine

This computer (:Machine)

julia> length(t.children)
1

julia> t.children[1].type_
:Package

has 1 CPU (:Package). That CPU has

julia> t.children[1].children[1].type_
:L3Cache

1 :L3Cache.

If I look at the hierarchy you posted, I see that the packages also have a single L3 cache, but that there are 2 L3 caches total because there are two packages.


chriselrod avatar chriselrod commented on August 19, 2024 1

I'm not 100% sure either, in that maybe packages could mean something else, but that's how I'd implement a "number of sockets" function.
Of course, on a cluster, Hwloc would show multiple machines as well.

EDIT: @DilumAluthge

"A processor Package is the physical package that usually gets inserted into a socket on the motherboard. It is also often called a physical processor or a CPU even if these names bring confusion with respect to cores and processing units. A processor package usually contains multiple cores (and may also be composed of multiple dies). hwloc Package objects were called Sockets up to hwloc 1.10."

https://www.open-mpi.org/projects/hwloc/doc/v2.3.0/a00346.php
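
A minimal sketch of the "number of sockets" idea discussed above: walk the topology tree and count the objects whose type is :Package. Only the `type_` and `children` fields shown in the REPL output are used. `TopoNode`, `count_type`, and `num_sockets` are hypothetical names introduced for illustration; `TopoNode` is a stand-in so the example runs without Hwloc.jl, and the helper is assumed (not verified here) to work the same way on the object returned by `topology_load()`.

```julia
# Stand-in for a topology object; it models only the two fields
# (`type_` and `children`) that appear in the thread.
struct TopoNode
    type_::Symbol
    children::Vector{TopoNode}
end
TopoNode(type_::Symbol) = TopoNode(type_, TopoNode[])

# Hypothetical helper: count every object of type `T` in the tree rooted at `obj`.
function count_type(obj, T::Symbol)
    n = obj.type_ === T ? 1 : 0
    for child in obj.children
        n += count_type(child, T)
    end
    return n
end

# Hypothetical "number of sockets" function: sockets are hwloc Packages.
num_sockets(obj) = count_type(obj, :Package)

# Mock of a dual-socket machine: two Packages, each with a single L3Cache.
package() = TopoNode(:Package, [TopoNode(:L3Cache)])
machine = TopoNode(:Machine, [package(), package()])

println(num_sockets(machine))  # prints 2
```

Counting recursively, rather than only looking at `t.children`, is a deliberate choice: it does not assume that Packages are always direct children of the Machine object.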


chriselrod avatar chriselrod commented on August 19, 2024

What's the value of VectorizationBase.CACHE_COUNT[3] on the dual-socket Xeon?
The value of course comes from Hwloc.jl. On its own, though, an L3 count greater than 1 can't distinguish multiple sockets from a single package with a split L3 cache, as on many Ryzen/Epyc CPUs.
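
To illustrate why the L3 count alone is ambiguous, here is a small sketch that compares two mock topologies: a dual-socket machine with one L3 per package, and a single package with two L3 caches. Both trees are hand-built stand-ins, not real hwloc output (a real split-L3 CPU may show additional levels), and `TopoNode`, `count_type`, and `summarize` are hypothetical names used only for this example.

```julia
# Stand-in for a topology object with the `type_` and `children` fields.
struct TopoNode
    type_::Symbol
    children::Vector{TopoNode}
end
TopoNode(type_::Symbol) = TopoNode(type_, TopoNode[])

# Hypothetical helper: count every object of type `T` in the tree.
function count_type(obj, T::Symbol)
    n = obj.type_ === T ? 1 : 0
    for child in obj.children
        n += count_type(child, T)
    end
    return n
end

# Report both counts so the two cases can be told apart.
summarize(obj) = (packages = count_type(obj, :Package),
                  l3caches = count_type(obj, :L3Cache))

# Mock 1: two sockets, one L3 each.
dual_socket = TopoNode(:Machine, [
    TopoNode(:Package, [TopoNode(:L3Cache)]),
    TopoNode(:Package, [TopoNode(:L3Cache)]),
])

# Mock 2: one socket whose L3 is split in two.
split_l3 = TopoNode(:Machine, [
    TopoNode(:Package, [TopoNode(:L3Cache), TopoNode(:L3Cache)]),
])

println(summarize(dual_socket))  # (packages = 2, l3caches = 2)
println(summarize(split_l3))     # (packages = 1, l3caches = 2)
```

Both mocks report two L3 caches, so only the Package count separates them.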


DilumAluthge avatar DilumAluthge commented on August 19, 2024
julia> VectorizationBase.CACHE_COUNT[3]
2

julia> VectorizationBase.CACHE_COUNT
(24, 24, 2, 0)

julia> Hwloc.topology_load()
D0: L0 P0 Machine
    D1: L0 P0 Package
        D2: L0 P-1 L3Cache  Cache{size=20185088,depth=3,linesize=64,associativity=11,type=Unified}
            D3: L0 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L0 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L0 P0 Core
                        D6: L0 P0 PU
            D3: L1 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L1 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L1 P1 Core
                        D6: L1 P1 PU
            D3: L2 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L2 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L2 P3 Core
                        D6: L2 P2 PU
            D3: L3 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L3 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L3 P4 Core
                        D6: L3 P3 PU
            D3: L4 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L4 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L4 P5 Core
                        D6: L4 P4 PU
            D3: L5 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L5 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L5 P6 Core
                        D6: L5 P5 PU
            D3: L6 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L6 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L6 P8 Core
                        D6: L6 P6 PU
            D3: L7 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L7 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L7 P9 Core
                        D6: L7 P7 PU
            D3: L8 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L8 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L8 P10 Core
                        D6: L8 P8 PU
            D3: L9 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L9 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L9 P11 Core
                        D6: L9 P9 PU
            D3: L10 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L10 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L10 P12 Core
                        D6: L10 P10 PU
            D3: L11 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L11 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L11 P13 Core
                        D6: L11 P11 PU
    D1: L1 P1 Package
        D2: L1 P-1 L3Cache  Cache{size=20185088,depth=3,linesize=64,associativity=11,type=Unified}
            D3: L12 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L12 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L12 P0 Core
                        D6: L12 P12 PU
            D3: L13 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L13 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L13 P1 Core
                        D6: L13 P13 PU
            D3: L14 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L14 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L14 P3 Core
                        D6: L14 P14 PU
            D3: L15 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L15 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L15 P4 Core
                        D6: L15 P15 PU
            D3: L16 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L16 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L16 P5 Core
                        D6: L16 P16 PU
            D3: L17 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L17 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L17 P6 Core
                        D6: L17 P17 PU
            D3: L18 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L18 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L18 P8 Core
                        D6: L18 P18 PU
            D3: L19 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L19 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L19 P9 Core
                        D6: L19 P19 PU
            D3: L20 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L20 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L20 P10 Core
                        D6: L20 P20 PU
            D3: L21 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L21 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L21 P11 Core
                        D6: L21 P21 PU
            D3: L22 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L22 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L22 P12 Core
                        D6: L22 P22 PU
            D3: L23 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L23 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L23 P13 Core
                        D6: L23 P23 PU


DilumAluthge avatar DilumAluthge commented on August 19, 2024

And here's lscpu in full:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Stepping:              4
CPU MHz:               3299.987
CPU max MHz:           3700.0000
CPU min MHz:           1000.0000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              19712K
NUMA node0 CPU(s):     0-11
NUMA node1 CPU(s):     12-23
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp


DilumAluthge avatar DilumAluthge commented on August 19, 2024

Idk if it helps, but here is the output of numastat:

$ numastat
                           node0           node1
numa_hit              2236145352      1986775692
numa_miss                      0         5776120
numa_foreign             5776120               0
interleave_hit             65823           65606
local_node            2236102907      1986719478
other_node                 42445         5832334

And here is the output of numactl --hardware:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 96940 MB
node 0 free: 75997 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23
node 1 size: 98304 MB
node 1 free: 94762 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10


DilumAluthge avatar DilumAluthge commented on August 19, 2024

"hwloc Package objects were called Sockets up to hwloc 1.10."

Perfect!

