AVX-512 hardware counters collector written in Go, based on the Go toolchain.
This program uses the Go 1.11 assembler's AVX-512 support, its extensive end2end test suite, and the Linux perf tool to build a CSV that records relevant hardware counter values for every available AVX-512 instruction form.
The output is printed in CSV format, with 6 columns:
- extension: the extension this instruction form belongs to
- instruction form: a combination of operands applied to a specific opcode
- class: the category this instruction form can reach (for example, turbo0)
- level0: the core_power.lvl0_turbo_license hardware counter value
- level1: the core_power.lvl1_turbo_license hardware counter value
- level2: the core_power.lvl2_turbo_license hardware counter value
Example output line:
"avx512f","KANDNW K, K, K","turbo0","1249200","0","0"
The instruction form can have these argument classes:
- imm for immediate (const) arguments
- K for opmask registers (K0-K7)
- X for 128-bit vector registers (X0-X31)
- Y for 256-bit vector registers (Y0-Y31)
- Z for 512-bit vector registers (Z0-Z31)
- mem for memory operands, including VSIB
- reg for scalar register operands like AX and CX
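A minimal sketch of splitting an instruction form into its opcode and argument classes, assuming (as in the KANDNW example) that the opcode is followed by a space and the operands are separated by commas. splitIForm and knownClasses are hypothetical helpers, not part of avx512counters; tokens outside the classes listed above are reported as errors.

```go
package main

import (
	"fmt"
	"strings"
)

// knownClasses holds the argument classes an instruction form may use.
// Hypothetical helper data, taken from the list of classes above.
var knownClasses = map[string]bool{
	"imm": true, "K": true, "X": true, "Y": true, "Z": true, "mem": true, "reg": true,
}

// splitIForm splits a form like "KANDNW K, K, K" into "KANDNW" and
// ["K" "K" "K"]. It assumes the opcode is separated from the operand
// list by the first space and operands are separated by commas.
func splitIForm(iform string) (string, []string, error) {
	parts := strings.SplitN(strings.TrimSpace(iform), " ", 2)
	opcode := parts[0]
	if opcode == "" {
		return "", nil, fmt.Errorf("empty instruction form")
	}
	if len(parts) == 1 {
		return opcode, nil, nil // form without operands
	}
	args := strings.Split(parts[1], ",")
	for i, a := range args {
		a = strings.TrimSpace(a)
		if !knownClasses[a] {
			return "", nil, fmt.Errorf("%q: unknown argument class %q", iform, a)
		}
		args[i] = a
	}
	return opcode, args, nil
}

func main() {
	op, args, err := splitIForm("KANDNW K, K, K")
	if err != nil {
		panic(err)
	}
	fmt.Println(op, args) // KANDNW [K K K]
}
```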
An example of the complete output is provided in the avx512_core_i9_7900x.csv file.
Disclaimer: the provided example is not a reliable reference. Results may vary between collector runs, and running on different machines may produce different results as well.
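If you want to post-process a CSV such as avx512_core_i9_7900x.csv in Go, encoding/csv handles the quoted instruction-form column (which itself contains commas) out of the box. This sketch assumes the output has no header row, as in the example line above; the counterRow type and parseRows function are hypothetical names, not part of avx512counters.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// counterRow is a hypothetical type mirroring the 6 CSV columns.
type counterRow struct {
	Extension string
	IForm     string
	Class     string
	Levels    [3]uint64 // core_power.lvl{0,1,2}_turbo_license values
}

// parseRows reads collector output (assumed to have no header row).
func parseRows(data string) ([]counterRow, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 6 // every record must have exactly 6 fields
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	rows := make([]counterRow, 0, len(records))
	for _, rec := range records {
		row := counterRow{Extension: rec[0], IForm: rec[1], Class: rec[2]}
		for i := range row.Levels {
			v, err := strconv.ParseUint(rec[3+i], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("%s: level%d: %w", rec[1], i, err)
			}
			row.Levels[i] = v
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	line := `"avx512f","KANDNW K, K, K","turbo0","1249200","0","0"` + "\n"
	rows, err := parseRows(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rows[0])
}
```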
Requirements:
- Go 1.11 or above (for assembler AVX-512 support)
- Linux perf that recognizes core_power.lvl{0,1,2}_turbo_license events
- Intel CPU with at least avx512f
Hint: pmu-tools contains ocperf.py, which can be used on systems with an older perf that does not recognize the required CPU events even if the machine has them.
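The CPU and perf requirements can be checked ahead of time. This sketch assumes a Linux /proc/cpuinfo with a "flags" line and that `perf list` prints the event names verbatim; hasCPUFlag and hasTurboLicenseEvents are hypothetical helpers. If the perf check fails with your stock perf, the ocperf.py hint above applies.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// hasCPUFlag reports whether the first "flags" line of
// /proc/cpuinfo-style text contains flag as an exact token.
func hasCPUFlag(cpuinfo, flag string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		kv := strings.SplitN(sc.Text(), ":", 2)
		if len(kv) != 2 || strings.TrimSpace(kv[0]) != "flags" {
			continue
		}
		for _, f := range strings.Fields(kv[1]) {
			if f == flag {
				return true
			}
		}
		return false
	}
	return false
}

// hasTurboLicenseEvents reports whether perf list output mentions all
// three core_power.lvl{0,1,2}_turbo_license events.
func hasTurboLicenseEvents(perfList string) bool {
	for lvl := 0; lvl <= 2; lvl++ {
		if !strings.Contains(perfList, fmt.Sprintf("core_power.lvl%d_turbo_license", lvl)) {
			return false
		}
	}
	return true
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read cpuinfo:", err)
		os.Exit(1)
	}
	fmt.Println("avx512f:", hasCPUFlag(string(data), "avx512f"))

	out, err := exec.Command("perf", "list").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "perf list:", err)
		os.Exit(1)
	}
	fmt.Println("turbo license events:", hasTurboLicenseEvents(string(out)))
}
```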
Install with:
go get -u github.com/Quasilyte/avx512counters
$GOPATH/bin is expected to be included in your system $PATH.
If it is not, you may want to move the installed binary somewhere it will be accessible.
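A small sketch for checking whether the install directory is on your PATH. It assumes go get installs into $GOBIN when that is set and otherwise into the bin directory of the first GOPATH entry; installDir and dirInPath are hypothetical helpers.

```go
package main

import (
	"fmt"
	"go/build"
	"os"
	"path/filepath"
)

// installDir returns $GOBIN if set, else the bin directory of the
// first GOPATH entry (build.Default.GOPATH honors $GOPATH or its default).
func installDir() string {
	if gobin := os.Getenv("GOBIN"); gobin != "" {
		return gobin
	}
	list := filepath.SplitList(build.Default.GOPATH)
	if len(list) == 0 {
		return ""
	}
	return filepath.Join(list[0], "bin")
}

// dirInPath reports whether dir appears in a PATH-style list,
// comparing cleaned paths so trailing slashes do not matter.
func dirInPath(dir, pathList string) bool {
	want := filepath.Clean(dir)
	for _, p := range filepath.SplitList(pathList) {
		if p != "" && filepath.Clean(p) == want {
			return true
		}
	}
	return false
}

func main() {
	dir := installDir()
	if dir == "" {
		fmt.Println("GOPATH is not set")
		return
	}
	if dirInPath(dir, os.Getenv("PATH")) {
		fmt.Println(dir, "is in $PATH")
	} else {
		fmt.Println(dir, "is NOT in $PATH; add it or move the binary")
	}
}
```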
Output of avx512counters -help:
-extensions string
comma-separated list of extensions to be evaluated (default "avx512f,avx512dq,avx512cd,avx512bw")
-iformSpanSize uint
how many instruction lines form a single iform span. Higher values slow down the collection (default 100)
-loopCount uint
how many times to execute every iform span. Higher values slow down the collection (default 1000000)
-perf string
perf tool binary name. ocperf and other drop-in replacements will do (default "perf")
-perfRounds uint
how many times to re-validate perf results. Higher values slow down the collection (default 1)
-workDir string
where to put results and the intermediate files (default "./avx512counters-workdir")
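For reference, options like these map directly onto Go's standard flag package. The sketch below reuses the names and defaults from the -help output, but the config type and newFlagSet function are hypothetical; this is an illustration, not avx512counters' own source. With flag.ContinueOnError, a misspelled flag such as -extension makes Parse return an error instead of being silently ignored.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config is a hypothetical struct holding the options listed above.
type config struct {
	extensions    string
	iformSpanSize uint
	loopCount     uint
	perf          string
	perfRounds    uint
	workDir       string
}

// newFlagSet declares flags with the names and defaults shown in the
// -help output and binds them to cfg.
func newFlagSet(cfg *config) *flag.FlagSet {
	fs := flag.NewFlagSet("avx512counters", flag.ContinueOnError)
	fs.StringVar(&cfg.extensions, "extensions", "avx512f,avx512dq,avx512cd,avx512bw",
		"comma-separated list of extensions to be evaluated")
	fs.UintVar(&cfg.iformSpanSize, "iformSpanSize", 100,
		"how many instruction lines form a single iform span")
	fs.UintVar(&cfg.loopCount, "loopCount", 1000000,
		"how many times to execute every iform span")
	fs.StringVar(&cfg.perf, "perf", "perf", "perf tool binary name")
	fs.UintVar(&cfg.perfRounds, "perfRounds", 1,
		"how many times to re-validate perf results")
	fs.StringVar(&cfg.workDir, "workDir", "./avx512counters-workdir",
		"where to put results and the intermediate files")
	return fs
}

func main() {
	var cfg config
	if err := newFlagSet(&cfg).Parse(os.Args[1:]); err != nil {
		os.Exit(2) // flag already printed the error and usage
	}
	fmt.Printf("%+v\n", cfg)
}
```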
The only flag you are likely to adjust is -extensions.
Suppose you're only interested in avx512f; then you can run avx512counters like this:
avx512counters -extensions=avx512f | tee results.csv
The result CSV is printed to stdout. Collection status is printed to stderr.
Supported extensions:
aes_avx512f
avx512_4fmaps
avx512_4vnniw
avx512_bitalg
avx512_ifma
avx512_vbmi
avx512_vbmi2
avx512_vnni
avx512_vpopcntdq
avx512bw
avx512cd
avx512dq
avx512er
avx512f
avx512pf
gfni_avx512f
vpclmulqdq_avx512f
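The -extensions value is a comma-separated subset of the names above. A hypothetical validation helper in Go might look like the sketch below; how avx512counters itself reacts to unknown names is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedExtensions mirrors the list of supported extensions above.
var supportedExtensions = []string{
	"aes_avx512f", "avx512_4fmaps", "avx512_4vnniw", "avx512_bitalg",
	"avx512_ifma", "avx512_vbmi", "avx512_vbmi2", "avx512_vnni",
	"avx512_vpopcntdq", "avx512bw", "avx512cd", "avx512dq", "avx512er",
	"avx512f", "avx512pf", "gfni_avx512f", "vpclmulqdq_avx512f",
}

// parseExtensions splits a comma-separated -extensions value, skips
// empty entries, and rejects names that are not in supportedExtensions.
func parseExtensions(value string) ([]string, error) {
	supported := make(map[string]bool, len(supportedExtensions))
	for _, ext := range supportedExtensions {
		supported[ext] = true
	}
	var exts []string
	for _, ext := range strings.Split(value, ",") {
		ext = strings.TrimSpace(ext)
		if ext == "" {
			continue
		}
		if !supported[ext] {
			return nil, fmt.Errorf("unsupported extension %q", ext)
		}
		exts = append(exts, ext)
	}
	return exts, nil
}

func main() {
	exts, err := parseExtensions("avx512f,avx512bw,gfni_avx512f")
	if err != nil {
		panic(err)
	}
	fmt.Println(exts) // [avx512f avx512bw gfni_avx512f]
}
```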