rrze-hpc / likwid
Performance monitoring and benchmarking suite
Home Page: https://hpc.fau.de/research/tools/likwid/
License: GNU General Public License v3.0
Why do I have to worry about registers in group files? Likwid could automatically allocate registers to measured events. It is also cumbersome to write the metrics with the register names instead of the events.
Idea:
Build list of interesting metrics and let user select metrics instead of groups. If insufficient registers are available offer user to multiplex measurement.
Thanks for the nice work on Likwid, I really like the tool.
On line 337, likwid-topology implicitly assumes that the LLC cache size can be divided by 2^20.
On Broadwell E5-2680 v4, the LLC is 35 MB, which doesn't cause an issue, unless the node is configured to have 4, rather than 2 NUMA domains. In that case, the division no longer results in an integer, and the format function fails since the format code is '%d'.
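A minimal sketch of one way to avoid the failure, written in C purely for illustration. The helper name `format_cache_size` is hypothetical and is not part of likwid-topology. The idea is to compute the size in MB as a floating-point value, so that a 35 MB LLC split into two 17.5 MB halves still formats cleanly.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: write a cache size given in bytes as "<n> MB".
 * The value is computed in floating point, so a size that is not a
 * multiple of 2^20 bytes (e.g. 17.5 MB) is still printed correctly. */
static int format_cache_size(uint64_t bytes, char *buf, size_t len)
{
    double mb = (double)bytes / (1024.0 * 1024.0);
    /* %g drops the fraction when it is zero: 35.0 -> "35", 17.5 -> "17.5" */
    return snprintf(buf, len, "%g MB", mb);
}
```

Whole-MB sizes keep their familiar look with this format, and only the fractional case gains a decimal part.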
It is of course not a big deal, and few people will notice.
At first, I didn't even notice that both are being output. (I only read that on the wikipage and then checked on my system.)
So, I just think that having no blank lines above/below the descriptions, such as
This architecture has 97 counters.
Counter tags(name, type<, options>)
has the effect of "hiding" these lines within the flood of events/counters.
In my case, some of the output looks like this:
[lots of lines here]
PBOX3, Physical Layer box, EDGEDETECT|THRESHOLD|INVERT
This architecture has 695 events.
Event tags (tag, id, umask, counters<, options>):
TEMP_CORE, 0x0, 0x0, TMP0
[lots of lines here]
Here, the first line is still a counter, then two lines of description and then the events are listed. It is almost impossible to see lines 3 and 4 if one doesn't know they are there.
Allow users to define new events at runtime (either through arguments or in a config file), rather than recompiling likwid.
I'm getting low power measurements: 14 W idle power, which is much lower than the minimum package power of 37 W. Active power and DRAM power are similarly low.
Is likwid supported on Haswell-EP? What could be the reason of this behavior?
$ likwid-powermeter -s 0.5s
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
CPU type: Intel Xeon Haswell EN/EP/EX processor
CPU clock: 3.49 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Runtime: 0.500555 s
Measure for socket 0 on CPU 0
Domain PKG:
Energy consumed: 7.19739 Joules
Power consumed: 14.3788 Watt
Domain PP0:
Energy consumed: 0 Joules
Power consumed: 0 Watt
Domain DRAM:
Energy consumed: 0.364752 Joules
Power consumed: 0.728695 Watt
--------------------------------------------------------------------------------
$ likwid-powermeter -i
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz
CPU type: Intel Xeon Haswell EN/EP/EX processor
CPU clock: 3.49 GHz
--------------------------------------------------------------------------------
Base clock: 3500.00 MHz
Minimal clock: 1200.00 MHz
Turbo Boost Steps:
C0 3600.00 MHz
C1 3600.00 MHz
C2 3600.00 MHz
C3 3600.00 MHz
--------------------------------------------------------------------------------
Info for RAPL domain PKG:
Thermal Spec Power: 140 Watt
Minimum Power: 37 Watt
Maximum Power: 140 Watt
Maximum Time Window: 39040 micro sec
Info for RAPL domain DRAM:
Thermal Spec Power: 8 Watt
Minimum Power: 2.375 Watt
Maximum Power: 8 Watt
Maximum Time Window: 18544 micro sec
Info about Uncore:
Minimal Uncore frequency: 1200 MHz
Maximal Uncore frequency: 3000 MHz
Performance energy bias: 7 (0=highest performance, 15 = lowest energy)
Hi,
I run make install as root since I use the accessDaemon. After that, the permissions for any folder in $PREFIX/share/likwid/perfgroups are 0700 as well as for all the files in any subfolder.
This causes likwid-perfctr -a to return exactly nothing (no error, no output, no errorcode) and took me quite a while to figure that out.
I also propose here to add an error message if likwid-perfctr doesn't find anything.
[../likwid-master/ext/hwloc/hwloc/topology-linux.c:1593]: (error) Common realloc mistake: 'maps' nulled but not freed upon failure
[../likwid-master/ext/hwloc/hwloc/topology-linux.c:2466]: (error) Common realloc mistake: 'ret' nulled but not freed upon failure
[../likwid-master/ext/hwloc/hwloc/topology-linux.c:3504]: (error) Common realloc mistake: 'Lprocs' nulled but not freed upon failure
If realloc cannot find enough space, it returns a null pointer, and leaves the previous region allocated. This solution from Stack Overflow may be helpful in fixing the error:
tmp = realloc(orig, newsize);
if (tmp == NULL)
{
    // could not realloc, but orig still valid
}
else
{
    orig = tmp;
}
NOTE: Similar errors exist at the following three locations:
[../likwid-master/ext/hwloc/hwloc/topology-synthetic.c:911]: (error) Common realloc mistake: 'loops' nulled but not freed upon failure
[../likwid-master/ext/hwloc/hwloc/topology.c:290]: (error) Common realloc mistake: 'infos' nulled but not freed upon failure
[../likwid-master/ext/hwloc/hwloc/topology.c:320]: (error) Common realloc mistake: 'dst_infos' nulled but not freed upon failure
sursri wrote:
I was surprised to see L2_DATA_READ_MISS_MEM_FILL and L2_DATA_READ_MISS_CACHE_FILL reported as 0 for a few programs when measured in PMC1, so I made a comparison with VTune and suspect that there is a problem with likwid-perfctr.
Later, when I ran an experiment with only 1 event in PMC0, I got some small values. I don't see any document mentioning that it can be measured only with PMC0. There is also a huge difference in L2_DATA_WRITE_MISS_MEM_FILL and L2_DATA_READ_MISS_CACHE_FILL compared to VTune. Can you help me with this?
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Event | likwid | Vtune |
---|---|---|
DATA_READ | 200292452 | 216400000 |
DATA_WRITE | 130125076 | 380400000 |
BANK_CONFLICTS | 5197 | 100000 |
BRANCHES | 110225835 | 123100000 |
INSTRUCTIONS_EXECUTED | 1291182962 | 1370940000 |
DATA_READ_OR_WRITE | 330451293 | 609500000 |
DATA_READ_MISS_OR_WRITE_MISS | 42962603 | 74130000 |
L2_DATA_READ_MISS_CACHE_FILL | 4389 | 120000 |
L2_DATA_WRITE_MISS_CACHE_FILL | 129949990 | 129870000 |
L2_DATA_READ_MISS_MEM_FILL | 200038669 | 199920000 |
L2_DATA_WRITE_MISS_MEM_FILL | 3876 | 30140000 |
What version of the product are you using?
likwid-perfctr 3.0
Please provide any additional information below.
I ran it for the add program of the STREAM benchmark with just 1 thread.
Hi, the event ID and umask I use for these events are according to the documentation. In the
documentation it says that FUB is CRI :-). No idea what this means, they do not introduce those
terms. The only way to say who is right is to compare against a microbenchmark where you know
the result. I plan to do this for Phi also. I find it suspicious that the vtune results are all flat to the
fifth digit. Are those end to end measurements?
sursri:
I am running the same program pinned to different cores on Xeon Phi and
measuring the same event, and the values differ between cores. Please check the results of multiple runs.
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 58 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
Event,core 58
L2_READ_HIT_M,10761.000000
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 58 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
Event,core 58
L2_READ_HIT_M,11010.000000
~/perf_anal $ ./work_1.sh
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 40 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
Event,core 40
L2_READ_HIT_M,0.000000
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 10 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
Event,core 10
L2_READ_HIT_M,10768.000000
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 40 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
Event,core 40
L2_READ_HIT_M,0.000000
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 54 -O -m /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
=====================
Region: Compute
=====================
Region Info,core 54
RDTSC Runtime [s],0.021591
call count,1.000000
Event,core 54
L2_READ_HIT_M,158.000000
~/perf_anal $ /home/snataraj/perf_anal/likwid/likwid-perfctr -g L2_READ_HIT_M:PMC0 -C 4 -O /home/snataraj/perf_anal/copy
-------------------------------------------------------------
-------------------------------------------------------------
CPU type: Intel Xeon Phi Coprocessor
CPU clock: 1.05 GHz
-------------------------------------------------------------
/home/snataraj/perf_anal/copy
K=1048577
Status: 0x0
Event,core 4
L2_READ_HIT_M,524291.000000
sursri:
Reply from the Intel forum: FUB must stand for something like "Functional Unit Block" because P54C
refers to the processor core (P54C is a specific version of the Pentium
core, though the actual core in the Xeon Phi has been heavily upgraded
from the original P54C), CRI refers to the "Cache-Ring-Interface", and
VPU refers to the "Vector-Processing-Unit". In the Xeon Phi performance
counters, the UMASK field actually specifies the functional unit for
which the event is requested, with 0x00 referring to the core (P54C),
0x10 referring to the CRI, and 0x20 referring to the VPU. This was
confusing to me at first because on other processors the UMASK
is almost always used to modify the specific details of what is measured
by an Event Select code, rather than actually specifying the Unit on
which the measurements should occur. On Xeon Phi the UMASK field is
used in a way that is much closer to what you would expect a "Unit Mask"
to mean -- it specifies the "Unit" for which you want the measurements
to be taken.
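The unit encoding described in the reply can be written down as a small lookup. This is only an illustrative sketch. The three values are taken from the forum reply above, while the enum and the helper name `phi_unit_name` are hypothetical and do not come from LIKWID.

```c
#include <stdint.h>

/* Unit selectors as described in the forum reply above. */
enum phi_unit {
    PHI_UNIT_CORE = 0x00, /* P54C core */
    PHI_UNIT_CRI  = 0x10, /* Cache-Ring-Interface */
    PHI_UNIT_VPU  = 0x20  /* Vector-Processing-Unit */
};

/* Hypothetical helper: map a UMASK value to a readable unit name. */
static const char *phi_unit_name(uint8_t umask)
{
    switch (umask) {
    case PHI_UNIT_CORE: return "P54C core";
    case PHI_UNIT_CRI:  return "CRI";
    case PHI_UNIT_VPU:  return "VPU";
    default:            return "unknown";
    }
}
```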
Hi Thomas,
I'm currently working on embedding Likwid into a PDE framework (written in C++) to enable profiling of the individual compute kernels. Following common design guidelines most of the output functions of the generic profiler interface provided by the framework are declared "const".
In the implementation of these const methods in my Likwid-wrapping profiler I need to call functions from the Likwid API, e.g. double power_printEnergy(PowerData* data). The problem is that the PowerData instance in my case is a member of the profiler, and inside a const method I can only provide a const PowerData*, not a PowerData* (leaving aside const_cast as a dirty solution).
Considering the implementation of power_printEnergy, which clearly doesn't modify its arguments (one could even say it would be unexpected if it did), and some other similar examples in the Likwid API, I wanted to ask if it would be OK to explicitly declare these arguments constant in some future version of Likwid.
Please let me know what you think. Thanks.
(I can send a PR for the obvious examples that I encountered if that helps, but wanted to check with you first.)
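To illustrate the request, here is a small sketch of what a const-qualified parameter buys the caller. The struct `DemoPowerData` and both functions are hypothetical stand-ins and are not LIKWID's PowerData or its API. They only show the pattern.

```c
/* Hypothetical stand-in for an energy sample; not LIKWID's PowerData. */
typedef struct {
    double before_joules;
    double after_joules;
} DemoPowerData;

/* Taking a pointer-to-const documents that the sample is only read. */
static double demo_energy(const DemoPowerData *data)
{
    return data->after_joules - data->before_joules;
}

/* A caller that itself only holds a const pointer can forward it
 * without a cast, which is the situation inside a C++ const method. */
static double demo_report(const DemoPowerData *sample)
{
    return demo_energy(sample);
}
```

Existing callers that pass a non-const pointer keep compiling, since a conversion from `T*` to `const T*` is implicit.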
When running multiple single-threaded MPI processes on a single node, all processes get non-zero uncore event values (for example, memory read/write events). However, when running a single MPI process with multiple threads, only NSOCKETS threads get non-zero uncore event values. Is it possible to support the same behavior in the multiple-process case? This is in high demand for single-threaded MPI applications.
I just compiled master to check on new features, but received this upon a call to "make":
$ make
===> COMPILE GCC/perfmon.o
In file included from ./src/perfmon.c:60:0:
./src/includes/perfmon_haswell.h: In function ‘perfmon_setupCounterThread_haswell’:
./src/includes/perfmon_haswell.h:1002:43: error: ‘dev’ undeclared (first use in this function)
if (haveLock && HPMcheck(dev, cpu_id))
^
./src/includes/perfmon_haswell.h:1002:43: note: each undeclared identifier is reported only once for each function it appears in
Makefile:174: recipe for target 'GCC/perfmon.o' failed
make: *** [GCC/perfmon.o] Error 1
I am using 12dd580 on fc21 with the default gcc. It looks to me as if something along the lines of
PciDeviceIndex dev = counter_map[index].device;
is missing here: https://github.com/rrze-likwid/likwid/blob/master/src/includes/perfmon_haswell.h#L917
Problem:
Likwid overwrites anything in the counter registers without asking. Intel proposed to allow using multiple tools without interference.
Solution:
Honor the solution of the Intel whitepaper. Provide an overwrite mode.
Make init and setup of counters specific for the thread group and performance group. (Only overwrite register which are needed)
Intel's Performance Monitoring Unit Sharing Guide is attached.
http://software.intel.com/file/30388
The paper is outdated. Further counters as Uncore (PCI based) and RAPL are not addressed.
Therefore this issue is delayed.
I really would like to see some method that checks whether the counters are already in use - Intel
PCM reports within its init phase if the counters are not zeroed and ask to enforce a clearing.
But there is also a conflict with tools like PAPI that rely on the Linux kernel perf_event interface:
these tools do not set the MSRs directly and do not clean up at the end. So monitoring a node as
root using likwid (or Intel PCM) might always fail if an application used the perf_event interface.
The current trunk version checks the counter at the beginning. It skips counters where the
control register is not empty. In general we wanted to avoid additional register reads at the
beginning as we have seen long access times for the MSRs in particular situations. But these
checks are needed so that we can use common control files for e.g. SandyBridge and
SandyBridge EP.
In the end, LIKWID zeros the control registers but leaves the counter registers untouched.
This is not so easy to realize. Currently Likwid checks if registers are readable and writable but
does not fail when config registers are already in use. There is no flag like "--force".
We think about it in Likwid 4.1
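The policy discussed in this thread (skip a counter whose control register is already in use, unless the user explicitly asks to overwrite) might be sketched as follows. This is not LIKWID code. The function name, the `force` parameter, and the use of a plain array in place of real register reads are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical policy: a counter whose control register is non-zero is
 * treated as in use by another tool and skipped unless force is set.
 * ctrl[] stands in for values read from the control registers; the
 * function marks usable counters in usable[] and returns their count. */
static size_t select_free_counters(const uint64_t *ctrl, size_t n,
                                   int force, int *usable)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        usable[i] = (ctrl[i] == 0 || force);
        if (usable[i])
            count++;
    }
    return count;
}
```

As noted in the thread, such a check costs one extra register read per counter at start-up, which is the overhead the developers wanted to avoid.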
Trying to build likwid using more than one "make" job will cause it to fail.
Example (replace 6 with any number higher than 1):
$ make -j6
===> GENERATE HEADER GCC/perfmon_westmere_events.h
===> GENERATE HEADER GCC/perfmon_k8_events.h
===> GENERATE HEADER GCC/perfmon_sandybridge_events.h
===> GENERATE HEADER GCC/perfmon_nehalem_events.h
===> GENERATE HEADER GCC/perfmon_silvermont_events.h
===> GENERATE HEADER GCC/perfmon_core2_events.h
===> GENERATE HEADER GCC/perfmon_ivybridge_events.h
===> GENERATE HEADER GCC/perfmon_p6_events.h
===> GENERATE HEADER GCC/perfmon_haswellEP_events.h
===> GENERATE HEADER GCC/perfmon_pm_events.h
===> GENERATE HEADER GCC/perfmon_kabini_events.h
===> GENERATE HEADER GCC/perfmon_interlagos_events.h
===> GENERATE HEADER GCC/perfmon_broadwell_events.h
===> GENERATE HEADER GCC/perfmon_atom_events.h
===> GENERATE HEADER GCC/perfmon_sandybridgeEP_events.h
===> GENERATE HEADER GCC/perfmon_westmereEX_events.h
===> GENERATE HEADER GCC/perfmon_phi_events.h
===> GENERATE HEADER GCC/perfmon_ivybridgeEP_events.h
===> GENERATE HEADER GCC/perfmon_k10_events.h
===> GENERATE HEADER GCC/perfmon_nehalemEX_events.h
===> GENERATE HEADER GCC/perfmon_haswell_events.h
===> COMPILE GCC/configuration.o
===> COMPILE GCC/numa.o
===> COMPILE GCC/pci_proc.o
===> COMPILE GCC/perfmon.o
===> COMPILE GCC/numa_hwloc.o
===> COMPILE GCC/topology_proc.o
===> COMPILE GCC/hashTable.o
===> COMPILE GCC/pci.o
===> COMPILE GCC/luawid.o
===> COMPILE GCC/cpuFeatures.o
===> COMPILE GCC/pci_hwloc.o
===> COMPILE GCC/power.o
===> COMPILE GCC/topology.o
===> COMPILE GCC/thermal.o
===> COMPILE GCC/memsweep.o
===> COMPILE GCC/topology_hwloc.o
===> COMPILE GCC/bitUtil.o
===> COMPILE GCC/topology_cpuid.o
===> COMPILE GCC/ghash.o
===> COMPILE GCC/affinity.o
===> COMPILE GCC/accessClient.o
===> COMPILE GCC/libperfctr.o
===> COMPILE GCC/timer.o
===> COMPILE GCC/access.o
===> COMPILE GCC/msr.o
===> COMPILE GCC/bstrlib.o
===> COMPILE GCC/numa_proc.o
===> COMPILE GCC/tree.o
===> COMPILE GCC/loadData.o
===> ENTER ext/hwloc
===> ENTER ext/lua
gcc: error: ./GCC/lvm.o: No such file or directory
gcc: error: ./GCC/lapi.o: No such file or directory
gcc: error: ./GCC/lgc.o: No such file or directory
gcc: error: ./GCC/lcode.o: No such file or directory
gcc: error: ./GCC/ldump.o: No such file or directory
Makefile:49: recipe for target 'liblikwid-lua.so' failed
make[1]: *** [liblikwid-lua.so] Error 1
make[1]: *** Waiting for unfinished jobs....
Makefile:161: recipe for target 'ext/lua/liblikwid-lua.so' failed
make: *** [ext/lua/liblikwid-lua.so] Error 2
$
Now try again with only one job:
$ make -j1
===> ENTER ext/hwloc
===> ENTER ext/lua
===> CREATE SHARED LIB liblikwid.so
===> CREATE LIB liblikwidpin.so
===> ADJUSTING likwid-perfctr
===> ADJUSTING likwid-pin
===> ADJUSTING likwid-powermeter
===> ADJUSTING likwid-topology
===> ADJUSTING likwid-memsweeper
===> ADJUSTING likwid-agent
===> ADJUSTING likwid-mpirun
===> ADJUSTING likwid-perfscope
===> ADJUSTING likwid-genTopoCfg
===> ADJUSTING likwid-setFrequencies
===> ADJUSTING likwid.lua
===> BUILD access daemon likwid-accessD
===> BUILD frequency daemon likwid-setFreq
===> ENTER bench
===> COMPILE C GCC/strUtil.o
===> COMPILE C GCC/threads.o
===> COMPILE C GCC/barrier.o
===> COMPILE C GCC/allocator.o
===> COMPILE C GCC/bstrlib.o
===> COMPILE C GCC/bench.o
===> GENERATE BENCHMARKS
===> ASSEMBLE GCC/triad_avx.o
===> ASSEMBLE GCC/store_sse.o
===> ASSEMBLE GCC/striad_plain.o
===> ASSEMBLE GCC/sum_plain.o
===> ASSEMBLE GCC/vtriad_avx.o
===> ASSEMBLE GCC/copy_mem.o
===> ASSEMBLE GCC/copy_mem_avx.o
===> ASSEMBLE GCC/clstore.o
===> ASSEMBLE GCC/stream.o
===> ASSEMBLE GCC/stream_avx.o
===> ASSEMBLE GCC/update_plain.o
===> ASSEMBLE GCC/vtriad_sse.o
===> ASSEMBLE GCC/striad_mem_avx.o
===> ASSEMBLE GCC/copy_mem_sse.o
===> ASSEMBLE GCC/striad_avx.o
===> ASSEMBLE GCC/striad_mem_sse.o
===> ASSEMBLE GCC/copy.o
===> ASSEMBLE GCC/load.o
===> ASSEMBLE GCC/copy_avx.o
===> ASSEMBLE GCC/load_avx.o
===> ASSEMBLE GCC/store_plain.o
===> ASSEMBLE GCC/sum.o
===> ASSEMBLE GCC/load_sse.o
===> ASSEMBLE GCC/sum_avx.o
===> ASSEMBLE GCC/triad_mem.o
===> ASSEMBLE GCC/sum_sse.o
===> ASSEMBLE GCC/clcopy.o
===> ASSEMBLE GCC/update.o
===> ASSEMBLE GCC/update_avx.o
===> ASSEMBLE GCC/store_mem_avx.o
===> ASSEMBLE GCC/vtriad_plain.o
===> ASSEMBLE GCC/copy_sse.o
===> ASSEMBLE GCC/update_sse.o
===> ASSEMBLE GCC/store_mem_sse.o
===> ASSEMBLE GCC/store_mem.o
===> ASSEMBLE GCC/store_avx.o
===> ASSEMBLE GCC/triad.o
===> ASSEMBLE GCC/vtriad_mem_avx.o
===> ASSEMBLE GCC/clload.o
===> ASSEMBLE GCC/triad_split.o
===> ASSEMBLE GCC/store.o
===> ASSEMBLE GCC/striad_sse.o
===> ASSEMBLE GCC/stream_mem.o
===> ASSEMBLE GCC/copy_plain.o
===> ASSEMBLE GCC/load_plain.o
===> ASSEMBLE GCC/vtriad_mem_sse.o
===> LINKING likwid-bench
$
Actually, I did it with JNI and it looks like it is working fine. Would you guys consider including it if I created a pull request?
Hello,
First of all, I would like to thank you for developing this tool, it's very helpful for my work.
I have one question regarding the hardware counters available in the Haswell EP architecture.
Are there any counters available to monitor disk activity?
I have some insights by using several counters related to uncore or memory but is it the right way to achieve such monitoring?
Thanks for your help.
In some cases the overflow recognition for the RAPL counters does not work. See https://groups.google.com/forum/#!topic/likwid-users/z2CYJGmqOv8
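For background, a common way to handle wrap-around of a fixed-width counter is sketched below. It assumes that the energy status value is a 32-bit field, and it is not taken from LIKWID's source. The function name `rapl_delta` is hypothetical, and the sketch does not claim to reproduce the cause discussed in the linked thread.

```c
#include <stdint.h>

/* Difference between two readings of a 32-bit wrapping energy counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the
 * readings is handled; two or more wraps cannot be detected this way. */
static uint32_t rapl_delta(uint32_t before, uint32_t after)
{
    return after - before;
}
```

The limitation in the comment matters in practice: if the measurement interval is long enough for the counter to wrap more than once, the reported energy is too low.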
Intel's docs say that for Sandy Bridge onward, all 8 PMC's are available for the single active thread if hyperthreading is disabled. See Tables 18-30, 18-42, and 18-53 all entitled "Core PMU Comparison": http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
Does Likwid allow for use of more than 4 PMC's at a time?
This might be a local configuration issue, but I had trouble compiling Likwid on KNL. The Intel specific libraries weren't being found when compiling the lua library. The solution was adding an RPATH so that liblikwid-lua.so was able to find libimf.so and friends.
I also removed the '-vec-report=0' flag from the command line since it produced lots of "deprecated" messages. I switched from -O1 to -Ofast since there didn't seem to be any reason not to. I dropped -Wno-format as it didn't seem to make a difference. These changes are optional, while the RPATH fix is required.
I have not yet tested this on non-KNL using ICC, but don't see much chance of it causing breakage.
diff --git a/make/include_ICC.mk b/make/include_ICC.mk
index 9dfe66b..06dd39b 100644
--- a/make/include_ICC.mk
+++ b/make/include_ICC.mk
@@ -9,7 +9,7 @@ GEN_PMHEADER = ./perl/gen_events.pl
ANSI_CFLAGS = -std=c99 #-strict-ansi
-CFLAGS = -O1 -Wno-format -vec-report=0 -fPIC -pthread
+CFLAGS = -Ofast -fPIC -pthread
FCFLAGS = -module ./
ASFLAGS = -gdwarf-2
PASFLAGS = x86-64
@@ -25,4 +25,8 @@ DEFINES += -DPAGE_ALIGNMENT=4096
INCLUDES =
LIBS = -lrt
-
+# colon separated list of paths to search for libs
+ICC_LIB_RPATHS = /opt/intel/compilers_and_libraries/linux/lib/intel64_lin
+ifneq ($(strip $(ICC_LIB_RPATHS)),)
+RPATHS += -Wl,-rpath=$(ICC_LIB_RPATHS)
+endif
I've barely tested, but with this tiny fix, it seems to be working for me on KNL!
According to the likwid-bench wiki page, TYPE INT is supported, but the following bench code does not compile for me:
STREAMS 1
TYPE INT
FLOPS 0
BYTES 4
DESC Integer load, only 32-bit scalar operations
LOADS 1
STORES 0
LOOP 1
movl ISCALAR, [STR0 + GPR1 * 4]
with this error message:
===> ASSEMBLE GCC/load_int32.o
./GCC/load_int32.s: Assembler messages:
./GCC/load_int32.s:32: Error: no such instruction: `type INT'
./GCC/load_int32.s:36: Error: no such instruction: `movl ISCALAR,[rsi+rax * 4]'
make[1]: *** [GCC/load_int32.o] Error 1
Thanks and cheers from ATX.
Hello,
I was trying to build likwid for Intel Xeon Phi, but although I followed the indicated configuration, I got the following error:
===> COMPILE MIC/timer.o
/tmp/icckuOoB2as_.s: Assembler messages:
/tmp/icckuOoB2as_.s:109: Error: `rdtscp' is not supported on `k1om'
make: *** [MIC/timer.o] Error 1
I looked in config_defines.mk for the flag DHAS_RDTSCP but couldn't find it. I also added it manually with the value 0, but the problem still occurs.
Is there any way around this?
Thanks in advance
Hi,
I would like to reopen this issue; with the current correction in the source code, the command still shows weird outputs.
Here is the command I've always tried:
likwid-perfctr -f -c 0-3 -g BRANCH -t 2s
And the outputs:
--------------------------------------------------------------------------------
# CORES: 0|1|2|3
1 8 4 2.0013075179619 2.0013102503959 2.0013102503959 2.0013102503959 2.0013102503959 6.1923223535304e-06 1.158459678048e-06 1.1490437761027e-06 9.5708087192693e-07 3389.9079014806 3397.4141617506 3369.8001504725 3417.0112020124 2.7854898210138 1.5682565789474 1.5612876599257 1.5499262174127 0.19770460445416 0.19366776315789 0.19397441188609 0.19281849483522 0.003825659243066 0.0024671052631579 0.0028889806025588 0.0029513034923758 0.019350380096752 0.012738853503185 0.014893617021277 0.01530612244898 5.0580511402903 5.1634819532909 5.1553191489362 5.1862244897959
1 8 4 4.004496399955 2.0009526654214 2.0009526654214 2.0009526654214 2.0009526654214 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 8 4 6.00749528636 2.0007557606874 2.0007557606874 2.0007557606874 2.0007557606874 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 8 4 8.0101798471776 2.0007722017634 2.0007722017634 2.0007722017634 2.0007722017634 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
--------------------------------------------------------------------------------
Only the first line shows some values, whereas the following lines show many 0's.
Best,
S.
sursri wrote:
Is it possible to collect the event stats at a regular interval of 10 million instructions or so? I don't see the timeline function working properly for multithreaded programs using likwid-perfctr.
Is it possible to make modifications to the existing likwid-perfctr?
Please provide more details on why timeline mode does not work for threaded case.
Since no additional information was posted by 'sursri', we cannot work on the problem with the
multithreaded timeline measurements.
For your first question: yes, it is possible, but there is currently no implementation of this behavior.
The steps are the following:
- Initialize counters as normal but enable interrupt on overflow (polling would also be possible but does not provide accurate results)
- Set the desired counter register (something like PMC0 with event INSTRUCTIONS_RETIRED) to 2^(register width) - (instruction interval)
- Start counters
- On interrupt, read all configured counters and reinitialize the counter registers.
And so on.
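The start value from the second step can be computed as shown in this sketch. The function name `overflow_preload` is hypothetical. The formula is the one given in the steps above.

```c
#include <stdint.h>

/* Start value that makes a counter of the given width overflow after
 * `interval` further events: 2^width - interval.
 * Assumes 0 < width < 64 and interval <= 2^width. */
static uint64_t overflow_preload(unsigned width, uint64_t interval)
{
    return (UINT64_C(1) << width) - interval;
}
```

For a 48-bit counter and an interval of 10 million instructions, the register would be preloaded with 2^48 - 10000000, so that the overflow fires after exactly that many retired instructions.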
Not implemented in Likwid 4.0
Likwid does not use the interrupts, so only polling possible.
This is about likwid 4.1.1 (release), built to use hwloc that comes with it (config.mk included for completeness).
For some obscure reason the assignment of processors to physical address/core-id is not what one would expect on Intel hardware. Normally, one expects on a dual socket, 12-core machine (haswell E5-2680 v3), hyperthreading disabled:
0 -> 0:0
1 -> 0:1
..
11 -> 0:11
12 -> 1:0
13 -> 1:1
...
23 -> 1:11
The left-hand number is the processor, the first right-hand number the physical address, the second the core-id according to /proc/cpuinfo.
On some machines however, we get:
0 -> 0:0
1 -> 0:2
2 -> 0:4
..
5 -> 0:10
6 -> 1:0
7 -> 1:2
...
11 -> 1:10
12 -> 0:1
13 -> 0:3
..
17 -> 0:11
18 -> 1:1
19 -> 1:3
...
23 -> 1:11
Obviously, it is not what we want, but that is our problem.
However, when likwid-topology is run on such a node, it seems to get confused. It reports:
Sockets: 2
Cores per socket: 6
Threads per core: 2
Apparently, the weird round-robin assignment tricks likwid-topology into assuming that hyperthreading is enabled. The complete output of likwid-topology is in attachment.
The lscpu and lstopo (version 1.10.1) reports are consistent with /proc/cpuinfo though (output of both attached as well). So it would seem that the information coming from hwloc is somehow misinterpreted.
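A numbering-independent way to derive the thread count is sketched below: count distinct (physical id, core id) pairs instead of inferring anything from the order of processor numbers. This is an illustration of the idea only, not how likwid-topology or hwloc work internally. The type and function names are hypothetical.

```c
#include <stddef.h>

/* One entry per hardware thread, as read from /proc/cpuinfo:
 * "physical id" and "core id". */
typedef struct {
    int socket;
    int core;
} CpuEntry;

/* Count distinct (socket, core) pairs; O(n^2) is fine for a sketch. */
static size_t count_physical_cores(const CpuEntry *cpus, size_t n)
{
    size_t cores = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (cpus[j].socket == cpus[i].socket &&
                cpus[j].core == cpus[i].core) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            cores++;
    }
    return cores;
}

/* Threads per core follow from the counts, not from the numbering order. */
static size_t threads_per_core(const CpuEntry *cpus, size_t n)
{
    size_t cores = count_physical_cores(cpus, n);
    return cores ? n / cores : 0;
}
```

Applied to the round-robin mapping listed above, all 24 processors have distinct (physical id, core id) pairs, so this approach yields one thread per core.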
Thanks, best regards, Geert Jan Bex
lscpu_out.txt
cpuinfo_out.txt
likwid_topology_out.txt
lstopo_out.txt
config.txt
Hello again,
I am using the latest version of likwid, and when I run likwid-perfctr with an executable that uses OpenMP, I get the following error:
./helloflops3_xphi: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
I looked around and found that, because likwid-lua has the setuid bit set, the LD_LIBRARY_PATH environment variable is ignored, so the executable cannot be linked with libiomp5.so.
Moreover the exec is dynamically linked with:
linux-vdso.so.1 => (0x00007fff67bff000)
libm.so.6 => /lib64/libm.so.6 (0x00007f28b3d64000)
libiomp5.so => /home/echristof/libraries/libiomp5.so (0x00007f28b3a4e000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f28b383c000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f28b361f000)
libc.so.6 => /lib64/libc.so.6 (0x00007f28b32c7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f28b30c3000)
/lib64/ld-linux-k1om.so.2 (0x00007f28b3f93000)
and likwid-lua:
linux-vdso.so.1 => (0x00007fff279dd000)
liblikwid-lua.so.4 => /home/echristof/likwid_mic/lib/liblikwid-lua.so.4 (0x00007ff84f2c9000)
libm.so.6 => /lib64/libm.so.6 (0x00007ff84f099000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ff84ee95000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff84ec83000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff84ea65000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff84e70d000)
libimf.so => /home/echristof/libraries/libimf.so (0x00007ff84e2b4000)
libsvml.so => /home/echristof/libraries/libsvml.so (0x00007ff84dade000)
libirng.so => /home/echristof/libraries/libirng.so (0x00007ff84d8cb000)
libintlc.so.5 => /home/echristof/libraries/libintlc.so.5 (0x00007ff84d6aa000)
/lib64/ld-linux-k1om.so.2 (0x00007ff84f55d000)
Only libiomp5.so is missing.
One solution would be to link them statically. Is there another solution through likwid-lua maybe?
Thanks in advance!
Build event definitions for Intel architectures from https://download.01.org/perfmon/, with the optional introduction of patches to fix known issues with the provided JSON definitions.
Do other vendors provide similar machine readable documentation?
When setting ACCESSMODE to direct in config.mk and BUILDDAEMON to false, LIKWID returns "Unable to get path to access daemon".
When the daemon is built although direct access is selected, the error does not occur.
The wiki states on page likwid-perfctr:
in the directory $HOME/.likwid/groups/ARCH/, where ARCH is a short name for the processor
microarchitecture. You can get the short name by running likwid-perfctr -i.
I did run that command and my output looks like this:
CPU name: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
CPU type: Intel Xeon SandyBridge EN/EP processor
CPU clock: 2.30 GHz
CPU family: 6
CPU model: 45
CPU stepping: 7
PERFMON version: 3
PERFMON number of counters: 8
PERFMON width of counters: 48
Supported Intel processors:
Intel Core 2 65nm processor
Intel Core 2 45nm processor
Intel Xeon MP processor
Intel Atom 45nm processor
Intel Atom 32nm processor
Intel Atom 22nm processor
Intel Core Bloomfield processor
Intel Core Lynnfield processor
Intel Core Westmere processor
Intel Nehalem EX processor
Intel Westmere EX processor
Intel Core SandyBridge processor
Intel Xeon SandyBridge EN/EP processor
Intel Core IvyBridge processor
Intel Xeon IvyBridge EN/EP/EX processor
Intel Core Haswell processor
Intel Xeon Haswell EN/EP/EX processor
Intel Atom (Silvermont) processor
Intel Atom (Airmont) processor
Intel Xeon Phi Coprocessor
Intel Core Broadwell processor
Supported AMD processors:
AMD Opteron single core 130nm processor
AMD Opteron Dual Core Rev E 90nm processor
AMD Opteron Dual Core Rev F 90nm processor
AMD Barcelona processor
AMD Shanghai processor
AMD Istanbul processor
AMD Magny Cours processor
AMD Interlagos processor
AMD Family 16 model - Kabini processor
I have, admittedly, as a novice user no idea what I should use for "ARCH" and this should be clarified (also in the documentation); for me, architecture is something like "x86_64" but that does not seem reasonable here (it does not even appear in the output of said command).
% likwid-perfctr -E cache
Found 0 event(s) with search key cache:
but
% likwid-perfctr -E CACHE
Found 13 event(s) with search key CACHE:
L3_LAT_CACHE_REFERENCE, 0x2E, 0x4F, PMC
L3_LAT_CACHE_MISS, 0x2E, 0x41, PMC
I think case-insensitivity would be quite nice here, or at least a warning when 0 matches are found. It may also be useful to add a hint about the case sensitivity to the help text (-h switch) and to the wiki text (https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr).
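A minimal sketch of the requested behavior, written in Python for illustration only. The function name search_events and the short event list are assumptions and do not reflect how likwid-perfctr implements -E.

```python
def search_events(events, key):
    """Return all event names that contain key, ignoring case."""
    needle = key.casefold()
    return [name for name in events if needle in name.casefold()]


# Two event names taken from the output above plus one unrelated event.
events = ["L3_LAT_CACHE_REFERENCE", "L3_LAT_CACHE_MISS", "INSTR_RETIRED_ANY"]

matches = search_events(events, "cache")
print(f"Found {len(matches)} event(s) with search key cache:")
for name in matches:
    print(name)
if not matches:
    # Hint for the zero-match case, as suggested in the report.
    print("No matches; event names are usually upper case.")
```

With this approach, the lower-case and upper-case search keys return the same events.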
Add filter events to all supporting architectures. Candidates are:
UOPS_ISSUED*
UOPS_EXECUTED*
UOPS_RETIRED*
LSD_UOPS*
IDQ_MITE*
IDQ_DSB*
IDQ_MS_DSB*
IDQ_MS_MITE*
As recent architectures support up to 6 UOPs per cycle, the corresponding groups should be only available for systems with deactivated HyperThreading.
The line:
print(string.format("EXEC: %s/mpdboot -r ssh -n %d -f %s", path, nrNodes, hostfile))
makes ssh the default connector between nodes. It should not be hard-coded; it should be modifiable via an environment variable, because ssh is not allowed on all clusters. This was my case: on our cluster we use oarsh to connect between nodes, and ssh is prohibited for security reasons.
So if someone tries to use likwid-mpirun on such a cluster, an error occurs:
mpdboot_moonshot1-40 (handle_mpd_output 932): mpdboot: can not get anything from the mpd daemon; please check connection to moonshot2-14
mpiexec_moonshot1-40: cannot connect to local mpd (/tmp/mpd2.console_dlagunes); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_dlagunes); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
We have a user who reports power readings for the Haswell-EP DRAM domain that are over-estimated by a factor of 4. I had a look at the source code and found the bug: the energy unit is determined only by reading MSR_RAPL_POWER_UNIT.
In [1, Section 5.3.3] Intel reports that the "ENERGY UNIT for DRAM domain is 15.3 μJ". This should be used for uncore-PCI-register RAPL readings. Hackenberg et al. described in [2, Section IV] that the statement about the energy unit is also correct for the DRAM RAPL reading from MSRs.
Best,
Robert
[1] Intel® Xeon® Processor E5-1600 and E5-2600 v3 Product Families, Volume 2 of 2, Registers Datasheet, Intel Corp.
[2] An Energy Efficiency Feature Survey of the Intel Haswell Processor, Hackenberg et al., Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015
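The arithmetic behind the factor of 4 can be sketched as follows. This is an illustration only: it assumes the energy status unit is encoded in bits 12:8 of MSR_RAPL_POWER_UNIT (as documented in the Intel SDM) and uses a hypothetical field value of 14. The function names are made up for this sketch and are not part of LIKWID.

```python
# Fixed DRAM energy unit quoted from [1] for Haswell-EP, in joules.
DRAM_ENERGY_UNIT_J = 15.3e-6


def energy_unit_from_power_unit_msr(raw):
    """Energy unit in joules encoded in MSR_RAPL_POWER_UNIT (bits 12:8)."""
    esu = (raw >> 8) & 0x1F
    return 1.0 / (1 << esu)


def dram_energy_joules(counter_delta):
    """Convert a DRAM energy counter difference using the fixed DRAM unit."""
    return counter_delta * DRAM_ENERGY_UNIT_J


# Hypothetical register content with an energy field of 14 (2^-14 J, about 61 uJ).
raw = 14 << 8
generic_unit = energy_unit_from_power_unit_msr(raw)
ratio = generic_unit / DRAM_ENERGY_UNIT_J
print(f"generic unit: {generic_unit * 1e6:.1f} uJ, over-estimation: {ratio:.2f}x")
```

Under these assumptions, scaling the DRAM counter with the generic unit instead of the fixed DRAM unit over-estimates the energy by roughly a factor of 4, which matches the user report.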
Problem:
The physical counters are not sufficient to measure all interesting metrics in one run. Some architectures do not even have enough counters to compute a single metric in one run.
Solution:
Implement multiplexing, which allows the use of virtual counters. In a multiplexing group, the physical counters are mapped across multiple event sets onto an unlimited number of virtual counters. The virtual counters can be used to compute derived metrics.
Tasks:
Extend group syntax to enable multiplexing groups
Create a module managing the multiplexing of multiple event sets
Plan timing issues (accumulate or extrapolate)
Thomas Roehl
Not implemented in Likwid 4.0
Since the results of multiplexed events are calculated rather than measured directly, multiplexing could introduce a large error into the results.
Moreover, getting reliable results requires frequent switching of event configurations, which introduces a high overhead.
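The "extrapolate" option mentioned in the task list can be illustrated with a short sketch. It assumes the common time-based scaling used for counter multiplexing; the function name and the numbers are hypothetical and not taken from LIKWID.

```python
def extrapolate(raw_count, active_time, total_time):
    """Scale a count measured during active_time up to the full total_time."""
    if active_time <= 0:
        raise ValueError("event set was never scheduled")
    return raw_count * total_time / active_time


# Hypothetical run: four event sets share the counters round-robin for 10 s,
# so each set is active for 2.5 s.
estimated = extrapolate(raw_count=1000, active_time=2.5, total_time=10.0)
print(estimated)
```

The estimate is only accurate if the event rate is similar during the intervals in which the event set was not scheduled, which is the source of the error discussed above.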
There is a problem when likwid-mpirun reads the hostfile the user provides: if a host name contains, for example, a dash, the program ignores everything after the dash, turning:
into:
As a result, the MPI execution fails because the correct host names cannot be found.
The solution is to replace the line
hostname = line:match("^([%.%a%d]+)")
in all 4 readHostfile* methods with:
hostname = line:match("^([%.%a%d-]+)")
However, this only fixes this specific problem. I think the pattern should take the whole line from the hostfile without modification.
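The effect of the two patterns can be reproduced outside of Lua. The sketch below uses Python regular expressions as an approximation of the Lua patterns (the Lua class %a depends on the locale, the Python class here is ASCII only); the host name is taken from the mpd error output quoted earlier on this page.

```python
import re

# Approximate translations of the two Lua patterns.
ORIGINAL = re.compile(r"^([.A-Za-z0-9]+)")
PROPOSED = re.compile(r"^([.A-Za-z0-9-]+)")


def parse_host(pattern, line):
    """Return the host name captured at the start of a hostfile line."""
    match = pattern.match(line)
    return match.group(1) if match else None


line = "moonshot1-40"
print(parse_host(ORIGINAL, line))  # truncated at the dash
print(parse_host(PROPOSED, line))  # full host name
```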
The wiki (https://github.com/RRZE-HPC/likwid/wiki) links to slides from the BOF session at ISC '13. That link does not work and should be updated.
There's a mismatch between the dataset size actually used in a benchmark and the dataset size used to compute the benchmark results, which can make the reported results significantly wrong.
Example:
./likwid-bench -t my_benchmark -w N:10kB:1
My loop stride is 512. The data type is SINGLE.
The loop limit is set to 2048 instead of 2500 (10000/sizeof(float) = 2500).
The results are calculated as follows:
cycPerUp = ((double) maxCycles / (double) (threads_data[0].data.iter * realSize));
where realSize is the user-supplied 10000 B.
The reported measurement in this case is off by about 20%!
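The numbers in this example can be checked with a few lines. The sketch assumes that the loop limit is the nominal element count rounded down to a multiple of the stride, which is consistent with 2048 = 4 * 512 but is not confirmed from the likwid-bench source; the function name is hypothetical.

```python
def element_counts(size_bytes, elem_bytes, stride):
    """Return (nominal, processed) element counts for a working set size."""
    nominal = size_bytes // elem_bytes
    processed = (nominal // stride) * stride  # assumed rounding to the stride
    return nominal, processed


nominal, processed = element_counts(size_bytes=10_000, elem_bytes=4, stride=512)
print(nominal, processed)
print(f"fraction of the nominal data actually touched: {processed / nominal:.4f}")
```

Only about 82% of the nominal data set is processed, so any result normalized by the user-supplied size is off by roughly the amount reported above.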
I've pulled the latest version that builds successfully. When I run I get the following error.
$ sudo likwid-perfctr -a
/usr/local/bin/likwid-lua: /usr/local/bin/likwid-perfctr:363: attempt to call a nil value (field 'getGroups')
Hi,
We're trying to measure energy usage of some parallel codes on a KNL machine and I wanted to check if support to profile energy usage on KNL architecture is already available on a devel-branch or if it's in the pipeline. Let me know when time permits.
Thanks in advance.
[email protected] wrote:
Currently likwid-bench focuses on streaming and instruction throughput limited benchmarks.
Task:
Extend likwid-bench to also allow for latency bound data access benchmarks.
Basic application ready but until now no integration in likwid-bench
Hi,
I am trying to profile a multi-threaded application and its behavior on NUMA systems.
When using likwid-perfctr -g QPI [...]
the following metrics report zeroes:
My platform is:
4x Intel(R) Xeon(R) CPU E5-4650 0 @ 2.70GHz
CPU type:,Intel Xeon SandyBridge EN/EP processor
What am I missing?
Thanks
Hi,
I tried to use the additional PMCs on my Broadwell machine, but the additional counters only show zeros.
The CPU is an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz with Hyperthreading turned off. I used the PORT_USAGE group as a test and this is the result:
+--------------------------------+---------+--------------+
| Event | Counter | Sum |
+--------------------------------+---------+--------------+
| INSTR_RETIRED_ANY STAT | FIXC0 | 179574190000 |
| CPU_CLK_UNHALTED_CORE STAT | FIXC1 | 175468080000 |
| CPU_CLK_UNHALTED_REF STAT | FIXC2 | 175469240000 |
| UOPS_EXECUTED_PORT_PORT_0 STAT | PMC0 | 43542586000 |
| UOPS_EXECUTED_PORT_PORT_1 STAT | PMC1 | 61975496000 |
| UOPS_EXECUTED_PORT_PORT_2 STAT | PMC2 | 51583122000 |
| UOPS_EXECUTED_PORT_PORT_3 STAT | PMC3 | 51799443000 |
| UOPS_EXECUTED_PORT_PORT_4 STAT | PMC4 | 0 |
| UOPS_EXECUTED_PORT_PORT_5 STAT | PMC5 | 0 |
| UOPS_EXECUTED_PORT_PORT_6 STAT | PMC6 | 0 |
| UOPS_EXECUTED_PORT_PORT_7 STAT | PMC7 | 0 |
+--------------------------------+---------+--------------+
I have also tried assigning the events to different counters, but it is always PMC{4..7} that refuse to count. Attached is a log of a run with -V 3. Maybe someone with insider knowledge can figure out what is going wrong.
MfG
K-Wic
likwid-pin -c S0:0 false
has a successful return value, which makes it inconvenient to check if the pinned command has failed. A command line setting to change this behavior would also be fine.
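For illustration, this is the behavior being asked for, sketched as a generic launcher in Python. It is not how likwid-pin is implemented; the function name run_and_forward is an assumption.

```python
import subprocess
import sys


def run_and_forward(cmd):
    """Run cmd, wait for it, and return the child's exit status."""
    return subprocess.run(cmd).returncode


# Stand-in for a failing command such as `false`.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_and_forward(failing)
print(f"child exited with status {status}")
# A launcher would finish with sys.exit(status) so that callers can check it.
```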
The current CSV output is not very useful, since the original table was meant to be read by humans. It would be helpful to have a data format that can be parsed and processed right away.
For example, consider the following (regular) output:
phinally:~$ likwid-perfctr -g DATA -C S0:0 du -csh .
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
CPU type: Intel Xeon SandyBridge EN/EP processor
CPU clock: 2.70 GHz
--------------------------------------------------------------------------------
[...]
--------------------------------------------------------------------------------
Group 1: DATA
+-------------------------+---------+-----------+
| Event | Counter | Core 0 |
+-------------------------+---------+-----------+
| INSTR_RETIRED_ANY | FIXC0 | 80729480 |
| CPU_CLK_UNHALTED_CORE | FIXC1 | 203480258 |
| CPU_CLK_UNHALTED_REF | FIXC2 | 457818723 |
| MEM_UOPS_RETIRED_LOADS | PMC0 | 37244141 |
| MEM_UOPS_RETIRED_STORES | PMC1 | 13267422 |
+-------------------------+---------+-----------+
+----------------------+-----------+
| Metric | Core 0 |
+----------------------+-----------+
| Runtime (RDTSC) [s] | 20.4862 |
| Runtime unhalted [s] | 0.0754 |
| Clock [MHz] | 1200.0812 |
| CPI | 2.5205 |
| Load to store ratio | 2.8072 |
+----------------------+-----------+
phinally:~$
A usable JSON representation could look as follows:
{
"node": {
"hostname": "phinally",
"CPU name": "Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz",
"CPU type": "Intel Xeon SandyBridge EN/EP processor",
"CPU clock [GHz]": 2.7
},
"groups": {
"DATA": {
"events": {
"INSTR_RETIRED_ANY": [80729480],
[...]
},
"metrics": {
"Runtime (RDTSC) [s]": [20.4862],
[...]
}
}
}
}
or something along those lines.
Personally, I would like the return code, stdout and stderr to be included as well.
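As a rough sketch of the conversion, the event table can be turned into the proposed structure with a few lines of Python. The helper names are hypothetical, and the parser only handles the simple table rows shown above.

```python
import json

# Rows copied from the event table above (border lines omitted).
TABLE = """\
| Event | Counter | Core 0 |
| INSTR_RETIRED_ANY | FIXC0 | 80729480 |
| CPU_CLK_UNHALTED_CORE | FIXC1 | 203480258 |
"""


def parse_rows(text):
    """Split table lines of the form '| a | b | c |' into lists of cells."""
    rows = []
    for line in text.splitlines():
        if line.startswith("|"):
            rows.append([cell.strip() for cell in line.strip().strip("|").split("|")])
    return rows


def events_to_dict(text):
    """Map each event name to its list of per-core counts."""
    _header, *body = parse_rows(text)
    return {row[0]: [int(value) for value in row[2:]] for row in body}


result = {"groups": {"DATA": {"events": events_to_dict(TABLE)}}}
print(json.dumps(result, indent=2))
```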
In the likwid-mpirun script, in the function writeWrapperScript, there are cases where a float is written into the generated bash script. This causes errors in bash and the script stops working:
likwid-mpirun -hostfile $OAR_NODEFILE -np 8 -d echo $HOSTNAME
results in:
/home/users/dlagunes/.likwidscript_22531.txt: line 8: [: 8.0: integer expression expected
/home/users/dlagunes/.likwidscript_22531.txt: line 11: 0 - (8.0 - 1): syntax error: invalid arithmetic operator (error token is ".0 - 1)")
The error most likely occurs when the operation local full = tostring(np - (np % ppn))
returns a float, so that the resulting string has ".0" appended.
Solution: use the Lua function math.floor to force an integer in all assignments to the variable "full":
local full = tostring(math.floor(np - (np % ppn)))
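The same pitfall can be reproduced in Python, which also prints floats with a trailing ".0". This is only an analogy to the Lua behavior described above; the values of np and ppn are hypothetical.

```python
import math

np_total = 8.0  # hypothetical: the process count arrives as a float
ppn = 4

full_float = str(np_total - (np_total % ppn))
full_int = str(math.floor(np_total - (np_total % ppn)))

print(full_float)  # string with a trailing ".0", which bash rejects in [ ... -eq ... ]
print(full_int)    # plain integer string
```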
likwid-4.0.0 fails to build on linux. The failure happens in likwid-bench, see the end of this post.
This happens with gcc-4.8.4 and gcc-4.9.2. I use binutils-2.23.1
I did not modify the likwid configuration config.mk.
Please feel free to ask if you need any further information.
===> COMPILE C GCC/strUtil.o
===> COMPILE C GCC/threads.o
===> COMPILE C GCC/barrier.o
===> COMPILE C GCC/allocator.o
===> COMPILE C GCC/bench.o
===> GENERATE BENCHMARKS
===> ASSEMBLE GCC/vtriad_mem_avx.o
===> ASSEMBLE GCC/store_sse.o
===> ASSEMBLE GCC/striad_plain.o
===> ASSEMBLE GCC/sum_plain.o
===> ASSEMBLE GCC/vtriad_avx.o
===> ASSEMBLE GCC/copy_mem.o
===> ASSEMBLE GCC/copy_mem_avx.o
===> ASSEMBLE GCC/clstore.o
===> ASSEMBLE GCC/stream.o
===> ASSEMBLE GCC/stream_avx.o
===> ASSEMBLE GCC/update_plain.o
===> ASSEMBLE GCC/sum.o
===> ASSEMBLE GCC/vtriad_sse.o
===> ASSEMBLE GCC/striad_mem_avx.o
===> ASSEMBLE GCC/copy_mem_sse.o
===> ASSEMBLE GCC/striad_mem_sse.o
===> ASSEMBLE GCC/stream_mem.o
===> ASSEMBLE GCC/copy.o
===> ASSEMBLE GCC/load.o
===> ASSEMBLE GCC/copy_avx.o
===> ASSEMBLE GCC/load_avx.o
===> ASSEMBLE GCC/store_plain.o
===> ASSEMBLE GCC/load_sse.o
===> ASSEMBLE GCC/striad_avx.o
===> ASSEMBLE GCC/copy_sse.o
===> ASSEMBLE GCC/sum_avx.o
===> ASSEMBLE GCC/triad_mem.o
===> ASSEMBLE GCC/sum_sse.o
===> ASSEMBLE GCC/clcopy.o
===> ASSEMBLE GCC/update.o
===> ASSEMBLE GCC/store_mem.o
===> ASSEMBLE GCC/update_avx.o
===> ASSEMBLE GCC/store_mem_avx.o
===> ASSEMBLE GCC/vtriad_plain.o
===> ASSEMBLE GCC/update_sse.o
===> ASSEMBLE GCC/store_mem_sse.o
===> ASSEMBLE GCC/triad.o
===> ASSEMBLE GCC/triad_avx.o
===> ASSEMBLE GCC/clload.o
===> ASSEMBLE GCC/triad_split.o
===> ASSEMBLE GCC/store.o
===> ASSEMBLE GCC/striad_sse.o
===> ASSEMBLE GCC/store_avx.o
===> ASSEMBLE GCC/copy_plain.o
===> ASSEMBLE GCC/load_plain.o
===> ASSEMBLE GCC/vtriad_mem_sse.o
===> LINKING likwid-bench
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/../../../../x86_64-pc-linux-gnu/bin/ld: ./GCC/striad_sse.o: relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC
./GCC/striad_sse.o: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:95: recipe for target 'likwid-bench_target' failed
make: *** [likwid-bench_target] Error 1
I don't know if this is really an issue, and I haven't figured out exactly what it is doing, but it looks suspicious. In configuration.c, it checks for likwid.cfg and likwid_topo.cfg in INSTALL_PREFIX rather than in INSTALLED_PREFIX. Is this correct? I'm compiling for MIC and have these set differently.
I'm finding the fancy formatting of the marker results to make the numbers difficult to interpret. The automatic centering makes the table look pretty, but it makes it more difficult to compare the magnitude of the numbers at a glance. I'm frequently finding myself counting the digits trying to figure out whether one number is 1/10 of another, or 1/100.
For example, rather than this with the centering:
+---------------------------+---------+--------------+
|           Event           | Counter |   Core 16    |
+---------------------------+---------+--------------+
|    Runtime (RDTSC) [s]    |   TSC   | 8.833629e-01 |
|     RS_FULL_STALL_ALL     |  PMC0   |   2132399    |
| NO_ALLOC_CYCLES_RAT_STALL |  PMC1   |  136345931   |
|     INSTR_RETIRED_ANY     |  FIXC0  |  1920997053  |
|   CPU_CLK_UNHALTED_CORE   |  FIXC1  |  1141587342  |
|   CPU_CLK_UNHALTED_REF    |  FIXC2  |  1141587369  |
+---------------------------+---------+--------------+
I would find something like this to be faster to compare with similar regions:
+---------------------------+---------+--------------+
| Event                     | Counter |      Core 16 |
+---------------------------+---------+--------------+
| Runtime (RDTSC) [s]       | TSC     |    0.8833629 |
| RS_FULL_STALL_ALL         | PMC0    |      2132399 |
| NO_ALLOC_CYCLES_RAT_STALL | PMC1    |    136345931 |
| INSTR_RETIRED_ANY         | FIXC0   |   1920997053 |
| CPU_CLK_UNHALTED_CORE     | FIXC1   |   1141587342 |
| CPU_CLK_UNHALTED_REF      | FIXC2   |   1141587369 |
+---------------------------+---------+--------------+
This is probably personal preference, and I don't expect to convince anyone else that one way is better, but where in the code would I find the formatting code to try out some variations?
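One way to try out such a variation independently of the LIKWID sources is a small formatting sketch like the following. It left-aligns names and right-aligns numbers so that magnitudes line up; the function name is hypothetical and this is not the formatting code used by LIKWID.

```python
def format_rows(rows):
    """Format (name, value) pairs with left-aligned names and right-aligned values."""
    name_width = max(len(name) for name, _ in rows)
    value_width = max(len(str(value)) for _, value in rows)
    return [f"| {name:<{name_width}} | {value:>{value_width}} |" for name, value in rows]


# Two rows taken from the table above.
rows = [("RS_FULL_STALL_ALL", 2132399), ("INSTR_RETIRED_ANY", 1920997053)]
lines = format_rows(rows)
for line in lines:
    print(line)
```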
On page https://code.google.com/p/likwid/wiki/LikwidTopology#Intel_Dunnington
Intel Dunnington
CPU name: Intel Xeon MP processor
The system has 4 sockets and 6 cores per socket:
Sockets: 4
Cores per socket: 6
But for the Level 3 cache, 3 groups of cores are shown (not 4), with 8 cores per group (not 6):
Cache groups: ( 0 12 1 13 2 14 3 15 ) ( 4 16 5 17 6 18 7 19 ) ( 8 20 9 21 10 22 11 23 )
This result looks like an error.
Hi,
I ran into the problem that I could run a large simulation when using plain likwid-perfctr, but using the marker API would fail.
By now I have figured out that this is due to the call to fork() when the marker is called for the first time, which fails with an errno of ENOMEM. This is caused by the fact that the forked process inherits the resources of the parent process, including its memory consumption. Since the markers are called well into the simulation, I have already allocated a lot of memory, and fork now tries to reserve the same amount for the child process, which naturally fails.
I tried to work around this problem by calling LIKWID_MARKER_REGISTER early on, but unfortunately this does not cause forking. As another workaround I called MARKER_START and MARKER_STOP in quick succession and it does lead to the expected/hoped for behavior and hopefully does not mess with the results too much.
This is the solution I am currently going with, but maybe there is a better way?
Perhaps the REGISTER call could be changed to handle the forking and maybe give the user a hint when the forking fails due to ENOMEM?
Also I came across two other problems while debugging this.
Turning on the debug mode in config.mk does not disable the gcc optimizations, which makes the use of gdb impossible. I have submitted a pull request which addresses this (#52).
(Additionally, calling gdb on all the lua stuff is not exactly trivial, so I give an example here in the hopes that no one else ever needs it: "gdb --args likwid-lua /usr/bin/likwid-perfctr -g xxx -C xxx ./executable" and creating the breakpoints depending on future loads of libraries)
There is an error message when fork fails, but likwid does not abort. Instead, it fails later when it cannot open the socket in /tmp/likwid--1. The -1 is the pid of the forked process...
I therefore propose the following change:
diff --git a/src/access_client.c b/src/access_client.c
--- a/src/access_client.c
+++ b/src/access_client.c
@@ -151,7 +151,8 @@ access_client_startDaemon(int cpu_id)
}
else if (pid < 0)
{
- ERROR_PLAIN_PRINT(Failed to fork);
+ ERROR_PRINT(Failed to fork while starting client for cpu %d, cpu_id);
+ exit(EXIT_FAILURE);
}
EXIT_IF_ERROR(socket_fd = socket(AF_LOCAL, SOCK_STREAM, 0), socket() failed);
This ensures that likwid aborts right away and it also prints the reason for the fork failure.
MfG
K-Wic
Hi,
I'm collecting performance counter values using likwid-perfctr command with -t option. Here is the command that I tried (since my computer has four CPUs; N:0-3):
likwid-perfctr -c N:0-3 -g BRANCH -t 2s
However, the output I get is very different from the output shown on the wiki page. First, it just stops after two seconds, and second, nothing else appears on the command line (the lines below are all I get):
--------------------------------------------------------------------------------
# CORES: 0|1|2|3
--------------------------------------------------------------------------------
Could you tell me what's wrong with the likwid-perfctr?
Best,
SH.
Let user give a function name (or regex) and automatically instrument all matching functions based on DWARF information.