ovis-hpc / ovis

OVIS/LDMS High Performance Computing monitoring, analysis, and visualization project.

Home Page: https://github.com/ovis-hpc/ovis-wiki/wiki

License: Other

Languages: Shell 3.11%, C 70.90%, C++ 0.31%, Makefile 1.37%, Perl 0.58%, Python 9.91%, M4 3.94%, Roff 7.66%, CWeb 0.03%, Lex 0.04%, Yacc 0.04%, Cython 2.11%
Topics: ldms, monitoring

ovis's Introduction


OVIS / LDMS

For more information on installing and using LDMS: https://ovis-hpc.readthedocs.io/en/latest/

To join the LDMS Users Group: https://github.com/ovis-hpc/ovis-wiki/wiki/Mailing-Lists

Besides the Users Group, there have been three sub-workgroups: Best Practices, Multi-tenancy, and Stream Security. To request access to the discussion documents, create an Overleaf account and email Tom Tucker ([email protected]) the email address associated with your Overleaf account, with the subject "LDMS-UG: Request access to Workgroup Documents."

OVIS is a modular system for HPC data collection, transport, storage, log message exploration, analysis, and visualization.

LDMS is a low-overhead, low-latency framework for collecting, transferring, and storing metric data on large distributed computer systems.

The framework includes:

  • a public API with a reference implementation
  • tools for collecting, aggregating, transporting, and storing metric values
  • collectors for several common types of metrics
  • data transport over sockets, RDMA (IB/iWARP/RoCE), and Cray Gemini and Aries networks

The API provides a way for vendors to expose system information in a uniform manner without being required to provide the source code for accessing that information (although we advise including it), which might reveal proprietary methods or information.

Metric information can be updated by a kernel module that runs only when applications yield the processor, and transported using RDMA-like operations, resulting in minimal jitter during collection. LDMS has been run on 10,000 cores, collecting over 100,000 metric values per second with less than 0.2% overhead.

Building the OVIS / LDMS source code

Pre-built containers

You may avoid building LDMS from scratch by leveraging containerized deployments. Here's a collection of LDMS container images available for you to pull and run. Each image offers a specific set of functionalities to suit your needs. Please refer to the corresponding links for detailed information on each image. They are currently built with OVIS-4.3.11.

  • ovishpc/ldms-samp: a small image for 'sampler' daemons meant to be deployed on compute nodes.
  • ovishpc/ldms-agg: an image for 'aggregator' daemons, which also includes various storage plugins.
  • ovishpc/ldms-storage: an image that contains storage technologies (e.g. SOS, Kafka).
  • ovishpc/ldms-web-svc: an image for the back-end (Django) that queries SOS data for a Grafana server.
  • ovishpc/ldms-grafana: a Grafana image with 'DSOS' Grafana plugin that allows Grafana to get data from 'ovishpc/ldms-web-svc'.
  • ovishpc/ldms-dev: an image for LDMS code development and binary building.

NOTE: To quickly check the version of ldmsd in a container, issue the following command:

$ docker run --rm -it ovishpc/ldms-samp ldmsd -V

Obtaining ldms-dev container

You may build OVIS directly on your machines, in which case you can skip this section. Alternatively, you may get the ovishpc/ldms-dev docker image from Docker Hub, an ubuntu:22.04 container with the required development libraries. The following commands pull the image and run a container created from it.

$ docker pull ovishpc/ldms-dev
$ docker run -it --name dev --hostname dev ovishpc/ldms-dev /bin/bash
root@dev $ # Now you're in 'dev' container

Please see ovishpc/ldms-dev for more information about the container.

Docker Cheat Sheet

$ docker ps # See containers that are 'Up'
$ docker ps -a  # See all containers (regardless of state)
$ docker stop _NAME_ # Stop '_NAME_' container, this does NOT remove the container
$ docker kill _NAME_ # Like `stop` but send SIGKILL with no graceful wait
$ docker start _NAME_ # Start '_NAME_' container back up again
$ docker rm _NAME_ # Remove the container '_NAME_'
$ docker create -it --name _NAME_ --hostname _NAME_ _IMAGE_ _COMMAND_ _ARG_
  # Create a container '_NAME_' without starting it.
  # -i = interactive
  # -t = create TTY
  # --name _NAME_ to set _NAME_ for easy reference
  # --hostname _NAME_ to set the container hostname to _NAME_ to reduce
  #            confusion
  # _IMAGE_ the container image that the new container shall be created from
  # _COMMAND_ the command to run in the container (e.g. /bin/bash). This is
  #           equivalent to the 'init' process of the container. When this
  #           process exits, the container stops.
  # _ARG_ the arguments to _COMMAND_
$ docker run -it --name _NAME_ --hostname _NAME_ _IMAGE_ _COMMAND_ _ARG_
  # `create` + `start` in one go

Obtaining the source code

You may obtain the source code by downloading an official release tarball, or by cloning the ovis-hpc/ovis Git repository on GitHub.

Release tarballs

Official Release tarballs are available from the GitHub releases page:

https://github.com/ovis-hpc/ovis/releases

The tarball is available in the "Assets" section of each release. Be sure to download the tarball whose name has the form "ovis-ldms-X.X.X.tar.gz".

The links named "Source code (zip)" and "Source code (tar.gz)" are automatic GitHub links that we are unable to remove. They are missing the configure script, because they are raw snapshots of the git repository, not the official release tarball distribution.

Cloning the git repository

To clone the source code, go to https://github.com/ovis-hpc/ovis and click the "Code" button, or use the following command:

git clone https://github.com/ovis-hpc/ovis.git -b OVIS-4

Build Dependencies

  • autoconf (>=2.63)
  • automake
  • libtool
  • make
  • bison
  • flex
  • libreadline
  • openssl development library (for OVIS, LDMS Authentication)
  • libmunge development library (for Munge LDMS Authentication plugin)
  • Python >= 3.6 development library and Cython >= 0.29 (for the LDMS Python API and the LDMSD Interface, ldmsd_controller)
  • doxygen (for the OVIS documentation)

Some LDMS plug-ins have dependencies on additional libraries.

REMARK: Missing dependencies (e.g. python3-dev) may NOT break the configure and build steps, but the features that require them will not be built.

For cray-related LDMS sampler plug-in dependencies, please see the man page of the plug-in in ldms/man/.

RHEL7/CentOS7 dependencies

RHEL7/CentOS7 systems require the following packages at a minimum:

  • autoconf
  • automake
  • libtool
  • make
  • bison
  • flex
  • openssl-devel

Additionally, the Python API and the ldmsd_controller command require Python and Cython. One way to obtain those packages is from EPEL (install the epel-release package, and then "yum update"). The packages from EPEL are:

  • python3-devel
  • python36-Cython
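
For example, the EPEL route looks like this (standard yum usage; adjust for your site):

    $ sudo yum install epel-release
    $ sudo yum update
    $ sudo yum install python3-devel python36-Cython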

Compiling the code

If you are interested in storing LDMS data in SOS, then first follow the instructions at https://github.com/ovis-hpc/sos to obtain, build, and install SOS before proceeding.

	cd <ovis source directory>
	sh autogen.sh
	./configure [--prefix=<installation prefix>] [other options]
	make
	make install

Run configure --help for a full list of configure options.
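
For example, a build that installs to a custom prefix and picks up a separately installed SOS might look like the following (the paths are illustrative, not required locations):

	./configure --prefix=$HOME/opt/ovis --with-sos=$HOME/opt/sos
	make && make install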

Supported systems

  • Ubuntu and friends
  • CentOS and friends
  • Cray XE6, Cray XK, Cray XC

Unsupported features

The following LDMS sampler plugins are considered unsupported. Use at your own risk:

  • perfevent sampler
  • hweventpapi sampler
  • switchx

gnulib

Some m4 files come from the gnulib project. To update these files, first checkout gnulib:

git clone git://git.savannah.gnu.org/gnulib.git

There is no need to build or install the checked out code. The gnulib/gnulib-tool program works directly from the checked out tree.

Next look at the comment at the top of the gnulib/Makefile.am file in the ovis source tree. That comment will tell you the full gnulib-tool command to repeat to install the latest versions of the currently selected components from gnulib. Additional gnulib components can be added to the command line as more macros are desired.

After running gnulib-tool, check in the resulting changes.

ovis's People

Contributors

avilcheslopez, baallan, bnoyestrd2, bschwal, carbonneau1, cmcantalupo, drone2537, eric-roman, gregjoyce, iramin, jennfshr, joncooknmsu, medberry, mjmac, morrone, narategithub, nichamon, nick-enoent, oceandlr, opengridcomputing, sidjana, sindhu-106, slabasan, snell1224, tom95858, tpatki, valleydlr, vanshintel, vsurjadidjaja


ovis's Issues

man pages status

The build/install man pages have some out-of-date descriptions.
Please see the wiki pages for current info.
The pages are in the process of getting updated.

ldms_metric_by_name() needs improved implementation

ldms_metric_by_name() currently employs a simple linear search. Because of this, plugins are currently forbidden from using ldms_metric_by_name() in the critical path. For plugins where a name-to-index conversion is required by the nature of the data being sampled, ldms_metric_by_name() accomplishes exactly the task required, yet the plugin is forbidden to use it and must implement its own, more efficient lookup.

It would seem that improving the ldms_metric_by_name() implementation such that it is permitted to be used in the critical path (where required) would help plugins avoid re-implementing the same lookup code repeatedly.
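
For reference, a minimal sketch of the workaround plugins use today: resolve the indices once with ldms_metric_by_name() at configuration time (off the critical path), then use plain integer indexing in sample(). The metric names and counts here are hypothetical.

	#include <errno.h>
	#include "ldms.h"

	static const char *names[] = { "reads", "writes" }; /* hypothetical metric names */
	static int idx[2];

	/* Run once after set creation; ldms_metric_by_name() is a linear
	 * search, but here it is off the hot path. */
	static int cache_indices(ldms_set_t set)
	{
		int i;
		for (i = 0; i < 2; i++) {
			idx[i] = ldms_metric_by_name(set, names[i]);
			if (idx[i] < 0)
				return ENOENT;
		}
		return 0;
	}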

ldms plugin option "schema" issues

ldms plugins are currently required to parse an option named "schema". There are a few ways in which this does not seem to be a good requirement.

First of all, how are plugins that use multiple schemas supposed to handle a single schema name configuration option?

For plugins with a single statically coded schema, it would seem like the schema name should be chosen by the person writing the plugin. Allowing users to give different names to the same exact schema seems like a recipe for unnecessary confusion.

We might also consider either explicitly introducing a schema version field in metric sets, or perhaps developing guidelines for plugin developers on how to include versioning in the schema string. Then, when changes need to be made to a statically coded schema, the programmer has a standard way to communicate to users that the schema has changed.
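
For illustration, the guideline approach could be as simple as embedding a version in the schema name a plugin registers (the suffix convention here is invented, not an existing standard):

	/* bump the suffix whenever the statically coded schema changes */
	ldms_schema_t schema = ldms_schema_new("meminfo_v2");
	if (!schema)
		return ENOMEM;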

Building with ovis-pkg, cython version issue

I am building ovis 4.2.1-rc1 with ovis-pkg. I found that with the latest stable Cython package provided by RHEL (0.19-5), the sos build fails with multiple cython errors: "Obtaining 'char const *' from temporary Python value." I used Cython 0.29 directly from cython.org and that built without error.

LDMS V4 Beta Release

LDMS V4 Beta has been released! Pull the OVIS-4.2_Beta branch. V3 Branches will continue to be hosted here, but development will be limited to bug fixes only.

ldms sampler logging needs per-sampler log levels

Apparently there is a need for having different logging levels enabled in different ldms sampler plugins at the same time.

One way to achieve this goal is to pass an opaque logging handle to the plugin at get_plugin() time. This handle would embed or reference the currently configured logging level for that particular plugin (reference is probably better so that ldmsd can change the logging level without the plugin needing to be involved). The sample plugins would simply call a standard logging function that is exported by the ldms library, that looks something like this:

  void ldmsd_sampler_log(ldms_log_handle_t handle, enum ldmsd_loglevel level, const char *fmt, ...);
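
A hypothetical call site inside a plugin's sample() routine, assuming the proposed API were adopted:

	/* 'handle' would have been received at get_plugin() time */
	ldmsd_sampler_log(handle, LDMSD_LDEBUG, "sampled %d metrics\n", n);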

LDMS v4 RC1 Release

OVIS-4.2.1-rc1 has been released.
The OVIS-4.2_Beta branch will be deprecated.

a question about __ldms_remote_update

While reading code I came across something that I didn't understand, that may or may not be a bug. I don't have the bandwidth to pursue this right now, could someone take a look and comment on it?

Suppose that a sampler makes a change to the metadata portion of a metric set, and updates meta_gn in both the metadata and data sections of the set in its memory.

The aggregator will call update_data -> do_update -> ldms_xprt_update -> __ldms_remote_update in order to download the sampler's latest metric set. The code tries to figure out whether it has to download the entire set, or just the data portion.

It looks to me like the code near the top of the routine is reading the meta_gn values out of the PREVIOUS metric set that it received from the sampler. Do I have this right? In that previous sample, the two (older) meta_gn values matched, and so the aggregator is only going to call do_read_data() to get the data portion of the next metric set. So when this metric set is processed, it will have a metadata portion with an old meta_gn paired with a data portion with a newer meta_gn.

The next time that __ldms_remote_update gets called, it will see that the two meta_gn values don't match, and it will pull down both metadata and data portions, so the problem resolves itself in the subsequent sample, but there is that one sample in which there is a meta_gn mismatch. I suspect that's an issue, but I don't know how serious it is. If it's not a big deal, just close this.

Can you provide a Baler usage example?

Hi, I'm working on a cooperative project with LANL that requires using Baler to perform syslog analysis. I have successfully built OVIS and Baler; however, I don't know how to use it. The balerd -h output provides very little information, and the balerd man page is garbled. The balerd -h output also says to see the balerd(3) man page for more information, but I can't find balerd(3). Can you provide some information about how to use Baler to read a system log file and generate the pattern sequence file?
I tried running the command "balerd -l file.txt" to analyze file.txt; the output is a folder called "store", but all the files in that folder are unreadable. Is this the correct way to run Baler? If not, can you tell me what the correct way is?

ldms plugin job_id difficulties

Currently, it is a requirement that all ldms plugins implement a job_id option and store the job_id in every metric set.

The first issue is that not all plugins will be running on components where a job_id makes sense. With some plugins, it may be obvious that a job would never be running on the nodes, and with others the plugin might run on nodes that run jobs, and on others it might run on components that do not run jobs. With the former, we might just relax the rules and allow them to not implement job_id. With the latter, we still have an issue, although perhaps just designating "0" in the job_id field to mean "no job" would be fine. But since job_id is a compile time option, and few people will want different builds for different nodes in the same center, the job_id will stay around uselessly.

Another issue with the current job_id implementation is that it seems to assume that only a single job is running on a node at a time. This is not the case on some of our clusters. So job_id is going to be insufficient there.

job_id is an integer, and that might be overly restrictive for some job managers. Switching to a string might make sense.

Finally, the all-macro implementation of the helpers that all plugins are required to use makes too many assumptions about a plugin's internal implementation. It would be very nice to move to a proper function-based API, which by its nature would probably eliminate the issues that the macro implementation has.

OVIS 3.4.8 compilation fails when using --enable-amqp

OS: CentOS 7.5 (x86_64)

OVIS: 3.4.8

Configure options:

./configure --prefix=/opt/ovis/3.4.8 --enable-swig --enable-rabbitkw --enable-rabbitv3 --enable-amqp

Make result:

  CC       libstore_function_csv_la-store_function_csv.lo
store_amqp.c: In function 'open_store':
store_amqp.c:446:10: error: incompatible type for argument 7 of 'amqp_exchange_declare'
          0, 0, amqp_empty_table);
          ^
In file included from /usr/include/amqp.h:765:0,
                 from /usr/include/amqp_tcp_socket.h:36,
                 from store_amqp.c:59:
/usr/include/amqp_framing.h:798:11: note: expected 'amqp_boolean_t' but argument is of type 'amqp_table_t'
 AMQP_CALL amqp_exchange_declare(amqp_connection_state_t state, amqp_channel_t channel, amqp_bytes_t exchange, amqp_bytes_t type, amqp_boolean_t passive, amqp_boolean_t durable, amqp_boolean_t auto_delete, amqp_boolean_t internal, amqp_table_t arguments);
           ^
store_amqp.c:446:10: error: too few arguments to function 'amqp_exchange_declare'
          0, 0, amqp_empty_table);
          ^
In file included from /usr/include/amqp.h:765:0,
                 from /usr/include/amqp_tcp_socket.h:36,
                 from store_amqp.c:59:
/usr/include/amqp_framing.h:798:11: note: declared here
 AMQP_CALL amqp_exchange_declare(amqp_connection_state_t state, amqp_channel_t channel, amqp_bytes_t exchange, amqp_bytes_t type, amqp_boolean_t passive, amqp_boolean_t durable, amqp_boolean_t auto_delete, amqp_boolean_t internal, amqp_table_t arguments);
           ^
make[5]: *** [libstore_amqp_la-store_amqp.lo] Error 1
make[5]: *** Waiting for unfinished jobs....
make[5]: Leaving directory `/root/ovis-3.4.8/ldms/src/store'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/root/ovis-3.4.8/ldms/src'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/root/ovis-3.4.8/ldms'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/root/ovis-3.4.8/ldms'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/ovis-3.4.8'
make: *** [all] Error 2

Systemd Aggregator Integration Issue

The LDMSD_AUTH_OPTION="-A conf=/opt/ovis/etc/ldms/ldmsauth.conf" setting is necessary for the ldmsd.aggregator service to work properly. It is defined as a variable in the .env file; however, it is not included in the ldmsd.aggregator.service file. For reasons I'm not 100% sure of, when this option is not included, ldms does not look in the default location for the authentication config and will therefore return this error:

ERROR : 'sock' transport creation with auth 'ovis' failed, error: ENOENT(2). Please check transpot configuration, authentication configuration, ZAP_LIBPATH (env var), and LD_LIBRARY_PATH.

Please add the authentication option to the aggregator service file, and perhaps add an LDMS_AUTH_FILE environment variable to the .env file to make sure it is absorbed into the systemctl call.
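
A sketch of the requested change (the unit-file layout and paths here are assumptions based on the description above, not the shipped service file):

	# ldmsd.aggregator.service
	[Service]
	EnvironmentFile=/opt/ovis/etc/ldms/.env
	ExecStart=/opt/ovis/sbin/ldmsd ... $LDMSD_AUTH_OPTION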

Why is logging function passed to plugin through a function pointer?

Is there any reason that an ldms sampler plugin could not simply use ldmsd_msg_logger() directly instead of using the function pointer that is passed to the sampler through get_plugin()?

Using the exported symbol directly is much preferable for a plugin implementer who splits the plugin across multiple files for organizational purposes. Using the function pointer, combined with the ldmsd rule that all globals in samplers must be file-scope, results in unnecessary cruft in the code.
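
For context, a minimal sketch of the current hand-off (plugin structure abridged; this is not a complete sampler):

	#include "ldmsd.h"

	static ldmsd_msg_log_f msglog; /* the function pointer in question */

	static struct ldmsd_sampler example = {
		/* .base, .get_set, .sample omitted for brevity */
	};

	struct ldmsd_plugin *get_plugin(ldmsd_msg_log_f pf)
	{
		msglog = pf; /* every source file that wants to log must see this file-scope global */
		return &example.base;
	}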

math error in ldms_set_new() causes a corrupt metric value

Greetings,

There is a math error in each of the routines get_schema_name() and get_first_metric_desc(). Code fixes are shown below. The result of the errors is that, due to the rounding in get_first_metric_desc(), all of the metadata descriptors can sometimes be placed 8 bytes higher in memory than was calculated at the beginning of ldms_set_new() here:

    meta_sz = schema->meta_sz /* header + metric dict */
            + strlen(schema->name) + 2 /* schema name + '\0' + len */
            + strlen(instance_name) + 2; /* instance name + '\0' + len */
    meta_sz = roundup(meta_sz, 8);
    meta = mm_alloc(meta_sz + schema->data_sz);

If the metric set contains META metrics, the last 8 bytes of the value of the last META metric will be overwritten by the first DATA metric, appearing as data corruption. If there are no META metrics in the set, then the last 8 bytes of the last DATA metric will lie outside of the byte range mm_alloc'd by ldms_set_new(), and won't get rdma-transferred to the aggregator, appearing as data corruption.

I have tested the fixes locally, they work fine.

Kevan

static inline ldms_name_t get_schema_name(struct ldms_set_hdr *meta)
{
	ldms_name_t inst = get_instance_name(meta);
-	return (ldms_name_t)(&inst->name[inst->len+sizeof(*inst)]);
+	return (ldms_name_t)(&inst->name[inst->len]);
}

static inline struct ldms_value_desc *get_first_metric_desc(struct ldms_set_hdr *meta)
{
	ldms_name_t name = get_schema_name(meta);
-	char *p = &name->name[name->len+sizeof(*name)];
+	char *p = &name->name[name->len];
	p = (char *)roundup((uint64_t)p, 8);
	return (struct ldms_value_desc *)p;
}

ovis-base.spec.in has a variety of issues

I'm trying to build rpms to make deployment easier. I have diffs that add SLES support, run ldconfig in a %post so that builds installing to a prefix other than /usr work, and make some other cleanups.

diff --git a/packaging/ovis-base.spec.in b/packaging/ovis-base.spec.in
index 78a4eb2..9b53eb0 100644
--- a/packaging/ovis-base.spec.in
+++ b/packaging/ovis-base.spec.in
@@ -6,7 +6,7 @@
#%-define _unpackaged_files_terminate_build 0
#%-define _missing_doc_files_terminate_build 0

-%define ldms_all System Environment/Libraries
+%define ovis System Environment/Libraries

%if 0%{?rhel} && 0%{?rhel} <= 6
%{!?__python2: %global __python2 /usr/bin/python2}
@@ -14,21 +14,50 @@
%{!?python2_sitearch: %global python2_sitearch %(%{__python2} -c "from distutils.sysconfig import get_python_lib; print(get_python_lib(1))")}
%endif

+%define suse %(if [ "%_vendor" = "suse" ]; then echo 1; else echo 0; fi)
+%define redhat %(if [ "%_vendor" = "redhat" ]; then echo 1; else echo 0; fi)
+%if "%{_vendor}" == "redhat"
+%define is_rhel5 %(test -f /etc/redhat-release && grep -q "^Red Hat Enterprise Linux Server release 5" /etc/redhat-release && echo 1 || echo 0)
+%define is_rhel6 %(test -f /etc/redhat-release && grep -q "^Red Hat Enterprise Linux Server release 6" /etc/redhat-release && echo 1 || echo 0)
+%define is_rhel7 %(test -f /etc/redhat-release && grep -q "^Red Hat Enterprise Linux Server release 7" /etc/redhat-release && echo 1 || echo 0)
+%else
+%define is_rhel5 0
+%define is_rhel6 0
+%define is_rhel7 0
+%endif
+
+%if "%{_vendor}" == "suse"
+%define is_sles11 %(test -f /etc/SuSE-release && grep -q "^SUSE Linux Enterprise Server 11" /etc/SuSE-release && echo 1 || echo 0)
+%define is_sles12 %(test -f /etc/SuSE-release && grep -q "^SUSE Linux Enterprise Server 12" /etc/SuSE-release && echo 1 || echo 0)
+%else
+%define is_sles11 0
+%define is_sles12 0
+%endif
+
+
# Main package
Summary: OVIS LDMS Commands and Libraries
-Name: ldms-all
+Name: ovis
Version: @Version@
Release: ovis1%{?dist}
License: GPLv2 or BSD
-Group: %{ldms_all}
+Group: %{ovis}
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
Source: %{name}-%{version}.tar.gz
-Requires: rpm >= 4.8.0 ovis-libevent2 boost-devel genders libyaml libyaml-devel python2 python2-devel
-BuildRequires: doxygen openssl-devel libibverbs-devel librdmacm-devel gcc ovis-libevent2-devel glib2-devel libibmad-devel boost-devel genders ovis-libevent2 libyaml libyaml-devel python2 python2-devel gettext-devel
+%if "%{_vendor}" == "suse"
+Requires: rpm >= 4.8.0 libevent-2_0-5 boost-devel genders libyaml-0-2 libyaml-devel python2 python2-devel
+BuildRequires: doxygen openssl-devel libibverbs-devel librdmacm-devel gcc libevent-devel glib2-devel libibmad-devel boost-devel genders libyaml-0-2 libyaml-devel python2 python2-devel gettext-devel
+%endif
+%if "%{_vendor}" == "redhat"
+Requires: rpm >= 4.8.0 libevent boost-devel genders libyaml libyaml-devel python2 python2-devel
+BuildRequires: doxygen openssl-devel libibverbs-devel librdmacm-devel gcc libevent-devel glib2-devel libibmad-devel boost-devel genders libyaml libyaml-devel python2 python2-devel gettext-devel
+%endif
Url: http://ovis.ca.sandia.gov/

-Prefix: /usr
+Prefix: @Prefix@
+%define _prefix @Prefix@
+%define _datadir %{_prefix}/share
+%define _docdir %{_datadir}/doc

%description
This package provides the LDMS commands and libraries, OVIS apis and transport libraries, and scalable object store libraries.
@@ -48,6 +77,7 @@ echo bBUILDROOT $RPM_BUILD_ROOT
%configure @ac_configure_args@
make

+# core
%install
echo TMPPATH %{_tmppath}
echo BUILDROOT $RPM_BUILD_ROOT
@@ -56,13 +86,12 @@ make DESTDIR=${RPM_BUILD_ROOT} install
rm -f $RPM_BUILD_ROOT%{_libdir}/*.la
rm -f $RPM_BUILD_ROOT%{_libdir}/ovis-ldms/lib*.la
# fix in subsequent after sorting use of sysconfdir or share/baler in baler
-rm $RPM_BUILD_ROOT%{_prefix}/etc/ovis/eng-dictionary
rm $RPM_BUILD_ROOT%{bindir}/test*
rm $RPM_BUILD_ROOT%{_bindir}/ldms_ban.sh
-mv $RPM_BUILD_ROOT%{_docdir}/ovis-ldms-*/ $RPM_BUILD_ROOT%{_docdir}/%{name}-%{version}/
-mkdir $RPM_BUILD_ROOT%{_sysconfdir}
-cp -r $RPM_BUILD_ROOT%{_docdir}/%{name}-%{version}/sample_init_scripts/genders/etc/init.d $RPM_BUILD_ROOT%{_sysconfdir}
-cp -r $RPM_BUILD_ROOT%{_docdir}/%{name}-%{version}/sample_init_scripts/genders/etc/sysconfig $RPM_BUILD_ROOT%{_sysconfdir}
+cp -r $RPM_BUILD_ROOT%{_docdir}/%{name}-%{version}/sample_init_scripts/genders/ $RPM_BUILD_ROOT%{_sysconfdir}/%{name}
+rm $RPM_BUILD_ROOT%{_docdir}/%{name}-lib-%{version}/README
+rm $RPM_BUILD_ROOT%{_docdir}/%{name}-lib-%{version}/COPYING
+rm $RPM_BUILD_ROOT%{_docdir}/%{name}-lib-%{version}/ChangeLog

%clean
rm -rf $RPM_BUILD_ROOT
@@ -72,16 +101,17 @@ rm -rf $RPM_BUILD_ROOT
%{_libdir}/*
%{_bindir}/*
%{_sbindir}/*
-%{_docdir}/%{name}-%{version}/COPYING
-%{_docdir}/%{name}-%{version}/ChangeLog
-%{_docdir}/%{name}-%{version}/AUTHORS
+%{_docdir}/%{name}-ldms-%{version}/COPYING
+%{_docdir}/%{name}-ldms-%{version}/ChangeLog
+%{_docdir}/%{name}-ldms-%{version}/AUTHORS
+%{_docdir}/%{name}-ldms-%{version}/README
#end core

# devel
%package devel
Summary: LDMS devel package
Group: %{ldms_grp}
-Requires: ldms-all = @Version@
+Requires: ovis = @Version@
%description devel
This is a development package of Lightweight Distributed Metric System (LDMS).
Users who want to implement their own sampler or store must install this
@@ -99,7 +129,7 @@ package.
%package initscripts
Summary: LDMS initscripts for libgenders control of %{name}
Group: %{ldms_grp}
-Requires: ldms-all = @Version@
+Requires: ovis = @Version@
%description initscripts
This is the libgenders based boot scripts for LDMS daemons.
Users must provide information via /etc/genders (or alternate file)
@@ -108,43 +138,54 @@ to make these scripts operate. They are required to fail out of the box.
%files initscripts
%defattr(-,root,root)
%{_sysconfdir}/*/*
+
+%post -n initscripts
+/sbin/ldconfig
+
+%postun -n initscripts
+/sbin/ldconfig
+
#end initscripts

+# doc
%package doc
Summary: Documentation files for %{name}
-Group: %{ldms_all}
+Group: %{ovis}

Requires: %{name}-devel = %{version}-%{release}

%description doc
-Doxygen files for ldms-all package.
+Doxygen files for ovis package.
%files doc
%defattr(-,root,root)
%{_mandir}/*/*
-%{_datadir}/doc/%{name}-%{version}
+#%{_datadir}/doc/%{name}-%{version}
+%{_datadir}/doc
%docdir %{_defaultdocdir}
+#end doc

+# python2
%package python2
Summary: Python files for LDMS
%description python2
Python files for LDMS
%files python2
%defattr(-,root,root)
-%{python2_sitelib}/*
+%{_prefix}/lib/python2.7/site-packages/*
#end python2

# see https://fedoraproject.org/wiki/Packaging:Python_Old
# and https://fedoraproject.org/wiki/Packaging:Python

%changelog
-* Thu Oct 13 2015 Ben Allan [email protected] 3.0.0-1
+* Wed Feb 14 2018 Jeff Hanson [email protected] 3.0.1-1
+clean up of rpm creation
+* Tue Oct 13 2015 Ben Allan [email protected] 3.0.0-1
update to v3.
-* Thu Aug 25 2015 Ben Allan [email protected] 2.4.5-1
+* Tue Aug 25 2015 Ben Allan [email protected] 2.4.5-1
update to latest upstream.
-* Thu Jul 29 2015 Ben Allan [email protected] 2.4.4-1
+* Wed Jul 29 2015 Ben Allan [email protected] 2.4.4-1
update to latest upstream.

RHEL 7.1 Build Error

I'm building with 4.2.1-rc1 and I am getting the following error on a rhel 7.1 system.

/usr/bin/ld: .libs/_ldms_la-ldms.o: relocation R_X86_64_PC32 against undefined symbol `ldms_rbuf_desc_array_metric_value_get' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value

Any help is appreciated. Thanks!

LDMS lustre support

We would like to see LDMS updated to work with recent generations of Lustre. I don't know the full extent of what we'll need yet, but I figure it is worthwhile to begin the conversation with what we have found so far.

In the past Lustre made most of its stats available through entries in /proc. At the time, that was pretty much the only option available under Linux. These days Linux provides /sys and debugfs. Lustre has been migrating from /proc to a combination of /sys and debugfs in recent major releases.

Rather than hard-code the various possible locations for the same information into something like LDMS, the recommended approach is to use the command "lctl get_param" to retrieve lustre values. For instance, in Lustre 2.8 the lustre version can be found in /proc/fs/lustre/version, but in 2.12 the version is found in /sys/fs/lustre/version. And who knows when exactly that transition happened.

The following command works in both versions:

quartz1$ lctl get_param version
version=lustre: 2.8.2_6.chaos
kernel: patchless_client
build:  2.8.2_6.chaos

opal1$ lctl get_param version
version=2.12.0_1.chaos

Note, however, that the "version" output did change at some point between 2.8 and 2.12.

In 2.12, many, but not all, of the files named "stats" have moved from /proc to debugfs. Of most immediate interest to us are the 'llite.*.stats' files. The lctl command unfortunately fails to find any of the entries that moved to debugfs, but let's assume that is a Lustre bug that will be fixed.

Next, LDMS seems to be collecting a great many values that probably don't have a great deal of interest to us. This isn't to say that someone, somewhere might not find them useful, but we would like to focus initially on gathering a smaller subset of information. I will break them down by Lustre node type.

On Lustre clients, we want to gather the information from the following commands:

lctl get_param version
lctl get_param jobid_name
lctl get_param jobid_var
lctl get_param 'llite.*.stats'

On Lustre MDS nodes we will want to gather:

lctl get_param version
lctl get_param 'mdt.*.md_stats'
lctl get_param 'mdt.*.job_stats'

On Lustre OSS nodes we will want to gather:

lctl get_param version
lctl get_param 'obdfilter.*.stats'
lctl get_param 'obdfilter.*.job_stats'

I can follow up with the details of what the output of each of these looks like.

Building v4 Beta, SOS issues

I did a public grab of v4_Beta, used git submodule to get SOS, and ran into the following problems:

  1. The ovis INSTALL file suggests that building ovis will build SOS if it is there, but it did not, for me. I had to build SOS first and then use "--with-sos=/sos/install/path" to get configure to finish.
  2. Then in compiling I get the errors below, which to me look like the default SOS that I grabbed with the git submodule command is an incompatible (older?) version. The SOS version information is below the compile errors.
    CC libmap_la-libmap.lo
    ../../../../lib/src/coll/libmap.c: In function ‘map_transform’:
    ../../../../lib/src/coll/libmap.c:235:7: warning: implicit declaration of function ‘sos_key_for_attr’ [-Wimplicit-function-declaration]
    if (!sos_key_for_attr(trans_key, map_s->src_attr, match_val))
    ^
    ../../../../lib/src/coll/libmap.c: In function ‘map_transform_min’:
    ../../../../lib/src/coll/libmap.c:285:41: error: ‘sos_index_find_min’ undeclared (first use in this function)
    return __attr_find(ret, map->tgt_attr, sos_index_find_min);
    ^
    ../../../../lib/src/coll/libmap.c:285:41: note: each undeclared identifier is reported only once for each function it appears in
    ../../../../lib/src/coll/libmap.c: In function ‘map_transform_max’:
    ../../../../lib/src/coll/libmap.c:290:41: error: ‘sos_index_find_max’ undeclared (first use in this function)
    return __attr_find(ret, map->tgt_attr, sos_index_find_max);
    ^
    ../../../../lib/src/coll/libmap.c: In function ‘map_inverse_min’:
    ../../../../lib/src/coll/libmap.c:295:41: error: ‘sos_index_find_min’ undeclared (first use in this function)
    return __attr_find(ret, map->src_attr, sos_index_find_min);
    ^
    ../../../../lib/src/coll/libmap.c: In function ‘map_inverse_max’:
    ../../../../lib/src/coll/libmap.c:300:41: error: ‘sos_index_find_max’ undeclared (first use in this function)
    return __attr_find(ret, map->src_attr, sos_index_find_max);
    ^
    SOS Version info:
    ~/tools/ovis/sos$ git status
    HEAD detached at ba9a4db
    nothing to commit, working directory clean

(I couldn't find any other SOS branch I should be on; the above is from just doing "git submodule init" and then "git submodule update")

ldms plugins should not be required to honor an "instance" configuration option

Currently, it is required that all ldms plugins implement an "instance" configuration option. Unfortunately, this option just does not make sense for all plugins, and this requirement should likely be amended.

First of all, some plugins generate multiple metric sets from one or more schemas. How is this single instance name supposed to be applied to multiple metric sets which are required to have different instance names?

The current approach taken by some of those plugins is to accept the instance option and then completely ignore it. This is probably not a great approach, but it demonstrates that the instance option's design is not general enough to be a required configuration option for all plugins.

ldms_set_delete() does not clean up set_tree entry

As part of its processing, ldms_set_new() calls __record_set(), which adds an RBT entry to set_tree. The routine ldms_set_delete() does NOT remove that entry from set_tree, so the reference to the metric set is left in set_tree forever, even though the set memory has been released. The easiest way to see this is to do:

load name=abc
config name=abc instance=xyz
term name=abc
load name=abc
config name=abc instance=xyz

The second config command will fail because of the instance name collision with the previous instance of that plugin still in set_tree. You have to restart the ldmsd daemon in order to reuse that instance name.

Here is a clip from the routine ldms_set_delete(); to fix this problem I added the three lines marked below, and it seems to work fine.

@@ -480,6 +480,9 @@ void ldms_set_delete(ldms_set_t s)
 		LIST_REMOVE(rbd, set_link);
 		__ldms_free_rbd(rbd);
 	}
+	rem_local_set(set);
+	mm_free(set->meta);
+	free(set);
 }
 __ldms_set_tree_unlock();
 free(sd);

Failures connecting to ldmsd w OVIS-4.2.1-rc1

I'm seeing intermittent failures while connecting to LDMS daemons in LDMS 4.2.1-rc1.

  1. The aggregators start showing messages like
    Error 5 in lookup callback for set 'nid07421/jobinfo'
    We've had about 1000000 of these messages in the past 24 hrs.

  2. There is also a regular stream of
    Producer agg1107.nid07737 rejected the connection (ugni nid07737:411)
    messages. Restarting the aggregator clears these.

  3. Running updtr_status on the aggregator nodes shows roughly half of the nodes on each aggregator in a CONNECTED state and half in a DISCONNECTED state. The number of connected and disconnected nodes is not static, but increases and decreases over time.

  4. Connections to the aggregator nodes via ldmsctl and ldms_ls fail randomly a few times per minute.

  5. I'm experiencing trouble connecting via ldms_ls from some nodes. This problem is consistent, i.e. it happens every time I try to connect.

boot-cori:~ # ssh mom2 ZAP_UGNI_COOKIE=0x876543 ldms_ls -h mom4 -x ugni -p 412 -a munge | head -2
nid13054/vmstat
nid13054/procstat

boot-cori:~ # ssh mom1 ZAP_UGNI_COOKIE=0x876543 ldms_ls -h mom4 -x ugni -p 412 -a munge | head -2
Warning: Unable to initialize DLA, GNI_RC_ERROR_RESOURCE at line 506 in file cdm.c
zap_ugni: ERROR: GNI_CdmAttach() failed: GNI_RC_ERROR_RESOURCE
ldms: Cannot get zap plugin: ugni
Error creating transport.

This started after I restarted one of the daemons while I was polling it via the command line clients. The node running the clients is no longer able to connect to ldms.

Is it possible to add hosts to aggregator on the fly?

I am using docker-swarm for testing my plugins. Is there a way to add hosts to the aggregator configuration on the fly? Docker swarm allows scaling the number of compute nodes dynamically, but there seems to be no way today to add hosts to the aggregator dynamically, or maybe I am just unaware of this functionality.
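
For what it's worth, ldmsd configuration commands like prdcr_add can generally be issued to a running daemon through ldmsd_controller rather than only from the startup config file, e.g. (host and port values are illustrative):

	$ ldmsd_controller --host aggr --xprt sock --port 10445
	prdcr_add name=vmtest3 host=vmtest3 type=active xprt=sock port=10444 interval=20000000
	prdcr_start name=vmtest3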

ldms plugin component_id design change

All ldms plugins are currently required to honor a "component_id" option and store it in a metadata metric in all of its metric sets. The data type of component_id is currently an LDMS_V_U64.

As I understand it, the intention is that this number will uniquely identify the component on which the ldmsd is running, e.g. a node. So one common use appears to be that if one has a compute cluster of 1000 nodes, each of the nodes will be assigned a unique integer which is passed to every ldms plugin and in turn stored in the component_id of each metric set on a node.

This probably works fine when there is only a single cluster in one's center, or if every cluster in a center has its own separate monitoring system. However it begins to become an undesirable burden when scaled up.

Consider a site that has 15 clusters, where all of the monitoring data from each of the clusters is combined into a single central monitoring database. One might imagine that the configuration of such a system becomes a challenge in general. Now we try to add ldms to that system, and suddenly the system administrators have a new configuration burden: uniquely assigning integer values to all nodes across all of the clusters, even though those clusters may be maintained by many different people at various times.

The sysadmins might rightfully argue that they already have a unique way to identify nodes: hostnames. The additional integer component ids just for ldms are an added configuration burden that they would not want.

So I would recommend that we address this issue by changing the type of component_id from an integer to a string. People who wish to assign unique integers to all nodes can still do so, because a number can be stored in string form (and possibly converted back into a real integer before final insertion into the monitoring database). But a hostname cannot be stored in an integer.

A string component_id also accommodates other components that might not look exactly like a linux node (network switches?).

I would also recommend that the default value of the component_id string be the hostname on systems that support an API like gethostname(). This too is not in conflict with those who wish to use integer component_id values, because those need to be set manually anyway. But for those who wish to use hostnames, it eliminates another configuration burden.

Also, it would probably be best to have the default value determined in ldmsd proper rather than requiring each plugin to individually develop methods to determine the default.
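
A sketch of the proposed default (hypothetical code, not current ldmsd behavior):

	#include <string.h>
	#include <unistd.h>

	/* Fall back to the hostname when the user supplies no component_id. */
	static void default_component_id(char *buf, size_t len, const char *user_value)
	{
		if (user_value && *user_value)
			strncpy(buf, user_value, len - 1);
		else if (gethostname(buf, len))
			strncpy(buf, "unknown", len - 1);
		buf[len - 1] = '\0';
	}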

For complex systems, it should always be our goal to make configuration be as simple as reasonable. It is possible that in many cases, with careful schema design, a plugin can operate with little or no configuration decisions on the part of the user.

All_example plugin issue

I am getting the following error when starting my aggregator when using the all_example plugin:

ERROR : store_csv: metric id 3: no name at list index 3.
ERROR : store_csv: reconfigure to resolve schema definition conflict for schema=all_example
and instance=node31/all_example.

Here are the relevant lines of configuration:
sampler.conf
load name=all_example
config name=all_example producer=node31 instance=node31/all_example component_id=5200001 job_set=node31/jobinfo

agg.conf
strgp_add name=policy_all_example plugin=store_csv container=csv schema=all_example
strgp_prdcr_add name=policy_all_example regex=.*
strgp_start name=policy_all_example

What is causing this error? Additionally, storing with csv works fine; however, storing with sos causes the aggregator to crash without reporting any errors.

Problem of installing ovis on CentOS 7

Hello ovis development team,

I am working on a project with the Los Alamos lab that needs LDMS to collect some data. I followed these instructions to install: https://www.opengridcomputing.com/wordpress/index.php/ovis-3-3-user-guide/
I installed all the dependencies and followed the Quick Build guide. However, after running "../configure --prefix=/home/foo/opt/ovis --enable-baler --enable-sos --enable-swig" and then "make", a few errors come up:
[screenshot of make errors]

I am not sure how to fix the undefined references to sos_iter_pos_set and sos_iter_pos_get, and I am not sure whether this is a version issue. I also searched for the keywords using grep, which gave me the following result:
[screenshot of grep output]

Does anyone know anything about this issue? I would really appreciate your help!

Bests,
Mzcg

Exceeded max store keys

On OVIS 4.2.1-rc1:

I am trying to create 22 csv store files on my L2 aggregator. 3 csv files have 0 bytes stored in them even though the values appear under ldms_ls -l on the L2 aggregator and the files are non-zero in my L1 agg store. My log file has the error:

store_csv.c: Error: Exceeded max store keys

I see there is a default in that file of 20 keys. What is the correct way to increase this number? Is that the cause of my zero bytes files?
Any help resolving this is appreciated.

err_str[LEN_ERRSTR] buffer overflow causes ldmsd abort

Due to the way that Cray packages products, the pathname to dynamically loaded samplers and stores can be quite long. If a sampler is not found, this results in an error message that is larger than the current 128-byte size of err_str[LEN_ERRSTR].

"-1dlerror /opt/cray/ldms/0.1-1.0000.292e049.0.0.el7/lib64/ovis-ldms/libcray_procdiskstats.so: cannot open shared object file: No such file or directory"

The result is a ldmsd daemon abort(). I doubled the size of LEN_ERRSTR to 256 in our local copy of OVIS, and this seems to solve the problem.

Program received signal SIGABRT, Aborted.
0x00007f43131851d7 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x00007f43131851d7 in raise () from /lib64/libc.so.6
#1 0x00007f43131868c8 in abort () from /lib64/libc.so.6
#2 0x00007f43131c4f07 in __libc_message () from /lib64/libc.so.6
#3 0x00007f431325f047 in __fortify_fail () from /lib64/libc.so.6
#4 0x00007f431325f010 in __stack_chk_fail () from /lib64/libc.so.6
#5 0x000000000040a166 in process_load_plugin (
replybuf=0x622120 "-1dlerror /opt/cray/ldms/0.1-1.0000.292e049.0.0.el7/lib64/ovis-ldms/libcray_procdiskstats.so: cannot open shared object file: No such file or directory", av_list=, kw_list=)
at ldmsd_config.c:1535
...
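
The local change described above amounts to a one-line edit (location in the ldmsd sources paraphrased):

	#define LEN_ERRSTR 256 /* was 128; long Cray plugin paths overflow the smaller buffer */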

ldmsd should enforce plugin API semantics

ldmsd should enforce the semantics chosen for the plugin API.

For instance, one rule in ldms is that every sampler plugin must have config() called before the sampler is started. It should be fairly simple to add a flag somewhere like struct ldms_plugin_cfg that tracks when config() has been successfully called. Then ldmsd can issue an error any time a command is issued to start a plugin that has not been configured.

Right now ldmsd fully permits starting a plugin without configuring it first.
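
A sketch of the suggested enforcement (the 'configured' flag is hypothetical):

	/* Set pi->configured = 1 only when config() returns success; then, in
	 * the start-plugin handler, refuse to start anything unconfigured: */
	if (!pi->configured)
		return EINVAL;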

Question about passive mode for an aggregator

I would like more information on 'passive' mode for an aggregator. Reading the man pages, it seems you can set up an aggregator to wait for a connection instead of initiating one. My aggregator is not going to be able to connect to the node the sampler is running on; the sampler node must initiate the request. The documentation is a little light on details.

Is there additional information somewhere on this? I have not seen any examples of this online or in this project. I do see a reference here: https://github.com/ovis-hpc/ovis/wiki/Configuration-Considerations-and-Best-Practices-%28v4%29#Active_vs_Passive_Connections
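
For reference, a passive producer on the aggregator is configured with type=passive in prdcr_add, per the man pages referenced above; a minimal sketch (values illustrative, details unverified):

	prdcr_add name=node1 host=node1 xprt=sock port=10444 type=passive interval=20000000
	prdcr_start name=node1

With type=passive, the aggregator waits for the named peer to initiate the connection instead of dialing out.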

ldmsd_updtr_state_str bug

In https://github.com/ovis-hpc/ovis/blob/master/ldms/src/core/ldmsd.h the inline function below provokes a compiler warning in every file that includes ldmsd.h. Either the switch should use 'default: return "BAD STATE";' or the case LDMSD_UPDTR_STATE_STOPPING should be handled. Ideally, both, as C provides no guarantee that a passed enum value is a valid enum member.

static inline const char *ldmsd_updtr_state_str(enum ldmsd_updtr_state state) {
switch (state) {
case LDMSD_UPDTR_STATE_STOPPED:
return "STOPPED";
case LDMSD_UPDTR_STATE_RUNNING:
return "RUNNING";
}
return "BAD STATE";
}

LDMS V4 Video Request list

We will be populating the wiki with demo videos. Please add your Video demo requests to this issue. Current requests include:

  • overview with how the data can be used
  • cluster build and configuration
  • new feature: failover
  • new feature: metric set groups and vectors of sets
  • papi sampler (coming soon)
  • Cray Aries network metrics - samplers and use
  • Storage and Analysis interactions -- Hello World of python-based analysis using the Scalable Object Store (SOS)

LDMS Users group

Topic: LDMS Users Group
Schedule below. For call in info and notes, please see the ovis-hpc/ovis wiki

=======================================================

Time: Jan 21, 2019 12:00 PM Mountain Time (US and Canada)

Every 2 weeks on Mon, until May 27, 2019, 10 occurrence(s)

Jan 21, 2019 12:00 PM

Feb 4, 2019 12:00 PM

Feb 18, 2019 12:00 PM

Mar 4, 2019 12:00 PM

Mar 18, 2019 12:00 PM

Apr 1, 2019 12:00 PM

Apr 15, 2019 12:00 PM

Apr 29, 2019 12:00 PM

May 13, 2019 12:00 PM

May 27, 2019 12:00 PM

https://zoom.us/j/599135580

One tap mobile

+16468769923,,599135580# US (New York)

+16699006833,,599135580# US (San Jose)

Dial by your location

+1 646 876 9923 US (New York)

+1 669 900 6833 US (San Jose)

Meeting ID: 599 135 580

store split keys

The CSV and other container-oriented stores produce unwieldy containers (very large single files) for data sets that represent the same metrics for many entities.
The storage policy architecture routes data by schema only; there is no way to filter by instance name.

One way to alleviate the forced coarseness of data storage and analytics, which at large scale is troublesome, is to let the (for instance) CSV user define a list of metrics that are the keys on which file and subdirectory splits are done. For example, a schema containing a device name (metric such as port in opa2) could at the CSV store produce a file per device name if configured with keys=port.

Similarly, for export to users by job, we could apply a CSV with keys=user,job_id such that output ends up in $path/$user/$job_id/meminfo

Similarly, scalability of containers to weekly system data volumes for sos could be arranged as
key=ProducerName, which would provoke per-producer containers instead of a single 1800x larger container including data from 1800 producers. This would let us look at a single node for a much larger time window; we have jobs that run 2 weeks to infinity at SNL, so this would be a substantial win in the analytics pipeline.
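
A hypothetical configuration under this proposal (the keys= option does not exist today; the path is illustrative):

	load name=store_csv
	config name=store_csv path=/data/csv keys=user,job_id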

problem in aggregators

I have two machines, vmtest1 and vmtest2.

I installed ovis4 on both machines and I am getting ldms data.

On vmtest1:

[ovis@vmtest1 ~]$ ldms_ls -h localhost -x sock -p 10444
vmtest1/vmstat
vmtest1/procstat
vmtest1/procdiskstats
vmtest1/meminfo
vmtest1/lustre2_client
vmtest1/dstat

On vmtest2:

[ovis@vmtest2 ~]$ ldms_ls -h localhost -x sock -p 10447
vmtest2/vmstat
vmtest2/procstat
vmtest2/procdiskstats
vmtest2/meminfo
vmtest2/lustre2_client
vmtest2/dstat

Then I installed it on a third machine with hostname aggr.

agg11.conf:

============================================
prdcr_add name=vmtest1 host=vmtest1 type=active xprt=sock port=10444 interval=20000000
prdcr_start name=vmtest1
updtr_add name=policy_h1 interval=1000000 offset=100000
updtr_prdcr_add name=policy_h1 regex=vmtest1
updtr_start name=policy_h1
prdcr_add name=vmtest2 host=vmtest2 type=active xprt=sock port=10447 interval=20000000
prdcr_start name=vmtest2
updtr_add name=policy_h2 interval=2000000 offset=100000
updtr_prdcr_add name=policy_h2 regex=vmtest2
updtr_start name=policy_h2

Then I started ldmsd on host aggr and ran the following command:

-bash-4.2$ sudo ldms_ls -h localhost -x sock -p 10445

===============================================

The output is empty; it's not showing vmtest1 and vmtest2.

Please help me

thanks
sagar

trapping and stopping self-connections is needed

Trapping and preventing connections to self is needed.
Ideally, we need a daemon uuid so that if a failover configuration or misconfiguration generates a loop to self by any transport path, we disallow it rather than having the thread logic go silently insane.
An approximate version of this trap (direct case only) was done in ogc gitlab MR 1096.

lib/python/Map.pxd extra semicolons break build with sos on rhel7

lib/python/Map.pxd has many lines ending in ';'; the compiler says they should not be there and dies. The offending lines are:

  int map_transform_min(map_t _map, uint64_t *ret);
  int map_transform_max(map_t _map, uint64_t *ret);
  int map_inverse_min(map_t _map, uint64_t *ret);
  int map_inverse_max(map_t _map, uint64_t *ret);
  int map_transform_ge(map_t map, uint64_t src, uint64_t *dst);
  int map_transform_le(map_t map, uint64_t src, uint64_t *dst);
  int map_inverse_ge(map_t map, uint64_t dst, uint64_t *src);
  int map_inverse_le(map_t map, uint64_t dst, uint64_t *src);

ldms_set_new() does not allow reuse of names previously relinquished by ldms_set_delete()

This is a break-out of an issue that I encountered in issue #20.

It appears that ldms_set_delete() is not fully deleting all of the state from a metric set. If one creates a metric set with name "A" using ldms_set_new(), then deletes that set with ldms_set_delete(), and then tries to create a new set with the name "A" using ldms_set_new(), the last metric set creation will fail.

In other words, metric set names cannot be reused. It would appear that ldms_set_delete() is not fully deleting the metric set.

Building v4 Beta with ovis-pkg-latest-stable | new excludes

For ovis-pkg-latest-stable 4.1.1, I had previously needed to add the below excludes to ovis-ldms.spec and comment out all kokkos-related statements.

%exclude %{_libdir}/ovis-ldms/libdstat.*
%exclude %{_libdir}/libparse_stat.*

I now need to add this exclude for it to finish building.

%exclude %{_libdir}/ovis-ldms/libstore.*
%exclude %{_libdir}/ovis-ldms/libstore_none.*
