mpounsett / nagiosplugin
A Python class library which helps with writing Nagios (Icinga) compatible plugins.
Home Page: https://nagiosplugin.readthedocs.io/
License: Other
Original report by jonathansd (Bitbucket: jonathansd, GitHub: jonathansd).
If you're installing a plugin that uses nagiosplugin, what installation steps are needed for the plugin, given that nagiosplugin is not in the standard library?
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
Context.describe should also process context attributes as fmt_metric keywords.
Original report by Mayur Patil (Bitbucket: mayurp7, GitHub: mayurp7).
Installation of nagiosplugin 1.2.4:
OS: Win 7 SP1 64 bits
Python: 2.7.x 32 bits and 3.5.x 64 bits
Installation mode: pip
For python 2.7, C:\Python27\python.exe
For python 3.5, C:\Python35\python3.exe
For nagiosplugin installation, Python 2.7
C:>python -m pip install nagiosplugin
For nagiosplugin installation, Python 3.5
C:>pip install nagiosplugin
Linting complains about the signatures of several methods in this class (no-self-use), and the issue could be resolved by reimplementing them as @staticmethod methods, but that might be API-breaking if anyone is making use of self references in subclassed implementations. If these can't be @staticmethod then they could be reimplemented as a proper ABC using the @abstractmethod decorator.
Linting complains about the Context.performance() signature, and the issue could be resolved by reimplementing this as a proper ABC, using the @abstractmethod decorator on the performance method.
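To make the ABC proposal concrete, here is a minimal sketch using only the standard library. ContextSketch, LabelOnly and Forgetful are hypothetical stand-ins, not the library's classes; only the performance(metric, resource) call shape is taken from the tracebacks in these reports.

```python
import abc


class ContextSketch(abc.ABC):
    """Hypothetical stand-in for a context base class (not nagiosplugin's)."""

    def __init__(self, name):
        self.name = name

    @abc.abstractmethod
    def performance(self, metric, resource):
        """Return performance data for `metric`, or None."""


class LabelOnly(ContextSketch):
    def performance(self, metric, resource):
        # `self` stays available, which a @staticmethod would not allow.
        return '{0}={1}'.format(self.name, metric)


class Forgetful(ContextSketch):
    pass  # does not implement performance()


perf = LabelOnly('load').performance(2, None)

try:
    Forgetful('load')
    error = None
except TypeError as exc:
    # Classes with unimplemented abstract methods cannot be instantiated.
    error = exc
```

One trade-off to weigh: a base class with abstract methods can no longer be instantiated directly, which may itself affect existing plugins.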
Original report by Louis Sautier (Bitbucket: sbraz, GitHub: sbraz).
Hello,
I’m using the built-in logger to display additional messages based on verbosity, but I noticed that they are displayed after the probe function is done running, not in real time. This is a bit confusing because, in case of errors, the verbose messages appear after the summary. Could this behaviour be changed?
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
From: Lofton Newton
I have a question about subclassing the nagiosplugin.performance.Performance class. I wanted to override the __str__ method, but ran into a problem (see traceback below). Looking over the documentation for super, it seems the second argument is expected to be an instance or subtype of the first argument. The way super is called in the __new__ method of the nagiosplugin.performance.Performance class, it doesn't seem possible to override class methods like below. If I swap the arguments in the super call in Performance.__new__, I no longer get the TypeError below. Is this implementation by design? My apologies if I'm misinterpreting something here.
#!python
import nagiosplugin

class PluginPerformance(nagiosplugin.Performance):
    def __str__(self):
        # Do some stuff here
        return super(PluginPerformance, self).__str__()
Traceback (most recent call last):
  File "./plugins/check_consumer_metrics", line 75, in <module>
    main()
  File "./plugins/check_consumer_metrics", line 71, in main
    check.main()
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/check.py", line 115, in main
    runtime.execute(self, verbose, timeout)
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/runtime.py", line 118, in execute
    with_timeout(self.timeout, self.run, check)
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/platform/posix.py", line 19, in with_timeout
    func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/runtime.py", line 107, in run
    check()
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/check.py", line 100, in __call__
    self._evaluate_resource(resource)
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/check.py", line 86, in _evaluate_resource
    self.perfdata.append(str(metric.performance() or ''))
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/metric.py", line 104, in performance
    return self.contextobj.performance(self, self.resource)
  File "/tmp/kafka-monitoring/monitoring.py", line 171, in performance
    return PluginPerformance(metric.context, metric.value)
  File "/usr/local/lib/python2.7/site-packages/nagiosplugin/performance.py", line 51, in __new__
    return super(cls, Performance).__new__(
TypeError: super(type, obj): obj must be an instance or subtype of type
Original report by Vincent Danjean (Bitbucket: vdanjean, GitHub: vdanjean).
Hi,
I tried to subclass the Range class in one of my projects. I'm using the official Debian package (i.e. upstream version 1.2.4).
I see that __new__ in the Range class is calling super(cls, Range) instead of super(Range, cls).
I checked out the latest sources and saw that, in the repo, the Range object does not have the __new__ method anymore. However, a quick look shows that the remaining calls to the two-argument super() have their parameters reversed. With basic usage this is not a problem (as both parameters refer to the same type), but it leads to problems when subclassing AND overriding the method.
Regards,
Vincent
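The argument-order problem described in the two reports above can be reproduced without nagiosplugin. The following is a sketch with hypothetical namedtuple-based classes (BrokenPerf, FixedPerf and their children are illustrative names, not library code): the reversed call works for the base class itself, where both arguments are the same type, and fails as soon as a subclass is instantiated.

```python
import collections


class BrokenPerf(collections.namedtuple('BrokenPerf', 'label value')):
    def __new__(cls, label, value):
        # Arguments reversed, as in the reports: super(cls, <base class>).
        return super(cls, BrokenPerf).__new__(cls, label, value)


class FixedPerf(collections.namedtuple('FixedPerf', 'label value')):
    def __new__(cls, label, value):
        # Conventional order: super(<this class>, cls).
        return super(FixedPerf, cls).__new__(cls, label, value)


class BrokenChild(BrokenPerf):
    def __str__(self):
        return 'custom'


class FixedChild(FixedPerf):
    def __str__(self):
        return 'custom'


base_ok = BrokenPerf('load', 2)          # works: cls is BrokenPerf itself
fixed_text = str(FixedChild('load', 2))  # subclassing works with the fix

try:
    BrokenChild('load', 2)
    subclass_error = None
except TypeError as exc:
    # BrokenPerf is not a subtype of BrokenChild, so super() rejects it.
    subclass_error = exc
```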
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
Using umlauts in various output-related strings causes UnicodeDecodeErrors.
For example, using a ScalarContext like this:
nagiosplugin.ScalarContext(
    'cert_expiration_days',
    fmt_metric="Zeritifikat läuft in {value} Tagen ab."
)
results in
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\site-packages\nagiosplugin\runtime.py", line 108, in run
    self.output.add(check)
  File "C:\Python27\lib\site-packages\nagiosplugin\output.py", line 27, in add
    self.status = self.format_status(check)
  File "C:\Python27\lib\site-packages\nagiosplugin\output.py", line 41, in format_status
    summary_str = check.summary_str.strip()
  File "C:\Python27\lib\site-packages\nagiosplugin\check.py", line 143, in summary_str
    return self.summary.problem(self.results) or ''
  File "C:\Python27\lib\site-packages\nagiosplugin\summary.py", line 51, in problem
    return str(results.first_significant)
  File "C:\Python27\lib\site-packages\nagiosplugin\result.py", line 57, in __str__
    return '{0} ({1})'.format(desc, self.hint)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 13: ordinal not in range(128)
Same problem on Linux machines.
Let me first start by saying that this is not an issue in production, as I am fully aware there should be only one call of the actual guarded() code.
However, when unit testing the plugin I'm writing, I wanted to ensure the various verbose levels each gave me the right output. So I have one test class, with a separate test function for each verbose level, and all of them get run when I do my test.
Because of this approach, I of course need to mock out sys.exit() to allow the tests to finish.
I quickly noticed that the output of verbose level 3 included the one from verbose level 2, although there was no code path that could allow it to be generated.
Debugging the test run revealed that indeed, at the end of the verbose 2 test, something was not getting cleaned up. Further investigation led to the conclusion that the Runtime class has been designed as a singleton, i.e. the very same instance is always returned when guarded() asks for a Runtime. Furthermore, Runtime accumulates verbose output and then prints it, but does not afterward empty out that storage. Because the entire test run is a single invocation of the interpreter, this global Runtime instance doesn't get reset between tests and the output just accumulates.
I have implemented a pretty simple workaround, where I set nagiosplugin.runtime.Runtime.instance = None at the start of each test to force it to create a new Runtime each time.
Now I don't really expect you to solve this for me, but I felt it worth noting in case anyone ever runs into a similar issue.
If you do feel like looking into a solution... I just have to ask, why exactly is Runtime a singleton? I can't see what purpose that serves, nor do I even see where else a Runtime instance is created except in guarded()...
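The accumulation effect and the reset workaround can be illustrated with a stand-in singleton. This is only a sketch: FakeRuntime and run_check are hypothetical and do not reproduce the real Runtime class; the only detail taken from the report is that clearing the class-level instance attribute forces a fresh object.

```python
class FakeRuntime:
    """Hypothetical stand-in singleton, NOT the real nagiosplugin Runtime."""

    instance = None

    def __new__(cls):
        if cls.instance is None:
            cls.instance = super().__new__(cls)
            cls.instance.buffer = []  # verbose output collected so far
        return cls.instance


def run_check(message):
    """Pretend to run a check that logs one verbose message."""
    runtime = FakeRuntime()
    runtime.buffer.append(message)
    return list(runtime.buffer)


first = run_check('verbose 2 output')
leaked = run_check('verbose 3 output')  # still sees the earlier message

# Workaround from the report, applied to the stand-in: drop the instance
# at the start of each test so a new one is created.
FakeRuntime.instance = None
clean = run_check('verbose 3 output')
```

In a pytest suite, the reset line could live in a fixture that runs before every test.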
If you access your results with by_state, it creates an empty list entry for the accessed value.
Example:
I use len(results.by_state[np.Critical]) > 0: in Summary.problem to print Critical first, followed by Warnings. This, though, creates an empty record for Critical in my tests (where no critical results exist, only warnings), making them fail:
pprint((check.results.most_significant_state))
pprint((check.results.most_significant))
Result:
Critical(code=2, text='critical')
[]
This results in the whole check returning CRITICAL, despite having no results that are actually critical (only warnings are present). The issue is most likely due to self.by_state = collections.defaultdict(list)
Workaround:
Don't use results.by_state if you are not sure it contains elements; use results.most_significant_state == np.Critical instead. Still, this seems a rather critical problem to me.
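The side effect comes from how collections.defaultdict behaves in general: indexing a missing key inserts it. A small sketch with plain strings standing in for the state objects (the keys and messages are made up for illustration):

```python
import collections

by_state = collections.defaultdict(list)
by_state['warning'].append('disk is 91% full')

# Indexing a missing key inserts an empty list as a side effect.
has_critical = len(by_state['critical']) > 0
keys_after_index = sorted(by_state)

safe = collections.defaultdict(list)
safe['warning'].append('disk is 91% full')

# .get() and membership tests read without inserting anything.
has_critical_safe = len(safe.get('critical', [])) > 0
is_present = 'critical' in safe
keys_after_get = sorted(safe)
```

Any code that later derives the worst state from the dictionary's keys would then see the phantom 'critical' entry in the first case but not in the second.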
The nagiosplugin.runtime.Runtime class is implemented as a singleton. This is an anti-pattern, and doesn't seem to serve any purpose, so should be refactored out of the code.
Since it's possible the side-effects this causes are relied upon by existing code, this should be done in a 2.x release.
Hi,
I just stumbled across what I think is an inconsistency. Those states exist currently:
Ok
Warn
Critical
Unknown
I think this module should rather support Warning instead of Warn to comply with the exact wording used in the plugin API spec: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/pluginapi.html
Backwards compatibility with existing checks should be provided by still supporting Warn.
For background, I am using a statement like getattr(nagiosplugin.state, nagios_state_string, nagiosplugin.state.Unknown) to use an already existing Nagios state (nagios_state_string) in a custom Context class.
What do you think?
Lines 5 and 6 of state.py have duplicated text in the docstring.
It would be nice to provide a common/standard solution for plugins to support "Extra-Opts" (https://nagios-plugins.org/doc/extra-opts.html).
This is highly needed to "hide" passwords from command-line arguments.
I think this project is a very good place to provide such a solution.
It would be nice to provide a common/standard solution for plugins to support "Extra-Opts", as is done in the C and Perl plugins: https://nagios-plugins.org/doc/extra-opts.html.
This is highly needed to hide passwords from command-line arguments, and for other tasks too.
I think this library is a very good place to provide such a solution, as it is used as a base for plugins.
My proposal is to add a common/standard solution for this task to the library, which already has an 'ArgumentParser' part. That part should become smart enough to properly handle the --extra-opts option and support reading options from a given configuration file.
Then any Python plugin that uses this library will be able to handle arguments defined in a file without (or with minor) code changes.
I'm not a Python developer, so I can't suggest an implementation.
Thanks for your project.
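As a rough illustration of the proposal, here is one way such handling could be layered on argparse and configparser. This is a sketch under simplifying assumptions, not an implementation of the full Extra-Opts specification: expand_extra_opts is a hypothetical helper, both the section and the file must be given as --extra-opts=section@file, and every option in the section is assumed to take a value.

```python
import argparse
import configparser
import os
import tempfile


def expand_extra_opts(argv):
    """Replace --extra-opts=section@file with the options from that section."""
    pre = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    pre.add_argument('--extra-opts')
    known, remaining = pre.parse_known_args(argv)
    if not known.extra_opts:
        return remaining
    section, _, path = known.extra_opts.partition('@')
    config = configparser.ConfigParser()
    with open(path) as handle:
        config.read_file(handle)
    injected = []
    for key, value in config[section].items():
        injected.extend(['--' + key, value])
    # File options go first so explicit command-line values override them.
    return injected + remaining


# Demonstration with a throwaway ini file and a hypothetical plugin parser.
with tempfile.NamedTemporaryFile('w', suffix='.ini', delete=False) as ini:
    ini.write('[check_demo]\npassword = s3cret\n')
try:
    parser = argparse.ArgumentParser()
    parser.add_argument('--password')
    parser.add_argument('--host')
    argv = ['--host', 'db1', '--extra-opts=check_demo@' + ini.name]
    args = parser.parse_args(expand_extra_opts(argv))
finally:
    os.unlink(ini.name)
```

With this shape the password never appears on the command line, and an existing plugin only needs to pass its argument list through the helper before parsing.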
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
I'd like to share one small bit of code which I found myself reusing
again and again, for situations in which you want to signal that
something went wrong, but cannot do that by crossing a (possibly
unknown) threshold or raising an exception (because an exception might
actually mean critical failure instead of unknown).
This is the idiom I use:
#!python
class FatalContext(nagiosplugin.Context):
    def __init__(self, name):
        super(FatalContext, self).__init__(name)

    def evaluate(self, metric, resource):
        return nagiosplugin.Result(nagiosplugin.state.Critical,
                                   "Fatal Error: %r" % metric.value,
                                   metric)
...
archive_fatal_ctx = FatalContext('fatal_archive')
...
try:
    sess = get_session(session_id)
except IOError as e:
    yield nagiosplugin.Metric('fatal_%s' % self.name_postfix, repr(e))
    return
...
If you find this a useful scenario as well, go ahead and put it into the
nagiosplugin distribution.
This should probably go alongside the LICENSE, CONTRIBUTORS, etc. once #20 is fixed.
First off, great plugin. We were writing a bunch of our own logic and boilerplate for our Nagios-style tests, and I am considering moving over to this plugin to avoid copy-paste errors.
The one issue I've had is converting our tests. Our current modules use pytest's capsys fixture to compare the expected output of a check with test data. A sample from one of these tests is below.
[.....]
mock_get.return_value.text = test_data
expected_msg = "OK :load average is 2% and max is 10% | " + \
    "load_average=2;75;85;100; load_max=10;75;85;100;\n"
with pytest.raises(SystemExit) as e:
    cpdmain(test_args)
assert capsys.readouterr().out == expected_msg
What's interesting is that none of the pytest fixtures for capturing stdout seem to capture nagiosplugin's output. More interesting is that the output does appear in the test results summary like this:
test_load.py::test_main FAILED [ 56%]LOAD OK - avg is 2 | avg=2;75;85;0;100 max=10;75;85;0;100
As far as I can tell, the reason appears to be related to the Runtime class's execute method, where it prints with the file keyword argument set to self.stdout, which itself is just sys.stdout.
Since writing to stdout is the default behavior of print, I'm curious whether it is explicitly specified to solve a specific problem, or if this can be removed to allow test frameworks like pytest to capture the output?
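One plausible mechanism for what is described above is that a reference to sys.stdout is stored before the capture starts, so later writes bypass any replacement of sys.stdout. The sketch below shows that general Python behaviour with contextlib.redirect_stdout; EarlyBound and LateBound are hypothetical classes, and whether this matches the library's actual code path is an assumption.

```python
import contextlib
import io
import sys


class EarlyBound:
    def __init__(self):
        self.stdout = sys.stdout  # reference stored once, at construction

    def emit(self, text):
        print(text, file=self.stdout)


class LateBound:
    def emit(self, text):
        print(text, file=sys.stdout)  # looked up at call time


terminal = io.StringIO()  # stands in for the real terminal
with contextlib.redirect_stdout(terminal):
    early = EarlyBound()
late = LateBound()

capture = io.StringIO()  # stands in for a test framework's capture buffer
with contextlib.redirect_stdout(capture):
    early.emit('LOAD OK - early')
    late.emit('LOAD OK - late')

captured = capture.getvalue()
escaped = terminal.getvalue()
```

Only the late-bound output lands in the capture buffer; the early-bound line goes to the stream that was current when the object was created.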
Original report by Thomas Wallrafen (Bitbucket: tom-userlike, GitHub: tom-userlike).
Hello,
is there a way to control whether metrics are printed in scientific notation or not?
I'm using nagiosplugin 1.2.4 under Python 2.7.14
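For reference, plain Python can render a float without exponent notation by using a fixed-point format. The helper below is a hypothetical sketch (plain_number is not part of nagiosplugin), and where such a string would be plugged into a plugin, for example in a custom context, is left open here.

```python
def plain_number(value, digits=10):
    """Format a float in fixed-point notation and trim trailing zeros."""
    text = format(value, '.{0}f'.format(digits))
    if '.' in text:
        text = text.rstrip('0').rstrip('.')
    return text


default_text = str(0.00001)         # Python's default switches to exponent form
plain_text = plain_number(0.00001)  # fixed-point form of the same value
whole_text = plain_number(5.0)
```

The digits parameter caps the precision, so values smaller than 10**-digits would be rendered as 0.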
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
Provide a Range method that returns a new Range where the thresholds are linearly scaled by factor.
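A sketch of what such scaling could look like, using a hypothetical two-field stand-in rather than the library's Range class. The restriction to positive factors is an assumption made here, since a negative factor would swap the bounds.

```python
import collections

# Hypothetical stand-in with just a lower and an upper bound.
RangeSketch = collections.namedtuple('RangeSketch', 'start end')


def scaled(rng, factor):
    """Return a new range whose bounds are multiplied by `factor`."""
    if factor <= 0:
        raise ValueError('factor must be positive')
    return RangeSketch(rng.start * factor, rng.end * factor)


kib_range = RangeSketch(0, 80)
byte_range = scaled(kib_range, 1024)
open_ended = scaled(RangeSketch(10, float('inf')), 2.5)
```

Returning a new object instead of mutating the original keeps the unscaled thresholds available for display.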
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
what CheckError is good for
unhandled exceptions
avoid second-order errors in Summary (e.g., by accessing non-existent results)
Original report by Arthur Lutz (Bitbucket: arthurlogilab, GitHub: arthurlogilab).
A debian package would be nice to deploy these plugins.
We're contemplating contributing the debian packaging.
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
From: "Joseph L. Casale"
When a context is unknown until the time of check execution, how can one build a context and metric during the nagiosplugin.Check probe invocation?
It would be too late during execution of the nagiosplugin.Resource as far as I can see, unless I missed some way the resource can gain access to the nagiosplugin.Check instance?
The Performance class inherits from namedtuple and then implements additional methods. This seems like a prime candidate for a dataclass. Care should be taken that re-implementing this doesn't break the API.
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
The way output is currently generated from ScalarContexts is hard to understand and customize.
Original report by Louis Sautier (Bitbucket: sbraz, GitHub: sbraz).
Hi, after looking at the code for the latest revision, it seems there is no way to change the verbosity of the exceptions trapped by @nagiosplugin.guarded without monkey-patching a lot of code.
Until version 1.2.3, I would force it with:
#!python
nagiosplugin.runtime.Runtime._verbose = 0
but since the attribute moved to another object, I don't see a simple way to do it.
Maybe adding arguments to the decorator could be the answer to this problem.
Hi,
It looks like setuptools will eventually remove pkg_resources, so the following will fail:
nagiosplugin/tests/test_examples.py
Line 19 in 050651a
Running the test suite results in:
tests/test_examples.py:2
/var/tmp/portage/dev-python/nagiosplugin-1.3.3/work/nagiosplugin-1.3.3/tests/test_examples.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
There is documentation here that explains how to migrate to importlib.resources: https://importlib-resources.readthedocs.io/en/latest/migration.html#pkg-resources-resource-filename
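As a rough guide to what the migration looks like, the standard library's importlib.resources.files() (Python 3.9 and later) can locate a file inside an installed package. The sketch below uses the stdlib json package purely as a stand-in target, because the exact call in test_examples.py is not shown here; resource_path is a hypothetical helper.

```python
import importlib.resources


def resource_path(package, name):
    """Locate `name` inside `package` without using pkg_resources."""
    return importlib.resources.files(package).joinpath(name)


# 'json' is only a convenient package that is always importable.
init_file = resource_path('json', '__init__.py')
found = init_file.is_file()
```

If a real filesystem path is required for a resource that may live inside a zip file, importlib.resources.as_file() can wrap the returned object in a context manager.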
Original report by Onur Yalazı (Bitbucket: yalazi, ).
Nagios and Icinga 2 use only a single line of output to get check results and performance data. When there are multiple metrics, the output.py:format_perfdata method joins them with newlines. This breaks perfdata collection. It should join with spaces.
Sample output:
ONLINE OK - Online is 2528
| 'www.xxxx.com'=74;;;0 'www.yyyy.com'=3;;;0 'www.zzzz.com'=306;;;0
'www.ttttt.com'=80;;;0 'www.bbbbbb.com'=819;;;0
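The requested behaviour amounts to joining the perfdata entries with spaces on the status line. A minimal sketch, where format_output is a hypothetical function and not the library's format_perfdata:

```python
def format_output(status_line, perfdata):
    """Put the status text and all perfdata entries on one line."""
    if not perfdata:
        return status_line
    return status_line + ' | ' + ' '.join(perfdata)


line = format_output(
    'ONLINE OK - Online is 2528',
    ["'www.xxxx.com'=74;;;0", "'www.yyyy.com'=3;;;0", "'www.zzzz.com'=306;;;0"],
)
```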
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
I want to return just a state singleton from evaluate(). This would avoid hard-to-debug errors like this:
#!python
CEPH UNKNOWN: AttributeError: 'Ok' object has no attribute 'state'
Traceback (most recent call last):
  File "/home/ckauhaus/.local/lib64/python3.2/site-packages/nagiosplugin/runtime.py", line 38, in wrapper
    return func(*args, **kwds)
  File "puppet/modules/sys_cluster/files/check_ceph_health.py", line 135, in main
    check.main(args.verbose, args.timeout)
  [...]
  File "/home/ckauhaus/.local/lib64/python3.2/site-packages/nagiosplugin/check.py", line 81, in _evaluate_resource
    self.results.add(metric.evaluate())
  File "/home/ckauhaus/.local/lib64/python3.2/site-packages/nagiosplugin/result.py", line 116, in add
    self.by_state[result.state].append(result)
AttributeError: 'Ok' object has no attribute 'state'
Currently, evaluate() must return a Result object. Change the base class to wrap Status singletons automatically using the Result class.
Allow me to first explain a bit the work I'm doing. I'm trying to build a nagios check based on the contents of a logfile of a different program I wrote. Now I'm maybe being way too defensive, but I'm trying not to assume anything about the correct format of said log.
This of course means that the Resource I built can yield metrics that represent broken/unknown state. So far I have done this by yielding Metrics whose value is None, but for example, a ScalarContext chokes on that because of course None cannot be compared to the ranges.
So I was pondering writing my own ScalarContext and whether or not Metrics that are None even make sense, when I had the idea that I could just not yield those Metrics.
After all, you can provide as many Contexts as you want to a Check, only those that get matched up to a Metric will become Results.
But as I was continuing along that train of thought, it struck me that this would lead to an issue. Missing Metrics do not lead to Results, so when Check tries to determine its state, there is no way to have a missing Metric lead to a critical state, and there's nothing you can do in a Summary to fix it. Yes I could subclass Check, but at some point I start feeling like I'm rewriting the entire library...
Which to some degree I wouldn't mind doing, but at that point I would probably turn it into a pull request. But of course, if you as the maintainer disagree with the choices made, that would be a lot of wasted effort...
And so we reach the discussion if there should not be an "official" stance about the proper practice in this case.
Option 1 - Metrics whose value is None. If this is chosen as best practice, I would recommend changing the ScalarContext to take this into account, so that people have a proper working example of the best practice.
Option 2 - Do not yield missing metrics. This would then require support for this chosen path in Check, so that a missing metric can properly lead to a Critical state. If this is not chosen, perhaps Check should still be adapted so that each Context has to be paired with a Metric (in addition to vice versa), to more clearly dissuade this approach.
Option 3 - Inspired by #3, actually. Yield a Metric with a different name, and have Contexts with both possible names ready to pair up and give a Result. This approach could do with some support, probably in Check.
Like I said, I'm willing to put my money where my mouth is and help code, but we should first find a consensus on what is the best practice in this case.
Linting complains about the Resource.probe() signature, and the issue could be resolved by reimplementing this as a proper ABC, using the @abstractmethod decorator on the probe method.
If more than one result is a failure, it currently arbitrarily picks one to display and does not display the others.
I believe it would be more useful for the administrator to see in a single message everything that is wrong, rather than fix one thing and then have to wait for the next check to learn that something else was wrong as well.
On top of that, the program can't decide which is the most important failed result (and indeed doesn't try, in first_significant).
I know I can write my own Summary class, but it seems like it would be sensible to default to showing all problems.
I believe it should at least show all of the problems of the highest severity (but it would probably be okay to omit problems of lower severity). So if something is critical, display all the things that are critical.
For example, here you can see that queue_size and panic_size are critical, but I won't know about the latter until I fix the former:
$ ./check_exim -qc 0 -pc 0
EXIMQUEUE CRITICAL - queue_size is 2 (outside range 0:0) | oldest_age=600 panic_size=6;;0 queue_size=2;;0 queue_size_bounce=2 queue_size_frozen=2
$ ./check_exim -pc 0
EXIMQUEUE CRITICAL - panic_size is 6 (outside range 0:0) | oldest_age=600 panic_size=6;;0 queue_size=2 queue_size_bounce=2 queue_size_frozen=2
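The "show everything at the highest severity" behaviour can be sketched independently of the library. Here results are plain (severity code, text) pairs and problem_summary is a hypothetical function; in a real plugin the same logic would presumably sit in a custom Summary class.

```python
def problem_summary(results):
    """Join the texts of all results that share the worst severity code."""
    worst = max(code for code, _ in results)
    return ', '.join(text for code, text in results if code == worst)


# Severity codes follow the usual plugin convention: 0 OK, 1 warning, 2 critical.
results = [
    (2, 'queue_size is 2 (outside range 0:0)'),
    (0, 'oldest_age is 600'),
    (2, 'panic_size is 6 (outside range 0:0)'),
]
summary = problem_summary(results)
```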
Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).
show how to:
I have a POC and will open a PR in a few weeks. Some docs will need to be written first.
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/passivechecks.html
Original report by Gabriel Hege (Bitbucket: datensuppe, ).
When passing None as a warning or critical range to the Context or ScalarContext constructor, it uses "0:" for evaluation. This results in a warning or critical status being reported when the value of the Metric is negative. This is very counterintuitive, because one would expect no warning or critical alert to be generated, especially since the range string does not show in the performance data.
Here is a minimal example:
#!python
import nagiosplugin

class Sensors(nagiosplugin.Resource):
    def probe(self):
        return nagiosplugin.Metric('sens1', -3)

check = nagiosplugin.Check(Sensors(), nagiosplugin.ScalarContext('sens1', warning="-10:10"))
check.main()
which generates the following output:
SENSORS CRITICAL - sens1 is -3 (outside range 0:) | sens1=-3;-10:10
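The expected semantics can be written down in a few lines. This sketch uses hypothetical helpers (bounds, outside) and represents a range as an optional (start, end) tuple rather than a range string; it contrasts "None means no threshold" with the reported fallback to "0:".

```python
import math


def bounds(spec):
    """Return (start, end); None is treated as 'no threshold at all'."""
    if spec is None:
        return (-math.inf, math.inf)
    return spec


def outside(value, spec):
    start, end = bounds(spec)
    return not (start <= value <= end)


value = -3
warn_hit = outside(value, (-10, 10))          # inside the warning range
crit_hit = outside(value, None)               # no critical range given
reported = outside(value, (0, math.inf))      # what a "0:" fallback yields
```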