
systemd_mon's Introduction

SystemdMon

Monitor systemd units and trigger alerts for failed states. The command line tool runs as a daemon, using dbus to get notifications of changes to systemd services. If a service enters a failed state, or returns from a failed state to an active state, notifications will be triggered.

Built-in notifications include email, slack, and hipchat, but more can be added via the ruby API.

It works by subscribing to DBus notifications from systemd. This means that there is no polling and no busy-looping: SystemdMon sits in the background, waiting for events and using minimal resources.

Requirements

  • A linux server
  • Ruby > 1.9.3
  • Systemd (v204 was used in development)
  • mail gem (if email notifier is used)
  • slack-notifier gem > 1.0 (if slack notifier is used)
  • hipchat gem (if hipchat notifier is used)

Installation

Install the gem using:

gem install systemd_mon

Usage

To run the command line tool, you will first need to create a YAML configuration file to specify which systemd units you want to monitor, and which notifications you want to trigger. A full example looks like this:

---
verbose: true # Default is off
notifiers:
  email:
    to: "[email protected]"
    from: "[email protected]"
    # These are options passed to the 'mail' gem
    smtp:
        address: smtp.gmail.com
        port: 587
        domain: mydomain.com
        user_name: "[email protected]"
        password: "supersecr3t"
        authentication: "plain"
        enable_starttls_auto: true
  slack:
    webhook_url: https://hooks.slack.com/services/super/secret/tokenthings
    channel: mychannel
    username: doge
    icon_emoji: ":computer:"
    icon_url: "http://example.com/icon"
  hipchat:
    token: bigsecrettokenhere
    room: myroom
    username: doge
units:
- unicorn.service
- nginx.service
- sidekiq.service
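The configuration is plain YAML with two top-level sections that matter: `notifiers` and `units`. The sketch below shows how such a file could be loaded and sanity-checked with Ruby's standard `yaml` library. `load_monitor_config` is a hypothetical helper for illustration and is not the loader shipped in the systemd_mon gem.

```ruby
require "yaml"

# Hypothetical helper: parse a SystemdMon-style config and check that the
# "units" and "notifiers" sections used in the example above are present.
def load_monitor_config(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  raise ArgumentError, "config must be a mapping" unless config.is_a?(Hash)

  units = Array(config["units"])
  raise ArgumentError, "no units listed under 'units'" if units.empty?

  notifiers = config["notifiers"]
  unless notifiers.is_a?(Hash) && !notifiers.empty?
    raise ArgumentError, "no notifiers configured under 'notifiers'"
  end

  {
    verbose: config.fetch("verbose", false), # off unless set, as in the example
    units: units,
    notifiers: notifiers.keys
  }
end

example = <<~YAML
  verbose: true
  notifiers:
    slack:
      webhook_url: https://hooks.slack.com/services/super/secret/tokenthings
  units:
  - unicorn.service
  - nginx.service
YAML

p load_monitor_config(example)
```

Failing early on a missing `units` or `notifiers` section makes a typo in the file visible at startup. Otherwise the daemon could start and then stay silent.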

Save that somewhere appropriate (e.g. /etc/systemd_mon.yml), then start the command line tool with:

$ systemd_mon /etc/systemd_mon.yml

You'll probably want to run it via systemd, which you can do with this example service file (change file paths as appropriate):

[Unit]
Description=SystemdMon
After=network.target

[Service]
Type=simple
User=deploy
StandardInput=null
StandardOutput=syslog
StandardError=syslog
ExecStart=/usr/local/bin/systemd_mon /etc/systemd_mon.yml

[Install]
WantedBy=multi-user.target

Behaviour

Systemd provides information about state changes in very fine detail. For example, if you start a service, it may go through the following states: activating (start-pre), activating (start) and finally active (running). This will likely happen in less than a second, and you probably don't want three notifications. Therefore, SystemdMon queues up states until it reaches one that you would want to know about. In this case, it will notify you when the state reaches active (running), and the notification can show the history of state changes so you get the full picture.
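The queueing idea can be sketched as a small buffer. It collects intermediate `[ActiveState, SubState]` pairs and emits a single notification, with the whole history, once the unit settles. `StateQueue` is a hypothetical class, not SystemdMon's internal implementation. The choice of which states count as "settled" is also an assumption.

```ruby
# Hypothetical buffer: collect intermediate systemd states and emit one
# notification (with the full history) once the unit reaches a settled state.
class StateQueue
  # Assumption: these ActiveState values are treated as worth reporting.
  SETTLED = %w[active failed inactive].freeze

  def initialize(&on_settled)
    @history = []
    @on_settled = on_settled
  end

  # active_state / sub_state mirror systemd's ActiveState and SubState.
  def push(active_state, sub_state)
    @history << [active_state, sub_state]
    return unless SETTLED.include?(active_state)

    @on_settled.call(@history.dup) # hand over the history, then start afresh
    @history.clear
  end
end

notifications = []
queue = StateQueue.new { |history| notifications << history }
queue.push("activating", "start-pre")
queue.push("activating", "start")
queue.push("active", "running")

puts notifications.length # prints 1: one notification instead of three
p notifications.first
```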

SystemdMon does simple analysis on the history of state changes, so it can summarise with statuses like "recovered", "automatically restarted", "still failed", etc. It will also report with the host name of the server.
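The summary labels can be derived from the first and last states in the queued history. The rules below are assumptions made for illustration, and SystemdMon's actual analysis may differ. One detail does come from systemd: a unit that systemd restarts by itself passes through the `auto-restart` SubState (shown as `activating (auto-restart)`). That makes "automatically restarted" distinguishable from a plain start.

```ruby
# Hypothetical summariser. history: array of [ActiveState, SubState] pairs,
# oldest first. The mapping rules are illustrative assumptions.
def summarise(history)
  states = history.map(&:first)
  first, last = states.first, states.last

  if last == "active" && history.any? { |_, sub| sub == "auto-restart" }
    "automatically restarted"
  elsif first == "failed" && last == "active"
    "recovered"
  elsif first == "failed" && last == "failed"
    "still failed"
  elsif last == "failed"
    "failed"
  elsif last == "active"
    "started"
  else
    "stopped"
  end
end

puts summarise([["failed", "failed"], ["activating", "start"], ["active", "running"]])
# prints "recovered"
puts summarise([["active", "running"], ["activating", "auto-restart"],
                ["activating", "start"], ["active", "running"]])
# prints "automatically restarted"
```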

You'll also want to know if SystemdMon itself falls over, and when it starts back up again. It will attempt to send a final notification before it exits, and another one when it starts. However, be aware that some of these notifications may never be sent, for example if the process is killed with SIGKILL or the network is down. The age-old question: who will watch the watcher?
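In plain Ruby, the start and stop notifications can be sketched with `at_exit`. Handlers registered with `at_exit` run on a normal exit and on an unhandled exception. They also run on SIGTERM or SIGINT, because Ruby raises these as exceptions by default. Nothing can run on SIGKILL. `RecordingNotifier` and `with_lifecycle_notifications` are hypothetical names. This is not SystemdMon's own shutdown code.

```ruby
require "socket"

# Hypothetical notifier that records (and prints) messages instead of sending them.
class RecordingNotifier
  attr_reader :messages

  def initialize
    @messages = []
  end

  def notify(message)
    @messages << message
    puts "notify: #{message}"
  end
end

# Announce start-up immediately and register a best-effort shutdown message.
# at_exit does not fire on SIGKILL, and delivery can still fail if the
# network is down.
def with_lifecycle_notifications(notifier)
  host = Socket.gethostname
  notifier.notify("SystemdMon starting on #{host}")
  at_exit { notifier.notify("SystemdMon stopping on #{host}") }
  yield
end

notifier = RecordingNotifier.new
with_lifecycle_notifications(notifier) do
  # The monitoring loop would run here.
end
```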

Docker integration

There is a public Docker image available which bundles all requirements (Ruby and gems). Since systemd_mon relies on dbus, you need to mount the host's dbus directory into the container. In addition, the configuration filename is currently hardcoded to systemd_mon.yml, so you also need to mount the host directory containing systemd_mon.yml into the container. Below is a working example:

docker run --name "systemd_mon" -v /var/run/dbus:/var/run/dbus -v /path/to/systemd_mon/config/:/systemd_mon/ kromit/systemd_mon

If you want to run this image with systemd (very handy on CoreOS for example) you can use it as follows:

[Unit]
Description=systemd_mon
After=docker.service
Requires=docker.service

[Service]
Restart=always
RestartSec=60
ExecStartPre=-/usr/bin/docker kill systemd_mon
ExecStartPre=-/usr/bin/docker rm systemd_mon
ExecStart=/usr/bin/docker run --name "systemd_mon" -v /var/run/dbus:/var/run/dbus -v /path/to/systemd_mon/config/:/systemd_mon/ kromit/systemd_mon

[Install]
WantedBy=multi-user.target

Contributing

I'd love more contributions, particularly new notifiers. Follow the example of the slack and email notifiers and either package yours as a new gem or submit a pull request if you think it should be part of the main project.

  1. Fork it ( https://github.com/joonty/systemd_mon/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request
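A new notifier is a Ruby class. The sketch below is hypothetical. Only the `notify_start!` hook is grounded: SystemdMon logs "doesn't respond to 'notify_start!', not sending notification" when a notifier lacks it. The `notify!` method name, the notification hash fields and the `dispatch` helper are assumptions. Check the bundled email and slack notifiers for the real interface before building on this.

```ruby
# Hypothetical notifier that writes to stdout.
class StdoutNotifier
  def initialize(options = {})
    @prefix = options.fetch("prefix", "[systemd_mon]")
  end

  # Optional start-up hook (name taken from SystemdMon's log output).
  def notify_start!(hostname)
    puts "#{@prefix} monitoring started on #{hostname}"
  end

  # Assumed hook name; notification is a hypothetical hash with
  # :unit, :host and :status keys.
  def notify!(notification)
    puts "#{@prefix} #{notification[:unit]} on #{notification[:host]}: #{notification[:status]}"
  end
end

# Call an optional hook on every notifier that implements it, and warn about
# the rest, in the same spirit as SystemdMon's "doesn't respond to" message.
def dispatch(notifiers, method_name, *args)
  notifiers.each_with_object([]) do |notifier, called|
    if notifier.respond_to?(method_name)
      notifier.public_send(method_name, *args)
      called << notifier.class.name
    else
      warn "#{notifier.class} doesn't respond to '#{method_name}', not sending notification"
    end
  end
end

dispatch([StdoutNotifier.new], :notify_start!, "myhost")
dispatch([StdoutNotifier.new], :notify!, { unit: "nginx.service", host: "myhost", status: "recovered" })
```

Using `respond_to?` keeps hooks like the start notification optional, so a minimal notifier only needs to implement the methods it cares about.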


systemd_mon's Issues

Uncaught exception

When systemd_mon starts, I get these messages in the log:

systemd[1]: Starting SystemdMon...
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/marshall.rb:301: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/message.rb:129: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/message.rb:129: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/marshall.rb:301: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x000000027da8a0>
systemd_mon[21178]: Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x00000002d15590>

The last two lines are, I think, for the two monitored services.

[RFE] Only alert on failed services

Hello

Thank you for this software; it is really useful.
A feature I'd like is to get alerts only for specific states, such as failed, instead of being told every time a service is reloaded or restarted (for example, after its log files are rotated).

Thanks again, I had been looking for something like systemd_mon for a long time.
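The requested behaviour amounts to an allow-list checked before a notification is sent. The sketch below assumes a hypothetical `only_statuses` setting. systemd_mon is not known to offer this option, and the status names are only examples.

```ruby
require "set"

# Hypothetical filter for an `only_statuses` config option: notifications
# whose summary status is not in the allow-list are suppressed.
class StatusFilter
  def initialize(only_statuses)
    @allowed = only_statuses.map(&:to_s).to_set
  end

  # An empty allow-list means "notify about everything" (current behaviour).
  def notify?(status)
    @allowed.empty? || @allowed.include?(status.to_s)
  end
end

filter = StatusFilter.new(["failed", "still failed"])
puts filter.notify?("failed")    # prints true
puts filter.notify?("restarted") # prints false, e.g. a restart after log rotation
```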

Environment variables

Is there a way to reference environment variables in the YAML configuration, or override the YAML configuration using environment variables?
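One common Ruby technique would be to run the config file through ERB before parsing it, so that `<%= ENV.fetch("NAME") %>` expands to an environment variable. This is a sketch of that general approach, not a documented systemd_mon feature. `load_yaml_with_env` and `SLACK_WEBHOOK_URL` are hypothetical names. ERB executes arbitrary Ruby, so only use it on config files you trust.

```ruby
require "erb"
require "yaml"

# Hypothetical loader: expand ERB tags (e.g. environment lookups) first,
# then parse the resulting YAML.
def load_yaml_with_env(text)
  YAML.safe_load(ERB.new(text).result)
end

ENV["SLACK_WEBHOOK_URL"] = "https://hooks.slack.com/services/example"

config = load_yaml_with_env(<<~YAML)
  notifiers:
    slack:
      webhook_url: <%= ENV.fetch("SLACK_WEBHOOK_URL") %>
YAML

puts config["notifiers"]["slack"]["webhook_url"]
```

`ENV.fetch` raises `KeyError` when the variable is missing. A missing secret therefore fails at load time instead of producing an empty webhook URL.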

crash

Mar 09 10:55:47 gna.vfn-nrw.de systemd[1]: Starting SystemdMon...
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: /usr/share/rubygems/rubygems/path_support.rb:71:in `path=': undefined method `+' for nil:NilClass (NoMethodError)
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/path_support.rb:34:in `initialize'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:332:in `new'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:332:in `paths'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:355:in `path'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/specification.rb:870:in `dirs'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/specification.rb:748:in `stubs'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/specification.rb:936:in `find_inactive_by_path'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:187:in `try_activate'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:126:in `rescue in require'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:39:in `require'
Mar 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from <internal:abrt_prelude>:2:in `<compiled>'
Mar 09 10:55:47 gna.vfn-nrw.de systemd[1]: systemd_mon.service: Main process exited, code=exited, status=1/FAILURE
Mar 09 10:55:47 gna.vfn-nrw.de systemd[1]: systemd_mon.service: Unit entered failed state.
Mar 09 10:55:47 gna.vfn-nrw.de audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd_mon comm="systemd" exe="/usr/lib/systemd/systemd" 
Mar 09 10:55:47 gna.vfn-nrw.de systemd[1]: systemd_mon.service: Failed with result 'exit-code'.

memory leaks?

I've been using systemd_mon for a little while, and I love it for its Slack integration. (Sometimes I restart services just to see those little coloured messages pop up in our channel :) ) I noticed that over a long weekend (about 3.5 days) its memory usage grew from 4% to 20% (on a 768MB VPS). I'm using both the email and Slack notifiers.

I can help with debugging if you like, but I'll forewarn you that I'm by no means a Ruby pro.
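A leak like this can be tracked by sampling the process's resident set size over time. On Linux, the `VmRSS:` line of `/proc/<pid>/status` reports it in kB. The helpers below are a hypothetical debugging aid, not part of systemd_mon. The sample text is made up, chosen to match the reported 20% of 768 MB (about 154 MB, up from about 31 MB at 4%).

```ruby
# Hypothetical helpers: extract VmRSS (in kB) from /proc/<pid>/status text
# and express it as a percentage of total RAM in MiB.
def vm_rss_kib(status_text)
  line = status_text.each_line.find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
end

def percent_of(total_mib, rss_kib)
  (rss_kib / 1024.0) / total_mib * 100
end

# On a live system: vm_rss_kib(File.read("/proc/#{pid}/status"))
sample = "Name:\tsystemd_mon\nVmRSS:\t  157286 kB\n" # made-up sample
rss = vm_rss_kib(sample)
printf("%.1f%%\n", percent_of(768, rss)) # prints 20.0%
```

Logging this value periodically alongside notification counts would show whether memory grows with each notification sent or steadily with time.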

possible false positives on oneshot type services without 'RemainAfterExit=yes'

First, a word of gratitude for this systemd monitoring app. In all honesty, I had been using https://github.com/gkarakou/systemd-denotify on desktops for quite a while, but recently I was looking for something more geared towards servers when I stumbled onto this project. It generally works great for my use cases.

My only issue so far: when I monitor a systemd unit of type 'oneshot' that doesn't have 'RemainAfterExit=yes' (like the default logrotate.service on Arch Linux), I see some errors and systemd_mon (erroneously) sends email notifications. I have no idea whether this is expected behaviour or a bug; my systemd knowledge is far from 'developed'. It could be related to the recently merged pull request supporting oneshot services (#3), but since that happened before I started using systemd_mon, I cannot judge.

What happens on the logrotate.service unit:

(1) without RemainAfterExit=yes --> Active: inactive (dead)
      ==> systemd_mon starts notifying (repeatedly)
      unexpected: systemctl --failed --all doesn't report anything for logrotate.service

(2) with RemainAfterExit=yes --> Active: active (exited)
      ==> systemd_mon doesn't notify
     expected

I have worked around this issue by adding /etc/systemd/system/logrotate.service, which differs from the original only by 'RemainAfterExit=yes'. Still, it seemed worth reporting; I hope it doesn't cause too much confusion :-)

Some debug info on the issue:

$ cat /usr/lib/systemd/system/logrotate.service
[Unit]
Description=Rotate log files

[Service]
Type=oneshot
ExecStart=/usr/bin/logrotate /etc/logrotate.conf
Nice=19
IOSchedulingClass=best-effort
IOSchedulingPriority=7

$ sudo systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

$ sudo systemctl status -l logrotate.service
● logrotate.service - Rotate log files
Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
Active: inactive (dead) since Mon 2015-12-07 21:39:04 UTC; 41min ago
Main PID: 28010 (code=exited, status=0/SUCCESS)

Dec 07 21:39:04 do16 systemd[1]: Starting Rotate log files...
Dec 07 21:39:04 do16 systemd[1]: Started Rotate log files.

$ systemd_mon ~/.systemd_mon_testing.yml
SystemdMon::Notifiers::Email doesn't respond to 'notify_start!', not sending notification
Monitoring changes to 13 units

Using notifiers: SystemdMon::Notifiers::Email

#<SystemdMon::State:0x00000002e70228>
#<SystemdMon::State:0x000000027a8300>
#<SystemdMon::State:0x00000001dcf360>
#<SystemdMon::State:0x00000001ca02f0>
#<SystemdMon::State:0x00000001330918>
logrotate.service failed: inactive (dead)
Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x00000001329d48>
#<SystemdMon::State:0x00000001d5ab00>
#<SystemdMon::State:0x00000001e866f0>
#<SystemdMon::State:0x000000027e11a0>
#<SystemdMon::State:0x00000002920ed0>
#<SystemdMon::State:0x00000002a9d808>
#<SystemdMon::State:0x00000002b8ac20>
#<SystemdMon::State:0x00000002c60028>
#<SystemdMon::State:0x00000002d21f98> [*]

logrotate.service still failed: inactive (dead)
active state changed from inactive to inactive then activating then inactive

Notifying state change of logrotate.service via SystemdMon::Notifiers::Email
SystemdMon::Notifiers::Email: Sending email to [email protected]:
SystemdMon::Notifiers::Email: -> Subject: "Alert: logrotate.service on do16: still failed"
SystemdMon::Notifiers::Email: -> Message: "Systemd unit logrotate.service on do16 still failed: inactive (dead)

| Time | Active |
| 22:29:48.815 +0000 | inactive |
| 22:30:14.110 +0000 | inactive |
| 22:30:14.114 +0000 | activating |
| 22:31:17.794 +0000 | inactive |

Regards, SystemdMon"
SystemdMon::Notifiers::Email: sent email notification

#<SystemdMon::State:0x00000002d1b1e8>

[*] running commands in another terminal window
$ sudo systemctl start logrotate.service
$ sudo systemctl status -l logrotate.service
● logrotate.service - Rotate log files
Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
Active: inactive (dead) since Mon 2015-12-07 22:25:20 UTC; 9s ago
Process: 28278 ExecStart=/usr/bin/logrotate /etc/logrotate.conf (code=exited, status=0/SUCCESS)
Main PID: 28278 (code=exited, status=0/SUCCESS)

Dec 07 22:25:20 do16 systemd[1]: Starting Rotate log files...
Dec 07 22:25:20 do16 systemd[1]: Started Rotate log files.

Wildcard service names

A question/feature request: is it possible to specify a "*" as a service name to get notifications in case any service fails?
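Wildcard support could be sketched as glob matching against the currently loaded units. In practice that list could come from systemd's `ListUnits` D-Bus method on the manager object. Ruby's `File.fnmatch` provides shell-style matching. `expand_unit_patterns` is a hypothetical helper. systemd_mon is not known to support patterns in `units:`.

```ruby
# Hypothetical expansion of glob patterns in a `units:` list against the
# names of units systemd currently has loaded.
def expand_unit_patterns(patterns, loaded_units)
  loaded_units.select do |unit|
    patterns.any? { |pattern| File.fnmatch(pattern, unit) }
  end
end

loaded = %w[nginx.service sidekiq.service unicorn.service systemd-journald.socket]
p expand_unit_patterns(["*.service"], loaded)
p expand_unit_patterns(["*"], loaded)
```

A one-off expansion at startup would miss units loaded later. A full implementation would need to re-expand the patterns when systemd loads new units.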

oneshot services incorrectly reported as `still failed`

I see that oneshot services are supported (#3), but it's not working well for me.

For example, I have added the certbot.service unit to the config.
This is a oneshot service running on a timer to manage SSL certificate renewal.

$ sudo systemctl status certbot.service
● certbot.service - Certbot
   Loaded: loaded (/lib/systemd/system/certbot.service; static; vendor preset: enabled)
   Active: inactive (dead) since Mon 2018-02-12 12:45:13 CST; 50s ago
     Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
           https://letsencrypt.readthedocs.io/en/latest/
  Process: 17261 ExecStart=/usr/bin/certbot -q renew (code=exited, status=0/SUCCESS)
 Main PID: 17261 (code=exited, status=0/SUCCESS)

Feb 12 12:45:12 example.com systemd[1]: Starting Certbot...
Feb 12 12:45:13 example.com systemd[1]: Started Certbot.
$ sudo systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

This service is running fine; it exited without error.

But with the slack notifier, I'm getting this every time certbot.service runs:

Alert: systemd unit certbot.service on example.com still failed
Hostname: example.com
Unit: certbot.service
Active: inactive
Status: dead

Support oneshot services (inactive vs. failure services)

Hey @joonty, I've been digging around in the code again and wanted to propose extending functionality to address use cases that include oneshot services (a github issue seems like the best place for this discussion, let me know if it belongs elsewhere.)

To illustrate, the use case I'm thinking of is a cron replacement using timer units and oneshot service units. Each time a timer triggers its accompanying service unit, as long as systemd doesn't see a nonzero exit code, the service unit does not go into a failed state but returns to inactive (dead) after briefly transitioning to activating. However, for a oneshot service that does not use RemainAfterExit, this is basically still a "good" state to be in: it just means the last execution succeeded (zero exit code).

Currently, when trying to use systemd_mon with oneshot services, I get errors about calls to first in state_change.rb.

The current state paradigm indicates that inactive is a bad state, but for oneshot services, this could actually be ok. The "important" states in the case of a oneshot would be whether ActiveState is inactive or failed.

I think the easiest way to implement this may be to pass the Type of a service unit to the State constructor and, if oneshot, alter the possible ok_states and failure_states that ActiveState can take on (I think Type is a gettable property from the unit's dbus handle.)

Does this sound like a reasonable + good idea? I'm asking before coding it up because my ruby is kind of rusty and want to confirm this would work for the way you've got the state change algorithm set up; I don't grok it fully. 😄 If so I'll throw a PR together, the aforementioned implementation doesn't seem hard, just needs to fit correctly into the current paradigm you've got going.
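The proposal can be sketched as a pair of state sets chosen by the unit's `Type`. systemd exposes `Type` as a property of the `org.freedesktop.systemd1.Service` D-Bus interface. `UnitHealth` and its constants are hypothetical names, not SystemdMon's `State` class. Treating `inactive` as a failure for non-oneshot units mirrors the current behaviour described above.

```ruby
# Hypothetical classifier: which ActiveState values count as ok or failed
# depends on whether the service is Type=oneshot.
class UnitHealth
  DEFAULT_OK      = %w[active].freeze
  DEFAULT_FAILURE = %w[failed inactive].freeze # long-running units
  ONESHOT_OK      = %w[active inactive].freeze # inactive (dead) after exit 0 is fine
  ONESHOT_FAILURE = %w[failed].freeze

  def initialize(type)
    @oneshot = (type == "oneshot")
  end

  def ok?(active_state)
    (@oneshot ? ONESHOT_OK : DEFAULT_OK).include?(active_state)
  end

  def failed?(active_state)
    (@oneshot ? ONESHOT_FAILURE : DEFAULT_FAILURE).include?(active_state)
  end
end

certbot = UnitHealth.new("oneshot")
puts certbot.failed?("inactive") # prints false: no more "still failed" alerts
nginx = UnitHealth.new("simple")
puts nginx.failed?("inactive")   # prints true
```

With this split, the certbot.service and logrotate.service cases above would only alert when systemd itself reports `failed`, which matches what `systemctl --failed` shows.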

Ruby gem is missing hipchat.rb

Thanks for an awesome gem. I'm pretty new to systemd and just got some watchdog monitoring setup, so I was glad to find this gem to keep tabs on service changes without constantly scanning logs.

Anyway, just wanted to report that version 0.1.0 installed via "gem install" is missing hipchat.rb

I see now that hipchat.rb was actually added after 0.1.0 so probably just need a new version tag and update the gem.
