sebastien / monitoring

Server monitoring and data-collection daemon

Python 91.99% Makefile 7.71% Shell 0.30%

monitoring's Introduction

Monitoring - Server monitoring and data-collection daemon

Monitoring is a Python API with a DSL feel for writing monitoring daemons.

Use cases

Monitoring works well for the following tasks:

being notified when incidents happen (email, XMPP, ZeroMQ...)
taking automatic actions (restart, rm, git pull...)
collecting system statistics for further processing, e.g. graphs
tying into existing/third-party Python code
playing along nicely with the existing deployment/configuration ecosystem (fabric/cuisine)

Overview

monitoring DSL: declarative programming to define monitoring strategy
wide spectrum: from data collection and incident reporting to taking automatic actions
small, easy to read, single-file API
Revised BSD License
written in Python

Use Cases

ensure service availability: test services and start/stop them when problems occur
collect system statistics/data, log locally and/or remotely
alert on system/service health, take actions

Installation

`python setup.py install` or

`easy_install monitoring`

More?

Read the presentation on Monitoring (previously named Watchdog).

monitoring's People

Forkers

enki brunobord remotesyssupport plar ghuntley tsudot seacoastboy dumpforjunk saidimu mariuz jackode zenweasel pierd rmoorman finpingvin zhangxu heyunlong565 lihangtong zhangmh2 xxz markandrewj mignev fastrom pyghassen rantav mschober nduhamel suhongrui fruch yacoding vlaxy syci gr3yr0n1n davergarag cloudxtreme kalranitin kulbida yashodhank congto alon7 yonglehou zhao07 jasonwanga lourenjie ahlfors coopci bailijiang vinitkantrod samuelzuuka codingbits ramboramzey13579 jpluimers aglaianwoman chanyiou ib23 moonlightbright66 omargnagy91 kaderghal

monitoring's Issues

firewall watchdog Rule

We should come up with a watchdog rule that tests whether or not certain netfilter rules are in place:

use iptables to gather the current state
maybe be smart, e.g.: if we run an httpd, let port 80 be open; look for Port xxxx in sshd_config; ...
even if this rule has a tiny bit of "smartness" at its core, the user still has total control, i.e. can add config on top or override it entirely
don't try to do too much, though, as watchdog is not a ruleset creator

What I am trying to say is, for example: if I want all ports but 80 to be closed from the outside, it would be nice to have a watchdog rule that could check that. E.g. after a reboot, maybe loading my iptables script into the kernel failed for some reason, someone fiddled with the live config, some process...
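A minimal sketch of the check such a rule could perform, assuming it shells out to `iptables -S` (which prints a chain's rules in the same syntax used to add them) and compares that output against a list of required rule specs. `current_rules`, `missing_rules` and `EXPECTED` are hypothetical names, not part of the monitoring API.

```python
import subprocess

def current_rules(chain="INPUT"):
    """Return the rules of one netfilter chain as printed by `iptables -S`.

    Hypothetical helper: needs root privileges and the iptables binary on PATH.
    """
    result = subprocess.run(
        ["iptables", "-S", chain], capture_output=True, text=True, check=True
    )
    return result.stdout

def missing_rules(rules_text, expected):
    """Return the expected rule specs that do not appear in `iptables -S` output."""
    present = {line.strip() for line in rules_text.splitlines() if line.strip()}
    return [rule for rule in expected if rule not in present]

# Hypothetical policy for the "all ports but 80 closed" example
EXPECTED = [
    "-P INPUT DROP",
    "-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT",
]
```

A rule would call `missing_rules(current_rules(), EXPECTED)` on each tick and report an incident when the list is non-empty. Exact string comparison is brittle because iptables normalizes what it prints (a rule added as `-p tcp --dport 80` is listed with an extra `-m tcp`), so expected specs are best copied from the `iptables -S` output of a known-good machine.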

SSH keys watchdog Rule

It would be nice to have a check that verifies whether all SSH keys on the server are legitimate:

compare against a list of known-good keys, e.g. the keys of current employees
check against things like key strength (length etc.)
...

This Rule could work in concert with sebastien/cuisine#28
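A rough sketch of the two checks above, assuming keys live in an OpenSSH `authorized_keys` file and the allowlist is a set of base64 key blobs. RSA keys are decoded following the SSH wire format (RFC 4251/4253) to measure the modulus length. All names are hypothetical, and option fields with quoted spaces are only handled loosely.

```python
import base64
import struct

KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")

def _read_string(blob, offset):
    # SSH wire format (RFC 4251): uint32 big-endian length, then that many bytes
    (length,) = struct.unpack(">I", blob[offset:offset + 4])
    start = offset + 4
    return blob[start:start + length], start + length

def rsa_key_bits(b64_blob):
    """Return the modulus size in bits of an ssh-rsa public key blob."""
    blob = base64.b64decode(b64_blob)
    key_type, offset = _read_string(blob, 0)
    if key_type != b"ssh-rsa":
        raise ValueError(f"not an RSA key: {key_type!r}")
    _exponent, offset = _read_string(blob, offset)
    modulus, _ = _read_string(blob, offset)
    return int.from_bytes(modulus, "big").bit_length()

def audit_authorized_keys(text, allowed_blobs, min_rsa_bits=2048):
    """Hypothetical audit: return (line_number, reason) pairs for suspicious keys."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        # Skip leading options such as from="..." to find the key type field
        idx = next((i for i, f in enumerate(fields) if f.startswith(KEY_TYPE_PREFIXES)), None)
        if idx is None or idx + 1 >= len(fields):
            problems.append((lineno, "unparseable line"))
            continue
        key_type, blob = fields[idx], fields[idx + 1]
        if blob not in allowed_blobs:
            problems.append((lineno, "key not in allowlist"))
        if key_type == "ssh-rsa":
            try:
                bits = rsa_key_bits(blob)
            except (ValueError, struct.error):  # bad base64 or truncated blob
                problems.append((lineno, "malformed key blob"))
                continue
            if bits < min_rsa_bits:
                problems.append((lineno, f"RSA key too short ({bits} bits)"))
    return problems
```

The 2048-bit threshold is an illustrative default; non-RSA key types are only checked against the allowlist here.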

Question RE: Failure to remediate

During a failure condition, what is Watchdog's stance on failing to remediate? Let's say I have an Incident attached to a service. It trips the threshold, and for some reason the code cannot fix the underlying condition (the service fails to start, etc.). Currently Watchdog just keeps trying forever. Is this the intended behavior, or are there any plans for a maximum number of attempts, i.e. a way to give up on / bail out of this monitor?
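One way such a limit could look, sketched as a hypothetical wrapper around whatever remediation callable an Incident triggers (this is not an existing monitoring feature): it counts consecutive failures, stops retrying after `max_attempts`, and calls an `on_give_up` hook once, e.g. to send a final alert.

```python
class BoundedRemediation:
    """Hypothetical wrapper: give up after max_attempts consecutive failed remediations.

    `remediate` must return True on success and False on failure.
    """

    def __init__(self, remediate, max_attempts=3, on_give_up=None):
        self.remediate = remediate
        self.max_attempts = max_attempts
        self.on_give_up = on_give_up
        self.failures = 0
        self.gave_up = False

    def __call__(self, *args, **kwargs):
        if self.gave_up:
            return False  # stop hammering a service we cannot fix
        if self.remediate(*args, **kwargs):
            self.failures = 0  # success resets the consecutive-failure count
            return True
        self.failures += 1
        if self.failures >= self.max_attempts:
            self.gave_up = True
            if self.on_give_up is not None:
                self.on_give_up(self.failures)
        return False

    def reset(self):
        """Re-arm the wrapper, e.g. after an operator fixed the service by hand."""
        self.failures = 0
        self.gave_up = False
```

Keeping the limit in a wrapper rather than in the rule itself leaves the existing "retry forever" behavior available as the default.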

error: file '/home/hgj/software/monitoring/Scripts/monitoring' does not exist

This error came up when I ran python3 setup.py install.
The Scripts directory indeed does not exist.

HTTP rule to test if page contain one string

Hi,

I need something like:

HTTP(
    GET="http://localhost:8000/",
    freq=Time.ms(500),
    contain="foobar"
)

to test whether the web page contains the string "foobar".

I'll try to implement it as soon as I can.

Regards,
Stephane
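Until a `contain=` option exists (the parameter above is a proposal, not current API), the check itself can be sketched with the standard library alone. `page_contains` is a hypothetical helper; it treats network errors, timeouts and error statuses as failures.

```python
import urllib.request

def page_contains(url, needle, timeout=10.0):
    """Hypothetical check: True if fetching `url` succeeds and the body contains `needle`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            charset = response.headers.get_content_charset() or "utf-8"
    except OSError:
        # URLError, HTTPError (error status) and socket timeouts are all OSErrors
        return False
    return needle in body.decode(charset, errors="replace")
```

Usage: `page_contains("http://localhost:8000/", "foobar")`.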

Remove the Logger class

This is kind of ridiculous when it doesn't do anything different from the standard logging module.
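For comparison, a sketch of what call sites could look like using the standard `logging` module directly. The logger name `"monitoring"` and the `report_failure` function are illustrative assumptions, not existing code.

```python
import logging

# One module-level logger; handlers and levels are left to the application
logger = logging.getLogger("monitoring")

def report_failure(service, reason):
    # Hypothetical call site; %-style arguments are only interpolated
    # when a handler actually formats the record
    logger.error("service %s failed: %s", service, reason)
```

The daemon's entry point would then configure output once, e.g. `logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")`.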

error on example-system-health.py

(env)[21:23 /tmp/watchdog/Examples]$ python example-system-health.py
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/var/sites/a_cityguides/cityguides/env/lib/python2.6/site-packages/watchdog.py", line 1118, in _run
    self.result = self.runnable.run(*self.args)
  File "/var/sites/a_cityguides/cityguides/env/lib/python2.6/site-packages/watchdog.py", line 967, in run
    value = self.extractor(res.value)
TypeError: <lambda>() takes exactly 2 arguments (1 given)

Better URL parsing in the HTTP constructor

The URL parsing in https://github.com/sebastien/watchdog/blob/master/Sources/watchdog.py#L868 is not very "pythonic".

There's a very nice module in the Standard Library for this: urlparse

http://docs.python.org/library/urlparse.html

That was one of the patches I wanted to work on, but I haven't had time yet. If someone wants to convert this series of "splits", please hack on. :op

(and if nobody does, I may eventually find time)
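A sketch of the suggested approach. In Python 3 the `urlparse` module became `urllib.parse`; `split_url` is a hypothetical helper returning the pieces a constructor like this needs, with default ports filled in.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def split_url(url):
    """Hypothetical helper: split a URL into (scheme, host, port, path_with_query)."""
    if "://" not in url:
        url = "http://" + url  # also accept bare "host/path" strings
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.hostname, port, path
```

Compared with chained `split()` calls, `urlsplit` already handles ports, query strings and URLs without a path.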

Trying to set up a test mail server.

I want to write a Python script that monitors a list of servers and emails the specified recipient(s) based on data collected from these servers (hard disk capacity, server load, etc.). Monitoring seems to have a wider scope than my project, but I think I am on the right track looking at it.

I have a general idea of how to do some of these tasks, including sending email with Python, but I am unable to set up an email server on my local machine to test sending/receiving emails.

I have followed several Postfix configuration tutorials to no avail. I also looked at some SO questions dealing with Yosemite-specific updates to some of those tutorials, but still no luck.

Do you guys have any tips or recommendations?

What I have tried...

Although I just want to be able to send email via localhost for testing, for starters I tried setting up Postfix to relay through Gmail. Here is the relevant part of my /etc/postfix/main.cf:

relayhost = smtp.gmail.com:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = #noanonymous
smtp_use_tls = yes
smtp_sasl_mechanism_filter = plain

smtp_tls_security_level=encrypt
tls_random_source=dev:/dev/urandom

After reloading postfix, I tried sending some dummy emails like:

tree | mail -s "Hi" "[email protected]"

This is what mailq logs:

B83E0EFDD0      828 Wed Mar  4 22:56:05  $USER@$HOSTNAME
                                                (unknown mail transport error)
                                         [email protected]

B87C6EE74B      282 Wed Mar  4 21:42:27  $USER@$HOSTNAME
                                                (delivery temporarily suspended: local data error while talking to smtp.gmail.com[74.125.203.108])
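For the local-testing part, a Gmail relay is not strictly needed: a script can speak SMTP directly to a debugging server that prints messages instead of delivering them. Python versions before 3.12 shipped one (`python -m smtpd -n -c DebuggingServer localhost:1025`); newer versions need a third-party replacement such as aiosmtpd. The sketch assumes such a server on localhost:1025; `build_alert` and `send_alert` are hypothetical helpers.

```python
import smtplib
from email.message import EmailMessage

def build_alert(sender, recipients, host, metrics):
    """Hypothetical helper: build a plain-text alert summarising metrics from `host`."""
    msg = EmailMessage()
    msg["Subject"] = f"[monitoring] {host}: {len(metrics)} metric(s)"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = [f"{name}: {value}" for name, value in sorted(metrics.items())]
    msg.set_content("\n".join(lines) + "\n")
    return msg

def send_alert(msg, smtp_host="localhost", smtp_port=1025):
    """Send via SMTP; port 1025 assumes a local debugging server, not a real MTA."""
    with smtplib.SMTP(smtp_host, smtp_port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Against a real relay such as smtp.gmail.com:587, the same code would additionally call `smtp.starttls()` and `smtp.login(...)` before `send_message`.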

Interested in refactoring it to use better Python libs like requests and sarge?

This looks very interesting, but I wonder why it doesn't use libs like requests and sarge (http://sarge.readthedocs.org/en/latest/overview.html)?

Are you accepting pull requests to work towards that?

HTTP test method refactor?

Hi there,

I was a little bit puzzled by the HTTP constructor:

https://github.com/sebastien/watchdog/blob/master/Sources/watchdog.py#L854

a - whether you pass a "POST" or a "GET" argument, the method used is always "GET".
b - I've added a HEAD argument in my latest pull request, following the current practice, but I wonder if this constructor could be simplified, something like:

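def __init__(self, url, method=None, timeout=Time.s(10), freq=Time.m(1), fail=(), success=()):
    Rule.__init__(self, freq, fail, success)
    # Default to GET and accept lowercase method names
    method = (method or "GET").upper()
    if method not in ("GET", "POST", "HEAD"):
        raise ValueError("Unsupported HTTP method: %s" % method)
    # .. all the rest is the same...
    if url.startswith("http://"):
        url = url[7:]
    # partition() also copes with URLs that have no path, e.g. "localhost:8000"
    server, _, uri = url.partition("/")
    # ...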

Of course, this would be annoying because it's not backward-compatible with the current API. That's why I didn't open a pull request; I wonder whether this refactor is possible or not.
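A possible way around the compatibility concern, sketched under the assumption that the current constructor receives the URL through a `GET=`/`POST=`/`HEAD=` keyword (as in `HTTP(GET="http://localhost:8000/", ...)` calls): accept both styles for a while and emit a `DeprecationWarning` for the old one. `resolve_method_and_url` is a hypothetical helper the constructor could call first.

```python
import warnings

SUPPORTED_METHODS = ("GET", "POST", "HEAD")

def resolve_method_and_url(url=None, method=None, **legacy):
    """Hypothetical shim: accept (url, method) or one legacy GET=/POST=/HEAD= keyword.

    Returns a (method, url) tuple; the legacy style emits a DeprecationWarning.
    """
    unknown = set(legacy) - set(SUPPORTED_METHODS)
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")
    if legacy:
        if url is not None or method is not None or len(legacy) > 1:
            raise TypeError("pass either url=/method= or exactly one of GET=, POST=, HEAD=")
        warnings.warn(
            "GET=/POST=/HEAD= are deprecated; use url= and method=",
            DeprecationWarning,
            stacklevel=2,
        )
        # The legacy keyword name is the method, its value is the URL
        method, url = next(iter(legacy.items()))
    if url is None:
        raise TypeError("a URL is required")
    method = (method or "GET").upper()
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported HTTP method: {method}")
    return method, url
```

Unlike the behavior described in point a, a legacy `POST=` argument here really maps to POST.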

HTTP over TLS/SSL

https://github.com/sebastien/watchdog/blob/master/Sources/watchdog.py#L852 should work with TLS/SSL (https) too.
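A sketch of choosing the connection class by scheme, assuming the rule talks to the server through `http.client` (`httplib` in Python 2); `open_connection` is a hypothetical helper. `HTTPSConnection` uses the default SSL context, which on current Python 3 releases verifies certificates and hostnames.

```python
import http.client
from urllib.parse import urlsplit

def open_connection(url, timeout=10):
    """Hypothetical helper: return (connection, path) for http:// or https:// URLs.

    No request is sent yet; the caller issues conn.request(method, path).
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    elif parts.scheme == "http":
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return conn, path
```

A port of `None` lets `http.client` fall back to 80 or 443 depending on the class.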

DOCs: Broken link in README -> More

"Read the presentation on Monitoring (previously named Watchdog)."

I would love to do that, but the link is broken.

put it on PyPI, make it pip-installable

The next thing to do, IMO, would be to put it on PyPI and make `pip install watchdog` work. One problem, though, is that there already is a package called watchdog (http://pypi.python.org/pypi/watchdog).

Feature: Introspection API

This issue is related to #27, and aims at identifying the requirements for a monitor's state to be accessed and manipulated by another program (which could be written in Python or not).

What we should do is:

define use cases
list the required features
see why the current implementation does not allow for the above two points

Some remarks about what we need to consider:

Identify how many instances of a monitor can run in a single Python process (I'm not sure that we can run many instances)
Be clear about the security implications of accessing a running monitor (it might become a security weak point)
Identify ways to do IPC to interact with a running monitor (we should be able to query a monitor's status without Python)
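One possible IPC channel for the last point, sketched under assumptions: the running monitor exposes a Unix domain socket that answers each connection with one line of JSON, so any tool that can open a socket can read it, and filesystem permissions act as the access control (relevant to the security remark). `make_status_socket` and the status dict are hypothetical; this is Unix-only.

```python
import json
import os
import socketserver

def make_status_socket(path, get_status):
    """Hypothetical server: answer each connection with one JSON line from get_status()."""

    class StatusHandler(socketserver.StreamRequestHandler):
        def handle(self):
            self.wfile.write(json.dumps(get_status()).encode("utf-8") + b"\n")

    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    server = socketserver.UnixStreamServer(path, StatusHandler)
    # Restrict access to the owning user; note there is a short window
    # between bind and chmod during which the default umask applies
    os.chmod(path, 0o600)
    return server
```

The monitor would run `server.serve_forever()` in a daemon thread; from a shell, something like `socat - UNIX-CONNECT:/path/to/monitor.sock` prints the status.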

ease contribution

I'd love to contribute some code, and to have the features I need to implement (e.g. a Samba check) be easier to add.

I suggest breaking the monolithic watchdog.py file down into separate submodules,
especially Actions, which in my opinion should be one per file.
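A sketch of how one-Action-per-file could still let Actions be looked up by name: a small registry filled by a class decorator. This does not reflect the current code structure; `ACTIONS`, `register_action`, `Action`, `LogAction` and `create_action` are hypothetical.

```python
# Hypothetical registry letting each Action live in its own module
# (e.g. actions/restart.py, actions/samba.py) and still be found by name.
ACTIONS = {}

def register_action(name):
    """Class decorator that records an Action subclass under `name`."""
    def decorator(cls):
        if name in ACTIONS:
            raise ValueError(f"duplicate action name: {name}")
        ACTIONS[name] = cls
        return cls
    return decorator

class Action:
    """Minimal base class: subclasses implement run(context)."""
    def run(self, context):
        raise NotImplementedError

@register_action("log")
class LogAction(Action):
    # In the split layout this class would sit in its own file, e.g. actions/log.py
    def __init__(self, message):
        self.message = message

    def run(self, context):
        return f"{context.get('service', '?')}: {self.message}"

def create_action(name, *args, **kwargs):
    """Instantiate a registered Action by name."""
    return ACTIONS[name](*args, **kwargs)
```

The package's `__init__` could import every submodule with `pkgutil.iter_modules` and `importlib.import_module`, so that adding a file is enough to register a new Action.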

Include tool for monitoring streaming data

Hello,
I'm developing a visualization tool for streaming data, https://github.com/nupic-community/nupic.visualizations . I came across your project and wonder whether it could be useful for it.
The code is in JS, but it can run online.
Cheers,

error when running on OSX

(env) : monitoring
Traceback (most recent call last):
  File "/Users/john/.virtualenvs/env/bin/monitoring", line 2, in <module>
    import sys, monitoring
  File "/Users/john/.virtualenvs/env/lib/python2.7/site-packages/monitoring.py", line 1637, in <module>
    System.CPUStats()
  File "/Users/john/.virtualenvs/env/lib/python2.7/site-packages/monitoring.py", line 471, in CPUStats
    time_list = cat("/proc/stat").split("\n")[0].split(" ")[2:6]
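The traceback shows `CPUStats` reading `/proc/stat`, which exists on Linux but not on macOS, and the call is reached while `monitoring` is being imported, so merely starting the tool fails. A hedged sketch of a guarded version follows; `parse_cpu_line` and `cpu_stats` are hypothetical names. It also uses `split()` instead of `split(" ")[2:6]`, which depends on the double space after `cpu`.

```python
import os

PROC_STAT = "/proc/stat"

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (user, nice, system, idle)."""
    fields = line.split()  # split() collapses the double space after "cpu"
    if not fields or fields[0] != "cpu":
        raise ValueError(f"not an aggregate cpu line: {line!r}")
    return tuple(int(value) for value in fields[1:5])

def cpu_stats():
    """Hypothetical guard: CPU counters on Linux, None where /proc/stat is missing (e.g. macOS)."""
    if not os.path.exists(PROC_STAT):
        return None
    with open(PROC_STAT) as f:
        return parse_cpu_line(f.readline())
```

For real cross-platform numbers, the third-party psutil package (`psutil.cpu_times()`) is an alternative worth considering.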

Feature: Web interface for Monitoring

Let me suggest adding a web interface for monitoring.
Since Monitoring is a long-running service, it'd be nice to be able to access it from the web (and, through that, perhaps even monitor Monitoring).

In the web UI (and API) I'd like to see the following:

Which monitors are running?
For each monitor: a list of its services
For each service: its name etc., and a list of its monitors and actions
For each monitor: its last status (success/fail, plus data if it has any) and when it last ran
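A minimal sketch of the read-only API half, assuming the daemon can produce a JSON-serialisable snapshot of monitors, services and last results. `make_status_api`, the `/api/status` path and the snapshot shape are hypothetical; there is no authentication, so the server binds to localhost by default.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def make_status_api(get_status, host="127.0.0.1", port=8080):
    """Hypothetical API: serve get_status() as JSON at GET /api/status.

    get_status returns a snapshot such as
    {"monitors": [{"name": ..., "services": [...], "last_status": ..., "last_run": ...}]}
    """

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/api/status":
                self.send_error(404)
                return
            body = json.dumps(get_status()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging to stderr

    # Binding to 127.0.0.1 keeps the API off the network unless it is proxied
    return ThreadingHTTPServer((host, port), StatusHandler)
```

The daemon would start `server.serve_forever()` in a background thread; a web UI could then poll this endpoint.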