monitoring's Introduction

Monitoring - Server monitoring and data-collection daemon

Monitoring is an API with a DSL feel to write monitoring daemons in Python.

Use cases

Monitoring works well for the following tasks:

  • being notified when incidents happen (email, XMPP, ZeroMQ...)
  • taking automatic actions (restart, rm, git pull...)
  • collecting system statistics for further processing, e.g. graphs
  • tying into existing/third-party Python code
  • playing nicely with the existing deployment/configuration ecosystem (fabric/cuisine)

Overview

  • monitoring DSL: declarative programming to define monitoring strategy
  • wide spectrum: from data collection and incident reporting to taking automatic actions
  • small and easy to read: the API is a single file
  • Revised BSD License
  • written in Python

Use Cases

  • ensure service availability: test services and start/stop them when problems occur
  • collect system statistics/data, log locally and/or remotely
  • alert on system/service health, take actions

Installation

`python setup.py install` or

`easy_install monitoring`

More?

Read the presentation on Monitoring (previously named Watchdog).

monitoring's People

Contributors

brunobord, dillongreen, jd-boyd, mattupstate, medecau, rantav, sebastien

monitoring's Issues

firewall watchdog Rule

We should come up with a watchdog rule that tests whether or not certain netfilter rules are in place

  • use iptables to gather current state
  • maybe be smart such as: if we run a httpd let port 80 be open, look for Port xxxx in sshd_config, ...
  • even if this rule has a tiny bit of "smartness" at its core, the user still has total control i.e. can add config on top or entirely override
  • don't try to do too much though, as watchdog is not a ruleset creator

I guess what I am trying to say is, for example: if I want all ports but 80 to be closed from the outside, it would be nice to have a watchdog rule that could check that. This matters e.g. after a reboot, when loading my iptables script into the kernel didn't work for some reason, or when someone fiddled with the live config, some process...
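A minimal sketch of the check described above, written as a plain function rather than against Monitoring's own Rule API (which is not shown here). It assumes the live state is read with `iptables -S`; the names `missing_rules`, `read_iptables_rules`, `REQUIRED` and `SAMPLE`, as well as the policy itself, are made up for illustration.

```python
import subprocess


def missing_rules(current_rules, required_rules):
    """Return the required rules that are absent from the current rule set.

    Both arguments are iterables of rule strings in `iptables -S` format.
    Runs of whitespace are collapsed so spacing differences do not matter.
    """
    present = {" ".join(rule.split()) for rule in current_rules}
    return [r for r in required_rules if " ".join(r.split()) not in present]


def read_iptables_rules():
    """Read the live rule set; needs root and the iptables binary."""
    output = subprocess.check_output(["iptables", "-S"], text=True)
    return output.splitlines()


# Hypothetical policy: drop inbound traffic by default, but allow port 80.
REQUIRED = [
    "-P INPUT DROP",
    "-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT",
]

# Sample `iptables -S` output where the default policy was not loaded.
SAMPLE = """-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
""".splitlines()

if __name__ == "__main__":
    for rule in missing_rules(SAMPLE, REQUIRED):
        print("missing rule:", rule)
```

The comparison is an exact string match on normalised rules, which keeps the user in full control of the expected list but will not recognise two differently written rules as equivalent.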

SSH keys watchdog Rule

Would be nice to have a check to see whether or not all SSH keys on the server are legit:

  • compare to a list of well-known good keys, e.g. a list of current employees' keys
  • check against things like key strength (length etc.)
  • ...

This Rule could work in concert with sebastien/cuisine#28
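The first bullet above could be sketched as follows. This is only an illustration under stated assumptions: it parses `authorized_keys`-style text, compares the base64 key blobs against an allow-list, and does not cover the key-strength check. The names `parse_authorized_keys` and `unknown_keys` are hypothetical and not part of Monitoring.

```python
# Key type names that can appear in an authorized_keys line.
KEY_TYPES = (
    "ssh-rsa",
    "ssh-dss",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
)


def parse_authorized_keys(text):
    """Yield (key_type, key_blob, comment) for each key line in `text`.

    Leading options (e.g. `no-pty`) are skipped by looking for the first
    field that is a known key type. Options that themselves contain a key
    type name inside a quoted string are not handled by this sketch.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        for index, field in enumerate(fields[:-1]):
            if field in KEY_TYPES:
                yield field, fields[index + 1], " ".join(fields[index + 2:])
                break


def unknown_keys(text, trusted_blobs):
    """Return the parsed keys whose blob is not in the trusted set."""
    return [key for key in parse_authorized_keys(text) if key[1] not in trusted_blobs]
```

A rule built on this would read each user's `authorized_keys` file, call `unknown_keys` with the organisation's list of good keys, and fail when the result is non-empty.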

Question RE: Failure to remediate

During a failure condition, what is Watchdog's stance on failure to remedy? Let's say I have an Incident attached to a service. It trips the threshold, and for some reason the code cannot fix the underlying condition (the service fails to start, etc.). Currently Watchdog just keeps trying forever. Is this the intended behaviour, or are there any plans for a maximum attempt / give up / bail option on this monitor?
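One way the "maximum attempt / give up" behaviour asked about here could look is a small wrapper around the remediation step. This is a sketch, not an existing Watchdog feature: `GiveUpAfter` and its parameters are hypothetical, and the wrapped action is assumed to be any callable that returns a truthy value on success.

```python
class GiveUpAfter:
    """Wrap a remediation callable and stop calling it after repeated failures."""

    def __init__(self, action, max_attempts=3):
        self.action = action
        self.max_attempts = max_attempts
        self.failures = 0  # consecutive failures seen so far

    @property
    def gave_up(self):
        """True once the failure budget is exhausted."""
        return self.failures >= self.max_attempts

    def run(self):
        """Run the action once; return True on success, False otherwise.

        After `max_attempts` consecutive failures the action is no longer
        called and False is returned immediately.
        """
        if self.gave_up:
            return False
        try:
            succeeded = bool(self.action())
        except Exception:
            succeeded = False
        if succeeded:
            self.failures = 0  # a success resets the budget
        else:
            self.failures += 1
        return succeeded
```

A monitor using such a wrapper could escalate (for example, send a different notification) once `gave_up` becomes true instead of retrying forever.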

HTTP rule to test if page contain one string

Hi,

I need something like:

HTTP(
    GET="http://localhost:8000/",
    freq=Time.ms(500),
    contain="foobar"
)

to test whether the web page contains the string "foobar".

I'll try to implement it as soon as possible.

Regards,
Stephane
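The check requested above can be sketched with the standard library alone. This is not the `HTTP` rule itself, and the `contain` argument does not exist in the current API as far as this issue shows; `page_contains` is a hypothetical helper that such a rule could call.

```python
import urllib.request


def page_contains(url, needle, timeout=5.0):
    """Return True if the response body at `url` contains `needle`.

    Connection errors, HTTP error statuses and timeouts count as a failed
    check and yield False.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            body = response.read().decode(charset, errors="replace")
    except OSError:
        # urllib's URLError/HTTPError and socket timeouts are OSError subclasses.
        return False
    return needle in body


if __name__ == "__main__":
    print(page_contains("http://localhost:8000/", "foobar"))
```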

error on example-system-health.py

(env)[21:23 /tmp/watchdog/Examples]$ python example-system-health.py
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/var/sites/a_cityguides/cityguides/env/lib/python2.6/site-packages/watchdog.py", line 1118, in _run
    self.result = self.runnable.run(*self.args)
  File "/var/sites/a_cityguides/cityguides/env/lib/python2.6/site-packages/watchdog.py", line 967, in run
    value = self.extractor(res.value)
TypeError: <lambda>() takes exactly 2 arguments (1 given)

Trying to setup a test mail server.

I want to write a Python script that monitors a list of servers and emails one or more specified recipients based on data collected from these servers (hard disk capacity, server load, etc.). It looks like monitoring has a wider scope than my project, but I think I am on the right track looking at this project.

Actually, I have a general idea of how to do some of these tasks, including sending email with Python, but I am unable to set up an email server on my local machine to test sending and receiving emails.
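For reference, the "collect data, then email" part of this task can be sketched with the standard library only. The example below checks disk usage on the local machine rather than on a list of remote servers, and `disk_alert`, the threshold and the addresses are all made up for illustration. Building the message is kept separate from sending it, so the logic can be exercised without a working mail server.

```python
import shutil
from email.message import EmailMessage


def disk_alert(path, threshold, sender, recipients):
    """Return an alert message if usage of `path` exceeds `threshold`.

    `threshold` is a fraction between 0 and 1. Returns None when the disk
    is at or below the threshold.
    """
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction <= threshold:
        return None
    message = EmailMessage()
    message["Subject"] = "Disk usage at %s is %.0f%%" % (path, used_fraction * 100)
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message.set_content(
        "Disk usage at %s: %d of %d bytes used." % (path, usage.used, usage.total)
    )
    return message


if __name__ == "__main__":
    alert = disk_alert("/", 0.9, "monitor@example.com", ["ops@example.com"])
    if alert is not None:
        # Sending needs an SMTP server listening on localhost:25.
        import smtplib

        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(alert)
```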

I have been following several postfix configuration tutorials to no avail. I also looked at some SO questions dealing with Yosemite-specific updates to some of the aforementioned tutorials, but still no luck.

Do you guys have any tips or recommendations?

What I have tried...

Although I just want to be able to send email via localhost for testing, for starters I tried setting up postfix for Gmail. Here is the relevant part of my /etc/postfix/main.cf:

relayhost = smtp.gmail.com:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = #noanonymous
smtp_use_tls = yes
smtp_sasl_mechanism_filter = plain

smtp_tls_security_level=encrypt
tls_random_source=dev:/dev/urandom

After reloading postfix, I tried sending some dummy emails like:

tree | mail -s "Hi" "[email protected]"

This is what mailq logs:

B83E0EFDD0      828 Wed Mar  4 22:56:05  $USER@$HOSTNAME
                                                (unknown mail transport error)
                                         [email protected]

B87C6EE74B      282 Wed Mar  4 21:42:27  $USER@$HOSTNAME
(delivery temporarily suspended: local data error while talking to smtp.gmail.com[74.125.203.108])

HTTP test method refactor ?

Hi there,

I was a little bit puzzled by the HTTP constructor:

https://github.com/sebastien/watchdog/blob/master/Sources/watchdog.py#L854

a - if you use a "POST" or a "GET" argument, the method used is "GET", whatever it is.
b - I've added a HEAD argument in my latest pull request, following the current practice, but I wonder if this constructor could be simplified. Something like:

def __init__(self, url, method=None, timeout=Time.s(10), freq=Time.m(1), fail=(), success=()):
    Rule.__init__(self, freq, fail, success)
    # Default to GET when no method is given
    method = method or "GET"
    if method not in ("GET", "POST", "HEAD"):
        raise ValueError("Unsupported HTTP method: %r" % (method,))
    # .. all the rest is the same...
    if url.startswith("http://"):
        url = url[7:]
    server, uri = url.split("/", 1)
    # ...

Of course, this would be annoying because it's not backward compatible with the current API. That's the reason why I didn't open a pull request, and I wonder whether this refactor is possible or not.
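One possible way around the compatibility concern is to accept both call styles during a transition period. The sketch below is only an illustration: `resolve_request` is a hypothetical helper, not part of watchdog.py, and it assumes the old style passes the URL through exactly one of the `GET`, `POST` or `HEAD` keyword arguments.

```python
def resolve_request(url=None, method=None, GET=None, POST=None, HEAD=None):
    """Return (method, url), accepting both the old and the proposed style.

    Old style:      resolve_request(GET="http://localhost:8000/")
    Proposed style: resolve_request("http://localhost:8000/", method="GET")
    """
    legacy = [
        (name, value)
        for name, value in (("GET", GET), ("POST", POST), ("HEAD", HEAD))
        if value is not None
    ]
    if url is not None:
        if legacy:
            raise ValueError("pass either url/method or one of GET/POST/HEAD, not both")
        method = (method or "GET").upper()
    elif len(legacy) == 1 and method is None:
        # Old style: the keyword name is the method, its value is the URL.
        method, url = legacy[0]
    else:
        raise ValueError("expected a url, or exactly one of GET/POST/HEAD")
    if method not in ("GET", "POST", "HEAD"):
        raise ValueError("unsupported HTTP method: %r" % (method,))
    return method, url
```

The constructor could call such a helper first and keep the rest of its body unchanged, which would also fix point (a), since the keyword used then decides the method.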

Feature: Introspection API

This issue is related to #27, and aims at identifying the requirements for a monitor's state to be accessed and manipulated by another program (which could be written in Python or not).

What we should do is:

  1. define use cases
  2. list the required features
  3. see why the current implementation does not allow for the above two points

Some remarks about what we need to consider:

  1. Identify how many instances of a monitor can run in a single Python process (I'm not sure that we can run many instances)
  2. Be clear about the security implications of accessing a running monitor (it might become a security weak point)
  3. Identify ways to do IPC to interact with a running monitor (we should be able to query a monitor status without Python)
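As a starting point for the discussion, here is a sketch of what an introspectable state could look like: a thread-safe registry that rules write their latest result to, and that can be dumped as JSON for a non-Python consumer. `StatusRegistry` and its methods are hypothetical names, and how the JSON would be exposed (socket, file, HTTP) is deliberately left open.

```python
import json
import threading
import time


class StatusRegistry:
    """Thread-safe store of the latest result for each rule."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def record(self, rule_name, success, detail=None, timestamp=None):
        """Store the latest outcome of a rule, replacing any previous one."""
        entry = {
            "success": bool(success),
            "detail": detail,
            "timestamp": time.time() if timestamp is None else timestamp,
        }
        with self._lock:
            self._status[rule_name] = entry

    def snapshot(self):
        """Return a copy of the current state that is safe to hand out."""
        with self._lock:
            return {name: dict(entry) for name, entry in self._status.items()}

    def to_json(self):
        """Serialise the current state for consumers written in any language."""
        return json.dumps(self.snapshot(), sort_keys=True)
```

Because the registry only exposes copies, a reader cannot modify the monitor's state through it; manipulating a running monitor would need a separate, more carefully secured interface.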

ease contribution

I'd love to contribute some code and for the features that I need to implement (e.g. a Samba check) to be easier to add.

I suggest breaking down the monolithic watchdog.py file into separate submodules.
Actions especially should, in my opinion, live in one file each.

error when running on OSX

(env) : monitoring
Traceback (most recent call last):
  File "/Users/john/.virtualenvs/env/bin/monitoring", line 2, in <module>
    import sys, monitoring
  File "/Users/john/.virtualenvs/env/lib/python2.7/site-packages/monitoring.py", line 1637, in <module>
    System.CPUStats()
  File "/Users/john/.virtualenvs/env/lib/python2.7/site-packages/monitoring.py", line 471, in CPUStats
    time_list = cat("/proc/stat").split("\n")[0].split(" ")[2:6]
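The traceback is cut off before the exception itself, but its last frame reads `/proc/stat`, a Linux procfs file that does not exist on OS X, which is the likely cause. A guarded version of the read could look like the sketch below; `parse_cpu_times` and `cpu_times` are hypothetical names, not functions from monitoring.py.

```python
import os


def parse_cpu_times(stat_text):
    """Return (user, nice, system, idle) from the content of /proc/stat."""
    fields = stat_text.splitlines()[0].split()
    if fields[0] != "cpu":
        raise ValueError("unexpected first line in /proc/stat content")
    return tuple(int(value) for value in fields[1:5])


def cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters, or None where `path` is missing.

    On platforms without procfs (such as OS X) this returns None instead
    of failing at import time.
    """
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return parse_cpu_times(handle.read())
```

Splitting on any whitespace avoids depending on the double space that follows `cpu` in the first line, which is what the `[2:6]` slice in the original compensates for.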

Feature: Web interface for Monitoring

Let me suggest adding a web interface for monitoring.
Since Monitoring is a long-running service it'd be nice to be able to access it from the web (and by that - perhaps even monitor monitoring).

In the web UI (and API) I'd like to see the following:

  • Which monitors are running?
  • For each monitor - a list of its services
  • For each service: name etc., and a list of monitors and actions
  • For each monitor - what was its last status (success/fail and some data if it has some) and when did it run?
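A read-only status endpoint along these lines can be sketched with the standard library's `http.server`. This is an illustration only: `make_status_server`, the `/status` path and the shape of the returned data are assumptions, and `get_status` stands for whatever callable would return the monitor's current state as JSON-serialisable data.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_status_server(get_status, host="127.0.0.1", port=0):
    """Build an HTTP server that serves `get_status()` as JSON on /status.

    Binding to 127.0.0.1 by default keeps the endpoint off the network;
    port 0 lets the operating system pick a free port.
    """

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/status":
                self.send_error(404)
                return
            body = json.dumps(get_status()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep access lines out of the daemon's own output

    return HTTPServer((host, port), StatusHandler)


if __name__ == "__main__":
    # Made-up state for demonstration purposes.
    state = {"web": {"last_status": "success", "last_run": "2015-03-04T22:56:05"}}
    server = make_status_server(lambda: state, port=8080)
    server.serve_forever()
```

In a real daemon the server would run in its own thread next to the monitor loop, and anything beyond read-only access would need authentication.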
