karimsa / patrol

Host your own status pages.

JavaScript 0.46% Dockerfile 1.26% HTML 15.74% Go 81.98% Shell 0.56%

health-checks docker statuspage uptime monitoring

patrol's Introduction

Host your own uptime monitoring status pages.

TLDR
- Installing natively
- Running with docker
Usage
Creating a service
Creating health checks
- Health check images
- Health check options
Managing secrets
Troubleshooting
Building container from source
License

TL;DR

Create a config file like this one.

Installing natively

$ curl -sf https://gobinaries.com/karimsa/patrol/cmd/patrol | sh

Running with docker

Image is hosted at ghcr.io/karimsa/patrol.

$ docker run -d \
    --name patrol \
    --restart=on-failure \
    -v "$PWD:/data" \
    -p 8080:8080 \
    --log-driver json-file \
    --log-opt max-size=100m \
    ghcr.io/karimsa/patrol:latest \
    run \
    --config /data/patrol.yml

There are two tags that are published to the docker repo for this project:

latest: As per docker convention, this is the latest stable release of patrol.
unstable: This is the latest copy of the image from master - if you like to live life on the edge.

Running on Raspberry Pi

$ docker run -d \
    --name patrol \
    --restart=on-failure \
    -v "$PWD:/data" \
    -p 8080:8080 \
    --log-driver json-file \
    --log-opt max-size=100m \
    ghcr.io/karimsa/patrol:latest-arm64v8 \
    run \
    --config /data/patrol.yml

latest-arm64v8: Latest stable release of patrol for the Raspberry Pi.
unstable-arm64v8: Latest copy of the image for the Raspberry Pi from master - if you like to live life on the edge.

Notes about docker container

For security reasons, the default user inside the patrol container is patrol (in the group patrol). This is a non-root user.

What this means for you:

Ports < 1024 cannot be listened on within the container. This is not a problem: have patrol listen on a port above 1024 and use docker to perform port forwarding (see the run example above).
File permissions on the config file and data directory passed to patrol have to be such that patrol can read/write to those files.
- For the config file, the minimum permissions are 0644.
- For the data file, the minimum permissions are 0666.
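
As a rough sketch of the permissions above, the commands below prepare a directory before it is mounted at /data. The temporary directory is only a stand-in for your real data directory, and data.db is assumed to be the data file name (it is the name that appears in the error output in the Troubleshooting section).

```shell
# Stand-in for the host directory you would mount at /data (hypothetical).
workdir="$(mktemp -d)"
touch "$workdir/patrol.yml" "$workdir/data.db"

# Config file: readable by everyone, writable only by the owner.
chmod 644 "$workdir/patrol.yml"
# Data file: readable and writable by everyone, so the non-root patrol user can update it.
chmod 666 "$workdir/data.db"

# Read back the permission strings to confirm what was applied.
config_mode="$(ls -l "$workdir/patrol.yml" | cut -c1-10)"
data_mode="$(ls -l "$workdir/data.db" | cut -c1-10)"
echo "patrol.yml: $config_mode"
echo "data.db:    $data_mode"
```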

If you are still having issues, please check the Troubleshooting section and then open a GitHub issue if your issue persists.

Usage

The purpose of patrol is to let you self-host an automated status page that gives you an overview of your operations. The idea differs from something like Atlassian's Statuspage, which is aimed at communicating your operational status to external stakeholders, whereas patrol is aimed at monitoring.

To run patrol on your own, you simply need access to a machine with docker installed. To start, write your own configuration file, similar to this one.

You can then run patrol via docker:

$ ls
patrol.yml

$ patrol run --config patrol.yml

This will start patrol on port 80 with the web interface. It will also give patrol access to your host machine's docker daemon so that it can spin up additional containers to run checks.

Note: limiting the maximum log size for patrol is crucial, since patrol logs every time checks are run.

Creating a service

Services in patrol are simply a collection of health checks. For now, they are mostly a visual grouping - checks belonging to the same service will be grouped together on the status page. To create a new service, you simply need to add a new key-value pair to the services key of the configuration.

For example, a simple service assigned to 'google.ca' could have the configuration:

services:
  google.ca:
    checks:
    - name: Delivers homepage
      cmd: 'curl -fsSL https://www.google.ca/'

Creating health checks

Health checks are the core of patrol. Each health check is a simple shell script that tests the availability of a given feature in a service. If the script executes successfully, the health check is considered to be passed. If the script exits with a non-zero exit code, the health check is considered to be failed.
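
The pass/fail rule can be illustrated outside of patrol with plain shell. This is only a sketch of the semantics described above, not patrol's actual implementation; run_check is a hypothetical helper.

```shell
# run_check COMMAND: hypothetical helper that mimics the pass/fail rule.
# A zero exit code counts as "passed", anything else as "failed".
run_check() {
    if /bin/sh -c "$1" >/dev/null 2>&1; then
        echo passed
    else
        echo failed
    fi
}

ok_result="$(run_check 'true')"
bad_result="$(run_check 'exit 3')"
echo "true   -> $ok_result"
echo "exit 3 -> $bad_result"
```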

A simple health check might test an HTTP server's ability to deliver content by simply executing a curl request. In the example above, curl -fsSL https://www.google.ca/ is used to hit the Google homepage at www.google.ca and will fail if the content is not delivered.

Since health checks can be any shell script, you are not limited to a single command. For instance, suppose you are testing the availability of a specific page (let's say login) in an SPA, where a missing page may be rendered by the app without the HTTP server itself returning a 404. In that case, you can use grep to verify that the right content was received, rather than merely that some content was received.

For example:

services:
  My App:
    checks:
    - name: Delivers login
      cmd: 'curl -fsSL https://myapp.com/login | grep MyApp'

Default shell options:

Patrol first tries to use the shell specified by $SHELL - in the provided docker image, this env var is undefined.
If no shell is found, patrol tries to rely on /bin/sh. If /bin/sh is symlinked to dash (as on Ubuntu systems), it will rely on /bin/bash.
The shell is started with -e and -o pipefail. This means that any failing command in your script will cause the health check to fail, and any error in a piped operation will cause the entire piped command to fail.

Since in the above example the errors are carried forward in the pipe, and grep fails when it cannot find the query, this check is complete.
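
The effect of -o pipefail can be seen in isolation with a pipeline whose first command fails. The sketch below assumes bash is installed and uses false | cat as a stand-in for a real check such as curl piped into grep.

```shell
# Without pipefail, the pipeline reports the status of its last command (cat),
# so the failure of the first command is hidden.
without_pipefail=0
bash -c 'false | cat' || without_pipefail=$?

# With -e and -o pipefail, the failure of the first command
# becomes the status of the whole pipeline.
with_pipefail=0
bash -e -o pipefail -c 'false | cat' || with_pipefail=$?

echo "without pipefail: exit $without_pipefail"
echo "with pipefail:    exit $with_pipefail"
```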

As you can see, layering multiple checks might help you diagnose where the issue is when there is an issue. In this case, having both the Delivers homepage and Delivers login checks might tell you that if the first succeeds and the second fails, there is most likely a content delivery issue as opposed to an infrastructure issue.

Note: Since the exit code of the health check is used to determine whether the service is running or not, it is important that your command is set up to only fail if the service is failing. In the example of a curl request, you must specify the -f, --fail flag to ensure that curl exits with a non-zero exit code if the web server does not respond with a 2XX/3XX response.

Health check options

name (required): a string specifying the name to give this health check. If this name is changed, the entire history for the health check will be reset.
cmd (required; string/array):
- If this is a string, it must be a command which can be passed to the shell via /bin/sh -c 'cmd'.
- If this is an array, it must have all string elements and the contents will be concatenated with a ';' in between and then passed to the shell.
type ('boolean' or 'metric', defaults to boolean): if specified as 'metric', the stdout of the check's command will be parsed as a numeric value.
unit (required if type is 'metric'): if type is metric, this will be used when displaying the metric chart on the status page.
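
To make the array form of cmd more concrete, the sketch below joins a few hypothetical commands with ';' by hand and runs them with -e set, as the shell options described earlier suggest. It only imitates the documented behaviour; the exact invocation patrol uses internally is an assumption.

```shell
# Three hypothetical array elements, joined with ';' as described for an array-valued cmd.
joined='printf "step1"; false; printf "step3"'

# With -e, the shell stops at the first failing element, so step3 never runs
# and the non-zero exit status marks the check as failed.
status=0
output="$(/bin/sh -e -c "$joined")" || status=$?
echo "output: $output, exit status: $status"
```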

Managing Secrets

There are two ways to manage secrets for patrol config files.

Using environment variables

The first is to store secure values inside of environment variables. Since patrol passes its own environment variables down to the child process, any environment variables that are passed to the patrol process (via docker or otherwise) are made available to the commands.

Example:

$ cat > patrol.yml << 'EOF'
services:
  env test:
    checks:
    - name: testing environment vars
      cmd: 'test "$SECRET" = "hello"'
EOF

$ patrol run --config patrol.yml
# the check will always fail here

$ SECRET=hello patrol run --config patrol.yml
# the check will always pass

Encrypting your config file

You can use openssl or secrets to encrypt specific keys or the entire config file. When deploying your status page, remember to decrypt the keys or file so that patrol can access the raw values.
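
One possible way to encrypt and decrypt the whole file with openssl is sketched below. It assumes OpenSSL 1.1.1 or newer (for -pbkdf2); the PATROL_CONFIG_KEY variable, the file names, and the placeholder config contents are all hypothetical.

```shell
# Hypothetical passphrase, read by openssl from the environment.
export PATROL_CONFIG_KEY='change-me'

# Stand-in config file in a temporary directory.
workdir="$(mktemp -d)"
printf 'services: {}\n' > "$workdir/patrol.yml"

# Encrypt the whole file with AES-256-CBC and a PBKDF2-derived key.
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in "$workdir/patrol.yml" -out "$workdir/patrol.yml.enc" \
    -pass env:PATROL_CONFIG_KEY

# At deploy time, decrypt to a plain file that patrol can read.
openssl enc -d -aes-256-cbc -pbkdf2 \
    -in "$workdir/patrol.yml.enc" -out "$workdir/patrol.decrypted.yml" \
    -pass env:PATROL_CONFIG_KEY

# Confirm that the round trip reproduces the original file.
roundtrip=failed
cmp "$workdir/patrol.yml" "$workdir/patrol.decrypted.yml" && roundtrip=ok
echo "round trip: $roundtrip"
```

Only the encrypted file would be committed or shared; the decrypted copy is what gets passed to patrol with --config.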

Troubleshooting

There are a number of steps you can take to troubleshoot an installation of patrol. See the information below to get started.

Docker

`open patrol.yml: permission denied`

If you encounter a permission denied error for patrol.yml, ensure that the file has at least a chmod value of 664. To fix this, run chmod 664 patrol.yml.

2021/04/02 03:38:33 Initializing with SHELL = /bin/sh
open data.db: permission denied

If you have permission issues with data.db, ensure that the file has at least a chmod value of 666. To fix this, run chmod 666 data.db.

`open patrol.yml: no such file or directory`

Patrol will try to resolve the config file relative to the current working directory. In the docker container, this is set to /data (you can override it using docker's --workdir option).

If you receive a "no such file or directory" error for the config file, you can try:

Provide an absolute path to the config file.
Make sure you have mounted the config file into the docker container correctly.
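
For a native install, one way to turn a relative config path into an absolute one is sketched below. The directory layout is hypothetical; inside a docker container, the path must instead be the location of the file within the container (for example under /data).

```shell
# Hypothetical layout: a config file in a subdirectory of the current directory.
workdir="$(mktemp -d)"
mkdir -p "$workdir/config"
touch "$workdir/config/patrol.yml"
cd "$workdir"
relative_config='config/patrol.yml'

# Resolve the directory part with cd/pwd, then re-attach the file name.
absolute_config="$(cd "$(dirname "$relative_config")" && pwd)/$(basename "$relative_config")"
echo "$absolute_config"
```

The resulting path can then be passed to patrol run --config regardless of the directory patrol is started from.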

`panic: listen tcp :80: bind: permission denied`

As stated previously, the default user in the docker container cannot access ports < 1024 since it is a non-root user.

To resolve this, change your port to something above 1024. For example:

port: 8080

# rest of the patrol.yml file here

Don't see your issue above?

When submitting an issue report, please make sure to gather the docker container logs. This can be done by running docker logs CONTAINER on your docker host, replacing CONTAINER with the name you gave your patrol container. If you used the docker run command above, the command will be docker logs patrol. Make sure to copy the output into the issue report.

Please also provide your patrol.yml file. Make sure to sanitize it before submitting so that sensitive information isn't posted on GitHub. This includes, but is not limited to, names, addresses, phone numbers, and public IP addresses (those outside the RFC1918 and RFC4193 address spaces).
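
As a starting point for sanitizing, the sketch below replaces anything shaped like an IPv4 address with a placeholder. The config contents and file names are hypothetical, the pattern also redacts private addresses, and it does not catch hostnames, names, or tokens, so the result still needs a manual review.

```shell
# Hypothetical config containing an address from a documentation-only range.
workdir="$(mktemp -d)"
cat > "$workdir/patrol.yml" << 'EOF'
services:
  My App:
    checks:
    - name: Delivers login
      cmd: 'curl -fsSL http://203.0.113.10/login | grep MyApp'
EOF

# Replace anything shaped like an IPv4 address with a placeholder.
sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/REDACTED_IP/g' \
    "$workdir/patrol.yml" > "$workdir/patrol.sanitized.yml"
cat "$workdir/patrol.sanitized.yml"
```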

Last but not least, please include anything else that you think will help diagnose your issue.

Building from source

Building container from source

To build docker containers from source, you will need to clone the git repository and make sure you have the latest version of Docker installed.

Building the x86 image: docker build -t patrol .
Building for Raspberry Pi: docker build -t patrol-pi -f Dockerfile.arm64v8 .

License

Licensed under MIT license.

Badge in logo created by Artdabana@Design from the Noun Project.

patrol's Issues

SSR content becomes out of date quickly

The page is rendered with initial content and then becomes out of date. Should be updated on interval.

Add service specific hooks

Allow multiple metrics on single chart

Add event: `on_first_success`

Should run every time a success is detected directly after a failure.

Overall last updated date always says “Invalid Date”

I have set "interval: 60s" but the last check has been a few minutes ago...
Do I have to create a cron task to check every minute? How does this work?

Please help.

Add image badges

Add notification support to v1

Build for arm/v7?

Hi everyone,

great work so far. Just wondering if there is a build or workaround to get this docker container running on arm/v7?

Error message is:
WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/arm/v7) and no specific platform was requested

So the container is starting but then immediately crashing.

Thanks in advance if anyone has a solution :)

Install Trouble | 500 Response GoBinaries | Docker Formatting | Docker Failures & Restarts

Your app looks cool and all but I gotta say your install docs have issues.


root@ubuntu:/home/ubuntu# curl -sf https://gobinaries.com/karimsa/patrol/cmd/patrol | sh

  ==> Downloading github.com/karimsa/patrol/cmd/patrol@master
  ==> Resolved version master to v1.1.0
  ==> Downloading binary for linux arm64

  Error downloading, got 500 response from server

Getting 500 response from whatever this source is, looking at the file I assume this is from gobinaries.

Your docker formatting for the run command is also pretty bad. See below when pasting it.

root@ubuntu:/home/ubuntu# docker run \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> -d \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> --name patrol \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> --restart=on-failure \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> -v "$PWD:/data" \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> -p 80:8080 \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> --log-driver json-file \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> --log-opt max-size=100m \
>
.bash_history              .cache/                    .gitconfig                 .profile                   .sudo_as_admin_successful  github-forks/              snap/
.bash_logout               .dbshell                   .local/                    .python_history            .vscode-server/            patrol.yml
.bashrc                    .git-credentials           .mongorc.js                .ssh/                      .wget-hsts                 python/
> ghcr.io/karimsa/patrol:latest
649226ab30feb614c8692535816993f2ef5cfa1d7db5e60d9d5177b03bcd21cf

Just wanted to call these things out to you so they can be resolved or explained for others who have this same issue either now or in the future.

Reliability of check is weak

If a check fails, it is currently not possible for patrol to self-detect whether that was a fault in the network that patrol is running on or if it is a real check failure.

Possible fix: Add a minimum timespan during which the check must repeatedly fail, even with delayed retries. Only if the check fails repeatedly, then the failure must be recorded.
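
Until patrol offers something like this natively, the same idea can be approximated inside a check's own command. The sketch below is one hypothetical way to do it: retry is not part of patrol, and the attempt counts, delays, and the flaky test command are purely illustrative.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: hypothetical helper, not part of patrol.
# Succeeds as soon as COMMAND succeeds; fails only after ATTEMPTS consecutive failures.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Demo command that fails twice and then succeeds, simulating a short network blip.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
flaky() {
    count=$(( $(cat "$counter_file") + 1 ))
    echo "$count" > "$counter_file"
    [ "$count" -ge 3 ]
}

# A transient failure is absorbed; a persistent one is still reported.
flaky_result=failed
retry 5 0 flaky && flaky_result=passed
down_result=failed
retry 2 0 false && down_result=passed
echo "flaky: $flaky_result, down: $down_result"
```

Because checks are plain shell, a helper like this could live in a script that the check's cmd invokes, with a non-zero delay between attempts.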

Webhook Notification Body

The example config file shows you can do this in the webhook notification:
body: '{"text":"Service \"{{service}}\" is up (check \"{{check.name}}\" completed)."}'
However, it appears the {{service}} and {{check.name}} aren't replaced. Is this not implemented, or am I doing something wrong?

Add non-boolean metrics

I want to be able to see graphs such as latency, number of clusters running, memory usage, etc.

'db' propery must be specified in config file

Hello,

When using the documented command, we end up with this error:

2023/01/05 15:10:46 Initializing with SHELL = /bin/sh
'db' propery must be specified in config file

Tested with the documented command :

docker run -d \
	--name patrol \
	--restart=on-failure \
	-v "$PWD:/data" \
	-p 8080:8080 \
	--log-driver json-file \
	--log-opt max-size=100m \
	ghcr.io/karimsa/patrol:latest \
	run \
	--config ./config/patrol.yml

Also tested with docker-compose:

version: "3.9"

services:
  monitoring:
    image: ghcr.io/karimsa/patrol:latest
    ports:
      - "8080:7080"
    configs:
      - source: patrol
        target: /patrol.yml
        mode: 0664
    command: run --config /patrol.yml

configs:
  patrol:
    file: ./config/patrol.yml

Content of patrol.yml:

services:
  google:
    checks:
    - name: Internal
      cmd: 'curl -fsSL https://www.google.fr'

Maybe the documentation is obsolete? Thanks for your help!

Improve graphs

At the moment, the graphs in the web interface don't provide much use due to two reasons, in my opinion:

No timestamps on the X axis (so if there was a spike in latency, for example, it's impossible to find out when exactly it was).
No way to go back further than what is shown.

Ideally, the graphs show the date only at those points where the day switches over, and timestamps everywhere else. On click/tap, you could show a small tooltip with detailed info for that particular data point.

Here's how it currently looks for me:

Maybe this is configurable, but I couldn't find out how to do it.

Here's how I imagine it (screengrabbed from an app called "Govee Home" which I have for a smart home device):

I really appreciate your project. Thanks for considering my request!

Memory usage is very high during startup

On startup, the SSR uses 2.2+ GB of resident memory, which causes an OOM on small machines. After startup, patrol is able to survive on just ~1 GB of resident memory.

"notifications" field missing in configRaw

Using the example config throws an error because the config type doesn't have a notifications field.

Webhook notifications defaulting to GET?

First off - just wanted to thank you for making this app - I've been using it to monitor a few of my servers. I've been trying to send POST webhooks to pushover.net and for some reason, Patrol is automatically switching them to GET and causing them to fail.

Here is an example of what I'm using in my patrol.config:

name: **NAME**
db: data.db

## Map consisting of services to display on your statuspage. Each service
## can have multiple checks.
## All check commands are simply run using the default shell (/bin/sh).
services:
  WEB:
    checks:
      - name: Latency
        type: metric
        unit: ms
        interval: 60s
        cmd: 'curl -fsSL -w "%{time_total}" -o /dev/null **SITE**'
      - name: Server Status
        interval: 60s
        cmd: 'curl -fsSL -m 10 -o /dev/null **SITE**'
    on_failure:
      - webhook:
          method: post
          url: https://api.pushover.net/1/messages.json
          headers:
            'Content-Type': 'application/json'
          body: 'body here'

And here is what I'm seeing in the logs:

"WEB": {
      "Checks": [
        {
          "Name": "Latency",
          "Interval": 60000000000,
          "Timeout": 180000000000,
          "Cmd": "curl -fsSL -w \"%{time_total}\" -o /dev/null **SITE**",
          "Type": "metric",
          "MetricUnit": "ms"
        },
        {
          "Name": "Server Status",
          "Interval": 60000000000,
          "Timeout": 180000000000,
          "Cmd": "curl -fsSL -m 10 -o /dev/null **SITE**",
          "Type": "boolean",
          "MetricUnit": ""
        }
      ],
      "OnFailure": [
        {
          "Webhook": {
            "URL": {
              "Scheme": "https",
              "Opaque": "",
              "User": null,
              "Host": "api.pushover.net",
              "Path": "/1/messages.json",
              "RawPath": "",
              "ForceQuery": false,
              "RawQuery": "",
              "Fragment": "",
              "RawFragment": ""
            },
            "Method": "GET",
            "Headers": {
              "Content-Type": "application/json"
            },
            "Body": "body here"
          }
        }
      ],
      "OnRecovered": null,
      "OnSuccess": null
    }

Am I doing something wrong?

Add real-time updates

History should always keep failures

Right now, a success will overwrite a failure, which means uptime is always 100% - which it should not be.
