mattogodoy / omni

A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes.

License: GNU General Public License v3.0

Languages: Dockerfile 6.57%, Shell 5.62%, Python 87.81%
Topics: kubernetes, influxdb

omni's Introduction

OMNI

A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes.

(Omni dashboard screenshot)

Why?

When I finished building my Kubernetes cluster out of a few Raspberry Pis, the first thing I wanted to do was install Prometheus + Grafana for monitoring, and so I did. But once I had it all working, I found a few drawbacks:

  • The Prometheus exporter pods use a lot of RAM
  • The Prometheus exporter pods use a considerable amount of CPU
  • Prometheus gathers way too much data that I don't really need.
  • The node where the main Prometheus pod runs collects all of the information and saves it in its own database, constantly performing a lot of writes to the SD card. SD cards under constant write load tend to die.

Last but not least, I like to learn how these things work.

Advantages

Omni has (what I consider) some advantages over the regular Prometheus + Grafana combo:

  • It uses almost no RAM (around 13 MB)
  • It uses almost no CPU
  • It gathers only the information I need
  • All of the information is sent to an InfluxDB instance that could be outside of the cluster. This means that no information is persisted in the Pis, extending their SD card's lifetime.
  • InfluxDB acts as the database and the graph dashboard at the same time, so there is no need to also install Grafana (although you could if you wanted to).
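Conceptually, each Omni pod just samples a handful of node-level metrics on a timer and ships them to InfluxDB. A minimal, stdlib-only sketch of the gathering side (the function and field names here are illustrative, not Omni's actual API, and the InfluxDB write step is omitted):

```python
import os
import shutil
import time


def gather_metrics(path="/"):
    """Sample a few node-level metrics, similar in spirit to what Omni reports."""
    disk = shutil.disk_usage(path)
    load1, _, _ = os.getloadavg()  # 1-minute load average
    return {
        "disk_used_percent": round(disk.used / disk.total * 100, 2),
        "load_1m": load1,
        "timestamp_ns": time.time_ns(),  # InfluxDB timestamps are nanoseconds
    }


print(gather_metrics())
```

Because nothing is buffered or persisted locally, a crash loses at most one sample interval of data, and the SD card is never touched.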

Prerequisites

For Omni to work, you'll need to have a couple of things running first.

InfluxDB

It's a time series database (just like Prometheus) with nice built-in charts and a good UI overall.

One of the goals of this project is to avoid constant writing to the SD cards, so you have a few options for the placement of the database:

  • InfluxDB Cloud: This might be the easiest option. InfluxDB offers a free plan which should be more than enough for a small Pi Cluster running Omni. More information here.
  • Docker: You can also run InfluxDB in Docker on a server outside the Pi cluster (this is what I'm doing right now). More information here.
  • In the cluster: If you have better storage in your cluster (like M.2, SSD, etc.) and don't have the SD card limitation, you could also run InfluxDB in the same cluster you are monitoring.

Libraries

You'll need to have the libseccomp2 package installed on each of your nodes to avoid a Python error:

Fatal Python Error: pyinit_main: can't initialize time

(more info here)

You can install it in either of two ways (only one is needed):

  • Ansible: all nodes at the same time

    Edit the file ansible-playbook-libs.yaml in this repo, add your hosts and run:

    ansible-playbook ansible-playbook-libs.yaml
  • SSH: one by one

    Connect into each of your nodes and run:

    wget http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb
    sudo dpkg -i libseccomp2_2.5.1-1_armhf.deb

Once you have it installed, everything should work.

⚠️ NOTE: These packages are for armhf (32-bit) operating systems. If you are running a 64-bit version, please refer to this issue for instructions.
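If you want to script the package choice, the mapping from a kernel machine string (`uname -m`) to the matching Debian package architecture can be sketched like this. This is a rough heuristic, not part of Omni: 32-bit armv6/armv7 kernels map to armhf, aarch64 maps to arm64:

```python
import platform


def deb_arch(machine=None):
    """Map a kernel machine string (as from `uname -m`) to a Debian arch name."""
    machine = machine or platform.machine()
    if machine.startswith("armv"):  # e.g. armv6l, armv7l -> 32-bit hard-float
        return "armhf"
    mapping = {"aarch64": "arm64", "x86_64": "amd64"}
    return mapping.get(machine, machine)


# Pick the matching libseccomp2 package name for a given node:
print(f"libseccomp2_2.5.1-1_{deb_arch('armv7l')}.deb")   # armhf build
print(f"libseccomp2_2.5.1-1_{deb_arch('aarch64')}.deb")  # arm64 build
```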

Installation

Before deploying Omni you'll need an instance of InfluxDB running somewhere.

Setting up InfluxDB

Once you have your InfluxDB instance running, you'll have to perform some steps to start receiving data:

  1. Log into your InfluxDB instance.
  2. Click "Data" on the left panel and then click on the "Buckets" tab.
  3. Click on "Create Bucket" at the top right.
  4. Write a name for your bucket (you can use anything you want here) and click on "Create". This is where you will store the information sent by Omni.
  5. Once you have your bucket created you'll need a token. For this, click on the "Tokens" tab.
  6. To create a new one click on "Generate" > "Read/Write Token" at the top right.
  7. Write a description for your token and select the bucket you just created in both panels (Read and Write). Click on "Save".
  8. Now you'll see your new token in the list. Click on its name to see the details.
  9. Here you will see your token. Click on the "Copy to clipboard" button. You'll need this token to configure Omni next.
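The token you just copied is sent to InfluxDB 2.x in an `Authorization: Token <token>` header on every API request (Omni itself uses the official `influxdb_client` Python library, which handles this for you). A stdlib-only sketch of how an authenticated request is built, with a placeholder URL and token; the actual HTTP call is not made here:

```python
import urllib.request


def build_health_request(base_url, token):
    """Build an authenticated request against InfluxDB 2.x's /health endpoint."""
    req = urllib.request.Request(f"{base_url}/health")
    # InfluxDB 2.x expects the scheme "Token", not "Bearer":
    req.add_header("Authorization", f"Token {token}")
    return req


req = build_health_request("http://localhost:8086", "my-influx-token")
print(req.get_header("Authorization"))  # Token my-influx-token
```

If Omni logs a 401 Unauthorized from `influxdb_client`, the first thing to check is that this exact token string made it into the pod's configuration unmodified.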

Setting up Omni

Now you'll need to specify the attributes of your InfluxDB instance in Omni's configuration:

  1. Open omni-install.yaml and fill the variables in the ConfigMap section with your InfluxDB instance information.

    NOTE: The attribute OMNI_DATA_RATE_SECONDS specifies the number of seconds between data reporting events that are sent to the InfluxDB server.

  2. Deploy Omni in your cluster:

     kubectl apply -f omni-install.yaml

  3. Check that everything is running as expected:

     kubectl get all -n omni-system
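Inside the cluster, the ConfigMap values surface as environment variables in the Omni pods. A sketch of how such settings are typically read in Python — note that only `OMNI_DATA_RATE_SECONDS` is documented in this README; the other variable names and the default values below are made up for illustration:

```python
import os


def load_config(env=None):
    """Read Omni-style settings from the environment, with illustrative defaults."""
    if env is None:
        env = os.environ
    return {
        # Documented above: seconds between data reports sent to InfluxDB.
        "data_rate_seconds": int(env.get("OMNI_DATA_RATE_SECONDS", "60")),
        # Hypothetical names, for illustration only:
        "influx_url": env.get("OMNI_INFLUX_URL", "http://localhost:8086"),
        "influx_bucket": env.get("OMNI_INFLUX_BUCKET", "omni"),
    }


cfg = load_config({"OMNI_DATA_RATE_SECONDS": "30"})
print(cfg["data_rate_seconds"])  # 30
```

A larger `OMNI_DATA_RATE_SECONDS` means fewer writes to InfluxDB at the cost of coarser graphs.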

Creating a dashboard

Once the Omni DaemonSet is up and running in your cluster, it's already sending telemetry to the Influx database.

To create a dashboard, log into your InfluxDB instance and follow these steps:

  1. Click on "Boards" at the left panel and then "Create Dashboard" > "New dashboard".
  2. Click on "Add Cell". This will open the query page.
  3. In the "FROM" box, click on your bucket.
  4. In the "Filter" box, click on "cpu" > "usage_percent".
  5. Select all of your nodes and click on "Submit". You will see your data in the graph.
  6. At the top you can set a name and a style for this cell. You can also fine-tune it further by clicking on "Customize".
  7. Once you are happy with the results, click the ✅ button at the top right.
  8. And there you have it! Your first cell with data from your cluster 🎉

To visualize more information, repeat these steps for disk usage, memory, temperature, etc.
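Under the hood, the cell you build by clicking corresponds to a Flux query. The point-and-click steps above would produce roughly the following (the bucket name is whatever you created earlier; the measurement and field follow step 4):

```flux
from(bucket: "omni")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_percent")
```

You can see and edit the generated query by switching the cell editor to "Script Editor" mode.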

Contributions

Pull requests with improvements and new features are more than welcome.

omni's People

Contributors

mattogodoy


omni's Issues

Omni not passing token?

This is great - I did EXACTLY what you did and set up Prometheus and Grafana and nearly killed my 4-node RPi cluster. When I set everything up, I ended up seeing this error in the omni pod logs:

                return self.request("POST", url,
                  File "/usr/local/lib/python3.8/site-packages/influxdb_client/rest.py", line 250, in request
                    raise ApiException(http_resp=r)
                influxdb_client.rest.ApiException: (401)
                Reason: Unauthorized

There is a corresponding error in influxdb:

              ts=2021-10-13T17:10:18.772914Z lvl=info msg=Unauthorized log_id=0XAaFye0000 error="token required"

I've triple-checked, and tried a couple of different kinds of tokens. Have you seen any common reasons for this? Thanks in advance for any help, if I can end up helping here I'll do so but I'm pretty rusty.

armhf vs arm64

Hey, This is really cool - thanks for all the hard work 😄

Just thought I'd raise this, as it was a small issue I hit.

my hardware:

1 x raspberry pi 4 (8gb)
2 x raspberry pi 3b+ 's (1gb)
7 x raspberry pi compute module 3+'s (1gb)
All running ubuntu server 20.04 LTS 64-bit

On trying to install using your Ansible, i got this error across all nodes:

fatal: [node2]: FAILED! => {"changed": true, "cmd": ["dpkg", "-i", "libseccomp2_2.5.1-1_armhf.deb"], "delta": "0:00:00.237499", "end": "2021-06-18 14:07:28.893279", "msg": "non-zero return code", "rc": 1, "start": "2021-06-18 14:07:28.655780", "stderr": "dpkg: error processing archive libseccomp2_2.5.1-1_armhf.deb (--install):\n package architecture (armhf) does not match system (arm64)\nErrors were encountered while processing:\n libseccomp2_2.5.1-1_armhf.deb", "stderr_lines": ["dpkg: error processing archive libseccomp2_2.5.1-1_armhf.deb (--install):", " package architecture (armhf) does not match system (arm64)", "Errors were encountered while processing:", " libseccomp2_2.5.1-1_armhf.deb"], "stdout": "", "stdout_lines": []}

I just switched the URL to point to an arm64 binary instead and all was fine.
I was going to make the change and open a pull request, but i appreciate you may want to keep the current armhf deb in place for a number of reasons! If not, let me know and I'll get a pull request opened!

For anyone who may come across this issue in future, my fix was updating the ansible-playbook-libs.yaml file to the following:

  - name: Download libseccomp2.deb
    command: "wget http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_arm64.deb"
    become: yes

  - name: Install libseccomp2.deb
    command: "dpkg -i libseccomp2_2.5.1-1_arm64.deb"
    become: yes

Thanks again!

404 on libseccomp2 library wget url

The library url http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb for the wget shown in both the readme and the ansible playbook returns 404 now.

I am figuring out the update and will do a PR, but if anyone else gets this error, I am poking around on the packages webpage here to find the updated URL.

For now, I have 64-bit Ubuntu 21.10 on my RPi 4 cluster, so the latest version was installed anyway and all worked great. Thanks again for this project!

Pod specific statistics - probably a feature request

Hi,

I really like the light-weight nature of Omni. It was simple to install as well. I was like you: I tried to run Prometheus on my 3-Pi k3s cluster, and while it would run for a day or so, it would soon crash no matter how many resources I gave it.

Would it be possible to add pod-specific statistics to Omni? I'd like to be able to make sure that individual pods don't run out of disk space, and also see how much CPU they consume over time. I do have the Kubernetes dashboard installed, but I'd like the flexibility of setting up alerts etc.

Thanks!

P.S. It was awesome hooking up Omni to the InfluxDB free cloud service. I had no idea that existed, and it's great for my small use-case!

Dashboard

Hello, first of all thank you so much. This repo is so great for a small start on monitoring Raspberry Pi's.

I couldn't find the InfluxDB dashboard you have attached in the README anywhere in the repo. Am I missing something, or did you simply not add the dashboard to the project?
