zpool_prometheus's People

Contributors

bmcgough, richardelling, riyad

zpool_prometheus's Issues

uint64 overflowing float64

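A float64 represents integers exactly only up to 2^53, so large uint64 values lose precision when exported. For anything that's a counter, just report the value mod 2^52; Prometheus will treat the wraparound as a counter reset and handle it gracefully within rate() and similar functions.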
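
As a rough illustration of the suggestion above, the following Python sketch wraps a uint64 counter at 2^52 before it is converted to a float. The helper name wrap_counter is hypothetical and is not taken from zpool_prometheus.

```python
# Hypothetical helper, not part of zpool_prometheus: wrap a uint64 counter
# so that it survives conversion to float64 without losing precision.
COUNTER_MODULUS = 1 << 52  # modulus suggested in this issue


def wrap_counter(raw: int) -> int:
    """Return the counter modulo 2^52, which float64 can represent exactly."""
    return raw % COUNTER_MODULUS


if __name__ == "__main__":
    raw = (1 << 64) - 1  # largest uint64 value
    # Without wrapping, the float64 round trip lands on a different integer.
    print(int(float(raw)) == raw)          # False
    wrapped = wrap_counter(raw)
    # After wrapping, the float64 round trip is exact.
    print(int(float(wrapped)) == wrapped)  # True
```

The wrapped value drops back toward zero each time the raw counter crosses a multiple of 2^52, which is the decrease that rate() interprets as a counter reset.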

zpool health state as value

We have adopted zpool_prometheus for use on one of our clusters. We use Prometheus and Grafana and already have some detailed dashboards.

I would very much like to be able to alert on health state and I cannot find a way to do this with the health state information in a label. I can make some useful graphs with the multistat plugin, grouped by label values, but that panel doesn't seem to support alerting.

Other ZFS exporters map the health states to numeric values like this:

0 ONLINE
1 DEGRADED
2 FAULTED
3 OFFLINE
4 UNAVAIL
5 REMOVED
6 AVAIL
7 INUSE
-1 no data/timeout

Is this something you would consider adding? No existing metrics would change, it would just be one additional metric per vdev.

Perhaps I am missing a way to do this in Grafana?
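
To make the request concrete, here is a minimal Python sketch of the mapping listed in this issue. The metric name zpool_vdev_health_state and both function names are hypothetical; only the state-to-number table and the name/vdev labels come from the text and sample output on this page.

```python
# Numeric values as listed in this issue; -1 stands for "no data/timeout".
HEALTH_STATE_VALUES = {
    "ONLINE": 0,
    "DEGRADED": 1,
    "FAULTED": 2,
    "OFFLINE": 3,
    "UNAVAIL": 4,
    "REMOVED": 5,
    "AVAIL": 6,
    "INUSE": 7,
}
NO_DATA = -1


def health_state_value(state: str) -> int:
    """Map a health state string to its numeric value, or -1 if unknown."""
    return HEALTH_STATE_VALUES.get(state.upper(), NO_DATA)


def format_health_metric(pool: str, vdev: str, state: str) -> str:
    """Format one sample line; the metric name is an assumption."""
    value = health_state_value(state)
    return f'zpool_vdev_health_state{{name="{pool}",vdev="{vdev}"}} {value}'


if __name__ == "__main__":
    print(format_health_metric("tank", "root", "DEGRADED"))
```

With a numeric metric of this shape, an alert could compare the value against 0 instead of matching on label text.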

Runtime error on Alpine Linux

When building and running from the alpine:latest Docker image, the following error occurs:

zpool_latency_vdev_scrub_histo_seconds_bucket{name="tank",vdev="root",le="+Inf"} 14168517
zpool_latency_vdev_scrub_histo_seconds_sum{name="tank",vdev="root"} 0
zpool_latency_vdev_scrub_histo_seconds_count{name="tank",vdev="root"} 14168517
error: can't get vdev_trim_histo

I do not have any SSDs attached to this machine, so perhaps that's why it isn't working. When I remove the TRIM-related lines, the application runs to completion.
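
One possible way to handle this, sketched in Python rather than in the exporter's own code, is to treat the TRIM histogram as optional and skip it when the lookup fails. The dictionary model and the key vdev_scrub_histo are assumptions made for illustration; only vdev_trim_histo appears in the error message above.

```python
# Hypothetical model: per-vdev statistics as a dict keyed by histogram name.
REQUIRED_HISTOGRAMS = ["vdev_scrub_histo"]  # assumed key, for illustration only
OPTIONAL_HISTOGRAMS = ["vdev_trim_histo"]   # missing in the report above


def select_histograms(stats: dict) -> list:
    """Return the histogram keys to print, skipping absent optional ones."""
    selected = []
    for key in REQUIRED_HISTOGRAMS:
        if key not in stats:
            # A missing required histogram is still a hard error.
            raise KeyError(f"can't get {key}")
        selected.append(key)
    for key in OPTIONAL_HISTOGRAMS:
        if key in stats:
            selected.append(key)
    return selected


if __name__ == "__main__":
    # Stats without TRIM data: the optional histogram is skipped silently.
    print(select_histograms({"vdev_scrub_histo": [0, 1, 2]}))
```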

print stats on spares and caches

Currently, zpool_prometheus follows the vdev children, which represent the currently active devices in pools. Auxiliary devices, such as spares and caches, are not shown unless or until the spare is activated in a pool.

Questions:

  1. Do we care about information on spares that are not active?
  2. How do we reconcile duplicate spare devices shared amongst pools?
  3. Do we care about information on caches?

second HELP line for metric name

I'm writing the output of zpool_prometheus to a file and reading that file with node_exporter (version 0.18.1). Parsing of the file fails with the following error:

May 21 08:18:22 pct-hanas-1.mines.edu node_exporter[123282]: time="2020-05-21T08:18:22-06:00" level=error msg="Error parsing "/var/lib/node_exporter/zpool_prometheus.prom": text format parsing error in line 41: second HELP line for metric name "zpool_stats_state"" source="textfile.go:211"

I've done a little research and found a similar bug in another project, where it was stated that the rules on HELP lines may have changed and that there should be only one HELP line per metric name. I manually removed the HELP line in question and confirmed that parsing then moves forward, but I have a large setup with many vdevs, so I haven't taken the time to remove all the extra HELP lines by hand. I am looking at the code to see if I can figure out how to change it to meet the needs of the new node_exporter versions.
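
Until the exporter itself is changed, a stopgap could be to filter the file before node_exporter reads it. The Python sketch below keeps the first HELP and TYPE line for each metric name and drops later repeats. The function name and the sample lines are made up for illustration, and the sketch does nothing beyond removing the repeated lines.

```python
def drop_duplicate_metadata(lines):
    """Keep only the first '# HELP' and '# TYPE' line per metric name."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split(None, 3)
        is_metadata = (
            len(parts) >= 3 and parts[0] == "#" and parts[1] in ("HELP", "TYPE")
        )
        if is_metadata:
            key = (parts[1], parts[2])  # e.g. ("HELP", "zpool_stats_state")
            if key in seen:
                continue  # repeated metadata line: drop it
            seen.add(key)
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up sample input, not real exporter output.
    sample = [
        "# HELP zpool_stats_state example help text",
        'zpool_stats_state{name="tank",vdev="root"} 1',
        "# HELP zpool_stats_state example help text",
        'zpool_stats_state{name="tank",vdev="root/mirror-0"} 1',
    ]
    print("\n".join(drop_duplicate_metadata(sample)))
```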
