
netapp / harvest


Open-metrics endpoint for ONTAP and StorageGRID

Home Page: https://netapp.github.io/harvest/latest

License: Apache License 2.0

Makefile 0.48% Shell 0.98% Go 97.66% Dockerfile 0.09% Python 0.61% CUE 0.12% JavaScript 0.07%
netapp-public prometheus monitoring grafana-dashboard observability storagegrid go

harvest's Introduction

What is NetApp Harvest?

Harvest is the open-metrics endpoint for ONTAP and StorageGRID

NetApp Harvest brings observability to ONTAP and StorageGRID clusters. Harvest collects performance, capacity and hardware metrics from ONTAP and StorageGRID, transforms them, and routes them to your choice of time-series database.

The included Grafana dashboards deliver the datacenter insights you need, while new metrics can be collected with a few edits of the included template files.

Harvest is open-source, built with Go, released under an Apache2 license, and offers great flexibility in how you collect, augment, and export your datacenter metrics.

To get started, follow our quickstart guide or install Harvest.

Community

There is a vibrant community of Harvest users on Discord and GitHub discussions. Come join! 👋

Documentation

📕 https://netapp.github.io/harvest/

Videos


Developed with 💙 by NetApp - Privacy Policy

harvest's People

Contributors

7840vz, burkl, cdurai-netapp, cgrinds, chrishenzie, dependabot[bot], george-strother, github-actions[bot], hardikl, mrydeen, rahulguptajss, renovate[bot], samyuktham, schmots1, sridevimm, vgratian


harvest's Issues

Build release artifacts for other OSes

  • Alpine Linux for small containers - see relevant stackoverflow
  • Mac (darwin) - several folks have asked for this
  • Evaluate GoReleaser

While Harvest runs on the Mac (several of us develop/run on Macs), we aren't packaging pre-compiled binaries for it yet.

Building is easy with

GOOS=darwin make

You'll also want to edit your harvest.yml and add a cluster, since the unix poller doesn't work on the Mac.

harvest stop does not stop pollers that have been renamed

Steps to reproduce - will add

  1. Edit harvest.yml and add/enable one poller, call it foo
  2. Verify not running
bin/harvest status
Datacenter            Poller                PID        PromPort        Status
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
nane                  foo                                              not running
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
  3. Start poller
bin/harvest start
Datacenter            Poller                PID        PromPort        Status
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
nane                  foo                   5828                       running
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
  4. Edit harvest.yml and change the name of foo to foo2
  5. Status fails because the "wrong" poller is queried
bin/harvest status
Datacenter            Poller                PID        PromPort        Status
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
nane                  foo2                                             not running
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++

If you run harvest start, a new poller named foo2 is created while the first poller is still running:

ps aux | grep poller
root      5828  3.0  0.0 2795344 76752 ?       Sl   11:17   0:04 bin/poller --poller foo --loglevel 2 --promPort  --daemon
root      5912 49.8  0.0 2869588 97988 ?       Sl   11:19   0:02 bin/poller --poller foo2 --loglevel 2 --promPort  --daemon

start/stop/status should be more resilient to name changes. In a few places, we already interrogate /proc, extract command-line arguments, and parse them. We should do the same for stop and status. In other words, stop and status should not depend on the names in harvest.yml; instead they should query the OS.
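
A rough sketch (not Harvest's actual implementation) of discovering running pollers by reading /proc/<pid>/cmdline instead of trusting the names in harvest.yml:

package main

import (
    "bytes"
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// findPollers returns a map of poller name -> PID for every process whose
// command line looks like "bin/poller --poller <name> ...".
func findPollers() map[string]string {
    pollers := make(map[string]string)
    procs, _ := filepath.Glob("/proc/[0-9]*/cmdline")
    for _, p := range procs {
        raw, err := os.ReadFile(p)
        if err != nil {
            continue
        }
        // cmdline arguments are NUL-separated
        args := strings.Split(string(bytes.TrimRight(raw, "\x00")), "\x00")
        for i, a := range args {
            if a == "--poller" && i+1 < len(args) {
                pid := strings.Split(p, "/")[2] // "/proc/<pid>/cmdline"
                pollers[args[i+1]] = pid
            }
        }
    }
    return pollers
}

func main() {
    for name, pid := range findPollers() {
        fmt.Printf("poller=%s pid=%s\n", name, pid)
    }
}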

Support global_prefix on Grafana import

Is your feature request related to a problem? Please describe.
Support the exporter global_prefix during Grafana dashboard import. If a prefix is defined today, all queries on the dashboards have to be updated manually.

Describe the solution you'd like
While importing the dashboards via the Grafana tool, the exporter section should be checked for global_prefix entries.

Describe alternatives you've considered
Add a command-line parameter (e.g. --db-prefix) to the Grafana tool, similar to the --datasource parameter, for naming the prefix and updating the queries in the dashboards.
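
A rough illustration of what the Grafana tool could do when importing. Real PromQL rewriting would need a parser; this regex-based version only sketches the idea, and the metric-name pattern is an assumption:

package main

import (
    "fmt"
    "regexp"
)

// metricRe is a simplified, assumed pattern for Harvest metric families.
var metricRe = regexp.MustCompile(`\b(volume_|node_|aggr_|cluster_)`)

// applyPrefix prepends the exporter's global_prefix to metric names in a query.
func applyPrefix(expr, prefix string) string {
    return metricRe.ReplaceAllString(expr, prefix+"$1")
}

func main() {
    fmt.Println(applyPrefix(`sum(volume_read_data{datacenter="$Datacenter"})`, "netapp_"))
    // prints: sum(netapp_volume_read_data{datacenter="$Datacenter"})
}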

Harvest stops working after reboot on CentOS / RHEL (tmpfilesd configuration missing)

Describe the bug
After a reboot, Harvest pollers do not start.

Environment

  • Harvest: harvest version 21.05.1-1 (commit 2211c00) (build date 2021-05-21T01:28:12+0530) linux/amd64
  • OS: RHEL 7.9
  • Install method: yum

To Reproduce

  1. yum install harvest.rpm
  2. configure poller
  3. systemctl enable harvest
  4. restart

Expected behavior
Harvest pollers should run after a restart.

Actual behavior
Harvest pollers don't start because /var/run/harvest is missing and cannot be recreated by Harvest itself.

error mkdir [/var/run/harvest/]: mkdir /var/run/harvest/: permission denied

Possible solution, workaround, fix
The RPM creates the directory /var/run/harvest for the PID files with mkdir. The directory /var/run in RHEL/CentOS 7 is managed by systemd-tmpfiles and recreated from scratch during reboot. You need to include a tmpfiles.d configuration for the directory.

Example:
cat /usr/lib/tmpfiles.d/harvest.conf
d /var/run/harvest 0755 harvest harvest -

Additional context

Status label on cluster_status metric disappears

Describe the bug
The status label on the cluster_status metric disappears roughly 15 minutes after starting Harvest.

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Harvest version: harvest version 21.05.1-1 (commit 2211c00) (build date 2021-05-21T01:28:12+0530) linux/amd64
  • Command line arguments used: [e.g. bin/harvest start --config=foo.yml --collectors Zapi]
  • OS: RHEL 7.9
  • Install method: yum
  • ONTAP Version: 9.7
  • Other:

To Reproduce
curl -s localhost:12990/metrics | grep cluster_status

Expected behavior
Should report cluster_status metric as
cluster_status{cluster="clustername",datacenter="dc",status="ok"} 1

Actual behavior
Initially the correct metric is reported; after roughly 15 minutes it turns into:
cluster_status{cluster="clustername",datacenter="dc"} 1

Possible solution, workaround, fix
None

Performance metrics don't display volume name

When polling performance metrics, the output looks like the following:
volume_write_latency{datacenter="lod",cluster="cluster1",node="cluster1-01",svm="cluster1-01",aggr="aggr0",type="flexvol"} 79.36234458259325
=> volume name is missing

whereas other metrics display the following:
volume_size_available{datacenter="lod",cluster="cluster1",volume="trident_qtree_pool_nas2_LRHCOABWFT",node="cluster1-01",svm="nfs_svm",aggr="aggr1",style="flexvol"} 5367836672
=> volume name is present

Is that expected behavior?

Environment configuration:

  • NetApp Lab on Demand "Using Trident with Kubernetes and ONTAP v4.0"
  • host: RHEL 7.5
  • installation method: yum install harvest-21.05.1-1.x86_64.rpm
  • Harvest collectors: zapi & zapiperf
  • ONTAP 9.7

Allow TLS server verification for basic auth

Describe the bug
Pollers won't start with this combination of options:

use_insecure_tls: false
auth_style: basic_auth
username: user
password: pass

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Harvest version: harvest version 21.05.1-1 (commit 2211c00) (build date 2021-05-21T01:28:12+0530) linux/amd64
  • OS: RHEL 7.9
  • Install method: yum
  • ONTAP Version: 9.7P12

To Reproduce
Set use_insecure_tls to false and use basic authentication with username & password

Expected behavior
Pollers should start; the server certificate in ONTAP is valid.

Actual behavior
Pollers won't start.

2021/05/25 09:40:40  (warning) (poller) (hostname): init collector-object (Zapi:Node): connection error => invalid parameter => use_insecure_tls is false, but no certificates
2021/05/25 09:40:40  (warning) (poller) (hostname): aborting collector (Zapi)
2021/05/25 09:40:40  (warning) (poller) (hostname): init collector-object (ZapiPerf:SystemNode): connection error => invalid parameter => use_insecure_tls is false, but no certificates
2021/05/25 09:40:40  (warning) (poller) (hostname): aborting collector (ZapiPerf)
2021/05/25 09:40:40  (warning) (poller) (hostname): no collectors initialized, stopping
2021/05/25 09:40:40  (info   ) (poller) (hostname): cleaning up and stopping [pid=22027]

Possible solution, workaround, fix
Fix client.go logic to allow certificate verification with basic auth.
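
For reference, a minimal sketch (not Harvest's client.go) of an HTTPS client that keeps server certificate verification enabled while authenticating with basic auth; the URL and credentials are placeholders:

package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
)

func main() {
    client := &http.Client{
        Transport: &http.Transport{
            // use_insecure_tls: false -> keep certificate verification on
            TLSClientConfig: &tls.Config{InsecureSkipVerify: false},
        },
    }
    req, _ := http.NewRequest("GET", "https://cluster.example.com/", nil)
    req.SetBasicAuth("user", "pass") // basic_auth credentials from harvest.yml
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}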

Add Universal Collector to collect metrics from files or HTTP endpoint

Is your feature request related to a problem? Please describe.
Harvest 1.6 supported running extensions (scripts in any language) to easily collect custom metrics. One example is collecting NFS mount information from the ONTAP CLI, but extensions were able to collect any metric from any source. This functionality is missing in Harvest 2.0. Many users already have scripts to collect custom metrics which they cannot easily integrate with Harvest. Currently the only solution is to rewrite those scripts in Go and create a Harvest collector.

Describe the solution you'd like
The idea was suggested by @georgmey. Create a collector that collects metrics in the open-metrics format from files or a remote HTTP endpoint. This would be very similar to Prometheus' node exporter.

Describe alternatives you've considered
Implementing the same extension framework that we had in Harvest 1.6 would require too much work. Moreover, running custom scripts as part of the Harvest runtime would raise considerable security concerns (one of the reasons we discontinued 1.6).

Additional context
The general workflow of the Universal collector:

  • The user has a custom script or program that generates metrics and writes them either to a file or exposes them on an HTTP endpoint.
  • The Universal Collector collects those metrics and parses them into the Matrix.
  • The user can choose which databases to export these metrics to (as usual).

The Universal Collector will only be a good fit for a small number of metrics. It would not be a good solution for large volumes of metrics where high performance is critical.
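
A minimal sketch of the collection side, assuming the custom script exposes metrics in the Prometheus/open-metrics text format; the endpoint URL and the line-based parsing are illustrative only:

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strconv"
    "strings"
)

// scrape reads one sample per line ("name{labels} value"), skipping
// HELP/TYPE comments, and returns a map of raw series -> value.
func scrape(url string) (map[string]float64, error) {
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    metrics := make(map[string]float64)
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := strings.TrimSpace(scanner.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        idx := strings.LastIndex(line, " ")
        if idx < 0 {
            continue
        }
        value, err := strconv.ParseFloat(line[idx+1:], 64)
        if err != nil {
            continue
        }
        metrics[line[:idx]] = value
    }
    return metrics, scanner.Err()
}

func main() {
    m, err := scrape("http://localhost:9100/metrics") // hypothetical endpoint
    if err != nil {
        fmt.Println(err)
        return
    }
    for series, value := range m {
        fmt.Println(series, value)
    }
}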

Prometheus exporter: incomplete metatags

Describe the bug

  • Instance labels (when exported as a separate metric) and status metrics do not get the "HELP" and "TYPE" metatags.

This was reported by a customer; I will add more details when I get them from the customer and/or reproduce the issue myself.

Shelf metrics appear to only collect metrics for one shelf

Describe the bug
Shelf metrics appear to only be collected for one shelf

Environment
Provide accurate information about the environment to help us reproduce the issue.

  • Harvest version: harvest version 21.05.1-1 (commit 2211c00) (build date 2021-05-21T01:28:12+0530) linux/amd64
  • Command line arguments used: [e.g. bin/harvest start --config=foo.yml --collectors Zapi]
  • OS: RHEL 7.9
  • Install method: yum
  • ONTAP Version: 9.7
  • Other:

To Reproduce
N/A

Expected behavior
Metrics such as shelf_temperature_reading and shelf_psu_power_drawn should be reported for each shelf id.

Actual behavior
One time series is returned per cluster, for what appears to be a randomly chosen shelf id.

Possible solution, workaround, fix
N/A

Additional context
Appears to be related to the Shelf plugin used at /conf/zapi/cdot/9.8.0/shelf.yaml; shelf_labels appears to correctly collect data for each shelf.

InfluxDB exporter should support URL end-point

Is your feature request related to a problem? Please describe.
From Slack:

Steve S  3:19 PM
for the influxdb exporter is it possible to just pass a URL?

Chris Grindstaff  3:21 PM
@Steve S can you give an example of what you want? you mean instead of decomposing the url into an addr and port?

Steve S  3:25 PM
correct - we have a large-scale configuration behind a load balancer (so we're not hitting a single host).

Chris Grindstaff  3:28 PM
make sense - I'll create an issue to track - if you're interested this is the line of code to change if you want to give it a try

Describe the solution you'd like
Allow the exporter to use a URL instead of a separate address and port.
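
A hedged sketch of the requested behavior: accept a full url in the exporter config and fall back to composing one from addr and port (the /write?db= form is the InfluxDB 1.x write API; the names here are illustrative, not Harvest's config keys):

package main

import (
    "fmt"
    "net/url"
)

// influxEndpoint prefers a full URL (e.g. a load-balancer front end) and
// otherwise builds one from addr, port, and database name.
func influxEndpoint(rawURL, addr, port, db string) (string, error) {
    if rawURL != "" {
        u, err := url.Parse(rawURL)
        if err != nil {
            return "", err
        }
        return u.String(), nil
    }
    return fmt.Sprintf("http://%s:%s/write?db=%s", addr, port, db), nil
}

func main() {
    e, _ := influxEndpoint("https://influx-lb.example.com/write?db=harvest", "", "", "")
    fmt.Println(e)
    e, _ = influxEndpoint("", "localhost", "8086", "harvest")
    fmt.Println(e)
}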

Add more metrics to Grafana dashboards

The current set of Prometheus dashboards displays a small fraction of the metrics Harvest collects.
We should consider:

  • culling metrics we don't plan on ever displaying
  • adding useful ones to existing dashboards
  • making it easy for customers to decide on their own

Improve Grafana tool

Needs more tests.

Feature request: Make datasource a variable.
See

  -v, --variable            use datasource as variable, overrides: --datasource

Unified Manager (AIQUM) as data source for clusters

Is your feature request related to a problem? Please describe.
With more than 100 clusters, it is a lot of work to keep harvest.conf updated.

Describe the solution you'd like
Harvest connects to AIQUM to get a list of all clusters (maybe with the possibility to filter by annotations). Afterwards it uses this list to connect to every cluster and gather all the performance and health information.
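
A rough sketch of the idea; the AIQUM endpoint path and response field names below are assumptions about the AIQUM REST API, not verified:

package main

import (
    "crypto/tls"
    "encoding/json"
    "fmt"
    "net/http"
)

type clusterList struct {
    Records []struct {
        Name         string `json:"name"`
        ManagementIP string `json:"management_ip"` // assumed field name
    } `json:"records"`
}

func main() {
    client := &http.Client{Transport: &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // lab only
    }}
    // Assumed endpoint; adjust to the AIQUM API version in use.
    req, _ := http.NewRequest("GET", "https://aiqum.example.com/api/datacenter/cluster/clusters", nil)
    req.SetBasicAuth("admin", "password")
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()
    var clusters clusterList
    _ = json.NewDecoder(resp.Body).Decode(&clusters)
    // Print what a generated Pollers section could use.
    for _, c := range clusters.Records {
        fmt.Printf("%s: addr: %s\n", c.Name, c.ManagementIP)
    }
}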

"systemctl status harvest" not accurate

Hi

My Harvest environment seems to be working; however, the systemctl status harvest command does not show my poller as running, even though it works...

Some details about the environment:

 systemctl status harvest
โ— harvest.service - Harvest
   Loaded: loaded (/etc/systemd/system/harvest.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-05-21 08:06:19 UTC; 5min ago
  Process: 8875 ExecStart=/opt/harvest/bin/harvest restart --config /opt/harvest/harvest.yml (code=exited, status=0/SUCCESS)
 Main PID: 8882 (poller)
    Tasks: 8
   Memory: 31.2M
   CGroup: /system.slice/harvest.service
           └─8882 bin/poller --poller cluster1 --loglevel 2 --config /opt/harvest/harvest.yml --daemon

May 21 08:06:19 rhel6 systemd[1]: Starting Harvest...
May 21 08:06:19 rhel6 harvest[8875]: Datacenter            Poller                PID        PromPort        Status
May 21 08:06:19 rhel6 harvest[8875]: +++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
May 21 08:06:19 rhel6 harvest[8875]: lod                   cluster1                                         not running
May 21 08:06:19 rhel6 systemd[1]: Started Harvest.

Polling Harvest:

curl localhost:31000/metrics | grep volume_write_data
volume_write_data{datacenter="lod",cluster="cluster1",node="cluster1-01",svm="nfs_svm",aggr="aggr1",type="flexvol"} 0
volume_write_data{datacenter="lod",cluster="cluster1",node="cluster1-01",svm="nfs_svm",aggr="aggr1",type="flexvol"} 282.42660719938857
volume_write_data{datacenter="lod",cluster="cluster1",node="cluster1-01",svm="cluster1-01",aggr="aggr0",type="flexvol"} 11622.309120154781

my Harvest configuration

$ more harvest.yml
Exporters:
  Harvest:
    exporter: prometheus
    port: 31000
    addr: 0.0.0.0

Defaults:
  collectors:
    - Zapi
    - ZapiPerf
  exporters:
    - Harvest
  use_insecure_tls: true

Pollers:
  cluster1:
    datacenter: lod
    addr: 192.168.0.101
    auth_style: basic_auth
    username: admin
    password: Netapp1!

Environment configuration:

  • NetApp Lab on Demand "Using Trident with Kubernetes and ONTAP v4.0"
  • host: RHEL 7.5
  • installation method: yum install harvest-21.05.1-1.x86_64.rpm

Adding promPort to pollers section in harvest.conf

Is your feature request related to a problem? Please describe.
We want to have only one exporter and define the Prometheus exporter ports in the individual pollers.

Describe the solution you'd like
Add the parameter promPort to the Poller section.

Additional context
Example conf file

Exporters:
  harvest:
    exporter: prometheus
    addr: 0.0.0.0
    global_prefix: netapp_
    master: True

Defaults:
  collectors:
    - Zapi
    - ZapiPerf
  exporters:
    - harvest
  use_insecure_tls: true
  auth_style: basic_auth
  username: <user>
  password: <password>

Pollers:
  clusterA:
    datacenter: DC1
    addr: clusterA
    promPort: 25000
  clusterB:
    datacenter: DC2
    addr: clusterB
    promPort: 25001

Improve config tool

The config tool helps configure pollers that monitor ONTAP clusters.

The tool should:

  • validate the Harvest configuration file (harvest.yml)

Implemented, but needs more testing:

  • create a client certificate on the local system
  • create a read-only harvest user for ONTAP
  • install client certificate on ONTAP

Known issues

  • client certificate is re-generated each time

We may want to integrate this tool with the Zapi tool.

Create harvest doctor to validate customer environments

Similar to brew doctor and a support bundle: collect information, validate it, and present it to the customer.
Compress and offer to share.

Ideas:

  • validate yaml
  • verify harvest is or is not running
  • verify that Prometheus is or is not reachable
  • verify permissions
  • verify credentials
  • hit the Prometheus endpoint(s) and show results
  • check versions of Prometheus, Grafana, InfluxDB, OS, and embedded release/commit/build date
  • validate /var/run/harvest permissions

Extract what's reasonable from the Troubleshooting Harvest guide and automate and/or include it.
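
One possible doctor check, sketched with gopkg.in/yaml.v3: confirm that harvest.yml parses and that every poller has an addr. The struct fields are a simplified assumption of the real config schema:

package main

import (
    "fmt"
    "os"

    "gopkg.in/yaml.v3"
)

type config struct {
    Pollers map[string]struct {
        Datacenter string `yaml:"datacenter"`
        Addr       string `yaml:"addr"`
    } `yaml:"Pollers"`
}

func main() {
    data, err := os.ReadFile("harvest.yml")
    if err != nil {
        fmt.Println("cannot read harvest.yml:", err)
        return
    }
    var c config
    if err := yaml.Unmarshal(data, &c); err != nil {
        fmt.Println("harvest.yml is not valid YAML:", err)
        return
    }
    for name, p := range c.Pollers {
        if p.Addr == "" {
            fmt.Printf("poller %q is missing addr\n", name)
        }
    }
    fmt.Printf("checked %d pollers\n", len(c.Pollers))
}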

Add Prometheus service discovery

Harvest should use one of Prometheus's existing service discovery options to make it easier for customers to add hundreds of pollers to Harvest without having to specify the exact port for each poller and each Prometheus target.

Reference

https://prometheus.io/docs/guides/file-sd/
https://prometheus.io/blog/2018/07/05/implementing-custom-sd/
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dns_sd_config
https://yetiops.net/posts/prometheus-srv-discovery/
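
A hedged sketch of the file_sd direction: Harvest could write a Prometheus file_sd targets file for its pollers, so Prometheus discovers them without hand-maintained scrape targets. The poller names and ports below are examples, not read from harvest.yml:

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// targetGroup matches the Prometheus file_sd JSON format.
type targetGroup struct {
    Targets []string          `json:"targets"`
    Labels  map[string]string `json:"labels"`
}

func main() {
    groups := []targetGroup{
        {Targets: []string{"localhost:12990"}, Labels: map[string]string{"poller": "cluster1", "datacenter": "dc1"}},
        {Targets: []string{"localhost:12991"}, Labels: map[string]string{"poller": "cluster2", "datacenter": "dc2"}},
    }
    out, err := json.MarshalIndent(groups, "", "  ")
    if err != nil {
        fmt.Println(err)
        return
    }
    // Point a file_sd_configs entry in prometheus.yml at this file.
    if err := os.WriteFile("harvest_targets.json", out, 0644); err != nil {
        fmt.Println(err)
    }
}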

version flag is missing new line on some shells

$ bin/harvest --version
harvest version 21.05.1919 darwin/amd6 $
---------------------------------------^  cursor is left here, should be on next line

This happens in bash, does not happen in zsh or fish.

Improve ZAPI tool

The Zapi tool (harvest zapi) retrieves available counters and metadata from ONTAP (CDOT or 7mode). The tool should enable you to configure Harvest to collect any ONTAP metric.

Currently the tool is able to

  • retrieve ZAPI attributes and ZAPIPERF objects and counters
  • export ZAPIPERF objects into Harvest .yaml templates

Issues:

  • ugly output
  • exported templates not properly tested

NAbox compatibility

As Alpine Linux doesn't use glibc, Harvest 2 doesn't currently work on it, which prevents integration into NAbox 3.

This can be solved either by providing an Alpine build in the pipeline (or an apk package), or by compiling Harvest statically.

Sorry @vgratian, I totally forgot about that one!

Thoughts?

Harvest Plugins

Harvest v21.05.01 has limited support for dynamically linked Go plugins built using buildmode=plugin. Unfortunately Go's support for dynamic plugins is weak and comes with significant drawbacks.

Basically, Go plugins were not designed as a way for other people to extend your app. They were designed for you to extend your app.

Before outlining the pros and cons of Go's plugins, let's explore why we want them.

Why do we want Harvest plugins?

  1. Plugins allow 3rd parties to extend Harvest without (re)building it. You program to Harvest's API and, at runtime, Harvest dynamically loads your code into its process and calls it.

  2. You only "pay" for what you use. If you don't use a feature from plugin A, you don't pay for it in disk footprint, you don't load the code into memory, etc. This is less important than #1.

Plugins, as a concept, are great. They allow us to build a loosely-coupled modular system. Harvest's current implementation doesn't address 1 or 2, and arguably introduces more problems than it solves.

Cons of Go Plugins

  1. Plugins and Harvest must be compiled with the exact same version of Go.
  2. Plugins and Harvest must be compiled with the same GOPATH.
  3. Any packages imported by both Harvest and the plugin must be the exact same version.
  4. Plugins and Harvest can NOT vendor dependencies. If either has vendored dependencies, Harvest won't work.
  5. Debuggers don't work with dynamic code - this means you can't use a debugger with Harvest right now because the interesting parts of Harvest are implemented as Go plugins.
  6. Makes cross compiling harder or impossible (see Windows, Alpine, etc.)
  7. Creates ~3x larger executables - with buildmode=plugin the bin directory is 140 MB, without buildmode=plugin it is 48 MB
  8. ~7x slower builds
# with buildmode=plugin
$ make clean
$ time GOOS=darwin make build
Executed in   24.94 secs

# without buildmode=plugin
$ make clean
$ time GOOS=darwin make build
Executed in    3.39 secs

Experience of others

We're not the only team to hit issues with Go plugins. I haven't found a project that recommends them.

Traefik

Traefik tried and abandoned plugins due to development pain

traefik/traefik#1336 (comment)
traefik/traefik#1336 (comment)

OK, bad news...
We can't load an external plugin if Traefik is built with CGO_ENABLED=0, and we really need this to build a statically linked golang executable to run in a Docker container.
golang/go#19569 and there are no plan on this golang/go#19569 (comment)

If you compile traefik binary on your laptop, and a plugin in docker on your laptop, it does not work either: Error loading plugin: error opening plugin: plugin.Open: plugin was built with a different version of package

Ultimately they had so many problems they abandoned Go plugins and built a Go interpreter and use that instead.

Telegraf

Telegraf added Go plugin support, but they consider it experimental with limited support, and it still requires a custom build of Telegraf. Their issue tracker has the usual build and version issues everyone else hits.

influxdata/telegraf#7162

Prometheus and VictoriaMetrics

Not sure if they rejected or never tried. Probably rejected.

Caddy

Not sure if Caddy learned from others and rejected Go plugins or went a different way from the beginning. The way you extend Caddy is by building a custom version yourself with side-effect-loading init functions: no dynamic linking; you edit Caddy's main and include your imports.

Options for Harvest

  1. Remove buildmode=plugin code. If folks want to extend Harvest with plugins they clone, add their code, and build their own version of Harvest.

  2. Add an approach similar to Caddy and Benthos - build your plugin and add an import to Harvest's main. Benthos example

  3. Keep what we have - not great given the downsides: no debugger, bigger executables, no cross compile.

  4. Add some sort of exec model where we can call any process and read/write to stdout/stdin. Not great from a performance or security point of view. A poor man's RPC.

  5. RPC - something like Hashicorp's. Performance may be a concern here too - the trick with both of these last two is to avoid too many trips across the RPC layer.

Recommendation

We should go with #1 & #2. Keep the plugin concept in Harvest, but don't implement it with Go plugins. First we remove buildmode=plugin (already done) and then we work on #2 using an architecture similar to Caddy's.

Until we implement #2, if you want to extend Harvest, you extend it the same way you would most open-source projects: clone, make your changes, keep your fork up to date.
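
A self-contained sketch of the option #2 registration pattern; names like Plugin, Register, and registry are illustrative, not Harvest's API:

package main

import "fmt"

// Plugin is the interface a compiled-in extension implements.
type Plugin interface {
    Name() string
    Run() error
}

var registry = map[string]Plugin{}

// Register is called from a plugin's init(); adding a plugin is just an
// import line in main, with no buildmode=plugin or dynamic linking involved.
func Register(p Plugin) { registry[p.Name()] = p }

type helloPlugin struct{}

func (helloPlugin) Name() string { return "hello" }
func (helloPlugin) Run() error   { fmt.Println("hello from a compiled-in plugin"); return nil }

func init() { Register(helloPlugin{}) }

func main() {
    for name, p := range registry {
        fmt.Println("running", name)
        _ = p.Run()
    }
}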

Resources

Release 21.05.0

  • Cherry pick remaining changes
  • Use Jenkins to create release artifacts
  • Update changelog for 21.05
  • Back fill rc2 and rc1 sections of changelog
  • Create NOTICE file for 21.05
  • Commit, push, create a new release pull request - use the same one all the way through to the final release
  • Use GitHub Create Release to publish
  • Announce

Replace current Harvest logging with Zerolog

Current logging is pretty basic. We should be using a standard framework to solve this.

logrus is currently in maintenance mode. That leaves Zerolog and zap. Zerolog is simpler and integrates well with lumberjack.
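
A hedged sketch of what Zerolog plus lumberjack rotation could look like; the log path and field names are illustrative, not Harvest's final setup:

package main

import (
    "github.com/rs/zerolog"
    "gopkg.in/natefinch/lumberjack.v2"
)

func main() {
    // Rotating file writer (hypothetical path and limits).
    writer := &lumberjack.Logger{
        Filename:   "/var/log/harvest/poller.log",
        MaxSize:    5, // megabytes before rotation
        MaxBackups: 3,
        MaxAge:     7, // days
    }
    // Structured logger with a timestamp and a per-poller field.
    logger := zerolog.New(writer).With().Timestamp().Str("poller", "cluster1").Logger()
    logger.Info().Msg("poller started")
    logger.Warn().Str("collector", "Zapi").Msg("connection error")
}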

Volume dashboard does not display volume names

Kicking the tires on 2.0. The NetApp Detail: Volume dashboard does not display volume names on any of the charts, except the "volumes in cluster" table. All other charts on the dashboard display data but no volume labels.

(Screenshot of the dashboard attached in the original issue.)

Add REST support to Harvest

ZAPIs are being deprecated in ONTAP.
From the Slack forum:

ZAPIs will still be available in the version of ONTAP released around Oct 2022.
That version of ONTAP will have support till 2025.
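
For illustration, a sketch of polling ONTAP's REST API instead of ZAPI. The /api/storage/volumes endpoint exists in ONTAP 9.6+, but the field selection, auth, and TLS handling here are simplified assumptions:

package main

import (
    "crypto/tls"
    "encoding/json"
    "fmt"
    "net/http"
)

type volumeList struct {
    Records []struct {
        Name string `json:"name"`
        Size struct {
            Available int64 `json:"available"`
        } `json:"size"`
    } `json:"records"`
}

func main() {
    client := &http.Client{Transport: &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // lab only
    }}
    req, _ := http.NewRequest("GET", "https://cluster.example.com/api/storage/volumes?fields=name,size", nil)
    req.SetBasicAuth("admin", "password")
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()
    var vols volumeList
    _ = json.NewDecoder(resp.Body).Decode(&vols)
    for _, v := range vols.Records {
        fmt.Printf("volume=%s size_available=%d\n", v.Name, v.Size.Available)
    }
}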

Disk serial number and is-failed are missing from cdot query

Describe the bug
Disk serial number was in rc2, but isn't in 21.05.

Environment

  • Harvest version: harvest version 21.05.2512-v21.05.0 (commit fc433fe) (build date 2021-05-25T12:21:55-0400) linux/amd64
  • Command line arguments used: bin/harvest start
  • OS: RHEL 8
  • Install method: native
  • ONTAP Version: 9.9
  • Other:

See change here
6bb79ec

Workaround
Edit conf/zapi/cdot/9.8.0/disk.yaml and add ^serial-number => serial_number to the counters/storage-disk-info/disk-inventory-info section:

counters:
  storage-disk-info:
    - ^^disk-uid
    - ^^disk-name               => disk
    - disk-inventory-info:
      - bytes-per-sector        => bytes_per_sector
      - capacity-sectors        => sectors
      - ^disk-type              => type
      - ^is-shared              => shared
      - ^model                  => model
      - ^serial-number          => serial_number

In the same file export it like so:

export_options:
  instance_keys:
    - node
    - disk
  instance_labels:
    - type
    - model
    - outage
    - owner_node
    - shared
    - shelf
    - shelf_bay
    - serial_number

Make vendored copy of dependencies

This is a small change. I've found that using vendored dependencies can help reduce build flakiness, particularly when a project pulls in many dependencies. It doesn't look like there are many here, but maybe this can help future-proof it.

Enhance systemd service implementation to monitor all pollers

Is your feature request related to a problem? Please describe.
In the current systemd service implementation, harvest just forks all poller child processes and exits. Systemd has no concept of keeping track of all the children. If I have more than one poller, I can never be sure that all are running by monitoring the systemd service.

Example:
Harvest Systemd status is active

โ— harvest.service - Harvest
   Loaded: loaded (/etc/systemd/system/harvest.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-05-25 09:56:15 CEST; 24min ago
  Process: 2512 ExecStart=/opt/harvest/bin/harvest restart --config /opt/harvest/harvest.yml (code=exited, status=0/SUCCESS)
 Main PID: 28539 (code=exited, status=0/SUCCESS)
    Tasks: 30
   Memory: 104.6M
   CGroup: /system.slice/harvest.service
           ├─2521 bin/poller --poller unix --loglevel 2 --config /opt/harvest/harvest.yml --daemon
           ├─2538 bin/poller --poller ontap1 --loglevel 2 --config /opt/harvest/harvest.yml --daemon
           └─2548 bin/poller --poller ontap2 --loglevel 2 --config /opt/harvest/harvest.yml --daemon

If I kill a single poller, the service status remains active:

kill 2548

# ./bin/harvest status
Datacenter            Poller                PID        PromPort        Status
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
local                   unix                2521                       running
DC1                     ontap1              2538                       running
DC2                     ontap2                                         not running
+++++++++++++++++++++ +++++++++++++++++++++ ++++++++++ +++++++++++++++ ++++++++++++++++++++
โ— harvest.service - Harvest
   Loaded: loaded (/etc/systemd/system/harvest.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-05-25 09:56:15 CEST; 25min ago
  Process: 2512 ExecStart=/opt/harvest/bin/harvest restart --config /opt/harvest/harvest.yml (code=exited, status=0/SUCCESS)
 Main PID: 28539 (code=exited, status=0/SUCCESS)
    Tasks: 20
   Memory: 63.2M
   CGroup: /system.slice/harvest.service
           ├─2521 bin/poller --poller unix --loglevel 2 --config /opt/harvest/harvest.yml --daemon
           └─2538 bin/poller --poller ontap1 --loglevel 2 --config /opt/harvest/harvest.yml --daemon

Describe the solution you'd like
The systemd harvest service should monitor the status of all pollers and should switch to "failed" if any poller is down. Alternatively, create individual service instances, one for each poller.

Describe alternatives you've considered
Monitoring the service and getting the status of every poller is essential for production usage.

Add ASUP analytics

Similar to Harvest 1.6, Harvest 2 should offer an option (ASUP, EMS, etc.) to track the number of installs, number of pollers, etc.

Code that sends ASUP from Harvest 1.6

# Send AutoSupport with statistics about Harvest poller
sub send_autosupport_stats()
{
	
	# Get counts of nodes and volumes monitored
	# No point to continue if we don't have node instances
	my $nodes_count = keys %{$instance{"system:node"}};
	return 0 unless ($nodes_count > 0);
	my $vol_count = keys %{$instance{volume}};
	
	# Get OS
	my $distro = `cat /etc/os-release 2>/dev/null | grep PRETTY` || $Config{osname} || "unknown";
	# If we read os-release, what we need is the string after = optionally between single/double quotes
	$distro =~ s/^\s?PRETTY_NAME\s?=\s?['"]?(.*?)\s?['"]?.?$/$1/ms; # if $distro =~ /PRETTY/;

	# Runtime of worker
	my $runtime = `ps -ly --pid=$$ --no-headers 2>/dev/null` || "--";
	$runtime =~ s/.*\s(\d\d:\d\d:\d\d)\s.*/$1/ms;

	# Compose name for Harvest instance
	my $host_name = hostname();
	my $harvest_name = $connection{group} . "_" . $host_name;;

	# Stats about worker performance
	my $stats = "";
	for my $k ( qw (metrics fails skips plugin_time api_time last_time) )
	{
		$stats = $stats . ";" . $connection{statistics}{$k};
	}

	my $log_message = "HARVEST $VERSION [$harvest_name] [$distro] [$connection{product};$connection{host_version_generation};$connection{host_version_major}] [$nodes_count;$vol_count;$runtime] [$stats]";

	logger ("DEBUG", "[send_autosupport_stats] Composed AutuSupport log with statistics [$log_message]");

	my $server = $connection{server_obj};

	for my $node (keys %{$instance{"system:node"}})
	{
		my $node_name = $instance{"system:node"}{$node}{name};

		my $in = NaElement->new("autosupport-invoke");
		$in->child_add_string("force", "true");
		$in->child_add_string("message", $log_message);
		$in->child_add_string("node-name", $node_name);
		$in->child_add_string("type", "all");

		my $out = $server->invoke_elem($in);

		if ($out->results_status() eq "passed")
		{
			logger ("NORMAL", "[send_autosupport_stats] Sent AutoSupport Log Message for Node [$node_name] with statistics [$log_message].");
		}
		else
		{
			logger ("WARNING", "[send_autosupport_stats] Sending AutoSupport log message for Node [$node_name] failed with reason: $out->results_reason(). Message was [$log_message].");
		}
	}

	return 0;
}

Grafana Dashboards panels getting 404

Describe the bug
Some metrics in some Grafana dashboard panels return error code 404.

Environment

  • Harvest version: harvest version 21.05.1-1 (commit 2211c00) (build date 2021-05-21T01:28:12+0530) linux/amd64
  • OS: CentOS Linux release 7.9.2009 (Core)
  • Install method: rhel
  • ONTAP Version: 9.7

To Reproduce
Just open any dashboard

Expected behavior
All panels should return values.

Actual behavior
Some of them return error code 404.

Possible solution, workaround, fix
Disable Exemplars feature

Additional context
I noticed that only panels with that feature enabled were getting 404s. I did some tests and it looks like the issue comes from the query_exemplars query. When I disabled the Exemplars feature in each panel of each dashboard, all panels started to get metrics and everything looks good.
