monitoring's Introduction

KubeVirt Monitoring

Note that everything is experimental and may change significantly at any time.

This repository collects Grafana dashboards for KubeVirt and Prometheus runbooks for alerts shipped with the KubeVirt stack.

monitoring's People

Contributors

0xfelix, akalenyu, andreyod, apinnick, arnongilboa, ashleyschuett, assafad, avlitman, borod108, dankenigsberg, davidvossel, davozeni, dharmit, erkanerol, ezrasilvera, iholder101, jean-edouard, jherrman, machadovilaca, marceloamaral, nunnatsa, orenc1, oshoval, prnaraya, ramlavi, sradco, web-flow, yuhaohaoyu, zcahana

monitoring's Issues

Please make this repo usable for the kubevirt community

Taking the comment from #38 (comment):

Note that this repo is the source of truth for dashboards. The files here are JSON files that are compatible with both OCP UI and Grafana. They will be consumed by hyperconverged-cluster-operator for OCP UI.

The job below fetches all JSON files here and converts them to YAML for k8s ConfigMaps.
https://github.com/kubevirt/hyperconverged-cluster-operator/actions/workflows/dashboard-updater.yml
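To make the JSON-to-ConfigMap step concrete, here is a minimal sketch of what such a conversion could look like. It is not the actual HCO job: the function names, the `example-namespace` default, and the `<name>.json` data key are all assumptions made for illustration. The manifest is emitted as JSON, which YAML 1.2 is designed to accept, so no third-party YAML library is needed.

```python
import json


def dashboard_to_configmap(name: str, dashboard_json: str,
                           namespace: str = "example-namespace") -> dict:
    """Wrap one dashboard JSON document in a ConfigMap manifest (as a dict).

    The namespace default and the data key layout are hypothetical.
    """
    # Parse first so that a malformed dashboard fails here, not at deploy time.
    dashboard = json.loads(dashboard_json)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # ConfigMap values must be strings, so the dashboard is re-serialized.
        "data": {f"{name}.json": json.dumps(dashboard, indent=2, sort_keys=True)},
    }


def render_manifest(configmap: dict) -> str:
    """Serialize the manifest as JSON, which can be saved and applied as YAML."""
    return json.dumps(configmap, indent=2)


if __name__ == "__main__":
    sample = '{"title": "Demo dashboard", "panels": []}'
    print(render_manifest(dashboard_to_configmap("demo-dashboard", sample)))
```

Parsing the dashboard before embedding it means a broken JSON file is rejected during conversion rather than after deployment.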

Existing dashboards are here:
https://github.com/kubevirt/hyperconverged-cluster-operator/tree/main/assets/dashboards

Then HCO will deploy them during installation automatically and OCP UI will display them.

Therefore, we need to ensure that this dashboard works in OCP UI as well.

This makes the repo unusable for the community, and PRs like #38 can realistically never be added to this repo.

Please either add proper e2e testing and releases, or move OpenShift-only dashboards directly to HCO.

Update some of the commands that fetch the namespace to not use 'awk' and 'grep'

We would like to update the runbooks so that they no longer use awk or grep to fetch the namespace.
For example, in:
https://github.com/kubevirt/monitoring/blob/main/docs/runbooks/CnaoDown.md#diagnosis

We currently fetch the namespace by running:

export NAMESPACE="$(kubectl get deployment -A | grep cluster-network-addons-operator | awk '{print $1}')"

A more precise way to do that, which matches the deployment name exactly instead of matching any line that contains it, would be:

oc get deployment -A --field-selector metadata.name=cluster-network-addons-operator -o=custom-columns=NS:.metadata.namespace --no-headers
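The difference between the two approaches can be illustrated offline. The sketch below filters a deployment list by exact name, which is what the field selector does server-side, whereas grep matches the string anywhere in the output line. The sample data is hand-written in the shape of `kubectl get deployment -A -o json` output; the namespaces `ns-a` and `ns-b` and the second deployment name are hypothetical.

```python
import json


def namespaces_of_deployment(deployment_list_json: str, name: str) -> list[str]:
    """Return the namespaces of deployments whose name equals `name` exactly."""
    items = json.loads(deployment_list_json).get("items", [])
    return sorted(
        item["metadata"]["namespace"]
        for item in items
        if item["metadata"]["name"] == name
    )


# Hand-written sample; a substring match would return both entries.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "cluster-network-addons-operator",
                      "namespace": "ns-a"}},
        {"metadata": {"name": "cluster-network-addons-operator-extra",
                      "namespace": "ns-b"}},
    ]
})

if __name__ == "__main__":
    print(namespaces_of_deployment(SAMPLE, "cluster-network-addons-operator"))
```

With this sample, only `ns-a` is returned, while a grep-style substring match would also pick up the deployment in `ns-b`.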

Formatting issues in 'kubevirt-top-consumers' dashboard

When the @hco-bot tried to sync the kubevirt-top-consumers dashboard to the HCO repo, I found a few issues with the file.

On line 1767, a { is missing from the legend format string: "legendFormat": "{name}} / {{namespace}}", should be "legendFormat": "{{name}} / {{namespace}}",
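A mistake like this could be caught before syncing with a small check. The following is a rough sketch, not an existing tool in this repo: it walks a parsed dashboard and reports every legendFormat string that still contains a brace after all well-formed {{...}} placeholders are removed. The function name and the path notation are made up for illustration.

```python
import re

# A well-formed legend placeholder looks like {{label}}.
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def malformed_legend_formats(node, path="$"):
    """Yield (path, value) for legendFormat strings with stray braces."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "legendFormat" and isinstance(value, str):
                # Remove well-formed placeholders; any brace left over is stray.
                leftover = PLACEHOLDER.sub("", value)
                if "{" in leftover or "}" in leftover:
                    yield child, value
            else:
                yield from malformed_legend_formats(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from malformed_legend_formats(value, f"{path}[{index}]")


if __name__ == "__main__":
    dashboard = {"panels": [{"targets": [
        {"legendFormat": "{name}} / {{namespace}}"},    # broken, as reported
        {"legendFormat": "{{name}} / {{namespace}}"},   # corrected form
    ]}]}
    for location, value in malformed_legend_formats(dashboard):
        print(location, repr(value))
```

Run against the two strings from this issue, the check flags only the first target. In practice the dashboard would be loaded with `json.load` from the file being synced.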

Other than that, there are several spacing issues, such as on lines 155 and 1257, where the spacing in topk (5 , sum (avg does not follow the spacing used in other occurrences, such as on line 320.

Other spacing issues appear frequently around operators such as + and <. Sometimes they are surrounded by spaces, sometimes by none, and sometimes a space follows the operator but does not precede it.

List of possible labels that could be included in exposed metrics

I've recently encountered a cardinality & memory explosion on a Prometheus instance and eventually realized that KubeVirt had been deployed by someone and Prometheus was consuming metrics related to it. To my surprise, it had added approximately 220 new labels. This can end up being a substantial addition of cardinality especially for Prometheus instances which are already large to begin with. For example, here's a small excerpt of the labels that get added at the k8s node level:

cpu-feature.node.kubevirt.io/amd_stibp: true 
cpu-feature.node.kubevirt.io/apic: true 
cpu-feature.node.kubevirt.io/arat: true 
cpu-feature.node.kubevirt.io/arch_capabilities: true 
cpu-feature.node.kubevirt.io/avx: true
cpu-feature.node.kubevirt.io/avx2: true

which translates into this for a Prometheus metric:

cpu_feature_node_kubevirt_io_amd_stibp="true", 
cpu_feature_node_kubevirt_io_apic="true", 
cpu_feature_node_kubevirt_io_arat="true", 
cpu_feature_node_kubevirt_io_arch_capabilities="true", 
cpu_feature_node_kubevirt_io_avx="true", 
cpu_feature_node_kubevirt_io_avx2="true",
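The translation shown above follows from Prometheus label names being limited to letters, digits, and underscores. Below is a minimal sketch of that mapping, assuming every other character is simply replaced with an underscore. The exact rule, and any prefix added to the result, depends on the exporter or relabeling configuration in use, so treat the function as illustrative.

```python
import re

# Characters that are not letters, digits, or underscores.
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def to_prometheus_label_name(k8s_label_key: str) -> str:
    """Replace every character outside [a-zA-Z0-9_] with an underscore."""
    return _INVALID.sub("_", k8s_label_key)


if __name__ == "__main__":
    for key in (
        "cpu-feature.node.kubevirt.io/amd_stibp",
        "cpu-feature.node.kubevirt.io/avx2",
    ):
        print(f"{key} -> {to_prometheus_label_name(key)}")
```

Because the mapping is one-to-one for these keys, each distinct node label becomes a distinct Prometheus label, which is why the node label count feeds directly into the cardinality described here.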

Although I realize we can then choose to whitelist or drop labels which we don't want, it would be great if the possible list of K8s labels was documented. Is that something that could be added so that users have a better idea of what to expect when this is deployed?

Thanks for your consideration.
