Note that everything is experimental and may change significantly at any time.
This repository collects Grafana dashboards for KubeVirt and Prometheus runbooks for alerts shipped with the KubeVirt stack.
KubeVirt monitoring dashboards
Home Page: https://kubevirt.io/monitoring/
Quoting the comment from #38:
Note that this repo is the source of truth for dashboards. The files here are JSON files, compatible with both the OCP UI and Grafana. They will be consumed by hyperconverged-cluster-operator for the OCP UI.
The job below fetches all JSON files here and converts them to YAML for k8s ConfigMaps:
https://github.com/kubevirt/hyperconverged-cluster-operator/actions/workflows/dashboard-updater.yml
Existing dashboards are here:
https://github.com/kubevirt/hyperconverged-cluster-operator/tree/main/assets/dashboards
HCO will then deploy them automatically during installation, and the OCP UI will display them.
Therefore, we need to ensure that this dashboard works in the OCP UI as well.
This makes the repo unusable for the community, and PRs like #38 can realistically never be added to this repo.
Please either add proper e2e testing and releases, or move OpenShift-only dashboards directly to HCO.
We would like to update the runbooks to not use awk or grep for fetching the namespace.
For example in:
https://github.com/kubevirt/monitoring/blob/main/docs/runbooks/CnaoDown.md#diagnosis
We fetch the namespace by running:
export NAMESPACE="$(kubectl get deployment -A | grep cluster-network-addons-operator | awk '{print $1}')"
and a more precise way to do that would be:
oc get deployment -A --field-selector metadata.name=cluster-network-addons-operator -o=custom-columns=NS:.metadata.namespace --no-headers
Thanks for providing documentation here. However I can't find any license for this repository: is it intentionally proprietary?
https://reuse.software/ can provide advice.
The runbook docs/runbooks/LowKVMNodesCount.md points to enabling software emulation fallback, which is an unsupported action downstream. This section should be removed, or at least placed in a warning for downstream users.
When the @hco-bot tried to sync the kubevirt-top-consumers dashboard to the HCO repo I found a few issues with the file.
In line 1767 there is a missing { in the format string, where
"legendFormat": "{name}} / {{namespace}}",
should be
"legendFormat": "{{name}} / {{namespace}}",
Other than that, there are several spacing issues, such as in lines 155 and 1257, where the spacing in topk (5 , sum (avg
does not follow the spacing used in other occurrences, such as in line 320.
Spacing is also inconsistent around operators such as + and <: sometimes they are surrounded by spaces, sometimes by none, and sometimes there is a space only after the operator and not before it.
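A check along the following lines could catch the legendFormat problem before a sync. It is only a sketch: it assumes that every brace in a legendFormat string should belong to a well-formed {{...}} placeholder, and the function names and the inline sample dashboard are hypothetical, not part of this repository.

```python
import json
import re

# A well-formed placeholder: two opening braces, a name, two closing braces.
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def is_balanced(legend: str) -> bool:
    """Return True if no brace is left once well-formed placeholders are removed."""
    remainder = PLACEHOLDER.sub("", legend)
    return "{" not in remainder and "}" not in remainder


def find_bad_legends(node, path="$"):
    """Walk a parsed dashboard and collect (path, value) for suspicious legendFormat strings."""
    bad = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "legendFormat" and isinstance(value, str):
                if not is_balanced(value):
                    bad.append((child, value))
            else:
                bad.extend(find_bad_legends(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            bad.extend(find_bad_legends(item, f"{path}[{index}]"))
    return bad


# Hypothetical, minimal dashboard fragment reproducing the reported typo.
sample = json.loads(
    '{"panels": [{"targets": ['
    '{"legendFormat": "{name}} / {{namespace}}"},'
    '{"legendFormat": "{{name}} / {{namespace}}"}'
    ']}]}'
)

for location, value in find_bad_legends(sample):
    print(f"{location}: {value}")
```

Run against a real dashboard file, the same function would take the result of json.load on that file. It does not address the operator spacing issues, which would need a PromQL-aware formatter.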
Since kubevirt/kubevirt#5766 was merged, we can view how quickly VMIs progress through their phases. It would be good to have a shared dashboard to view this data.
Hi,
in the current kubevirt operator only a subset of the Alerts documented in this repo are defined.
Where can I find the definition for the other alerts?
I've recently encountered a cardinality & memory explosion on a Prometheus instance and eventually realized that KubeVirt had been deployed by someone and Prometheus was consuming metrics related to it. To my surprise, it had added approximately 220 new labels. This can end up being a substantial addition of cardinality especially for Prometheus instances which are already large to begin with. For example, here's a small excerpt of the labels that get added at the k8s node level:
cpu-feature.node.kubevirt.io/amd_stibp: true
cpu-feature.node.kubevirt.io/apic: true
cpu-feature.node.kubevirt.io/arat: true
cpu-feature.node.kubevirt.io/arch_capabilities: true
cpu-feature.node.kubevirt.io/avx: true
cpu-feature.node.kubevirt.io/avx2: true
which translates into this for a Prometheus metric:
cpu_feature_node_kubevirt_io_amd_stibp="true",
cpu_feature_node_kubevirt_io_apic="true",
cpu_feature_node_kubevirt_io_arat="true",
cpu_feature_node_kubevirt_io_arch_capabilities="true",
cpu_feature_node_kubevirt_io_avx="true",
cpu_feature_node_kubevirt_io_avx2="true",
Although I realize we can then choose to whitelist or drop labels which we don't want, it would be great if the possible list of K8s labels was documented. Is that something that could be added so that users have a better idea of what to expect when this is deployed?
Thanks for your consideration.
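The mapping from node labels to Prometheus label names shown in the excerpt can be reproduced with a short sketch. It assumes the usual sanitization rule, where every character outside [a-zA-Z0-9_] is replaced by an underscore; the function names and the prefix constant are illustrative, not taken from KubeVirt or Prometheus code.

```python
import re

# Assumption: characters that are not letters, digits, or underscores become "_".
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")

# Prefix of the KubeVirt CPU feature node labels quoted in the excerpt.
CPU_FEATURE_PREFIX = "cpu-feature.node.kubevirt.io/"


def sanitize_label_name(k8s_label: str) -> str:
    """Convert a Kubernetes label key into a Prometheus-style label name."""
    return INVALID_CHARS.sub("_", k8s_label)


def split_cpu_feature_labels(node_labels: dict) -> tuple:
    """Separate CPU feature labels from the rest, e.g. to decide what to drop."""
    features = {k: v for k, v in node_labels.items() if k.startswith(CPU_FEATURE_PREFIX)}
    others = {k: v for k, v in node_labels.items() if k not in features}
    return features, others


node_labels = {
    "cpu-feature.node.kubevirt.io/amd_stibp": "true",
    "cpu-feature.node.kubevirt.io/avx2": "true",
    "kubernetes.io/hostname": "node-1",  # hypothetical non-KubeVirt label
}

features, others = split_cpu_feature_labels(node_labels)
for key, value in features.items():
    print(f'{sanitize_label_name(key)}="{value}"')
```

Counting the entries in features on a real node gives a quick estimate of how many extra labels a label-mapping scrape configuration would attach to each series.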
Very low priority.
File name and title of KubevirtVmHighMemoryUsage runbook do not match other runbooks ("KubeVirt...").