kubevirt / monitoring Goto Github PK

KubeVirt monitoring dashboards

Home Page: https://kubevirt.io/monitoring/

Python 8.54% Makefile 3.06% Go 87.44% Dockerfile 0.59% Shell 0.36%

monitoring's Introduction

KubeVirt Monitoring

Note that everything is experimental and may change significantly at any time.

This repository collects Grafana dashboards for KubeVirt and Prometheus runbooks for alerts shipped with the KubeVirt stack.

monitoring's People

Contributors

sradco ansijain isabella232 ashleyschuett erkanerol ezrasilvera tiraboschi davidvossel andreyod nunnatsa ramlavi oshoval jean-edouard marceloamaral yuhaohaoyu borod108 dankenigsberg iholder101 akalenyu orenc1 zcahana assafad arnongilboa rhrazdil apinnick bobz965 jherrman davozeni machadovilaca fossedihelm codrin-iftimie ksimon1 0xfelix lyarwood enp0s3 avlitman dharmit prnaraya steveefemsc hoo29

monitoring's Issues

Please make this repo usable for the kubevirt community

Taking the comment from #38 (comment):

Note that this repo is the source of truth for dashboards. The files here are JSON files that are compatible with both the OCP UI and Grafana. They will be consumed by hyperconverged-cluster-operator for the OCP UI.

The job below fetches all JSON files here and converts them to YAML for Kubernetes ConfigMaps.
https://github.com/kubevirt/hyperconverged-cluster-operator/actions/workflows/dashboard-updater.yml

Existing dashboards are here:
https://github.com/kubevirt/hyperconverged-cluster-operator/tree/main/assets/dashboards

Then HCO will deploy them during installation automatically and OCP UI will display them.

Therefore, we need to ensure that this dashboard works in OCP UI as well.
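The JSON-to-ConfigMap step described in the quoted comment can be pictured roughly as follows. This is a minimal sketch, not the actual dashboard-updater job: the function name dashboard_to_configmap, the data key naming, and the console.openshift.io/dashboard label are assumptions made for illustration. The manifest is printed as JSON, which kubectl also accepts, to avoid depending on a YAML library.

```python
import json


def dashboard_to_configmap(name: str, dashboard_json: str) -> dict:
    """Wrap one dashboard JSON document in a ConfigMap manifest (as a dict)."""
    # Parse first so that a malformed dashboard fails here, not at deploy time.
    json.loads(dashboard_json)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            # Assumed label; check what the consuming UI actually expects.
            "labels": {"console.openshift.io/dashboard": "true"},
        },
        # The dashboard is stored verbatim under a single data key.
        "data": {f"{name}.json": dashboard_json},
    }


if __name__ == "__main__":
    # Hypothetical dashboard content, for illustration only.
    sample = '{"title": "Example dashboard", "panels": []}'
    manifest = dashboard_to_configmap("example-dashboard", sample)
    print(json.dumps(manifest, indent=2))
```

Because the dashboard is embedded verbatim, any syntax the target UI cannot render travels into the cluster unchanged, which is why the quoted comment insists on compatibility with both consumers.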

This makes the repo unusable for the community, and PRs like #38 can realistically never be added to this repo.

Please either add proper e2e testing and releases or move openshift-only dashboards directly to HCO.

Update some of the commands that fetch the namespace to not use 'awk' and 'grep'

We would like to update the runbooks to not use awk or grep for fetching the namespace.
For example in:
https://github.com/kubevirt/monitoring/blob/main/docs/runbooks/CnaoDown.md#diagnosis

We fetch the namespace by running:

 export NAMESPACE="$(kubectl get deployment -A | grep cluster-network-addons-operator | awk '{print $1}')"

and a more precise way to do that would be:

oc get deployment -A --field-selector metadata.name=cluster-network-addons-operator -o=custom-columns=NS:.metadata.namespace --no-headers
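To illustrate why the field selector is more precise, here is a small sketch that contrasts substring matching (what grep does on each output line) with exact name matching (what metadata.name=... does). The rows and the namespace names are hypothetical sample data, not output from a real cluster, and the helper names are made up for this example.

```python
# Hypothetical (namespace, deployment name) rows, shaped like the first two
# columns of `kubectl get deployment -A` output.
SAMPLE_ROWS = [
    ("team-a", "cluster-network-addons-operator-canary"),
    ("kubevirt-hyperconverged", "cluster-network-addons-operator"),
]


def namespaces_by_substring(rows, needle):
    """Mimic grep piped to awk '{print $1}': keep any line containing needle."""
    return [ns for ns, name in rows if needle in f"{ns} {name}"]


def namespaces_by_exact_name(rows, wanted):
    """Mimic a metadata.name field selector: keep exact name matches only."""
    return [ns for ns, name in rows if name == wanted]


if __name__ == "__main__":
    target = "cluster-network-addons-operator"
    print("substring:", namespaces_by_substring(SAMPLE_ROWS, target))
    print("exact:    ", namespaces_by_exact_name(SAMPLE_ROWS, target))
```

With the substring approach, NAMESPACE would end up holding two values in this sample, which breaks any later command that expects a single namespace.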

Add public copyright license to the repository

Thanks for providing documentation here. However, I can't find any license for this repository. Is it intentionally proprietary?

https://reuse.software/ can provide advice.

Remove potentially unsupported action in LowKVMNodesCount runbooks

The runbook docs/runbooks/LowKVMNodesCount.md points to enabling software emulation fallback, which is an unsupported action downstream. This section should be removed, or at least marked with a warning for downstream users.

Formatting issues in 'kubevirt-top-consumers' dashboard

When the @hco-bot tried to sync the kubevirt-top-consumers dashboard to the HCO repo I found a few issues with the file.

In line 1767 there's a missing { in the string format, where "legendFormat": "{name}} / {{namespace}}", should be "legendFormat": "{{name}} / {{namespace}}",

Other than that, there are several spacing issues, such as in lines 155 and 1257, where the spacing in topk (5 , sum (avg does not follow the spacing used in other occurrences, such as in line 320.

Other spacing issues appear frequently around operators such as + and <: sometimes they are surrounded by spaces, sometimes by none, and sometimes there is a space only after the operator and not before it.
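A check along the following lines could catch the brace problem before a sync is attempted. It is only a sketch: it assumes that legend templates live under keys named legendFormat, as in the line quoted above, and the function name and path notation are invented for this example. It does not address the spacing inconsistencies.

```python
import json
import re

# A well-formed placeholder: double braces with no braces inside.
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def unbalanced_legend_formats(node, path="$"):
    """Yield (path, value) for legendFormat strings that contain stray braces."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "legendFormat" and isinstance(value, str):
                # Remove well-formed placeholders; any brace left over is stray.
                if re.search(r"[{}]", PLACEHOLDER.sub("", value)):
                    yield child, value
            else:
                yield from unbalanced_legend_formats(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from unbalanced_legend_formats(item, f"{path}[{index}]")


if __name__ == "__main__":
    # Hypothetical dashboard fragment reproducing the reported typo.
    dashboard = json.loads(
        '{"panels": [{"targets": ['
        '{"legendFormat": "{name}} / {{namespace}}"},'
        '{"legendFormat": "{{name}} / {{namespace}}"}]}]}'
    )
    for location, value in unbalanced_legend_formats(dashboard):
        print(location, "->", value)
```

Running such a check in CI on every dashboard file would surface the typo at review time rather than when the bot syncs the file.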

Add Phase Transition dashboard

After kubevirt/kubevirt#5766 merged, it gave us the ability to view how quickly VMIs progress through their phases. It would be good to have a shared dashboard to view this data.

Where are the alert definitions for the alerts from the runbooks

Hi,

In the current KubeVirt operator, only a subset of the alerts documented in this repo is defined.

source: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-operator/resource/generate/components/prometheus.go

Where can I find the definition for the other alerts?

List of possible labels that could be included in exposed metrics

I've recently encountered a cardinality & memory explosion on a Prometheus instance and eventually realized that KubeVirt had been deployed by someone and Prometheus was consuming metrics related to it. To my surprise, it had added approximately 220 new labels. This can end up being a substantial addition of cardinality especially for Prometheus instances which are already large to begin with. For example, here's a small excerpt of the labels that get added at the k8s node level:

cpu-feature.node.kubevirt.io/amd_stibp: true 
cpu-feature.node.kubevirt.io/apic: true 
cpu-feature.node.kubevirt.io/arat: true 
cpu-feature.node.kubevirt.io/arch_capabilities: true 
cpu-feature.node.kubevirt.io/avx: true
cpu-feature.node.kubevirt.io/avx2: true

which translates into this for a Prometheus metric:

cpu_feature_node_kubevirt_io_amd_stibp="true", 
cpu_feature_node_kubevirt_io_apic="true", 
cpu_feature_node_kubevirt_io_arat="true", 
cpu_feature_node_kubevirt_io_arch_capabilities="true", 
cpu_feature_node_kubevirt_io_avx="true", 
cpu_feature_node_kubevirt_io_avx2="true",

Although I realize we can then choose to whitelist or drop labels which we don't want, it would be great if the possible list of K8s labels was documented. Is that something that could be added so that users have a better idea of what to expect when this is deployed?

Thanks for your consideration.
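The translation shown in the excerpts above is consistent with the usual rule for turning Kubernetes label keys into Prometheus label names: every character outside letters, digits, and underscore becomes an underscore. The sketch below reproduces that mapping and counts how many labels share a prefix. The helper names are hypothetical, and some exporters additionally prepend a prefix to the result, which is not modeled here.

```python
import re


def sanitize_label_name(name: str) -> str:
    """Replace every character that is not a letter, digit, or underscore."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def count_with_prefix(label_keys, prefix: str) -> int:
    """Count label keys starting with prefix, e.g. to size a drop rule."""
    return sum(1 for key in label_keys if key.startswith(prefix))


if __name__ == "__main__":
    # Keys taken from the node label excerpt above.
    node_labels = [
        "cpu-feature.node.kubevirt.io/amd_stibp",
        "cpu-feature.node.kubevirt.io/apic",
        "cpu-feature.node.kubevirt.io/avx2",
    ]
    for key in node_labels:
        print(key, "->", sanitize_label_name(key))
    print(count_with_prefix(node_labels, "cpu-feature.node.kubevirt.io/"))
```

Grouping by prefix in this way gives a quick estimate of how many label names a single relabeling rule would remove.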

KubevirtVmHighMemoryUsage name/title inconsistency

Very low priority.

The file name and title of the KubevirtVmHighMemoryUsage runbook do not match the capitalization used by the other runbooks ("KubeVirt...").

monitoring/docs/runbooks/KubevirtVmHighMemoryUsage.md

Line 1 in 1e0049f

# KubevirtVmHighMemoryUsage
