sustainable-computing-io / kepler-doc
Kepler uses eBPF to probe energy related system stats and exports as Prometheus metrics
Home Page: https://sustainable-computing.io/
License: Apache License 2.0
I've deployed Kepler on Kubernetes to monitor a cluster that has a GPU node with an NVIDIA H100 PCIe.
In the Kepler logs from this node, I see the error below. In parallel, I'm monitoring this GPU with a dcgm-exporter instance, and it collects GPU energy consumption metrics correctly.
I0125 07:04:48.972351 1 power.go:86] Failed to collect GPU metrics, trying to initizalize again: failed to get processes' utilization on device {0x7f639b40bdf8}: Not Supported
I0125 07:04:48.972407 1 gpu_nvml.go:62] found 1 gpu devices
I0125 07:04:48.972416 1 gpu_nvml.go:73] GPU 0 NVIDIA H100 PCIe
Do you have any idea what could cause this?
Components and what they do
What metrics are gathered on various systems
Why these metrics?
How is the power consumption attribution done?
Explain the models. How the models are different and is there a right use case/scenario for when to apply a particular model over another?
Each model has three sub-models (BPFOnly, CgroupOnly, CounterOnly), but we use only one of them. Why is that?
I found that MkDocs is easier than Sphinx for contributing to documentation, for the following reasons:
Hey,
After reading this CNCF blog article, I think it would make a great starting point for people checking out Kepler.
From there, we could link to more specific design subpages. What do you think?
The Deploy section of the Kepler docs recommends installing Prometheus Operator. This would result in two Prometheus Operator instances that, if not properly configured, can render the cluster's in-platform monitoring unusable, as the new Prometheus Operator can reconcile prometheus-k8s in the openshift-monitoring namespace.
Hi @rootfs,
To prepare the KubeCon kiosk for KubeCon China 2023, I hope we can have the BMC-related documentation updated in this repo, in both English and Chinese.
Now that we have an SBOM in the releases we should document it for people's awareness.
The SBOM has just been produced for the latest release here: https://github.com/sustainable-computing-io/kepler/releases/tag/v0.5
This information would probably be best suited to install.md.
I will create a PR on Monday for discussion/approval.
For the KubeCon China kiosk, I hope we can have zh support for the model server and training process.
Is this doc up to date? I don't think we do local cluster setup using YAML anymore.
Originally posted by @husky-parul in #70 (comment)
Home page,
This issue is to track sustainable-computing-io/kepler#334
Follow as per cncf/toc#1054 (comment)
Hey,
I saw that #91 was merged but realized the corresponding page is not included in the legend / table of contents:
https://sustainable-computing.io/platform-validation/
Where would it fit best? Or should we create a new section for it?
The idea is to document the configuration options for the kepler rpm install under /etc/kepler/kepler.config
It would be nice to have a rendered page whenever a PR is pushed,
so that the reviewer can confirm the rendered result before approving the PR in case the author forgets to do so (like me).
I have seen that this is possible in the kwok project, which has a rendered page per PR.
https://deploy-preview-316--k8s-kwok.netlify.app/docs/adopters/
Building the documents shows several figures missing:
mkdocs build
:
WARNING - Doc file 'installation/kepler-operator.md' contains a relative link '../fig/ocp_installation/kind_grafana.png', but the target 'fig/ocp_installation/kind_grafana.png' is not found among documentation files.
:
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_1_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_1_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_7_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_7_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_2_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_2_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_5a_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_5a_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_5b_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_5b_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_6_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_6_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_3_0.6.z.png', but the target 'fig/ocp_installation/operator_installation_ocp_3_0.6.z.png' is not found among documentation files.
WARNING - Doc file 'installation/kepler-operator.zh.md' contains a relative link '../fig/ocp_installation/kind_grafana.png', but the target 'fig/ocp_installation/kind_grafana.png' is not found among documentation files.
:
@vprashar2929 - fig/ocp_installation/kind_grafana.png was removed in #148. Should the link be updated/removed or the image restored?
@vprashar2929 - I think the rest were removed in #106. Should the links be updated/removed or the images restored?
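For triage, the warnings quoted above can be grouped by missing figure so that each image is checked once. The following is only a rough sketch, not part of the kepler-doc tooling: the function name `missing_targets` and the `SAMPLE` text are made up for illustration, and the regular expression assumes the warning wording shown in this issue, which may differ between MkDocs versions.

```python
import re
from collections import defaultdict

# Matches the "relative link ... target not found" warnings quoted in this issue.
# \s+ tolerates the target path being wrapped onto the next line.
WARNING_RE = re.compile(
    r"WARNING\s+-\s+Doc file '(?P<doc>[^']+)' contains a relative link "
    r"'(?P<link>[^']+)', but the target\s+'(?P<target>[^']+)' is not found"
)


def missing_targets(build_output: str) -> dict[str, list[str]]:
    """Map each missing target path to the sorted doc files that link to it."""
    by_target = defaultdict(set)
    for match in WARNING_RE.finditer(build_output):
        by_target[match.group("target")].add(match.group("doc"))
    return {target: sorted(docs) for target, docs in sorted(by_target.items())}


# Two of the warnings from this issue, used as illustrative input.
SAMPLE = """\
WARNING - Doc file 'installation/kepler-operator.md' contains a relative link '../fig/ocp_installation/kind_grafana.png', but the target
'fig/ocp_installation/kind_grafana.png' is not found among documentation files.
WARNING - Doc file 'installation/kepler-operator.zh.md' contains a relative link '../fig/ocp_installation/kind_grafana.png', but the target
'fig/ocp_installation/kind_grafana.png' is not found among documentation files.
"""

if __name__ == "__main__":
    for target, docs in missing_targets(SAMPLE).items():
        print(f"{target}: {', '.join(docs)}")
```

Feeding it the captured output of a real build instead of `SAMPLE` would give one line per missing image, which makes it easier to decide per figure whether to restore the file or drop the links.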
This issue addresses the website tasks from Kepler being onboarded as a Sandbox project.
Main tracking issue: sustainable-computing-io/kepler#698
@rootfs,
As pre-discussed with @marceloamaral, @jiangphcn, @jichenjc, and @sunya-ch,
I would like to add a chapter to kepler-doc on enabling a new hardware platform for Kepler.
It would serve as a checklist for people onboarding Kepler on a new hardware platform such as s390.
Here is the checklist as an overall design for this chapter, with items to be assigned to people as tasks. I hope we can finish it within this week and then submit it to KubeCon as a proposal from our community.
That way, we can use existing content as our proposal to KubeCon.
WDYT?