
sustainable-computing-io / kepler-doc

Kepler uses eBPF to probe energy-related system stats and exports them as Prometheus metrics.

Home Page: https://sustainable-computing.io/

License: Apache License 2.0

HTML 100.00%

kepler-doc's Issues

GPU Nvidia H100 PCIe Not Supported

I've deployed Kepler on a Kubernetes cluster that includes a GPU node with an NVIDIA H100 PCIe.

The Kepler logs from this node show the error below. In parallel, I'm monitoring the same GPU with a dcgm-exporter instance, which collects GPU energy consumption metrics correctly.

I0125 07:04:48.972351 1 power.go:86] Failed to collect GPU metrics, trying to initizalize again: failed to get processes' utilization on device {0x7f639b40bdf8}: Not Supported
I0125 07:04:48.972407 1 gpu_nvml.go:62] found 1 gpu devices
I0125 07:04:48.972416 1 gpu_nvml.go:73] GPU 0 NVIDIA H100 PCIe

Do you have any idea what might be causing this?
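
When triaging a report like this, it can help to split each log line into its parts so that failures can be filtered by source file or message. The following is a minimal sketch, assuming the lines follow the usual klog header layout (severity letter, month and day, time, thread ID, `file:line]`, then the message). The function name `parse_klog_line` is made up for illustration and is not part of Kepler.

```python
import re

# klog-style header: severity, mmdd, time, thread id, file:line], then the message.
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_klog_line(raw):
    """Return the fields of one klog-style line as a dict, or None if it does not match."""
    match = KLOG_LINE.match(raw.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


def gpu_failures(lines):
    """Keep only parsed entries whose message reports a failed GPU metrics collection."""
    parsed = (parse_klog_line(raw) for raw in lines)
    return [p for p in parsed if p and "Failed to collect GPU metrics" in p["message"]]
```

Applied to the three lines above, `gpu_failures` would keep only the entry emitted from `power.go`, whose message ends in the reported reason.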

Outline for kepler technical deep dive

Kepler Deep Dive

  • Components and what they do

  • What metrics are gathered on various systems

    • BM
    • VM
    • Linux standalone
  • Why these metrics?

  • How is the power consumption attribution done?

  • Explain the models. How the models are different and is there a right use case/scenario for when to apply a particular model over another?

    • AbsComponentModelWeight
    • AbsComponentPower
    • AbsModelWeight
    • AbsPower
    • DynComponentModelWeight
    • DynComponentPower
    • XGBoost
  • Each model has three sub-models (BPFOnly, CgroupOnly, CounterOnly), but we use only one of them. Why is that?

Let's move to Mkdocs for easier contribution and integration

I found that MkDocs is easier than Sphinx for contributing documentation, for the following reasons:

Document for BMC

Hi @rootfs,
To prepare the KubeCon kiosk for KubeCon China 2023,
I hope we can update the BMC-related documentation in this repo,
in both English and Chinese.

OpenSSF questions related to basic information

Home page,

  • The project website MUST succinctly describe what the software does (what problem does it solve?).
    Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe energy-related system stats and exports them as Prometheus metrics.
  • The project website MUST provide information on how to: obtain, provide feedback (as bug reports or enhancements), and contribute to the software.
  • The information on how to contribute SHOULD include the requirements for acceptable contributions (e.g., a reference to any required coding standard). (URL required)

docs: Missing Figures

Building the documents shows several figures missing:

mkdocs build
:
WARNING -  Doc file 'installation/kepler-operator.md' contains a relative link '../fig/ocp_installation/kind_grafana.png', but the target
           'fig/ocp_installation/kind_grafana.png' is not found among documentation files.
:
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_1_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_1_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_7_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_7_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_2_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_2_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_5a_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_5a_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_5b_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_5b_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_6_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_6_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/community-operator.zh.md' contains a relative link '../fig/ocp_installation/operator_installation_ocp_3_0.6.z.png', but the target
           'fig/ocp_installation/operator_installation_ocp_3_0.6.z.png' is not found among documentation files.
WARNING -  Doc file 'installation/kepler-operator.zh.md' contains a relative link '../fig/ocp_installation/kind_grafana.png', but the target
           'fig/ocp_installation/kind_grafana.png' is not found among documentation files.
:

@vprashar2929 - fig/ocp_installation/kind_grafana.png was removed in #148, should the link be updated/removed or the image restored?
@vprashar2929 - I think the rest were removed in #106, should the link be updated/removed or the image restored?
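
The warnings above can also be reproduced without a full build by checking each relative image link against the files on disk. The following is a minimal sketch, assuming the Markdown sources live under a single docs directory and that image links use the plain `![alt](path)` form. The function name `find_missing_images` is hypothetical and is not part of MkDocs.

```python
import re
from pathlib import Path

# Matches Markdown image links such as ![alt](../fig/example.png)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def find_missing_images(docs_dir):
    """Return sorted (doc file, link) pairs whose relative image target does not exist."""
    docs_root = Path(docs_dir)
    missing = []
    for md_file in docs_root.rglob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        for link in IMAGE_LINK.findall(text):
            # Only relative links are checked; skip URLs and site-absolute paths.
            if "://" in link or link.startswith("/"):
                continue
            # Relative links are resolved against the directory of the linking file.
            target = md_file.parent / link.split("#")[0]
            if not target.is_file():
                missing.append((md_file.relative_to(docs_root).as_posix(), link))
    return sorted(missing)
```

Calling `find_missing_images("docs")` (the directory name is an assumption) lists each document together with its broken link, which makes it easier to decide per file whether to update the link, remove it, or restore the image.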

[CNCF Sandbox Onboarding] Track website tasks

This issue addresses the website tasks from Kepler being onboarded as a Sandbox project.
Main tracking issue: sustainable-computing-io/kepler#698

  • Website: ensure LF footer is there and website guidelines followed (if your project doesn't have a dedicated website, please adopt those guidelines to the README file of your project on GitHub).
    • CNCF projects are strongly encouraged to host the source of their websites in an open source repository (and under the same organization) so that requests to change can be done via pull requests and the discussions are archived in a transparent manner.
    • It is OK to say that, e.g., “Prometheus was originally created by Soundcloud” or “Kubernetes builds upon 15 years of experience of running production workloads at Google,” but the origin company should not otherwise be referred to on the project homepage.
    • There should be no links or forms for capturing enterprise support leads. Instead, it is fine to have an enterprise support, commercial partners or similar page. Companies must be listed on that page in alphabetical order, or the order can be changed randomly on each page load. It’s OK to have different categories of support offered. Simple vetting by the project is needed to ensure that all companies listed really can provide the support promised. Projects are welcome to outsource this vetting to CNCF staff if it becomes a burden.
    • Links to companies offering support are expected to go to a page that at least mentions support of the project. This can either be the company homepage or a project-specific landing page.
    • #52
    • #50
    • #53
  • #54

Add documentation for how to enable a new hardware platform support

@rootfs,
I discussed this beforehand with @marceloamaral, @jiangphcn, @jichenjc, and @sunya-ch.
Personally, I want to add a chapter to kepler-doc on enabling a new hardware platform for Kepler,
as a checklist for people onboarding Kepler on a new hardware platform such as s390.
Here is the checklist as the overall design for this chapter, with items to be assigned to people as tasks. I hope we can finish it within this week and then submit it to KubeCon as a proposal from our community.

That way, we can use existing content as the proposal to KubeCon.
WDYT?
