converged-computing / metrics-operator

Testing designs for a benchmarking operator (in experimental mode!)

Home Page: https://converged-computing.github.io/metrics-operator/

License: MIT License

Languages: Makefile 5.11%, Go 75.85%, Dockerfile 0.43%, Smarty 0.60%, Shell 0.67%, Python 17.34%
Topics: converged-computing, high-performance-computing, hpc, kubernetes, metrics, operator

metrics-operator's Introduction

metrics-operator

Developing metrics and a catalog of applications to assess different kinds of Kubernetes performance. We will likely choose metrics that are important for HPC. Note that I haven't started the operator yet because I'm testing ideas for the design. To learn more, see the documentation at the home page linked above.

TODO

  • Figure out why errors.IsNotFound is not working as expected
  • We need a way for the entrypoint command used for monitoring to (potentially) differ based on the container
  • For larger metric collections, we should have a log streaming mode (rather than waiting for Completed/Successful)
  • For services we are measuring, we likely need to be able to kill them after N seconds (so the job completes), or to specify the success policy on the metrics containers instead of the application
  • Add assertions to the Python tests
  • Plotting examples (Python parsers) are needed for:
    • io-sysstat
    • app-kripke
    • app-quicksilver
    • app-pennant
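For the item about killing measured services after N seconds, one option (an assumption, not the operator's current behavior) is to wrap the service command from inside the container and treat a timeout as a clean stop, so the job can reach Completed. A minimal Python sketch, where `run_service_for` is a hypothetical helper:

```python
import subprocess


def run_service_for(command: list[str], seconds: float) -> dict:
    """Run a long-lived service, kill it after `seconds`, and report how it stopped.

    Hypothetical helper: subprocess.run kills and reaps the child when the
    timeout expires, then raises TimeoutExpired.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=seconds)
        # The service exited on its own before the deadline.
        return {"stopped_by_timeout": False, "returncode": proc.returncode}
    except subprocess.TimeoutExpired:
        # Expected for a service: count the forced stop as success so the job completes.
        return {"stopped_by_timeout": True, "returncode": 0}
```

The alternative listed above (a success policy on the metrics containers) would instead leave the service running and let the Kubernetes side decide when the job is done.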

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE- 842614

metrics-operator's People

Contributors

vsoch

metrics-operator's Issues

Rate / completions should likely be scoped to an app

My original design assumed these would be globally relevant, but I don't think that's the case. They should instead be metric-specific options, so users are not misled into thinking they apply across all metrics (they do not).
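A minimal Python sketch of what metric-scoped options could look like: each metric declares only the options it supports, and anything else is rejected. The `DEFAULTS` table, its values, and the `MetricSpec` class are hypothetical illustrations, not the operator's actual spec.

```python
from dataclasses import dataclass, field

# Hypothetical per-metric defaults: each metric lists only the options it supports.
DEFAULTS = {
    "io-sysstat": {"rate": 10, "completions": 2},
    "app-kripke": {},  # assumption: an app metric that takes neither option
}


@dataclass
class MetricSpec:
    name: str
    options: dict = field(default_factory=dict)

    def resolve(self) -> dict:
        """Merge user options over this metric's defaults, rejecting unsupported keys."""
        allowed = DEFAULTS.get(self.name, {})
        unknown = set(self.options) - set(allowed)
        if unknown:
            raise ValueError(f"{self.name} does not support: {sorted(unknown)}")
        return {**allowed, **self.options}
```

With this shape, setting `rate` on `io-sysstat` overrides its default, while setting it on `app-kripke` fails loudly instead of being silently ignored.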

Consider metric app template

For the app-* metrics, I'm starting to see common patterns: there is some number of custom options, and then custom logic to derive entrypoints for a launcher and one or more workers. But the code files are getting very redundant! I'm wondering if there is some way (that would work within the limits of Go interfaces) to have common JobSet patterns. In this case, the launcher / worker would be a template, with the rest populated by a simpler struct.
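The template idea can be sketched in Python for brevity (the operator itself is mostly Go, per the language breakdown above). The shared function owns the launcher / worker layout, and each app supplies only a small struct with its options and two command builders. `AppMetric`, `launcher_worker_template`, and the dict layout are hypothetical; the `replicatedJobs` key only loosely mirrors the JobSet field of that name.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AppMetric:
    """The 'simpler struct': only what differs between app-* metrics (hypothetical)."""
    name: str
    image: str
    launcher_command: Callable[[dict], str]
    worker_command: Callable[[dict], str]
    options: dict = field(default_factory=dict)


def launcher_worker_template(app: AppMetric, workers: int) -> dict:
    """Shared launcher / worker layout, populated from the app struct."""
    def job(role: str, replicas: int, command: str) -> dict:
        return {
            "name": f"{app.name}-{role}",
            "replicas": replicas,
            "image": app.image,
            "command": command,
        }

    return {
        "replicatedJobs": [
            job("launcher", 1, app.launcher_command(app.options)),
            job("workers", workers, app.worker_command(app.options)),
        ]
    }
```

In Go, the same split would likely be an interface with methods for the launcher and worker commands, plus one shared function that builds the JobSet from it.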

Timing options

We should provide a start / end time for the entire collection. For example, for storage (using FIO) the tool likely records the time itself, but this is probably not true for most metrics, and total duration would be an interesting (albeit simple) metric for comparison.
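A minimal Python sketch of collection-level timing, assuming the collection can be run as a single command: record wall-clock start and end (for the log) plus a monotonic duration (immune to clock adjustments). `run_timed` is a hypothetical helper, not an operator API.

```python
import subprocess
import time
from datetime import datetime, timezone


def run_timed(command: list[str]) -> dict:
    """Run one metric collection and record start / end times around the whole thing."""
    start_wall = datetime.now(timezone.utc)
    start = time.monotonic()
    proc = subprocess.run(command, capture_output=True, text=True)
    duration = time.monotonic() - start
    end_wall = datetime.now(timezone.utc)
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "start": start_wall.isoformat(),
        "end": end_wall.isoformat(),
        "duration_seconds": duration,
    }
```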

Metrics / apps to consider

These are important to the labs! If you'd like to see an app, metric, or other added, please comment here.

Unsure

More Workflow Based

  • parsl (demo for molecular design)
  • merlin (demo): too many steps / services to be considered a proxy app
  • fireworks (demo)
  • balsam (containers built, but part of the server seems buggy and/or proprietary; this is likely not going to be IT for orchestrating workflows)
  • mlcommons-deepcam (also very complex to actually set up; I stopped at the base container)
  • nextflow ML workflow example
  • snakemake bioscience example workflow
  • weave (demos)

In Progress / Attempted

Recent readings / tools for performance

operator needs pre-defined delineation of sections and settings

I'm writing a small Python parsing library for metric logs, and I realize we need:

  • a structured way to define different sections for splitting
  • a timestamp between each section's collection
  • a dump of options / settings at the beginning (we can get these from the spec, but it is better not to rely on it and to keep them with the log).
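A minimal Python sketch of the parsing side, assuming the metric entrypoint prints a JSON settings dump between two marker lines and then a marker line with an ISO timestamp before each section. The marker strings and `parse_metric_log` are hypothetical; the operator does not necessarily emit these.

```python
import json
from datetime import datetime

# Hypothetical delimiters the entrypoint would print; not the operator's actual format.
META_START, META_END = "METADATA START", "METADATA END"
SECTION_MARK = "METRICS SECTION"


def parse_metric_log(text: str) -> dict:
    """Split a metric log into a settings dump and timestamped sections."""
    settings, sections = {}, []
    meta_lines, in_meta = [], False
    for line in text.splitlines():
        if line == META_START:
            in_meta = True
        elif line == META_END:
            in_meta = False
            settings = json.loads("\n".join(meta_lines))
        elif in_meta:
            meta_lines.append(line)
        elif line.startswith(SECTION_MARK):
            # e.g. "METRICS SECTION 2024-01-01T00:00:00+00:00"
            stamp = line[len(SECTION_MARK):].strip()
            sections.append({"timestamp": datetime.fromisoformat(stamp), "lines": []})
        elif sections:
            sections[-1]["lines"].append(line)
        # Lines outside metadata and before the first section are ignored.
    return {"settings": settings, "sections": sections}
```

Because each section carries its own timestamp, the time between collections is just the difference of adjacent `timestamp` values.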

Devise registry strategy

I should be able to search for and view metrics by type, and get a description / link to more information. Ideally this could be derived via another command provided by the operator that parses metadata.
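A minimal Python sketch of such a registry, assuming each metric registers a small metadata record. The categories (`io`, `app`) are guessed from the metric name prefixes used elsewhere on this page; `MetricInfo`, `register`, and `search` are hypothetical, and a command provided by the operator could print the results of `search`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricInfo:
    name: str
    type: str  # assumed categories, e.g. "io" or "app"
    description: str
    url: Optional[str] = None  # link to more information, if any


REGISTRY: dict[str, MetricInfo] = {}


def register(info: MetricInfo) -> None:
    """Add a metric's metadata, refusing duplicate names."""
    if info.name in REGISTRY:
        raise ValueError(f"duplicate metric: {info.name}")
    REGISTRY[info.name] = info


def search(metric_type: Optional[str] = None, text: str = "") -> list[MetricInfo]:
    """Filter by type and/or a case-insensitive substring of the name or description."""
    needle = text.lower()
    return sorted(
        (
            m for m in REGISTRY.values()
            if (metric_type is None or m.type == metric_type)
            and (needle in m.name.lower() or needle in m.description.lower())
        ),
        key=lambda m: m.name,
    )
```

For example, after registering `io-sysstat`, `app-kripke`, and `app-pennant`, calling `search("app")` would list the two app metrics in name order.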

Addons to create (or add, lol)

  • Timing addon: should prefix the command with time (assumes time is available in the container)
  • Commands addon: should allow arbitrary post-commands to be appended to any entrypoint
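Both addons can be modeled as functions that take an entrypoint and return a modified copy, so they compose in any order. A minimal Python sketch, where `Entrypoint`, `timing_addon`, and `commands_addon` are hypothetical names. Note that `time` is a keyword in bash, but under a plain `sh` it has to resolve to a binary such as `/usr/bin/time`, which is why the container assumption matters.

```python
from dataclasses import dataclass, field


@dataclass
class Entrypoint:
    command: str
    pre: list[str] = field(default_factory=list)
    post: list[str] = field(default_factory=list)

    def script(self) -> str:
        """Render the entrypoint as newline-separated shell lines."""
        return "\n".join([*self.pre, self.command, *self.post])


def timing_addon(ep: Entrypoint) -> Entrypoint:
    """Prefix the main command with `time` (assumes `time` exists in the container)."""
    return Entrypoint(f"time {ep.command}", list(ep.pre), list(ep.post))


def commands_addon(ep: Entrypoint, post_commands: list[str]) -> Entrypoint:
    """Append arbitrary post-commands after the main command."""
    return Entrypoint(ep.command, list(ep.pre), [*ep.post, *post_commands])
```

Returning copies rather than mutating keeps the original entrypoint reusable, for example when the same metric is rendered with and without addons.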