converged-computing / metrics-operator
Testing designs for a benchmarking operator (in experimental mode!)
Home Page: https://converged-computing.github.io/metrics-operator/
License: MIT License
The table is getting long! I think (for long descriptions) it would be good to find a way to collapse rows. I haven't looked much into it yet but I suspect it is possible. https://stackoverflow.com/questions/57550993/datatable-button-expand-collapse-row-jquery
I should be able to search for and view metrics by type, and get a description / link to more information. Ideally this could be derived via another command provided by the operator that parses metadata.
I'm writing a small Python parsing library for metric logs, and I realize we need:
My original design assumed these would be globally relevant, but I don't think that's the case. They should be metric-specific options instead, so that users are not misled into thinking they apply across metrics (they do not).
We want to be able to assign hwloc metrics to run on specific nodes, so we need the nodeSelector of the pod exposed.
We want to be able to run a Flux Operator application and measure metrics for it.
Right now volumes are added to all pods in the set; we need a way to select which pods receive them.
Love the logo!
These are important to the labs! If you'd like to see an app, metric, or other added, please comment here.
-lpmpi
) so not sure would work.
Recent readings / tools for performance
We should provide a start / end time for the entire collection. E.g., for storage (using FIO) it's likely the tool collects the time, but this likely isn't the case for most, and it would be an interesting (albeit simple) comparison metric.
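One way to picture the start / end time idea is a thin wrapper that records timestamps around whatever command the metric runs. This is only a sketch in Python: the function name `run_timed` and the record fields are hypothetical and not part of the operator, and the command here is a stand-in for a real metric entrypoint.

```python
import subprocess
import sys
import time
from datetime import datetime, timezone


def run_timed(command):
    """Run a command and record wall-clock start/end times around it."""
    start = datetime.now(timezone.utc)
    tick = time.monotonic()  # monotonic clock, used only for the duration
    result = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.monotonic() - tick
    end = datetime.now(timezone.utc)
    return {
        "command": command,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "duration_seconds": elapsed,
        "returncode": result.returncode,
        "output": result.stdout,
    }


# Stand-in for a metric entrypoint: a trivial Python one-liner.
record = run_timed([sys.executable, "-c", "print('hello')"])
```

Because the wrapper does not depend on the tool being measured, the same start, end, and duration fields would be available for every metric, including those whose tools do not report timing themselves.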
For the app-* metrics, I'm starting to see common patterns: there is some number of custom options, and then custom logic to derive entrypoints for a launcher and one or more workers. But the code files are getting very redundant! I'm wondering if there is some way (that would work within the limits of Go interfaces) to have common JobSet patterns. In this case the launcher / worker would be a template that has the rest populated by a simpler struct.
This will allow us to do 1:1 mapping of nodes to pods. https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#namespace-selector. Likely we want a variable to control this when creating the replicated job, JobSet, etc.
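The linked Kubernetes page covers pod affinity and anti-affinity, so the sketch below assumes the 1:1 mapping would come from a required pod anti-affinity rule keyed on the node hostname. That is an assumption, since the issue does not name the mechanism. The helper `one_pod_per_node`, the `app: metrics` label, and the container entry are hypothetical; the field names are the standard pod spec ones. It is written in Python with plain dictionaries purely for illustration.

```python
import copy


def one_pod_per_node(pod_spec, match_labels):
    """Return a copy of pod_spec with a required anti-affinity rule, so that
    pods carrying match_labels are never scheduled onto the same node."""
    rule = {
        "labelSelector": {"matchLabels": dict(match_labels)},
        # Standard node label: one topology domain per node.
        "topologyKey": "kubernetes.io/hostname",
    }
    updated = copy.deepcopy(pod_spec)
    affinity = updated.setdefault("affinity", {})
    anti = affinity.setdefault("podAntiAffinity", {})
    anti.setdefault(
        "requiredDuringSchedulingIgnoredDuringExecution", []
    ).append(rule)
    return updated


# Hypothetical pod spec and label, for illustration only.
base = {"containers": [{"name": "metric", "image": "busybox"}]}
spec = one_pod_per_node(base, {"app": "metrics"})
```

A boolean option on the metric set could decide whether a helper like this is applied when the replicated job is built.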
If we want to be able to reproduce a run, we could arguably generate YAML and save it to the logs: https://github.com/flux-framework/flux-operator/blob/b54246feaba2ca7abeca62efd42accbbaacff13e/controllers/flux/logging.go
OR we could provide a means to do this via Python (probably the better idea).
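A rough sketch of the Python route: write the spec into the log between delimiters, then recover it later from the log text. Everything here is hypothetical (the delimiter strings, the function names, and the example spec), and it serializes to JSON using only the standard library, which YAML 1.2 parsers should also accept; a YAML library could be swapped in.

```python
import json

# Hypothetical delimiters marking where the spec sits inside a log.
SPEC_START = "=== SPEC START ==="
SPEC_END = "=== SPEC END ==="


def dump_spec(spec):
    """Serialize a spec so it can be printed into a log."""
    body = json.dumps(spec, indent=2, sort_keys=True)
    return f"{SPEC_START}\n{body}\n{SPEC_END}"


def load_spec(log_text):
    """Recover the spec from log text that contains a dumped block."""
    start = log_text.index(SPEC_START) + len(SPEC_START)
    end = log_text.index(SPEC_END, start)
    return json.loads(log_text[start:end])


# Hypothetical spec, for illustration only.
example_spec = {"kind": "ExampleSpec", "metrics": [{"name": "example-metric"}]}
log = "line before\n" + dump_spec(example_spec) + "\nline after\n"
recovered = load_spec(log)
```

The recovered dictionary could then be written back out as a manifest to rerun the same configuration.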