Comments (8)
As this is blocking all Katib PR merges, we should work on this ASAP.
@andreyvelich @johnugeorge WDYT?
from katib.
Yes, we should remove the MXNet example from Katib Trials.
Maybe we could update this Katib PyTorch FashionMNIST example with the same PyTorch example as here: https://github.com/kubeflow/training-operator/blob/master/examples/sdk/create-pytorchjob-from-func.ipynb
But we need to see how fast this training runs in the Katib E2Es with the amount of resources that we have in GitHub Actions.
@andreyvelich Why do we need to update this with the training-operator one? Could we just use https://github.com/kubeflow/katib/tree/master/examples/v1beta1/trial-images/pytorch-mnist?
@tenzen-y Do we want to remove training evaluation to reduce training time?
Also, maybe we should update the PyTorch version to improve performance?
https://github.com/kubeflow/katib/blob/master/examples/v1beta1/trial-images/pytorch-mnist/requirements.txt#L2
Do we want to remove training evaluation to reduce training time?
@andreyvelich I'm OK either way. I'm just wondering if reusing the existing example would be better.
Also, maybe we should update the PyTorch version to improve performance?
https://github.com/kubeflow/katib/blob/master/examples/v1beta1/trial-images/pytorch-mnist/requirements.txt#L2
I agree with updating the PyTorch version, but I think we can handle version updates in a separate issue.
Sure, let's see how much time the E2Es take if we just reuse this example instead of the MXNet training.
It makes sense.
/assign