
opni-training-controller

Training Controller Service

Run on k8s cluster

Prerequisites:

  • The cluster must have at least one GPU node (preferably an NVIDIA K80 or newer), at least two CPUs, and at least 10 GiB of memory.
  • Make sure the appropriate RBAC is set up.
* To set up MinIO:
helm install --set accessKey=myaccesskey,secretKey=mysecretkey minio minio/minio
* To set up RBAC:
kubectl apply -f rbac.yaml
* To install the NVIDIA device plugin (it exposes the GPUs to Kubernetes; the NVIDIA driver itself must already be installed on the GPU nodes):
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.6.0/nvidia-device-plugin.yml
* To deploy the training-controller service:
kubectl apply -f training_controller.yaml
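As a quick sanity check before deploying, the cluster's allocatable resources can be compared against the prerequisites above. The following is a minimal sketch using only the Python standard library. It assumes you feed it the JSON from `kubectl get nodes -o json`, that GPUs are advertised as the `nvidia.com/gpu` resource (which is what the NVIDIA device plugin registers), and that the CPU and memory minimums apply to the cluster as a whole rather than per node. The function names are hypothetical and not part of this repository.

```python
import json
import subprocess
from decimal import Decimal

# Kubernetes quantity suffixes: binary (Ki, Mi, ...), decimal (k, M, ...) and milli (m).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as '3900m' or '16Gi' to a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def check_prerequisites(nodes: dict, min_cpus: int = 2, min_memory_gib: int = 10) -> list:
    """Return human-readable unmet requirements; an empty list means the cluster looks OK."""
    gpus = cpus = memory = Decimal(0)
    for node in nodes.get("items", []):
        alloc = node.get("status", {}).get("allocatable", {})
        gpus += parse_quantity(alloc.get("nvidia.com/gpu", "0"))
        cpus += parse_quantity(alloc.get("cpu", "0"))
        memory += parse_quantity(alloc.get("memory", "0"))

    problems = []
    if gpus < 1:
        problems.append("no allocatable nvidia.com/gpu (is the device plugin running?)")
    if cpus < min_cpus:
        problems.append(f"{cpus} allocatable CPUs, need at least {min_cpus}")
    if memory < min_memory_gib * 2**30:
        problems.append(f"{memory / 2**30:.1f} GiB allocatable memory, need at least {min_memory_gib}")
    return problems


def load_nodes() -> dict:
    """Fetch node objects with kubectl (requires a working kubeconfig)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

For example, `print(check_prerequisites(load_nodes()) or "OK")` prints either the unmet requirements or `OK`.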


Methodology

  • The training controller service subscribes to the NATS subject "train".
  • When a message arrives on this subject, the controller starts the training workflow.
  • It first fetches from Elasticsearch the logs that the NuLog training job will use.
  • The NuLog model is then trained in a Kubernetes Job.
  • Once NuLog training has completed, the controller publishes a message to NATS indicating that a new model is ready to use.
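The workflow described in the Methodology list can be sketched as a single message handler. This is not the controller's actual code: the dependencies (`fetch_logs`, `run_training_job`, `publish`) are hypothetical callables injected by the caller so the control flow can be shown without a live NATS server, Elasticsearch, or cluster. The subject and body of the "model ready" message are placeholders, because this README does not specify them.

```python
import json
from typing import Callable, Iterable, List


def handle_train_message(
    data: bytes,
    fetch_logs: Callable[[int, int], Iterable[str]],
    run_training_job: Callable[[List[str]], None],
    publish: Callable[[str, bytes], None],
    ready_subject: str,
) -> None:
    """Hypothetical handler for one message received on the "train" subject."""
    payload = json.loads(data)
    # This sketch only handles NuLog; any other model_to_train value is ignored.
    if payload.get("model_to_train") != "nulog":
        return

    # 1. Collect the logs for every requested time window (Elasticsearch in the real service).
    logs: List[str] = []
    for interval in payload["time_intervals"]:
        logs.extend(fetch_logs(interval["start_ts"], interval["end_ts"]))

    # 2. Train NuLog on those logs (a Kubernetes Job in the real service);
    #    assumed here to block until training has finished.
    run_training_job(logs)

    # 3. Announce the new model. The subject and message body are placeholders.
    publish(ready_subject, json.dumps({"model": "nulog", "status": "ready"}).encode())
```

Injecting the dependencies keeps the ordering of the steps testable on its own. With a real NATS client, the handler would be called from the subscription callback for "train" with the received message bytes.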

The payload sent to the "train" NATS subject must have this format (start_ts and end_ts are Unix timestamps in nanoseconds):

    payload = {"model_to_train": "nulog","time_intervals": [{"start_ts": 1617039360000000000, "end_ts": 1617039450000000000}, {"start_ts": 1617039510000000000, "end_ts": 1617039660000000000}]}
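Because the timestamps are nanoseconds since the Unix epoch (1617039360000000000 is 2021-03-29 17:36:00 UTC), it is easy to get the units wrong when writing a payload by hand. Below is a minimal sketch that builds the payload from timezone-aware datetimes using only the Python standard library. `to_ns` and `make_train_payload` are hypothetical helper names, and rejecting intervals that do not end after they start is an assumption about what the controller expects.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch for a timezone-aware datetime (exact integer math)."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def make_train_payload(intervals) -> str:
    """Build the JSON payload for the "train" subject from (start, end) datetime pairs."""
    time_intervals = []
    for start, end in intervals:
        if end <= start:
            raise ValueError(f"interval must end after it starts: {start} -> {end}")
        time_intervals.append({"start_ts": to_ns(start), "end_ts": to_ns(end)})
    return json.dumps({"model_to_train": "nulog", "time_intervals": time_intervals})
```

The returned string can be passed as the message body to nats-pub, as in the command below. Integer division on `timedelta` avoids the rounding errors that `datetime.timestamp()` (a float) would introduce at nanosecond scale.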

To send a training signal manually, use nats-box:

kubectl run -i --rm --tty nats-box --image=synadia/nats-box --restart=Never
nats-pub -s nats://nats_client:[email protected]:4222 train '{"model_to_train": "nulog","time_intervals": [{"start_ts": 1619661600000000000, "end_ts": 1619671569000000000}]}'
  • You can then list the pods and jobs in your cluster (for example with kubectl get pods,jobs) to verify that the NuLog model is being trained.
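Instead of reading the `kubectl get pods,jobs` table by eye, the training progress can be read from the Job status fields. The sketch below classifies Job objects taken from `kubectl get jobs -o json` using their `Complete`/`Failed` conditions and `active` pod count. The name prefix used to pick out the training job is an assumption, since this README does not say how the job is named.

```python
def job_state(job: dict) -> str:
    """Classify a batch/v1 Job object as 'complete', 'failed', 'active' or 'unknown'."""
    status = job.get("status", {})
    # Terminal states are reported as conditions whose status is "True".
    for cond in status.get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"].lower()
    # 'active' counts the Job's pods that are pending or running.
    if status.get("active"):
        return "active"
    # No active pods and no terminal condition reported yet.
    return "unknown"


def training_jobs(job_list: dict, name_prefix: str) -> dict:
    """Map job name -> state for jobs whose name starts with a (hypothetical) prefix."""
    return {
        item["metadata"]["name"]: job_state(item)
        for item in job_list.get("items", [])
        if item["metadata"]["name"].startswith(name_prefix)
    }
```

For example, `training_jobs(json.loads(out), "nulog")`, where `out` is the output of `kubectl get jobs -o json` and `"nulog"` is a guessed prefix.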

Contributing

We use pre-commit for auto-formatting, linting, and checking imports. Refer to the pre-commit installation guide, or simply run pip install pre-commit, and then activate it for this repo. Once activated, it lints and formats the code every time you make a git commit. Changes are made in place, so if the hooks modify any files, you need to stage those changes manually and commit again.

# Install
pip install pre-commit

# Install the git hook so the checks run automatically on every "git commit"
pre-commit install

# (Optional) Manually run against all files
pre-commit run --all-files

Contributors

dbason, amartc, kralicky, sanjay920, tybalex, galal-hussein
