gala-r / k8s-device-plugin-1

This project forked from rocm/k8s-device-plugin

Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster

License: Apache License 2.0

AMD GPU device plugin for Kubernetes

Introduction

This is a Kubernetes device plugin implementation that enables the registration of AMD GPUs in a container cluster for compute workloads. With the appropriate hardware and this plugin deployed in your Kubernetes cluster, you will be able to run jobs that require AMD GPUs.

More information about RadeonOpenCompute (ROCm)

Prerequisites

Limitations

  • This plugin targets Kubernetes v1.18+.

Deployment

The device plugin needs to run on every node equipped with an AMD GPU. The simplest way to do so is to create a Kubernetes DaemonSet, which runs a copy of a pod on all (or some) nodes in the cluster. A pre-built Docker image is available on DockerHub for use with your DaemonSet, and this repository includes a predefined YAML file named k8s-ds-amdgpu-dp.yaml. You can create a DaemonSet in your Kubernetes cluster by running this command:

$ kubectl create -f k8s-ds-amdgpu-dp.yaml

or pull it directly from the web using:

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

To enable the experimental device health check, use k8s-ds-amdgpu-dp-health.yaml instead, after setting --allow-privileged=true for kube-apiserver and the kubelet.
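Once the DaemonSet is running, each node with a usable GPU should advertise the GPU as an extended resource in its allocatable capacity. The Go program below is a minimal sketch for checking this, not part of this repository: it reads the output of `kubectl get nodes -o json` on standard input and lists the nodes that advertise GPUs. It assumes the resource name is `amd.com/gpu` and decodes only the node fields it needs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// gpuResource is the extended resource name assumed to be advertised by the plugin.
const gpuResource = "amd.com/gpu"

// nodeList models only the parts of `kubectl get nodes -o json` output used here.
type nodeList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Allocatable map[string]string `json:"allocatable"`
		} `json:"status"`
	} `json:"items"`
}

// gpuAllocatable maps node name -> allocatable GPU quantity for every node
// that advertises gpuResource; nodes without it are omitted.
func gpuAllocatable(data []byte) (map[string]string, error) {
	var nl nodeList
	if err := json.Unmarshal(data, &nl); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, n := range nl.Items {
		if q, ok := n.Status.Allocatable[gpuResource]; ok {
			out[n.Metadata.Name] = q
		}
	}
	return out, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil || len(data) == 0 {
		fmt.Fprintln(os.Stderr, "usage: kubectl get nodes -o json | go run main.go")
		return
	}
	gpus, err := gpuAllocatable(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse node list:", err)
		os.Exit(1)
	}
	names := make([]string, 0, len(gpus))
	for name := range gpus {
		names = append(names, name)
	}
	sort.Strings(names) // print nodes in a stable order
	for _, name := range names {
		fmt.Printf("%s\t%s=%s\n", name, gpuResource, gpus[name])
	}
	if len(names) == 0 {
		fmt.Println("no node advertises", gpuResource)
	}
}
```

For example, `kubectl get nodes -o json | go run main.go` prints one line per GPU node; an empty result usually means the plugin has not registered with the kubelet on any node.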

Example workload

You can restrict work to nodes with a GPU by adding resources.limits to the pod definition. An example pod definition is provided in example/pod/alexnet-gpu.yaml. This pod runs the timing benchmark for AlexNet on an AMD GPU and then goes to sleep. You can create the pod by running:

$ kubectl create -f alexnet-gpu.yaml

or

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/example/pod/alexnet-gpu.yaml

and then check the pod status by running:

$ kubectl describe pods

After the pod is created and running, you can see the benchmark result by running:

$ kubectl logs alexnet-tf-gpu-pod alexnet-tf-gpu-container

For comparison, an example pod definition of running the same benchmark with CPU is provided in example/pod/alexnet-cpu.yaml.

Labelling nodes with additional GPU properties

Please see AMD GPU Kubernetes Node Labeller for details. An example configuration is in k8s-ds-amdgpu-labeller.yaml:

$ kubectl create -f k8s-ds-amdgpu-labeller.yaml

or

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-labeller.yaml
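Node labels are typically consumed through a pod's nodeSelector, which only places the pod on nodes that carry every listed label with exactly the listed value. The Go sketch below models that matching rule in plain code to show how labelled properties narrow the set of eligible nodes. The label key `example.com/gpu.family` and its values are hypothetical placeholders, not necessarily the labels the labeller sets.

```go
package main

import (
	"fmt"
	"sort"
)

// matchesNodeSelector reports whether a node with the given labels satisfies
// a nodeSelector: every selector key must be present on the node with exactly
// the same value. An empty selector matches every node.
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// eligibleNodes returns, sorted by name, the nodes whose labels satisfy selector.
func eligibleNodes(nodes map[string]map[string]string, selector map[string]string) []string {
	var out []string
	for name, labels := range nodes {
		if matchesNodeSelector(labels, selector) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical label key and values, for illustration only.
	nodes := map[string]map[string]string{
		"node-a": {"example.com/gpu.family": "family-x"},
		"node-b": {"example.com/gpu.family": "family-y"},
		"node-c": {},
	}
	selector := map[string]string{"example.com/gpu.family": "family-x"}
	fmt.Println(eligibleNodes(nodes, selector)) // prints [node-a]
}
```

Note that nodeSelector only supports exact matches; range-style or set-based constraints need node affinity instead.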

Notes

  • This plugin uses Go modules for dependency management
  • Please consult the Dockerfile for how to build and use this plugin independently of a Docker image

TODOs

  • Add proper GPU health check (health check without /dev/kfd access.)
