Flex GPU Device Plugin

A Kubernetes device plugin that lets multiple containers share NVIDIA GPUs.

⚠ This project is intended as a code example. If you want to use it in production, I am happy to provide support; feel free to contact me.

Related project: WLBF/flex-gpu-scheduler-plugin

Overview

The Flex GPU device plugin detects NVIDIA GPUs and registers two types of resource for each GPU.

  • nvidia.flex.com/gpu is for exclusive GPU usage, as in NVIDIA/k8s-device-plugin.

  • nvidia.flex.com/memory is for shared GPU usage. For now, the GPU memory resource unit is GiB.
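The two resource types above can be pictured with a small Go sketch. It is illustrative only: the gpuInfo type, the buildDeviceIDs function, and the device ID format are hypothetical names invented here, and the idea of advertising one device ID per GiB of GPU memory is an assumption about how such a plugin might count memory, not a description of this project's actual implementation.

```go
package main

import "fmt"

// gpuInfo is a hypothetical description of one detected GPU.
type gpuInfo struct {
	Index     int // GPU index on the node
	MemoryGiB int // total GPU memory, in GiB
}

// buildDeviceIDs returns the device IDs that would be advertised for the
// exclusive resource (one ID per GPU) and for the shared memory resource
// (assumed here: one ID per GiB of memory on each GPU).
func buildDeviceIDs(gpus []gpuInfo) (gpuIDs, memoryIDs []string) {
	for _, g := range gpus {
		gpuIDs = append(gpuIDs, fmt.Sprintf("gpu-%d", g.Index))
		for m := 0; m < g.MemoryGiB; m++ {
			memoryIDs = append(memoryIDs, fmt.Sprintf("gpu-%d-mem-%d", g.Index, m))
		}
	}
	return gpuIDs, memoryIDs
}

func main() {
	// Three GPUs with 8 GiB each.
	gpus := []gpuInfo{
		{Index: 0, MemoryGiB: 8},
		{Index: 1, MemoryGiB: 8},
		{Index: 2, MemoryGiB: 8},
	}
	gpuIDs, memoryIDs := buildDeviceIDs(gpus)
	fmt.Printf("nvidia.flex.com/gpu: %d\n", len(gpuIDs))
	fmt.Printf("nvidia.flex.com/memory: %d\n", len(memoryIDs))
}
```

Under this assumption, a node with three 8 GiB GPUs would report 3 units of nvidia.flex.com/gpu and 24 units of nvidia.flex.com/memory.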

Example

The kubectl describe output below shows that node v124-worker-0 has 3 GPUs with 8 GiB of memory each, 24 GiB in total.

# kubectl describe no v124-worker-0

...
Capacity:
  cpu:                     2
  ephemeral-storage:       4893836Ki
  hugepages-1Gi:           0
  hugepages-2Mi:           0
  memory:                  4026052Ki
  nvidia.flex.com/gpu:     3
  nvidia.flex.com/memory:  24
  pods:                    110
Allocatable:
  cpu:                     2
  ephemeral-storage:       4510159251
  hugepages-1Gi:           0
  hugepages-2Mi:           0
  memory:                  3923652Ki
  nvidia.flex.com/gpu:     3
  nvidia.flex.com/memory:  24
  pods:                    110
...

Install

The device plugin can be installed with its Helm chart. For development, use values.dev.yaml instead of values.prod.yaml.

helm install flex-gpu-device-plugin -f ./manifests/flexgpu/values.prod.yaml ./manifests/flexgpu

flex-gpu-device-plugin's People

Contributors

wlbf

flex-gpu-device-plugin's Issues

Identify pod by allocated device IDs

In this case, the device IDs in the Allocate RPC request do not carry the actual GPU index. We need to identify the pod from its device IDs, then extract the index value from the pod's nvidia.flex.com/index annotation. The pod lookup could go through the API server or the PodResources API.

// - Allocate is expected to be called during pod creation since allocation
//   failures for any container would result in pod startup failure.
// - Allocate allows kubelet to exposes additional artifacts in a pod's
//   environment as directed by the plugin.
// - Allocate allows Device Plugin to run device specific operations on
//   the Devices requested
type AllocateRequest struct {
	ContainerRequests    []*ContainerAllocateRequest `protobuf:"bytes,1,rep,name=container_requests,json=containerRequests,proto3" json:"container_requests,omitempty"`
	XXX_NoUnkeyedLiteral struct{}                    `json:"-"`
	XXX_sizecache        int32                       `json:"-"`
}

type ContainerAllocateRequest struct {
	DevicesIDs           []string `protobuf:"bytes,1,rep,name=devices_ids,json=devicesIds,proto3" json:"devices_ids,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}
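The lookup described in this issue could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: podDevices, findPodByDeviceIDs, and gpuIndex are hypothetical names, the pod list stands in for whatever the API server or PodResources API would return, and the annotation value is assumed to be a single integer. Only the nvidia.flex.com/index annotation key comes from the issue text.

```go
package main

import (
	"fmt"
	"strconv"
)

// indexAnnotation is the annotation named in the issue above.
const indexAnnotation = "nvidia.flex.com/index"

// podDevices is a hypothetical, simplified view of a pod and the device IDs
// assigned to it, as a pod lookup might report them.
type podDevices struct {
	Namespace   string
	Name        string
	Annotations map[string]string
	DeviceIDs   []string
}

// findPodByDeviceIDs returns the first pod that owns every requested device ID.
func findPodByDeviceIDs(pods []podDevices, requested []string) (podDevices, bool) {
	if len(requested) == 0 {
		return podDevices{}, false
	}
	for _, p := range pods {
		owned := make(map[string]struct{}, len(p.DeviceIDs))
		for _, id := range p.DeviceIDs {
			owned[id] = struct{}{}
		}
		match := true
		for _, id := range requested {
			if _, ok := owned[id]; !ok {
				match = false
				break
			}
		}
		if match {
			return p, true
		}
	}
	return podDevices{}, false
}

// gpuIndex reads the GPU index from the pod annotation
// (assumed here to hold one integer).
func gpuIndex(p podDevices) (int, error) {
	raw, ok := p.Annotations[indexAnnotation]
	if !ok {
		return 0, fmt.Errorf("pod %s/%s has no %s annotation", p.Namespace, p.Name, indexAnnotation)
	}
	return strconv.Atoi(raw)
}

func main() {
	pods := []podDevices{
		{Namespace: "default", Name: "train-a",
			Annotations: map[string]string{indexAnnotation: "2"},
			DeviceIDs:   []string{"mem-10", "mem-11"}},
		{Namespace: "default", Name: "train-b",
			Annotations: map[string]string{indexAnnotation: "0"},
			DeviceIDs:   []string{"mem-3"}},
	}
	// Device IDs as they might arrive in ContainerAllocateRequest.DevicesIDs.
	if pod, ok := findPodByDeviceIDs(pods, []string{"mem-10", "mem-11"}); ok {
		idx, err := gpuIndex(pod)
		fmt.Println(pod.Name, idx, err)
	}
}
```

Whether the device IDs are already visible through the PodResources API at the time Allocate is called is exactly the open question in this issue, so the sketch only covers the matching step.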
