volcano-sh / devices
Device plugins for Volcano, e.g. GPU
License: Apache License 2.0
For k8s v1.17.9: UnexpectedAdmissionError due to lack of pod update verbs in the plugin's RBAC rules.
Warning UnexpectedAdmissionError 10s kubelet, amax-pcl Update plugin resources failed due to rpc error: code = Unknown desc = failed to update pod annotation pods "pod1" is forbidden: User "system:serviceaccount:kube-system:volcano-device-plugin" cannot update resource "pods" in API group "" in the namespace "default", which is unexpected.
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
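A minimal fix sketch, assuming the rule above belongs to the ClusterRole bound to the volcano-device-plugin service account: grant the verb named in the error message ("update") on pods.

- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "update"]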
Adding support for MIG devices. The pull request is still WIP; it is only meant for reviewing the general idea.
#20
What version of CUDA does vgpu support?
Same as issue #1181.
Is this a BUG REPORT or FEATURE REQUEST?:
Uncomment only one, leave it on its own line:
/kind feature
Currently, only one GPU device will be allocated, but in most scenarios we hope that two or more devices on the same server, or a few devices across servers, can be allocated to a pod.
Thank you!
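For illustration (a hedged sketch, reusing the volcano.sh/gpu-number resource name that appears elsewhere in this repo), a multi-device request would look like:

resources:
  limits:
    volcano.sh/gpu-number: 2   # two whole GPUs for one container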
I0615 08:06:10.942719 1 plugin.go:382] Allocate Response [&ContainerAllocateResponse{Envs:map[string]string{CUDA_DEVICE_MEMORY_LIMIT_0: 1024m,CUDA_DEVICE_MEMORY_SHARED_CACHE: /tmp/vgpu/6b5a834e-6fec-47d6-b629-0468cd18ba69.cache,NVIDIA_VISIBLE_DEVICES: GPU-d8794152-5506-fe60-be38-c6ff3d35dbf4,},Mounts:[]*Mount{&Mount{ContainerPath:/usr/local/vgpu/libvgpu.so,HostPath:/usr/local/vgpu/libvgpu.so,ReadOnly:true,},&Mount{ContainerPath:/etc/ld.so.preload,HostPath:/usr/local/vgpu/ld.so.preload,ReadOnly:true,},&Mount{ContainerPath:/tmp/vgpu,HostPath:/tmp/vgpu/containers/1a7defed-9fff-4feb-8921-45cc7ea253f7_vgpu2,ReadOnly:false,},&Mount{ContainerPath:/tmp/vgpulock,HostPath:/tmp/vgpulock,ReadOnly:false,},},Devices:[]*DeviceSpec{},Annotations:map[string]string{},}]
I0615 08:06:11.002096 1 util.go:229] TrySuccess:
I0615 08:06:11.002123 1 util.go:235] AllDevicesAllocateSuccess releasing lock
I0615 08:06:11.338349 1 plugin.go:309] Allocate [&ContainerAllocateRequest{DevicesIDs:[GPU-d8794152-5506-fe60-be38-c6ff3d35dbf4-5],}]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1092953]
goroutine 68 [running]:
volcano.sh/k8s-device-plugin/pkg/plugin/vgpu4pd.(*NvidiaDevicePlugin).Allocate(0xc00038ec80, {0x14cfee0, 0xc0003ee510}, 0xc0005eca00)
/go/src/volcano.sh/devices/pkg/plugin/vgpu4pd/plugin.go:326 +0x353
k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1._DevicePlugin_Allocate_Handler({0x12920a0?, 0xc00038ec80}, {0x14cfee0, 0xc0003ee510}, 0xc00062e060, 0x0)
/go/pkg/mod/k8s.io/[email protected]/pkg/apis/deviceplugin/v1beta1/api.pb.go:1192 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003a41a0, {0x14d4df8, 0xc0004cc000}, 0xc0000b2100, 0xc000367aa0, 0x1ce57f8, 0x0)
/go/pkg/mod/google.golang.org/[email protected]/server.go:1082 +0xcab
google.golang.org/grpc.(*Server).handleStream(0xc0003a41a0, {0x14d4df8, 0xc0004cc000}, 0xc0000b2100, 0x0)
/go/pkg/mod/google.golang.org/[email protected]/server.go:1405 +0xa13
google.golang.org/grpc.(*Server).serveStreams.func1.1()
/go/pkg/mod/google.golang.org/[email protected]/server.go:746 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/go/pkg/mod/google.golang.org/[email protected]/server.go:744 +0xea
Currently, in the device plugin's Allocate RPC, we need to find the candidate pod according to the containers in the request.
If there are multiple GPU containers in one pod, there will obviously be logic problems when finding the candidate pod: the code below matches only the first container's device count (see the sketch after the snippet).
func (m *NvidiaDevicePlugin) Allocate(ctx context.Context, reqs *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	var reqCount uint
	for _, req := range reqs.ContainerRequests {
		reqCount += uint(len(req.DevicesIDs))
	}
	responses := pluginapi.AllocateResponse{}
	firstContainerReq := reqs.ContainerRequests[0]
	firstContainerReqDeviceCount := uint(len(firstContainerReq.DevicesIDs))
	availablePods := podSlice{}
	pendingPods, err := m.kubeInteractor.GetPendingPodsOnNode()
	if err != nil {
		return nil, err
	}
	for _, pod := range pendingPods {
		current := pod
		if IsGPURequiredPod(&current) && !IsGPUAssignedPod(&current) && !IsShouldDeletePod(&current) {
			availablePods = append(availablePods, &current)
		}
	}
	sort.Sort(availablePods)
	var candidatePod *v1.Pod
	for _, pod := range availablePods {
		for i, c := range pod.Spec.Containers {
			if !IsGPURequiredContainer(&c) {
				continue
			}
			// Only the first container request's device count is compared here.
			if GetGPUResourceOfContainer(&pod.Spec.Containers[i]) == firstContainerReqDeviceCount {
				klog.Infof("Got candidate Pod %s(%s), the device count is: %d", pod.UID, c.Name, firstContainerReqDeviceCount)
				candidatePod = pod
				goto Allocate
			}
		}
	}
	....
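One hedged direction (a sketch, not the project's actual fix; it reuses the helpers from the snippet above): match the multiset of per-container device counts in the request against the pod's GPU containers, instead of keying only on the first container.

// countsMatch reports whether the per-container device counts in the
// allocate request can be matched one-to-one against the GPU containers
// of the candidate pod.
func countsMatch(reqs *pluginapi.AllocateRequest, pod *v1.Pod) bool {
	// Tally how many containers in the request ask for each device count.
	want := map[uint]int{}
	for _, req := range reqs.ContainerRequests {
		want[uint(len(req.DevicesIDs))]++
	}
	// Subtract the pod's GPU containers; a perfect match zeroes the tally.
	for i := range pod.Spec.Containers {
		c := &pod.Spec.Containers[i]
		if !IsGPURequiredContainer(c) {
			continue
		}
		want[GetGPUResourceOfContainer(c)]--
	}
	for _, n := range want {
		if n != 0 {
			return false
		}
	}
	return true
}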
As we're going to support different devices, we prefer to isolate the code by directory and share the common utils in a package, e.g. pkg/common. One possible layout is sketched below.
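For illustration only (the directory names beyond pkg/common and the existing plugin packages are assumptions):

pkg/
  common/          # shared helpers: kubelet client, pod lookup, logging
  plugin/
    nvidia/        # whole-GPU and GPU-share device plugin
    vgpu/          # vGPU device plugin
cmd/
  device-plugin/   # main entry wiring the selected plugins together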
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: ocr-job
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: ocr
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - image: ai-grpc-ocr:v1.4
              name: ocr
              resources:
                requests:
                  volcano.sh/gpu-number: 1
                  #nvidia.com/gpu: 1
                limits:
                  volcano.sh/gpu-number: 1
                  #nvidia.com/gpu: 1
          restartPolicy: Never
    - replicas: 1
      name: ocr-2
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - image: ai-grpc-ocr:v1.4
              name: ocr
              resources:
                requests:
                  volcano.sh/gpu-number: 1
                  #nvidia.com/gpu: 1
                limits:
                  volcano.sh/gpu-number: 1
                  #nvidia.com/gpu: 1
          restartPolicy: Never
Log:
$ k get no
NAME STATUS ROLES AGE VERSION
10.122.2.14 Ready <none> 42d v1.26.1
10.122.2.26 Ready <none> 154m v1.26.1
10.122.2.37 Ready <none> 44m v1.26.1
$ k get po
NAME READY STATUS RESTARTS AGE
ocr-job-ocr-0 0/1 Pending 0 4m33s
ocr-job-ocr-2-0 0/1 Pending 0 4m33s
volcano-admission-7f76fc8cf4-rcp85 1/1 Running 0 35d
volcano-admission-init-785w6 0/1 Completed 0 35d
volcano-controllers-6875c95bd7-zs49k 1/1 Running 0 35d
volcano-scheduler-6dcf84d54d-gcwxm 0/1 CrashLoopBackOff 9 (80s ago) 58m
$ k get po
NAME READY STATUS RESTARTS AGE
ocr-job-ocr-0 0/1 Pending 0 41s
ocr-job-ocr-2-0 0/1 Pending 0 41s
volcano-admission-7f76fc8cf4-rcp85 1/1 Running 0 35d
volcano-admission-init-785w6 0/1 Completed 0 35d
volcano-controllers-6875c95bd7-zs49k 1/1 Running 0 35d
volcano-scheduler-6dcf84d54d-zg4d2 0/1 CrashLoopBackOff 2 (25s ago) 4m20s
I0816 12:30:34.061174 1 allocate.go:180] There are <3> nodes for Job <volcano-system/ocr-job-11507a57-1b68-46ad-83bf-38e0c2d76f99>
I0816 12:30:34.061251 1 predicate_helper.go:74] Predicates failed for task <volcano-system/ocr-job-ocr-0> on node <10.122.2.14>: task volcano-system/ocr-job-ocr-0 on node 10.122.2.14 fit failed: Insufficient volcano.sh/gpu-number
E0816 12:30:34.061385 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 334 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1caa960?, 0x32ac650})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00102cf70?})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1caa960, 0x32ac650})
/usr/local/go/src/runtime/panic.go:884 +0x212
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.getDevicesIdleGPUs(...)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/share.go:64
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.predicateGPUbyNumber(0xc000dadac0?, 0x0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/share.go:166 +0x41
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.checkNodeGPUNumberPredicate(0xc000c68cf0?, 0x0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/share.go:140 +0x3f
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.(*GPUDevices).FilterNode(0x1c8bec0?, 0xc000dadac0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/device_info.go:161 +0x157
volcano.sh/volcano/pkg/scheduler/plugins/predicates.(*predicatesPlugin).OnSessionOpen.func4(0xc000848be0, 0xc0004e0180)
/go/src/volcano.sh/volcano/pkg/scheduler/plugins/predicates/predicates.go:522 +0x16e4
volcano.sh/volcano/pkg/scheduler/framework.(*Session).PredicateFn(0xc001094000, 0xc00100df80?, 0x0?)
/go/src/volcano.sh/volcano/pkg/scheduler/framework/session_plugins.go:615 +0x1ce
volcano.sh/volcano/pkg/scheduler/actions/allocate.(*Action).Execute.func1(0xc000848be0, 0xc0004e0180)
/go/src/volcano.sh/volcano/pkg/scheduler/actions/allocate/allocate.go:106 +0x1cb
volcano.sh/volcano/pkg/scheduler/util.(*predicateHelper).PredicateNodes.func1(0xc0002045a0?)
/go/src/volcano.sh/volcano/pkg/scheduler/util/predicate_helper.go:73 +0x3a2
k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
/go/src/volcano.sh/volcano/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:90 +0x106
created by k8s.io/client-go/util/workqueue.ParallelizeUntil
/go/src/volcano.sh/volcano/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:76 +0x1d7
I0816 12:30:34.061465 1 statement.go:352] Discarding operations ...
I0816 12:30:34.061494 1 allocate.go:135] Try to allocate resource to Jobs in Queue <default>
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x15ab261]
goroutine 334 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00102cf70?})
/go/src/volcano.sh/volcano/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x1caa960, 0x32ac650})
/usr/local/go/src/runtime/panic.go:884 +0x212
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.getDevicesIdleGPUs(...)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/share.go:64
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.predicateGPUbyNumber(0xc000dadac0?, 0x0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/share.go:166 +0x41
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.checkNodeGPUNumberPredicate(0xc000c68cf0?, 0x0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/share.go:140 +0x3f
volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare.(*GPUDevices).FilterNode(0x1c8bec0?, 0xc000dadac0)
/go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/gpushare/device_info.go:161 +0x157
volcano.sh/volcano/pkg/scheduler/plugins/predicates.(*predicatesPlugin).OnSessionOpen.func4(0xc000848be0, 0xc0004e0180)
/go/src/volcano.sh/volcano/pkg/scheduler/plugins/predicates/predicates.go:522 +0x16e4
volcano.sh/volcano/pkg/scheduler/framework.(*Session).PredicateFn(0xc001094000, 0xc00100df80?, 0x0?)
/go/src/volcano.sh/volcano/pkg/scheduler/framework/session_plugins.go:615 +0x1ce
volcano.sh/volcano/pkg/scheduler/actions/allocate.(*Action).Execute.func1(0xc000848be0, 0xc0004e0180)
/go/src/volcano.sh/volcano/pkg/scheduler/actions/allocate/allocate.go:106 +0x1cb
volcano.sh/volcano/pkg/scheduler/util.(*predicateHelper).PredicateNodes.func1(0xc0002045a0?)
/go/src/volcano.sh/volcano/pkg/scheduler/util/predicate_helper.go:73 +0x3a2
k8s.io/client-go/util/workqueue.ParallelizeUntil.func1()
/go/src/volcano.sh/volcano/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:90 +0x106
created by k8s.io/client-go/util/workqueue.ParallelizeUntil
/go/src/volcano.sh/volcano/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:76 +0x1d7
Of the 3 nodes, 10.122.2.26 and 10.122.2.37 are GPU machines; 10.122.2.14 is a CPU machine.
After switching to the nvidia.com/gpu resource, scheduling fails outright. The cause is currently unknown.
Deployed in July, using the latest image.
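Given that the panic (getDevicesIdleGPUs, share.go:64) fires while the predicate runs against the CPU-only node, a nil guard in the gpushare predicate path looks like the missing piece. A hedged sketch, with hypothetical types standing in for volcano's own:

type GPUDevices struct {
	Device map[int]*GPUDevice // per-GPU bookkeeping; nil on CPU-only nodes
}

type GPUDevice struct {
	PodMap map[string]int // pods currently sharing this GPU
}

// getIdle returns the GPUs with no sharing pods. The nil checks are the
// point: the scheduler calls the predicate for every node, including
// CPU-only ones, and without the guard it dereferences nil exactly as in
// the trace above.
func (gs *GPUDevices) getIdle() []*GPUDevice {
	if gs == nil || gs.Device == nil {
		return nil
	}
	var idle []*GPUDevice
	for _, d := range gs.Device {
		if len(d.PodMap) == 0 {
			idle = append(idle, d)
		}
	}
	return idle
}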
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
Maybe a bug.
When I request one GPU from k8s (the YAML in #10), I find the volcano device plugin gives all of the node's GPUs to my pod.
The picture follows (showing /dev/ inside the container):
I don't know why.
k8s:1.17.3
volcano:v1.0.1
volcano-deviceplugin:1.0.0
docker:18.06.3-ce
os : ubuntu18.04
arch: x86
When volcano.sh/vgpu-memory is "12288" and volcano.sh/vgpu-number is "2", will the requested 12288 MiB of GPU memory be evenly allocated across the two cards?
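For reference (resource names taken from the question itself; this only illustrates the request, it does not answer it):

resources:
  limits:
    volcano.sh/vgpu-number: 2
    volcano.sh/vgpu-memory: 12288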
I found the example is missing the volcano.sh/gpu-index annotation.
If the volcano.sh/gpu-index annotation is not specified when allocating volcano.sh/gpu-memory, the pod will always fail to be created.
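For illustration, a hedged sketch of the annotation (the value here is hypothetical; in normal operation the Volcano scheduler is expected to write it at predicate time):

metadata:
  annotations:
    volcano.sh/gpu-index: "0"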
Request from #11.
Refer to the GPU share user doc (https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_gpu_sharing.md).
Currently, only specifying GPU share memory is supported; not being able to specify the GPU number is a limitation.
We need to support volcano.sh/gpu-number, just like the nvidia GPU plugin, so a pod's resource request can specify the number of GPUs.
When I attempted to install the plugin, the pod gave the following error:
flag provided but not defined: -gpu-strategy
Usage of volcano-device-plugin:
I tried modifying the args in volcano-device-plugin.yaml from ["--gpu-strategy=share", "--gpu-memory-factor=1"] to ["---gpu-strategy=share", "---gpu-memory-factor=1"]. However, the error message then became:
flag provided but not defined: ----gpu-strategy
Usage of volcano-device-plugin:
In the end, I had to remove the args to successfully create the pod.
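The extra dashes cannot help: Go's flag package treats -x and --x identically and rejects anything longer, so the real issue is that the image being pulled predates these flags. A minimal sketch, assuming the flags were defined as the YAML expects (defaults here are assumptions):

package main

import (
	"flag"
	"fmt"
)

var (
	// With these definitions, both -gpu-strategy=share and
	// --gpu-strategy=share parse; "flag provided but not defined" means
	// the running binary simply lacks the definition.
	gpuStrategy     = flag.String("gpu-strategy", "share", "GPU allocation strategy")
	gpuMemoryFactor = flag.Uint("gpu-memory-factor", 1, "MiB represented by one mocked device")
)

func main() {
	flag.Parse()
	fmt.Println("strategy:", *gpuStrategy, "factor:", *gpuMemoryFactor)
}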
For now, "volcano.sh/gpu-memory" can essentially only set 3 environment variables, based on https://github.com/volcano-sh/devices/blob/master/pkg/plugin/nvidia/server.go#L324, to tell the user how much GPU memory they can use in the current container?
For example, when "volcano.sh/gpu-memory: 1024" is set, the environment variables look like the following in the container, yet a process in this container can in fact use up to 4096 MiB of GPU memory, even though VOLCANO_GPU_ALLOCATED is 1024?
NVIDIA_VISIBLE_DEVICES: "0"
VOLCANO_GPU_ALLOCATED: "1024"
VOLCANO_GPU_TOTAL: "4096"
Can anyone help me confirm whether I understand this correctly?
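If that reading is right (the variables are advisory rather than enforced), the application itself has to honor them; a minimal sketch of doing so:

package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// VOLCANO_GPU_ALLOCATED is exported by the plugin but, on this
	// reading, nothing below the container enforces it; the process
	// must treat it as its own budget.
	limitMiB, err := strconv.Atoi(os.Getenv("VOLCANO_GPU_ALLOCATED"))
	if err != nil {
		limitMiB = 0 // unset or malformed: no advisory limit known
	}
	fmt.Printf("advisory GPU memory budget: %d MiB\n", limitMiB)
}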
This issue is an extension of #18.
What happened:
Applying volcano-device-plugin on a server with 8×V100 GPUs, but getting volcano.sh/gpu-memory: 0 when describing nodes.
The same situation did not occur when using T4 or P4.
Tracing the kubelet logs, I found the following error message:
It seems the sync message is too large.
What caused this bug:
volcano-device-plugin mocks GPUs into a device list (every device in this list is treated as a 1 MB memory block) so that different workloads can share one GPU through the Kubernetes device plugin mechanism. With a large-memory GPU such as the V100, the size of the device list exceeds the bound, and ListAndWatch fails as a result.
Solutions:
The key is to minimize the size of the device list, so we can treat each device as a 10 MB memory block and rework the whole bookkeeping process around that assumption. This granularity is enough for almost all production environments.
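A sketch of the idea (function and parameter names are assumptions, not the repo's actual code; assumes the usual fmt and pluginapi imports from the device plugin API):

// buildFakeDevices mocks one physical GPU into many kubelet devices,
// one per memoryFactor MB of GPU memory. Raising memoryFactor from 1
// to 10 shrinks the ListAndWatch payload tenfold, keeping it under the
// gRPC message size bound even for a V100.
func buildFakeDevices(gpuIndex int, totalMemMB, memoryFactor uint) []*pluginapi.Device {
	devs := make([]*pluginapi.Device, 0, totalMemMB/memoryFactor)
	for i := uint(0); i < totalMemMB/memoryFactor; i++ {
		devs = append(devs, &pluginapi.Device{
			ID:     fmt.Sprintf("%d-%d", gpuIndex, i), // fake ID: GPU index + block index
			Health: pluginapi.Healthy,
		})
	}
	return devs
}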
Currently, we do not support Prow for this repo, which makes it inconvenient for contributors :)
2023/08/14 12:05:25 You can check the prerequisites at: https://github.com/volcano-sh/k8s-device-plugin#prerequisites
2023/08/14 12:05:25 You can learn how to set the runtime at: https://github.com/volcano-sh/k8s-device-plugin#quick-start
2023/08/14 12:05:26 Could not start device plugin for 'volcano.sh/gpu-memory': listen unix /var/lib/kubelet/device-plugins/volcano.sock: bind: address already in use
2023/08/14 12:05:26 Plugin Volcano-GPU-Plugin failed to start: listen unix /var/lib/kubelet/device-plugins/volcano.sock: bind: address already in use
(the same three messages repeat as the plugin keeps retrying to bind /var/lib/kubelet/device-plugins/volcano.sock)
When a pod with the volcano resource is run, it crashes.
https://github.com/volcano-sh/devices/blob/master/volcano-vgpu-device-plugin.yml
Running the pod reports an error: Warning UnexpectedAdmissionError 79s kubelet Allocate failed due to rpc error: code = Unknown desc = failed to find gpu id, which is unexpected
The tag in the containers field:
image: volcanosh/volcano-device-plugin:1.0.0-ubuntu20.04
can't be found on Docker Hub currently. Is it equivalent to the volcanosh/volcano-device-plugin:latest tag?
Additionally, after actually pulling the latest image, I found that it is only compatible with the x86 environment. Pods on ARM architecture machines will report:
standard_init_linux.go:220: exec user process caused "exec format error"
libcontainer: container start initialization failed: standard_init_linux.go:220: exec user process caused "exec format error"
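A hedged sketch of producing a multi-arch image with docker buildx (the tag and build context are illustrative, not an official build recipe):

docker buildx build --platform linux/amd64,linux/arm64 \
  -t volcanosh/volcano-device-plugin:dev --push .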
Maybe it is a bug.
The YAML is:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mindx-dls-gpu
  namespace: vcjob
spec:
  minAvailable: 1
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  maxRetry: 3
  queue: default
  tasks:
    - name: "default-1p"
      replicas: 1
      template:
        metadata:
          labels:
            app: tf
        spec:
          containers:
            - image: nvidia-train:v1
              imagePullPolicy: IfNotPresent
              name: cuda-container
              command:
                - "/bin/bash"
                - "-c"
                #- "chmod 777 -R /job;cd /job/code/ModelZoo_Resnet50_HC; bash train_start.sh"
              args: [ "while true; do sleep 3000000; done;" ]
              resources:
                requests:
                  volcano.sh/gpu-number: 1
                limits:
                  volcano.sh/gpu-number: 1
              volumeMounts:
                - name: timezone
                  mountPath: /etc/timezone
                - name: localtime
                  mountPath: /etc/localtime
          nodeSelector:
            accelerator: nvidia-tesla-v100
          volumes:
            - name: timezone
              hostPath:
                path: /etc/timezone
            - name: localtime
              hostPath:
                path: /etc/localtime
          restartPolicy: OnFailure
And when I use the "volcano.sh/gpu-memory" resource, there is an error:
Nov 10 20:11:23 ubuntu560 kubelet[26515]: E1110 20:11:23.149895 26515 manager.go:374] Failed to allocate device plugin resource for pod 28bc8549-e3b9-40f6-8adb-7830f967d97b: rpc error: code = Unknown desc = failed to find gpu id
Nov 10 20:11:23 ubuntu560 kubelet[26515]: W1110 20:11:23.149941 26515 predicate.go:74] Failed to admit pod mindx-dls-gpu-default-1p-0_vcjob(28bc8549-e3b9-40f6-8adb-7830f967d97b) - Update plugin resources failed due to rpc error: code = Unknown desc = failed to find gpu id, which is unexpected.
env:
volcano:v1.0.1
volcano-deviceplugin:v1.0.0
os:ubuntu 18.04 amd64
So is volcano.sh/gpu-memory supported?
NVIDIA Tesla V100 * 8
cuda10.1 driver
volcano-device-plugin
volcano.sh/gpu-number: 8 is reported correctly,
but the GPU memory is not: volcano.sh/gpu-memory: 0.
What is the reason for this?