- Docker & Kubernetes & GPU
- Golang & Python & Java
A Kubernetes plugin that enables dynamically adding or removing GPU resources for a running Pod
License: Apache License 2.0
The log always reports this error; it only works normally when the GPU count is set to 0.
Refer to the Kubernetes documentation:
If quota is enabled in a namespace for compute resources like cpu and memory, users must specify requests or limits for those values; otherwise, the quota system may reject pod creation.
So slave pod creation will fail if the owner pod's namespace has resource quotas enabled, for example:
pods "xxx" is forbidden
And if we create the slave pod in the owner pod's namespace and give it requests/limits to satisfy the quota, the slave pod has to consume that namespace's resource quota, which is unreasonable in a multi-tenant cluster scenario.
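For illustration, a namespace quota like the following (a hedged example; the name and limits are arbitrary) is enough to make the API server reject a pod that sets no cpu/memory requests or limits, such as the current BestEffort slave pod:

# Hypothetical ResourceQuota in the owner pod's namespace (values are illustrative).
# Any pod omitting cpu/memory requests and limits is then rejected with
# pods "xxx" is forbidden: failed quota.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: user-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi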
In the current version, the slave pod's QoS class is BestEffort.
The slave pod will most likely be among the first to go down when an eviction occurs, and that leads to a GPU resource leak (the user pod can still use the GPU, but GPU Mounter and kube-scheduler are not aware of it at all).
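One possible mitigation, not part of the current implementation and sketched here only as an assumption, is to give the slave pod container small, equal cpu/memory requests and limits so its QoS class becomes Guaranteed instead of BestEffort, making it one of the last pods considered during node-pressure eviction:

# Hedged sketch of a slave pod container with Guaranteed QoS.
# Extended resources (nvidia.com/gpu) do not affect the QoS class; only cpu and memory do.
containers:
  - name: gpu-container
    image: alpine:latest
    resources:
      requests:
        cpu: 10m
        memory: 16Mi
        nvidia.com/gpu: 1
      limits:
        cpu: 10m
        memory: 16Mi
        nvidia.com/gpu: 1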
k8s version:v1.14
docker version:18.09.5
In the test cluster, kubelet.sock is located at /var/lib/kubelet/device-plugins on the host, so I changed the mount path in gpu-mounter-worker.yaml:
volumes:
  - name: cgroup
    hostPath:
      type: Directory
      path: /sys/fs/cgroup
  - name: device-monitor
    hostPath:
      type: Directory
      # path: /var/lib/kubelet/pod-resources
      path: /var/lib/kubelet/device-plugins
  - name: log-dir
    hostPath:
      type: DirectoryOrCreate
      path: /etc/GPUMounter/log
The error log is as follows:
[root@t32 deploy]# kubectl logs -f gpu-mounter-workers-2wfnp -n kube-system
2020-12-20T12:30:27.657Z INFO GPUMounter-worker/main.go:15 Service Starting...
2020-12-20T12:30:27.657Z INFO gpu-mount/server.go:21 Creating gpu mounter
2020-12-20T12:30:27.657Z INFO allocator/allocator.go:26 Creating gpu allocator
2020-12-20T12:30:27.657Z INFO collector/collector.go:23 Creating gpu collector
2020-12-20T12:30:27.657Z INFO collector/collector.go:41 Start get gpu info
2020-12-20T12:30:27.660Z INFO collector/collector.go:52 GPU Num: 2
2020-12-20T12:30:27.674Z ERROR collector/collector.go:106 Can not connect to /var/lib/kubelet/pod-resources/kubelet.sock
2020-12-20T12:30:27.674Z ERROR collector/collector.go:107 failure getting pod resources rpc error: code = Unimplemented desc = unknown service v1alpha1.PodResourcesLister
2020-12-20T12:30:27.674Z ERROR collector/collector.go:32 Failed to update gpu status
2020-12-20T12:30:27.674Z ERROR allocator/allocator.go:30 Failed to init gpu collector
2020-12-20T12:30:27.674Z ERROR gpu-mount/server.go:25 Filed to init gpu allocator
2020-12-20T12:30:27.674Z ERROR GPUMounter-worker/main.go:18 Failed to init gpu mounter
2020-12-20T12:30:27.674Z ERROR GPUMounter-worker/main.go:19 failure getting pod resources rpc error: code = Unimplemented desc = unknown service v1alpha1.PodResourcesLister
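This error is expected when the worker is pointed at /var/lib/kubelet/device-plugins/kubelet.sock: that socket serves the device plugin Registration service, not the PodResourcesLister service, which is only exposed on /var/lib/kubelet/pod-resources/kubelet.sock (and on k8s v1.14 the KubeletPodResources feature gate is still alpha and may need to be enabled on the kubelet). As a rough sketch of what the collector does, assuming the v1alpha1 API imported from k8s.io/kubernetes (the import path and socket location may differ between releases):

// Hedged sketch: list per-pod device allocations via the kubelet PodResourcesLister API.
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	podresourcesapi "k8s.io/kubernetes/pkg/kubelet/apis/podresources/v1alpha1"
)

func main() {
	// Must be the pod-resources socket, not the device-plugins one.
	const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock"

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, socket, grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := podresourcesapi.NewPodResourcesListerClient(conn)
	resp, err := client.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
	if err != nil {
		// "unknown service v1alpha1.PodResourcesLister" means the wrong socket was
		// dialed or the kubelet does not expose the API.
		panic(err)
	}
	for _, pod := range resp.GetPodResources() {
		for _, container := range pod.GetContainers() {
			for _, dev := range container.GetDevices() {
				fmt.Printf("%s/%s %s: %v\n",
					pod.GetNamespace(), pod.GetName(), dev.GetResourceName(), dev.GetDeviceIds())
			}
		}
	}
}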
GPU Mounter depends on the KubeletPodResources API to get GPU usage from the kubelet. The KubeletPodResources API is imported from k8s.io/kubernetes directly. Per kubernetes/issues/79384 and go/issues/32776, it is necessary to add require directives for matching versions of all of the subcomponents.
Lines 14 to 39 in f827c71
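For illustration only, the require/replace pattern looks roughly like this (module versions are illustrative and must match the k8s.io/kubernetes release actually imported; this is not a copy of the project's go.mod):

// go.mod sketch. Every k8s.io staging module needs a pinned version because
// k8s.io/kubernetes declares them as v0.0.0 placeholders.
require (
    k8s.io/api v0.17.3
    k8s.io/apimachinery v0.17.3
    k8s.io/client-go v0.17.3
    k8s.io/kubernetes v1.17.3
)

replace (
    k8s.io/api => k8s.io/api v0.17.3
    k8s.io/apimachinery => k8s.io/apimachinery v0.17.3
    k8s.io/apiserver => k8s.io/apiserver v0.17.3
    k8s.io/client-go => k8s.io/client-go v0.17.3
    k8s.io/kubelet => k8s.io/kubelet v0.17.3
    // ...one replace per staging repo that k8s.io/kubernetes requires
)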
environment:
problem: Following QuickStart.md, I installed GPUMounter successfully in my k8s cluster. However, neither the remove gpu nor the add gpu request ever succeeds.
I pasted some logs from the gpu-mounter-master container:
remove gpu
2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:120 access remove gpu service
2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:134 GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd
2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:135 GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd
2022-02-18T03:44:55.184Z INFO GPUMounter-master/main.go:146 Pod: jupyter-lab-54d76f5d58-rlklh Namespace: default UUIDs: GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd force: true
2022-02-18T03:44:55.188Z INFO GPUMounter-master/main.go:169 Found Pod: jupyter-lab-54d76f5d58-rlklh in Namespace: default on Node: dev06.ucd.qzm.stonewise.cn
2022-02-18T03:44:55.193Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-fbfj8 Node: dev05.ucd.qzm.stonewise.cn
2022-02-18T03:44:55.193Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-kwmsn Node: dev06.ucd.qzm.stonewise.cn
2022-02-18T03:44:55.201Z ERROR GPUMounter-master/main.go:217 Invalid UUIDs: GPU-5d237016-9ea5-77bd-8c2f-2b3fd4bfa2cd
add gpu
2022-02-18T03:42:22.897Z INFO GPUMounter-master/main.go:25 access add gpu service
2022-02-18T03:42:22.898Z INFO GPUMounter-master/main.go:30 Pod: jupyter-lab-54d76f5d58-rlklh Namespace: default GPU Num: 4 Is entire mount: false
2022-02-18T03:42:22.902Z INFO GPUMounter-master/main.go:66 Found Pod: jupyter-lab-54d76f5d58-rlklh in Namespace: default on Node: dev06.ucd.qzm.stonewise.cn
2022-02-18T03:42:22.907Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-fbfj8 Node: dev05.ucd.qzm.stonewise.cn
2022-02-18T03:42:22.907Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-kwmsn Node: dev06.ucd.qzm.stonewise.cn
2022-02-18T03:42:22.921Z ERROR GPUMounter-master/main.go:98 Failed to call add gpu service
2022-02-18T03:42:22.921Z ERROR GPUMounter-master/main.go:99 rpc error: code = Unknown desc = FailedCreated
Thanks to @ilyee for adding gang scheduling support in #15, which means gang scheduling can be selected when adding multiple GPUs.
Some things still need to be fixed:
Add the relevant docs
Add the relevant RESTful API
allocator/allocator.go:159 log format error
All GPU UUIDs need to be passed when unmounting after a gang mount
1. Using the example from the README (YAML below), the pod can end up scheduled onto a non-GPU host (see the nodeSelector sketch after the YAML).
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: tensorflow/tensorflow:1.13.2-gpu
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "none"
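Since the example requests no nvidia.com/gpu resource (it relies on GPU Mounter to attach GPUs later), the scheduler has no reason to place it on a GPU node. A hedged workaround is to constrain it yourself, for example with a nodeSelector; the label below is an assumption, use whatever label your GPU nodes actually carry:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  # Hypothetical label identifying GPU hosts; label the nodes yourself, e.g.
  #   kubectl label node <gpu-node> gpu=true
  nodeSelector:
    gpu: "true"
  containers:
    - name: cuda-container
      image: tensorflow/tensorflow:1.13.2-gpu
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "none"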
2. mknod: Running the script fails. Looking at the source code, mknod is indeed required for the script to run; shouldn't this be stated in the README?
3. The gpu-pod-slave-* pod keeps occupying the GPU resource indefinitely (unless gpu-pod is deleted). Should a callback be added?
GPUMounter-master.log:
2022-01-16T11:24:14.610Z INFO GPUMounter-master/main.go:25 access add gpu service
2022-01-16T11:24:14.610Z INFO GPUMounter-master/main.go:30 Pod: test Namespace: default GPU Num: 1 Is entire mount: false
2022-01-16T11:24:14.627Z INFO GPUMounter-master/main.go:66 Found Pod: test in Namespace: default on Node: rtxws
2022-01-16T11:24:14.634Z INFO GPUMounter-master/main.go:265 Worker: gpu-mounter-workers-7dsdf Node: rtxws
2022-01-16T11:24:19.648Z ERROR GPUMounter-master/main.go:98 Failed to call add gpu service
2022-01-16T11:24:19.648Z ERROR GPUMounter-master/main.go:99 rpc error: code = Unknown desc = Service Internal Error
GPUMounter-worker.log:
2022-01-16T11:24:14.635Z INFO gpu-mount/server.go:35 AddGPU Service Called
2022-01-16T11:24:14.635Z INFO gpu-mount/server.go:36 request: pod_name:"test" namespace:"default" gpu_num:1
2022-01-16T11:24:14.645Z INFO gpu-mount/server.go:55 Successfully get Pod: default in cluster
2022-01-16T11:24:14.645Z INFO allocator/allocator.go:159 Get pod default/test mount type
2022-01-16T11:24:14.645Z INFO collector/collector.go:91 Updating GPU status
2022-01-16T11:24:14.646Z INFO collector/collector.go:136 GPU status update successfully
2022-01-16T11:24:14.657Z INFO allocator/allocator.go:59 Creating GPU Slave Pod: test-slave-pod-2f66ed for Owner Pod: test
2022-01-16T11:24:14.657Z INFO allocator/allocator.go:238 Checking Pods: test-slave-pod-2f66ed state
2022-01-16T11:24:14.661Z INFO allocator/allocator.go:264 Pod: test-slave-pod-2f66ed creating
2022-01-16T11:24:19.442Z INFO allocator/allocator.go:277 Pods: test-slave-pod-2f66ed are running
2022-01-16T11:24:19.442Z INFO allocator/allocator.go:84 Successfully create Slave Pod: %s, for Owner Pod: %s test-slave-pod-2f66edtest
2022-01-16T11:24:19.442Z INFO collector/collector.go:91 Updating GPU status
2022-01-16T11:24:19.444Z DEBUG collector/collector.go:130 GPU: /dev/nvidia0 allocated to Pod: test-slave-pod-2f66ed in Namespace gpu-pool
2022-01-16T11:24:19.444Z INFO collector/collector.go:136 GPU status update successfully
2022-01-16T11:24:19.444Z INFO gpu-mount/server.go:81 Start mounting, Total: 1 Current: 1
2022-01-16T11:24:19.444Z INFO util/util.go:19 Start mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"} to Pod: test
2022-01-16T11:24:19.444Z INFO util/util.go:24 Pod :test container ID: e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3
2022-01-16T11:24:19.444Z INFO util/util.go:30 Successfully get cgroup path: /kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3 for Pod: test
2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:140 Exec "echo 'c 195:0 rw' > /sys/fs/cgroup/devices/kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3/devices.allow" failed
2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:141 Output: sh: 1: cannot create /sys/fs/cgroup/devices/kubepods/burstable/podc815ee4b-bea0-44ed-8ef4-239e69516ba2/e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3/devices.allow: Directory nonexistent
2022-01-16T11:24:19.445Z ERROR cgroup/cgroup.go:142 exit status 2
2022-01-16T11:24:19.445Z ERROR util/util.go:33 Add GPU {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"}failed
2022-01-16T11:24:19.445Z ERROR gpu-mount/server.go:84 Mount GPU: {"MinorNumber":0,"DeviceFilePath":"/dev/nvidia0","UUID":"GPU-7fe47fc1-b21e-e675-f6ff-edd91910f8a7","State":"GPU_ALLOCATED_STATE","PodName":"test-slave-pod-2f66ed","Namespace":"gpu-pool"} to Pod: test in Namespace: default failed
2022-01-16T11:24:19.445Z ERROR gpu-mount/server.go:85 exit status 2
Environment and version:
In k8s v1.23, "/sys/fs/cgroup/devices/kubepods/burstable/pod[pod-id]/[container-id]/devices.allow" has changed to "/sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod[pod-id]/docker-[container-id].scope/devices.allow".
So the current GPUMounter cannot work properly on v1.23.
Could it be updated to support k8s v1.23? Thanks.
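The difference comes from the cgroup driver: with the cgroupfs driver the device cgroup lives under /sys/fs/cgroup/devices/kubepods/..., while with the systemd driver (the default in newer setups) it lives under kubepods.slice/kubepods-burstable-pod<uid>.slice/docker-<cid>.scope. A hedged Go sketch of how the worker could build both candidate paths and use whichever exists; the slice layout and naming here are assumptions (cgroup v1) and would need verification against the actual node:

// Hedged sketch: resolve the devices-cgroup path for a container under either the
// cgroupfs or the systemd cgroup driver (cgroup v1 layout; paths are assumptions).
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// deviceCgroupPath returns the first devices.allow path that exists for the container.
func deviceCgroupPath(qosClass, podUID, containerID string) (string, error) {
	// cgroupfs driver layout, e.g. /kubepods/burstable/pod<uid>/<cid>
	cgroupfs := filepath.Join("/sys/fs/cgroup/devices/kubepods", qosClass,
		"pod"+podUID, containerID)

	// systemd driver layout, e.g.
	// /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/docker-<cid>.scope
	// (systemd replaces dashes in the pod UID with underscores)
	uid := strings.ReplaceAll(podUID, "-", "_")
	systemd := filepath.Join("/sys/fs/cgroup/devices/kubepods.slice",
		fmt.Sprintf("kubepods-%s.slice", qosClass),
		fmt.Sprintf("kubepods-%s-pod%s.slice", qosClass, uid),
		fmt.Sprintf("docker-%s.scope", containerID))

	for _, dir := range []string{cgroupfs, systemd} {
		candidate := filepath.Join(dir, "devices.allow")
		if _, err := os.Stat(candidate); err == nil {
			return candidate, nil
		}
	}
	return "", fmt.Errorf("devices cgroup not found for pod %s container %s", podUID, containerID)
}

func main() {
	path, err := deviceCgroupPath("burstable",
		"c815ee4b-bea0-44ed-8ef4-239e69516ba2",
		"e317ca7f5eb5e3c523fab9f0744a065cd69013a7c09522318d4bbf98ad0bb1c3")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(path)
}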
Hello, I am elihe from Zhihu. I read your article on Zhihu earlier, and after reading your code I have a question:
Why is each slave pod bound to only one GPU in the GetAvailableGPU method of pkg/util/gpu/allocator/allocator.go?
In my view, in a large-scale cluster this puts extra load on the master node (a larger number of pod creation requests). The creation of multiple single-card pods can also make two competing GPU mount requests both fail: for example, with 4 available GPUs and two requests each asking to mount 4 cards, one request successfully creates slave pods 1 and 2 while the other creates slave pods 3 and 4, and neither can obtain any more resources.
If you agree, may I submit a merge request to optimize this?
[root@t32 ~]# kubectl logs -f gpu-mounter-workers-ccqfv -n kube-system
2021-02-10T01:01:09.689Z INFO GPUMounter-worker/main.go:15 Service Starting...
2021-02-10T01:01:09.690Z INFO gpu-mount/server.go:21 Creating gpu mounter
2021-02-10T01:01:09.690Z INFO allocator/allocator.go:27 Creating gpu allocator
2021-02-10T01:01:09.690Z INFO collector/collector.go:23 Creating gpu collector
2021-02-10T01:01:09.690Z INFO collector/collector.go:41 Start get gpu info
2021-02-10T01:01:09.690Z ERROR collector/collector.go:43 nvml error: %+vcould not load NVML library
2021-02-10T01:01:09.690Z ERROR collector/collector.go:26 Failed to init gpu collector
2021-02-10T01:01:09.690Z ERROR allocator/allocator.go:31 Failed to init gpu collector
2021-02-10T01:01:09.690Z ERROR gpu-mount/server.go:25 Filed to init gpu allocator
2021-02-10T01:01:09.690Z ERROR GPUMounter-worker/main.go:18 Failed to init gpu mounter
2021-02-10T01:01:09.690Z ERROR GPUMounter-worker/main.go:19 could not load NVML library
[root@t32 ~]# kubectl get pod gpu-mounter-workers-ccqfv -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
gpu-mounter-workers-ccqfv 0/1 ImagePullBackOff 1814 20d 10.42.5.33 t90 <none> <none>
(base) root@t90:~# nvidia-smi
Tue Feb 23 14:28:51 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:03:00.0 Off | 0 |
| N/A 34C P0 32W / 250W | 9114MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:04:00.0 Off | 0 |
| N/A 34C P0 31W / 250W | 520MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1024 C python 869MiB |
| 0 1225 C /usr/local/bin/python 8235MiB |
| 1 1024 C python 255MiB |
| 1 1225 C /usr/local/bin/python 255MiB |
+-----------------------------------------------------------------------------+
A question: after the mount flow creates the slave pod, shouldn't that slave pod keep existing until removeGPU is called?
In my case the slave pod gets killed shortly after it starts running, and the subsequent removeGPU then fails. What could cause it to be killed? Any ideas? Was it evicted? Where should I start troubleshooting? Thanks!
sol-UniServer-R4900-G3:~/go/src/github.com/jason-gideon/GPUMounter/example$ kubectl -n gpu-pool describe pod gpu-pod-slave-pod-6ffc13
Name: gpu-pod-slave-pod-6ffc13
Namespace: gpu-pool
Priority: 0
Service Account: default
Node: software-dell-r740-015/10.115.0.253
Start Time: Tue, 06 Dec 2022 18:46:36 +0800
Labels: app=gpu-pool
Annotations: cni.projectcalico.org/containerID: f3cbb407ae1601047a04a8e322b4eca80abd70df24f9de9e5f105586dd1d98fd
cni.projectcalico.org/podIP: 10.42.1.143/32
cni.projectcalico.org/podIPs: 10.42.1.143/32
k8s.v1.cni.cncf.io/network-status:
[{
"name": "",
"ips": [
"10.42.1.143"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "",
"ips": [
"10.42.1.143"
],
"default": true,
"dns": {}
}]
Status: Terminating (lasts <invalid>)
Termination Grace Period: 30s
IP: 10.42.1.143
IPs:
IP: 10.42.1.143
Controlled By: Pod/gpu-pod
Containers:
gpu-container:
Container ID: docker://e7f1f51dd6c3996d93172e1f56b3f955042d6f15726b6fb71745eb2bb6499707
Image: alpine:latest
Image ID: docker-pullable://alpine@sha256:8914eb54f968791faf6a8638949e480fef81e697984fba772b3976835194c6d4
Port: <none>
Host Port: <none>
Command:
/bin/sh
Args:
-c
while true; do echo this is a gpu pool container; sleep 10;done
State: Running
Started: Tue, 06 Dec 2022 18:46:41 +0800
Ready: True
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhkdp (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-jhkdp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/hostname=software-dell-r740-015
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30s default-scheduler Successfully assigned gpu-pool/gpu-pod-slave-pod-6ffc13 to software-dell-r740-015
Warning OwnerRefInvalidNamespace 30s garbage-collector-controller ownerRef [v1/Pod, namespace: gpu-pool, name: gpu-pod, uid: 6c482ef7-9acd-41ab-925e-101e166f75de] does not exist in namespace "gpu-pool"
Normal AddedInterface 28s multus Add eth0 [10.42.1.143/32]
Normal Pulling 28s kubelet Pulling image "alpine:latest"
Normal Pulled 26s kubelet Successfully pulled image "alpine:latest" in 2.151216821s
Normal Created 26s kubelet Created container gpu-container
Normal Started 25s kubelet Started container gpu-container
Normal Killing 10s kubelet Stopping container gpu-container
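The Events above hint at the cause: the slave pod in namespace gpu-pool carries an ownerReference to the owner pod, which actually lives in a different namespace (default). Cross-namespace owner references are invalid, so the garbage-collector-controller emits OwnerRefInvalidNamespace, treats the owner as non-existent, and deletes the dependent, which is why the slave pod is killed shortly after it starts. Roughly, the slave pod's metadata looks like this (a reconstruction from the describe output above, not copied from the cluster):

# Sketch of the slave pod's metadata as created by GPU Mounter.
# The owner "gpu-pod" lives in "default", so from the GC's point of view this
# ownerRef points at an object that does not exist in "gpu-pool" and the slave
# pod gets garbage-collected.
metadata:
  name: gpu-pod-slave-pod-6ffc13
  namespace: gpu-pool
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: gpu-pod
      uid: 6c482ef7-9acd-41ab-925e-101e166f75de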
The first mount succeeded, but after unmounting and deploying again it reports "Insufficient GPU on Node: yigou-dev-102-46", even though the GPUs are actually idle.
In the current version, when the remove GPU service is called, it returns without waiting for the deletion of the slave pods.
This is unreasonable, because the kubelet considers the GPU resource to still be occupied by the slave pod until the slave pod deletion has finished.
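A hedged sketch of how the remove path could block until the slave pod is actually gone, using client-go; the function and variable names are illustrative, not the project's actual code:

// Hedged sketch: delete a slave pod and wait until the API server reports it gone,
// so the GPU is genuinely free before the remove GPU service returns.
package main

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func deleteSlavePodAndWait(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	if err := client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil && !errors.IsNotFound(err) {
		return err
	}
	// Poll until the pod object disappears; the kubelet only releases the device
	// allocation once the pod is fully deleted.
	return wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		_, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
		if errors.IsNotFound(err) {
			return true, nil
		}
		return false, err
	})
}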