Comments (8)
/cc @wangyang0616 Can you help take a look?
ok, let me take a look
from devices.
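Hello, I am trying out Volcano GPU number scheduling. After installing by following the Volcano tutorial, every node with GPUs correctly reports how many GPUs it has. However, when I create a pod, the container does not have the volcano-gpu-number environment variable, and running nvidia-smi inside it shows all of the node's GPUs. Do I need to change the YAML file?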
from devices.
Hey, which version are you using?
from devices.
volcano-1.6.0
from devices.
/cc @wangyang0616 Can you help take a look?
from devices.
@Trainbow Could you post the YAML file you used to create the test task?
By the way, can the pod be scheduled successfully with the default k8s scheduler?
from devices.
I used the sample YAML from the volcano gpu-number README.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod1
  namespace: model
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          volcano.sh/gpu-number: 1 # requesting 1 GPU card
          # nvidia.com/gpu: 1
I also installed NVIDIA's k8s-device-plugin for testing. When the limits field uses nvidia.com/gpu, the pod's container works well and has one GPU device. When I use volcano.sh/gpu-number, the container's environment does not have the variable VOLCANO_GPU_ALLOCATED, and NVIDIA_VISIBLE_DEVICES is all.
I also tried GPU sharing with Volcano by following the official tutorial, and in that case I can find the corresponding environment variables in the pod.
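For anyone who wants to reproduce the check, here is a minimal sketch of the same pod that prints the two variables at startup instead of only sleeping. The pod name gpu-env-check and the shell command are illustrative assumptions and are not taken from the README.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-env-check        # hypothetical name, chosen for this example
  namespace: model
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      command: ["sh", "-c"]
      args:
        - |
          # Print each variable, or <unset> if the plugin did not inject it.
          echo "VOLCANO_GPU_ALLOCATED=${VOLCANO_GPU_ALLOCATED:-<unset>}"
          echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-<unset>}"
          sleep 100000
      resources:
        limits:
          volcano.sh/gpu-number: 1 # requesting 1 GPU card
```

The output can then be read with kubectl logs gpu-env-check -n model.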
from devices.
The Volcano device plugin's GPUSTRATEGY defaults to share mode, which means the resource you can request is volcano.sh/gpu-memory.
If you want to use volcano.sh/gpu-number, you need to set the strategy to number. For details, see: config-the-volcano-device-plugin-binary
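As a rough sketch only: assuming the plugin runs as a DaemonSet deployed from volcano-device-plugin.yaml and that the strategy is passed as a --gpu-strategy argument (please confirm the exact flag and manifest layout in the README section linked above), the container spec might look like the fragment below. The container name and image are placeholders.

```yaml
# Fragment of the device plugin DaemonSet, not a complete manifest.
spec:
  template:
    spec:
      containers:
        - name: volcano-device-plugin              # placeholder name
          image: <your volcano-device-plugin image> # placeholder, keep your existing image
          args: ["--gpu-strategy=number"]          # assumed flag; the default strategy is share
```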
Hope the above information is helpful to you.
from devices.