Comments (4)
@eliorc I'm sorry for not getting back to you sooner.
As far as I can tell, there is no way of sorting the GPUs by speed (fastest first) in nvidia-smi. Likewise, CUDA's heuristic for putting the fastest GPU first is proprietary, so there is no way of replicating its order. Moreover, CUDA does not guarantee the order of the remaining GPUs.
In short, I do not see a reliable way for GPUtil to handle CUDA's default GPU ordering (fastest first).
I will keep the issue open.
from gputil.
I am not sure what your main concern is. Is it 1) that the IDs do not match between TensorFlow (CUDA_VISIBLE_DEVICES) and GPUtil (nvidia-smi), or 2) that the GPUs returned by GPUtil (nvidia-smi) are not ordered by processing speed?
In case 1), this can be solved by setting the CUDA environment variable CUDA_DEVICE_ORDER="PCI_BUS_ID".
See the example "Occupy only 1 GPU in TensorFlow" in the GPUtil README.
See also NVIDIA's description of the CUDA environment variables.
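The fix for case 1) can be sketched as follows. `pin_to_gpu` is a hypothetical helper (not part of GPUtil); it assumes the variables are set before TensorFlow or any other CUDA library initializes CUDA, since CUDA reads them at initialization.

```python
import os
from typing import MutableMapping


def pin_to_gpu(gpu_id: int, env: MutableMapping[str, str] = os.environ) -> None:
    """Hypothetical helper: make CUDA number GPUs as nvidia-smi does, then expose one GPU.

    Call it before TensorFlow (or any other CUDA library) initializes CUDA;
    the variables are read when CUDA initializes, not afterwards.
    """
    # Enumerate devices by PCI bus ID, the order nvidia-smi (and hence GPUtil) reports,
    # so that gpu_id below means the same GPU to CUDA and to GPUtil.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide all other GPUs; inside this process the chosen GPU becomes device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
```

Usage, assuming GPUtil is installed: call `pin_to_gpu(GPUtil.getFirstAvailable()[0])` and only then `import tensorflow`.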
In case 2), NVIDIA only guarantees that the first GPU is the fastest; the rest of the GPUs are returned in unspecified order.
From the CUDA environment variables:
FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified.
As such, there is no guarantee that GPU#2 is faster than GPU#3. And if GPU#1 is already occupied, you are back to the original problem.
Unfortunately, I do not see a solution to case 2) at the moment. However, you or anyone else is very welcome to suggest one :-)
- Edit: Fixed some spelling.
Thanks for the quick response. I am talking about issue 1).
Yes, this is how I deal with it now, by setting CUDA_DEVICE_ORDER. My suggestion was that, since GPUtil is a standard choice when working with CUDA-backed frameworks, it would be helpful if GPUtil itself supported CUDA's default ordering, because relying on the CUDA defaults is such a common use case.
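One possible workaround that leaves CUDA's default ordering untouched, sketched under the assumption that your CUDA version accepts GPU UUID strings (as printed by nvidia-smi) in CUDA_VISIBLE_DEVICES, which NVIDIA's environment-variable documentation describes: pick the GPU with GPUtil, then expose it by UUID instead of by index. `expose_gpu_by_uuid` is a hypothetical helper, not part of GPUtil.

```python
import os
from typing import MutableMapping


def expose_gpu_by_uuid(uuid: str, env: MutableMapping[str, str] = os.environ) -> None:
    """Hypothetical helper: expose exactly one GPU, identified by its nvidia-smi UUID.

    Assumes CUDA_VISIBLE_DEVICES accepts UUID strings as well as indices, so the
    selection does not depend on CUDA_DEVICE_ORDER (FASTEST_FIRST or PCI_BUS_ID).
    Must run before any CUDA library initializes CUDA.
    """
    uuid = uuid.strip()
    if not uuid:
        raise ValueError("empty GPU UUID")
    # With a single visible GPU, it is device 0 in this process whatever the ordering.
    env["CUDA_VISIBLE_DEVICES"] = uuid
```

The UUID could be taken from the `uuid` attribute of the `GPUtil.getGPUs()` entry whose `id` equals `GPUtil.getFirstAvailable()[0]`. This sketch only covers the single-GPU case.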
Here are my two cents: the GPUtil.get*() functions should respect the environment variable CUDA_VISIBLE_DEVICES. Say I have 4 GPUs but make only 2 visible; then these functions should behave as if only those 2 are available. Currently, they parse the nvidia-smi output and return whatever it reports.
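A minimal sketch of the suggested filtering, assuming CUDA_VISIBLE_DEVICES holds integer indices (not UUIDs) and that CUDA_DEVICE_ORDER="PCI_BUS_ID" is set, so CUDA's indices match nvidia-smi's. `visible_gpu_ids` is a hypothetical function, not part of GPUtil's API. It mirrors CUDA's documented rule that only the devices listed before the first invalid index are visible.

```python
import os
from typing import List, Mapping, Sequence


def visible_gpu_ids(all_ids: Sequence[int], env: Mapping[str, str] = os.environ) -> List[int]:
    """Hypothetical filter: keep only the nvidia-smi GPU IDs listed in CUDA_VISIBLE_DEVICES.

    Assumes integer indices (not UUIDs) and CUDA_DEVICE_ORDER=PCI_BUS_ID,
    so CUDA indices line up with nvidia-smi's (and GPUtil's) IDs.
    """
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return list(all_ids)  # variable unset: every GPU is visible
    known = set(all_ids)
    visible: List[int] = []
    for token in raw.split(","):
        try:
            idx = int(token.strip())
        except ValueError:
            break  # not an integer index (e.g. empty string): stop here
        if idx not in known:
            break  # mirror CUDA: devices after the first invalid index are hidden
        visible.append(idx)
    return visible
```

For example, with 4 GPUs and CUDA_VISIBLE_DEVICES="1,3", the function keeps IDs 1 and 3. It returns nvidia-smi IDs; inside the process, CUDA renumbers the visible GPUs starting from 0.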
Related Issues (20)
- Crashing if nvidia-smi fails
- GPUtil.showUtilization does not work for individual attrList
- ImportError: No module named GPUtil
- getFirstAvailable(maxMemory=0.9) inconsistent with showUtilization()
- Get GPUs that are not used by any other user
- GPU memoryUsage per Process
- Is it possible to get the CUDA version?
- Add Kubernetes support through device plugins
- Unable to find GPU on Windows
- Request: Add all query information from nvidia-smi
- ValueError when nvidia-smi finds no GPU
- Over 60 times slower than nvidia-smi to asses resource usage
- GPUtil doesn't find GPU
- showUtilization causes GPU stuttering
- ValueError: invalid literal for int() with base 10: 'No devices were found'
- Drop dependency on distutils to support python 3.12
- Pyinstaller exe with console=False causes pop-up window every time nvidia-smi.exe is called
- Very new to all of this, please help?
- No longer works on Python 3.12 as distutils has been deprecated and removed
- Handle nvidia-smi non-zero exit status