I understand that GPUtil infers the GPUs attributes so it will match the <code class="


Mixing up GPU names on slowest first GPU bus ID

eliorc commented on August 26, 2024

Mixing up GPU names on slowest first GPU bus ID


Comments (4)

anderskm commented on August 26, 2024

@eliorc I'm sorry for not getting back to you sooner.

As far as I can tell, nvidia-smi offers no way to sort the GPUs by speed. Likewise, CUDA's heuristic for ranking the GPUs by speed is proprietary, so its order cannot be replicated. Moreover, CUDA does not guarantee the order of the remaining GPUs.
In short, I do not see a reliable way for GPUtil to handle CUDA's default GPU ordering (fastest first).

I will keep the issue open.


anderskm commented on August 26, 2024

I am not sure what your main concern is. Is it 1) that the IDs do not match between TensorFlow (CUDA_VISIBLE_DEVICES) and GPUtil (nvidia-smi), or 2) that the GPUs returned by GPUtil (nvidia-smi) are not ordered by processing speed?

In case 1), the mismatch can be solved by setting the CUDA environment variable CUDA_DEVICE_ORDER to "PCI_BUS_ID", so that CUDA numbers the GPUs in the same order as nvidia-smi.
See the example "Occupy only 1 GPU in TensorFlow" in the GPUtil readme.
See also NVIDIA's description of the CUDA environment variables.
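
For illustration, here is a minimal sketch of that setup. `configure_cuda_env` is a hypothetical helper written for this example and is not part of GPUtil; the only assumption is that the variables are set before the framework initializes CUDA.

```python
import os


def configure_cuda_env(device_ids, env=None):
    """Order GPUs by PCI bus ID and expose only `device_ids` to CUDA.

    Hypothetical helper for illustration; not a GPUtil function.
    """
    env = os.environ if env is None else env
    # PCI_BUS_ID is the ordering nvidia-smi reports, so an index means the
    # same physical GPU to CUDA as it does to GPUtil.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Comma-separated list of the device indices CUDA is allowed to see.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


if __name__ == "__main__":
    # Call this before importing TensorFlow (or any other CUDA framework).
    configure_cuda_env([0])
    print(os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"])
```

With GPUtil installed, the list of IDs could come from `GPUtil.getFirstAvailable()` instead of the hard-coded `[0]`, as the readme example does.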

In case of 2), NVIDIA only guarantees that the first GPU is the fastest. The rest of the GPUs are returned in unspecified order.
From the CUDA environment variables:

FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified.

As such, there is no guarantee that GPU#2 is faster than GPU#3. And if GPU#1 is already occupied, you are back to the original problem.
Unfortunately, I do not see a solution to case 2) at the moment. However, you or anyone else is very welcome to suggest one :-)

Edit: Fixed some spelling.


eliorc commented on August 26, 2024

Thanks for the quick response. I am talking about issue 1).

Yes, this is how I deal with it now: I set CUDA_DEVICE_ORDER. My suggestion was that, since GPUtil is a standard choice when working with CUDA-backed frameworks, it would be helpful if the library itself (GPUtil) supported CUDA's default ordering, because relying on the CUDA defaults is such a common use case.


tashrifbillah commented on August 26, 2024

Here are my two cents: the GPUtil.get*() functions should respect the environment variable CUDA_VISIBLE_DEVICES. Say I have 4 GPUs but make only 2 visible. Then those functions should return results as if only 2 were available. Currently, GPUtil reads the nvidia-smi output and returns whatever it lists.
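
A rough sketch of what such filtering might look like on the caller's side. `filter_visible` is a hypothetical function, not something GPUtil provides. It assumes each GPU object has an integer `id` attribute (as the objects returned by `GPUtil.getGPUs()` do), that CUDA_DEVICE_ORDER is set to PCI_BUS_ID so the indices match nvidia-smi, and that CUDA_VISIBLE_DEVICES holds integer indices only (UUID entries are not handled).

```python
import os


def filter_visible(gpus, env=None):
    """Keep only the GPUs listed in CUDA_VISIBLE_DEVICES, in that order.

    Hypothetical helper; assumes integer indices in PCI bus order.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable unset: every GPU is visible.
        return list(gpus)
    by_id = {gpu.id: gpu for gpu in gpus}
    visible = []
    for token in raw.split(","):
        token = token.strip()
        # Simplification: stop at the first entry that is not a known
        # integer index. An empty value therefore hides every GPU.
        if not token.isdigit() or int(token) not in by_id:
            break
        visible.append(by_id[int(token)])
    return visible
```

With GPUtil installed, this could be called as `filter_visible(GPUtil.getGPUs())`.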
