Comments (4)

anderskm commented on August 26, 2024

@eliorc I'm sorry for not getting back to you sooner.

As far as I can tell, nvidia-smi offers no way to sort the GPUs by speed. Likewise, CUDA's heuristic for ordering the GPUs fastest-first is proprietary, so there is no way to replicate that order. Moreover, CUDA does not guarantee the order of the remaining GPUs.
In short, I do not see a reliable way for GPUtil to handle CUDA's default GPU ordering (FASTEST_FIRST).

I will keep the issue open.

anderskm commented on August 26, 2024

I am not sure what your main concern is. Is it 1) that the IDs do not match between TensorFlow (CUDA_VISIBLE_DEVICES) and GPUtil (nvidia-smi), or 2) that the GPUs returned by GPUtil (nvidia-smi) are not ordered by processing speed?

In case 1), this can be solved by setting the CUDA environment variable CUDA_DEVICE_ORDER="PCI_BUS_ID", which makes CUDA enumerate the GPUs in PCI bus order, the same order nvidia-smi uses.
See the example "Occupy only 1 GPU in TensorFlow" in the GPUtil readme.
See also NVIDIA's description of the CUDA environment variables.
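A minimal sketch of the case 1) fix, assuming the Python process sets the variables itself before TensorFlow (or any other CUDA-backed framework) initialises CUDA; setting them before importing the framework is the safe choice. The helper name `pin_process_to_gpu` is hypothetical, and the device ID would typically come from GPUtil, e.g. the first element of `GPUtil.getFirstAvailable()`.

```python
import os


def pin_process_to_gpu(device_id: int) -> None:
    """Expose only `device_id` (an nvidia-smi / PCI bus index) to CUDA.

    Hypothetical helper: call it before the CUDA-backed framework is
    imported, since CUDA reads these variables when it initialises.
    """
    # Enumerate devices in PCI bus order (the order nvidia-smi uses)
    # instead of CUDA's default FASTEST_FIRST heuristic.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every other GPU; inside this process the chosen GPU becomes device 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)


if __name__ == "__main__":
    # Hypothetical choice; in practice the ID could come from GPUtil.
    pin_process_to_gpu(1)
    print(os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"])
```

After pinning, the selected GPU is the only one the framework sees, so inside the process it is addressed as device 0 (for example "/gpu:0" in TensorFlow), regardless of its nvidia-smi ID.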

In case 2), NVIDIA only guarantees that the first GPU is the fastest; the rest of the GPUs are returned in an unspecified order.
From the CUDA environment variables:

FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified.

As such, there is no guarantee that GPU #2 is faster than GPU #3, and if GPU #1 is already occupied, you are back to the original problem.
Unfortunately, I do not see a solution to case 2) at the moment. However, you or anyone else is very welcome to suggest one :-)
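To make the case 2) problem concrete, the sketch below lists every enumeration order that FASTEST_FIRST is allowed to produce, taking the quoted documentation literally: only device 0 is pinned to the GPU the heuristic picks, and the rest may come in any order. The function name and the 3-GPU machine are hypothetical illustrations, not GPUtil or CUDA APIs.

```python
from itertools import permutations
from typing import List


def possible_fastest_first_orders(pci_ids: List[int], fastest: int) -> List[List[int]]:
    """Return every CUDA device order permitted under FASTEST_FIRST.

    Position k in each returned list is CUDA device k; the value is the
    GPU's PCI bus / nvidia-smi index. Only position 0 is fixed.
    """
    rest = [i for i in pci_ids if i != fastest]
    # The remaining order is unspecified, so any permutation is allowed.
    return [[fastest, *order] for order in permutations(rest)]


if __name__ == "__main__":
    # Hypothetical machine where the heuristic picks PCI GPU 2 as fastest.
    for order in possible_fastest_first_orders([0, 1, 2], fastest=2):
        print(order)
```

On that hypothetical machine the two allowed orders are [2, 0, 1] and [2, 1, 0], so CUDA device 1 may be either PCI GPU 0 or PCI GPU 1. That ambiguity is why an ID reported by nvidia-smi cannot be reliably translated into a CUDA device index under the default ordering.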

Edit: Fixed some spelling.

eliorc commented on August 26, 2024

Thanks for the quick response. I am talking about issue 1).

Yes, this is how I deal with it now: by setting CUDA_DEVICE_ORDER. My suggestion was that, since GPUtil is a standard choice when working with CUDA-backed frameworks, it would be helpful if GPUtil itself supported CUDA's default ordering, because relying on the CUDA defaults is such a common use case.

tashrifbillah commented on August 26, 2024

Here are my two cents: the GPUtil.get*() functions should respect the environment variable CUDA_VISIBLE_DEVICES. Say I have 4 GPUs but make only 2 visible; these functions should then behave as if only those 2 are available. Currently, they read the nvidia-smi output and return whatever it reports.
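To illustrate the suggestion, here is a sketch of how the IDs reported by nvidia-smi could be filtered through CUDA_VISIBLE_DEVICES. It is not part of GPUtil: `visible_gpu_ids` is a hypothetical helper that assumes CUDA_DEVICE_ORDER=PCI_BUS_ID (so indices agree with nvidia-smi) and handles only integer indices, not GPU UUIDs. It follows NVIDIA's documented rule that an invalid index hides itself and every device listed after it.

```python
import os
from typing import List, Optional


def visible_gpu_ids(all_ids: List[int], env_value: Optional[str] = None) -> List[int]:
    """Return the nvidia-smi IDs CUDA would expose, in CUDA device order.

    Hypothetical helper; assumes CUDA_DEVICE_ORDER=PCI_BUS_ID and
    integer-only CUDA_VISIBLE_DEVICES entries (no GPU UUIDs).
    """
    if env_value is None:
        env_value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if env_value is None:
        return list(all_ids)  # variable unset: every GPU is visible
    visible: List[int] = []
    for token in env_value.split(","):
        token = token.strip()
        # An invalid entry (e.g. "-1", "" or an unknown index) hides
        # itself and everything listed after it.
        if not token.isdigit() or int(token) not in all_ids:
            break
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    # The example from the comment: 4 GPUs, only 2 made visible.
    print(visible_gpu_ids([0, 1, 2, 3], "1,3"))
```

The position in the returned list is the CUDA device index, so with "1,3" nvidia-smi GPU 1 is CUDA device 0 and GPU 3 is CUDA device 1. A GPUtil.get*() function that respected CUDA_VISIBLE_DEVICES could apply a filter like this before ranking GPUs by load or memory.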
