
Comments (8)

bcharlier commented on August 28, 2024

Hi @timlacroix ,

When you set CUDA_VISIBLE_DEVICES=0, the NVIDIA driver exposes only the GPU with id=0. At compilation time, KeOps therefore detects only the GPU with id=0, so this is the expected behavior.

Can you try to call python test.py without setting the env variable CUDA_VISIBLE_DEVICES? It should work as you expect, since you already ask KeOps to run on GPU 0 through the .to('cuda:0') call...

For instance:

import torch
from pykeops.torch import LazyTensor

def test(data):
	neigh_state = LazyTensor(data[None, :, :])
	state = LazyTensor(data[:, None, :])
	all_distances = ((neigh_state - state) ** 2).sum(dim=2)
	return (- all_distances).logsumexp(dim=1)

tensor = torch.randn(10,128).to('cuda:0')
test(tensor) # should run on gpu 0

tensor1 = torch.randn(10,128).to('cuda:1')
print(torch.cuda.device_count())
test(tensor1) # should run on gpu 1... without recompiling

from keops.

timlacroix commented on August 28, 2024

Hi, I used CUDA_VISIBLE_DEVICES here to make the problem reproducible.

My question concerns a set-up where development (and thus compilation) happens on a machine with N GPUs and testing happens on a machine with M GPUs, with both sharing the same compilation cache.

Couldn't the number of GPUs available at compile time be used in the compiled code hash? That way, changing the number of GPUs would just force a rebuild rather than raise an error.
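The idea can be sketched in a few lines. This is only an illustration, not the actual KeOps hashing code: cache_key, its formula string, and the way the GPU count is folded into the key are all hypothetical.

```python
import hashlib

def cache_key(formula: str, num_gpus: int) -> str:
    """Hypothetical cache key that folds the visible-GPU count into the
    hash, so changing the number of GPUs triggers a rebuild rather than
    loading a stale shared library."""
    payload = f"{formula}|ngpus={num_gpus}".encode()
    return hashlib.sha256(payload).hexdigest()[:16]

# A different GPU count yields a different key for the same formula:
print(cache_key("SqDist(x,y)", 1) != cache_key("SqDist(x,y)", 2))  # True
```

With such a key, a node with a different GPU count would simply miss the cache and recompile, instead of loading a shared lib built for the wrong configuration.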


bcharlier commented on August 28, 2024

Maybe @joanglaunes knows this better than me, but I think it will not be possible to make the same shared lib work on 2 different systems. Why don't you define 2 separate cache folders?


bcharlier commented on August 28, 2024

> Hi, I used CUDA_VISIBLE_DEVICES here to make the problem reproducible.
>
> My question concerns a set-up where development (and thus compilation) happens on a machine with N GPUs and testing happens on a machine with M GPUs, with both sharing the same compilation cache.
>
> Couldn't the number of GPUs available at compile time be used in the compiled code hash? That way, changing the number of GPUs would just force a rebuild rather than raise an error.

OK, a quick solution could be to include the number of GPUs and their respective architectures in the name of the cache folder. Then, when you call your code from a different node, it will pick up the shared lib from the right cache dir.
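A rough sketch of that naming scheme, assuming the compute capabilities are gathered elsewhere (e.g. with torch.cuda.get_device_capability()); gpu_cache_dir is a hypothetical helper, not part of pykeops:

```python
import os

def gpu_cache_dir(base: str, gpu_archs: list) -> str:
    """Build a cache folder name from the number of GPUs and their
    compute architectures, e.g. 'keops_cache_2x_sm70-sm80'.
    gpu_archs is a list of (major, minor) compute-capability pairs,
    as returned per device by torch.cuda.get_device_capability()."""
    tag = f"{len(gpu_archs)}x_" + "-".join(f"sm{mj}{mn}" for mj, mn in gpu_archs)
    return os.path.join(base, f"keops_cache_{tag}")

# A node with two V100s (sm70) and a node with one A100 (sm80)
# would resolve to distinct cache folders:
print(gpu_cache_dir("/tmp", [(7, 0), (7, 0)]))
print(gpu_cache_dir("/tmp", [(8, 0)]))
```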


timlacroix commented on August 28, 2024

@bcharlier yes, if that's possible that would be great :)


bcharlier commented on August 28, 2024

Is the hostname unique in your case? I mean, is either of these outputs different on the various nodes:

import platform
print(platform.node())

import socket
print(socket.gethostname())


timlacroix commented on August 28, 2024

Both are different on the various nodes.
(However, I might want to vary the number of GPUs available at runtime on the same machine: for example, while developing, I have two things running on 1 GPU, then at some point I want to try one thing on 2 GPUs...)

I don't know if including the hostname is a good idea. It means using a separate cache folder per machine, which the user can already do, if necessary, by just using a random cache folder at runtime. In my case I would be happy to reuse the same cache across the various nodes of the cluster.
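The "random cache at runtime" workaround could look roughly like this; how pykeops is then pointed at the directory is left as a comment, since the exact setter depends on the pykeops version (an assumption, not shown in this thread):

```python
import os
import tempfile

# Create a throwaway, per-run cache directory. pykeops would then be
# pointed at it before any formula is compiled (e.g. via its
# build-folder setting; the exact API depends on the pykeops version).
cache_dir = tempfile.mkdtemp(prefix="keops_cache_")
print(os.path.isdir(cache_dir))  # True
```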


joanglaunes commented on August 28, 2024

Hello @timlacroix,
In fact, the technical problem for us is that the detection of GPUs and their properties is currently done at compile time, in the CMake scripts that are launched after the Python code detects that compilation is needed.
So, as @bcharlier suggests, the easiest solution for us is to include the hostname and node (plus, maybe, the content of CUDA_VISIBLE_DEVICES) in the hash code, because this is easy to do in Python.
That said, including the GPU properties in the hash code is maybe not so difficult either; I guess it can be done with GPUtil...
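For illustration, the GPU properties could be folded into a short tag roughly like this. The property list would come from GPUtil (or per-device torch.cuda queries); here it is hardcoded, and gpu_properties_tag is a hypothetical helper, not KeOps code:

```python
import hashlib
import json

def gpu_properties_tag(gpus: list) -> str:
    """Fold a list of GPU property dicts into a short, stable tag that
    could be appended to the compilation hash (hypothetical helper)."""
    canonical = json.dumps(sorted(gpus, key=lambda g: g["id"]), sort_keys=True)
    return hashlib.md5(canonical.encode()).hexdigest()[:8]

# Hardcoded example; in practice the dicts would be built from
# GPUtil.getGPUs() or torch.cuda device queries:
gpus = [{"id": 0, "name": "Tesla V100", "memoryTotal": 16160},
        {"id": 1, "name": "Tesla V100", "memoryTotal": 16160}]
print(gpu_properties_tag(gpus))  # an 8-character hex tag
```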

