Comments (8)
To be more explicit, we would like other deployment solutions to have the same `CUDA_VISIBLE_DEVICES` tricks that are used within the `LocalCUDACluster` class within this repository. One simple-ish way to do this is to create optional hooks to replace the `dask-worker` command with a new command, which we will provide as `dask-cuda-worker`. This CLI utility would presumably start a few `Nanny`s with different environment variables set (there is an `env=` keyword in `Nanny`), similar to how `LocalCUDACluster` does.
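For context, the trick in question amounts to giving each worker process a different rotation of the device list, so worker `i` lists GPU `i` first while the other devices remain visible for peer access. A minimal sketch, where the helper name and rotation scheme are illustrative rather than the exact dask-cuda implementation:

```python
def cuda_visible_devices(i, device_ids):
    """Return a CUDA_VISIBLE_DEVICES string with device_ids rotated
    so that worker i lists its own GPU first."""
    n = len(device_ids)
    rotated = [device_ids[(i + j) % n] for j in range(n)]
    return ",".join(str(d) for d in rotated)

# With 4 GPUs: worker 0 gets "0,1,2,3", worker 1 gets "1,2,3,0", etc.
env_vars = [{"CUDA_VISIBLE_DEVICES": cuda_visible_devices(i, range(4))}
            for i in range(4)]
```

Each dictionary in `env_vars` would then be passed as the `env=` keyword of one `Nanny`.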
A concrete proposal for this on the dask-jobqueue side is referenced above in dask/dask-jobqueue#229 . But this approach could also be used in other tooling like dask-kubernetes, dask-ssh, and so on.
from dask-cuda.
One approach to this would be to create a thin wrapper around `dask-worker` that made some opinions about `--nprocs` and `--env` and then passed those opinions down to the main `dask-worker` command, maybe something like the following:
```python
import click

from distributed.cli.dask_worker import main as original_main

@click.command()
@click.argument(...)  # pass through everything
def main(*args, **kwargs):
    # figure_out_how_many_gpus_we_have and cuda_visible_devices are
    # hypothetical helpers here
    nprocs = figure_out_how_many_gpus_we_have()
    environment_variables = [cuda_visible_devices(i) for i in range(nprocs)]
    original_main(*args, nprocs=nprocs, env=environment_variables, **kwargs)
```
But I suspect that `click` has some special way to pass through keywords that isn't well represented here.
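For what it's worth, click's documented pattern for wrapping another CLI is to ignore unknown options and collect everything into a single unprocessed argument tuple. A hedged sketch, where the `echo` stands in for forwarding the collected arguments to distributed's real `dask-worker` entry point:

```python
import click

@click.command(context_settings=dict(ignore_unknown_options=True))
@click.argument("passthrough_args", nargs=-1, type=click.UNPROCESSED)
def main(passthrough_args):
    # In a real wrapper we would compute nprocs/env here and then forward
    # passthrough_args to distributed's dask-worker main().
    click.echo(" ".join(passthrough_args))
```

With `ignore_unknown_options=True`, flags that this wrapper does not define (e.g. `--nthreads`) land in `passthrough_args` instead of raising a usage error.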
This would require an `--env` keyword on `dask-worker`; I'll raise an issue there.
Alternatively we could screw Python and probably just write a small bash script to do this. In hindsight this might be simpler if the `--env` trick is deemed a hack from the dask side.
Proposing the following interaction between `--nprocs` and a new flag `--gpus` to `dask-worker`. No CUDA code is required in `dask-worker`, so no new worker program. No additional dependencies. Modifications to `dask-ssh` should be straightforward.
```python
@click.option('--gpus', type=str, default=None,
              help="GPUs to map to worker processes. "
                   "If used with --nprocs then nprocs must evenly divide the "
                   "number of GPUs. Specified as an ordered list of ranges, "
                   "e.g. '0-7' or '0,2,6-7'.")
```
Checks
- `--gpus` parses as a comma-separated list of integer ranges, expanded into `gpu-list`.
- Each GPU exists on the system as `/dev/nvidia#`.
- `len(gpu-list) / nprocs` is an integer.
Functionality
For each group of `len(gpu-list)/nprocs` GPUs, a nanny+worker will be started with `CUDA_VISIBLE_DEVICES` set to the GPUs in that group.
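The parsing and grouping described above can be sketched in a few lines. These are illustrative helpers, not actual `dask-worker` code, and the `/dev/nvidia#` existence check is omitted:

```python
def parse_gpu_ranges(spec):
    """Expand a --gpus spec like '0,2,6-7' into [0, 2, 6, 7]."""
    gpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            gpus.extend(range(int(lo), int(hi) + 1))
        else:
            gpus.append(int(part))
    return gpus

def gpu_groups(gpu_list, nprocs):
    """Split gpu_list into nprocs equal groups; each group becomes one
    worker's CUDA_VISIBLE_DEVICES value."""
    if len(gpu_list) % nprocs:
        raise ValueError("--nprocs must evenly divide the number of GPUs")
    size = len(gpu_list) // nprocs
    return [",".join(map(str, gpu_list[k * size:(k + 1) * size]))
            for k in range(nprocs)]
```

For example, `--gpus '0-3' --nprocs 2` would yield the groups `"0,1"` and `"2,3"`.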
I'm not sure if this is what you're proposing, but we probably can't add this to the `dask-worker` command in the core project. The logic is pretty CUDA specific. My guess is that we'll have to find a way to wrap around it in this repository rather than introduce this logic into the core project.
To expand on my previous remark, I would have a few concerns:
- The logic we're proposing is around modifying `CUDA_VISIBLE_DEVICES`, so it's very CUDA specific. If some other GPU manufacturer were to show up they could easily cry foul. If someone else were to ask for similar functionality tied to AMD hardware, the community would probably reject it too.
- The logic we're proposing here is honestly likely to change as we learn more about how to use GPUs and Dask together. The proposal isn't stable enough to make its way into the user-level API of Dask itself. There might be other good ways to handle Dask and GPUs; we generally need more experimentation before we commit ourselves to a particular blessed approach.
- It's not yet clear that Dask wants to think at all about GPUs. They're not special cased anywhere currently in the system. Baking in logic about GPUs specifically is a broader decision beyond this change. There are enough disparate special interests pushing on this project that we end up having to say "no" unfortunately often, even when it would make a particular group's situation easier. Scope creep gets pretty bad otherwise and maintenance becomes difficult.
To get things moving, we might also consider just copy-pasting the entire `dask-worker` implementation, modifying the `nprocs` logic a bit, and calling it a day. I suspect that learning enough about `click` to properly reuse code might take a while (I certainly don't know how to do this at least).
Also cc @quasiben in case he has recommendations.