Comments (8)

mrocklin commented on September 17, 2024

To be more explicit, we would like other deployment solutions to be able to use the same CUDA_VISIBLE_DEVICES tricks that the LocalCUDACluster class in this repository uses. One simple-ish way to do this is to create optional hooks that replace the dask-worker command with a new command, which we will provide as dask-cuda-worker. This CLI utility would presumably start a few Nannies with different environment variables set (there is an env= keyword in Nanny), similar to how LocalCUDACluster does.

A concrete proposal for this on the dask-jobqueue side is referenced above in dask/dask-jobqueue#229. But this approach could also be used in other tooling like dask-kubernetes, dask-ssh, and so on.
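
A minimal sketch of what "a few Nannies with different environment variables" could look like. The helper names and the rotation scheme (every worker sees all GPUs, but each with a different one listed first) are assumptions made for illustration, not code taken from LocalCUDACluster:

```python
def cuda_visible_devices_for(i, n_gpus):
    """Return a CUDA_VISIBLE_DEVICES value listing every GPU, starting at GPU i.

    Assumption: rotating the device list is one plausible scheme; the thread
    does not spell out the exact trick.
    """
    devices = list(range(n_gpus))
    rotated = devices[i:] + devices[:i]
    return ",".join(str(d) for d in rotated)


def worker_environments(n_gpus):
    """Build one environment dict per worker process (one worker per GPU)."""
    return [
        {"CUDA_VISIBLE_DEVICES": cuda_visible_devices_for(i, n_gpus)}
        for i in range(n_gpus)
    ]


envs = worker_environments(4)
for env in envs:
    print(env)
```

Each dict is the kind of value that could be handed to the env= keyword of a Nanny, so that the worker it supervises sees a different default device.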

from dask-cuda.

mrocklin commented on September 17, 2024

One approach to this would be to create a thin wrapper around dask-worker that has opinions about --nprocs and --env and passes those opinions down to the main dask-worker command, maybe something like the following:

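import click
from distributed.cli.dask_worker import main as original_main

# Pseudocode: figure_out_how_many_gpus_we_have and cuda_visible_devices are
# placeholder helpers, and dask-worker does not accept an env option yet.
@click.argument(...)  # pass through everything
def main(*args, **kwargs):
    # One worker process per GPU
    nprocs = figure_out_how_many_gpus_we_have()
    # One CUDA_VISIBLE_DEVICES setting per worker process
    environment_variables = [cuda_visible_devices(i) for i in range(nprocs)]

    original_main(*args, nprocs=nprocs, env=environment_variables, **kwargs)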

But I suspect that click has some special way to pass through keywords that isn't well represented here.

mrocklin commented on September 17, 2024

This would require an --env keyword for dask-worker. I'll raise an issue there.

mrocklin commented on September 17, 2024

Alternatively we could screw Python and probably just write a small bash script to do this.

In hindsight this might be simpler if the --env trick is deemed a hack from the dask side.
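
For comparison, the route that avoids --env can be sketched as a launcher that starts one unmodified dask-worker process per GPU and sets CUDA_VISIBLE_DEVICES in each child's environment. It is written in Python rather than bash only to keep the examples in one language. build_launch_plan and launch are hypothetical helpers, the scheduler address is a placeholder, and pinning each worker to a single GPU index is an assumption:

```python
import os
import subprocess


def build_launch_plan(passthrough_args, n_gpus, base_env=None):
    """Return one (argv, env) pair per GPU; nothing is started here."""
    base_env = dict(os.environ) if base_env is None else dict(base_env)
    plan = []
    for gpu in range(n_gpus):
        # Assumption: each worker process is pinned to exactly one GPU.
        env = dict(base_env, CUDA_VISIBLE_DEVICES=str(gpu))
        # Every argument is forwarded untouched to the stock dask-worker CLI.
        argv = ["dask-worker", *passthrough_args]
        plan.append((argv, env))
    return plan


def launch(plan):
    """Start every planned process and return the Popen handles."""
    return [subprocess.Popen(argv, env=env) for argv, env in plan]


# Placeholder scheduler address; an empty base_env keeps the output readable.
plan = build_launch_plan(["tcp://scheduler:8786"], n_gpus=2, base_env={})
for argv, env in plan:
    print(env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

Separating the plan from the launch keeps the CUDA-specific part small and lets the argument forwarding be checked without starting any processes.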

beberg commented on September 17, 2024

Proposing the following interaction between --nprocs and a new flag --gpus to dask-worker. No CUDA code is required in dask-worker, so no new worker program. No additional dependencies. Modifications to dask-ssh should be straightforward.

@click.option('--gpus', type=str, default=None,
              help="GPUs to map to worker processes. "
                   "If used with --nprocs then nprocs must evenly divide the "
                   "number of GPUs. Specified as an ordered list of ranges e.g. '0-7', or '0,2,6-7'")

Checks

  • --gpus parses as a comma-separated list of integer ranges, which is expanded into gpu-list.
  • Each GPU in gpu-list exists on the system as /dev/nvidia#.
  • len(gpu-list) / nprocs is an integer.
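
The three checks could be sketched roughly as follows. parse_gpu_spec and check_gpus are hypothetical names, and the exists parameter is an addition so the device-file check can be exercised on a machine without GPUs:

```python
import os


def parse_gpu_spec(spec):
    """Expand a spec such as '0,2,6-7' into an ordered list of GPU indices."""
    gpus = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"invalid range: {part!r}")
            gpus.extend(range(start, end + 1))
        else:
            gpus.append(int(part))  # raises ValueError on non-integer input
    return gpus


def check_gpus(gpus, nprocs, exists=os.path.exists):
    """Raise ValueError if a device file is missing or the split is uneven."""
    missing = [g for g in gpus if not exists(f"/dev/nvidia{g}")]
    if missing:
        raise ValueError(f"no /dev/nvidia# device file for GPUs: {missing}")
    if nprocs <= 0 or len(gpus) % nprocs != 0:
        raise ValueError(
            f"nprocs={nprocs} must evenly divide the number of GPUs ({len(gpus)})"
        )


gpu_list = parse_gpu_spec("0,2,6-7")
print(gpu_list)
```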

Functionality

For each group of len(gpu-list)/nprocs GPUs, a nanny+worker will be started with CUDA_VISIBLE_DEVICES set to the GPUs in that group.
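
A short sketch of that grouping, assuming consecutive entries of gpu-list are assigned to the same worker. group_gpus and visible_devices_per_worker are hypothetical helper names:

```python
def group_gpus(gpus, nprocs):
    """Split gpus into nprocs consecutive groups of equal size."""
    if not gpus or nprocs <= 0 or len(gpus) % nprocs != 0:
        raise ValueError("nprocs must evenly divide a non-empty GPU list")
    size = len(gpus) // nprocs
    return [gpus[i:i + size] for i in range(0, len(gpus), size)]


def visible_devices_per_worker(gpus, nprocs):
    """Return one CUDA_VISIBLE_DEVICES string per nanny+worker pair."""
    return [",".join(str(g) for g in group) for group in group_gpus(gpus, nprocs)]


settings = visible_devices_per_worker([0, 1, 2, 3, 4, 5, 6, 7], nprocs=4)
print(settings)  # ['0,1', '2,3', '4,5', '6,7']
```

With nprocs equal to the number of GPUs, each worker gets exactly one device; with nprocs=1, a single worker sees all of them.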

mrocklin commented on September 17, 2024

I'm not sure if this is what you're proposing, but we probably can't add this to the dask-worker command in the core project. The logic is pretty CUDA specific. My guess is that we'll have to find a way to wrap around it in this repository rather than introduce this logic into the core project.

mrocklin commented on September 17, 2024

To expand on my previous remark, I would have a few concerns:

  1. The logic we're proposing is around modifying CUDA_VISIBLE_DEVICES, so it's very CUDA specific. If some other GPU manufacturer were to show up they could easily cry foul. If someone else were to ask for similar functionality that was tied to AMD hardware, the community would also probably reject it.
  2. The logic we're proposing here is honestly likely to change as we learn more about how to use GPUs and Dask together. The proposal isn't stable enough to make its way into the user-level API of Dask itself. There might be other good ways to handle Dask and GPUs. We generally need more experimentation before we commit ourselves to a particular blessed approach.
  3. It's not yet clear that Dask wants to think at all about GPUs. They're not special cased anywhere currently in the system. Baking in logic about GPUs specifically is a broader decision beyond this change. There are enough disparate special interests pushing on this project that we end up having to say "no" unfortunately often, even when it would make a particular group's situation easier. Scope creep gets pretty bad otherwise and maintenance becomes difficult.

mrocklin commented on September 17, 2024

To get things moving, we might also consider just copy-pasting the entire dask-worker implementation, modifying the nprocs logic a bit, and calling it a day. I suspect that learning enough about click to properly reuse code might take a while (I certainly don't know how to do this at least).

Also cc @quasiben in case he has recommendations.
