Comments (8)
I've taken a bit of a look at this, although haven't been able to get anything up and running yet. Just posting some progress here in case anyone else is looking at it too.
Databricks has this article on getting Ray up and running, which looks helpful and very similar to what getting Dask running should look like. As you've put in the description, it suggests using DB_IS_DRIVER and DB_DRIVER_IP.
By my reckoning, that should imply this simple init script would work:
set -e
# Install dask
pip install "dask[complete]"
# On the driver node, start the dask scheduler
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
    dask scheduler &
# Otherwise, give the scheduler time to start, then start a dask worker
else
    sleep 40
    dask worker tcp://$DB_DRIVER_IP:8786 &
fi
Spoiler alert: it doesn't. It seems to set up the scheduler fine, but not any workers, or at least something different must be needed to communicate with the workers than what I've put together.
This script registers the client OK, but the submitted job just hangs permanently on "pending", suggesting no workers are registered:
from dask.distributed import Client
import os

client = Client(f'{os.environ["SPARK_LOCAL_IP"]}:8786')

def inc(x):
    return x + 1

x = client.submit(inc, 10)
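One way to tell "no workers registered" apart from "job is just slow" is to wait for workers explicitly before submitting. The sketch below is only an illustration: the helper name `submit_when_workers_ready` and its default `n_workers` and `timeout` values are my own assumptions, not something from this thread. It relies on `Client.wait_for_workers`, which as far as I know raises a timeout error instead of hanging when no workers show up in time.

```python
def submit_when_workers_ready(client, func, *args, n_workers=1, timeout=120):
    """Wait for workers to register, then submit func and return its result.

    `client` is expected to be a dask.distributed.Client (or any object with
    the same wait_for_workers/submit methods). The helper name and the default
    values are hypothetical, chosen only for this example.
    """
    # Fail after `timeout` seconds rather than sitting on "pending" forever
    client.wait_for_workers(n_workers=n_workers, timeout=timeout)
    future = client.submit(func, *args)
    return future.result()


def inc(x):
    return x + 1


# Usage on the driver, assuming the scheduler from the init script is running:
#   import os
#   from dask.distributed import Client
#   client = Client(f'{os.environ["SPARK_LOCAL_IP"]}:8786')
#   print(submit_when_workers_ready(client, inc, 10))
```

If the wait times out, that points at the worker side of the init script rather than at the notebook code.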
from dask-databricks.
Also, @skirui-source shared this updated version of the script that uses nc to poll for the scheduler instead of waiting on a fixed sleep, which is a little more robust.
#!/bin/bash
set -e
echo "DB_IS_DRIVER = $DB_IS_DRIVER"
echo "DB_DRIVER_IP = $DB_DRIVER_IP"
# Quote the extras spec so the shell does not treat the brackets as a glob
pip install --upgrade pip "dask[complete]"
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
    echo "This node is the Dask scheduler."
    dask scheduler &
else
    echo "This node is a Dask worker."
    echo "Connecting to Dask scheduler at $DB_DRIVER_IP:8786"
    # Wait for the scheduler to start
    while ! nc -z "$DB_DRIVER_IP" 8786; do
        echo "Scheduler not available yet. Waiting..."
        sleep 1
    done
    dask worker "tcp://$DB_DRIVER_IP:8786" &
fi
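For anyone who would rather do the same polling from Python (for example where nc is not installed), here is a rough stdlib-only equivalent of the `nc -z` loop above. The function name `wait_for_port` and the 100 second default are assumptions for illustration; unlike the shell loop, it gives up after a deadline instead of retrying forever.

```python
import socket
import time


def wait_for_port(host, port, timeout=100.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Same idea as `nc -z`: open a connection and close it straight away
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: stop if the deadline has passed, otherwise retry
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage on a worker node, assuming DB_DRIVER_IP is set as in the init script:
#   import os
#   wait_for_port(os.environ["DB_DRIVER_IP"], 8786)
```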
Thanks for that @benrutter. I just tried your example and it worked for me!
Oh nice one! Did submitting a job to the client work and return a response?
I'm curious about why it didn't work on the cluster I tried. What version of the Databricks runtime and what node setup did you have?
Yeah, you can see the result of 11 printed in my screenshot.
Here's the configuration I used.
I wonder if the sleep of 40 seconds is too short to work consistently?
Not sure how I missed that in the screenshot! Could be the 40 seconds, I'll see if I can get it to work by extending it.
Yup, I think it must have been either:
- The 40 seconds not being enough
- The driver node not having a wait time, so being callable ahead of the workers
I don't really know enough about when Databricks considers a cluster "ready", but given I was starting up a cluster to immediately run a notebook, it's possible I just needed to wait an extra 40 seconds before scheduling the job.
Either way, I tried again with a 100 second wait both before starting the workers on the worker nodes and after the dask setup on the driver node, and it all worked great.
Nice! What would be involved in setting up a runner to use this method?
@benrutter awesome glad you had success!
Now that we've figured some things out and have a POC I'm not convinced this fits into the runner paradigm anymore. Given the use of init scripts it would probably be better for us to just provide a small utility to replace most of that script, and also provide some client-side tools for use in the notebook.
I've created another repo which will probably also get moved to dask-contrib at some point. I'm going to transfer this issue over there and close it out. Please feel free to engage in the other discussions I'm starting there.
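To make the "small utility" idea a bit more concrete, here is a minimal sketch of the role-selection part of the init script in Python. This is not the API of the new repo: the function name `dask_command_for_node` is hypothetical, and the only things taken from the thread are the DB_IS_DRIVER and DB_DRIVER_IP variables and port 8786.

```python
import os

SCHEDULER_PORT = 8786  # port used by the scripts in this thread


def dask_command_for_node(env=None):
    """Return the dask CLI command this node should run, as an argument list.

    Hypothetical helper for illustration: the driver node runs the scheduler,
    and every other node runs a worker pointed at the driver.
    """
    env = os.environ if env is None else env
    if env.get("DB_IS_DRIVER") == "TRUE":
        return ["dask", "scheduler"]
    driver_ip = env["DB_DRIVER_IP"]
    return ["dask", "worker", f"tcp://{driver_ip}:{SCHEDULER_PORT}"]


# Usage from an init script, e.g. via subprocess:
#   import subprocess
#   subprocess.Popen(dask_command_for_node())
```

Returning an argument list rather than running the process keeps the decision logic easy to test without a cluster.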