Comments (11)
A cool variant of this I've just successfully used:
Edit services.py to print the command that was supposed to be run instead of actually running it, and put
import IPython
IPython.embed()
right after it. Then start Ray normally; when the interpreter hits IPython.embed(), start the process by hand (possibly under gdb). It works beautifully!
Here is the diff:
--- a/python/ray/local_scheduler/local_scheduler_services.py
+++ b/python/ray/local_scheduler/local_scheduler_services.py
@@ -117,6 +117,10 @@ def start_local_scheduler(plasma_store_name,
stdout=stdout_file, stderr=stderr_file)
time.sleep(1.0)
else:
- pid = subprocess.Popen(command, stdout=stdout_file, stderr=stderr_file)
+ print(" ".join(command))
+ import IPython
+ IPython.embed()
+ pid = None
+ # pid = subprocess.Popen(command, stdout=stdout_file, stderr=stderr_file)
time.sleep(0.1)
return local_scheduler_name, pid
diff --git a/python/ray/services.py b/python/ray/services.py
index a37c16a..e6f3ede 100644
--- a/python/ray/services.py
+++ b/python/ray/services.py
@@ -555,7 +555,7 @@ def start_local_scheduler(redis_address,
stderr_file=stderr_file,
static_resource_list=[num_cpus, num_gpus],
num_workers=num_workers)
- if cleanup:
+ if cleanup and p:
all_processes[PROCESS_TYPE_LOCAL_SCHEDULER].append(p)
record_log_files_in_redis(redis_address, node_ip_address,
[stdout_file, stderr_file])
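Outside of Ray, the same trick boils down to a few lines. This is a generic sketch (the launch helper is a made-up name, not Ray code), using the stdlib code module in place of IPython.embed():

```python
import subprocess

def launch(command, debug=False):
    """Spawn `command`, or, in debug mode, print it and drop into a REPL
    so the command can be started by hand (e.g. under gdb).

    Generic illustration of the trick in the diff above; not Ray code.
    """
    if debug:
        print(" ".join(command))  # copy-paste this into another terminal
        import code               # stdlib stand-in for IPython.embed()
        code.interact(local=dict(globals(), **locals()))
        return None               # no process was started
    return subprocess.Popen(command)
```

With debug=False it behaves like the original subprocess.Popen call; with debug=True it pauses exactly where the process would have been spawned.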
-- Philipp.
from ray.
I've been using tmux panes to debug. I wrote a script here that may be helpful to others. Originally I had this in tmuxinator, but wanted to reduce the number of dependencies.
It's a little finicky since the script doesn't really respect dependencies between the processes, but it's a little less painful than starting all of the processes by hand.
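For reference, the core of such a script can be sketched as composing tmux commands, one pane per process. The session name and process commands below are illustrative placeholders, not the actual script:

```python
# Sketch: generate the tmux invocations that open one pane per Ray process.
PROCESSES = [
    "redis-server --port 6379",
    "global_scheduler -r 127.0.0.1:6379",
    "plasma_store -s /tmp/s1 -m 1000000000",
]

def tmux_commands(session="ray-debug", processes=PROCESSES):
    # Create a detached session, then split one pane per extra process.
    cmds = ["tmux new-session -d -s %s" % session]
    for i, proc in enumerate(processes):
        if i > 0:
            cmds.append("tmux split-window -t %s" % session)
        # send-keys types the command into the pane and presses Enter.
        cmds.append("tmux send-keys -t %s '%s' Enter" % (session, proc))
    return cmds
```

Running the generated commands through subprocess (or printing them for copy-paste) gives one visible pane per process, which makes tailing their output much easier.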
Closing for now.
Updated instructions
Start Redis
rm dump.rdb; ./src/common/thirdparty/redis/src/redis-server --loadmodule src/common/redis_module/ray_redis_module.so
Start the global scheduler (and pass in the Redis address)
./src/global_scheduler/build/global_scheduler -r 127.0.0.1:6379
Start the plasma store
src/plasma/build/plasma_store -s /tmp/s1 -m 1000000000
Start the plasma manager
src/plasma/build/plasma_manager -s /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -h 127.0.0.1 -p 23894
Start the local scheduler
src/photon/build/photon_scheduler -s /tmp/sched1 -p /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -a 127.0.0.1:23894 -h 127.0.0.1
Start a worker (or run this multiple times to start multiple workers).
python lib/python/ray/workers/default_worker.py --redis-address 127.0.0.1:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address 127.0.0.1
Connect a driver (run this in a Python interpreter).
import ray
address_info = {
"node_ip_address": "127.0.0.1",
"redis_address": "127.0.0.1:6379",
"store_socket_name": "/tmp/s1",
"manager_socket_name": "/tmp/m1",
"local_scheduler_socket_name": "/tmp/sched1"}
ray.connect(address_info, mode=ray.SCRIPT_MODE)
Some other useful things:
- Start a redis client with redis-cli and run monitor to monitor all of the commands going through the redis server.
Updating this again, since the instructions have changed.
Start Redis
rm dump.rdb; ./python/core/src/common/thirdparty/redis/src/redis-server --loadmodule python/core/src/common/redis_module/libray_redis_module.so
Start the global scheduler (and pass in the Redis address)
./python/core/src/global_scheduler/global_scheduler -r 127.0.0.1:6379
Start the plasma store
./python/core/src/plasma/plasma_store -s /tmp/s1 -m 1000000000
Start the plasma manager
./python/core/src/plasma/plasma_manager -s /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -h 127.0.0.1 -p 23894
Start the local scheduler
./python/core/src/photon/photon_scheduler -s /tmp/sched1 -p /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -a 127.0.0.1:23894 -h 127.0.0.1
Start a worker (or run this multiple times to start multiple workers).
python python/ray/workers/default_worker.py --redis-address 127.0.0.1:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address 127.0.0.1
Connect a driver (run this in a Python interpreter).
import ray
address_info = {
"node_ip_address": "127.0.0.1",
"redis_address": "127.0.0.1:6379",
"store_socket_name": "/tmp/s1",
"manager_socket_name": "/tmp/m1",
"local_scheduler_socket_name": "/tmp/sched1"}
ray.connect(address_info, mode=ray.SCRIPT_MODE)
Some other useful things:
- Start a redis client with redis-cli and run monitor to monitor all of the commands going through the redis server.
Updated instructions.
Start Redis
rm dump.rdb; ./python/ray/core/src/common/thirdparty/redis/src/redis-server \
--loadmodule python/ray/core/src/common/redis_module/libray_redis_module.so
Start the global scheduler (and pass in the Redis address)
./python/ray/core/src/global_scheduler/global_scheduler \
-r 127.0.0.1:6379 \
-h 127.0.0.1
Start the plasma store
./python/ray/core/src/plasma/plasma_store \
-s /tmp/s1 \
-m 1000000000
Start the plasma manager
./python/ray/core/src/plasma/plasma_manager \
-s /tmp/s1 \
-m /tmp/m1 \
-r 127.0.0.1:6379 \
-h 127.0.0.1 \
-p 23894
Start the local scheduler
# Without ability to start new workers.
./python/ray/core/src/local_scheduler/local_scheduler \
-s /tmp/sched1 \
-p /tmp/s1 \
-m /tmp/m1 \
-r 127.0.0.1:6379 \
-a 127.0.0.1:23894 \
-h 127.0.0.1
# With ability to start new workers.
./python/ray/core/src/local_scheduler/local_scheduler \
-s /tmp/sched1 \
-p /tmp/s1 \
-m /tmp/m1 \
-r 127.0.0.1:6379 \
-a 127.0.0.1:23894 \
-h 127.0.0.1 \
-w "python python/ray/workers/default_worker.py --redis-address 127.0.0.1:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address 127.0.0.1"
Start a worker (or run this multiple times to start multiple workers).
python python/ray/workers/default_worker.py \
--redis-address 127.0.0.1:6379 \
--object-store-name /tmp/s1 \
--object-store-manager-name /tmp/m1 \
--local-scheduler-name /tmp/sched1 \
--node-ip-address 127.0.0.1
Connect a driver (run this in a Python interpreter).
import ray
address_info = {
"node_ip_address": "127.0.0.1",
"redis_address": "127.0.0.1:6379",
"store_socket_name": "/tmp/s1",
"manager_socket_name": "/tmp/m1",
"local_scheduler_socket_name": "/tmp/sched1"}
ray.connect(address_info, mode=ray.SCRIPT_MODE)
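If ray.connect fails, it is worth verifying that the socket files in address_info actually exist. A small debugging helper for that (this is not part of the Ray API):

```python
import os

def missing_sockets(address_info):
    """Return the socket paths in address_info that do not exist yet.

    Debugging helper only; not part of Ray.
    """
    keys = ["store_socket_name", "manager_socket_name",
            "local_scheduler_socket_name"]
    return [address_info[k] for k in keys
            if not os.path.exists(address_info[k])]
```

An empty result means the plasma store, plasma manager, and local scheduler sockets are all in place; anything listed points at a process that has not started (or was started with a different socket name).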
Some other useful things:
- Start a redis client with redis-cli and run monitor to monitor all of the commands going through the redis server.
More recent instructions.
Note: Throughout, you will need to replace <head-node-ip> with the actual IP address of the head node.
The processes on the head node can be started as follows.
- Run the following in Python to start the Redis shards.
import ray
ray.services.start_redis("<head-node-ip>", port=6379, num_redis_shards=1, redirect_output=False, cleanup=False)
- Start a global scheduler.
./python/ray/core/src/global_scheduler/global_scheduler \
-r <head-node-ip>:6379 \
-h <head-node-ip>
- Start a plasma store
./python/ray/core/src/plasma/plasma_store \
-s /tmp/s1 \
-m 1000000000
- Start a plasma manager
./python/ray/core/src/plasma/plasma_manager \
-s /tmp/s1 \
-m /tmp/m1 \
-r <head-node-ip>:6379 \
-h <head-node-ip> \
-p 23894
- Start the local scheduler
# Without ability to start new workers.
./python/ray/core/src/local_scheduler/local_scheduler \
-s /tmp/sched1 \
-p /tmp/s1 \
-m /tmp/m1 \
-r <head-node-ip>:6379 \
-a <head-node-ip>:23894 \
-h <head-node-ip> \
-c 16,0
# With ability to start new workers.
./python/ray/core/src/local_scheduler/local_scheduler \
-s /tmp/sched1 \
-p /tmp/s1 \
-m /tmp/m1 \
-r <head-node-ip>:6379 \
-a <head-node-ip>:23894 \
-h <head-node-ip> \
-c 16,0 \
-w "python python/ray/workers/default_worker.py --redis-address <head-node-ip>:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address <head-node-ip>"
- Start a worker (or run this multiple times to start multiple workers).
python python/ray/workers/default_worker.py \
--redis-address <head-node-ip>:6379 \
--object-store-name /tmp/s1 \
--object-store-manager-name /tmp/m1 \
--local-scheduler-name /tmp/sched1 \
--node-ip-address <head-node-ip>
- Connect a driver (run this in a Python interpreter).
import ray
ray.init(redis_address="<head-node-ip>:6379")
After that, it should be possible to start up Ray on other nodes as follows.
ray start --redis-address=<head-node-ip>:6379
If you wish to start the Redis servers by hand instead of calling ray.services.start_redis as in the previous comment, you can do the following (this starts one primary Redis server and one other Redis shard). Note that the value <head-node-ip> will need to be replaced.
./python/ray/core/src/common/thirdparty/redis/src/redis-server \
--loglevel warning \
--loadmodule ./python/ray/core/src/common/redis_module/libray_redis_module.so \
--port 6379
./python/ray/core/src/common/thirdparty/redis/src/redis-server \
--loglevel warning \
--loadmodule ./python/ray/core/src/common/redis_module/libray_redis_module.so \
--port 6380
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 set NumRedisShards 1
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 rpush RedisShards <head-node-ip>:6380
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 config set notify-keyspace-events Kl
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6380 config set notify-keyspace-events Kl
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 config set protected-mode no
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6380 config set protected-mode no
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 config set client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 134217728 134217728 60"
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6380 config set client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 134217728 134217728 60"
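The repetitive redis-cli invocations above follow the same pattern for every shard port, so they can be generated for any number of shards. A sketch (the binary path and config values are taken from the commands above):

```python
REDIS_CLI = "./python/ray/core/src/common/thirdparty/redis/src/redis-cli"

def shard_config_commands(ports):
    """Generate the per-shard redis-cli config commands shown above."""
    buffer_limit = ("normal 0 0 0 slave 268435456 67108864 60 "
                    "pubsub 134217728 134217728 60")
    cmds = []
    for port in ports:
        prefix = "%s -p %d config set " % (REDIS_CLI, port)
        cmds.append(prefix + "notify-keyspace-events Kl")
        cmds.append(prefix + "protected-mode no")
        cmds.append(prefix + 'client-output-buffer-limit "%s"' % buffer_limit)
    return cmds
```

For example, shard_config_commands([6379, 6380]) reproduces the six config-set commands above; the NumRedisShards/RedisShards keys still need to be set separately on the primary.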
@danielsuo that looks like a step in the right direction! I think the most useful thing would be to be able to start any of the Ray processes within tmux (and also within gdb), e.g., integrating something like this within services.py.
A tutorial on this would be great!