
Starting processes by hand · about ray · HOT 11 · CLOSED

ray-project commented on April 27, 2024

Comments (11)

pcmoritz commented on April 27, 2024

A cool variant of this I've just successfully used:

Edit services.py to output the command that was supposed to be run instead of actually running it, and put a

import IPython
IPython.embed()

after it. Then you can start Ray normally, and when the interpreter hits the IPython.embed(), start the program by hand (possibly in gdb). It works beautifully!

Here is the diff:

--- a/python/ray/local_scheduler/local_scheduler_services.py
+++ b/python/ray/local_scheduler/local_scheduler_services.py
@@ -117,6 +117,10 @@ def start_local_scheduler(plasma_store_name,
                                stdout=stdout_file, stderr=stderr_file)
         time.sleep(1.0)
     else:
-        pid = subprocess.Popen(command, stdout=stdout_file, stderr=stderr_file)
+        print(" ".join(command))
+        import IPython
+        IPython.embed()
+        pid = None
+        # pid = subprocess.Popen(command, stdout=stdout_file, stderr=stderr_file)
         time.sleep(0.1)
     return local_scheduler_name, pid
diff --git a/python/ray/services.py b/python/ray/services.py
index a37c16a..e6f3ede 100644
--- a/python/ray/services.py
+++ b/python/ray/services.py
@@ -555,7 +555,7 @@ def start_local_scheduler(redis_address,
         stderr_file=stderr_file,
         static_resource_list=[num_cpus, num_gpus],
         num_workers=num_workers)
-    if cleanup:
+    if cleanup and p:
         all_processes[PROCESS_TYPE_LOCAL_SCHEDULER].append(p)
     record_log_files_in_redis(redis_address, node_ip_address,
                               [stdout_file, stderr_file])

-- Philipp.


danielsuo commented on April 27, 2024

I've been using tmux panes to debug. I wrote a script here that may be helpful to others. Originally I had this in tmuxinator, but I wanted to reduce the number of dependencies.

It's a little finicky since it doesn't really respect dependencies between the processes, but it's a little less painful than starting all the processes by hand.
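The approach can be sketched roughly as follows (a minimal sketch, not the actual script; the session name is made up, and the process paths are the ones used elsewhere in this thread). The script only prints the tmux commands, so it can be reviewed first; pipe its output to `sh` to run it:

```shell
#!/bin/sh
# Print tmux commands that would start each Ray process in its own window,
# so each process's output lands in a separate window of one session.
# Pipe the output to `sh` to actually execute it (requires tmux).
start_ray_tmux() {
    session=ray-debug
    echo "tmux new-session -d -s $session -n redis './src/common/thirdparty/redis/src/redis-server --loadmodule src/common/redis_module/ray_redis_module.so'"
    echo "tmux new-window -t $session -n global_scheduler './src/global_scheduler/build/global_scheduler -r 127.0.0.1:6379'"
    echo "tmux new-window -t $session -n plasma_store 'src/plasma/build/plasma_store -s /tmp/s1 -m 1000000000'"
    echo "tmux new-window -t $session -n plasma_manager 'src/plasma/build/plasma_manager -s /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -h 127.0.0.1 -p 23894'"
}

start_ray_tmux
```

Note this doesn't sequence the processes (e.g. Redis must be up before the schedulers connect), matching the "doesn't really respect dependencies" caveat above.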


robertnishihara commented on April 27, 2024

Closing for now.


robertnishihara commented on April 27, 2024

Updated instructions

Start Redis

rm dump.rdb; ./src/common/thirdparty/redis/src/redis-server --loadmodule src/common/redis_module/ray_redis_module.so

Start the global scheduler (and pass in the Redis address)

./src/global_scheduler/build/global_scheduler -r 127.0.0.1:6379

Start the plasma store

src/plasma/build/plasma_store -s /tmp/s1 -m 1000000000

Start the plasma manager

src/plasma/build/plasma_manager -s /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -h 127.0.0.1 -p 23894

Start the local scheduler

src/photon/build/photon_scheduler -s /tmp/sched1 -p /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -a 127.0.0.1:23894 -h 127.0.0.1

Start a worker (or run this multiple times to start multiple workers).

python lib/python/ray/workers/default_worker.py --redis-address 127.0.0.1:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address 127.0.0.1

Connect a driver (run this in a Python interpreter).

import ray

address_info = {
  "node_ip_address": "127.0.0.1",
  "redis_address": "127.0.0.1:6379",
  "store_socket_name": "/tmp/s1",
  "manager_socket_name": "/tmp/m1",
  "local_scheduler_socket_name": "/tmp/sched1"}

ray.connect(address_info, mode=ray.SCRIPT_MODE)

Some other useful things:

  • Start a redis client with redis-cli and run monitor to monitor all of the commands going through the redis server.


robertnishihara commented on April 27, 2024

Updating this again, since the instructions have changed.

Start Redis

rm dump.rdb; ./python/core/src/common/thirdparty/redis/src/redis-server --loadmodule python/core/src/common/redis_module/libray_redis_module.so

Start the global scheduler (and pass in the Redis address)

./python/core/src/global_scheduler/global_scheduler -r 127.0.0.1:6379

Start the plasma store

./python/core/src/plasma/plasma_store -s /tmp/s1 -m 1000000000

Start the plasma manager

./python/core/src/plasma/plasma_manager -s /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -h 127.0.0.1 -p 23894

Start the local scheduler

./python/core/src/photon/photon_scheduler -s /tmp/sched1 -p /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -a 127.0.0.1:23894 -h 127.0.0.1

Start a worker (or run this multiple times to start multiple workers).

python python/ray/workers/default_worker.py --redis-address 127.0.0.1:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address 127.0.0.1

Connect a driver (run this in a Python interpreter).

import ray

address_info = {
  "node_ip_address": "127.0.0.1",
  "redis_address": "127.0.0.1:6379",
  "store_socket_name": "/tmp/s1",
  "manager_socket_name": "/tmp/m1",
  "local_scheduler_socket_name": "/tmp/sched1"}

ray.connect(address_info, mode=ray.SCRIPT_MODE)

Some other useful things:

  • Start a redis client with redis-cli and run monitor to monitor all of the commands going through the redis server.


robertnishihara commented on April 27, 2024

Updated instructions.

Start Redis

rm dump.rdb; ./python/ray/core/src/common/thirdparty/redis/src/redis-server \
    --loadmodule python/ray/core/src/common/redis_module/libray_redis_module.so

Start the global scheduler (and pass in the Redis address)

./python/ray/core/src/global_scheduler/global_scheduler \
    -r 127.0.0.1:6379 \
    -h 127.0.0.1

Start the plasma store

./python/ray/core/src/plasma/plasma_store \
    -s /tmp/s1 \
    -m 1000000000

Start the plasma manager

./python/ray/core/src/plasma/plasma_manager \
    -s /tmp/s1 \
    -m /tmp/m1 \
    -r 127.0.0.1:6379 \
    -h 127.0.0.1 \
    -p 23894

Start the local scheduler

# Without ability to start new workers.
./python/ray/core/src/local_scheduler/local_scheduler \
    -s /tmp/sched1 \
    -p /tmp/s1 \
    -m /tmp/m1 \
    -r 127.0.0.1:6379 \
    -a 127.0.0.1:23894 \
    -h 127.0.0.1

# With ability to start new workers.
./python/ray/core/src/local_scheduler/local_scheduler \
    -s /tmp/sched1 \
    -p /tmp/s1 \
    -m /tmp/m1 \
    -r 127.0.0.1:6379 \
    -a 127.0.0.1:23894 \
    -h 127.0.0.1 \
    -w "python python/ray/workers/default_worker.py --redis-address 127.0.0.1:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address 127.0.0.1"

Start a worker (or run this multiple times to start multiple workers).

python python/ray/workers/default_worker.py \
    --redis-address 127.0.0.1:6379 \
    --object-store-name /tmp/s1 \
    --object-store-manager-name /tmp/m1 \
    --local-scheduler-name /tmp/sched1 \
    --node-ip-address 127.0.0.1

Connect a driver (run this in a Python interpreter).

import ray

address_info = {
  "node_ip_address": "127.0.0.1",
  "redis_address": "127.0.0.1:6379",
  "store_socket_name": "/tmp/s1",
  "manager_socket_name": "/tmp/m1",
  "local_scheduler_socket_name": "/tmp/sched1"}

ray.connect(address_info, mode=ray.SCRIPT_MODE)

Some other useful things:

  • Start a redis client with redis-cli and run monitor to monitor all of the commands going through the redis server.


robertnishihara commented on April 27, 2024

More recent instructions.

Note: Throughout, you will need to replace <head-node-ip> with the head node's actual IP address.

The processes on the head node can be started as follows.

  1. Run the following in Python to start the Redis shards.

    import ray
    
    ray.services.start_redis("<head-node-ip>",
                             port=6379,
                             num_redis_shards=1,
                             redirect_output=False,
                             cleanup=False)
  2. Start a global scheduler.

    ./python/ray/core/src/global_scheduler/global_scheduler \
        -r <head-node-ip>:6379 \
        -h <head-node-ip>
    
  3. Start a plasma store

    ./python/ray/core/src/plasma/plasma_store \
        -s /tmp/s1 \
        -m 1000000000
    
  4. Start a plasma manager

    ./python/ray/core/src/plasma/plasma_manager \
        -s /tmp/s1 \
        -m /tmp/m1 \
        -r <head-node-ip>:6379 \
        -h <head-node-ip> \
        -p 23894
    
  5. Start the local scheduler

    # Without ability to start new workers.
    ./python/ray/core/src/local_scheduler/local_scheduler \
        -s /tmp/sched1 \
        -p /tmp/s1 \
        -m /tmp/m1 \
        -r <head-node-ip>:6379 \
        -a <head-node-ip>:23894 \
        -h <head-node-ip> \
        -c 16,0
    
    # With ability to start new workers.
    ./python/ray/core/src/local_scheduler/local_scheduler \
        -s /tmp/sched1 \
        -p /tmp/s1 \
        -m /tmp/m1 \
        -r <head-node-ip>:6379 \
        -a <head-node-ip>:23894 \
        -h <head-node-ip> \
        -c 16,0 \
        -w "python python/ray/workers/default_worker.py --redis-address <head-node-ip>:6379 --object-store-name /tmp/s1 --object-store-manager-name /tmp/m1 --local-scheduler-name /tmp/sched1 --node-ip-address <head-node-ip>"
    
  6. Start a worker (or run this multiple times to start multiple workers).

     python python/ray/workers/default_worker.py \
         --redis-address <head-node-ip>:6379 \
         --object-store-name /tmp/s1 \
         --object-store-manager-name /tmp/m1 \
         --local-scheduler-name /tmp/sched1 \
         --node-ip-address <head-node-ip>
    
  7. Connect a driver (run this in a Python interpreter).

    import ray
    ray.init(redis_address="<head-node-ip>:6379")

After that, it should be possible to start up Ray on other nodes as follows.

ray start --redis-address=<head-node-ip>:6379


robertnishihara commented on April 27, 2024

If you wish to start the Redis servers by hand instead of calling ray.services.start_redis as in the previous comment, you can do the following (this starts one primary Redis server and one additional Redis shard). Note that <head-node-ip> will need to be replaced.

./python/ray/core/src/common/thirdparty/redis/src/redis-server \
    --loglevel warning \
    --loadmodule ./python/ray/core/src/common/redis_module/libray_redis_module.so \
    --port 6379
./python/ray/core/src/common/thirdparty/redis/src/redis-server \
    --loglevel warning \
    --loadmodule ./python/ray/core/src/common/redis_module/libray_redis_module.so \
    --port 6380
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 set NumRedisShards 1
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 rpush RedisShards <head-node-ip>:6380

./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 config set notify-keyspace-events Kl
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6380 config set notify-keyspace-events Kl

./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 config set protected-mode no
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6380 config set protected-mode no

./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6379 config set client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 134217728 134217728 60"
./python/ray/core/src/common/thirdparty/redis/src/redis-cli -p 6380 config set client-output-buffer-limit "normal 0 0 0 slave 268435456 67108864 60 pubsub 134217728 134217728 60"
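The repeated per-port redis-cli calls above can be generated with a small loop (a convenience sketch, not part of the original instructions; it prints the commands so they can be reviewed first, then piped to `sh` from the Ray source root):

```shell
#!/bin/sh
# Emit the redis-cli configuration commands for each Redis shard port.
# Pipe the output to `sh` from the Ray source root to actually apply them.
REDIS_CLI=./python/ray/core/src/common/thirdparty/redis/src/redis-cli

emit_shard_config() {
    for port in 6379 6380; do
        echo "$REDIS_CLI -p $port config set notify-keyspace-events Kl"
        echo "$REDIS_CLI -p $port config set protected-mode no"
        echo "$REDIS_CLI -p $port config set client-output-buffer-limit 'normal 0 0 0 slave 268435456 67108864 60 pubsub 134217728 134217728 60'"
    done
}

emit_shard_config
```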


robertnishihara commented on April 27, 2024

@danielsuo that looks like a step in the right direction! I think the most useful thing would be to be able to start any of the Ray processes within tmux (and also within gdb), e.g., integrating something like this within services.py.
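One shape such an integration could take (a hedged sketch; `wrap_command` and its flags are hypothetical and not part of Ray's services.py): rewrite the argv list before it reaches subprocess.Popen.

```python
def wrap_command(command, use_gdb=False, use_tmux=False, session="ray"):
    """Optionally wrap an argv list so it runs under gdb and/or in a tmux window.

    `command` is an argv list, e.g. ["./global_scheduler", "-r", "127.0.0.1:6379"].
    This helper is hypothetical -- it is a sketch, not Ray's actual API.
    """
    if use_gdb:
        # `gdb --args prog arg...` runs prog under gdb with its arguments.
        command = ["gdb", "--args"] + command
    if use_tmux:
        # Run the (possibly gdb-wrapped) command in a new tmux window.
        command = ["tmux", "new-window", "-t", session, " ".join(command)]
    return command

# The call site in services.py would then look something like:
#     pid = subprocess.Popen(wrap_command(command, use_gdb=True),
#                            stdout=stdout_file, stderr=stderr_file)
```

Keeping the wrapping in one helper means every process started by services.py can opt into gdb or tmux with a single flag, rather than each call site reimplementing the logic.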



javierabosch2 commented on April 27, 2024

A tutorial on this would be great!
