ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

License: Apache License 2.0

Shell 0.50% C++ 18.90% Python 73.45% C 0.10% HTML 0.01% CSS 0.01% Java 3.13% Dockerfile 0.09% JavaScript 0.01% TypeScript 1.70% Starlark 1.10% Cython 0.96% PowerShell 0.01% Jinja 0.03% Roff 0.01%
ray distributed parallel machine-learning reinforcement-learning deep-learning python rllib hyperparameter-search optimization

ray's Introduction


Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI libraries for simplifying ML compute.

Learn more about Ray AI Libraries:

  • Data: Scalable Datasets for ML
  • Train: Distributed Training
  • Tune: Scalable Hyperparameter Tuning
  • RLlib: Scalable Reinforcement Learning
  • Serve: Scalable and Programmable Serving

Or more about Ray Core and its key abstractions (a brief code sketch follows this list):

  • Tasks: Stateless functions executed in the cluster.
  • Actors: Stateful worker processes created in the cluster.
  • Objects: Immutable values accessible across the cluster.
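A minimal sketch of all three abstractions together (a quick-start style example using the ray API shown throughout this page):

import ray

ray.init()

# Task: a stateless function executed in the cluster.
@ray.remote
def square(x):
  return x * x

# Actor: a stateful worker process created in the cluster.
@ray.remote
class Counter(object):
  def __init__(self):
    self.count = 0
  def increment(self):
    self.count += 1
    return self.count

# Object: an immutable value accessible across the cluster.
obj_ref = ray.put(41)

counter = Counter.remote()
print(ray.get(square.remote(3)))            # 9
print(ray.get(counter.increment.remote()))  # 1
print(ray.get(obj_ref))                     # 41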

Monitor and debug Ray applications and clusters using the Ray dashboard.

Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations.

Install Ray with: pip install ray. For nightly wheels, see the Installation page.

Why Ray?

Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.

Ray is a unified way to scale Python and AI applications from a laptop to a cluster.

With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.


Getting Involved

Platform        | Purpose                                                   | Estimated Response Time | Support Level
Discourse Forum | For discussions about development and questions about usage. | < 1 day              | Community
GitHub Issues   | For reporting bugs and filing feature requests.           | < 2 days                | Ray OSS Team
Slack           | For collaborating with other Ray users.                   | < 2 days                | Community
StackOverflow   | For asking questions about how to use Ray.                | 3-5 days                | Community
Meetup Group    | For learning about Ray projects and best practices.       | Monthly                 | Ray DevRel
Twitter         | For staying up-to-date on new features.                   | Daily                   | Ray DevRel

ray's People

Contributors

amogkam, architkulkarni, arturniederfahrenhorst, bveeramani, can-anyscale, clarkzinzow, dmitrigekhtman, edoakes, ericl, fishbone, ijrsvt, jjyao, jovany-wang, justinvyu, krfricke, mehrdadn, pcmoritz, raulchen, richardliaw, rickyyx, rkooo567, robertnishihara, scv119, shrekris-anyscale, simon-mo, stephanie-wang, suquark, sven1977, xwjiang2010, yard1


ray's Issues

Crash in simple example.

The following currently crashes.

import ray

ray.init(start_ray_local=True, num_workers=1)

@ray.remote
def f(n):
  return n

f.remote(f.remote(3))

The last line crashes the plasma manager on my machine with the following error.

Out[4]: [FATAL] (plasma_manager.c:1590: errno: Operation now in progress) Check failure: plasma_receive_request(client_sock, &type, &req) >= 0 

0   plasma_manager                      0x000000010d3ec1ac process_message + 252
1   plasma_manager                      0x000000010d41be7b aeProcessEvents + 539
2   plasma_manager                      0x000000010d41c52e aeMain + 94
3   plasma_manager                      0x000000010d40ce35 event_loop_run + 21
4   plasma_manager                      0x000000010d3f6166 start_server + 1078
5   plasma_manager                      0x000000010d3f672a main + 1370
6   libdyld.dylib                       0x00007fffcfe5a255 start + 1

Segfault when calling ray.put on a large numpy array.

The following fails (on my laptop) with a bus error.

import ray
import numpy as np

ray.init()
x = np.zeros(10 ** 9)

ray.put(x)

This may be related to Arrow's lack of support for large arrays. However, even if we don't support it right away we need to throw a reasonable exception.

How to update version of Ray on a cluster?

Someone I chatted with wants to do the following: update an existing Ray cluster with many nodes to use a newer version of Ray. Right now, if the cluster is large, the best way to do it seems to be to create an AMI with the new version and restart all the instances. Is there a better way (one possibility: provide an update-ray.sh for pssh)?

ray.wait fails with duplicate object IDs.

The following fails.

import ray
import time

ray.init()

@ray.remote
def f():
  time.sleep(1)

x = f.remote()
ray.wait([x, x], num_returns=2)

It fails with the following message.

[FATAL] (plasma_manager.c:396: errno: Operation now in progress) Check failure: wait_req->object_requests[j].status == ((ObjectStatus_enum_t)3L) 

/home/ubuntu/ray/lib/python/plasma/plasma_manager(update_object_wait_requests+0x8bb)[0x40b526]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(process_add_object_notification+0x138f)[0x416092]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(process_object_notification+0xf5)[0x4161a6]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(aeProcessEvents+0x263)[0x440aa3]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(aeMain+0x48)[0x440c64]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(event_loop_run+0x18)[0x431a05]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(start_server+0x422)[0x416d32]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(main+0x48f)[0x41721b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6ce6da8830]
/home/ubuntu/ray/lib/python/plasma/plasma_manager(_start+0x29)[0x4086f9]
[FATAL] (/home/ubuntu/ray/src/plasma/plasma_protocol.c:89: errno: None) Check failure: type == message_type 
type = 0, message_type = 23
/home/ubuntu/ray/lib/python/plasma/libplasma.so(plasma_receive+0xd7)[0x7fac9ec51bf9]
/home/ubuntu/ray/lib/python/plasma/libplasma.so(plasma_wait+0x546)[0x7fac9ec5fc4d]
/home/ubuntu/ray/lib/python/plasma/libplasma.so(PyPlasma_wait+0x30f)[0x7fac9ec4abdc]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyCFunction_Call+0xf9)[0x7facb530d5e9]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x8fb5)[0x7facb5394bd5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCodeEx+0x48)[0x7facb5395cd8]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCode+0x3b)[0x7facb5395d1b]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x137dfe)[0x7facb5388dfe]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyCFunction_Call+0xf9)[0x7facb530d5e9]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x8fb5)[0x7facb5394bd5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x9546)[0x7facb5395166]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCodeEx+0x48)[0x7facb5395cd8]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x9a661)[0x7facb52eb661]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyObject_Call+0x56)[0x7facb52b8236]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x6614)[0x7facb5392234]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x7facb5394df5]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x7facb5395b49]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCodeEx+0x48)[0x7facb5395cd8]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCode+0x3b)[0x7facb5395d1b]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyRun_FileExFlags+0x130)[0x7facb53bb020]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(PyRun_SimpleFileExFlags+0x173)[0x7facb53bc623]
/home/ubuntu/anaconda3/bin/../lib/libpython3.5m.so.1.0(Py_Main+0xca7)[0x7facb53d78c7]
/home/ubuntu/anaconda3/bin/python(main+0x15d)[0x400add]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7facb4373830]
/home/ubuntu/anaconda3/bin/python[0x4008b9]
[INFO] (photon_scheduler.c:283) Disconnecting client on fd 11
Aborted (core dumped)

TensorFlow example in documentation doesn't work.

Copying and pasting the example from the Complete Example section of the documentation at https://github.com/ray-project/ray/blob/master/doc/using-ray-with-tensorflow.md fails with the following error.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-9c9d314791eb> in <module>()
     85   # Add up all the different weights. Each element of new_weights_list is a list
     86   # of weights, and we want to add up these lists component wise.
---> 87   weights = [sum(weight_tuple) / NUM_BATCHES for weight_tuple in zip(*new_weights_list)]
     88   # Print the current weights. They should converge to roughly to the values 0.1
     89   # and 0.3 used in generate_fake_x_y_data.

<ipython-input-1-9c9d314791eb> in <listcomp>(.0)
     85   # Add up all the different weights. Each element of new_weights_list is a list
     86   # of weights, and we want to add up these lists component wise.
---> 87   weights = [sum(weight_tuple) / NUM_BATCHES for weight_tuple in zip(*new_weights_list)]
     88   # Print the current weights. They should converge to roughly to the values 0.1
     89   # and 0.3 used in generate_fake_x_y_data.

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Also, the following warning is printed repeatedly.

WARNING:tensorflow:From <ipython-input-1-9c9d314791eb>:26 in net_vars_initializer.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.

Should we provide functionality for running a function on all workers (or all nodes)?

Should we expose the ability to run a given function on all workers? This can be useful for a bunch of reasons (for example, properly setting the Python path on each worker).

Should we also expose the ability to run a given function once on each node? This can be useful for setting things up (e.g., installing some stuff, downloading some files, copying application files).

Of course, we shouldn't add any functionality without a very good reason...
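A toy sketch of the proposed semantics (all names here are hypothetical, not an existing Ray API): broadcast a function to every worker and run it once per worker, e.g., to fix up sys.path.

import sys

def run_function_on_all_workers(function, workers):
  # In real Ray this would ship `function` (via cloudpickle) to each
  # worker process; here we just call it locally per "worker".
  for worker in workers:
    function(worker)

def setup(worker):
  sys.path.insert(0, "/path/to/application/code")  # illustrative path

run_function_on_all_workers(setup, workers=["worker-1", "worker-2"])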

Calling ray.register_class after defining a remote function fails and (sometimes) gives no error message.

An example like the following sometimes works and sometimes fails (I haven't been able to get this particular example to fail, but a similar one did on a different machine).

import ray
ray.init()

class Foo(object):
  def __init__(self):
    pass

@ray.remote
def f():
  return Foo()

@ray.remote
def g(x):
  return 1

ray.register_class(Foo)

ray.get([f.remote() for _ in range(10)] + [g.remote(Foo()) for _ in range(10)])

If the worker tries to execute the function f or g before the register_class call has operated on the worker, then it will throw an exception.

Normally, it should fail with the following error message.

Remote function __main__.f failed with:

Traceback (most recent call last):
  File "/Users/rkn/Workspace/ray/python/ray/worker.py", line 1634, in process_task
    store_outputs_in_objstore(return_object_ids, outputs, worker)
  File "/Users/rkn/Workspace/ray/python/ray/worker.py", line 1973, in store_outputs_in_objstore
    worker.put_object(objectids[i], outputs[i])
  File "/Users/rkn/Workspace/ray/python/ray/worker.py", line 474, in put_object
    numbuf.store_list(objectid.id(), self.plasma_client.conn, [value])
  File "/Users/rkn/Workspace/ray/python/ray/serialization.py", line 97, in serialize
    raise Exception("Ray does not know how to serialize objects of type {}. To fix this, call 'ray.register_class' with this class.".format(type(obj)))
Exception: Ray does not know how to serialize objects of type <class '__main__.Foo'>. To fix this, call 'ray.register_class' with this class.

However, I have seen it fail with just None (that's what showed up in ray.error_info()).

Inspecting the event_log a little more closely (for a different script that gave rise to this problem) gave the following error.

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python2.7/site-packages/ray-0.0.1-py2.7.egg/ray/worker.py", line 1542, in process_task
    arguments = get_arguments_for_execution(worker.functions[function_id.id()], args, worker)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/ray-0.0.1-py2.7.egg/ray/worker.py", line 1852, in get_arguments_for_execution
    argument = worker.get_object([arg])[0]
  File "/home/ubuntu/.local/lib/python2.7/site-packages/ray-0.0.1-py2.7.egg/ray/worker.py", line 455, in get_object
    0)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/ray-0.0.1-py2.7.egg/ray/serialization.py", line 127, in deserialize
    cls = whitelisted_classes[class_id]
KeyError: 'autocore.yaml_config.YamlConfig'

The failure in this case appears to happen when an argument is deserialized. We should do several things (a workaround sketch follows this list).

  1. Make sure we propagate error messages from this case.
  2. Make sure we provide better utilities for inspecting tasks because there might be errors in the event log that are not propagated.
  3. Add tests for this problem.
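In the meantime, a workaround consistent with the failure mode described above is to call ray.register_class before defining or invoking any remote functions that produce or consume the class (a sketch using the register_class API shown in this issue):

import ray

ray.init()

class Foo(object):
  def __init__(self):
    pass

# Register the class first, so every worker learns how to (de)serialize
# Foo before any task can produce or consume one.
ray.register_class(Foo)

@ray.remote
def f():
  return Foo()

@ray.remote
def g(x):
  return 1

ray.get([f.remote() for _ in range(10)] + [g.remote(Foo()) for _ in range(10)])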

Starting processes by hand.

To debug crashes, it is often useful to start each of the processes (the plasma store, the plasma manager, the local scheduler, the global scheduler, redis, etc) by hand. This way, some of the processes can be started in gdb, and logging messages from the different processes are easier to read.

Currently, this can be done as follows (run all of these commands in separate terminal windows).

  1. Start Redis
rm dump.rdb; ./src/common/thirdparty/redis/src/redis-server --loadmodule src/common/redis_module/ray_redis_module.so
  2. Start the global scheduler (and pass in the Redis address)
./src/global_scheduler/build/global_scheduler -r 127.0.0.1:6379
  3. Start the plasma store
src/plasma/build/plasma_store -s /tmp/s1 -m 1000000000
  4. Start the plasma manager
src/plasma/build/plasma_manager -s /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -h 127.0.0.1 -p 23894
  5. Start the local scheduler
src/photon/build/photon_scheduler -s /tmp/sched1 -p /tmp/s1 -m /tmp/m1 -r 127.0.0.1:6379 -a 127.0.0.1:23894 -h 127.0.0.1
  6. Modify start_ray_local in lib/python/ray/services.py to be something like this.
def start_ray_local(node_ip_address="127.0.0.1", num_workers=0, num_local_schedulers=1, worker_path=None):
  if worker_path is None:
    worker_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "workers/default_worker.py")
  redis_port = 6379
  redis_address = address(node_ip_address, redis_port)
  object_store_names = ["/tmp/s1"]
  object_store_manager_names = ["/tmp/m1"]
  local_scheduler_names = ["/tmp/sched1"]
  address_info = {"node_ip_address": node_ip_address,
                  "redis_port": redis_port,
                  "object_store_names": object_store_names,
                  "object_store_manager_names": object_store_manager_names,
                  "local_scheduler_names": local_scheduler_names}
  # Start the workers.
  for i in range(num_workers):
    start_worker(address_info["node_ip_address"],
                 address_info["object_store_names"][i % num_local_schedulers],
                 address_info["object_store_manager_names"][i % num_local_schedulers],
                 address_info["local_scheduler_names"][i % num_local_schedulers],
                 redis_port,
                 worker_path,
                 cleanup=True)
  return address_info
  7. Then start Python, and run
import ray
ray.init(start_ray_local=True, num_workers=2)

Then try running some commands and see which processes crash! Processes can be started in gdb as well (or lldb on Mac).

Error defining remote functions using TensorFlow in cluster setting.

On a 10-node cluster running Ubuntu, with

import tensorflow as tf
tf.__version__  # '0.12.1'

Without Ray, I can do

import tensorflow as tf
import cloudpickle

def f():
  x = tf.Variable(1)
  return 1

print(cloudpickle.dumps(f))

Then I can copy the output of cloudpickle.dumps(f) and load it on a different machine in the cluster.

import cloudpickle

s = # string output by cloudpickle.dumps(f)
f = cloudpickle.loads(s)
f() # 1

However, if I define the same thing as a remote function,

import ray
import tensorflow as tf

@ray.remote
def f():
  x = tf.Variable(1)
  return 1

I get the following error.

Traceback (most recent call last):
  File "/home/ubuntu/ray/python/ray/worker.py", line 1022, in fetch_and_register
    function = pickling.loads(serialized_function)
  File "/opt/conda/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/opt/conda/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/opt/conda/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "build/bdist.linux-x86_64/egg/cloudpickle/cloudpickle.py", line 717, in s
    __import__(name)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/__init__.py",
    from tensorflow.python import *
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/__init
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/__init
    from tensorflow.python import pywrap_tensorflow
ImportError: cannot import name pywrap_tensorflow


Error importing tensorflow.  Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there.

Segfault when running tests, local scheduler fails to give task to worker with strange file descriptor value.

I'm currently working on branch #245, so I may have introduced a bug somewhere, but the failure looks similar to failures I've seen in the tests before. However, now

python ../../test/stress_tests.py TaskTests.testGettingManyObjects

fails consistently on my branch (but not on the master).

The first thing to die is the local scheduler, with message

B?e?ƭ????ewi??z?u?55??A?YjݳT\?M?h?%9L[?sk??P1?ߺ?qV?n_??r????????????ƕ4??A?YjݳT\?M?h?%9L[[WARN] (/Users/rkn/Workspace/ray/src/photon/photon_scheduler.c:134) Failed to give task to worker on fd -1940912352. The client may have hung up.
[WARN] (/Users/rkn/Workspace/ray/src/photon/photon_scheduler.c:134) Failed to give task to worker on fd -1941952336. The client may have hung up.
B?e?ƭ????ewi??z?u?5?er??Sv
1???|?^???sk??P1?ߺ?qV?n_??r????????????ƕ?er??Sv
1???|?^??B?e?ƭ????ewi??z?u?5?k?q?a@|Uo  ?????9jF?sk??P1?ߺ?qV?n_??r????????????ƕ?k?q?a@|Uo	?????9jF[WARN] (/Users/rkn/Workspace/ray/src/photon/photon_scheduler.c:134) Failed to give task to worker on fd 79083733. The client may have hung up.
B?e?ƭ????ewi??z?u?5?`?[?σ??G*?Cn?	M?sk??P1?ߺ?qV?n_?	?r????????????ƕ?`?[?σ??G*?Cn?	Mphoton_scheduler(21871,0x7fffc08ce3c0) malloc: *** error for object 0x7fef8c6001c8: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6 (core dumped)

I've seen these strange file descriptor values before, which is why I bring this up.

Opening the core with lldb -c core.21871, and printing the backtrace gives

(lldb) bt
* thread #1: tid = 0x0000, 0x00007fffb7c0cdda libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x00007fffb7c0cdda libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffb7cf8787 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffb7b72420 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fffb7c6cfb1 libsystem_malloc.dylib`szone_error + 626
    frame #4: 0x00007fffb7c61c9e libsystem_malloc.dylib`tiny_malloc_from_free_list + 1148
    frame #5: 0x00007fffb7c60522 libsystem_malloc.dylib`szone_malloc_should_clear + 400
    frame #6: 0x00007fffb7c63d57 libsystem_malloc.dylib`szone_realloc + 2858
    frame #7: 0x00007fffb7c631d5 libsystem_malloc.dylib`malloc_zone_realloc + 115
    frame #8: 0x00007fffb7c630c4 libsystem_malloc.dylib`realloc + 256
    frame #9: 0x0000000104b5df50 photon_scheduler`sdscatlen [inlined] sdsMakeRoomFor(s=<unavailable>, addlen=20) + 35 at sds.c:142 [opt]
    frame #10: 0x0000000104b5df2d photon_scheduler`sdscatlen(s=<unavailable>, t=<unavailable>, len=20) + 45 at sds.c:241 [opt]
    frame #11: 0x0000000104b5b9f7 photon_scheduler`redisvFormatCommand(target=<unavailable>, format=<unavailable>, ap=<unavailable>) + 871 at hiredis.c:264 [opt]
    frame #12: 0x0000000104b61941 photon_scheduler`redisAsyncCommand [inlined] redisvAsyncCommand(format=<unavailable>, ap=<unavailable>) + 8 at async.c:654 [opt]
    frame #13: 0x0000000104b61939 photon_scheduler`redisAsyncCommand(ac=0x00007fef8c600000, fn=(photon_scheduler`redis_task_table_add_task_callback at redis.c:768), privdata=0x000000000000018e, format=<unavailable>) + 153 at async.c:669 [opt]
    frame #14: 0x0000000104b339d7 photon_scheduler`redis_task_table_add_task(callback_data=0x00007fef8da012a0) + 535 at redis.c:792
    frame #15: 0x0000000104b36a11 photon_scheduler`init_table_callback(db_handle=0x00007fef8c4028b0, id=(id = "?xİ?`?fz?:???\x93???), label="task_table_add_task", data=0x00007fef8da011e0, retry=0x0000000104b6e948, done_callback=0x0000000000000000, retry_callback=(photon_scheduler`redis_task_table_add_task at redis.c:783), user_context=0x0000000000000000) + 1201 at table.c:52
    frame #16: 0x0000000104b3923e photon_scheduler`task_table_add_task(db_handle=0x00007fef8c4028b0, task=0x00007fef8da011e0, retry=0x0000000104b6e948, done_callback=0x0000000000000000, user_context=0x0000000000000000) + 174 at task_table.c:20
    frame #17: 0x0000000104b233fe photon_scheduler`give_task_to_global_scheduler(state=0x00007fef8c402810, algorithm_state=0x00007fef8c500390, spec=0x00007fef8d900000) + 654 at photon_algorithm.c:419
    frame #18: 0x0000000104b23499 photon_scheduler`handle_task_submitted(state=0x00007fef8c402810, algorithm_state=0x00007fef8c500390, spec=0x00007fef8d900000) + 121 at photon_algorithm.c:438
    frame #19: 0x0000000104b1dd70 photon_scheduler`process_message(loop=0x00007fef8c4027b0, client_sock=21, context=0x00007fef8c5010a0, events=1) + 288 at photon_scheduler.c:263
    frame #20: 0x0000000104b3a4bb photon_scheduler`aeProcessEvents(eventLoop=0x00007fef8c4027b0, flags=3) + 539 at ae.c:412
    frame #21: 0x0000000104b3ab6e photon_scheduler`aeMain(eventLoop=0x00007fef8c4027b0) + 94 at ae.c:455
    frame #22: 0x0000000104b26d15 photon_scheduler`event_loop_run(loop=0x00007fef8c4027b0) + 21 at event_loop.c:56
    frame #23: 0x0000000104b1eb10 photon_scheduler`start_server(node_ip_address="127.0.0.1", socket_name="/tmp/sched1", redis_addr="127.0.0.1", redis_port=6379, plasma_store_socket_name="/tmp/s1", plasma_manager_socket_name="/tmp/m1", plasma_manager_address="127.0.0.1:23894", global_scheduler_exists=true, start_worker_command=0x0000000000000000) + 608 at photon_scheduler.c:429
    frame #24: 0x0000000104b1f3f2 photon_scheduler`main(argc=13, argv=0x00007fff5b0e39a0) + 2226 at photon_scheduler.c:520
    frame #25: 0x00007fffb7ade255 libdyld.dylib`start + 1

So the error is happening somewhere in hiredis.

I don't fully understand the problem here, and I'm unsure whether it was introduced in #245 or exposed by it.

Recursive reconstruction for dependencies on `ray.put`.

Similar to the deadlock described in #231, we can get a deadlock when reconstructing an object that was created by ray.put.

import ray
import numpy as np

ray.init()

num_objects = 1000  # illustrative value: enough tasks to trigger eviction
size = 10 ** 6      # illustrative array size

# Define a task with a single dependency, which returns its one argument.
@ray.remote
def single_dependency(arg):
  return arg

@ray.remote
def root_task(size):
  # Do the initial put.
  array = np.zeros(size)
  arg = ray.put(array)

  # Launch num_objects instances of the remote task, each dependent on the
  # one before it.
  args = []
  for i in range(num_objects):
    arg = single_dependency.remote(arg)
    args.append(arg)

  # Get each value to force each task to finish. After some number of gets,
  # old values should be evicted.
  for i in range(num_objects):
    value = ray.get(args[i])
  # Get each value again to force reconstruction.
  for i in range(num_objects):
    value = ray.get(args[i])

ray.get(root_task.remote(size))

We hang when trying to reconstruct because each time we lose the initial argument and try to reconstruct it, we will submit another instance of root_task, and eventually we will run out of workers. If we reconstruct naively, this becomes a workload with infinite recursion depth, which means that this problem cannot be solved by starting workers alone.

A possible solution is to spill over certain values (e.g., the initial ray.put) to disk.

Remote functions and actors currently cannot close over actor definitions.

The following currently fails.

import ray

ray.init()

@ray.actor
class Actor(object):
  pass

@ray.remote
def f():
  Actor()

Similarly, this example fails.

import ray

ray.init()

@ray.actor
class Actor1(object):
  pass

@ray.actor
class Actor2(object):
  def __init__(self):
    Actor1()

a = Actor2()

These examples fail with the following error.

PicklingError: Could not pickle object as excessively deep recursion required.

This can presumably be fixed in the same way that we handle remote functions closing over remote functions.

Deadlock can be caused by remote functions calling other remote functions and getting the results.

The following causes the system to hang.

import ray
ray.init(num_workers=1)

@ray.remote
def f():
  return

@ray.remote
def g():
  ray.get(f.remote())

ray.get(g.remote())

Since there is only one worker, the worker will start executing g. Then when g calls f, f can't be scheduled anywhere because there is only one worker and it is being used to execute g. But that worker won't finish executing g until f has finished executing. Hence deadlock.

There are two natural solutions (that I can think of at the moment).

  1. Detect the situation (probably in the local scheduler) and start more workers so that f can be scheduled (a stopgap sketch follows this list).
  2. Make remote functions re-entrant in the sense that when g calls ray.get, it can get a new task from the local scheduler, execute it, and then resume executing g.
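As a stopgap consistent with solution 1 (not a fix for the underlying scheduling issue), starting more than one worker avoids this particular deadlock:

import ray
ray.init(num_workers=2)

@ray.remote
def f():
  return

@ray.remote
def g():
  ray.get(f.remote())

# With a second worker available, f can be scheduled while the first
# worker is blocked inside g, so this completes instead of hanging.
ray.get(g.remote())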

Warning message when running tests in Python 3.

When running the tests (e.g., python test/runtest.py) using Python 3, the following warning message appears.

/home/ubuntu/anaconda3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py:141: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/ubuntu/anaconda3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py' mode='r' encoding='utf-8'>
  _find_module(mod_name)

A minimal example to reproduce the warning is the following.

import unittest
import ray

class Test(unittest.TestCase):
  def test(self):
    ray.init(start_ray_local=True)

if __name__ == "__main__":
  unittest.main()

Running this in Python 3 produces the following output.

[INFO] (plasma_store.c:647) Allowing the Plasma store to use up to 825.24GB of memory.
/home/ubuntu/anaconda3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py:141: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/ubuntu/anaconda3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py' mode='r' encoding='utf-8'>
  _find_module(mod_name)
/home/ubuntu/anaconda3/lib/python3.5/site-packages/cloudpickle/cloudpickle.py:141: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/ubuntu/ray/lib/python/ray/serialization.py' mode='r' encoding='utf-8'>
  _find_module(mod_name)
.
----------------------------------------------------------------------
Ran 1 test in 0.850s

OK
[INFO] (plasma_manager.c:1362) Disconnecting client on fd 12
[INFO] (plasma_manager.c:1362) Disconnecting client on fd 11
Successfully shut down Ray.

Note that running

import ray
ray.init(start_ray_local=True)

does not produce the warning message.

ray.put starts to lag after pushing too many weights

Original Issue:
In implementing the A3C RL algorithm, workers need to ship gradients back to the driver while the driver ships updated policies back to the workers. After many iterations, the object store fills up and the eviction policy seems to slow down the entire algorithm.

The RL policy is represented by a convolutional neural network, with matrices of size:
[(4, 4, 16, 32), (256, 3), (1,), (8, 8, 4, 16), (3872, 256), (256, 1), (256,), (32,), (3,), (16,)]
Equivalently in bytes: [32912, 3184, 100, 16528, 3965040, 1136, 1120, 224, 108, 160]

Code to reproduce:

import numpy as np
import ray
import time

shapes = [(4, 4, 16, 32), (256, 3), (1,), (8, 8, 4, 16), (3872, 256), (256, 1), (256,), (32,), (3,), (16,)]
pseudo_weights = [np.random.rand(*s) for s in shapes]

ray.init(num_workers=1)

@ray.remote
def try_this(param):
  return param

log = []
remaining = []
test_weights = pseudo_weights
for i in range(500):
  start = time.time()
  param = ray.put(test_weights)
  log.append(time.time() - start)
  remaining.append(try_this.remote(param))
  ready, remaining = ray.wait(remaining)

  test_weights = ray.get(ready)[0]
  if i % 50 == 0:
    print(log[-10:])

Log visualized here: [plot of ray.put latency per iteration omitted]; the put latency begins to lag significantly after a few hundred iterations.

Bug when multiple wait requests involve the same object ID.

The following hangs (one of the calls to ray.wait inside of g hangs).

import ray
import time

ray.init(num_workers=3)

@ray.remote
def g(l):
  ray.wait([l[0]])

@ray.remote
def f(x):
  time.sleep(x)

x = f.remote(5)

ray.get([g.remote([x]), g.remote([x])])

The issue is that in plasma_manager.c, when we process a new object ID and call update_object_wait_requests, we iterate over a list (the utarray object_wait_reqs->wait_requests) of wait requests involving that object ID. For some of the wait requests, we return to the client by calling return_from_wait, which removes an element from object_wait_reqs->wait_requests. Since we delete an element from the array while iterating over it, we end up not iterating over the full list and not returning for all of the wait requests.
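The same bug pattern is easy to reproduce in any language; a minimal Python illustration of deleting from a sequence while iterating over it (analogous to return_from_wait removing wait requests mid-iteration):

wait_requests = ["req_a", "req_b", "req_c"]
for req in wait_requests:
  # Suppose every request is ready to be returned to its client.
  wait_requests.remove(req)

# "req_b" was skipped and never returned: after removing "req_a",
# the iterator moved past it.
print(wait_requests)  # ['req_b']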

SYN cookies cause errors

I am running an EC2 cluster using 18 nodes of type c4.8xlarge in the us-west-2c region.

After installing Ray and starting the cluster, I find that a simple test script that launches 1000 tasks hangs. I see this error on the head node:

[FATAL] (/home/ubuntu/ray/src/photon/photon_client.c:53: errno: Resource temporarily unavailable) Check failure: type == EXECUTE_TASK

/home/ubuntu/.local/lib/python2.7/site-packages/ray-0.0.1-py2.7.egg/core/src/photon/libphoton.so(photon_get_task+0xdd)[0x7fabb09e50db]
/home/ubuntu/.local/lib/python2.7/site-packages/ray-0.0.1-py2.7.egg/core/src/photon/libphoton.so(+0x488c)[0x7fabb09e188c]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x82dc)[0x7fabbc156f3c]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7fabbc1581ce]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8596)[0x7fabbc1571f6]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7fabbc1581ce]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7fabbc1582e2]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyRun_FileExFlags+0xb0)[0x7fabbc178960]
/opt/conda/bin/../lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xef)[0x7fabbc178b3f]
/opt/conda/bin/../lib/libpython2.7.so.1.0(Py_Main+0xca4)[0x7fabbc18e484]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fabbb384830]
python[0x400649]

Additionally, I see the following in /var/log/syslog:

Feb  9 05:27:29 ip-172-31-3-24 kernel: [11293.538325] TCP: request_sock_TCP: Possible SYN flooding on port 63148. Dropping request.  Check SNMP counters.

After some experimentation I found that turning off SYN cookies on the head node fixes the problem:

sudo sysctl -w net.ipv4.tcp_syncookies=0
sudo sysctl -p

We need to make sure that users do not encounter this issue. Perhaps some investigation will show how Ray can work with SYN cookies enabled. Alternatively, we can update the installation instructions to disable this kernel feature.

Test Script

import ray

ray.init(redis_address="172.31.3.24:6379")

@ray.remote
def f():
    return ray.services.get_node_ip_address()

queries = [f.remote() for _ in range(1000)]
(ready, _) = ray.wait(queries, timeout=3000, num_returns=len(queries))
nodes = set(ray.get(ready))
print(len(nodes), nodes)

Calling ray.put and other methods before ray.init gives a bad error message.

Doing

import ray

ray.put(1)

fails with

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-261129042380> in <module>()
----> 1 ray.put(1)

/Users/rkn/Workspace/ray/lib/python/ray/worker.py in put(value, worker)
   1274     The object ID assigned to this value.
   1275   """
-> 1276   with log_span("ray:put", worker=worker):
   1277     check_main_thread()
   1278     check_connected(worker)

/Users/rkn/Workspace/ray/lib/python/ray/worker.py in __enter__(self)
   1179         contents=self.contents,
   1180         kind=LOG_SPAN_START,
-> 1181         worker=self.worker)
   1182 
   1183   def __exit__(self, type, value, tb):

/Users/rkn/Workspace/ray/lib/python/ray/worker.py in log(event_type, kind, contents, worker)
   1220   # Make sure all of the keys and values in the dictionary are strings.
   1221   contents = {str(k): str(v) for k, v in contents.items()}
-> 1222   worker.events.append((time.time(), event_type, kind, contents))
   1223 
   1224 def flush_log(worker=global_worker):

AttributeError: 'Worker' object has no attribute 'events'

This is simple to fix. We just need to move the check_connected call before the logging call in all the worker methods.
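A toy sketch of that fix pattern (names are illustrative, loosely following the traceback above): validate the connection before touching any worker state.

class Worker(object):
  def __init__(self):
    self.connected = False  # set to True by ray.init in the real code
    self.events = None      # only populated once connected

def check_connected(worker):
  if not worker.connected:
    raise RuntimeError("Ray has not been started. Call ray.init() first.")

def put(value, worker):
  check_connected(worker)          # fail early with a clear message...
  worker.events.append(("put",))   # ...before state accesses like this one

try:
  put(1, Worker())
except RuntimeError as e:
  print(e)  # a clear error instead of AttributeError: ... 'events'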

Expose get timeout to Python

The timeout of ray.get should be exposed to Python so that control returns to the interpreter if the object takes too long to compute or fetch.
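A hypothetical sketch of what that could look like from Python; the timeout keyword and the exception behavior are illustrative, not an existing interface at the time of this request.

import time
import ray

ray.init()

@ray.remote
def slow():
  time.sleep(60)
  return 1

# Illustrative only: a timeout argument on ray.get that raises if the
# object is not ready in time, returning control to the interpreter.
try:
  result = ray.get(slow.remote(), timeout=5)
except Exception as e:  # e.g., a hypothetical timeout error
  print("object not ready within 5 seconds:", e)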

Trouble installing in virtual env.

Someone attempting to install Ray with python setup.py install (within a virtualenv on Mac OS X) got the error message below.

Note:

  1. I tried it myself in a virtualenv, and that worked fine on my machine, so the virtualenv may not be the problem.
  2. As a workaround, doing ./build.sh worked fine, and then doing python setup.py develop worked.

TEST FAILED: /Users/user/.local/lib/python3.5/site-packages/ does NOT support .pth files
error: bad install directory or PYTHONPATH

You are attempting to install a package to a directory that is not
on PYTHONPATH and which Python does not read ".pth" files from.  The
installation directory you specified (via --install-dir, --prefix, or

the distutils default setting) was:

    /Users/user/.local/lib/python3.5/site-packages/

and your PYTHONPATH environment variable currently contains:

    '/usr/local/opt/opencv3/lib/python3.5/site-packages/'

Here are some of your options for correcting the problem:

* You can choose a different installation directory, i.e., one that is
  on PYTHONPATH or supports .pth files

* You can add the installation directory to the PYTHONPATH environment
  variable.  (It must then also be on PYTHONPATH whenever you run
  Python and want to use the package(s) you are installing.)

* You can set up the installation directory to support ".pth" files by
  using one of the approaches described here:

  https://setuptools.readthedocs.io/en/latest/easy_install.html#custom-installation-locations

Error message in TensorFlow tests.

The following message is printed at the end of the TensorFlow tests. It also comes up in other places (as in the LBFGS example). In the past, I have seen this come up when using TensorFlow in situations where two Python modules mutually import each other.

Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x111829358>>
Traceback (most recent call last):
  File "/Users/rkn/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 581, in __del__
UnboundLocalError: local variable 'status' referenced before assignment
(The same exception and traceback are repeated for each remaining Session object.)

Failure of an actor constructor does not prevent subsequent methods from working.

If an actor constructor throws an exception, subsequent methods can still be called without problems. This deviates from standard Python behavior. For example, the following example runs to completion with no exceptions (an exception will be printed in the background for the __init__ task that failed, but it won't kill the script).

import ray

ray.init()

@ray.actor
class Foo(object):
  def __init__(self):
    raise Exception("The constructor failed!")
  def get_val(self):
    return 1

f = Foo()
ray.get(f.get_val())
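For contrast, in plain Python a failing constructor means you never get an object to call methods on:

class Foo(object):
  def __init__(self):
    raise Exception("The constructor failed!")
  def get_val(self):
    return 1

try:
  f = Foo()
except Exception as e:
  print("construction failed:", e)  # f is never bound, so no get_val call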

Serialization of numpy.int64 is not handled properly.

This bug can be reproduced by doing the following.

import ray
import numpy as np

ray.init(start_ray_local=True, num_workers=1)

@ray.remote
def f(x):
  print([x])
  return x

ray.get(f.remote(np.int64(3)))  # This prints ['\x03\x00\x00\x00\x00\x00\x00\x00'] and returns '\x03\x00\x00\x00\x00\x00\x00\x00'.

Note that the following code works.

ray.get(ray.put(np.int64(3)))  # prints 3

The same problem occurs with other numpy data types.

Plasma store dying if worker dies in the middle of a get.

The following will kill the plasma store.

import ray
ray.init(num_workers=1)

@ray.remote
def f():
  print("This will be printed")
  ray.worker.global_worker.plasma_client.get(20 * b"a")
  print("This will not be printed")
  return None

x = f.remote()

Then kill the worker (e.g., do ps aux | grep default_worker.py to find the worker process ID and then do kill followed by the process ID).

# Creating and sealing the object will cause the plasma store to try to send a message to the worker that has died. This will cause the store to die.
ray.worker.global_worker.plasma_client.create(20 * b"a", 100)
ray.worker.global_worker.plasma_client.seal(20 * b"a")

This should take down the plasma store, which you can check by doing ps aux | grep plasma_store. Also, calling ray.put(1) should fail.

The issue is that when the driver creates and seals the object, the plasma store will try to send a message to the dead worker and will die.

Get rid of copying in build.sh.

Right now build.sh copies a number of Python files. For example, it copies src/plasma/lib/python/plasma.py to lib/python/plasma/lib/python/plasma.py.

There are a couple of reasons for this.

  1. We want Plasma to be standalone. That is, if you were to remove everything other than src/plasma (and src/common), then that should be enough to use Plasma.
  2. When we run setup.py from lib/python, we need to find plasma.py. This is difficult unless plasma.py is somewhere within lib/python.

This causes some issues. For example, sometimes you modify src/plasma/lib/python/setup.py and think you're running the code with your changes, but you are in fact running the other copy. Another issue is that you sometimes modify the copied file and not the original file during development.

There are a handful of alternatives.

  1. Use links instead of copying.
  2. Do the copying in setup.py and also remove the copied files in setup.py.
  3. Move setup.py to ray/ instead of ray/lib/python.

Plasma manager blocks on an object transfer when the object is evicted.

Plasma managers request object transfers from each other. These requests get queued at each Plasma manager, which sends the object data to the receiver in order. It's possible for the requested object to get evicted while the request is waiting in the queue. In this case, the sending Plasma manager will block (possibly forever) trying to get the object.

This can be seen in the multinode reconstruction tests: logs.

We should fix this by having the Plasma manager skip requests for which the object is no longer available locally. The receiving Plasma manager should be notified that the transfer is canceled, if some of the object data had already been sent.
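A toy sketch (in Python rather than the manager's C, with all names illustrative) of the proposed behavior: drain the transfer queue, skip objects that are no longer local, and notify the receiver of the cancellation.

local_store = {"obj_a": b"data_a"}       # "obj_b" has been evicted
transfer_queue = ["obj_a", "obj_b"]

def notify_transfer_canceled(object_id):
  print("notifying receiver: transfer of", object_id, "canceled")

for object_id in transfer_queue:
  data = local_store.get(object_id)
  if data is None:
    # The object was evicted while the request sat in the queue: skip it
    # instead of blocking (possibly forever) trying to fetch it.
    notify_transfer_canceled(object_id)
    continue
  print("sending", object_id, "-", len(data), "bytes")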

Installation fails after installing PyTorch dependencies.

Prior to all of this, I installed Anaconda 3.5 and did

sudo apt-get update
sudo apt-get install -y cmake build-essential autoconf curl libtool libboost-all-dev unzip

Following instructions from https://github.com/pytorch/pytorch#installation on Linux, I did

export CMAKE_PREFIX_PATH=[anaconda root directory]
conda install numpy mkl setuptools cmake gcc cffi
conda install -c soumith magma-cuda75

Then cloning Ray and doing python setup.py install from ray/python failed with

ubuntu@ip-172-31-34-198:~/ray/python$ python setup.py install
running install
~/ray/src/common/thirdparty ~/ray/python
~/ray/python
+ set -e
+++ dirname /home/ubuntu/ray/src/numbuf/thirdparty/download_thirdparty.sh
++ cd /home/ubuntu/ray/src/numbuf/thirdparty
++ pwd
+ TP_DIR=/home/ubuntu/ray/src/numbuf/thirdparty
+ '[' '!' -d /home/ubuntu/ray/src/numbuf/thirdparty/arrow ']'
+ cd /home/ubuntu/ray/src/numbuf/thirdparty/arrow
+ git checkout c88bd70c13cf16c07b840623cb466aa98d535be0
HEAD is now at c88bd70... Build arrow_io and arrow_ipc as static libraries.
+ set -e
+++ dirname /home/ubuntu/ray/src/numbuf/thirdparty/build_thirdparty.sh
++ cd /home/ubuntu/ray/src/numbuf/thirdparty
++ pwd
+ TP_DIR=/home/ubuntu/ray/src/numbuf/thirdparty
+ PREFIX=/home/ubuntu/ray/src/numbuf/thirdparty/installed
++ uname
+ unamestr=Linux
+ [[ Linux == \L\i\n\u\x ]]
++ nproc
+ PARALLEL=64
+ echo 'building arrow'
building arrow
+ cd /home/ubuntu/ray/src/numbuf/thirdparty/arrow/cpp
+ mkdir -p /home/ubuntu/ray/src/numbuf/thirdparty/arrow/cpp/build
+ cd /home/ubuntu/ray/src/numbuf/thirdparty/arrow/cpp/build
+ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS=-g -DCMAKE_CXX_FLAGS=-g -DARROW_BUILD_TESTS=OFF ..
clang-tidy not found
clang-format not found
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
INFO Reading specs from /home/ubuntu/anaconda3/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.8.5/specs
COLLECT_GCC=/home/ubuntu/anaconda3/bin/c++
COLLECT_LTO_WRAPPER=/home/ubuntu/anaconda3/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.8.5/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --prefix=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold --with-gxx-include-dir=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/gcc/include/c++ --bindir=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/bin --datarootdir=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/share --libdir=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib --with-gmp=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold --with-mpfr=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold --with-mpc=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold --with-isl=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold --with-cloog=/home/ray/mc-x64-2.7/conda-bld/gcc_1479223211463/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold --enable-checking=release --with-tune=generic --disable-multilib
Thread model: posix
gcc version 4.8.5 (GCC) 

Selected compiler gcc 4.8.5
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:946 ] _boost_TEST_VERSIONS = 1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:948 ] Boost_USE_MULTITHREADED = ON
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:950 ] Boost_USE_STATIC_LIBS = ON
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:952 ] Boost_USE_STATIC_RUNTIME = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:954 ] Boost_ADDITIONAL_VERSIONS = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:956 ] Boost_NO_SYSTEM_PATHS = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1024 ] Declared as CMake or Environmental Variables:
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1026 ]   BOOST_ROOT = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1028 ]   BOOST_INCLUDEDIR = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1030 ]   BOOST_LIBRARYDIR = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1032 ] _boost_TEST_VERSIONS = 1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1125 ] location of version.hpp: /usr/include/boost/version.hpp
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1149 ] version.hpp reveals boost 1.58.0
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1235 ] guessed _boost_COMPILER = -gcc48
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1245 ] _boost_MULTITHREADED = -mt
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1288 ] _boost_RELEASE_ABI_TAG = -
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1290 ] _boost_DEBUG_ABI_TAG = -d
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1344 ] _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/include/lib;/usr/include/../lib;/usr/include/stage/lib;PATHS;C:/boost/lib;C:/boost;/sw/local/lib_boost_LIBRARY_SEARCH_DIRS_DEBUG   = /usr/include/lib;/usr/include/../lib;/usr/include/stage/lib;PATHS;C:/boost/lib;C:/boost;/sw/local/lib
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1483 ] Searching for SYSTEM_LIBRARY_RELEASE: boost_system-gcc48-mt-1_58;boost_system-gcc48-mt;boost_system-mt-1_58;boost_system-mt;boost_system
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1525 ] Searching for SYSTEM_LIBRARY_DEBUG: boost_system-gcc48-mt-d-1_58;boost_system-gcc48-mt-d;boost_system-mt-d-1_58;boost_system-mt-d;boost_system-mt;boost_system
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1483 ] Searching for FILESYSTEM_LIBRARY_RELEASE: boost_filesystem-gcc48-mt-1_58;boost_filesystem-gcc48-mt;boost_filesystem-mt-1_58;boost_filesystem-mt;boost_filesystem
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1525 ] Searching for FILESYSTEM_LIBRARY_DEBUG: boost_filesystem-gcc48-mt-d-1_58;boost_filesystem-gcc48-mt-d;boost_filesystem-mt-d-1_58;boost_filesystem-mt-d;boost_filesystem-mt;boost_filesystem
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1595 ] Boost_FOUND = 1
CMake Error at /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1753 (message):
  Unable to find the requested Boost libraries.

  Boost version: 1.58.0

  Boost include path: /usr/include

  Could not find the following static Boost libraries:

          boost_system
          boost_filesystem

  No Boost libraries were found.  You may need to set BOOST_LIBRARYDIR to the
  directory containing Boost libraries or BOOST_ROOT to the location of
  Boost.
Call Stack (most recent call first):
  CMakeLists.txt:437 (find_package)


-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:946 ] _boost_TEST_VERSIONS = 1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:948 ] Boost_USE_MULTITHREADED = ON
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:950 ] Boost_USE_STATIC_LIBS = OFF
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:952 ] Boost_USE_STATIC_RUNTIME = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:954 ] Boost_ADDITIONAL_VERSIONS = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:956 ] Boost_NO_SYSTEM_PATHS = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1024 ] Declared as CMake or Environmental Variables:
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1026 ]   BOOST_ROOT = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1028 ]   BOOST_INCLUDEDIR = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1030 ]   BOOST_LIBRARYDIR = 
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1032 ] _boost_TEST_VERSIONS = 1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1125 ] location of version.hpp: /usr/include/boost/version.hpp
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1149 ] version.hpp reveals boost 1.58.0
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1235 ] guessed _boost_COMPILER = -gcc48
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1245 ] _boost_MULTITHREADED = -mt
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1288 ] _boost_RELEASE_ABI_TAG = -
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1290 ] _boost_DEBUG_ABI_TAG = -d
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1344 ] _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/include/lib;/usr/include/../lib;/usr/include/stage/lib;PATHS;C:/boost/lib;C:/boost;/sw/local/lib_boost_LIBRARY_SEARCH_DIRS_DEBUG   = /usr/include/lib;/usr/include/../lib;/usr/include/stage/lib;PATHS;C:/boost/lib;C:/boost;/sw/local/lib
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1483 ] Searching for SYSTEM_LIBRARY_RELEASE: boost_system-gcc48-mt-1_58;boost_system-gcc48-mt;boost_system-mt-1_58;boost_system-mt;boost_system
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1525 ] Searching for SYSTEM_LIBRARY_DEBUG: boost_system-gcc48-mt-d-1_58;boost_system-gcc48-mt-d;boost_system-mt-d-1_58;boost_system-mt-d;boost_system-mt;boost_system
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1483 ] Searching for FILESYSTEM_LIBRARY_RELEASE: boost_filesystem-gcc48-mt-1_58;boost_filesystem-gcc48-mt;boost_filesystem-mt-1_58;boost_filesystem-mt;boost_filesystem
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1525 ] Searching for FILESYSTEM_LIBRARY_DEBUG: boost_filesystem-gcc48-mt-d-1_58;boost_filesystem-gcc48-mt-d;boost_filesystem-mt-d-1_58;boost_filesystem-mt-d;boost_filesystem-mt;boost_filesystem
-- [ /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1595 ] Boost_FOUND = 1
CMake Error at /home/ubuntu/anaconda3/share/cmake-3.6/Modules/FindBoost.cmake:1753 (message):
  Unable to find the requested Boost libraries.

  Boost version: 1.58.0

  Boost include path: /usr/include

  Could not find the following Boost libraries:

          boost_system
          boost_filesystem

  No Boost libraries were found.  You may need to set BOOST_LIBRARYDIR to the
  directory containing Boost libraries or BOOST_ROOT to the location of
  Boost.
Call Stack (most recent call first):
  CMakeLists.txt:444 (find_package)


CMake Error at CMakeLists.txt:447 (list):
  list sub-command SORT requires list to be present.


-- Boost include dir: /usr/include
-- Boost libraries: 
CMake Error at CMakeLists.txt:421 (message):
  No static or shared library provided for NOTFOUND
Call Stack (most recent call first):
  CMakeLists.txt:462 (ADD_THIRDPARTY_LIB)


-- Configuring incomplete, errors occurred!
See also "/home/ubuntu/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeOutput.log".
See also "/home/ubuntu/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
  File "setup.py", line 41, in <module>
    license="Apache 2.0")
  File "/home/ubuntu/anaconda3/lib/python3.5/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/ubuntu/anaconda3/lib/python3.5/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/home/ubuntu/anaconda3/lib/python3.5/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "setup.py", line 13, in run
    subprocess.check_call(["../build.sh"])
  File "/home/ubuntu/anaconda3/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../build.sh']' returned non-zero exit status 1

Errors arising from lack of isolation between drivers on the same cluster.

Some test failures, for example https://travis-ci.org/ray-project/ray/jobs/195442413, are caused by this. In this test, two drivers both define a remote function f. When the second driver calls its copy of f, the task gets assigned to a worker, and the worker executes the first driver's copy of f. There are a couple of problems here.

  1. When two drivers both define a remote function with the same name, one of them will override the other. This happens because remote functions are currently identified only by their module name plus function name. This could be fixed by incorporating the driver ID into the remote function ID (e.g., hashing them together; see the sketch after this list).

  2. Drivers currently have an "import counter" which counts how many things the driver has imported (e.g., remote functions, environment variables, etc.). Drivers also keep track of how many things they have exported (e.g., how many remote functions they have defined). When a remote function is defined on the driver, it records how many exports preceded it, and it will not run on a worker until that worker has imported at least that many things. However, this counter is currently shared between drivers, so exports from both drivers increment the same counter on the workers. This is the problem we're running into in the link above. We need a separate counter for each driver (or we need to rethink how we're doing exports).
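
As a rough sketch of the fix in item 1 (the function and argument names here are hypothetical, not Ray's actual internals): deriving the remote function ID from the driver ID as well as the module and function name means two drivers defining f no longer collide.

import hashlib

def remote_function_id(driver_id, module_name, function_name):
    # Hash the driver ID together with the module and function name, so that
    # the same function name on two different drivers yields two distinct IDs.
    h = hashlib.sha1()
    h.update(driver_id)  # driver_id is assumed to be bytes
    h.update(module_name.encode("ascii"))
    h.update(function_name.encode("ascii"))
    return h.digest()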

This is somewhat driver-centric (it probably won't handle situations where remote functions are defined on workers). Should we support that? We probably do need to support the creation of actors from tasks run on workers, so we may need to handle that a little differently.

Valgrind for Redis.

Now that we are writing our own Redis modules, we need to run Redis under Valgrind.

Failures when stress testing "wait".

Currently it is possible to overload the plasma manager by calling wait too often. For example, calling wait 1000 times on 1000 object IDs can trigger the fatal timeout described below.

Assume we start the following processes:

rm dump.rdb; src/common/thirdparty/redis/src/redis-server
src/plasma/build/plasma_store -s /tmp/s1 -m 1000000
src/plasma/build/plasma_manager -s /tmp/s1 -m /tmp/m1 -h 127.0.0.1 -r 127.0.0.1:6379 -p 29834

Then, in Python, run the following.

import plasma
import numpy as np

c1 = plasma.PlasmaClient("/tmp/s1", "/tmp/m1")

object_ids = [np.random.bytes(20) for _ in range(100000)]  # 100000 fake 20-byte object IDs
for i in range(1, 1000):
  print(i)
  # Each call subscribes to updates for every object ID in the list.
  c1.wait(object_ids, timeout=10, num_returns=i)

This triggers (on the branch for #113) a timeout in the object_table_subscribe command in the wait implementation in the plasma manager, and the timeout fires a fatal callback.

To trigger that particular failure, you may need to decrease the timeout for the object_table_subscribe command (otherwise, the manager will instead die via a SIGPIPE later on).

This could be addressed by not allowing timeouts or retrying an arbitrary number of times, but the real problem is the load on the manager (and the number of timers that are added to the manager event loop). Hiredis doesn't seem to handle large numbers of timers very efficiently.

A better solution would be to use Redis modules to write a custom Redis call that subscribes to a large number of object ID updates at once, instead of doing a separate Redis call for each object ID in the wait call.
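
From the client side, the idea would look roughly like this (a minimal sketch; RAY.MULTI_SUBSCRIBE is a hypothetical module command, and the redis-py client is used purely for illustration):

import redis

r = redis.StrictRedis(host="127.0.0.1", port=6379)
object_ids = [b"\x00" * 20, b"\x01" * 20]  # 20-byte object IDs, as in the test above

# One round trip (and one timer) carrying every object ID, instead of one
# object_table_subscribe call per ID.
r.execute_command("RAY.MULTI_SUBSCRIBE", *object_ids)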

Import error GLIBCXX_3.4.21 not found when importing numbuf.

I followed the installation instructions on a fresh Ubuntu VM using Anaconda with Python 3 (though I am pretty sure I have seen this error with Python 2 as well).

Ray builds and installs fine. However, if I try to run runtest.py (or simply import numbuf), I get the following error.

Traceback (most recent call last):
  File "test/runtest.py", line 6, in <module>
    import ray
  File "/home/ubuntu/ray/lib/python/ray/__init__.py", line 16, in <module>
    import ray.serialization
  File "/home/ubuntu/ray/lib/python/ray/serialization.py", line 6, in <module>
    import numbuf
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/numbuf-0.0.1-py3.5.egg/numbuf/__init__.py", line 5, in <module>
    from numbuf.libnumbuf import *
ImportError: /home/ubuntu/anaconda3/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/ubuntu/anaconda3/lib/python3.5/site-packages/numbuf-0.0.1-py3.5.egg/numbuf/libnumbuf.so)

I followed one of the solutions here http://askubuntu.com/questions/575505/glibcxx-3-4-20-not-found-how-to-fix-this-error, which suggested running the following.

conda install libgcc

That fixed the problem.

Should we specifically look for this error and print something more helpful? Or should we change the installation instructions to mention this potential problem?
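
For the first option, a minimal sketch of what numbuf's __init__.py could do (the wording of the message is illustrative, not an actual fix in the codebase):

try:
    from numbuf.libnumbuf import *
except ImportError as e:
    if "GLIBCXX" in str(e):
        # Anaconda's bundled libstdc++ is older than the one numbuf was built
        # against; point the user at the known fix.
        raise ImportError(
            str(e) + "\n\nThis usually means Anaconda ships an older "
            "libstdc++ than the one numbuf was built against. "
            "Running 'conda install libgcc' is known to fix it.")
    raise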

Actor methods should do some checking when they are invoked instead of when they are executed.

Currently, if we do

import ray

ray.init()

@ray.actor
class Foo(object):
  def __init__(self):
    pass
  def get_val(self, x):
    return x

f = Foo()

If we then call f.get_val() with no arguments, the method will be executed as a task on the actor, and the exception will be thrown there. However, we could throw the exception on the caller side instead, as we do with remote functions. For example,

import ray

ray.init()

@ray.remote
def f(x):
  return 1

f.remote()

Fails with

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-4b889f5b30a2> in <module>()
----> 1 f.remote()

/Users/rkn/Workspace/ray/python/ray/worker.py in func_call(*args, **kwargs)
   1796         args.extend([kwargs[keyword] if keyword in kwargs else default for keyword, default in keyword_defaults[len(args):]]) # fill in the remaining arguments
   1797         if any([arg is funcsigs._empty for arg in args]):
-> 1798           raise Exception("Not enough arguments were provided to {}.".format(func_name))
   1799         if _mode() == PYTHON_MODE:
   1800           # In PYTHON_MODE, remote calls simply execute the function. We copy the

Exception: Not enough arguments were provided to __main__.f.
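
A caller-side check for actor methods could mirror this. Here is a minimal sketch using inspect (the helper name is hypothetical; this is not Ray's actual code path):

import inspect

def check_actor_method_args(method, args, kwargs):
    # bind() raises TypeError when required arguments are missing, so the
    # error surfaces on the caller instead of inside the actor's task.
    try:
        inspect.signature(method).bind(None, *args, **kwargs)  # None stands in for self
    except TypeError as e:
        raise Exception("Invalid arguments to {}: {}".format(method.__name__, e))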

Object hash mismatch for Python dictionaries

Python dictionaries (even when the insertion order was the same) do not have a deterministic key ordering. This means that we can get an object hash mismatch for dictionaries, even though the key-value pairs are the same.

Better code example to come, but this can be seen on this branch. Run the following command with Python 3.5:
python test/stress_tests.py ReconstructionTests.testPut
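
The underlying effect can be illustrated like this (pickle and hashlib are used purely for demonstration; Ray's serialization path is different):

import hashlib
import pickle

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}  # same key-value pairs, different insertion order

assert a == b  # equal as dictionaries...
# ...but the serialized byte streams, and hence any content hash computed
# from them, need not match.
print(hashlib.sha1(pickle.dumps(a)).hexdigest())
print(hashlib.sha1(pickle.dumps(b)).hexdigest())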

Error in Ray + TensorFlow documentation.

Running the example code under Complete Example on this page https://github.com/ray-project/ray/blob/master/doc/using-ray-with-tensorflow.md works for me.

However, if I run the first box of code beforehand (pasted below)

import tensorflow as tf
import numpy as np

x_data = tf.placeholder(tf.float32, shape=[100])
y_data = tf.placeholder(tf.float32, shape=[100])

w = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = w * x_data + b

loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()
sess = tf.Session()

then the example fails with errors like

Remote function __main__.step failed with:

Traceback (most recent call last):
  File "<ipython-input-2-bbe208aec82f>", line 45, in step
  File "/Users/rkn/Workspace/ray/lib/python/ray/experimental/tfutils.py", line 78, in set_weights
    self.sess.run(self.assignment_nodes, feed_dict={self.assignment_placeholders[name]: value for (name, value) in new_weights.items()})
  File "/Users/rkn/Workspace/ray/lib/python/ray/experimental/tfutils.py", line 78, in <dictcomp>
    self.sess.run(self.assignment_nodes, feed_dict={self.assignment_placeholders[name]: value for (name, value) in new_weights.items()})
KeyError: 'Variable_2'


You can inspect errors by running

    ray.error_info()

If this driver is hanging, start a new one with

    ray.init(redis_address="127.0.0.1:39461")

The problem seems to be that we are using the default TensorFlow variable names, which are out of sync on the driver and the workers because we have run some additional TensorFlow code on the driver.
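
A minimal sketch of how the default names drift (TensorFlow 1.x-era API, matching the example above; this is an illustration, not the documentation's code):

import tensorflow as tf

_ = tf.Variable(tf.zeros([1]))  # on the driver, this grabs the name "Variable"
w = tf.Variable(tf.zeros([1]))  # ...so this one becomes "Variable_1"

# A fresh worker that only creates w would name it "Variable", and the
# name-keyed feed dict in tfutils no longer lines up. Explicit names avoid this:
w = tf.Variable(tf.zeros([1]), name="w")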

Cannot Import Ray in Python code

I was able to download Ray without issue, but when I tried to run the tests provided in the test directory (or create my own), I hit an exception upon running "import ray", shown below:

Traceback (most recent call last):
  File "runtest.py", line 7, in <module>
    import ray
  File "/Users/edwardgao/Library/Python/3.5/lib/python/site-packages/ray-0.0.1-py3.5.egg/ray/__init__.py", line 17, in <module>
    import ray.serialization
  File "/Users/edwardgao/Library/Python/3.5/lib/python/site-packages/ray-0.0.1-py3.5.egg/ray/serialization.py", line 6, in <module>
    import numbuf
  File "/Users/edwardgao/Library/Python/3.5/lib/python/site-packages/ray-0.0.1-py3.5.egg/numbuf/__init__.py", line 14, in <module>
    from core.src.numbuf.libnumbuf import *
ImportError: dynamic module does not define module export function (PyInit_libnumbuf)

To diagnose the problem, I ran build.sh, but I don't know how to interpret the output, which is pasted below:

~/projects/ray/src/common/thirdparty ~/projects/ray
~/projects/ray
+ set -e
+++ dirname /Users/edwardgao/projects/ray/src/numbuf/thirdparty/download_thirdparty.sh
++ cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty
++ pwd
+ TP_DIR=/Users/edwardgao/projects/ray/src/numbuf/thirdparty
+ '[' '!' -d /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow ']'
+ cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow
+ git checkout c88bd70c13cf16c07b840623cb466aa98d535be0
HEAD is now at c88bd70... Build arrow_io and arrow_ipc as static libraries.
+ set -e
+++ dirname /Users/edwardgao/projects/ray/src/numbuf/thirdparty/build_thirdparty.sh
++ cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty
++ pwd
+ TP_DIR=/Users/edwardgao/projects/ray/src/numbuf/thirdparty
+ PREFIX=/Users/edwardgao/projects/ray/src/numbuf/thirdparty/installed
++ uname
+ unamestr=Darwin
+ [[ Darwin == \L\i\n\u\x ]]
+ [[ Darwin == \D\a\r\w\i\n ]]
++ sysctl -n hw.ncpu
+ PARALLEL=4
+ echo 'Platform is macosx.'
Platform is macosx.
+ echo 'building arrow'
building arrow
+ cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp
+ mkdir -p /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build
+ cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build
+ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_FLAGS=-g -DCMAKE_CXX_FLAGS=-g -DARROW_BUILD_TESTS=OFF ..
clang-tidy not found
clang-format not found
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
INFO Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Selected compiler clang 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1002 ] _boost_TEST_VERSIONS = 1.63.0;1.63;1.62.0;1.62;1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1004 ] Boost_USE_MULTITHREADED = ON
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1006 ] Boost_USE_STATIC_LIBS = ON
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1008 ] Boost_USE_STATIC_RUNTIME = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1010 ] Boost_ADDITIONAL_VERSIONS = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1012 ] Boost_NO_SYSTEM_PATHS = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1080 ] Declared as CMake or Environmental Variables:
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1082 ]   BOOST_ROOT = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1084 ]   BOOST_INCLUDEDIR = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1086 ]   BOOST_LIBRARYDIR = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1088 ] _boost_TEST_VERSIONS = 1.63.0;1.63;1.62.0;1.62;1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1181 ] location of version.hpp: /usr/local/include/boost/version.hpp
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1205 ] version.hpp reveals boost 1.63.0
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1291 ] guessed _boost_COMPILER = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1301 ] _boost_MULTITHREADED = -mt
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1345 ] _boost_RELEASE_ABI_TAG = -
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1347 ] _boost_DEBUG_ABI_TAG = -d
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1403 ] _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH_boost_LIBRARY_SEARCH_DIRS_DEBUG   = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1542 ] Searching for SYSTEM_LIBRARY_RELEASE: boost_system-mt-1_63;boost_system-mt;boost_system-mt-1_63;boost_system-mt;boost_system
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_RELEASE = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1584 ] Searching for SYSTEM_LIBRARY_DEBUG: boost_system-mt-d-1_63;boost_system-mt-d;boost_system-mt-d-1_63;boost_system-mt-d;boost_system-mt;boost_system
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_DEBUG = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_DEBUG = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1542 ] Searching for FILESYSTEM_LIBRARY_RELEASE: boost_filesystem-mt-1_63;boost_filesystem-mt;boost_filesystem-mt-1_63;boost_filesystem-mt;boost_filesystem
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_RELEASE = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1584 ] Searching for FILESYSTEM_LIBRARY_DEBUG: boost_filesystem-mt-d-1_63;boost_filesystem-mt-d;boost_filesystem-mt-d-1_63;boost_filesystem-mt-d;boost_filesystem-mt;boost_filesystem
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_DEBUG = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_DEBUG = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1654 ] Boost_FOUND = 1
-- Boost version: 1.63.0
-- Found the following Boost libraries:
--   system
--   filesystem
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1002 ] _boost_TEST_VERSIONS = 1.63.0;1.63;1.62.0;1.62;1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1004 ] Boost_USE_MULTITHREADED = ON
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1006 ] Boost_USE_STATIC_LIBS = OFF
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1008 ] Boost_USE_STATIC_RUNTIME = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1010 ] Boost_ADDITIONAL_VERSIONS = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1012 ] Boost_NO_SYSTEM_PATHS = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1080 ] Declared as CMake or Environmental Variables:
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1082 ]   BOOST_ROOT = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1084 ]   BOOST_INCLUDEDIR = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1086 ]   BOOST_LIBRARYDIR = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1088 ] _boost_TEST_VERSIONS = 1.63.0;1.63;1.62.0;1.62;1.61.0;1.61;1.60.0;1.60;1.59.0;1.59;1.58.0;1.58;1.57.0;1.57;1.56.0;1.56;1.55.0;1.55;1.54.0;1.54;1.53.0;1.53;1.52.0;1.52;1.51.0;1.51;1.50.0;1.50;1.49.0;1.49;1.48.0;1.48;1.47.0;1.47;1.46.1;1.46.0;1.46;1.45.0;1.45;1.44.0;1.44;1.43.0;1.43;1.42.0;1.42;1.41.0;1.41;1.40.0;1.40;1.39.0;1.39;1.38.0;1.38;1.37.0;1.37;1.36.1;1.36.0;1.36;1.35.1;1.35.0;1.35;1.34.1;1.34.0;1.34;1.33.1;1.33.0;1.33
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1181 ] location of version.hpp: /usr/local/include/boost/version.hpp
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1205 ] version.hpp reveals boost 1.63.0
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1291 ] guessed _boost_COMPILER = 
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1301 ] _boost_MULTITHREADED = -mt
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1345 ] _boost_RELEASE_ABI_TAG = -
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1347 ] _boost_DEBUG_ABI_TAG = -d
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1403 ] _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH_boost_LIBRARY_SEARCH_DIRS_DEBUG   = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1542 ] Searching for SYSTEM_LIBRARY_RELEASE: boost_system-mt-1_63;boost_system-mt;boost_system-mt-1_63;boost_system-mt;boost_system
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_RELEASE = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1584 ] Searching for SYSTEM_LIBRARY_DEBUG: boost_system-mt-d-1_63;boost_system-mt-d;boost_system-mt-d-1_63;boost_system-mt-d;boost_system-mt;boost_system
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_DEBUG = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_DEBUG = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1542 ] Searching for FILESYSTEM_LIBRARY_RELEASE: boost_filesystem-mt-1_63;boost_filesystem-mt;boost_filesystem-mt-1_63;boost_filesystem-mt;boost_filesystem
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_RELEASE = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_RELEASE = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1584 ] Searching for FILESYSTEM_LIBRARY_DEBUG: boost_filesystem-mt-d-1_63;boost_filesystem-mt-d;boost_filesystem-mt-d-1_63;boost_filesystem-mt-d;boost_filesystem-mt;boost_filesystem
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:362 ]  Boost_LIBRARY_DIR_DEBUG = /usr/local/lib _boost_LIBRARY_SEARCH_DIRS_DEBUG = /usr/local/lib;NO_DEFAULT_PATH;NO_CMAKE_FIND_ROOT_PATH
-- [ /usr/local/Cellar/cmake/3.7.2/share/cmake/Modules/FindBoost.cmake:1654 ] Boost_FOUND = 1
-- Boost version: 1.63.0
-- Found the following Boost libraries:
--   system
--   filesystem
-- Boost include dir: /usr/local/include
-- Boost libraries: /usr/local/lib/libboost_system-mt.dylib/usr/local/lib/libboost_filesystem-mt.dylib
Added static library dependency boost_system: /usr/local/lib/libboost_system-mt.a
Added shared library dependency boost_system: /usr/local/lib/libboost_filesystem-mt.dylib
Added static library dependency boost_filesystem: /usr/local/lib/libboost_filesystem-mt.a
Added shared library dependency boost_filesystem: /usr/local/lib/libboost_system-mt.dylib
-- RapidJSON include dir: /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/rapidjson_ep/include
-- Flatbuffers include dir: /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-install/include
-- Flatbuffers static library: /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-install/libflatbuffers.a
-- Flatbuffers compiler: /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-install/bin/flatc
Added static library dependency flatbuffers: /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/flatbuffers_ep-prefix/src/flatbuffers_ep-install/libflatbuffers.a
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build
+ make VERBOSE=1 -j4
/usr/local/Cellar/cmake/3.7.2/bin/cmake -H/Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp -B/Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_progress_start /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/progress.marks
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/Makefile2 all
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/flatbuffers_ep.dir/build.make CMakeFiles/flatbuffers_ep.dir/depend
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/rapidjson_ep.dir/build.make CMakeFiles/rapidjson_ep.dir/depend
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/rapidjson_ep.dir/DependInfo.cmake --color=
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/flatbuffers_ep.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/rapidjson_ep.dir/build.make CMakeFiles/rapidjson_ep.dir/build
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/flatbuffers_ep.dir/build.make CMakeFiles/flatbuffers_ep.dir/build
make[2]: Nothing to be done for `CMakeFiles/flatbuffers_ep.dir/build'.
make[2]: Nothing to be done for `CMakeFiles/rapidjson_ep.dir/build'.
[ 34%] Built target flatbuffers_ep
[ 34%] Built target rapidjson_ep
/Library/Developer/CommandLineTools/usr/bin/make -f src/arrow/ipc/CMakeFiles/metadata_fbs.dir/build.make src/arrow/ipc/CMakeFiles/metadata_fbs.dir/depend
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/src/arrow/ipc /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/arrow/ipc /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/arrow/ipc/CMakeFiles/metadata_fbs.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f src/arrow/ipc/CMakeFiles/metadata_fbs.dir/build.make src/arrow/ipc/CMakeFiles/metadata_fbs.dir/build
make[2]: Nothing to be done for `src/arrow/ipc/CMakeFiles/metadata_fbs.dir/build'.
[ 36%] Built target metadata_fbs
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/arrow_objlib.dir/build.make CMakeFiles/arrow_objlib.dir/depend
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/arrow_objlib.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/arrow_objlib.dir/build.make CMakeFiles/arrow_objlib.dir/build
make[2]: Nothing to be done for `CMakeFiles/arrow_objlib.dir/build'.
[ 72%] Built target arrow_objlib
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/arrow_static.dir/build.make CMakeFiles/arrow_static.dir/depend
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/arrow_shared.dir/build.make CMakeFiles/arrow_shared.dir/depend
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/arrow_static.dir/DependInfo.cmake --color=
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/arrow_shared.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/arrow_static.dir/build.make CMakeFiles/arrow_static.dir/build
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/arrow_shared.dir/build.make CMakeFiles/arrow_shared.dir/build
make[2]: Nothing to be done for `CMakeFiles/arrow_shared.dir/build'.
make[2]: Nothing to be done for `CMakeFiles/arrow_static.dir/build'.
[ 74%] Built target arrow_static
[ 76%] Built target arrow_shared
/Library/Developer/CommandLineTools/usr/bin/make -f src/arrow/io/CMakeFiles/arrow_io.dir/build.make src/arrow/io/CMakeFiles/arrow_io.dir/depend
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/src/arrow/io /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/arrow/io /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/arrow/io/CMakeFiles/arrow_io.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f src/arrow/io/CMakeFiles/arrow_io.dir/build.make src/arrow/io/CMakeFiles/arrow_io.dir/build
make[2]: Nothing to be done for `src/arrow/io/CMakeFiles/arrow_io.dir/build'.
[ 85%] Built target arrow_io
/Library/Developer/CommandLineTools/usr/bin/make -f src/arrow/ipc/CMakeFiles/arrow_ipc.dir/build.make src/arrow/ipc/CMakeFiles/arrow_ipc.dir/depend
cd /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build && /usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/src/arrow/ipc /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/arrow/ipc /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/src/arrow/ipc/CMakeFiles/arrow_ipc.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f src/arrow/ipc/CMakeFiles/arrow_ipc.dir/build.make src/arrow/ipc/CMakeFiles/arrow_ipc.dir/build
make[2]: Nothing to be done for `src/arrow/ipc/CMakeFiles/arrow_ipc.dir/build'.
[100%] Built target arrow_ipc
/usr/local/Cellar/cmake/3.7.2/bin/cmake -E cmake_progress_start /Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles 0
~/projects/ray/python/core ~/projects/ray
-- Trying custom approach for finding Python.
-- Found Python program: /usr/bin/python
-- PYTHON_LIBRARY_NAME: python2.7
-- PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- PYTHON_PREFIX: /System/Library/Frameworks/Python.framework/Versions/2.7
-- PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- The custom approach for finding Python succeeded.
-- Using CUSTOM_PYTHON_EXECUTABLE: /usr/bin/python
-- Using PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- Using PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- Trying custom approach for finding Python.
-- Found Python program: /usr/bin/python
-- PYTHON_LIBRARY_NAME: python2.7
-- PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- PYTHON_PREFIX: /System/Library/Frameworks/Python.framework/Versions/2.7
-- PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- The custom approach for finding Python succeeded.
-- Using CUSTOM_PYTHON_EXECUTABLE: /usr/bin/python
-- Using PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- Using PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- Trying custom approach for finding Python.
-- Found Python program: /usr/bin/python
-- PYTHON_LIBRARY_NAME: python2.7
-- PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- PYTHON_PREFIX: /System/Library/Frameworks/Python.framework/Versions/2.7
-- PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- The custom approach for finding Python succeeded.
-- Using CUSTOM_PYTHON_EXECUTABLE: /usr/bin/python
-- Using PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- Using PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- Trying custom approach for finding Python.
-- Found Python program: /usr/bin/python
-- PYTHON_LIBRARY_NAME: python2.7
-- PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- PYTHON_PREFIX: /System/Library/Frameworks/Python.framework/Versions/2.7
-- PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- The custom approach for finding Python succeeded.
-- Using CUSTOM_PYTHON_EXECUTABLE: /usr/bin/python
-- Using PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- Using PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- Trying custom approach for finding Python.
-- Found Python program: /usr/bin/python
-- PYTHON_LIBRARY_NAME: python2.7
-- PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- PYTHON_PREFIX: /System/Library/Frameworks/Python.framework/Versions/2.7
-- PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- The custom approach for finding Python succeeded.
-- Using CUSTOM_PYTHON_EXECUTABLE: /usr/bin/python
-- Using PYTHON_LIBRARIES: /System/Library/Frameworks/Python.framework/Versions/2.7/lib/libpython2.7.dylib
-- Using PYTHON_INCLUDE_DIRS: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7
-- NumPy ver. 1.8.0rc1 found (include: /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include)
-- Configuring done
CMake Warning (dev):
  Policy CMP0042 is not set: MACOSX_RPATH is enabled by default.  Run "cmake
  --help-policy CMP0042" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  MACOSX_RPATH is not specified for the following targets:

   numbuf
   photon
   plasma
   plasma_client
   ray_redis_module

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /Users/edwardgao/projects/ray/python/core
[  1%] Building C object src/common/CMakeFiles/common.dir/event_loop.c.o
[  2%] Building C object src/common/CMakeFiles/common.dir/common.c.o
[  3%] Building C object src/common/CMakeFiles/common.dir/task.c.o
[  4%] Building C object src/common/CMakeFiles/common.dir/io.c.o
/Users/edwardgao/projects/ray/src/common/io.c:34:14: warning: incompatible pointer types
      initializing 'int *const' with an expression of type 'const char *'
      [-Wincompatible-pointer-types]
  int *const pon = (char const *) &on;
             ^     ~~~~~~~~~~~~~~~~~~
1 warning generated.
[  5%] Building C object src/common/CMakeFiles/common.dir/net.c.o
[  6%] Building C object src/common/CMakeFiles/common.dir/logging.c.o
[  7%] Building C object src/common/CMakeFiles/common.dir/state/redis.c.o
[  8%] Building C object src/common/CMakeFiles/common.dir/state/table.c.o
[  9%] Building C object src/common/CMakeFiles/common.dir/state/object_table.c.o
[ 10%] Building C object src/common/CMakeFiles/common.dir/state/task_table.c.o
[ 11%] Building C object src/common/CMakeFiles/common.dir/state/db_client_table.c.o
[ 12%] Building C object src/common/CMakeFiles/common.dir/state/local_scheduler_table.c.o
[ 13%] Building C object src/common/CMakeFiles/common.dir/thirdparty/ae/ae.c.o
[ 14%] Building C object src/common/CMakeFiles/common.dir/thirdparty/sha256.c.o
[ 15%] Linking C static library libcommon.a
[ 15%] Built target common
[ 15%] Built target hiredis
[ 17%] Creating directories for 'flatcc'
[ 18%] Performing download step (download, verify and extract) for 'flatcc'
-- File already exists but no hash specified (use URL_HASH):
  file='/Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/v0.4.0.tar.gz'
Old file will be removed and new file downloaded from URL.
-- Downloading...
   dst='/Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/v0.4.0.tar.gz'
   timeout='none'
-- Using src='https://github.com/dvidelabs/flatcc/archive/v0.4.0.tar.gz'
-- Downloading... done
-- extracting...
     src='/Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/v0.4.0.tar.gz'
     dst='/Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/flatcc'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 19%] No patch step for 'flatcc'
[ 20%] No update step for 'flatcc'
[ 21%] Performing configure step for 'flatcc'
-- Setting Clang compiler options
-- Configured C_FLAGS: -fPIC -DFLATCC_REFLECTION=1 -std=c11 -pedantic -Wall -Wextra -Werror
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/flatcc-build
[ 22%] Performing build step for 'flatcc'
[  1%] Linking C static library /Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/flatcc/lib/libflatccrt.a
[  9%] Built target flatccrt
[ 11%] Linking C static library /Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/flatcc/lib/libflatcc.a
[ 48%] Built target flatcc
[ 50%] Linking C executable /Users/edwardgao/projects/ray/python/core/src/common/flatcc-prefix/src/flatcc/bin/flatcc
[ 51%] Built target flatcc_cli
[ 53%] Linking C executable cgen_test
[ 54%] Built target cgen_test
[ 54%] Built target gen_monster_test
Scanning dependencies of target monster_test
[ 56%] Building C object test/monster_test/CMakeFiles/monster_test.dir/monster_test.c.o
[ 58%] Linking C executable monster_test
[ 58%] Built target monster_test
[ 58%] Built target gen_monster_test_solo
Scanning dependencies of target monster_test_solo
[ 59%] Building C object test/monster_test_solo/CMakeFiles/monster_test_solo.dir/monster_test_solo.c.o
[ 61%] Linking C executable monster_test_solo
[ 61%] Built target monster_test_solo
[ 61%] Built target gen_monster_test_concat
Scanning dependencies of target monster_test_concat
[ 62%] Building C object test/monster_test_concat/CMakeFiles/monster_test_concat.dir/monster_test_concat.c.o
[ 64%] Linking C executable monster_test_concat
[ 64%] Built target monster_test_concat
[ 64%] Built target gen_monster_test_prefix
Scanning dependencies of target monster_test_prefix
[ 66%] Building C object test/monster_test_prefix/CMakeFiles/monster_test_prefix.dir/monster_test_prefix.c.o
[ 67%] Linking C executable monster_test_prefix
[ 67%] Built target monster_test_prefix
[ 67%] Built target gen_flatc_compat
Scanning dependencies of target flatc_compat
[ 69%] Building C object test/flatc_compat/CMakeFiles/flatc_compat.dir/flatc_compat.c.o
[ 70%] Linking C executable flatc_compat
[ 70%] Built target flatc_compat
[ 70%] Built target gen_monster_test_json
Scanning dependencies of target test_json_printer
[ 72%] Building C object test/json_test/CMakeFiles/test_json_printer.dir/test_json_printer.c.o
[ 74%] Linking C executable test_json_printer
[ 74%] Built target test_json_printer
Scanning dependencies of target test_json_parser
[ 75%] Building C object test/json_test/CMakeFiles/test_json_parser.dir/test_json_parser.c.o
[ 77%] Linking C executable test_json_parser
[ 77%] Built target test_json_parser
[ 79%] Linking C executable test_basic_parse
[ 80%] Built target test_basic_parse
Scanning dependencies of target test_json
[ 82%] Building C object test/json_test/CMakeFiles/test_json.dir/test_json.c.o
[ 83%] Linking C executable test_json
[ 83%] Built target test_json
[ 83%] Built target gen_emit_test
Scanning dependencies of target emit_test
[ 85%] Building C object test/emit_test/CMakeFiles/emit_test.dir/emit_test.c.o
[ 87%] Linking C executable emit_test
[ 87%] Built target emit_test
[ 87%] Built target gen_load_test
Scanning dependencies of target load_test
[ 88%] Building C object test/load_test/CMakeFiles/load_test.dir/load_test.c.o
[ 90%] Linking C executable load_test
[ 90%] Built target load_test
[ 90%] Built target gen_reflection_test
[ 91%] Linking C executable reflection_test
[ 93%] Built target reflection_test
[ 93%] Built target gen_monster_fbs
Scanning dependencies of target monster
[ 95%] Building C object samples/monster/CMakeFiles/monster.dir/monster.c.o
[ 96%] Linking C executable monster
[ 96%] Built target monster
[ 96%] Built target gen_monster_bfbs
[ 98%] Linking C executable bfbs2json
[100%] Built target bfbs2json
[ 23%] No install step for 'flatcc'
[ 24%] Completed 'flatcc'
[ 24%] Built target flatcc
[ 25%] Building C object src/common/CMakeFiles/io_tests.dir/test/io_tests.c.o
[ 26%] Linking C executable io_tests
[ 26%] Built target io_tests
[ 27%] Building C object src/common/CMakeFiles/common_tests.dir/test/common_tests.c.o
[ 28%] Linking C executable common_tests
[ 28%] Built target common_tests
[ 29%] Building C object src/common/CMakeFiles/redis_tests.dir/test/redis_tests.c.o
[ 30%] Linking C executable redis_tests
[ 30%] Built target redis_tests
[ 31%] Building C object src/common/CMakeFiles/task_tests.dir/test/task_tests.c.o
[ 32%] Linking C executable task_tests
[ 32%] Built target task_tests
[ 34%] Building C object src/common/CMakeFiles/task_table_tests.dir/test/task_table_tests.c.o
[ 35%] Linking C executable task_table_tests
[ 35%] Built target task_table_tests

Hint: It's a good idea to run 'make test' ;)

[ 35%] Built target redis
[ 35%] Built target copy_redis
[ 36%] Building C object src/common/CMakeFiles/db_tests.dir/test/db_tests.c.o
[ 37%] Linking C executable db_tests
[ 37%] Built target db_tests
[ 38%] Building C object src/common/CMakeFiles/object_table_tests.dir/test/object_table_tests.c.o
[ 39%] Linking C executable object_table_tests
[ 39%] Built target object_table_tests
[ 40%] Building C object src/common/redis_module/CMakeFiles/ray_redis_module.dir/ray_redis_module.c.o
[ 41%] Linking C shared library libray_redis_module.so
[ 41%] Built target ray_redis_module
Running flatc compiler on /Users/edwardgao/projects/ray/src/plasma/format/plasma.fbs
[ 41%] Built target gen_plasma_fbs
Scanning dependencies of target plasma_store
[ 42%] Building C object src/plasma/CMakeFiles/plasma_store.dir/plasma_store.c.o
[ 43%] Building C object src/plasma/CMakeFiles/plasma_store.dir/plasma.c.o
[ 44%] Building C object src/plasma/CMakeFiles/plasma_store.dir/plasma_protocol.c.o
[ 45%] Building C object src/plasma/CMakeFiles/plasma_store.dir/eviction_policy.c.o
[ 46%] Building C object src/plasma/CMakeFiles/plasma_store.dir/fling.c.o
[ 47%] Building C object src/plasma/CMakeFiles/plasma_store.dir/malloc.c.o
[ 48%] Linking C executable plasma_store
[ 48%] Built target plasma_store
Scanning dependencies of target plasma
[ 50%] Building C object src/plasma/CMakeFiles/plasma.dir/plasma.c.o
[ 51%] Building C object src/plasma/CMakeFiles/plasma.dir/plasma_extension.c.o
[ 52%] Building C object src/plasma/CMakeFiles/plasma.dir/__/common/lib/python/common_extension.c.o
[ 53%] Building C object src/plasma/CMakeFiles/plasma.dir/plasma_protocol.c.o
[ 54%] Building C object src/plasma/CMakeFiles/plasma.dir/plasma_client.c.o
[ 55%] Building C object src/plasma/CMakeFiles/plasma.dir/thirdparty/xxhash.c.o
[ 56%] Building C object src/plasma/CMakeFiles/plasma.dir/fling.c.o
[ 57%] Linking C shared library libplasma.so
[ 57%] Built target plasma
Scanning dependencies of target plasma_lib
[ 58%] Building C object src/plasma/CMakeFiles/plasma_lib.dir/plasma_client.c.o
[ 59%] Building C object src/plasma/CMakeFiles/plasma_lib.dir/plasma.c.o
[ 60%] Building C object src/plasma/CMakeFiles/plasma_lib.dir/plasma_protocol.c.o
[ 61%] Building C object src/plasma/CMakeFiles/plasma_lib.dir/fling.c.o
[ 62%] Building C object src/plasma/CMakeFiles/plasma_lib.dir/thirdparty/xxhash.c.o
[ 63%] Linking C static library libplasma_lib.a
[ 63%] Built target plasma_lib
Scanning dependencies of target plasma_manager
[ 64%] Building C object src/plasma/CMakeFiles/plasma_manager.dir/plasma_manager.c.o
[ 65%] Linking C executable plasma_manager
[ 65%] Built target plasma_manager
Scanning dependencies of target serialization_tests
[ 67%] Building C object src/plasma/CMakeFiles/serialization_tests.dir/test/serialization_tests.c.o
[ 68%] Linking C executable serialization_tests
[ 68%] Built target serialization_tests
Scanning dependencies of target manager_tests
[ 69%] Building C object src/plasma/CMakeFiles/manager_tests.dir/test/manager_tests.c.o
[ 70%] Building C object src/plasma/CMakeFiles/manager_tests.dir/plasma_manager.c.o
[ 71%] Linking C executable manager_tests
[ 71%] Built target manager_tests
Scanning dependencies of target client_tests
[ 72%] Building C object src/plasma/CMakeFiles/client_tests.dir/test/client_tests.c.o
[ 73%] Linking C executable client_tests
[ 73%] Built target client_tests
Scanning dependencies of target plasma_client
[ 74%] Building C object src/plasma/CMakeFiles/plasma_client.dir/plasma_client.c.o
[ 75%] Linking C shared library libplasma_client.so
[ 75%] Built target plasma_client
[ 76%] Building C object src/photon/CMakeFiles/photon_client.dir/photon_client.c.o
[ 77%] Linking C static library libphoton_client.a
[ 77%] Built target photon_client
[ 78%] Building C object src/photon/CMakeFiles/photon.dir/photon_extension.c.o
[ 79%] Building C object src/photon/CMakeFiles/photon.dir/__/common/lib/python/common_extension.c.o
[ 80%] Linking C shared library libphoton.so
[ 80%] Built target photon
[ 81%] Building C object src/photon/CMakeFiles/photon_scheduler.dir/photon_scheduler.c.o
[ 82%] Building C object src/photon/CMakeFiles/photon_scheduler.dir/photon_algorithm.c.o
[ 84%] Linking C executable photon_scheduler
[ 84%] Built target photon_scheduler
[ 85%] Building C object src/photon/CMakeFiles/photon_tests.dir/test/photon_tests.c.o
[ 86%] Building C object src/photon/CMakeFiles/photon_tests.dir/photon_scheduler.c.o
[ 87%] Building C object src/photon/CMakeFiles/photon_tests.dir/photon_algorithm.c.o
[ 88%] Linking C executable photon_tests
[ 88%] Built target photon_tests
[ 89%] Building C object src/global_scheduler/CMakeFiles/global_scheduler.dir/global_scheduler.c.o
[ 90%] Building C object src/global_scheduler/CMakeFiles/global_scheduler.dir/global_scheduler_algorithm.c.o
[ 91%] Linking C executable global_scheduler
[ 91%] Built target global_scheduler
Scanning dependencies of target numbuf
[ 92%] Building CXX object src/numbuf/CMakeFiles/numbuf.dir/cpp/src/numbuf/tensor.cc.o
[ 93%] Building CXX object src/numbuf/CMakeFiles/numbuf.dir/cpp/src/numbuf/dict.cc.o
[ 94%] Building CXX object src/numbuf/CMakeFiles/numbuf.dir/cpp/src/numbuf/sequence.cc.o
[ 95%] Building CXX object src/numbuf/CMakeFiles/numbuf.dir/python/src/pynumbuf/numbuf.cc.o
In file included from /Users/edwardgao/projects/ray/src/numbuf/python/src/pynumbuf/numbuf.cc:15:
/Users/edwardgao/projects/ray/src/numbuf/python/src/pynumbuf/memory.h:25:17: warning: 
      'Read' overrides a member function but is not marked 'override'
      [-Winconsistent-missing-override]
  arrow::Status Read(int64_t nbytes, int64_t* bytes_read, uint8_t* out) {
                ^
/Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/src/arrow/io/interfaces.h:77:18: note: 
      overridden virtual function is here
  virtual Status Read(int64_t nbytes, int64_t* bytes_read, uint8_t* out) = 0;
                 ^
In file included from /Users/edwardgao/projects/ray/src/numbuf/python/src/pynumbuf/numbuf.cc:20:
In file included from /Users/edwardgao/projects/ray/src/numbuf/../plasma/plasma_client.h:7:
In file included from /Users/edwardgao/projects/ray/src/numbuf/../plasma/plasma.h:13:
/Users/edwardgao/projects/ray/src/common/cmake/../common.h:100:9: warning: 'DCHECK' macro
      redefined [-Wmacro-redefined]
#define DCHECK(COND) CHECK(COND)
        ^
/Users/edwardgao/projects/ray/src/numbuf/thirdparty/arrow/cpp/src/arrow/util/logging.h:73:9: note: 
      previous definition is here
#define DCHECK(condition) ARROW_CHECK(condition)
        ^
2 warnings generated.
[ 96%] Building CXX object src/numbuf/CMakeFiles/numbuf.dir/python/src/pynumbuf/adapters/numpy.cc.o
[ 97%] Building CXX object src/numbuf/CMakeFiles/numbuf.dir/python/src/pynumbuf/adapters/python.cc.o
[ 98%] Building C object src/numbuf/CMakeFiles/numbuf.dir/__/common/lib/python/common_extension.c.o
[100%] Linking CXX shared library libnumbuf.so
[100%] Built target numbuf
~/projects/ray

Ray builds need to clean up /tmp after building

On the new/staging Jenkins worker, the Ray builds are leaving over 1,000 files in /tmp:

sknapp@amp-jenkins-staging-worker-02:/tmp$ ls -l scheduler* plasma_* | wc -l
1302

Please ensure that this stuff gets cleaned up once builds are done.

Thanks!
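
One possible shape of a fix, assuming the test harness is Python (the glob patterns mirror the ls output above; everything here is illustrative):

import atexit
import glob
import os

def cleanup_tmp_files():
    # Remove the scheduler and plasma sockets/stores this run left in /tmp.
    for path in glob.glob("/tmp/scheduler*") + glob.glob("/tmp/plasma_*"):
        try:
            os.remove(path)
        except OSError:
            pass  # another process may have removed it already

atexit.register(cleanup_tmp_files)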

Warning messages when redis-server is started on Linux.

On Linux, after compiling Redis and starting a redis-server (e.g., with src/common/thirdparty/redis/src/redis-server), a large number of warning messages appear.

These messages appear when the user starts Ray with ray.init.

ubuntu@ip-172-31-37-7:~/ray$ src/common/thirdparty/redis/src/redis-server
25487:C 22 Dec 23:43:21.136 # Warning: no config file specified, using the default config. In order to specify a config file use src/common/thirdparty/redis/src/redis-server /path/to/redis.conf
25487:M 22 Dec 23:43:21.137 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.9.102 (46a88703/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 25487
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

25487:M 22 Dec 23:43:21.138 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
25487:M 22 Dec 23:43:21.138 # Server started, Redis version 3.9.102
25487:M 22 Dec 23:43:21.138 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
25487:M 22 Dec 23:43:21.138 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
25487:M 22 Dec 23:43:21.138 * The server is now ready to accept connections on port 6379
  • The first warning can be addressed by passing in a config file.
  • I'm not sure about the next three, and I don't have a sense of how serious they are (though the Redis server does print some very helpful suggestions for what to do).

Eventually, I'd like the Redis stdout/stderr (and the stdout/stderr for other processes as well) to be viewable through a web UI but not logged to the terminal.

In the meantime, should we suppress these messages by redirecting the Redis stdout/stderr to /dev/null? Should we leave them as they are? Or should we address them?
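
If we go the suppression route, a minimal sketch of what the launcher could do (the redis-server path is taken from above; this is not Ray's current startup code):

import subprocess

# Send the banner and warnings to /dev/null instead of the terminal.
redis_process = subprocess.Popen(
    ["src/common/thirdparty/redis/src/redis-server"],
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)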

Actors cannot be created before ray.init is called.

It's important to be able to define remote functions and similar constructs before the call to ray.init (for example, they may be defined in a Python module that is imported at the top of a file, before ray.init is called).

The same is probably true for actors. We should enable actors to be defined before a call to ray.init by caching the actor definitions and exporting them when ray.init is eventually called.
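
A sketch of the behavior we want to support (the @ray.actor decorator matches the example earlier in this document; the caching described in the comments is the proposed change, not current behavior):

import ray

# Defined at import time, before any connection exists; the definition
# would be cached...
@ray.actor
class Counter(object):
    def __init__(self):
        self.n = 0
    def increment(self):
        self.n += 1
        return self.n

ray.init()     # ...and exported here, once ray.init runs.
c = Counter()  # creating instances still happens after ray.init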

Fix ray.wait for timeouts > 2**30

Right now, we limit the maximum timeout for ray.wait to 2**30 microseconds. This is ~1000 s, which is too short. We should track down the underlying problem in #42 and make the timeout a 64-bit number.
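
For scale, a quick check of the numbers:

print(2**30 / 1e6)         # current cap: ~1073.7 seconds, i.e. under 18 minutes
print(2**63 / 1e6 / 3600)  # a signed 64-bit cap in microseconds: ~2.56e9 hours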

Ray Wait is waiting for all tasks to complete

RL Experiments

I'm doing RL sampling right now, and I'm timing the tasks at a finer grain. I'm using a Ray version built off commit 6cd02d7.

    remaining = []   # rollout tasks currently in flight
    num_samples = 0
    while num_samples < max_samples:
        # Keep num_workers rollout tasks in flight at all times.
        for i in range(num_workers - len(remaining)):
            remaining.append(ray_rollout.remote(param_id, max_path_length))
        # ray.wait should return as soon as one task finishes (num_returns defaults to 1).
        done, remaining = ray.wait(remaining)
        result, timestamps = ray.get(done[0])
        num_samples += len(result['rewards'])

Here is the rollout code:

from datetime import datetime

@ray.remote
def ray_rollout(policy_params, max_path_length):
    """Returns a rollout dictionary and a (start, end) timestamp pair."""
    start_time = str(datetime.now())
    ...
    # rollout() builds the rollout dictionary (definition elided above).
    return rollout(policy_params), (start_time, str(datetime.now()))

However, instead of seeing fine-grained task timings, the time plots look like this:

[screenshot: task timing plot, 2016-12-23 14:20:12]
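
For context, ray.wait takes a num_returns argument (defaulting to 1) and is supposed to return as soon as that many of the requested objects are ready, so the loop above should unblock after a single rollout. A minimal repro sketch of the expected behavior (assuming a plain local ray.init; the exact init signature may differ across versions):

    import time
    import ray

    ray.init()

    @ray.remote
    def slow(seconds):
        time.sleep(seconds)
        return seconds

    # One fast task among slow ones: ray.wait should return almost
    # immediately with the fast task rather than blocking on all three.
    pending = [slow.remote(s) for s in (0.1, 5.0, 5.0)]
    done, pending = ray.wait(pending)  # num_returns defaults to 1
    print(ray.get(done[0]))  # expected: 0.1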

Retry socket connections

A common pattern for a Ray process is to connect to some socket(s). Right now, we often try once and exit violently if the initial connection is unsuccessful. However, these failures may be transient (for instance, if the listen backlog on the server side is full during photon_connect). When that happens, we should retry the connection with some backoff.
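
A minimal sketch of what that retry could look like (the function name and parameters are illustrative, not an actual Ray API; AF_UNIX is assumed here since the local scheduler listens on a Unix socket):

    import socket
    import time

    def connect_with_retries(socket_path, num_retries=5, base_delay=0.05):
        """Retry a Unix-socket connection with exponential backoff."""
        for attempt in range(num_retries):
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            try:
                sock.connect(socket_path)
                return sock
            except socket.error:
                sock.close()
                if attempt == num_retries - 1:
                    raise
                # Back off exponentially: 0.05s, 0.1s, 0.2s, ...
                time.sleep(base_delay * 2 ** attempt)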

Problem running Ray with Anaconda 3.6.

After following the installation instructions with Anaconda 3.6 on Ubuntu, running python test/runtest.py fails with the following error.

Traceback (most recent call last):
  File "../test/runtest.py", line 7, in <module>
    import ray
  File "/home/ubuntu/.local/lib/python3.6/site-packages/ray-0.0.1-py3.6.egg/ray/__init__.py", line 17, in <module>
    import ray.serialization
  File "/home/ubuntu/.local/lib/python3.6/site-packages/ray-0.0.1-py3.6.egg/ray/serialization.py", line 6, in <module>
    import numbuf
  File "/home/ubuntu/.local/lib/python3.6/site-packages/ray-0.0.1-py3.6.egg/numbuf/__init__.py", line 14, in <module>
    from core.src.numbuf.libnumbuf import *
ImportError: /home/ubuntu/anaconda3/lib/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/ubuntu/.local/lib/python3.6/site-packages/ray-0.0.1-py3.6.egg/core/src/numbuf/libnumbuf.so)

This seems related to #131 (which occurred with Python 3.5), but the error message has changed for Python 3.6.

Running conda install libgcc fixed the problem for me. Interestingly, after subsequently doing conda uninstall libgcc, the problem was still fixed.
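
For anyone debugging similar issues, a quick way to check which CXXABI versions a given libstdc++ exports (the path below is the one from the traceback above; adjust as needed):

    import subprocess

    lib = "/home/ubuntu/anaconda3/lib/libstdc++.so.6"
    out = subprocess.check_output(["strings", lib]).decode()
    # The import error above means CXXABI_1.3.8 is missing from this list.
    print(sorted({line for line in out.splitlines()
                  if line.startswith("CXXABI_")}))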
