Comments (4)
I just tried this again, and it still fails. On the driver with
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
[FATAL] (/Users/rkn/Workspace/ray/src/plasma/plasma_client.c:747: errno: Broken pipe) Check failure: plasma_send_WaitRequest(conn->manager_conn, conn->builder, object_requests, num_object_requests, num_ready_objects, timeout_ms) >= 0
0 libplasma.so 0x00000001069de4e1 plasma_wait + 1537
1 libplasma.so 0x00000001069c2abb PyPlasma_wait + 779
2 libpython3.5m.dylib 0x000000010004ff38 PyCFunction_Call + 280
3 libpython3.5m.dylib 0x00000001000bd2df PyEval_EvalFrameEx + 22431
4 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
5 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
6 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
7 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
8 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
9 libpython3.5m.dylib 0x00000001000b52ab builtin_exec + 555
10 libpython3.5m.dylib 0x000000010004ff38 PyCFunction_Call + 280
11 libpython3.5m.dylib 0x00000001000bd2df PyEval_EvalFrameEx + 22431
12 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
13 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
14 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
15 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
16 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
17 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
18 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
19 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
20 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
21 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
22 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
23 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
24 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
25 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
26 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
27 libpython3.5m.dylib 0x00000001000c192f fast_function + 207
28 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
29 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
30 libpython3.5m.dylib 0x00000001000b7b1e PyEval_EvalCodeEx + 78
31 libpython3.5m.dylib 0x000000010003430f function_call + 351
32 libpython3.5m.dylib 0x000000010000fd73 PyObject_Call + 99
33 libpython3.5m.dylib 0x00000001000bdec8 PyEval_EvalFrameEx + 25480
34 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
35 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
36 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
37 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
38 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
39 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
40 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
41 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
42 python 0x0000000100000e92 main + 418
43 python 0x0000000100000cc4 start + 52
/Users/rkn/anaconda/bin/python.app: line 3: 63731 Abort trap: 6 /Users/rkn/anaconda/python.app/Contents/MacOS/python "$@"
and on the plasma_manager with
[WARN] (/Users/rkn/Workspace/ray/src/common/state/table.c:95) Table command object_table_request_notifications with timer ID 20 failed
[FATAL] (/Users/rkn/Workspace/ray/src/plasma/plasma_manager.c:976: errno: Invalid argument) Check failure: 0
0 plasma_manager 0x00000001031496b1 fatal_table_callback + 161
1 plasma_manager 0x000000010315f04a table_timeout_handler + 810
2 plasma_manager 0x00000001031625db processTimeEvents + 443
3 plasma_manager 0x00000001031621a5 aeProcessEvents + 677
4 plasma_manager 0x00000001031627ce aeMain + 94
5 plasma_manager 0x000000010314ed05 event_loop_run + 21
6 plasma_manager 0x000000010314e0db start_server + 1211
7 plasma_manager 0x000000010314e69a main + 1370
8 libdyld.dylib 0x00007fffb7ade255 start + 1
Abort trap: 6
from ray.
Recreating this example through the Ray API.
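import ray
import time

ray.init()

@ray.remote
def f():
    # Long-running task, so none of the objects become ready during the test.
    time.sleep(1000)

# Submit many tasks, then repeatedly poll all of their object IDs.
l = [f.remote() for _ in range(100000)]
for i in range(1000):
    print(i)
    # Non-blocking wait over all 100000 object IDs.
    ray.wait(l, timeout=0)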
Fails with
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
[WARN] (/Users/rkn/Workspace/ray/src/common/state/table.c:95) Table command object_table_request_notifications with timer ID 26 failed
[FATAL] (/Users/rkn/Workspace/ray/src/plasma/plasma_manager.c:979: errno: Invalid argument) Check failure: 0
0 plasma_manager 0x0000000105efc261 fatal_table_callback + 161
1 plasma_manager 0x0000000105f123fa table_timeout_handler + 810
2 plasma_manager 0x0000000105f1598b processTimeEvents + 443
3 plasma_manager 0x0000000105f15555 aeProcessEvents + 677
4 plasma_manager 0x0000000105f15b7e aeMain + 94
5 plasma_manager 0x0000000105f018b5 event_loop_run + 21
6 plasma_manager 0x0000000105f00c8b start_server + 1211
7 plasma_manager 0x0000000105f0124a main + 1370
8 libdyld.dylib 0x00007fffa33fc255 start + 1
[INFO] (/Users/rkn/Workspace/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 5
[FATAL] (/Users/rkn/Workspace/ray/src/plasma/plasma_client.c:636: errno: Broken pipe) Check failure: plasma_send_FetchRequest(conn->manager_conn, conn->builder, object_ids, num_object_ids) >= 0
0 photon_scheduler 0x000000010820e8fa plasma_fetch + 634
1 photon_scheduler 0x00000001081e545f fetch_object_timeout_handler + 431
2 photon_scheduler 0x00000001082033cb processTimeEvents + 443
3 photon_scheduler 0x0000000108202f95 aeProcessEvents + 677
4 photon_scheduler 0x00000001082035be aeMain + 94
5 photon_scheduler 0x00000001081ee2e5 event_loop_run + 21
6 photon_scheduler 0x00000001081db4e0 start_server + 720
7 photon_scheduler 0x00000001081dc041 main + 2865
8 libdyld.dylib 0x00007fffa33fc255 start + 1
[FATAL] (/Users/rkn/Workspace/ray/src/photon/photon_client.c:61: errno: None) Check failure: type == EXECUTE_TASK
0 libphoton.so 0x00000001061b38b7 photon_get_task + 263
1 libphoton.so 0x00000001061b0372 PyPhotonClient_get_task + 34
2 libpython3.5m.dylib 0x00000001000bd34e PyEval_EvalFrameEx + 22542
3 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
4 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
5 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
6 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
7 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
8 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
9 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
10 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
11 python 0x0000000100000dc7 main + 215
12 python 0x0000000100000ce4 start + 52
13 ??? 0x0000000000000009 0x0 + 9
[INFO] (/Users/rkn/Workspace/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 20
[FATAL] (/Users/rkn/Workspace/ray/src/photon/photon_client.c:61: errno: None) Check failure: type == EXECUTE_TASK
0 libphoton.so 0x00000001061ac8b7 photon_get_task + 263
1 libphoton.so 0x00000001061a9372 PyPhotonClient_get_task + 34
[FATAL] (/Users/rkn/Workspace/ray/src/photon/photon_client.c:61: errno: None) Check failure: type == EXECUTE_TASK
0 libphoton.so 0x00000001061ac8b7 photon_get_task + 263
1 libphoton.so 0x00000001061a9372 PyPhotonClient_get_task + 34
2 libpython3.5m.dylib 0x00000001000bd34e PyEval_EvalFrameEx + 22542
3 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
2 libpython3.5m.dylib 0x00000001000bd34e PyEval_EvalFrameEx + 22542
4 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
5 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
3 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
4 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
7 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
5 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
6 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
8 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
9 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
7 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
10 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
11 python 0x0000000100000dc7 main + 215
12 python 0x0000000100000ce4 start + 52
8 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
13 ??? 0x0000000000000009 0x0 + 9
9 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
10 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
11 python 0x0000000100000dc7 main + 215
12 python 0x0000000100000ce4 start + 52
13 ??? 0x0000000000000009 0x0 + 9
[INFO] (/Users/rkn/Workspace/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 17
[INFO] (/Users/rkn/Workspace/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 19
18
[FATAL] (/Users/rkn/Workspace/ray/src/plasma/plasma_client.c:674: errno: Broken pipe) Check failure: plasma_send_WaitRequest(conn->manager_conn, conn->builder, object_requests, num_object_requests, num_ready_objects, timeout_ms) >= 0
0 libplasma.so 0x00000001077ecc01 plasma_wait + 1537
1 libplasma.so 0x00000001077d163b PyPlasma_wait + 779
2 libpython3.5m.dylib 0x000000010004ff38 PyCFunction_Call + 280
3 libpython3.5m.dylib 0x00000001000bd2df PyEval_EvalFrameEx + 22431
4 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
5 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
6 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
7 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
8 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
9 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
10 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
11 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
12 libpython3.5m.dylib 0x00000001000b52ab builtin_exec + 555
13 libpython3.5m.dylib 0x000000010004ff38 PyCFunction_Call + 280
14 libpython3.5m.dylib 0x00000001000bd2df PyEval_EvalFrameEx + 22431
15 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
16 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
17 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
18 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
19 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
20 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
21 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
22 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
23 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
24 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
25 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
26 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
27 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
28 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
29 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
30 libpython3.5m.dylib 0x00000001000c192f fast_function + 207
31 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
32 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
33 libpython3.5m.dylib 0x00000001000b7b1e PyEval_EvalCodeEx + 78
34 libpython3.5m.dylib 0x000000010003430f function_call + 351
35 libpython3.5m.dylib 0x000000010000fd73 PyObject_Call + 99
36 libpython3.5m.dylib 0x00000001000bdec8 PyEval_EvalFrameEx + 25480
37 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
38 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
39 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
40 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
41 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
42 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
43 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
44 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
45 python 0x0000000100000e92 main + 418
46 python 0x0000000100000cc4 start + 52
[INFO] (/Users/rkn/Workspace/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 10
/Users/rkn/anaconda/bin/python.app: line 3: 88168 Abort trap: 6 /Users/rkn/anaconda/python.app/Contents/MacOS/python "$@"
On the branch for #312, the above example gets to i=5000
(it's still running), but it slows down substantially as time goes on.
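One way to put a number on that slowdown is to time each call and compare early iterations with late ones. The following is only a minimal sketch using the standard library; the helper names `time_calls` and `slowdown_ratio` are hypothetical and not part of Ray, and the no-op callable stands in for the real call.

```python
import time


def time_calls(fn, n):
    """Call fn() n times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations, window=10):
    """Mean latency of the last `window` calls divided by that of the first `window`."""
    window = min(window, len(durations) // 2)
    if window == 0:
        raise ValueError("need at least two measurements")
    first = sum(durations[:window]) / window
    last = sum(durations[-window:]) / window
    return last / first if first > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with the call being investigated.
    samples = time_calls(lambda: None, 100)
    print("slowdown ratio:", slowdown_ratio(samples))
```

With the reproduction script above, the callable would be `lambda: ray.wait(l, timeout=0)`. A ratio well above 1 would indicate that later calls take longer than earlier ones.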
This should be addressed by #312.