This is probably the cause of the error in testGettingManyObjects in stress_tests.py, as seen in this log: https://travis-ci.org/ray-project/ray/jobs/200210188
testGettingManyObjects (__main__.TaskTests) ... Waiting for redis server at 127.0.0.1:38081 to respond...
10098:M 10 Feb 07:14:23.632 # Server started, Redis version 3.9.102
Failed to connect to the redis server, retrying.
Waiting for redis server at 127.0.0.1:38081 to respond...
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:870) Allowing the Plasma store to use up to 3.44GB of memory.
[INFO] (/Users/travis/build/ray-project/ray/src/photon/photon_scheduler.c:799) Start worker command is python /Users/travis/.local/lib/python3.5/site-packages/ray-0.0.1-py3.5.egg/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --object-store-name=/tmp/plasma_store79902205 --object-store-manager-name=/tmp/plasma_manager74482700 --local-scheduler-name=/tmp/scheduler28728453 --redis-address=127.0.0.1:38081
[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) Connection to socket failed for pathname /tmp/scheduler28728453.
[FATAL] (/Users/travis/build/ray-project/ray/src/photon/photon_client.c:14: errno: Bad file descriptor) Check failure: success == 0
Unable to register worker with local scheduler
0 libphoton.so 0x0000000102e0f1ff photon_connect + 287
1 libphoton.so 0x0000000102e0c24e PyPhotonClient_init + 78
[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) Connection to socket failed for pathname /tmp/scheduler28728453.
[FATAL] (/Users/travis/build/ray-project/ray/src/photon/photon_client.c:14: errno: Bad file descriptor) Check failure: success == 0
Unable to register worker with local scheduler
0 libphoton.so 0x0000000102e101ff photon_connect + 287
1 libphoton.so 0x0000000102e0d24e PyPhotonClient_init + 78
2 libpython3.5m.dylib 0x0000000100062329 type_call + 281
3 libpython3.5m.dylib 0x000000010000fd73 PyObject_Call + 99
4 libpython3.5m.dylib 0x00000001000bd766 PyEval_EvalFrameEx + 23590
5 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
7 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
8 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
9 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
10 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
11 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
12 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
13 python 0x0000000100000dc7 main + 215
14 python 0x0000000100000ce4 start + 52
15 ??? 0x0000000000000007 0x0 + 7
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 19
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 16
2 libpython3.5m.dylib 0x0000000100062329 type_call + 281
3 libpython3.5m.dylib 0x000000010000fd73 PyObject_Call + 99
4 libpython3.5m.dylib 0x00000001000bd766 PyEval_EvalFrameEx + 23590
5 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6 libpython3.5m.dylib 0x00000001000c19ae fast_function + 334
7 libpython3.5m.dylib 0x00000001000bd434 PyEval_EvalFrameEx + 22772
8 libpython3.5m.dylib 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
9 libpython3.5m.dylib 0x00000001000b7ac1 PyEval_EvalCode + 81
10 libpython3.5m.dylib 0x00000001000e6937 PyRun_FileExFlags + 215
11 libpython3.5m.dylib 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
12 libpython3.5m.dylib 0x00000001000fcc5b Py_Main + 3355
13 python 0x0000000100000dc7 main + 215
14 python 0x0000000100000ce4 start + 52
15 ??? 0x0000000000000007 0x0 + 7
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 18
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 15
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 12
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 9
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 13
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 10
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 14
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 11
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 15
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 12
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 16
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 13
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 17
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 14
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 19
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 17
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 20
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 21
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 11
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 18
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 19
10098:signal-handler (1486710883) Received SIGTERM scheduling shutdown...
10098:M 10 Feb 07:14:43.662 # User requested shutdown...
10098:M 10 Feb 07:14:43.662 # Redis is now ready to exit, bye bye...
ok
The test still passes even though one of the workers failed to connect. Also, when we run self.assertTrue(ray.services.all_processes_alive()), it doesn't catch the dead worker because workers are now started from the local scheduler (and not from services.py).
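To illustrate the gap, here is a minimal sketch (not Ray's actual implementation) of a liveness check that only sees the processes it was handed. The helper `all_alive` and the `tracked`/`untracked` names are hypothetical; `untracked` stands in for a worker launched by the local scheduler rather than by services.py.

```python
import subprocess
import sys


def all_alive(processes):
    """Return True only if every given Popen handle is still running."""
    return all(p.poll() is None for p in processes)


# A process the test harness started itself and therefore tracks.
tracked = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"])

# A process started elsewhere (hypothetical stand-in for a worker launched
# by the local scheduler) that dies immediately with a non-zero exit code.
untracked = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.exit(1)"])
untracked.wait()

# The check passes when it only knows about the tracked process...
check_tracked_only = all_alive([tracked])
# ...and fails only once the dead process is included in the list.
check_everything = all_alive([tracked, untracked])

# Clean up the long-running process.
tracked.terminate()
tracked.wait()
```

Under these assumptions, a fix would need the check to also cover workers started by the local scheduler, or the worker's failure to be reported back some other way.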