Giter Site home page Giter Site logo

Retry socket connections about ray HOT 1 CLOSED

ray-project avatar ray-project commented on April 27, 2024
Retry socket connections

from ray.

Comments (1)

robertnishihara avatar robertnishihara commented on April 27, 2024

This is probably the cause of the error in testGettingManyObjects in stress_tests.py in this log https://travis-ci.org/ray-project/ray/jobs/200210188.

testGettingManyObjects (__main__.TaskTests) ... Waiting for redis server at 127.0.0.1:38081 to respond...
10098:M 10 Feb 07:14:23.632 # Server started, Redis version 3.9.102
Failed to connect to the redis server, retrying.
Waiting for redis server at 127.0.0.1:38081 to respond...
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:870) Allowing the Plasma store to use up to 3.44GB of memory.
[INFO] (/Users/travis/build/ray-project/ray/src/photon/photon_scheduler.c:799) Start worker command is python /Users/travis/.local/lib/python3.5/site-packages/ray-0.0.1-py3.5.egg/ray/workers/default_worker.py --node-ip-address=127.0.0.1 --object-store-name=/tmp/plasma_store79902205 --object-store-manager-name=/tmp/plasma_manager74482700 --local-scheduler-name=/tmp/scheduler28728453 --redis-address=127.0.0.1:38081
[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) Connection to socket failed for pathname /tmp/scheduler28728453.
[FATAL] (/Users/travis/build/ray-project/ray/src/photon/photon_client.c:14: errno: Bad file descriptor) Check failure: success == 0 
Unable to register worker with local scheduler
0   libphoton.so                        0x0000000102e0f1ff photon_connect + 287
1   libphoton.so                        0x0000000102e0c24e PyPhotonClient_init + 78
[ERROR] (/Users/travis/build/ray-project/ray/src/common/io.c:115: errno: Connection refused) Connection to socket failed for pathname /tmp/scheduler28728453.
[FATAL] (/Users/travis/build/ray-project/ray/src/photon/photon_client.c:14: errno: Bad file descriptor) Check failure: success == 0 
Unable to register worker with local scheduler
0   libphoton.so                        0x0000000102e101ff photon_connect + 287
1   libphoton.so                        0x0000000102e0d24e PyPhotonClient_init + 78
2   libpython3.5m.dylib                 0x0000000100062329 type_call + 281
3   libpython3.5m.dylib                 0x000000010000fd73 PyObject_Call + 99
4   libpython3.5m.dylib                 0x00000001000bd766 PyEval_EvalFrameEx + 23590
5   libpython3.5m.dylib                 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6   libpython3.5m.dylib                 0x00000001000c19ae fast_function + 334
7   libpython3.5m.dylib                 0x00000001000bd434 PyEval_EvalFrameEx + 22772
8   libpython3.5m.dylib                 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
9   libpython3.5m.dylib                 0x00000001000b7ac1 PyEval_EvalCode + 81
10  libpython3.5m.dylib                 0x00000001000e6937 PyRun_FileExFlags + 215
11  libpython3.5m.dylib                 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
12  libpython3.5m.dylib                 0x00000001000fcc5b Py_Main + 3355
13  python                              0x0000000100000dc7 main + 215
14  python                              0x0000000100000ce4 start + 52
15  ???                                 0x0000000000000007 0x0 + 7
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 19
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 16
2   libpython3.5m.dylib                 0x0000000100062329 type_call + 281
3   libpython3.5m.dylib                 0x000000010000fd73 PyObject_Call + 99
4   libpython3.5m.dylib                 0x00000001000bd766 PyEval_EvalFrameEx + 23590
5   libpython3.5m.dylib                 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
6   libpython3.5m.dylib                 0x00000001000c19ae fast_function + 334
7   libpython3.5m.dylib                 0x00000001000bd434 PyEval_EvalFrameEx + 22772
8   libpython3.5m.dylib                 0x00000001000c10c3 _PyEval_EvalCodeWithName + 1779
9   libpython3.5m.dylib                 0x00000001000b7ac1 PyEval_EvalCode + 81
10  libpython3.5m.dylib                 0x00000001000e6937 PyRun_FileExFlags + 215
11  libpython3.5m.dylib                 0x00000001000e60ea PyRun_SimpleFileExFlags + 842
12  libpython3.5m.dylib                 0x00000001000fcc5b Py_Main + 3355
13  python                              0x0000000100000dc7 main + 215
14  python                              0x0000000100000ce4 start + 52
15  ???                                 0x0000000000000007 0x0 + 7
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 18
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 15
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 12
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 9
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 13
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 10
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 14
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 11
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 15
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 12
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 16
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 13
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 17
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 14
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 19
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 17
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 20
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 21
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_manager.c:1396) Disconnecting client on fd 11
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 18
[INFO] (/Users/travis/build/ray-project/ray/src/plasma/plasma_store.c:796) Disconnecting client on fd 19
10098:signal-handler (1486710883) Received SIGTERM scheduling shutdown...
10098:M 10 Feb 07:14:43.662 # User requested shutdown...
10098:M 10 Feb 07:14:43.662 # Redis is now ready to exit, bye bye...
ok

The test still passes even though one of the workers failed to connect. Also, when we run self.assertTrue(ray.services.all_processes_alive()), it doesn't catch the dead worker because workers are started from the local scheduler now (and not from services.py).

from ray.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.