Comments (4)
Reproduction script
git clone https://github.com/modin-project/modin
cd modin
mamba env create -f environment-dev.yml
conda activate modin
python -m pytest -n 2 modin/test/test_partition_api.py
Running `pip install .` from the cloned modin directory fixes the problem.
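Since installing the package makes the failure go away, one thing worth checking is where the module is actually imported from in the failing setup. The following is a minimal diagnostic sketch using only the standard library. The helper name `describe_import` is hypothetical, and this is only a way to gather information, not a confirmed explanation of what changed in Ray 2.10.0.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Return the file a top-level module would be imported from.

    Returns None if the module cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Defaults to "modin" because that is the package in the reproduction above.
    name = sys.argv[1] if len(sys.argv) > 1 else "modin"
    print(name, "->", describe_import(name))
```

Running the same check in the driver process and inside a worker (for example, from within a remote task) may show whether the two processes resolve the package from different locations, or whether the worker cannot find it at all.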
from ray.
It would be great to know why it worked before and what exactly changed in 2.10.0.
from ray.
I am running into similar issues with Ray 2.10.0.
I upgraded from 2.9.3 and am now getting the following error (which only happens on the worker):
Exception raised in creation task: The actor died because of an error raised in its creation task, ray::SERVE_REPLICA::llm_app#xformers-hf-internal-testing-tiny-random-gpt2#dmsm0oos:ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2.__init__() (pid=457, ip=192.128.15.91, actor_id=7e50a57d8b03c22157ca5ff901000000, repr=<ray.serve._private.replica.ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2 object at 0x7f3d93b532b0>)
File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 258, in __init__
deployment_def = cloudpickle.loads(serialized_deployment_def)
ModuleNotFoundError: No module named 'kaiko.llm_serve'
[2024-03-26 15:43:01,478 E 457 457] logging.cc:104: Stack trace:
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe543a) [0x7f3ee7f0043a] ray::operator<<()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe7b78) [0x7f3ee7f02b78] ray::TerminateHandler()
/home/ray/anaconda3/bin/../lib/libstdc++.so.6(+0xb135a) [0x7f3ee6da835a] __cxxabiv1::__terminate()
/home/ray/anaconda3/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7f3ee6da83c5]
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x7c9670) [0x7f3ee76e4670] std::thread::_State_impl<>::~_State_impl()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6285ba) [0x7f3ee75435ba] std::_Sp_counted_base<>::_M_release()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x7b2772) [0x7f3ee76cd772] std::_Sp_counted_ptr_inplace<>::_M_dispose()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6285ba) [0x7f3ee75435ba] std::_Sp_counted_base<>::_M_release()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6e6d92) [0x7f3ee7601d92] std::default_delete<>::operator()()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorkerD1Ev+0xf7) [0x7f3ee76734e7] ray::core::CoreWorker::~CoreWorker()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6285ba) [0x7f3ee75435ba] std::_Sp_counted_base<>::_M_release()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x134) [0x7f3ee76b2484] ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core17CoreWorkerProcess20RunTaskExecutionLoopEv+0x1d) [0x7f3ee76b258d] ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x5a38e7) [0x7f3ee74be8e7] __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop()
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x4ff2f4] method_vectorcall_NOARGS
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyEval_EvalFrameDefault+0x731) [0x4ed6d1] _PyEval_EvalFrameDefault
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyFunction_Vectorcall+0x6f) [0x4fcadf] _PyFunction_Vectorcall
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyEval_EvalFrameDefault+0x731) [0x4ed6d1] _PyEval_EvalFrameDefault
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x591d92] _PyEval_Vector
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(PyEval_EvalCode+0x87) [0x591cd7] PyEval_EvalCode
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x5c2967] run_eval_code_obj
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x5bdad0] run_mod
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x45956b] pyrun_file.cold
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyRun_SimpleFileObject+0x19f) [0x5b805f] _PyRun_SimpleFileObject
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyRun_AnyFileObject+0x43) [0x5b7dc3] _PyRun_AnyFileObject
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(Py_RunMain+0x38d) [0x5b4b7d] Py_RunMain
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(Py_BytesMain+0x39) [0x584e49] Py_BytesMain
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f3ee8bb5083] __libc_start_main
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x584cfe]
*** SIGABRT received at time=1711464181 on cpu 3 ***
PC: @ 0x7f3ee8bd400b (unknown) raise
@ 0x7f3ee8ef1420 (unknown) (unknown)
@ 0x7f3ee6da835a 80 __cxxabiv1::__terminate()
@ 0x7f3ee75435ba 32 std::_Sp_counted_base<>::_M_release()
@ 0x7f3ee76cd772 96 std::_Sp_counted_ptr_inplace<>::_M_dispose()
@ 0x7f3ee75435ba 32 std::_Sp_counted_base<>::_M_release()
@ 0x7f3ee7601d92 144 std::default_delete<>::operator()()
@ 0x7f3ee76734e7 128 ray::core::CoreWorker::~CoreWorker()
@ 0x7f3ee75435ba 32 std::_Sp_counted_base<>::_M_release()
@ 0x7f3ee76b2484 112 ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
@ 0x7f3ee76b258d 32 ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
@ 0x7f3ee74be8e7 32 __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop()
@ 0x4ff2f4 (unknown) method_vectorcall_NOARGS
@ ... and at least 1 more frames
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: *** SIGABRT received at time=1711464181 on cpu 3 ***
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: PC: @ 0x7f3ee8bd400b (unknown) raise
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee8ef1420 (unknown) (unknown)
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee6da835a 80 __cxxabiv1::__terminate()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee75435ba 32 std::_Sp_counted_base<>::_M_release()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee76cd772 96 std::_Sp_counted_ptr_inplace<>::_M_dispose()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee75435ba 32 std::_Sp_counted_base<>::_M_release()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee7601d92 144 std::default_delete<>::operator()()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee76734e7 128 ray::core::CoreWorker::~CoreWorker()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee75435ba 32 std::_Sp_counted_base<>::_M_release()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee76b2484 112 ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee76b258d 32 ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x7f3ee74be8e7 32 __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ 0x4ff2f4 (unknown) method_vectorcall_NOARGS
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: @ ... and at least 1 more frames
Fatal Python error: Aborted
Stack (most recent call first):
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 879 in main_loop
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/workers/default_worker.py", line 282 in <module>
Extension modules: msgpack._cmsgpack, google.protobuf.pyext._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, _brotli, charset_normalizer.md, uvloop.loop, ray._raylet, pvectorc, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, 
pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, grpc._cython.cygrpc, pyarrow._json (total: 93)
This library (`kaiko.llm_serve`) is also used in the head node, and they share the same custom image.
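The traceback above fails inside `cloudpickle.loads(serialized_deployment_def)`. For objects defined in an importable module, pickle (and cloudpickle by default) stores only a reference to the module and attribute name, so the process that unpickles must be able to import that module itself. The sketch below reproduces that mechanism with the standard library only. The module name `demo_pkg_mod` and the function `demo_unpickle_without_module` are hypothetical stand-ins, and this illustrates the error class rather than the specific cause in Ray 2.10.0.

```python
import importlib
import os
import pickle
import sys
import tempfile


def demo_unpickle_without_module():
    """Pickle a function by reference, hide its module, then try to unpickle it.

    Returns the error message, or None if unpickling unexpectedly succeeds.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module standing in for a package that is importable
        # only through an extra sys.path entry on the "driver" side.
        with open(os.path.join(tmp, "demo_pkg_mod.py"), "w") as f:
            f.write("def handler():\n    return 'ok'\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        module = importlib.import_module("demo_pkg_mod")

        # The payload contains only the module name and attribute name,
        # not the function's code.
        payload = pickle.dumps(module.handler)

        # Simulate a worker process in which the module is not importable.
        sys.path.remove(tmp)
        del sys.modules["demo_pkg_mod"]
        importlib.invalidate_caches()

        try:
            pickle.loads(payload)
        except ModuleNotFoundError as exc:
            return str(exc)
        return None


if __name__ == "__main__":
    print(demo_unpickle_without_module())
```

If the worker in the report hits the same condition, comparing `sys.path` between the head node and the worker process would be a reasonable next step, even though both use the same image.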
from ray.
@YarShev @hahamark1 Can you provide a simpler reproduction script using only Ray?
from ray.