
Comments (4)

dchigarev commented on May 8, 2024

Reproduction script
git clone https://github.com/modin-project/modin
cd modin
mamba env create -f environment-dev.yml
conda activate modin
python -m pytest -n 2 modin/test/test_partition_api.py

Running pip install . from the cloned modin directory fixes the problem.
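The thread does not establish why installing the package makes the failure go away. As an assumption rather than a diagnosis: a package that is only importable because the current working directory is on sys.path will stop resolving in any process started from a different directory, while an installed package resolves everywhere. The sketch below illustrates that general Python behaviour with a throwaway package; the names demo_pkg_not_installed, can_import and demo are hypothetical and unrelated to modin or Ray.

```python
import os
import subprocess
import sys
import tempfile


def can_import(module, cwd):
    """Start a fresh interpreter in `cwd` and report whether `module` imports."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        cwd=cwd,
        capture_output=True,
    )
    return result.returncode == 0


def demo():
    """Compare importing from inside a source checkout and from elsewhere."""
    with tempfile.TemporaryDirectory() as checkout, \
            tempfile.TemporaryDirectory() as elsewhere:
        # Hypothetical package that exists on disk but is not installed.
        pkg_dir = os.path.join(checkout, "demo_pkg_not_installed")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w"):
            pass
        from_checkout = can_import("demo_pkg_not_installed", checkout)
        from_elsewhere = can_import("demo_pkg_not_installed", elsewhere)
        return from_checkout, from_elsewhere


if __name__ == "__main__":
    print(demo())
```

If the two results differ, the import depends on the working directory rather than on an installed distribution. Whether that is what happens to Ray worker processes in this issue is not confirmed here.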

from ray.

YarShev commented on May 8, 2024

It would be great to know the reason why it worked before and what exactly has been changed in 2.10.0.


hahamark1 commented on May 8, 2024

I am running into similar issues with Ray 2.10.0.

I upgraded from 2.9.3 and am now getting the following error (which only happens on the worker):

Exception raised in creation task: The actor died because of an error raised in its creation task, ray::SERVE_REPLICA::llm_app#xformers-hf-internal-testing-tiny-random-gpt2#dmsm0oos:ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2.__init__() (pid=457, ip=192.128.15.91, actor_id=7e50a57d8b03c22157ca5ff901000000, repr=<ray.serve._private.replica.ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2 object at 0x7f3d93b532b0>)
  File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 258, in __init__
    deployment_def = cloudpickle.loads(serialized_deployment_def)
ModuleNotFoundError: No module named 'kaiko.llm_serve'
[2024-03-26 15:43:01,478 E 457 457] logging.cc:104: Stack trace: 
 /home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe543a) [0x7f3ee7f0043a] ray::operator<<()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0xfe7b78) [0x7f3ee7f02b78] ray::TerminateHandler()
/home/ray/anaconda3/bin/../lib/libstdc++.so.6(+0xb135a) [0x7f3ee6da835a] __cxxabiv1::__terminate()
/home/ray/anaconda3/bin/../lib/libstdc++.so.6(+0xb13c5) [0x7f3ee6da83c5]
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x7c9670) [0x7f3ee76e4670] std::thread::_State_impl<>::~_State_impl()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6285ba) [0x7f3ee75435ba] std::_Sp_counted_base<>::_M_release()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x7b2772) [0x7f3ee76cd772] std::_Sp_counted_ptr_inplace<>::_M_dispose()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6285ba) [0x7f3ee75435ba] std::_Sp_counted_base<>::_M_release()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6e6d92) [0x7f3ee7601d92] std::default_delete<>::operator()()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorkerD1Ev+0xf7) [0x7f3ee76734e7] ray::core::CoreWorker::~CoreWorker()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x6285ba) [0x7f3ee75435ba] std::_Sp_counted_base<>::_M_release()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x134) [0x7f3ee76b2484] ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core17CoreWorkerProcess20RunTaskExecutionLoopEv+0x1d) [0x7f3ee76b258d] ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
/home/ray/anaconda3/lib/python3.10/site-packages/ray/_raylet.so(+0x5a38e7) [0x7f3ee74be8e7] __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop()
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x4ff2f4] method_vectorcall_NOARGS
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyEval_EvalFrameDefault+0x731) [0x4ed6d1] _PyEval_EvalFrameDefault
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyFunction_Vectorcall+0x6f) [0x4fcadf] _PyFunction_Vectorcall
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyEval_EvalFrameDefault+0x731) [0x4ed6d1] _PyEval_EvalFrameDefault
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x591d92] _PyEval_Vector
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(PyEval_EvalCode+0x87) [0x591cd7] PyEval_EvalCode
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x5c2967] run_eval_code_obj
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x5bdad0] run_mod
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x45956b] pyrun_file.cold
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyRun_SimpleFileObject+0x19f) [0x5b805f] _PyRun_SimpleFileObject
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(_PyRun_AnyFileObject+0x43) [0x5b7dc3] _PyRun_AnyFileObject
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(Py_RunMain+0x38d) [0x5b4b7d] Py_RunMain
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2(Py_BytesMain+0x39) [0x584e49] Py_BytesMain
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f3ee8bb5083] __libc_start_main
ray::ServeReplica:llm_app:xformers-hf-internal-testing-tiny-random-gpt2() [0x584cfe]

*** SIGABRT received at time=1711464181 on cpu 3 ***
PC: @     0x7f3ee8bd400b  (unknown)  raise
    @     0x7f3ee8ef1420  (unknown)  (unknown)
    @     0x7f3ee6da835a         80  __cxxabiv1::__terminate()
    @     0x7f3ee75435ba         32  std::_Sp_counted_base<>::_M_release()
    @     0x7f3ee76cd772         96  std::_Sp_counted_ptr_inplace<>::_M_dispose()
    @     0x7f3ee75435ba         32  std::_Sp_counted_base<>::_M_release()
    @     0x7f3ee7601d92        144  std::default_delete<>::operator()()
    @     0x7f3ee76734e7        128  ray::core::CoreWorker::~CoreWorker()
    @     0x7f3ee75435ba         32  std::_Sp_counted_base<>::_M_release()
    @     0x7f3ee76b2484        112  ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
    @     0x7f3ee76b258d         32  ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
    @     0x7f3ee74be8e7         32  __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop()
    @           0x4ff2f4  (unknown)  method_vectorcall_NOARGS
    @ ... and at least 1 more frames
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: *** SIGABRT received at time=1711464181 on cpu 3 ***
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361: PC: @     0x7f3ee8bd400b  (unknown)  raise
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee8ef1420  (unknown)  (unknown)
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee6da835a         80  __cxxabiv1::__terminate()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee75435ba         32  std::_Sp_counted_base<>::_M_release()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee76cd772         96  std::_Sp_counted_ptr_inplace<>::_M_dispose()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee75435ba         32  std::_Sp_counted_base<>::_M_release()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee7601d92        144  std::default_delete<>::operator()()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee76734e7        128  ray::core::CoreWorker::~CoreWorker()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee75435ba         32  std::_Sp_counted_base<>::_M_release()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee76b2484        112  ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee76b258d         32  ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @     0x7f3ee74be8e7         32  __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop()
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @           0x4ff2f4  (unknown)  method_vectorcall_NOARGS
[2024-03-26 15:43:01,555 E 457 457] logging.cc:361:     @ ... and at least 1 more frames
Fatal Python error: Aborted

Stack (most recent call first):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 879 in main_loop
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/workers/default_worker.py", line 282 in <module>

Extension modules: msgpack._cmsgpack, google.protobuf.pyext._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, _brotli, charset_normalizer.md, uvloop.loop, ray._raylet, pvectorc, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.tslib, pandas._libs.lib, pandas._libs.hashing, pyarrow.lib, pyarrow._hdfsio, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.index, pandas._libs.join, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.internals, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.tslibs.strptime, 
pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, grpc._cython.cygrpc, pyarrow._json (total: 93)

This library (kaiko.llm_serve) is also used on the head node, and the head and worker nodes share the same custom image.
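One way to compare the two environments is to ask the interpreter on each node where it would load the module from. The following is a minimal sketch using only the standard library; module_origin is a hypothetical helper name, and it assumes the script is run with the same Python executable that the Ray worker uses.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file `name` would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent
        # package (for example "kaiko") raises instead of returning None.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Module name taken from the traceback above.
    print("interpreter:", sys.executable)
    print("kaiko.llm_serve ->", module_origin("kaiko.llm_serve"))
```

Running this on the head node and on a worker node and comparing the printed paths shows whether both interpreters see the package, and whether they see the same copy.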


hongchaodeng commented on May 8, 2024

@YarShev @hahamark1 Can you provide a simpler reproduction script using only Ray?

