
Comments (9)

codingl2k1 avatar codingl2k1 commented on June 23, 2024

It looks like the connection is going to 0.0.0.0. Try using the machine's actual IP instead, for example: xinference-worker -e http://10.100.108.220:9997 --log-level=debug

from inference.

Double-bear avatar Double-bear commented on June 23, 2024

xinference-worker -e http://10.100.108.220:9997 --log-level=debug

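I was originally passing the worker's IP in as a parameter. I have since hard-coded it in the shell script, but it still reports the same error when run: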
worker.sh

xinference-worker -e http://10.100.108.220:9997/ --log-level=debug


codingl2k1 avatar codingl2k1 commented on June 23, 2024

10.100.108.220

Are you sure the IP is correct?
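One way to answer this question from the worker machine is to test whether a TCP connection to the supervisor endpoint can be opened at all. The sketch below is illustrative only: `can_connect` is a hypothetical helper name, not part of xinference, and it only checks that something accepts connections on the given host and port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call, using the endpoint from this thread (run on the worker machine):
# can_connect("10.100.108.220", 9997)
```

A `True` result only shows that the endpoint is reachable from the worker. It says nothing about whether the supervisor can connect back to the worker.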


Double-bear avatar Double-bear commented on June 23, 2024

10.100.108.220

Are you sure the IP is correct?

This is the current log on the server side, so it should be correct, right?

2024-02-05 11:02:50,956 xinference.core.supervisor 269 INFO     Xinference supervisor 10.100.108.220:25589 started
2024-02-05 11:02:55,956 xinference.core.supervisor 269 DEBUG    Enter get_status, args: (<xinference.core.supervisor.SupervisorActor object at 0x7fc16030c400>,), kwargs: {}
2024-02-05 11:02:55,956 xinference.core.supervisor 269 DEBUG    Leave get_status, elapsed time: 0 s
2024-02-05 11:02:57,448 xinference.api.restful_api 140 INFO     Starting Xinference at endpoint: http://10.100.108.220:9997
/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py:476: UserWarning:
            Xinference ui is not built at expected directory: /usr/local/lib/python3.10/dist-packages/xinference/web/ui/build/
            To resolve this warning, navigate to /usr/local/lib/python3.10/dist-packages/xinference/web/ui/
            And build the Xinference ui by running "npm run build"
  warnings.warn(
2024-02-05 11:08:37,100 xinference.core.supervisor 269 DEBUG    Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x7fc16030c400>, '0.0.0.0:49736'), kwargs: {}
2024-02-05 11:40:37,037 xinference.core.supervisor 269 DEBUG    Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x7fc16030c400>, '0.0.0.0:50276'), kwargs: {}
2024-02-05 11:49:15,908 xinference.core.supervisor 269 DEBUG    Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x7fc16030c400>, '0.0.0.0:48767'), kwargs: {}
2024-02-05 13:54:49,201 xinference.core.supervisor 269 DEBUG    Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x7fc16030c400>, '0.0.0.0:30828'), kwargs: {}


codingl2k1 avatar codingl2k1 commented on June 23, 2024

The supervisor log does show Enter add_worker. Is the worker still failing with the same error as at the start?


Double-bear avatar Double-bear commented on June 23, 2024

The supervisor log does show Enter add_worker. Is the worker still failing with the same error as at the start?

Yes, it is still the same.

2024-02-05 13:54:47,482 xinference.core.worker 121 INFO     Starting metrics export server at 0.0.0.0:None
2024-02-05 13:54:47,483 xinference.core.worker 121 INFO     Checking metrics export server...
2024-02-05 13:54:49,170 xinference.core.worker 121 INFO     Metrics server is started at: http://0.0.0.0:41831
Traceback (most recent call last):
  File "/usr/local/bin/xinference-worker", line 8, in <module>
    sys.exit(worker())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/cmdline.py", line 349, in worker
    main(
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 94, in main
    loop.run_until_complete(task)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 65, in _start_worker
    await start_worker_components(
  File "/usr/local/lib/python3.10/dist-packages/xinference/deploy/worker.py", line 43, in start_worker_components
    await xo.create_actor(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 78, in create_actor
    return await ctx.create_actor(actor_cls, *args, uid=uid, address=address, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 143, in create_actor
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 596, in create_actor
    await self._run_coro(message.message_id, actor.__post_create__())
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 179, in __post_create__
    await self._supervisor_ref.add_worker(self.address)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 657, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 368, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 44, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 917, in add_worker
    worker_ref = await xo.actor_ref(address=worker_address, uid=WorkerActor.uid())
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 125, in actor_ref
    return await ctx.actor_ref(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 196, in actor_ref
    future = await self._call(actor_ref.address, message, wait=False)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 77, in _call
    return await self._caller.call(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 180, in call
    client = await self.get_client(router, dest_address)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 68, in get_client
    client = await router.get_client(dest_address, from_who=self)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 143, in get_client
    client = await self._create_client(client_type, address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 157, in _create_client
    return await client_type.connect(address, local_address=local_address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/communication/socket.py", line 255, in connect
    (reader, writer) = await asyncio.open_connection(host=host, port=port, **kwargs)
  File "/usr/lib/python3.10/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1076, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1060, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 969, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 501, in sock_connect
    return await fut
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 541, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [address=10.100.108.220:25589, pid=269] [Errno 111] Connect call failed ('0.0.0.0', 30828)


aresnow1 avatar aresnow1 commented on June 23, 2024

In distributed mode, start the worker with -H to specify the current worker's IP.
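The logs above are consistent with this: the supervisor receives add_worker with '0.0.0.0:30828' and then fails with "Connect call failed ('0.0.0.0', 30828)". 0.0.0.0 is a wildcard bind address, not a destination another machine can use to reach the worker. As a minimal sketch of that check (the function name `is_usable_advertised_address` is hypothetical and not part of xinference or xoscar):

```python
import ipaddress


def is_usable_advertised_address(address: str) -> bool:
    """Return False if 'host:port' names a wildcard or loopback IP.

    Such an address cannot be used by a remote supervisor to connect
    back to the worker. Hostnames are not resolved and are accepted.
    """
    host, _, _port = address.rpartition(":")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP (for example a hostname); not judged here.
        return True
    return not (ip.is_unspecified or ip.is_loopback)
```

With the values from the thread, `is_usable_advertised_address("0.0.0.0:30828")` is `False`, while an address built from the worker's real IP would pass. Passing that IP with -H, as suggested above, is what makes the worker register a reachable address.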


Double-bear avatar Double-bear commented on June 23, 2024

In distributed mode, start the worker with -H to specify the current worker's IP.

That worked, thank you!


Double-bear avatar Double-bear commented on June 23, 2024

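In distributed mode, start the worker with -H to specify the current worker's IP.

I have one more question. My machine has eight GPUs, and I launched a qwen 72b model on four of them, but it ran out of memory (OOM) at launch. Each GPU has 80 GB of memory, which should definitely be enough. Is there anything else I need to set at startup? Here are the commands from my shell script: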

pip install xinference
MASTER_IP=$(ifconfig | grep -o 'inet [0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+' | grep -v '127.0.0.1' | head -n 1 | awk '{print $2}')
xinference-local -H "$MASTER_IP" --port 9997 --log-level=debug
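As a rough sanity check on the memory claim, the weights-only footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch under stated assumptions: 72 billion parameters, 16-bit weights (2 bytes each), and an even split across four GPUs. It ignores the KV cache, activations, and any memory the inference backend preallocates, and the thread does not show whether the model was actually sharded across all four GPUs.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Estimate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumptions (not confirmed in the thread): 16-bit weights, even 4-way split.
NUM_PARAMS = 72e9
BYTES_PER_PARAM = 2
NUM_GPUS = 4

total_gib = weight_memory_gib(NUM_PARAMS, BYTES_PER_PARAM)
per_gpu_gib = total_gib / NUM_GPUS

print(f"weights total: {total_gib:.1f} GiB, per GPU: {per_gpu_gib:.1f} GiB")
```

Under these assumptions the weights come to roughly 134 GiB in total, which exceeds a single 80 GB GPU but fits comfortably when split four ways. An OOM at launch would therefore point toward how the model is distributed or how much memory the backend reserves, rather than toward the weights themselves.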
