Comments (11)
The default max_model_len is 4096. Just add the parameter --max_model_len 32000 when launching.
In my shell I ran xinference launch -u Qwen1.5-72B-chat -n Qwen1.5-72B-chat -s 72 -f pytorch --max_model_len 32000, but it reported that there is no --max_model_len option.
Also, when launching via the Python method from the official docs, I ran:
from xinference.client import RESTfulClient
client = RESTfulClient("http://127.0.0.1:9997")
model_uid = client.launch_model(
    model_uid="my-llama-2",
    model_name="llama-2-chat",
    model_format="pytorch",
    size_in_billions=13
)
print('Model uid: ' + model_uid)
and it reported that there is no size_in_billions parameter.
Try updating your xinference version? The parameter should now be model_size_in_billions.
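Putting the two fixes from this thread together, a minimal sketch of the corrected launch call might look like the following. This assumes a local xinference server at http://127.0.0.1:9997 and a recent enough xinference release (0.9.1 or later per the thread) where the keyword is model_size_in_billions rather than size_in_billions; it requires a running server to execute.

```python
# Hedged sketch based on this thread, not official documentation.
# Assumes an xinference server is already running at the URL below.
from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
model_uid = client.launch_model(
    model_uid="my-llama-2",
    model_name="llama-2-chat",
    model_format="pytorch",
    model_size_in_billions=13,  # renamed; was size_in_billions in older docs
)
print("Model uid: " + model_uid)
```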
I'm currently on 0.9.0.
Parameter passing was added in 0.9.1, I believe. #1048
Unbelievable.
How do you pass this parameter when launching with Docker?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.