Comments (11)
The default max_model_len is 4096. Just add the parameter --max_model_len 32000 when launching.
In my shell I ran xinference launch -u Qwen1.5-72B-chat -n Qwen1.5-72B-chat -s 72 -f pytorch --max_model_len 32000, but it reported that there is no --max_model_len option.
Also, when launching via the Python method from the official docs, I ran:
from xinference.client import RESTfulClient
client = RESTfulClient("http://127.0.0.1:9997")
model_uid = client.launch_model(
    model_uid="my-llama-2",
    model_name="llama-2-chat",
    model_format="pytorch",
    size_in_billions=13
)
print('Model uid: ' + model_uid)
and it reported that there is no size_in_billions parameter.
Try updating your xinference version? The parameter should now be model_size_in_billions.
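Putting the two fixes from this thread together, a minimal sketch of the corrected launch call might look like the following. This assumes a local xinference server at http://127.0.0.1:9997 and a recent enough xinference release (0.9.1 or later per the thread) where the keyword is model_size_in_billions rather than size_in_billions; it requires a running server to execute.

```python
# Hedged sketch based on this thread, not official documentation.
# Assumes an xinference server is already running at the URL below.
from xinference.client import RESTfulClient

client = RESTfulClient("http://127.0.0.1:9997")
model_uid = client.launch_model(
    model_uid="my-llama-2",
    model_name="llama-2-chat",
    model_format="pytorch",
    model_size_in_billions=13,  # renamed; was size_in_billions in older docs
)
print("Model uid: " + model_uid)
```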
I'm currently on 0.9.0.
Parameter passing was added in 0.9.1, I believe. #1048
Unbelievable.
How do you pass this parameter when launching with Docker?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.