Comments (2)
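Yesterday I pulled the v0.12.3 Docker image. It starts normally, and I no longer need to install llama-cpp-python==0.2.77 separately.
However, the rerank model problem is still there in my tests: every search adds roughly 5 GB of GPU memory usage, which fills up quickly and is never released. The only way to free it is to shut the model down manually.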
Has this been resolved? Launching a rerank model also drives up GPU memory usage for me.