Comments (2)
Could you post the complete log?
from inference.
Here are my exact steps, to make this easy to reproduce.
version: '3.8'
services:
  xinference-local:
    image: xprobe/xinference:v0.9.0
    container_name: xinference-local
    ports:
      - 9998:9997
    environment:
      - XINFERENCE_MODEL_SRC=modelscope
      - XINFERENCE_HOME=/root/MODEL_PATH
    volumes:
      - ${MODEL_PATH}:/root/MODEL_PATH
    restart: always
    shm_size: '128g'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: xinference-local -H 0.0.0.0 --log-level debug
    networks:
      - xinference-local
networks:
  xinference-local:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: "172.30.2.0/24"
${MODEL_PATH} is a directory whose subdirectories are the model folders.
My GPUs are four NVIDIA A100s. On the deployment page I deployed the int4-quantized Qwen 72B (downloaded through the Xinference platform itself). I tested the Baichuan2-7b-chat model the same way; in both cases the model exclusively occupies all four GPUs, and both use the PyTorch model format.
Below is the code I use to test the LLM:
from ragas import evaluate
from langchain.chat_models import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper

inference_server_url = ""

# create vLLM Langchain instance
chat = ChatOpenAI(
    model="Baichuan2-13B-Chat",
    openai_api_key="no-key",
    openai_api_base=inference_server_url,
    max_tokens=1024,
    temperature=0.5,
)

# use the Ragas LangchainLLM wrapper to create a RagasLLM instance
vllm = LangchainLLMWrapper(chat)

from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
    context_relevancy,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness

# change the LLM
faithfulness.llm = vllm
context_relevancy.llm = vllm
context_recall.llm = vllm
context_precision.llm = vllm
answer_similarity.llm = vllm
answer_relevancy.llm = vllm  # Invalid key: 0 is out of bounds for size 0
answer_correctness.llm = vllm
harmfulness.llm = vllm

from langchain.embeddings import HuggingFaceEmbeddings

modelPath = "bge-large-zh"
# Model configuration options: use the CPU for embedding computations
model_kwargs = {'device': 'cpu'}
# Encoding options: set 'normalize_embeddings' to True
encode_kwargs = {'normalize_embeddings': True}
# Initialize an instance of HuggingFaceEmbeddings with the specified parameters
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,        # path to the pre-trained model
    model_kwargs=model_kwargs,   # model configuration options
    encode_kwargs=encode_kwargs  # encoding options
)

answer_relevancy.embeddings = embeddings
answer_correctness.embeddings = embeddings
answer_similarity.embeddings = embeddings

# evaluate
result = evaluate(
    dataset,
    metrics=[context_precision],  # 1
)
print(result)
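To separate a server-side problem from a Ragas-side one, it can help to hit the Xinference endpoint with a raw OpenAI-compatible chat-completions request, bypassing Ragas entirely. This is only a sketch: the base URL and model name below are placeholders assumed from the compose file above, and the request is built but not sent.

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt, max_tokens=64):
    """Build an OpenAI-compatible /v1/chat/completions request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.5,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer no-key"},
        method="POST",
    )

# Placeholder host/port and model name, matching the compose file above:
req = build_chat_request("http://localhost:9998", "Baichuan2-13B-Chat", "hello")
# urllib.request.urlopen(req)  # uncomment on a host where the server is running
```

If this single request already drives GPU memory up without releasing it, the issue is on the serving side rather than in the evaluation loop.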
That is roughly what the test code does; I have edited it back and forth many times, so there may be bugs. The core call to the LLM is really just the last statement:

result = evaluate(
    dataset,
    metrics=[context_precision],  # 1
)
When I tested, the dataset had only about 100 rows. Watching nvidia-smi in the background, GPU memory usage keeps climbing until the GPUs are full, and then the reported usage drops to 0.
The backend log just shows an ordinary Out of memory. (The cards are currently in use for something else; I will post the full log once they are free.)
After that, the web frontend hangs: I can neither view the running models nor redeploy a model.
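The memory climb can be logged over time instead of eyeballed: `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` prints one MiB value per line, one line per GPU. A small parser like the sketch below turns that into numbers (the actual command needs a GPU host, so a sample output string is hard-coded here).

```python
import subprocess

def parse_memory_used(output):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output: one integer (MiB) per line, one line per GPU."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]

def query_memory_used():
    """Run nvidia-smi on a GPU host (not runnable without one)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"], text=True)
    return parse_memory_used(out)

# Sample output for a 4-GPU host like the one above:
sample = "20345\n20340\n20351\n20339\n"
print(parse_memory_used(sample))  # → [20345, 20340, 20351, 20339]
```

Calling `query_memory_used()` in a loop and appending timestamps gives a trace that shows whether memory grows per request or per batch.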
Related Issues (20)
- qwen2.0-7b-instruct cannot use file as request body. (cannot upload files)
- BUG: quantization for glm-4v
- BUG super-worker mode has robustness issues. HOT 1
- glm4-chat tool calls cannot be used under dify HOT 8
- BUG: docker image of 0.12.1 launch failed HOT 6
- FP8 quantization support
- Image upgrade to 0.12.1, running Qwen1.5-14B-Chat-GPTQ-Int4 is much slower compared to 0.11.0
- BUG fix security vulnerability HOT 1
- Feat: Support download model only. HOT 1
- FEAT: support embedding model Alibaba-NLP/gte-Qwen2-7B-instruct
- Please support the configuration of chattts in the “Register Model”
- worker,ValueError: [address=0.0.0.0:45416, pid=47] Model not found,
- [QUESTION] i pulled qwen2 awq 7b 4bit quantized model but it is giving me gibberish text which has no meaning
- Support Optional Configurations for Embedding models HOT 3
- BUG Models larger than 20 GB cause the machine to reboot HOT 3
- BUG When I reasoning the model Qwen-VL-Chat-Int4 and Yi-VL-6B, the Model Engine cannot be found HOT 1
- Suggestions about transcription enhancement HOT 1
- Model fails to start when running on multiple GPUs; runs fine on a single GPU
- Custom model qwen2-0.5 defaults to ReAct in dify and cannot be changed
- Does minicpm-llama3-v-2_5 (int4) support concurrent API calls? It errors with 2 or more concurrent requests