Comments (7)
Ah, that clears it up. I am using a range of quantisations for benchmarking purposes. I would be happy to open a PR to document this better, if you'd like; it would be nice to have this information in the docs, and maybe even programmatically. I have not been involved with this project for long, but I intend to invest more time now, and could give feedback on usability on Apple Silicon (I have the largest M3 machine).
from inference.
If you launch a model in GGUF format, it will use Metal automatically without specifying `n_gpu`.
Hi @aresnow1, thanks for the quick reply! It does use Metal, but does that automatically mean using the GPU? I see heavy CPU usage when I do inference, and not a lot of GPU. The running Xinference process also responds with this when I try to set `n_gpu` higher than 0:
The parameter `n_gpu` must be greater than 0 and not greater than the number of GPUs: 0 on the machine.
So it seems no GPUs are registered. Happy to help troubleshoot or file a PR if it is within my power.
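For context, the check behind that error message presumably looks something like the following. This is a hypothetical re-creation in Python, not Xinference's actual code: if the GPU count comes from enumerating CUDA devices, it is 0 on Apple Silicon even when Metal is usable, so any positive `n_gpu` is rejected.

```python
def validate_n_gpu(n_gpu: int, gpu_count: int) -> None:
    """Reject n_gpu values outside the range [1, gpu_count].

    Hypothetical sketch of the check behind the error quoted above;
    gpu_count would come from counting CUDA devices, which is 0 on
    Apple Silicon even when the Metal backend is available.
    """
    if n_gpu <= 0 or n_gpu > gpu_count:
        raise ValueError(
            f"The parameter `n_gpu` must be greater than 0 and not "
            f"greater than the number of GPUs: {gpu_count} on the machine."
        )
```

Under that assumption, the error above is expected on a Mac regardless of how capable the Metal GPU is.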
The `n_gpu` parameter is provided for NVIDIA graphics card users and does require some additional explanation. Regarding GPU utilization, what observations have you made when using llama.cpp directly? In this code snippet at line 99 of the file here, we have set `n_gpu_layers` to 1 for Apple users.
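That platform-dependent default can be sketched as follows (hypothetical helper name and logic; the comment above only says the value is hard-coded to 1 for Apple users):

```python
def default_n_gpu_layers(system: str, machine: str) -> int:
    """Pick a default n_gpu_layers for llama.cpp.

    Hypothetical helper: on Apple Silicon (Darwin/arm64), offloading
    at least one layer makes llama.cpp initialize its Metal backend;
    elsewhere, GPU offload is governed by the separate n_gpu setting.
    """
    if system == "Darwin" and machine == "arm64":
        return 1  # a nonzero value enables the Metal backend
    return 0  # CPU-only unless the user requests GPU offload
```

In llama-cpp-python, the resulting value would typically be passed to the constructor, e.g. `Llama(model_path=..., n_gpu_layers=...)`.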
@aresnow1 thanks for the explanation, that makes sense! Still wondering about the monitored activity, but I will do some A/B testing and get back to you.
Metal support is not super well documented in llama.cpp: the version mentioned in the docs I quoted above is quite old, and back then Metal was only implemented for 4-bit quantised models. It is hard to find out what has happened in the meantime in terms of model support. If you have a pointer, that would be great as well.
@slobentanzer Oh, that reminds me, only 4-bit quantization can be accelerated with Metal, like Q4_K_M. What kind of quantization are you using?
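That constraint could be expressed with a simple naming check (a hedged sketch; this helper is my own, and newer llama.cpp releases accelerate more quantization types on Metal):

```python
def is_metal_accelerated(quantization: str) -> bool:
    """Return True for 4-bit quantizations such as Q4_0 or Q4_K_M.

    Hypothetical check reflecting the comment above: at the time of
    this thread, only 4-bit quantized models were Metal-accelerated.
    """
    return quantization.upper().startswith("Q4")
```

This relies on the GGUF convention that quantization names start with `Q` followed by the bit width.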
Any PR or feedback is welcome!
Related Issues (20)
- BUG: When I use yi-6b-vl, an error is raised, as follows HOT 5
- I fine-tuned qwen-vl with LoRA and selected qwen-vl-chat when deploying, but the page still does not offer image upload HOT 1
- Fail to install qwen-chat-14B HOT 2
- Failed to generate chat completion HOT 1
- BUG: Model not found in the model list, uid: {model_uid} HOT 3
- BUG: Log in with authentication enabled not working HOT 6
- BUG: healthcheck failed on Windows and errored with "Cluster is not available after multiple attempts" HOT 7
- FEAT: support CogVLM HOT 4
- QUESTION: how to change run models use gpu? HOT 6
- BUG probability tensor contains either `inf`, `nan` or element < 0
- Can the model's prompts and inputs/outputs be printed under --log-level=debug to help with fine-tuning? HOT 2
- Worker cannot connect to the master HOT 9
- Add audio models to the xinference UI HOT 3
- BUG: qwen-chat output unexpected tokens for ggufv2 format
- Some LLaMA2 Chat GGUF models have wrong URLs? HOT 1
- FEAT: Request for MiniCPM-V model support HOT 3
- BUG: Fail to run codellama-70b-instruct HOT 1
- BUG: When setting stream=True on the /v1/chat/completions endpoint, it returns "Internal Server Error" HOT 7
- BUG: When calling via the API, [DONE] does not seem to be returned after a completion finishes HOT 1
- Request to expose the n_gpu_layers parameter HOT 1