lmstudio-ai / lmstudio-bug-tracker
Bug tracking for the LM Studio desktop application
```json
{
  "cause": "(Exit code: 0). Some model operation failed. Try a different model and/or config.",
  "suggestion": "",
  "data": {
    "memory": {
      "ram_capacity": "63.71 GB",
      "ram_unused": "54.15 GB"
    },
    "gpu": {
      "gpu_names": [
        "NVIDIA GeForce RTX 4060 Laptop GPU"
      ],
      "vram_recommended_capacity": "8.00 GB",
      "vram_unused": "6.93 GB"
    },
    "os": {
      "platform": "win32",
      "version": "10.0.22631"
    },
    "app": {
      "version": "0.2.26",
      "downloadsDir": "C:\\Users\\genco\\.cache\\lm-studio\\models"
    },
    "model": {}
  },
  "title": "Error loading model."
}
```
Looks like you pushed an update today! Is there a way to turn off auto-updates?
[2024-05-07 10:32:53.913] [INFO] [LM STUDIO SERVER] Stopping server..
[2024-05-07 10:32:53.914] [INFO] [LM STUDIO SERVER] Server stopped
[2024-05-07 10:33:03.858] [INFO] [LM STUDIO SERVER] Verbose server logs are ENABLED
[2024-05-07 10:33:03.858] [INFO] [LM STUDIO SERVER] Heads up: you've enabled CORS. Make sure you understand the implications
[2024-05-07 10:33:03.913] [INFO] [LM STUDIO SERVER] Stopping server..
The server logs the CORS warning and then shuts down; the warning appears to be treated as an error when it shouldn't be.
npx lmstudio install-cli
I fixed this by installing the CLI tool, which I assume overwrote whatever the update script broke.
When trying to use the full context size for this model (https://huggingface.co/vsevolodl/Llama-3-70B-Instruct-Gradient-1048k-GGUF), I get what looks like an out-of-RAM error:
```json
{
  "title": "Failed to load model",
  "cause": "",
  "errorData": {
    "n_ctx": 1048576,
    "n_batch": 512,
    "n_gpu_layers": 81
  },
  "data": {
    "memory": {
      "ram_capacity": "314.65 GB",
      "ram_unused": "316.65 KB"
    },
    "gpu": {
      "type": "NvidiaCuda",
      "vram_recommended_capacity": "141.90 GB",
      "vram_unused": "130.46 GB"
    },
    "os": {
      "platform": "linux",
      "version": "5.15.0-106-generic",
      "supports_avx2": true
    },
    "app": {
      "version": "0.2.22",
      "downloadsDir": "/home/loading/.cache/lm-studio/models"
    },
    "model": {}
  }
}
```
So it claims RAM is essentially exhausted, but htop reports only about 10 GB of RAM in use, and LM Studio itself (at the top right) reports 48 GB in use (although I believe this figure may include VRAM).
I am trying to fully offload to the GPU.
I also noticed the loading process slows down progressively: it loads slower and slower until the error above pops up. I don't know whether this is expected; maybe the progress bar is a little optimistic and only near the end does it realize there is still a long way to go to load the rest of the model.
The model works with context sizes of up to 56k; anything larger ends with the above error.
I can use larger models than this with no issues (although they only have an 8k context size). I just tested https://huggingface.co/lmstudio-community/Meta-Llama-3-120B-Instruct-GGUF/ fully offloaded and it works like a charm (more or less; it could run faster, but it's doing OK).
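For what it's worth, the failure at huge contexts is consistent with KV-cache growth: the cache scales linearly with context length, and at ~1M tokens it can dwarf the model weights. A back-of-envelope sketch, using assumed Llama-3-70B architecture values (80 layers, 8 KV heads with GQA, head dim 128, fp16 cache; verify against the GGUF metadata):

```python
# Back-of-envelope KV-cache size at n_ctx = 1,048,576 tokens.
# Assumed Llama-3-70B architecture values (check the GGUF metadata):
# 80 layers, 8 KV heads (GQA), head dim 128, fp16 cache.
n_layers, n_kv_heads, head_dim = 80, 8, 128
n_ctx = 1_048_576
bytes_per_value = 2  # fp16
# K and V each store n_kv_heads * head_dim values per layer per token.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
total_gib = kv_bytes_per_token * n_ctx / 2**30
print(f"{total_gib:.0f} GiB")  # 320 GiB
```

Under those assumptions the KV cache alone would need roughly 320 GiB, which already exceeds the 314 GB of RAM the report shows, so an allocation failure near the end of loading would be expected.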
When I click Start Server, nothing happens. The server will not start and the logs are blank.
Think I found where LM Studio is failing to render a response in some front ends.
In HTTP/1.1, connections are persistent by default; a client can request that the connection be terminated after the response by sending a "Connection: close" header, which the server is required to honor.
Case in point, when using sillytavern with LMStudio on the back end, sometimes the connection will appear to hang because the connection isn't being closed on the server side after sending the text completion, when SillyTavern had specified "Connection: Close" as the header.
This might be the cause behind a number of other issues on this board too.
You can try this yourself by comparing against https://github.com/oobabooga/text-generation-webui.git and using Wireshark to capture the traffic. Set both to use the same API port and the same settings, then try each in turn from SillyTavern and watch for when ST hangs.
It'll be because LM Studio isn't respecting the connection header. This is a problem with how LM Studio handles HTTP traffic at the protocol level concerning when to terminate the TCP connection.
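The check above can be reduced to a small probe without Wireshark. This standard-library sketch (the default LM Studio port 1234 and the /v1/models path are assumptions) sends one request with "Connection: close" and reports whether the server signals that it will close the connection:

```python
# Quick probe for whether an HTTP server honors "Connection: close":
# make one request with that header and check whether the server
# signals/performs a close afterwards. Host/port below are whatever
# endpoint you want to test (e.g. LM Studio's default localhost:1234).
import http.client

def server_honors_connection_close(host: str, port: int, path: str = "/v1/models") -> bool:
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        resp = conn.getresponse()
        resp.read()
        # will_close is True when the response (its HTTP version or its
        # headers) indicates the server will terminate the connection.
        return resp.will_close
    finally:
        conn.close()
```

Running this against both back ends with identical settings should show the same difference a packet capture would: a server that keeps the connection alive despite "Connection: close" returns False here.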
Hello, I'm using LM Studio for my AI experiments. I wanted to report that v0.2.25 has a bug where you cannot load old chat sessions.
Thank you!
Hi,
I'm trying to use LM Studio 0.2.23 on RHEL8 (usually has the same requirements as Ubuntu 20.04).
Thank you for agreeing to downgrade your build chain to 20.04, by the way.
The AppImage now starts, but there is still a GLIBC error and a popup appears:
$ ./LM_Studio-0.2.23.AppImage
15:49:17.173 › App starting...
(node:2575489) UnhandledPromiseRejectionWarning: ReferenceError: Cannot access 'q' before initialization
at /tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/index.js:11:38584
(Use `lm-studio --trace-warnings ...` to show where the warning was created)
(node:2575489) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
15:49:17.617 › Downloads folder from settings.json: /export/home/raistlin/.cache/lm-studio/models
15:49:17.621 › Extensions backends directory already exists at /export/home/raistlin/.cache/lm-studio/extensions/backends
15:49:17.624 › Available backend descriptors:
{
"extension" : [],
"bundle" : [
{
"path": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/CUDA",
"manifest": {
"target_libraries": [
{
"name": "llm_engine_cuda.node",
"type": "llm_engine",
"version": "0.1.0"
},
{
"name": "liblmstudio_bindings_cuda.node",
"type": "liblmstudio",
"version": "0.2.23"
}
],
"type": "llama_cuda",
"platform": "linux",
"supported_model_formats": [
"gguf"
]
}
},
{
"path": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/NoGPU",
"manifest": {
"target_libraries": [
{
"name": "llm_engine.node",
"type": "llm_engine",
"version": "0.1.0"
},
{
"name": "liblmstudio_bindings.node",
"type": "liblmstudio",
"version": "0.2.23"
}
],
"type": "llama_cpu",
"platform": "linux",
"supported_model_formats": [
"gguf"
]
}
},
{
"path": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/OpenCL",
"manifest": {
"target_libraries": [
{
"name": "llm_engine_clblast.node",
"type": "llm_engine",
"version": "0.1.0"
},
{
"name": "liblmstudio_bindings_clblast.node",
"type": "liblmstudio",
"version": "0.2.23"
}
],
"type": "llama_opencl",
"platform": "linux",
"supported_model_formats": [
"gguf"
]
}
}
]
}
15:49:17.624 › Backend keys and libpaths for use:
{
"llama_cuda" : {
"libLmStudioPath": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/CUDA/liblmstudio_bindings_cuda.node",
"llmEngineLibPath": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/CUDA/llm_engine_cuda.node"
},
"llama_cpu" : {
"libLmStudioPath": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/NoGPU/liblmstudio_bindings.node",
"llmEngineLibPath": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/NoGPU/llm_engine.node"
},
"llama_opencl" : {
"libLmStudioPath": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/OpenCL/liblmstudio_bindings_clblast.node",
"llmEngineLibPath": "/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/OpenCL/llm_engine_clblast.node"
}
}
15:49:17.625 › Surveying backend-hardware compatibility...
15:49:17.625 › Loading LM Studio core from: '/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/CUDA/liblmstudio_bindings_cuda.node'
15:49:17.834 › Error message recieved from LMSCore process: Failed to load libLmStudio: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/CUDA/liblmstudio_bindings_cuda.node)
15:49:17.836 › Error while surveying hardware with backend 'llama_cuda': LMSCore load lib failed - child process with PID 2575757 exited with code 1
1th kill failed
15:49:17.838 › Loading LM Studio core from: '/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/NoGPU/liblmstudio_bindings.node'
15:49:17.989 › Error message recieved from LMSCore process: Failed to load libLmStudio: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/NoGPU/liblmstudio_bindings.node)
15:49:17.991 › Error while surveying hardware with backend 'llama_cpu': LMSCore load lib failed - child process with PID 2575921 exited with code 1
1th kill failed
15:49:17.991 › Loading LM Studio core from: '/tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/OpenCL/liblmstudio_bindings_clblast.node'
2th kill failed
2th kill failed
15:49:18.230 › Error message recieved from LMSCore process: Failed to load libLmStudio: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /tmp/.mount_LM_StuqJLxgm/resources/app/.webpack/main/build/Release/OpenCL/liblmstudio_bindings_clblast.node)
15:49:18.231 › Error while surveying hardware with backend 'llama_opencl': LMSCore load lib failed - child process with PID 2576036 exited with code 1
15:49:18.231 › Backend-hardware compatibility survey complete:
}
D [fallbackBackendPref] Initializing FileData
3th kill failed
3th kill failed
4th kill failed
4th kill failed
5th kill failed
5th kill failed
6th kill failed
6th kill failed
7th kill failed
7th kill failed
8th kill failed
8th kill failed
9th kill failed
9th kill failed
10th kill failed
10th kill failed
11th kill failed
Too many fails, giving up.
11th kill failed
Too many fails, giving up.
I'm having an issue on Ubuntu 24.04 where I cannot start the AppImage. The following error is thrown:
./LM_Studio-0.2.24.AppImage
[7882:0526/125358.338741:FATAL:setuid_sandbox_host.cc(158)] The SUID sandbox helper binary was found, but is not configured correctly. Rather than run without sandboxing I'm aborting now. You need to make sure that /tmp/.mount_LM_StuRBujGc/chrome-sandbox is owned by root and has mode 4755.
[1] 7882 trace trap (core dumped) ./LM_Studio-0.2.24.AppImage
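Two workarounds are commonly suggested for this Chromium SUID-sandbox error (on Ubuntu 24.04 the underlying cause is often AppArmor's restriction on unprivileged user namespaces). Neither is verified against this specific AppImage, and the extracted paths are illustrative:

```shell
# Option 1: follow the error message -- extract the AppImage so that
# chrome-sandbox sits at a stable path, then give it root ownership
# and the setuid mode 4755 it asks for. (Paths are illustrative.)
./LM_Studio-0.2.24.AppImage --appimage-extract
sudo chown root:root squashfs-root/chrome-sandbox
sudo chmod 4755 squashfs-root/chrome-sandbox
./squashfs-root/lm-studio

# Option 2: skip Chromium's sandbox entirely (weakens process isolation).
./LM_Studio-0.2.24.AppImage --no-sandbox
```

`--appimage-extract` is a standard AppImage runtime flag and `--no-sandbox` a standard Chromium switch; whether the extracted binary is named `lm-studio` is an assumption.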
Hello LMStudio-Team,
much appreciate your efforts, building a great tool for working with LLMs locally. I'm a fan and love to use it almost every productive day.
I'm writing to you as the maintainer of the lmstudio package in nixpkgs, one of the largest and most up-to-date package repositories. We recently had an issue because you changed the binary available under the download link (https://releases.lmstudio.ai/mac/arm64/0.2.22/b/latest/LM-Studio-0.2.22-arm64.dmg at the time), which broke the lmstudio package for Nix(OS).
A stable download link is preferable because it gives us the assurance that the package we download is exactly the package we expect. This helps avoid supply chain attacks and improves general reliability, since it keeps every operation reproducible.
If there were a critical (especially security) issue, everyone could understand that you would want to stop distribution altogether ASAP; however, as far as I'm informed, that's not the case here.
I fear that by breaking existing binary download links, we might lose acceptance for the package in Nixpkgs altogether.
I would be glad to keep the package available to NixOS users using the official Nixpkgs repository.
Is it possible to establish some pattern of stable download links, serving the same binary for as long of a period as possible?
Thanks a lot for your time.
Best Regards
Dean
LM Studio doesn't support LaTeX rendering in Markdown, while text-gen-webui has this feature. Will LM Studio support LaTeX rendering in the future?
Hello,
Firstly, I'd like to express my gratitude for the exceptional LLM platform, LM Studio. It has significantly enriched my experience with AI-generated content.
I kindly request two enhancements to further optimize the user interface:
Additionally, I propose incorporating distinct text colors for User Input and AI-generated Text as a visual code, enhancing readability and clarity within the interface.
Thank you for considering these suggestions. Your continuous improvement efforts make LM Studio an increasingly valuable tool in my workflow.
Best regards,
Hello,
Is it possible to implement LM Studio support for this translation model from Google?
Google has already made GGUFs (q4 and q6):
google/madlad400-10b-mt
https://huggingface.co/google/madlad400-10b-mt/tree/main
Very cool application. It would be great if there were a version for Intel Macs. I know llama.cpp has problems with Metal on Intel Macs, but what about a CPU or Vulkan version?
Description:
The current context menu in LM Studio has only one option, "Copy Entire Message", which is limiting and counterintuitive. A context menu with only one option is essentially a longer way to achieve what the terminal's copy-on-select (copy-on-highlight) feature already provides.
To make matters worse, if you want to copy a specific portion of the text, and you forgetfully use right-click to try and do so, you have to:
1. Click off the message to deselect the text
2. Reselect the desired text
3. Remember to use the keyboard shortcut (e.g. Ctrl+C) to copy the selection, since the context menu only offers "Copy Entire Message"
This behavior is inconsistent with most applications, which typically offer a "Copy" or "Copy Selection" option in the context menu. This inconsistency causes wasted time and frustration.
Severity: Minor (but affects usability)
Reproducibility: Always (when right-clicking on selected text)
Environment: LMstudio GUI application (version [insert version number])
Expected Behavior:
A context menu with multiple options, including "Copy Selection" or "Copy"
Ability to copy selected text using the context menu or keyboard shortcuts
Actual Behavior:
A context menu with only one option, "Copy Entire Message"
No ability to copy selected text using the context menu
Request:
Please update the context menu to include a "Copy Selection" or "Copy" option, allowing users to easily copy selected text without having to use keyboard shortcuts or work around the current limitation. This will improve the user experience and make the application more intuitive to use.
After using the Local Server for multiple requests without restarting it or creating a new one, every LLM used starts to generate text in a fixed way. Essentially, they stop generating a real response and start generating a fixed number endlessly. Here is an example of the output:
This process never stops, even after 10 minutes. The LLM was loaded with 15 layers on the GPU and all the rest on the CPU, using LangChain to interact with the Local Server.
These are the LLMs used for the Local Server:
For the scenario: I had some documents in a folder on my laptop that I wanted to summarize, and for each of them I asked the Local Server for a concise summary through LangChain. After the 14th or 15th request, the LLMs started to print something like the example above.
This occurs independently of the LLM used.
Windows Server 2022 Datacenter
64 GB RAM
16 cores
version 0.2.23
It ran successfully for me on three machines, but this machine was not happy after the upgrade and the llama3 download.
The model just does not seem to respond. I extracted a log file, attached. I'm not sure how to get back to a working state except to
reinstall Windows, which is obviously undesirable.
main.log
I just upgraded LM Studio to 0.2.27 and the latest Gemma 2 quants and it's complete trash now. For example, I used to get a good answer to this test question: "how much time in minutes is needed to heat a room 3 x 5 x 7 m from 0 C to 30 C with 2kw heater?". Not anymore. Right now it even refuses to answer - it says "I can't give you the answer to that question, here's why:"
What happened? How is it improved when it's objectively way worse, and I didn't touch any inference parameters?
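For reference, the test question has a simple ideal-case answer (heating only the air, ignoring walls, furniture, and heat loss; the density and heat-capacity values below are standard approximations), which a model could reasonably be expected to approximate:

```python
# Ideal-case estimate: energy needed to warm only the air in a
# 3 x 5 x 7 m room by 30 K with a 2 kW heater, ignoring heat loss
# (so this is a lower bound on real-world time).
volume_m3 = 3 * 5 * 7            # room volume
rho_air = 1.275                  # kg/m^3, air density near 0 °C
cp_air = 1005.0                  # J/(kg*K), specific heat of air
delta_t = 30.0                   # K
power_w = 2000.0                 # 2 kW heater

energy_j = volume_m3 * rho_air * cp_air * delta_t
minutes = energy_j / power_w / 60
print(f"{minutes:.1f} minutes")  # ~33.6 minutes under these assumptions
```

So "roughly half an hour, ignoring losses" is the kind of answer the earlier build produced.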
Plans for function calling support?
Right now, multi-model sessions only support full GPU offload. Are there any plans to support CPU offload so that we can run multiple models split across CPU and GPU memory?
GPU Offload is inactive in version 0.2.25, but I can select it in 0.2.24
vicuna-13b-v1.5.Q5_K_M
AMD RX 6600 XT
Given how many useful settings the app has, it boggles the mind that there is no option to change at least the font size. With the app open on a big monitor, it is barely readable from a healthy distance. I really don't feel like sitting glued to the screen just to be able to read code blocks.
As the poor grandma below demonstrates, the goal is to use the app comfortably without having to consult your nearest optician after a few days of use. Now, be a good lad and save grandma from turning into Ray Charles.
Hi , are you planning on adding exllamav2 in the near future?
I am encountering an issue where lmstudio fails to run via X11 forwarding from a Linux server to a Windows client. The application works correctly when both the server and the client are Linux machines, but it does not work with a Windows client using Xming and PuTTY for X11 forwarding. Simpler applications like gedit and nautilus run without issues.
Steps to Reproduce:
Setup:
Linux Server: Ubuntu 20.04
Windows Client: Windows 11
Xming: Version 6.9.0.31
PuTTY: Version 0.76
Xming Configuration:
Started Xming with the -ac option to disable access control.
PuTTY Configuration:
Enabled X11 forwarding with X display location set to localhost:0.
Linux Server Configuration:
Ensured the following lines are present and uncommented in /etc/ssh/sshd_config:
X11Forwarding yes
X11DisplayOffset 10
X11UseLocalhost yes
Restarted the SSH service.
Running lmstudio:
Connected to the Linux server via PuTTY.
Exported the DISPLAY variable:
export DISPLAY=localhost:10.0
Attempted to run lmstudio with indirect rendering:
LIBGL_ALWAYS_INDIRECT=1 ./lmstudio
Observed Behavior:
lmstudio fails to start, with errors indicating issues related to OpenGL and EGL initialization.
Relevant error messages include:
./lmstudio
14:55:02.357 › GPU info: '04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)'
14:55:02.377 › Got GPU Type: unknown
14:55:02.378 › LM Studio: gpu type = Unknown
14:55:02.401 › App starting...
[525067:0607/145507.633420:ERROR:(-1)] Check failed: false.
[525067:0607/145507.633459:ERROR:(-1)] Check failed: false.
14:55:07.760 › Downloads folder from settings.json: /home/jugs/.cache/lm-studio/models
[525124:0607/145520.105959:ERROR:angle_platform_impl.cc(43)] Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
ERR: Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
[525124:0607/145520.110539:ERROR:gl_display.cc(504)] EGL Driver message (Critical) eglInitialize: Unsupported GLX version (requires at least 1.3).
[525124:0607/145520.110744:ERROR:gl_display.cc(793)] eglInitialize OpenGL failed with error EGL_NOT_INITIALIZED, trying next display type
[525124:0607/145520.125369:ERROR:angle_platform_impl.cc(43)] Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
ERR: Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
[525124:0607/145520.125660:ERROR:gl_display.cc(504)] EGL Driver message (Critical) eglInitialize: Unsupported GLX version (requires at least 1.3).
[525124:0607/145520.125755:ERROR:gl_display.cc(793)] eglInitialize OpenGLES failed with error EGL_NOT_INITIALIZED
[525124:0607/145520.125843:ERROR:gl_display.cc(819)] Initialization of all EGL display types failed.
[525124:0607/145520.125948:ERROR:gl_ozone_egl.cc(26)] GLDisplayEGL::Initialize failed.
[525124:0607/145522.677446:ERROR:angle_platform_impl.cc(43)] Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
ERR: Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
[525124:0607/145522.677758:ERROR:gl_display.cc(504)] EGL Driver message (Critical) eglInitialize: Unsupported GLX version (requires at least 1.3).
[525124:0607/145522.677821:ERROR:gl_display.cc(793)] eglInitialize OpenGL failed with error EGL_NOT_INITIALIZED, trying next display type
[525124:0607/145522.692794:ERROR:angle_platform_impl.cc(43)] Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
ERR: Display.cpp:1019 (initialize): ANGLE Display::initialize error 12289: Unsupported GLX version (requires at least 1.3).
[525124:0607/145522.693003:ERROR:gl_display.cc(504)] EGL Driver message (Critical) eglInitialize: Unsupported GLX version (requires at least 1.3).
[525124:0607/145522.693077:ERROR:gl_display.cc(793)] eglInitialize OpenGLES failed with error EGL_NOT_INITIALIZED
[525124:0607/145522.693144:ERROR:gl_display.cc(819)] Initialization of all EGL display types failed.
[525124:0607/145522.693210:ERROR:gl_ozone_egl.cc(26)] GLDisplayEGL::Initialize failed.
[525124:0607/145522.709838:ERROR:(-1)] Check failed: false.
[525124:0607/145522.710101:ERROR:(-1)] Check failed: false.
[525124:0607/145522.710440:ERROR:viz_main_impl.cc(186)] Exiting GPU process due to errors during initialization
Expected Behavior:
lmstudio should start and function correctly via X11 forwarding on a Windows client, similar to its behavior on a Linux client.
Additional Information:
The application works perfectly when the client is a Linux machine.
Using gedit and other simpler applications via X11 forwarding on the Windows client works without any issues.
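Since every failure above is a GLX/EGL initialization error, a mitigation commonly suggested for Chromium/Electron apps over X11 forwarding is to force software rendering. The variable and flag below are generic Mesa/Chromium ones, not verified against LM Studio:

```shell
# Ask Mesa to render in software and tell the Chromium runtime to
# skip GPU compositing, so no GLX >= 1.3 support is needed on the
# forwarded display.
export LIBGL_ALWAYS_SOFTWARE=1
./lmstudio --disable-gpu
```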
How do I set up a proxy server to download models?
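If LM Studio's downloader honors the standard proxy environment variables that most HTTP stacks respect (an unverified assumption), setting them before launch would be the first thing to try; the proxy address below is a placeholder:

```shell
# Route HTTP(S) traffic through a proxy, if the app honors these
# conventional variables. Replace host/port with your own proxy.
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="http://proxy.example.com:8080"
./lm-studio
```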
On a good day, the built-in updater takes about 20 minutes to download the updated LM-Studio. Today, I quit after an hour and a half and downloaded it with a browser from lmstudio.ai in 45 seconds.
When I started the update, I was also downloading a large file and that was consuming most of my bandwidth. I am guessing that the updater is trying to be smart and capping itself at whatever bandwidth it calculates it has at the time you press the "download & install" button instead of just getting the file and letting the network do what it does best.
Before giving up, I took a look in Task Manager and the 5 LM-Studio threads were consuming 0% network at a time when it should be consuming ~100% since everything else I was doing had completed.
When I opened the app for the first time, there was a conversation telling me to increase the font size 'under the View menu'.
I tried to find the View menu, but I can't get a model installed because I can't read your screen at all. And you have a blue text button on a black background, which should be light-colored text if you care about accessibility.
I can't install a model if I can't read the screen. It looks like this app was written by someone in their 20s.
I'm seeing a weird behavior with vision models.
I am using the Default LM Studio Windows config, which is the only one I have been able to get vision models to work with.
I have tried 2 different models: xtuner's llava llama 3 f16 and jartine's llava v 1.5
With both models, and both in the chat interface and the local API deployment (using the vision example), when I ask for an image description I get a perfect description on the first request, and then a random response after that (usually mentioning collages). I'm not sure what's causing this, but it's fairly consistent.
Might be related to this issue: lmstudio-ai/.github#26
I am successfully running a ROCm stable diffusion setup using PyTorch-ROCm on a 6900 XT. I have the following ROCm and HIP libraries installed with no system error messages:
However, LM Studio will only report that I have an OpenCL GPU installed. It recognizes the 16GB of VRAM, but still throws an error when I try to load a model. I can run phi3 easily and with speed on just the CPU. But I'd like to run a larger model on the GPU.
I'm working on a project, and I need to track the seed used in each generation so that I can reproduce the output when needed using the same config (and the same seed). However, I find that this is not always the case.
I tried using the seed 42, and it gave me the exact same result each time with the same config.
When I tried a larger number, 1715852364 (which I usually get from the epoch time), I found that it gives different results.
Here is the code I used to reproduce this bug (which is part of my project):
`lm_studio.py`:

```python
import requests
try:
    from base import BaseModel
except:
    from models.ai.base import BaseModel
from typing import Any, Dict
from openai import OpenAI
import json


class LMStudioModel(BaseModel):
    def __init__(self, api_url: str, headers: Dict[str, str], config: Dict[str, Any]) -> None:
        super().__init__(api_url, headers, config)
        self.client = OpenAI(base_url=api_url, api_key="not-needed")

    def __str__(self) -> str:
        return "LMStudioModel"

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(api_url={self.api_url}, headers={self.headers}, config={self.config})"

    @staticmethod
    def load_config(config_path: str) -> Dict[str, Any]:
        return super(LMStudioModel, LMStudioModel).load_config(config_path)

    def generate_text(self, prompt: str, parameters: Dict[str, Any]) -> Any:
        # Adjust parameters based on the method signature and expected parameters
        data = {
            "messages": [
                {"role": "system", "content": parameters.get("instructions", "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful.")},
                {"role": "user", "content": prompt}
            ],
            "temperature": parameters.get("temperature", 0.7),
            "max_tokens": parameters.get("max_tokens", -1),
            "stream": parameters.get("stream", False)
        }
        response = requests.post(self.api_url + "/chat/completions", headers=self.headers, json=data)
        try:
            response.raise_for_status()
            return response.json()
        except Exception as e:
            return {"error": str(e)}

    def predict(self, prompt: str, params: Dict[str, Any] = None) -> Any:
        if params is None:
            params = self.config.get('default_parameters', {})
        response = self.generate_text(prompt, params)
        return response

    def inference(self, prompt, seed=None) -> str:
        chat_completion = self.client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": prompt.strip(),
                }
            ],
            model="not-needed",  # unused
            seed=seed
        )
        return chat_completion.choices[0].message.content

    def sys_inference(self, sys_prompt: str, usr_prompt: str, seed=None) -> str:
        print("Using seed %s with type %s" % (seed, type(seed)))
        chat_completion = self.client.chat.completions.create(
            messages=[
                {"role": "system", "content": sys_prompt},
                {
                    "role": "user",
                    "content": usr_prompt.strip(),
                }
            ],
            model="not-needed",  # unused
            temperature=0.7,
            seed=seed
        )
        return chat_completion.choices[0].message.content

    def interactive_prompt(self):
        print("You are now chatting with the intelligent assistant. Type something to start the conversation.")
        history = [
            {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
            {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
        ]
        while True:
            messages = history[-2:]  # Consider only the last system message and the last user message for brevity
            completion = self.client.chat.completions.create(
                model="local-model",  # this field is currently unused
                messages=messages,
                temperature=0.7,
                max_tokens=150,
                stream=True
            )
            new_message = {"role": "assistant", "content": ""}
            for chunk in completion:
                if chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="", flush=True)
                    new_message["content"] += chunk.choices[0].delta.content
            history.append(new_message)
            print()
            # Capture user input
            user_input = input("> ")
            if user_input.lower() == 'quit':
                print("Exiting interactive prompt...")
                break
            history.append({"role": "user", "content": user_input})

    def update_token(self, new_token: str) -> None:
        self.headers['Authorization'] = f"Bearer {new_token}"

    def calc_tokens(self, prompt: str) -> int:
        # Simplified token calculation; you might want to adjust this according to your actual tokenization logic
        return len(prompt.split())

    @classmethod
    def setup_from_config(cls, config_path: str):
        config = cls.load_config(config_path)
        api_url = config.get("api_url", "http://localhost:1234/v1")  # Default to example URL
        headers = {"Content-Type": "application/json"}  # Default header for JSON content
        headers.update(config.get("headers", {}))  # Update with any additional headers from config
        return cls(api_url=api_url, headers=headers, config=config)

    @classmethod
    def setup_from_dict(cls, config_json: Dict[str, Any] | str):
        if isinstance(config_json, dict):
            api_url = config_json.get("api_url", "http://localhost:1234/v1")  # Default to example URL
            headers = {"Content-Type": "application/json"}  # Default header for JSON content
            headers.update(config_json.get("headers", {}))  # Update with any additional headers from config
            return cls(api_url=api_url, headers=headers, config=config_json)
        elif isinstance(config_json, str):  # if it's a string, convert it to a dict
            config: dict = json.loads(config_json)
            return cls.setup_from_dict(config)


# Example usage
if __name__ == '__main__':
    config_path = "configs/lm_studio.config.json"
    lm_studio = LMStudioModel.setup_from_config(config_path)
    # print(lm_studio.sys_inference(sys_prompt="You are a helpful assistant", usr_prompt="Hello there", seed=42))
    print(lm_studio.sys_inference(sys_prompt="You are a helpful assistant", usr_prompt="Hello there", seed=1715852364))
    # lm_studio.interactive_prompt()
```
base.py
:
from abc import ABC, abstractmethod
from typing import Any, Dict
import json
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseModel(ABC):
    """
    Abstract base class for models to interact with APIs and perform data processing.
    """

    def __init__(self, api_url: str, headers: Dict[str, str], config: Dict[str, Any]) -> None:
        self.api_url = api_url
        self.headers = headers
        self.config = config

    @staticmethod
    @abstractmethod
    def load_config(config_path: str) -> Dict[str, Any]:
        """
        Loads configuration from a specified path.
        """
        with open(config_path, 'r') as file:
            return json.load(file)

    @abstractmethod
    def generate_text(self, prompt: str, parameters: Dict[str, Any]) -> Any:
        """
        Generates text based on a prompt and parameters.
        This method needs to be implemented by the subclass.
        """
        pass

    @abstractmethod
    def predict(self, prompt: str, params: Dict[str, Any]) -> Any:
        """
        Processes a prompt and returns a prediction.
        This method needs to be implemented by the subclass.
        """
        pass

    @abstractmethod
    def inference(self) -> str:
        """
        Performs inference using the model.
        This method needs to be implemented by the subclass.
        """
        pass

    @abstractmethod
    def sys_inference(self, sys_prompt: str, user_prompt: str, seed: int | None = None) -> str:
        """
        Performs inference using the model with a system prompt.
        This method needs to be implemented by the subclass.
        """
        pass

    @abstractmethod
    def update_token(self, new_token: str) -> None:
        """
        Updates the API token used for authentication.
        This method needs to be implemented by the subclass.
        """
        pass

    @abstractmethod
    def calc_tokens(self, prompt: str) -> int:
        """
        Calculates the number of tokens in a prompt.
        This method needs to be implemented by the subclass.
        """
        pass

    def interactive_prompt(self) -> None:
        """
        Optional: Implement an interactive prompt for testing purposes.
        This method can be overridden by subclasses for specific interactive functionality.
        """
        print("This method can be overridden by subclasses.")

    @classmethod
    def setup_from_config(cls, config_path: str):
        """
        Sets up the model based on the specified configuration.
        This method must be implemented by subclasses.
        """
        pass

    @classmethod
    def setup_from_dict(cls, config_json: Dict[str, Any] | str):
        """
        Sets up the model based on the specified configuration.
        This method must be implemented by subclasses.
        """
        pass
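As a sketch of how a subclass might satisfy this interface, here is a toy implementation. The `EchoModel` class and its trivial method bodies are illustrative only, not part of the project; the base class is trimmed to two abstract methods so the example is self-contained.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseModel(ABC):  # trimmed to two methods for this sketch
    def __init__(self, api_url: str, headers: Dict[str, str], config: Dict[str, Any]) -> None:
        self.api_url = api_url
        self.headers = headers
        self.config = config

    @abstractmethod
    def generate_text(self, prompt: str, parameters: Dict[str, Any]) -> Any: ...

    @abstractmethod
    def calc_tokens(self, prompt: str) -> int: ...


class EchoModel(BaseModel):
    """Hypothetical subclass: echoes the prompt and counts whitespace tokens."""

    def generate_text(self, prompt: str, parameters: Dict[str, Any]) -> str:
        return f"echo: {prompt}"

    def calc_tokens(self, prompt: str) -> int:
        # Crude approximation; a real subclass would use the model's tokenizer.
        return len(prompt.split())


model = EchoModel("http://localhost:1234/v1", {}, {"temperature": 0.7})
print(model.generate_text("hello world", {}))  # echo: hello world
print(model.calc_tokens("hello world"))        # 2
```

Because the remaining methods are abstract, instantiating a subclass that omits any of them raises a `TypeError` at construction time, which is the point of using `ABC` here.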
configs/lm_studio.config.json:
{
"api_url": "http://localhost:1234/v1",
"instructions": "You are a helpful AI Assistant.",
"default_parameters":{
"temperature": 0.7,
"max_tokens": -1,
"stream": false
}
}
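A minimal sketch of how the `default_parameters` block above could be combined with per-request overrides after loading the config file. The helper name `merged_params` is hypothetical; the temp file only exists to make the sketch runnable without touching the real `configs/` directory.

```python
import json
import tempfile

# Same shape as configs/lm_studio.config.json above.
CONFIG = {
    "api_url": "http://localhost:1234/v1",
    "instructions": "You are a helpful AI Assistant.",
    "default_parameters": {"temperature": 0.7, "max_tokens": -1, "stream": False},
}


def merged_params(config_path: str, overrides: dict) -> dict:
    """Load the config file and overlay per-request parameter overrides."""
    with open(config_path, "r") as fh:
        config = json.load(fh)
    params = dict(config.get("default_parameters", {}))
    params.update(overrides)
    return params


# Write the sample config to a temp file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(CONFIG, fh)
    config_path = fh.name

print(merged_params(config_path, {"temperature": 0.2}))
# {'temperature': 0.2, 'max_tokens': -1, 'stream': False}
```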
I only know that LM Studio uses llama.cpp, but I'm not sure whether this has to do with the size of the seed. If so, what is the maximum integer for which the same seed will always give the same results?
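One relevant detail, assuming LM Studio passes the seed straight through to llama.cpp: llama.cpp's C API stores the sampling seed as a 32-bit unsigned integer, so values from 0 to 2^32 - 1 are representable and larger Python integers would wrap. A small sketch of reducing an arbitrary integer into that range (the function name is mine, not an LM Studio API):

```python
UINT32_MAX = 2**32 - 1  # upper bound if the seed is a 32-bit unsigned int


def clamp_seed(seed: int) -> int:
    """Reduce an arbitrary non-negative integer to the range a uint32 seed can hold."""
    return seed % (UINT32_MAX + 1)


print(clamp_seed(42))          # 42
print(clamp_seed(2**35 + 7))   # 7 -- a 35-bit value wraps around
```

If reproducibility matters, the safe move is simply to keep seeds within 0..2^32 - 1, so no wrapping can occur in any layer.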
this model has a vision adapter: mmproj-model-f16.gguf
I have never used a vision model in LM Studio, so I don't know if this is a bug or specific to this particular model.
Because this model has strong OCR capabilities I wanted to test it, but LM Studio is unable to load any of the GGUF versions
while "mmproj-model-f16.gguf" is present. If I delete that file, the model loads.
Again: I don't know whether this is an LM Studio bug or not, but it would be nice to be able to test this model ;)
Windows, 0.2.26
I discovered this bug by accident, and it was a real headache, because I could not understand why LM Studio suddenly closed and why I could not open it again: the window would flicker and vanish.
I host all AI models on a separate partition, and while LM Studio was downloading a new model it filled the partition, which caused those symptoms.
I think there should be some kind of error-handling procedure so that the program itself keeps working.
Thank you!
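One way an application could guard against this failure mode is to check free space on the target partition before starting a download, with a safety margin so the disk is never filled completely. This is only a sketch of the idea using the standard library, not LM Studio's actual code; the function name and margin are mine.

```python
import shutil


def enough_space(path: str, needed_bytes: int, margin: float = 1.1) -> bool:
    """Return True if the partition holding `path` can fit `needed_bytes`,
    with a 10% safety margin so the disk is never filled completely."""
    free = shutil.disk_usage(path).free
    return free >= needed_bytes * margin


# Example: check for roughly 4.4 GB of headroom before a ~4 GB model download.
print(enough_space(".", 4 * 1024**3))
```

Failing the check early, with a clear error message, avoids the half-written model file and the flicker-and-vanish startup crash described above.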
The local inference server works well with the documented example:
curl http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
"messages": [
{ "role": "system", "content": "Always answer in rhymes." },
{ "role": "user", "content": "Introduce yourself." }
],
"temperature": 0.7,
"max_tokens": -1,
"stream": true
}'
However, when a message's content is empty, the request fails.
I just set the system content to an empty string, and the server returns an error.
[2024-05-04 01:10:59.564] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-05-04 01:10:59.564] [INFO] Received POST request to /v1/chat/completions with body: {
"model": "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
"messages": [
{
"role": "system",
"content": ""
},
{
"role": "user",
"content": "Introduce yourself."
}
],
"temperature": 0.7,
"max_tokens": -1,
"stream": true
}
[2024-05-04 01:10:59.565] [ERROR] [Server Error] {"title":"'messages' array must only contain objects with a 'content' field that is not empty"}
I just hope that when the system message's content is empty, the request can still work.
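Until the server accepts empty content, one client-side workaround is to strip empty messages from the payload before sending it. This is a sketch of that idea, not LM Studio behavior; the helper name is mine, and the payload mirrors the request from the log above.

```python
def drop_empty_messages(messages: list[dict]) -> list[dict]:
    """Remove messages whose 'content' is empty or whitespace-only,
    since the server rejects arrays containing empty content fields."""
    return [m for m in messages if m.get("content", "").strip()]


payload = {
    "model": "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
    "messages": drop_empty_messages([
        {"role": "system", "content": ""},
        {"role": "user", "content": "Introduce yourself."},
    ]),
    "temperature": 0.7,
    "max_tokens": -1,
    "stream": True,
}
print(payload["messages"])
# [{'role': 'user', 'content': 'Introduce yourself.'}]
```

With the empty system message filtered out, the same POST to /v1/chat/completions should no longer trigger the "'messages' array must only contain objects with a 'content' field that is not empty" error.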
I encountered an issue where the chat window became unresponsive after pasting a very large JSON file (approximately 60,000 tokens or more). Initially, the application was functioning correctly. However, after introducing the large JSON data, the chat window froze and became unusable.
I was able to restore functionality by manually clearing all files in the .cache
folder of the application.
Attempts to Resolve:
None of the above methods cleared the cached data or resolved the issue until I manually deleted the cache files.
Environment:
Can't open the chat page. The error says Cannot read properties of undefined (reading 'messages')
HIP_UMA support was added in llama.cpp PR:
ggerganov/llama.cpp#7414
ROCm support could be implemented following this example:
https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU
Together these could speed up working with LLMs on ROCm and the AMD 780M APU GPU.
0.2.22
Unexpected RAM consumption during/after model load when using full offload to GPU
I just upgraded LM Studio to 0.2.22 and got it running on a Tesla M40 with 12GB of VRAM. The previously downloaded LLAMA3-8B-Q4_K_M (~4GB) should load entirely into VRAM without any issue. With GPU offload set to max, VRAM usage is normal (~5GB) and token generation is much faster, so the model appears to be running properly on the GPU. I'm curious why it still consumes a huge amount of RAM (over 4GB physical RAM and 10-15GB of swap/virtual memory). It got worse and crashed when I enabled Flash Attention: it simply ran out of RAM (8GB physical and 17GB swap) even though there was still plenty of VRAM free on the GPU side.
Logs & Screenshots Attached
Normal RAM & VRAM Usage, (RAM: 2-3GB->Same When IDLE VRAM: Depends On Model Size)
Log-1 Diagnostic Info
{
"cause": "(Exit code: 0). Some model operation failed. Try a different model and/or config.",
"suggestion": "",
"data": {
"memory": {
"ram_capacity": "7.88 GB",
"ram_unused": "832.32 MB"
},
"gpu": {
"type": "NvidiaCuda",
"vram_recommended_capacity": "12.00 GB",
"vram_unused": "11.10 GB"
},
"os": {
"platform": "win32",
"version": "10.0.22631",
"supports_avx2": true
},
"app": {
"version": "0.2.22",
"downloadsDir": "E:\\LM_STUDIO_M_ARC"
},
"model": {}
},
"title": "Error loading model."
}
Screenshot Related INFO/ChatLog
-As you can see, I attached 4 screenshots: two taken before loading the model, and two after offloading the entire model to the GPU.
-P.S. FYI, the preset was the same as the built-in Llama 3 preset, except that I changed the CPU threads value from 4 to 128 because I found it faster a few weeks ago when I had to run on CPU only. The result is the same when switching back to the default value/preset, in case you were wondering about that : )
And this is what happens when reloading the model after enabling flash attention
May LM-Studio continue to Flourish and Prosper.
Best Regards
PD-Kerman
Copy-pasted from the "Failed to load model" error modal:
{
"title": "Failed to load model",
"cause": "llama.cpp error: 'check_tensor_dims: tensor 'output.weight' not found'",
"errorData": {
"n_ctx": 4096,
"n_batch": 512,
"n_gpu_layers": 89
},
"data": {
"memory": {
"ram_capacity": "62.71 GB",
"ram_unused": "78.71 KB"
},
"gpu": {
"type": "Nvidia CUDA",
"vram_recommended_capacity": "23.68 GB",
"vram_unused": "21.81 GB"
},
"os": {
"platform": "linux",
"version": "6.5.0-35-generic",
"supports_avx2": true
},
"app": {
"version": "0.2.23",
"downloadsDir": "path/to/cache/lmstudio_cache"
},
"model": {}
}
}
The app worked at first, but then this bug started. I tried clearing the cache, removing, uninstalling, and re-installing. Same bug. I can't download any model.
Originally posted at lmstudio-ai/model-catalog#83
Also posted at lmstudio-ai/.github#49
Hi,
I'm experiencing issues with LM Studio on Ubuntu 22.04.
When I try to download a model listed on the landing screen, I get the error: "Download failed: unexpected status code 429".
Additionally, when I try to search for models, I receive the error: "Error searching for models: HTTP error! Status 429".
Thanks!
LM Studio 0.2.27
GPU acceleration: On, with CUDA.
From: bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
To: qwp4w3hyb/DeepSeek-Coder-V2-Lite-Instruct-iMat-GGUF.gguf
Example output.
38"4EC$!31=.4<H0+':#"4::H/2H$
If you then switch GPU Acceleration off, it works fine. Switching GPU Accel back on is then fine too.
I suppose whatever made Gemma 2 faster on GPU in 0.2.27 is still switched on in the GPU kernel after switching away from Gemma 2.
I'm not able to use Google's new model "google/codegemma-7b-it-GGUF".
It seems the reason is that I have to accept the terms and conditions on Hugging Face.
Is there any way to log in to Hugging Face from LM Studio, or to use a Hugging Face token?
0.2.22
The CPU usage display shows suspiciously low usage in the user interface while output is generating.
The CPU is almost idling while the model is (pre)generating outputs without any GPU offload, which is also confirmed in the details section of Task Manager.
Full CPU Usage