Comments (9)
Super nice to hear that you are using the repo. It sounds like you are working on a cool project. Please do share more about it if and when possible!
from fastllama.
Hi,
You can try increasing the "n_ctx" param in
fastLlama.Model(
    id="ALPACA-LORA-30B",
    path=str(MODEL_PATH.resolve()),  # path to model
    num_threads=16,                  # number of threads to use
    n_ctx=512,                       # context size of the model
    last_n_size=64,                  # size of last n tokens, used for repetition penalty (optional)
    seed=0                           # seed for the random number generator (optional)
)
Although I suspect that the quality might be a bit poor for long context lengths compared to the latest llama.cpp repo. I am in the middle of updating the ggml library!
I tried a larger context size, up to 2048. The problem is still the same, even with 10 GB of RAM and swap memory available.
However, loading the model afresh in each iteration of the loop works around the problem, but it's super slow, as you might expect.
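The reload-per-iteration workaround described above can be sketched like this. Note that the `ingest`/`generate` calls and their parameters are assumptions based on the repo's example snippets, not a verified API:

```python
def run_prompts(prompts, model_path):
    """Workaround from this thread: rebuild the model for every prompt.

    Each iteration starts from a fresh context, which avoids the
    segmentation fault, but the weights are reloaded from disk every
    time, so it is very slow.
    """
    import fastLlama  # deferred so the sketch can be imported without the lib

    outputs = []
    for prompt in prompts:
        # Re-creating the Model object reloads the weights each time.
        model = fastLlama.Model(
            id="ALPACA-LORA-30B",
            path=model_path,
            num_threads=16,
            n_ctx=512,
        )
        pieces = []
        model.ingest(prompt)      # assumed API
        model.generate(           # assumed API and parameter names
            num_tokens=128,
            streaming_fn=pieces.append,
        )
        outputs.append("".join(pieces))
    return outputs
```

This trades startup cost for correctness: nothing from a previous prompt can leak into (or corrupt) the next inference.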
I believe problems like this are solved, or will likely be solved, in the original llama.cpp repo thanks to the huge number of contributors there.
Do you plan to keep updating this repo alongside the original llama.cpp library? I want to help in any way I can. Although, regretfully, C/C++ is not something I can help with :(. Only Python for the time being.
Yes, we will continue to update it if it makes sense. However, we want to make sure we don't just copy files from one repo to another, or we will just be playing catch-up with them, which isn't much fun!
Regarding your issue, I will look into it and fix it soon!
Appreciate this! I find this a really notorious problem, because there shouldn't be much difference between giving instructions interactively vs. providing instructions through a for loop.
Interesting, @amitsingh19975 has fixed this in the feature/refactor branch. We are going to merge it to main soon!
Closing this for now. Please feel free to reopen if needed.
No, it's not solved. I tried with that branch. Loading the model once and running inference in a loop still causes a segmentation fault.
Could you share the model parameters and which model you are using? The main branch uses mmap to reduce memory consumption, and with a few changes we will introduce it as a low-memory mode in the future, which I hope fixes your problem if it is memory related.
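For context on why mmap can lower memory pressure: mapping the file lets the OS page data in on demand instead of reading everything up front. A minimal stdlib sketch of that idea (not fastLlama's actual loader):

```python
import mmap
import os
import tempfile

# Stand-in "model file": 4 KiB of padding, a marker, 4 KiB more padding.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"MAGIC" + b"\x00" * 4096)

# Map the file read-only: pages are faulted in lazily, so the whole
# file never needs to be resident in RAM at once.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    chunk = mm[4096:4101]  # touch only one small region of the file

print(chunk)  # prints b'MAGIC'
os.remove(path)
```

The kernel can also evict clean mapped pages under memory pressure and fault them back in later, which is why mmap-based loading behaves much better than an eager read into heap memory.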
@robin-coac Is this still the case?