Comments (9)
Super nice to hear that you are using the repo. It sounds like you are working on a cool project. Please do share more about it if and when possible!
from fastllama.
Hi,
You can try increasing the "n_ctx" param in
fastLlama.Model(
    id="ALPACA-LORA-30B",
    path=str(MODEL_PATH.resolve()),  # path to model
    num_threads=16,                  # number of threads to use
    n_ctx=512,                       # context size of the model
    last_n_size=64,                  # size of last n tokens, used for repetition penalty (optional)
    seed=0                           # seed for the random number generator (optional)
)
Although I suspect that the quality might be a bit poor for long context lengths compared to the latest llama.cpp repo. I am in the middle of updating the ggml library!
I tried a larger context size, up to 2048. The problem is still the same, even with 10 GB of RAM and swap memory available.
However, loading the model afresh in each iteration of the loop works around the problem, but it's super slow, as you might expect.
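The reload-per-iteration workaround described above can be sketched like this. Note that the `ingest`/`generate` calls and their parameters are assumptions based on the repo's example snippets, not a verified API:

```python
def run_prompts(prompts, model_path):
    """Workaround from this thread: rebuild the model for every prompt.

    Each iteration starts from a fresh context, which avoids the
    segmentation fault, but the weights are reloaded from disk every
    time, so it is very slow.
    """
    import fastLlama  # deferred so the sketch can be imported without the lib

    outputs = []
    for prompt in prompts:
        # Re-creating the Model object reloads the weights each time.
        model = fastLlama.Model(
            id="ALPACA-LORA-30B",
            path=model_path,
            num_threads=16,
            n_ctx=512,
        )
        pieces = []
        model.ingest(prompt)      # assumed API
        model.generate(           # assumed API and parameter names
            num_tokens=128,
            streaming_fn=pieces.append,
        )
        outputs.append("".join(pieces))
    return outputs
```

This trades startup cost for correctness: nothing from a previous prompt can leak into (or corrupt) the next inference.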
I believe problems like this are solved, or will likely be solved, in the original llama.cpp repo thanks to the huge number of contributors there.
Do you plan to keep updating this repo alongside the original llama.cpp library? I want to help in any way I can. Although, regretfully, C/C++ is not something I can help with :(. Only Python for the time being.
Yes, we will continue to update it if it makes sense. However, we want to make sure we don't just copy files from one repo to another, or we will just be playing catch-up with them, which isn't much fun!
Regarding your issue, I will look into it and fix it soon!
Appreciate this! I find this a really notorious problem, because there shouldn't be much difference between giving instructions interactively vs. providing instructions through a for loop.
Interesting, @amitsingh19975 has fixed this in the feature/refactor branch. We are going to merge it to main soon!
Closing this for now. Please feel free to reopen if needed.
No, it's not solved. I tried with that branch. Loading the model once and running inference in a loop still causes a segmentation fault.
Could you share the model parameters and which model you are using? The main branch uses mmap to reduce memory consumption, and with a few changes we will introduce it as a low-memory mode in the future, which I hope fixes your problem if it is memory related.
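For context on why mmap can lower memory pressure: mapping the file lets the OS page data in on demand instead of reading everything up front. A minimal stdlib sketch of that idea (not fastLlama's actual loader):

```python
import mmap
import os
import tempfile

# Stand-in "model file": 4 KiB of padding, a marker, 4 KiB more padding.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 + b"MAGIC" + b"\x00" * 4096)

# Map the file read-only: pages are faulted in lazily, so the whole
# file never needs to be resident in RAM at once.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    chunk = mm[4096:4101]  # touch only one small region of the file

print(chunk)  # prints b'MAGIC'
os.remove(path)
```

The kernel can also evict clean mapped pages under memory pressure and fault them back in later, which is why mmap-based loading behaves much better than an eager read into heap memory.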
@robin-coac Is this still the case?