atome-fe / llama-node

Believe in AI democratization. llama-node is a Node.js library for LLaMA backed by llama-rs, llama.cpp and rwkv.cpp; it works locally on your laptop CPU and supports llama/alpaca/gpt4all/vicuna/rwkv models.

Home Page: https://llama-node.vercel.app/

License: Apache License 2.0

JavaScript 20.66% Rust 45.49% TypeScript 30.97% Python 1.43% C 0.01% CSS 0.61% Makefile 0.12% HTML 0.26% MDX 0.45%
ai gpt large-language-models llama llama-rs llm napi napi-rs nodejs llama-node embeddings llamacpp langchain rwkv

llama-node's Introduction

LLaMA Node

llama-node: Node.js Library for Large Language Model


Picture generated by Stable Diffusion.



Introduction

This project is in an early stage and is not production ready; we do not follow semantic versioning. The Node.js API may change in the future, so use it with caution.

This is a Node.js library for inferencing LLaMA, RWKV, or LLaMA-derived models. It is built on top of llm (originally llama-rs), llama.cpp, and rwkv.cpp. It uses napi-rs to pass messages between the Node.js and LLaMA threads.

Supported models

llama.cpp backend supported models (in GGML format):

llm (llama-rs) backend supported models (in GGML format):

  • GPT-2
  • GPT-J
  • LLaMA: LLaMA, Alpaca, Vicuna, Koala, GPT4All v1, GPT4-X, Wizard
  • GPT-NeoX: GPT-NeoX, StableLM, RedPajama, Dolly v2
  • BLOOM: BLOOMZ

rwkv.cpp backend supported models (in GGML format):

Supported platforms

  • darwin-x64
  • darwin-arm64
  • linux-x64-gnu (glibc >= 2.31)
  • linux-x64-musl
  • win32-x64-msvc

Node.js version: >= 16


Installation

  • Install the llama-node npm package:
    npm install llama-node
  • Install at least one of the inference backends (a minimal usage sketch follows this list):
    • llama.cpp: npm install @llama-node/llama-cpp
    • llm: npm install @llama-node/core
    • rwkv.cpp: npm install @llama-node/rwkv-cpp
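
A minimal usage sketch with the llama.cpp backend, pieced together from the examples further down this page (the model path and parameter values are placeholders, not canonical defaults):

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Path to a GGML model file on disk (placeholder).
const model = path.resolve(process.cwd(), "./ggml-model-q4_0.bin");
const llama = new LLM(LLamaCpp);

const run = async () => {
    // Load the model; the fields mirror llama.cpp's load options.
    await llama.load({
        path: model,
        enableLogging: true,
        nCtx: 1024,
        nParts: -1,
        seed: 0,
        f16Kv: false,
        logitsAll: false,
        vocabOnly: false,
        useMlock: false,
        embedding: false,
        useMmap: true,
        nGpuLayers: 0,
    });

    // Stream tokens from a completion via the callback.
    await llama.createCompletion(
        {
            nThreads: 4,
            nTokPredict: 256,
            topK: 40,
            topP: 0.1,
            temp: 0.2,
            repeatPenalty: 1,
            prompt: "How are you?",
        },
        (response) => process.stdout.write(response.token)
    );
};

run();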

Manual compilation

Please see the contribution guide for how to get started with manual compilation.


CUDA support

Please read the documentation on our site to get started with manual compilation for CUDA support.


Acknowledgments

This library is published under the MIT/Apache-2.0 license. However, we strongly recommend that you cite our work and our dependencies' work if you wish to reuse code from this library.

Models/Inferencing tools dependencies

Some source code comes from


Community

Join our Discord community now! Click to join llama-node Discord

llama-node's People

Contributors

dinex-dev, fardjad, hlhr202, tommoffat, triestpa, yorkzero831


llama-node's Issues

Segmentation fault local cuda build

I have successfully built llama-cpp.linux-x64-gnu.node with CUDA enabled.

When I try to load it, there is a segmentation fault.

I have installed segfault-handler and I get the following log:

PID 130046 received SIGSEGV for address: 0x1
.../tmp/test-llama-server/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3236)[0x7f82483dc236]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7f8248083140]
/lib/x86_64-linux-gnu/libc.so.6(+0x15d319)[0x7f8247ff9319]
<home>/.llama-node/libllama.so(llama_init_from_file+0x522)[0x7f82442c2e72]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0x45a0a)[0x7f8244402a0a]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0x35cbb)[0x7f82443f2cbb]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0x9d550)[0x7f824445a550]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0x97b01)[0x7f8244454b01]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0x963c6)[0x7f82444533c6]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0xa33a5)[0x7f82444603a5]
.../tmp/test-llama-server/llama-node/packages/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node(+0x8ea85)[0x7f824444ba85]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f8248077ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f8247f97a2f]

My index.js is

import { LLM } from "./llama-node/dist/index.js";
import { LLamaCpp } from "./llama-node/dist/llm/llama-cpp.js";

const llama = new LLM(LLamaCpp);

await llama.load({
    path: `./models/wizard-mega-13B.ggml.q5_0.bin`,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 40,
});

Any idea what I can do to find more details?

use with Next.js (webpack fails to load binary file)

I'm trying to use it in a Next.js API action, but I'm getting this error:

- error ./node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.darwin-arm64.node
Module parse failed: Unexpected character '�' (1:0)
You may need an appropriate loader to handle this file type, currently no loaders are configured to process this file. See https://webpack.js.org/concepts#loaders
(Source code omitted for this binary file)

Import trace for requested module:
./node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.darwin-arm64.node
./node_modules/@llama-node/llama-cpp/index.js
./node_modules/llama-node/dist/llm/llama-cpp.js

Do I need to add some binary loader rule to webpack config, or somehow skip its loading?
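
One possible workaround (an assumption based on general Next.js/webpack behaviour, not confirmed in this thread) is to keep webpack from bundling the native addon and let Node.js require it at runtime instead, e.g. via externals in next.config.js:

// next.config.js (sketch)
module.exports = {
  webpack: (config, { isServer }) => {
    if (isServer) {
      // Treat the native packages as CommonJS externals so webpack does not
      // try to parse the binary .node file itself.
      config.externals.push({
        "@llama-node/llama-cpp": "commonjs @llama-node/llama-cpp",
        "llama-node": "commonjs llama-node",
      });
    }
    return config;
  },
};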

[Error: Could not load model] { code: 'GenericFailure' }

Getting this error [Error: Could not load model] { code: 'GenericFailure' } when trying to load a model:

$ node ./bin/llm/llm.js --model ~/models/gpt4-alpaca-lora-30B.ggml.q5_1.bin
[Error: Could not load model] { code: 'GenericFailure' }

I've modified the example a bit to take a --model argument:

import minimist from 'minimist';
import { LLM } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";
const args = minimist(process.argv.slice(2));
const modelPath = args.model;
const model = path.resolve(modelPath);
const llama = new LLM(LLamaRS);
const template = `how are you`;
const prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

${template}

### Response:`;

const params = {
    prompt,
    numPredict: 128,
    temp: 0.2,
    topP: 1,
    topK: 40,
    repeatPenalty: 1,
    repeatLastN: 64,
    seed: 0,
    feedPrompt: true,
};
const run = async () => {
    try {
        await llama.load({ path: model });
        await llama.createCompletion(params, (response) => {
            process.stdout.write(response.token);
        });
    } catch (err) {
        console.error(err);
    }
};

run();

Issue to build GPU version

I am trying to build a GPU version of llama-node/llama-cpp but I'm running into an issue with the napi package. The package does not get installed automatically, and when I try to install it I get an error that the supported platform for napi is Node 0.4. Am I missing something? Has anyone built this successfully for an Nvidia GPU?

[llama-cpp]$ pnpm build:cuda

Checking environment...

Checking rustc...✅
Checking cargo...✅
Checking cmake...✅
Checking llvm...✅
Checking clang.../bin/sh: clang: command not found
Checking gcc...✅
Checking nvcc...✅

Compiling...

/bin/sh: napi: command not found
[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "127".] {
code: 'ERR_UNHANDLED_REJECTION'
}

Napi Error:

npm WARN EBADENGINE Unsupported engine {
npm WARN EBADENGINE package: '[email protected]',
npm WARN EBADENGINE required: { node: '~0.4' },
npm WARN EBADENGINE current: { node: 'v16.14.0', npm: '8.3.1' }
npm WARN EBADENGINE }

foreign exception error

llama.cpp: loading model from /home/ubuntu/models/ggml-gpt4all-j-v1.3-groovy.bin
fatal runtime error: Rust cannot catch foreign exceptions
Aborted
2023-05-18T11:49:39.270396Z  INFO surrealdb::net: SIGTERM received. Start graceful shutdown...
2023-05-18T11:49:39.270450Z  INFO surrealdb::net: Shutdown complete. Bye!    

==> /var/log/descriptive-web.log <==

llama-node/llama-cpp uses more memory than standalone llama.cpp with the same parameters

I'm trying to process a large text file. For the sake of reproducibility, let's use this. The following code:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "node:path";
import fs from "node:fs";

const model = path.resolve(
    process.cwd(),
    "/path/to/model.bin"
);
const llama = new LLM(LLamaCpp);
const prompt = fs.readFileSync("./path/to/file.txt", "utf-8");

await llama.load({
    enableLogging: true,
    modelPath: model,

    nCtx: 4096,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: false,
    nGpuLayers: 0,
});

await llama.createCompletion(
    {
        nThreads: 8,
        nTokPredict: 256,
        topK: 40,
        prompt,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);

Crashes the process with a segfault error:

ggml_new_tensor_impl: not enough space in the scratch memory
segmentation fault  node index.mjs

When I compile the exact same version of llama.cpp and run it with the following args:

./main -m /path/to/ggml-vic7b-q5_1.bin -t 8 -c 4096 -n 256 -f ./big-input.txt

It runs perfectly fine (of course with a warning that the context is larger than what the model supports but it doesn't crash with a segfault).

Comparing the logs:

llama-node Logs
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4936280.75 KB
llama_model_load_internal: mem required  = 6612.59 MB (+ 2052.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  = 4096.00 MB
[Sun, 28 May 2023 14:35:50 +0000 - INFO - llama_node_cpp::context] - AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
[Sun, 28 May 2023 14:35:50 +0000 - INFO - llama_node_cpp::llama] - tokenized_stop_prompt: None
ggml_new_tensor_impl: not enough space in the scratch memory
llama.cpp Logs
main: warning: model does not support context sizes greater than 2048 tokens (4096 specified);expect poor results
main: build = 561 (5ea4339)
main: seed  = 1685284790
llama.cpp: loading model from ../my-llmatic/models/ggml-vic7b-q5_1.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 4096
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  72.75 KB
llama_model_load_internal: mem required  = 6612.59 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 2048.00 MB

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = 256, n_keep = 0

Looks like the context size in llama-node is set to 4GBs and the kv self size is twice as large as what llama.cpp used.

I'm not sure if I'm missing something in my Load/Invocation config or if that's an issue in llama-node. Can you please have a look?

Bumping from 0.1.5 to 0.1.6 resulting with `Error: invariant broken`

Got an LLM running with GPT4All models (tried with ggml-gpt4all-j-v1.3-groovy.bin and ggml-gpt4all-l13b-snoozy.bin).

Version 0.1.5 - Works
Version 0.1.6 - Results with Error: invariant broken: 999255479 <= 2 in Some("{PATH_TO}/ggml-gpt4all-j-v1.3-groovy.bin")

Package versions:

"@llama-node/core": "0.1.6",
"@llama-node/llama-cpp": "0.1.6",
"llama-node": "0.1.6",
/* eslint-disable @typescript-eslint/no-unused-vars */
/* eslint-disable @typescript-eslint/no-var-requires */
import { ModelType } from '@llama-node/core';
import { LLM } from 'llama-node';
// @ts-expect-error
import { LLMRS } from 'llama-node/dist/llm/llm-rs.cjs';
import path from 'path';

const modelPath = path.join(
  __dirname,
  '..',
  'models',
  'ggml-gpt4all-j-v1.3-groovy.bin',
);
const llama = new LLM(LLMRS);

const toChatTemplate = (prompt: string) => `### Instruction:
${prompt}

### Response:`;

export const createCompletion = async (
  prompt: string,
  onData: (data: string) => void,
  onDone: () => void,
) => {
  const params = {
    prompt: toChatTemplate(prompt),
    numPredict: 128,
    temperature: 0.8,
    topP: 1,
    topK: 40,
    repeatPenalty: 1,
    repeatLastN: 64,
    seed: 0,
    feedPrompt: true,
  };
  await llama.load({ modelPath, modelType: ModelType.GptJ });
  await llama.createCompletion(params, (response) => {
    if (response.completed) {
      return onDone();
    } else {
      onData(response.token);
    }
  });
};

llama-node does not display prompt in Deno environment

I am currently trying to use the [email protected] library with Deno. However, when I execute my program, I am not getting the expected prompt result. Instead, the Rust backend is returning Ok(()). I suspect the issue might be a compatibility problem between [email protected] and the Deno runtime environment, or am I missing something?

Any help or guidance on how to resolve this issue would be greatly appreciated. Thank you in advance for your assistance.

  • llama-node: [email protected]
  • Runtime: deno 1.32.5
  • Model: ggml-model-q4_0.bin / 7B alpaca 4-bit quantized (ran flawlessly with llama-rs itself)

main.ts code:

import { resolve } from "https://deno.land/[email protected]/path/mod.ts";
import { LLama } from "npm:[email protected]";
import { LLamaRS } from "npm:[email protected]/dist/llm/llama-rs.js";

const llama = new LLama(LLamaRS);

llama.load({
  path: resolve(Deno.cwd(), "./models/7B/ggml-model-q4_0.bin"),
});

const template = `How are you?`;

const prompt = `### Human:

${template}

### Assistant:`;

llama.createCompletion(
  {
      prompt,
      numPredict: 128,
      temp: 0.2,
      topP: 1,
      topK: 40,
      repeatPenalty: 1,
      repeatLastN: 64,
      seed: 0,
      feedPrompt: true,
  },
  (response) => {
      console.log(response.token)
  }
);

console.log output:

DEBUG RS - deno_runtime::worker:418 - received module evaluate Ok(
    Ok(
        (),
    ),
)

getting error after second run

thread 'tokio-runtime-worker' panicked at 'Builder::init should not be called after logger initialized: SetLoggerError(())', /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/env_logger-0.10.0/src/lib.rs:816:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
   1: core::panicking::panic_fmt
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
   2: core::result::unwrap_failed
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/result.rs:1750:5
   3: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
   4: tokio::runtime::task::raw::poll
   5: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   6: tokio::runtime::task::raw::poll
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

dyld[50013]: missing symbol called


import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path"; // missing in the original snippet, required for path.resolve below


const model = path.resolve(process.cwd(), "./src/ggml-vicuna-13b-1.1-q4_0.bin");


const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
};

llama.load(config);

const template = `How are you`;

const prompt = `### Human:

${template}

### Assistant:`;

llama.createCompletion(
    {
        nThreads: 4,
        nTokPredict: 2048,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
        stopSequence: "### Human",
        prompt,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);

That is how my code was written. When I run it, I get:

dyld[50013]: missing symbol called
sh: line 1: 50013 Abort trap: 6 node dist/app.js

I am getting this error. What is the reason, and what is the solution?

[ERROR] cublas `TypeError: this.instance.inference is not a function`

Since I compiled with CUDA support, I first had to add nGpuLayers (which seems logical, as it's an option available in llama.cpp).

Then I get this error:

TypeError: this.instance.inference is not a function
    at file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:54:23
    at new Promise (<anonymous>)
    at LLamaCpp.<anonymous> (file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:53:14)
    at Generator.next (<anonymous>)
    at file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:33:61
    at new Promise (<anonymous>)
    at __async (file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:17:10)
    at LLamaCpp.createCompletion (file:///root/git/llama-selfbot/node_modules/llama-node/dist/llm/llama-cpp.js:50:12)
    at LLM.<anonymous> (/root/git/llama-selfbot/node_modules/llama-node/dist/index.cjs:56:23)
    at Generator.next (<anonymous>)

Did I miss something or do something wrong?

Get started guideline for contributors

This issue just notes down the getting-started guidelines for contributors; it will be closed after we move these to the documentation.

To set up the Node.js environment for contributors:

  1. prepare Node.js >= 16
  2. prepare pnpm: npm install -g pnpm
  3. install without scripts (to skip the preinstall binary build): pnpm install --ignore-scripts
  4. prepare the GitHub CLI: https://cli.github.com/
  5. download binary artifacts: pnpm run artifacts

For native addon development, install:

  1. rust
  2. cmake
  3. clang/gcc

Failed to load model

llama.cpp: loading model from C:/Users/bazha/work/ai/models/llama/ggml-vic13b-uncensored-q8_0.bin
error loading model: unrecognized tensor type 8

llama_init_from_file: failed to load model
[2023-04-28T12:32:46Z INFO  llama_node_cpp::context] AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
[2023-04-28T12:32:46Z INFO  llama_node_cpp::llama] tokenized_stop_prompt: None

Here is the code; I just copied it from the website:

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = "C:/Users/bazha/work/ai/models/llama/ggml-vic13b-uncensored-q8_0.bin"
const llama = new LLM(LLamaCpp);
const config = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
};
llama.load(config);

const template = `How are you?`;
const prompt = `A chat between a user and an assistant. USER: ${template}; ASSISTANT:`;

llama.createCompletion({
    nThreads: 4,
    nTokPredict: 2048,
    topK: 40,
    topP: 0.1,
    temp: 0.2,
    repeatPenalty: 1,
    prompt,
}, (response) => {
    process.stdout.write(response.token);
});

node: v18.12.1
@llama-node/llama-cpp: ^0.0.31
llama-node: ^0.0.31

model i downloaded

Version 0.0.34 gives drastically different output when paired with Vicuna.

The same code, when used with 0.0.34 versus 0.0.26, gives very weird output when paired with the Vicuna 7B model.

0.0.26 works perfectly, so I'm happy to stay on it, but whatever 0.0.34 shipped has broken something.

See the example screenshots below.

Prompt: "Count from 1 to 10"

0.0.26:

Screenshot 2023-05-04 at 22 22 39

0.0.34:

Screenshot 2023-05-04 at 22 21 58

As you can see the model goes haywire, and this is the same across many different prompts.

Code only using 4 CPUs when I have 16 CPUs

This is the code that I am using

import {RetrievalQAChain} from 'langchain/chains';
import {HNSWLib} from "langchain/vectorstores";
import {RecursiveCharacterTextSplitter} from 'langchain/text_splitter';
import {LLamaEmbeddings} from "llama-node/dist/extensions/langchain.js";
import {LLM} from "llama-node";
import {LLamaCpp} from "llama-node/dist/llm/llama-cpp.js";
import * as fs from 'fs';
import * as path from 'path';

const txtFilename = "TrainData";
const txtPath = `./${txtFilename}.txt`;
const VECTOR_STORE_PATH = `${txtFilename}.index`;
const model = path.resolve(process.cwd(), './h2ogptq-oasst1-512-30B.ggml.q5_1.bin');
const llama = new LLM(LLamaCpp);
const config = {
path: model,
enableLogging: true,
nCtx: 1024,
nParts: -1,
seed: 0,
f16Kv: false,
logitsAll: false,
vocabOnly: false,
useMlock: false,
embedding: true,
useMmap: true,
};
var vectorStore;
const run = async () => {
await llama.load(config);
if (fs.existsSync(VECTOR_STORE_PATH)) {
console.log('Vector Exists..');
vectorStore = await HNSWLib.fromExistingIndex(VECTOR_STORE_PATH, new LLamaEmbeddings({maxConcurrency: 1}, llama));
} else {
console.log('Creating Documents');
const text = fs.readFileSync(txtPath, 'utf8');
const textSplitter = new RecursiveCharacterTextSplitter({chunkSize: 1000});
const docs = await textSplitter.createDocuments([text]);
console.log('Creating Vector');
vectorStore = await HNSWLib.fromDocuments(docs, new LLamaEmbeddings({maxConcurrency: 1}, llama));
await vectorStore.save(VECTOR_STORE_PATH);
}
console.log('Testing Vector via Similarity Search');
const resultOne = await vectorStore.similaritySearch("what is a template", 1);
console.log(resultOne);
console.log('Testing Vector via RetrievalQAChain');
const chain = RetrievalQAChain.fromLLM(llama, vectorStore.asRetriever());
const res = await chain.call({
query: "what is a template",
});
console.log({res});
};
run();

It is only using 4 CPUs at the time of "vectorStore = await HNSWLib.fromDocuments(docs, new LLamaEmbeddings({maxConcurrency: 1}, llama));"

Can we change anything for it to use more than 4 CPUs?

tsx: not found while running npm install

Hi there! Nice package.

I'm trying to install this in a brand new project, but I'm facing this issue:

npm ERR! code 127
npm ERR! path /MYPATHHERE/node_modules/@llama-node/core
npm ERR! command failed
npm ERR! command sh -c -- tsx scripts/build.ts
npm ERR! sh: 1: tsx: not found

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/jonit/.npm/_logs/2023-04-07T01_54_20_183Z-debug-0.log

Any thoughts?

Is it a typo?

Shouldn't it be tsc scripts/build.ts?

Error: Missing field `nGpuLayers`

Hello guys, I'm trying to run the mpt-7b model and I am getting this error. I appreciate any help; here are the details.

Node.js v19.5.0

node_modules\llama-node\dist\llm\llama-cpp.cjs:82
this.instance = yield import_llama_cpp.LLama.load(path, rest, enableLogging);
^

Error: Missing field `nGpuLayers`
    at LLamaCpp.<anonymous> (<path>\node_modules\llama-node\dist\llm\llama-cpp.cjs:82:52)
    at Generator.next (<anonymous>)
    at <path>\node_modules\llama-node\dist\llm\llama-cpp.cjs:50:61
    at new Promise (<anonymous>)
    at __async (<path>\node_modules\llama-node\dist\llm\llama-cpp.cjs:34:10)
    at LLamaCpp.load (<path>\node_modules\llama-node\dist\llm\llama-cpp.cjs:80:12)
    at LLM.load (<path>\node_modules\llama-node\dist\index.cjs:52:21)
    at run (file:///<path>/index.mjs:27:17)
    at file:///<path>/index.mjs:42:1
    at ModuleJob.run (node:internal/modules/esm/module_job:193:25) {
  code: 'InvalidArg'
}

Folder structure: (screenshot omitted)

index.mjs

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.cjs"
import path from "path"

const model = path.resolve(process.cwd(), "./model/ggml-mpt-7b-base.bin");
const llama = new LLM(LLamaCpp);
const config = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
};

const template = `How are you?`;
const prompt = `A chat between a user and an assistant.
USER: ${template}
ASSISTANT:`;

const run = async () => {
    await llama.load(config);

    await llama.createCompletion({
        nThreads: 4,
        nTokPredict: 2048,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
        prompt,
    }, (response) => {
        process.stdout.write(response.token);
    });
}

run();

thank you for your time
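
Based on the error message, one likely fix (not confirmed in this thread) is to add the nGpuLayers field to the load config, as the newer examples on this page do:

const config = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    // Newer llama.cpp backends expect this field; 0 keeps everything on the CPU.
    nGpuLayers: 0,
};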

TypeError: llm._modelType is not a function

This is the code I am using

import {RetrievalQAChain} from 'langchain/chains';
import {HNSWLib} from "langchain/vectorstores";
import {RecursiveCharacterTextSplitter} from 'langchain/text_splitter';
import {LLamaEmbeddings} from "llama-node/dist/extensions/langchain.js";
import {LLM} from "llama-node";
import {LLamaCpp} from "llama-node/dist/llm/llama-cpp.js";
import * as fs from 'fs';
import * as path from 'path';

const txtFilename = "TrainData";
const txtPath = `./${txtFilename}.txt`;
const VECTOR_STORE_PATH = `${txtFilename}.index`;
const model = path.resolve(process.cwd(), './h2ogptq-oasst1-512-30B.ggml.q5_1.bin');
const llama = new LLM(LLamaCpp);
const config = {
path: model,
enableLogging: true,
nCtx: 1024,
nParts: -1,
seed: 0,
f16Kv: false,
logitsAll: false,
vocabOnly: false,
useMlock: false,
embedding: true,
useMmap: true,
};
var vectorStore;
const run = async () => {
await llama.load(config);
if (fs.existsSync(VECTOR_STORE_PATH)) {
console.log('Vector Exists..');
vectorStore = await HNSWLib.fromExistingIndex(VECTOR_STORE_PATH, new LLamaEmbeddings({maxConcurrency: 1}, llama));
} else {
console.log('Creating Documents');
const text = fs.readFileSync(txtPath, 'utf8');
const textSplitter = new RecursiveCharacterTextSplitter({chunkSize: 1000});
const docs = await textSplitter.createDocuments([text]);
console.log('Creating Vector');
vectorStore = await HNSWLib.fromDocuments(docs, new LLamaEmbeddings({maxConcurrency: 1}, llama));
await vectorStore.save(VECTOR_STORE_PATH);
}
console.log('Testing Vector via RetrievalQAChain');
const chain = RetrievalQAChain.fromLLM(llama, vectorStore.asRetriever());
const res = await chain.call({
query: "what is a template",
});
console.log({res});
};
run();

At this line "const chain = RetrievalQAChain.fromLLM(llama, vectorStore.asRetriever());" It is throwing this error

file:///root/project/node_modules/langchain/dist/chains/prompt_selector.js:34
return llm._modelType() === "base_chat_model";
^

TypeError: llm._modelType is not a function
at isChatModel (file:///root/project/node_modules/langchain/dist/chains/prompt_selector.js:34:16)
at ConditionalPromptSelector.getPrompt (file:///root/project/node_modules/langchain/dist/chains/prompt_selector.js:23:17)
at loadQAStuffChain (file:///root/project/node_modules/langchain/dist/chains/question_answering/load.js:20:41)
at RetrievalQAChain.fromLLM (file:///root/project/node_modules/langchain/dist/chains/retrieval_qa.js:69:25)
at run (file:///root/project/index.js:47:36)

How can we fix this issue?
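
The error suggests langchain expects an object implementing its own LLM interface rather than llama-node's LLM class. Below is a rough sketch of an adapter, assuming langchain's custom-LLM base class (LLM from "langchain/llms/base" with _call and _llmType) and reusing the completion callback shape shown elsewhere on this page; the class name and parameter handling are illustrative assumptions, not a confirmed fix:

import { LLM as LangchainLLM } from "langchain/llms/base";

// Hypothetical adapter: wraps an already-loaded llama-node instance so that
// langchain chains such as RetrievalQAChain can call it like a langchain LLM.
class LlamaNodeLLM extends LangchainLLM {
    constructor(llama, params, fields = {}) {
        super(fields);
        this.llama = llama;   // llama-node LLM instance, already load()-ed
        this.params = params; // backend-specific completion params
    }

    _llmType() {
        return "llama-node";
    }

    async _call(prompt) {
        let text = "";
        await this.llama.createCompletion({ ...this.params, prompt }, (response) => {
            if (!response.completed) text += response.token;
        });
        return text;
    }
}

// Usage sketch:
// const wrapped = new LlamaNodeLLM(llama, { nThreads: 4, nTokPredict: 256, temp: 0.2 });
// const chain = RetrievalQAChain.fromLLM(wrapped, vectorStore.asRetriever());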

"libstdc++.so.6: version `GLIBCXX_3.4.29' not found" error with docker

I've added llama-node to my project and it's working locally on my Windows machine, but when I try to run it in my Docker container I get this error:

Error: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /app/node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node)
    at Object.Module._extensions..node (node:internal/modules/cjs/loader:1361:18)
    at Module.load (node:internal/modules/cjs/loader:1133:32)
    at Function.Module._load (node:internal/modules/cjs/loader:972:12)
    at Module.require (node:internal/modules/cjs/loader:1157:19)
    at require (node:internal/modules/helpers:119:18)
    at Object.<anonymous> (/app/node_modules/@llama-node/llama-cpp/index.js:188:31)
    at Module._compile (node:internal/modules/cjs/loader:1275:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1329:10)
    at Module.load (node:internal/modules/cjs/loader:1133:32)
    at Function.Module._load (node:internal/modules/cjs/loader:972:12)

I've tried a ton of things: different base images, updating libstdc++6, etc.

How to use the GPU?

Hi, I am looking for a way for this library to trigger the inference generation using the GPU.
Is this supported?

no kernel image is available for execution on the device

~/llama-node/packages/llama-cpp$ node example/mycode.ts
llama.cpp: loading model from /llama-node/packages/llama-cpp/ggml-vic7b-uncensored-q5_1.bin
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 1024
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 9 (mostly Q5_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 72.75 KB
llama_model_load_internal: mem required = 6612.59 MB (+ 2052.00 MB per state)
llama_model_load_internal: [cublas] offloading 8 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 1158 MB
llama_init_from_file: kv self size = 1024.00 MB
[Fri, 26 May 2023 09:45:06 +0000 - INFO - llama_node_cpp::context] - AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
[Fri, 26 May 2023 09:45:06 +0000 - INFO - llama_node_cpp::llama] - tokenized_stop_prompt: None
CUDA error 209 at /llama-node/packages/llama-cpp/llama-sys/llama.cpp/ggml-cuda.cu:693: no kernel image is available for execution on the device

I have a Tesla K80 card running on Ubuntu. Any advice on what to do and where to look would be appreciated.

Llama.cpp Typescript: Cannot find name 'LoadModel'

There is an error when compiling TypeScript code using llama.cpp:

$ tsc -p .
node_modules/@llama-node/llama-cpp/index.d.ts:137:31 - error TS2304: Cannot find name 'LoadModel'.

137   static load(params: Partial<LoadModel>, enableLogger: boolean): Promise<LLama>

I found that changing static load(params: Partial<LoadModel> to static load(params: Partial<ModelLoad> on line 137 of llama-cpp/index.d.ts works. It is probably a typo.
I found another reference to LoadModel in llama-cpp/src/lib.rs, but unfortunately, as I don't know Rust, I cannot suggest a valid fix in a PR.

Error: Failed to convert napi value Function into rust type `f64`

Getting this error: Error: Failed to convert napi value Function into rust type `f64`

import { LLM } from "llama-node";
//import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";
import os from 'os';

export default class AI {
    constructor(chat, msg) {
        this.chat = chat;
        this.msg = msg;
        this.model = path.resolve(os.homedir(), 'models', path.basename(this.chat.model));
        this.llama = new LLM(LLamaCpp);
        this.template = this.msg.body;
        this.prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

${this.template}

### Response:`;


        this.cppConfig = {
            enableLogging: true,
            nCtx: 1024,
            nParts: -1,
            seed: 0,
            f16Kv: false,
            logitsAll: false,
            vocabOnly: false,
            useMlock: false,
            embedding: false,
            useMmap: true,
        };

        this.cppParams = {
            prompt: this.prompt,
            nThreads: 4,
            nTokPredict: 2048,
            topK: 40,
            topP: 0.1,
            temp: this.model.temp || 0.2,
            repeatPenalty: this.model.repeat || 1,
        };

        this.getAIResponse = this.getAIResponse.bind(this);
    }


    async getAIResponse() {
        console.log('calling ai: ', this.chat);
        try {
            await this.llama.load({ path: this.model, ...this.cppConfig });
            await this.llama.createCompletion(this.cppParams, (response) => {
                process.stdout.write(JSON.stringify({ prompt: this.prompt, response: response.token }));
                process.stdout.write(response.token);

                return {
                    prompt: this.prompt,
                    response: response.token
                };
            });
        } catch (err) {
            console.error(err);
        }
    }
}

Model is: WizardLM-7B-uncensored.ggml.q5_1.bin

How do I force the end of text generation?

I am writing a Telegram bot that works as a chat bot based on llama. When a user writes something, the bot replies to the user and generates text in real time. But llama very often crashes and starts communicating with itself, for example:
"""
ASSISTANT: Hello! How can I assist you today?
!!!Starts communicating with itself!!!
HUMAN: I don't know what to do.
ASSIST: Oh, that's okay! We can start with something simple. Can you tell me about your day so far?
HUMAN: I've been trying to write some code, but I keep getting errors.
"""
So I decided to catch the word HUMAN: and, if such a word is detected, forcibly stop generating text. But here is the problem: I don't understand how to do this with this library. I'm not very experienced, so don't scold me too much :)
Here is the piece of code responsible for answering the user:

bot.on("text", ctx => {
    const input_text = ctx.message.text;

    var conv = plus_res("HUMAN", input_text)
    var res = get_res_ai(conv)
    plus_res("ASSISTANT", res)
    ctx.reply("Starting generate")

    
     function get_res_ai(prompt){
        var textAll = ""
        const newMessageId = ctx.message.message_id + 1
        llama.createCompletion({
            nThreads: 8,
            nTokPredict: 2048,
            topK: 40,
            topP: 0.95,
            temp: 0.8,
            repeatPenalty: 1,
            prompt,
        }, (response) => {
            textAll += response.token
            if (textAll.includes("HUMAN:")) {
                textAll = textAll.replace(/HUMAN:/g, '')
                ctx.telegram.editMessageText(ctx.chat.id, newMessageId, undefined, textAll).catch(error => console.error(error));
                //This is where the generation must stop
            }

            ctx.telegram.editMessageText(ctx.chat.id, newMessageId, undefined, textAll).catch(error => console.error(error));
        });
        return textAll
    }
    
});
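
One option, based on other examples on this page (the parameter appears there with the llama.cpp backend, so treating it as applicable here is an assumption): pass a stopSequence so generation stops as soon as the model starts a new HUMAN: turn, instead of filtering it afterwards:

llama.createCompletion({
    nThreads: 8,
    nTokPredict: 2048,
    topK: 40,
    topP: 0.95,
    temp: 0.8,
    repeatPenalty: 1,
    // Stop generating once the model tries to speak for the user.
    stopSequence: "HUMAN:",
    prompt,
}, (response) => {
    if (!response.completed) {
        textAll += response.token;
    }
});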

Option to get value as buffer - Support emoji

When you get tokens as text, some information is lost; for example, emojis are built from more than one byte.
A token can contain half an emoji, and UTF-8 decoding turns that into an unknown character.

I thought about 2 possible solutions:

  1. send the token as a buffer to ensure no information is lost, and let the client handle it
  2. if the last character decodes to an unknown character, save the bytes for the next token (see the sketch below)

I implemented the second solution for something similar; maybe it could help: massage-builder
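
A minimal sketch of the second approach, assuming the backend could expose raw token bytes (for example as a Buffer) instead of already-decoded strings: TextDecoder in streaming mode buffers an incomplete multi-byte sequence (such as half an emoji) until the remaining bytes arrive with the next token.

const decoder = new TextDecoder("utf-8");

function onTokenBytes(tokenBytes /* Uint8Array | Buffer */) {
    // stream: true => trailing partial code points are held back internally
    // instead of being replaced with U+FFFD.
    const text = decoder.decode(tokenBytes, { stream: true });
    if (text.length > 0) {
        process.stdout.write(text);
    }
}

// At the end of generation, flush whatever is left in the decoder.
function onCompleted() {
    const rest = decoder.decode();
    if (rest.length > 0) {
        process.stdout.write(rest);
    }
}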

zsh: illegal hardware instruction

I get zsh: illegal hardware instruction when executing an inference script like the following with Node v14.17.3:

import { LLama } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "../gpt4all/gpt4all-converted.bin");

const llama = new LLama(LLamaCpp);

const config = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
};

llama.load(config);

const template = `How are you`;

const prompt = `### Human:

${template}

### Assistant:`;

llama.createCompletion(
    {
        nThreads: 4,
        nTokPredict: 2048,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
        stopSequence: "### Human",
        prompt,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);

I'm using

% tsc --version                        
Version 5.0.4
% npm --version
9.6.4
% node --version
v14.17.3

How can I install Types?

I got a lot of error messages like this one:

Cannot find module 'llama-node/dist/llm/llama-rs' or its corresponding type declarations.

How can I install the types?

app crashes when input is too long

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Input too long', packages/llama-cpp/src/llama.rs:99:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
   1: core::panicking::panic_fmt
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
   2: core::result::unwrap_failed
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/result.rs:1750:5
   3: tokio::runtime::task::raw::poll

the software has no reaction with no errors

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";
import fs from 'fs';

process.on('unhandledRejection', error => {
console.error('Unhandled promise rejection:', error);
});
const model = path.resolve(process.cwd(), "../llama.cpp/models/13B/ggml-model-q4_0.bin");

if (!fs.existsSync(model)) {
console.error("Model file does not exist: ", model);
}
const llama = new LLM(LLamaCpp);
//console.log("model:", model)
const config = {
modelPath: model,
enableLogging: true,
nCtx: 1024,
seed: 0,
f16Kv: false,
logitsAll: false,
vocabOnly: false,
useMlock: false,
embedding: true,
useMmap: true,
nGpuLayers: 0
};
//console.log("config:", config)
const prompt = "Who is the president of the United States?";
const params = {
nThreads: 4,
nTokPredict: 2048,
topK: 40,
topP: 0.1,
temp: 0.2,
repeatPenalty: 1.1,
prompt,
};
//console.log("params:", params)

try {
console.log("Loading model...");
await llama.load(config);
console.log("Model loaded");
} catch (error) {
console.error("Error loading model: ", error);
}

const response = await llama.createCompletion(params);
console.log(response)

const run = async () => {
try {
await llama.load(config);
console.log("load complete")
await llama.getEmbedding(params).then(console.log);
} catch (error) {
console.error("Error loading model or generating embeddings: ", error);
}
};
run();

I added a lot of things to debug it and found that it stops at line 44: await llama.load(config); the sequence just stops there and the program terminates. No errors were caught.

MacBook Pro with M1 Max
macOS 13.4 (22F66)
Node.js v20.3.0

basic unified API for all backends

  • the syntax for different backends is similar, but many config properties (that do exactly the same thing) have different names, for example numPredict (llm-rs) vs n_tok_predict (llama.cpp)

  • A "universal api" would have the same property names for every backend, so that switching to a different one would be as simple as changing one parameter.

  • It would only support the basic features all backends have (prompt, number of tokens, stop sequence(?), temperature) (hence "basic" in the title)

  • Backend-specific features (save/loadSession from llm-rs, mmap from llama.cpp) would be unavailable, or throw an error for backends that haven't implemented them yet (bad because throwing errors for obscure reasons?)

  • Parameters that are optional for some backends but mandatory for others (like topP?) could be replaced with "sane" default values.
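
A minimal sketch of what such a mapping layer could look like, using the two parameter names mentioned above; the helper name and the unified field names are illustrative assumptions, not an existing API:

// Hypothetical adapter: translate one set of "universal" completion params
// into the names each backend currently expects.
function toBackendParams(backend, params) {
    const { prompt, maxTokens, temperature, stopSequence } = params;
    switch (backend) {
        case "llm-rs":
            return { prompt, numPredict: maxTokens, temp: temperature };
        case "llama.cpp":
            return { prompt, nTokPredict: maxTokens, temp: temperature, stopSequence };
        default:
            throw new Error(`Unknown backend: ${backend}`);
    }
}

// Example: the same call site works for either backend.
// llama.createCompletion(toBackendParams("llama.cpp", { prompt, maxTokens: 128, temperature: 0.2 }), onToken);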

req: support async/await

Can we support async/await instead of callbacks?

const response = await llama.createCompletion(params);
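
In the meantime, a small wrapper can provide an awaitable result on top of the current callback API; a sketch, assuming the response shape (token, completed) shown in other examples on this page:

// Collects streamed tokens into a single string and resolves when the
// backend reports completion.
function createCompletionAsync(llama, params) {
    return new Promise((resolve, reject) => {
        let text = "";
        llama.createCompletion(params, (response) => {
            if (response.completed) {
                resolve(text);
            } else {
                text += response.token;
            }
        }).catch(reject);
    });
}

// const response = await createCompletionAsync(llama, params);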

Error: Too many tokens predicted

Why is this being thrown as an error?

return Err(napi::Error::from_reason("Too many tokens predicted"));

Forgive my ignorance if I'm missing something, but it seems like this should just set response.completed = true;

Context for the error:
prompt is a string, e.g. "Hello World, "
tokens is an integer, e.g. 128

this.model.createCompletion({
    nThreads: 4,
    nTokPredict: tokens,
    topK: 40,
    topP: 0.1,
    temp: 0.2,
    repeatPenalty: 1,
    prompt,
}, (response) => {
    completition.push(response.token)
    console.log(response.token)
    if(response.completed || count == tokens) {
        resolve(completition)
        return;
    }
}).catch((error) => {
    reject(new Error("Failed to generate completition: " + error));
});   

Again, if I'm not missing something, it seems clunky to have to check whether the error is actually just because we have hit the end token.

Ggml v3 support in Llama.cpp

Hi, thanks for this nice package. LLama.cpp made a new breaking change in their quantization methods recently (PR ref).

Would it be possible to get an update for the llama-node package so it can use the GGML v3 models? Actually, the new GGML models that come out are all using this format.

Interactive

I'm new to LLMs and llama but learning fast. I've written a small piece of code to chat via the CLI, but it seems not to follow the context (i.e. work in interactive mode).

import { LLM } from "llama-node";
import readline from "readline";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
const saveSession = path.resolve(process.cwd(), "./tmp/session.bin");
const loadSession = path.resolve(process.cwd(), "./tmp/session.bin");

import path from "path";
const model = path.resolve(process.cwd(), "./ggml-vic7b-q4_1.bin"); 

const llama = new LLM(LLamaRS);
llama.load({ path: model });


var rl = readline.createInterface(process.stdin, process.stdout);
console.log("Chatbot started!");
rl.setPrompt("> ");
rl.prompt();
rl.on("line", async function (line) {
    const prompt = `A chat between a user and an assistant.
    USER: ${line}
    ASSISTANT:`;
    llama.createCompletion({
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
        saveSession,
        loadSession,
    }, (response) => {
        if(response.completed) {
            process.stdout.write('\n'); 
            rl.prompt(); 
        } else {
            process.stdout.write(response.token);
        }  
    });
});

Am I missing something?

Expand on install/usage instructions

The readme states, install:

npm install llama-node
and provides some usage examples, which don't actually work; they are TypeScript, so there are more requirements.

Please provide all prerequisite installation instructions so that there is actually a working example.

It might all be second nature to experts, but for the less fortunate it is not as straightforward. (i.e. don't redirect someone to some examples directory with a config file, as that still requires prerequisite steps.)

Illegal instruction (core dumped)

I am unable to use llama-node on a Xeon E5-2690 v2; I get the error:

Illegal instruction (core dumped)

I am assuming it is because it doesn't support AVX2, as llama-node works on my i7 12700F. Is there any way to get it to work?

langchain integration

Hello, I'm trying to use the langchain integration but I cannot figure out how to use it. I'm following some examples from langchain:

import { LLM } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import readline from "readline";
import fs from "fs";
import path from "path";
import { SerpAPI } from 'langchain/tools';
import {initializeAgentExecutorWithOptions} from 'langchain/agents';
import { Calculator } from 'langchain/tools/calculator';
import { LLamaEmbeddings } from "llama-node/dist/extensions/langchain.js";

const SERPAPI_KEY = '';

const model = path.resolve(process.cwd(), "./ggml-vic7b-q4_1.bin"); 
const llama = new LLM(LLamaRS);
llama.load({ path: model });

const tools =[
    new SerpAPI(SERPAPI_KEY,{
        hl:'en',
        gl:'us'
    }),
    new Calculator(),
]

const executor = await initializeAgentExecutorWithOptions(tools, llama, {
    agentType: 'chat-zero-shot-react-description'
});
console.log('initialized')
const ret = await executor.call({
    input: "Who is Olivia Wilde's boyfrient? What is his age raised to the 0.23 power?"
});

console.log('ret:', ret.output);

but I got:

TypeError: this.llm.generatePrompt is not a function
    at LLMChain._call (file:///Users/lvx/dalai/node_modules/langchain/dist/chains/llm_chain.js:80:48)
    at async LLMChain.call (file:///Users/lvx/dalai/node_modules/langchain/dist/chains/base.js:65:28)
    at async LLMChain.predict (file:///Users/lvx/dalai/node_modules/langchain/dist/chains/llm_chain.js:98:24)
    at async ChatAgent._plan (file:///Users/lvx/dalai/node_modules/langchain/dist/agents/agent.js:197:24)
    at async AgentExecutor._call (file:///Users/lvx/dalai/node_modules/langchain/dist/agents/executor.js:82:28)
    at async AgentExecutor.call (file:///Users/lvx/dalai/node_modules/langchain/dist/chains/base.js:65:28)
    at async file:///Users/lvx/dalai/agent.js:35:13

I understand that this is because the LLM model does not have this function. Is there any method to call it, or do I have to create a translation class?

Embeddings.js file does not work correctly

This is the code:

import { LLM } from "llama-node";
import { LLMRS } from "llama-node/dist/llm/llm-rs.js";
import path from "path";
import fs from "fs";
const model = path.resolve(process.cwd(), "../../models/ggml-old-vic13b-q4_2.bin");
const llama = new LLM(LLMRS);
const getWordEmbeddings = async (prompt, file) => {
    const data = await llama.getEmbedding({
        prompt,
        numPredict: 128,
        temperature: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
    });
    console.log(prompt, data);
    await fs.promises.writeFile(path.resolve(process.cwd(), file), JSON.stringify(data));
};
const run = async () => {
    await llama.load({ modelPath: model, modelType: "Llama" /* ModelType.Llama */ });
    const dog1 = `My favourite animal is the dog`;
    await getWordEmbeddings(dog1, "./example/semantic-compare/dog1.json");
    const dog2 = `I have just adopted a cute dog`;
    await getWordEmbeddings(dog2, "./example/semantic-compare/dog2.json");
    const cat1 = `My favourite animal is the cat`;
    await getWordEmbeddings(cat1, "./example/semantic-compare/cat1.json");
};
run();

Output:

thread 'tokio-runtime-worker' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `5120`'

Failed to convert napi value into rust type `bool`

inference.mjs

import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "../ggml-vic7b-uncensored-q5_1.bin");
const llama = new LLM(LLamaCpp);

const config = {
    path: model,
    //modelPath: model, // does not exist in llama-cpp.js; path is used instead
    enableLogging: true,
    nCtx: 1024,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0
};

params passed

{
    nCtx: 1024,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
    nGpuLayers: 0
}
llama-node/node_modules/llama-node/dist/llm/llama-cpp.js:65
this.instance = yield LLama.load(path, rest, enableLogging);
^

Error: Failed to convert napi value into rust type `bool`
    at LLamaCpp.<anonymous> (file:///home/simon/llama-node/node_modules/llama-node/dist/llm/llama-cpp.js:65:35)
    at Generator.next (<anonymous>)
    at file:///home/simon/llama-node/node_modules/llama-node/dist/llm/llama-cpp.js:33:61
    at new Promise (<anonymous>)
    at __async (file:///home/simon/llama-node/node_modules/llama-node/dist/llm/llama-cpp.js:17:10)
    at LLamaCpp.load (file:///home/simon/llama-node/node_modules/llama-node/dist/llm/llama-cpp.js:61:14)
    at LLM.load (/home/simon/llama-node/node_modules/llama-node/dist/index.cjs:52:21)
    at run (file:///home/simon/llama-node/packages/llama-cpp/example-js/inference.mjs:39:17)
    at file:///home/simon/llama-node/packages/llama-cpp/example-js/inference.mjs:45:1
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25) {
  code: 'BooleanExpected'
}

I guess something isn't correct with the params passed. Any chance to know what the correct way to pass them is?

The model used: ggml-vic7b-uncensored-q5_1.bin
