xyzhang626 / embeddings.cpp

This project forked from skeskinen/bert.cpp

ggml implementation of embedding models including SentenceTransformer and BGE

License: MIT License

Shell 0.78% C++ 66.94% Python 14.17% C 11.34% CMake 6.77%

embeddings.cpp's Issues

Illegal instruction (core dumped)

I'm following the examples from the readme.
When I run:

python download-repo.py BAAI/bge-base-en-v1.5 # or any other model
sh run_conversions.sh bge-base-en-v1.5

It fails at the quantization step:

[...]
Done. Output file: bge-base-en-v1.5/ggml-model-f16.bin

Illegal instruction (core dumped)
Illegal instruction (core dumped)

Is there something I can do to make it work?

Cuda?

Do you have any plans to support the other backends that llama.cpp supports, such as CUDA, so that inference can be accelerated?

tokenizer.json seems ignored in the model conversion

At line 34 of convert-to-ggml.py, tokenizer.json appears to be ignored when the model is converted to ggml.

Tokenizer works inconsistently but better for bge zh series model

Summary

This repo's tokenizer matches Hugging Face's tokenizer in most cases. For the bge zh series models it produces different output, but the output is possibly better.

Details

The bge-small-zh-v1.5 tokenizer handles two kinds of input badly: 1) words with capital letters and 2) accented letters. This may be caused by its normalization settings.

For example, given 大家好我是GPT, the hf tokenizer (left column) cannot recognize the uppercase GPT, but the tokenizer in this repo (right column) can.

The accent case is similar.

If you find any more differences between the tokenizer in this repo and the Hugging Face one, please let me know and I will try to fix them.
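
To make the capital-letter case concrete, here is a minimal sketch. It assumes the difference comes from whether the text is lowercased before vocabulary lookup, which this issue has not confirmed. `lowercase_ascii` is a hypothetical helper written for illustration, not a function from this repo or from Hugging Face.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not taken from the repo): lowercase only the ASCII
// letters in a UTF-8 string. Bytes >= 0x80 belong to multi-byte sequences
// (for example CJK characters) and are copied through untouched.
std::string lowercase_ascii(const std::string &text) {
    std::string out = text;
    for (char &c : out) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b >= 'A' && b <= 'Z') {
            c = static_cast<char>(b - 'A' + 'a');
        }
    }
    return out;
}
```

With this step, 大家好我是GPT becomes 大家好我是gpt, so a vocabulary that only holds lowercase entries can still match the Latin part. Accent stripping needs Unicode decomposition and is not covered by this sketch.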

bert_encode() not thread-safe

In server.cpp, bert_encode does not appear to be thread-safe: different invocations work in the same memory buffer in bert_context.

while(true) {
    std::string string_in = receive_string(new_socket);
    if (string_in.empty()) {
        break;
    }
    std::vector<float> embeddings = std::vector<float>(n_embd);
    bert_encode(bctx, params.n_threads, string_in.data(), embeddings.data());
    send_floats(new_socket, embeddings);
}
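
One possible workaround, sketched under assumptions, is to serialize calls with a mutex so that only one invocation touches the shared buffer at a time. `demo_context` and `demo_encode_locked` below are hypothetical stand-ins for bert_context and bert_encode; the real types and signatures are not reproduced here, and the "encoding" is a toy computation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for bert_context: one scratch buffer shared by
// every call, plus a mutex that guards it.
struct demo_context {
    std::vector<float> scratch;  // shared working memory
    std::mutex mtx;              // guards scratch
};

// Hypothetical stand-in for bert_encode. It works in the shared scratch
// buffer and then copies the result out. The lock_guard ensures that
// concurrent callers cannot overwrite each other's scratch data.
void demo_encode_locked(demo_context &ctx, const std::string &text,
                        std::vector<float> &out) {
    assert(!out.empty());  // caller sizes the output, as in server.cpp
    std::lock_guard<std::mutex> lock(ctx.mtx);  // one caller at a time
    ctx.scratch.assign(out.size(), 0.0f);
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Toy "encoding": accumulate byte values into the buffer.
        ctx.scratch[i % ctx.scratch.size()] +=
            static_cast<float>(static_cast<unsigned char>(text[i]));
    }
    out = ctx.scratch;  // copy out while still holding the lock
}
```

A server that handles each connection on its own thread could call a wrapper like this instead of calling the encoder directly. The trade-off is that requests are processed one at a time; giving each thread its own context would avoid the lock at the cost of more memory.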
