
dicklesworthstone / swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

Python 98.01% Dockerfile 0.37% Shell 1.62%
embedding-similarity embedding-vectors embeddings llama2 llamacpp semantic-search

swiss_army_llama's People

Contributors: bxxd, dicklesworthstone


swiss_army_llama's Issues

failed to load model

Hello @Dicklesworthstone, thank you for the incredible work; it's exactly what I've been trying to build as well. I'm running it by following the instructions, but I'm hitting an error saying the models can't be found, even though they did get downloaded.

(env_llama) samuelrg@mlserverdsturing:~/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service$ python llama_2_embeddings_fastapi_server.py
/home/samuelrg/.conda/envs/env_llama/lib/python3.9/site-packages/pydantic/_internal/fields.py:127: UserWarning: Field "model_name" has conflict with protected namespace "model".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
2023-08-31 03:48:09,907 - INFO - USE_RAMDISK is set to: False
INFO: Started server process [3494]
INFO: Waiting for application startup.
2023-08-31 03:48:09,966 - INFO - Initializing database, creating tables, and setting SQLite PRAGMAs...
2023-08-31 03:48:09,972 - INFO - Executed SQLite PRAGMA: PRAGMA journal_mode=WAL;
2023-08-31 03:48:09,972 - INFO - Justification: Set SQLite to use Write-Ahead Logging (WAL) mode (from default DELETE mode) so that reads and writes can occur simultaneously
2023-08-31 03:48:09,973 - INFO - Executed SQLite PRAGMA: PRAGMA synchronous = NORMAL;
2023-08-31 03:48:09,973 - INFO - Justification: Set synchronous mode to NORMAL (from FULL) so that writes are not blocked by reads
2023-08-31 03:48:09,974 - INFO - Executed SQLite PRAGMA: PRAGMA cache_size = -1048576;
2023-08-31 03:48:09,974 - INFO - Justification: Set cache size to 1GB (from default 2MB) so that more data can be cached in memory and not read from disk; to make this 256MB, set it to -262144 instead
2023-08-31 03:48:09,975 - INFO - Executed SQLite PRAGMA: PRAGMA busy_timeout = 2000;
2023-08-31 03:48:09,976 - INFO - Justification: Increase the busy timeout to 2 seconds so that the database waits
2023-08-31 03:48:09,977 - INFO - Executed SQLite PRAGMA: PRAGMA wal_autocheckpoint = 100;
2023-08-31 03:48:09,977 - INFO - Justification: Set the WAL autocheckpoint to 100 (from default 1000) so that the WAL file is checkpointed more frequently
2023-08-31 03:48:09,984 - INFO - Database initialization completed.
2023-08-31 03:48:09,984 - INFO - Initializing process of creating set of input hash/model_name combinations that are either currently being processed or have already been processed...
2023-08-31 03:48:10,025 - INFO - Checking models directory...
2023-08-31 03:48:10,025 - INFO - Models directory exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models
2023-08-31 03:48:10,025 - INFO - File already exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin
2023-08-31 03:48:10,025 - INFO - File already exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin
2023-08-31 03:48:10,026 - INFO - File already exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/ggml-model-f32.bin
2023-08-31 03:48:10,026 - INFO - Model downloads completed.
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin

llama_load_model_from_file: failed to load model
2023-08-31 03:48:10,027 - ERROR - Exception occurred while loading the model: 1 validation error for LlamaCppEmbeddings
root
Could not load Llama model from path: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin. Received error (type=value_error)
2023-08-31 03:48:10,028 - ERROR - No model file found matching: llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin

llama_load_model_from_file: failed to load model
2023-08-31 03:48:10,029 - ERROR - Exception occurred while loading the model: 1 validation error for LlamaCppEmbeddings
root
Could not load Llama model from path: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin. Received error (type=value_error)
2023-08-31 03:48:10,029 - ERROR - No model file found matching: wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin
gguf_init_from_file: invalid magic number 67676d6c
error loading model: llama_model_loader: failed to load model from /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/ggml-model-f32.bin

llama_load_model_from_file: failed to load model
2023-08-31 03:48:10,030 - ERROR - Exception occurred while loading the model: 1 validation error for LlamaCppEmbeddings
root
Could not load Llama model from path: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/ggml-model-f32.bin. Received error (type=value_error)
2023-08-31 03:48:10,030 - ERROR - No model file found matching: ggml-model-f32.bin
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8089/ (Press CTRL+C to quit)
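The "invalid magic number" lines point to a container-format mismatch rather than missing files: 0x67676a74 and 0x67676d6c are the legacy GGML magics ('ggjt' and 'ggml'), while the `gguf_init_from_file` function in the error shows that this llama.cpp build only loads the newer GGUF format. The usual fixes are to download GGUF versions of the models or convert the .bin files. As a minimal sketch (the magic constants are the ones visible in the log plus GGUF's documented magic; the helper name is illustrative), you can check which container a model file uses by reading its first four bytes the same way llama.cpp does:

```python
import struct

GGUF_MAGIC = 0x46554747  # b"GGUF" read as a little-endian uint32
LEGACY_GGML_MAGICS = {0x67676d6c, 0x67676a74, 0x67676d66}  # 'ggml', 'ggjt', 'ggmf'

def model_container_format(path: str) -> str:
    # Read the 4-byte magic as a little-endian uint32, matching the value
    # llama.cpp prints in "invalid magic number ..." messages.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == GGUF_MAGIC:
        return "gguf"         # loadable by current llama.cpp builds
    if magic in LEGACY_GGML_MAGICS:
        return "legacy-ggml"  # needs conversion to GGUF, or an older llama-cpp-python
    return "unknown"
```

If the files report "legacy-ggml", either pin an older llama-cpp-python release that still reads GGML, or switch the configured model URLs to GGUF builds.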

disable cache

Hi,
[Screenshot from 2023-09-15 20-08-21]
I am trying to see whether llama embeddings are date-aware. The SBERT models obviously are not, but llama chat can derive absolute dates from relative plus absolute dates. This gave me hope, so I wanted to give the llama embedding models a try.
From the look of things my question is cached, and the returned result is not what I expected. May I ask if you have any insight on this?
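The caching behavior described here matches the startup log above, which mentions building a set of "input hash/model_name combinations": the service appears to key stored embeddings on a hash of the input text plus the model name, so resubmitting identical text returns the stored vector instead of recomputing it. A minimal sketch of that kind of cache (all names here are hypothetical, not the service's actual internals):

```python
import hashlib

def cache_key(text: str, model_name: str) -> str:
    # Identical (text, model) pairs map to the same key, so a repeated
    # request is served from the store instead of re-running the model.
    return hashlib.sha3_256(f"{model_name}:{text}".encode("utf-8")).hexdigest()

store = {}

def get_embedding(text, model_name, compute):
    key = cache_key(text, model_name)
    if key not in store:   # only compute on a cache miss
        store[key] = compute(text)
    return store[key]
```

With a scheme like this, the only ways to force a fresh result are to change the input text or model name, or to clear the underlying store (here the database created at startup).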

Install Error in venv

I'm getting the following error when running pip install -r requirements.txt

Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [26 lines of output]
*** scikit-build-core 0.9.2 using CMake 3.29.2 (wheel)
*** Configuring CMake...
2024-04-23 17:40:22,008 - scikit_build_core - WARNING - libdir/ldlibrary: /usr/lib/x86_64-linux-gnu/libpython3.10.so is not a real file!
2024-04-23 17:40:22,008 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/usr/lib/x86_64-linux-gnu, ldlibrary=libpython3.10.so, multiarch=x86_64-linux-gnu, masd=x86_64-linux-gnu
loading initial cache file /tmp/tmpxydgod13/build/CMakeInit.txt
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:3 (project):
No CMAKE_C_COMPILER could be found.

    Tell CMake where to find the compiler by setting either the environment
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
    the compiler, or to the compiler name if it is in the PATH.
  
  
  CMake Error at CMakeLists.txt:3 (project):
    No CMAKE_CXX_COMPILER could be found.
  
    Tell CMake where to find the compiler by setting either the environment
    variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
    to the compiler, or to the compiler name if it is in the PATH.
  
  
  -- Configuring incomplete, errors occurred!
  
  *** CMake configuration failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
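The key lines in the CMake output are "The C compiler identification is unknown" and "No CMAKE_C_COMPILER could be found": the llama-cpp-python wheel is built from source and the system has no C/C++ toolchain. Installing one before retrying usually resolves this; the commands below assume a Debian/Ubuntu system (use the equivalent package manager elsewhere).

```shell
# Install a C/C++ toolchain and CMake (provides gcc, g++, make).
sudo apt-get update
sudo apt-get install -y build-essential cmake

# Then retry inside the venv:
pip install -r requirements.txt
```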

Small question: average hidden states for sentence embedding or last token embedding?

Hi, thanks for the great work!
I checked the code but couldn't find where the sentence embedding is extracted from the llama (or llama2) model.
I'm curious about:
Q1. How do you extract the sentence embedding?
Q2. Do you take the average of the token embeddings or the last token's embedding in the process?

To extract the sentence embedding, I'm currently using:
output = model(**inputs, output_hidden_states=True)
sentence_embedding = output.hidden_states[-1].mean(dim=1)  # or: output.hidden_states[-1][:, -1, :]
I don't know the difference between these two.
I'd appreciate if you can share some knowledge on it!
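To make the difference between the two pooling strategies concrete, here is a small numpy sketch using a random stand-in for the final-layer hidden states (shape [batch, seq_len, hidden]): mean pooling averages every token's vector, ideally weighted by the attention mask so padding is ignored, while last-token pooling keeps only the final real position.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 5, 8
hidden_states = rng.standard_normal((batch, seq_len, hidden))  # stand-in for output.hidden_states[-1]
attention_mask = np.array([[1, 1, 1, 0, 0],    # first sequence: 3 real tokens, 2 padding
                           [1, 1, 1, 1, 1]])   # second sequence: no padding

# Masked mean pooling: average only over real tokens, not padding.
mask = attention_mask[:, :, None]                        # [batch, seq_len, 1]
mean_pooled = (hidden_states * mask).sum(axis=1) / mask.sum(axis=1)

# Last-token pooling: take the hidden state of the last *real* token.
last_idx = attention_mask.sum(axis=1) - 1                # index of final non-pad token
last_token = hidden_states[np.arange(batch), last_idx]

print(mean_pooled.shape, last_token.shape)               # both (2, 8)
```

For decoder-only models like llama, only the last real token has attended to the entire sentence, which is why last-token pooling is a common choice for causal LMs; masked mean pooling is also widely used and tends to be more robust to the final token being uninformative. Which one this repo uses is best confirmed from its source.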

It requires 40G of RAM to work?

I'm on a macbook air m2 16GB.

I tried running it locally with Python, following this line from the README:
"To run it natively (not using Docker) in a Python venv, you can use these commands:"

A few times it failed with missing packages like greenlet, but the error it gives now concerns RAM allocation.

The traceback ends with:

raise ValueError(f"Cannot allocate {RAMDISK_SIZE_IN_GB}G for RAM Disk. Total system RAM is {total_ram_gb:.2f}G.")
ValueError: Cannot allocate 40G for RAM Disk. Total system RAM is 16.00G.
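This is a RAM-disk pre-check, not a hard requirement of the models themselves: the startup log in the first issue shows a USE_RAMDISK setting, and the traceback shows the service refusing to reserve RAMDISK_SIZE_IN_GB (apparently defaulting to 40 GB) when total RAM is smaller. Disabling the RAM disk or shrinking its size in the service's configuration should skip the check. A sketch of the guard (the setting names come from the logs above, but how the real service loads them is an assumption):

```python
import os

def check_ramdisk(total_ram_gb: float) -> None:
    # Hypothetical reconstruction of the guard in the traceback.
    use_ramdisk = os.environ.get("USE_RAMDISK", "False").lower() == "true"
    size_gb = int(os.environ.get("RAMDISK_SIZE_IN_GB", "40"))
    if use_ramdisk and size_gb > total_ram_gb:
        raise ValueError(
            f"Cannot allocate {size_gb}G for RAM Disk. "
            f"Total system RAM is {total_ram_gb:.2f}G."
        )
```

On a 16 GB machine, leaving USE_RAMDISK disabled (the first issue's log shows "USE_RAMDISK is set to: False") makes the check a no-op; alternatively, a RAMDISK_SIZE_IN_GB well under 16 should pass.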
