
dicklesworthstone / swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

Python 98.01% Dockerfile 0.37% Shell 1.62%
embedding-similarity embedding-vectors embeddings llama2 llamacpp semantic-search

swiss_army_llama's People

Contributors: bxxd, dicklesworthstone


swiss_army_llama's Issues

failed to load model

Hello @Dicklesworthstone, thank you for the incredible work; it's exactly what I've been trying to build as well. I'm running it by following the instructions, but I'm hitting an error saying the models can't be found, even though they did get downloaded.

(env_llama) samuelrg@mlserverdsturing:~/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service$ python llama_2_embeddings_fastapi_server.py
/home/samuelrg/.conda/envs/env_llama/lib/python3.9/site-packages/pydantic/_internal/fields.py:127: UserWarning: Field "model_name" has conflict with protected namespace "model".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
2023-08-31 03:48:09,907 - INFO - USE_RAMDISK is set to: False
INFO: Started server process [3494]
INFO: Waiting for application startup.
2023-08-31 03:48:09,966 - INFO - Initializing database, creating tables, and setting SQLite PRAGMAs...
2023-08-31 03:48:09,972 - INFO - Executed SQLite PRAGMA: PRAGMA journal_mode=WAL;
2023-08-31 03:48:09,972 - INFO - Justification: Set SQLite to use Write-Ahead Logging (WAL) mode (from default DELETE mode) so that reads and writes can occur simultaneously
2023-08-31 03:48:09,973 - INFO - Executed SQLite PRAGMA: PRAGMA synchronous = NORMAL;
2023-08-31 03:48:09,973 - INFO - Justification: Set synchronous mode to NORMAL (from FULL) so that writes are not blocked by reads
2023-08-31 03:48:09,974 - INFO - Executed SQLite PRAGMA: PRAGMA cache_size = -1048576;
2023-08-31 03:48:09,974 - INFO - Justification: Set cache size to 1GB (from default 2MB) so that more data can be cached in memory and not read from disk; to make this 256MB, set it to -262144 instead
2023-08-31 03:48:09,975 - INFO - Executed SQLite PRAGMA: PRAGMA busy_timeout = 2000;
2023-08-31 03:48:09,976 - INFO - Justification: Increase the busy timeout to 2 seconds so that the database waits
2023-08-31 03:48:09,977 - INFO - Executed SQLite PRAGMA: PRAGMA wal_autocheckpoint = 100;
2023-08-31 03:48:09,977 - INFO - Justification: Set the WAL autocheckpoint to 100 (from default 1000) so that the WAL file is checkpointed more frequently
2023-08-31 03:48:09,984 - INFO - Database initialization completed.
2023-08-31 03:48:09,984 - INFO - Initializing process of creating set of input hash/model_name combinations that are either currently being processed or have already been processed...
2023-08-31 03:48:10,025 - INFO - Checking models directory...
2023-08-31 03:48:10,025 - INFO - Models directory exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models
2023-08-31 03:48:10,025 - INFO - File already exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin
2023-08-31 03:48:10,025 - INFO - File already exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin
2023-08-31 03:48:10,026 - INFO - File already exists: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/ggml-model-f32.bin
2023-08-31 03:48:10,026 - INFO - Model downloads completed.
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin

llama_load_model_from_file: failed to load model
2023-08-31 03:48:10,027 - ERROR - Exception occurred while loading the model: 1 validation error for LlamaCppEmbeddings
root
Could not load Llama model from path: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin. Received error (type=value_error)
2023-08-31 03:48:10,028 - ERROR - No model file found matching: llama2_7b_chat_uncensored.ggmlv3.q3_K_L.bin
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin

llama_load_model_from_file: failed to load model
2023-08-31 03:48:10,029 - ERROR - Exception occurred while loading the model: 1 validation error for LlamaCppEmbeddings
root
Could not load Llama model from path: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin. Received error (type=value_error)
2023-08-31 03:48:10,029 - ERROR - No model file found matching: wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_L.bin
gguf_init_from_file: invalid magic number 67676d6c
error loading model: llama_model_loader: failed to load model from /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/ggml-model-f32.bin

llama_load_model_from_file: failed to load model
2023-08-31 03:48:10,030 - ERROR - Exception occurred while loading the model: 1 validation error for LlamaCppEmbeddings
root
Could not load Llama model from path: /home/samuelrg/TEXT_ANALYSIS/API_LLAMA_EMBEDDING/llama_embeddings_fastapi_service/models/ggml-model-f32.bin. Received error (type=value_error)
2023-08-31 03:48:10,030 - ERROR - No model file found matching: ggml-model-f32.bin
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8089/ (Press CTRL+C to quit)
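The "invalid magic number" lines point to a container-format mismatch rather than missing files: 0x67676a74 and 0x67676d6c are the legacy GGML magics ('ggjt' and 'ggml'), while the `gguf_init_from_file` function in the error shows that this llama.cpp build only loads the newer GGUF format. The usual fixes are to download GGUF versions of the models or convert the .bin files. As a minimal sketch (the magic constants are the ones visible in the log plus GGUF's documented magic; the helper name is illustrative), you can check which container a model file uses by reading its first four bytes the same way llama.cpp does:

```python
import struct

GGUF_MAGIC = 0x46554747  # b"GGUF" read as a little-endian uint32
LEGACY_GGML_MAGICS = {0x67676d6c, 0x67676a74, 0x67676d66}  # 'ggml', 'ggjt', 'ggmf'

def model_container_format(path: str) -> str:
    # Read the 4-byte magic as a little-endian uint32, matching the value
    # llama.cpp prints in "invalid magic number ..." messages.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == GGUF_MAGIC:
        return "gguf"         # loadable by current llama.cpp builds
    if magic in LEGACY_GGML_MAGICS:
        return "legacy-ggml"  # needs conversion to GGUF, or an older llama-cpp-python
    return "unknown"
```

If the files report "legacy-ggml", either pin an older llama-cpp-python release that still reads GGML, or switch the configured model URLs to GGUF builds.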

disable cache

Hi,
[Screenshot from 2023-09-15 20-08-21]
I am trying to see whether llama embeddings are date-aware. The SBERT models obviously are not, but llama chat can derive absolute dates from relative plus absolute dates. This gave me hope, so I wanted to give the llama embedding models a try.
From the look of things my question is cached, and the returned result is not what I expected. May I ask if you have any insight on this?
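The caching behavior described here matches the startup log above, which mentions building a set of "input hash/model_name combinations": the service appears to key stored embeddings on a hash of the input text plus the model name, so resubmitting identical text returns the stored vector instead of recomputing it. A minimal sketch of that kind of cache (all names here are hypothetical, not the service's actual internals):

```python
import hashlib

def cache_key(text: str, model_name: str) -> str:
    # Identical (text, model) pairs map to the same key, so a repeated
    # request is served from the store instead of re-running the model.
    return hashlib.sha3_256(f"{model_name}:{text}".encode("utf-8")).hexdigest()

store = {}

def get_embedding(text, model_name, compute):
    key = cache_key(text, model_name)
    if key not in store:   # only compute on a cache miss
        store[key] = compute(text)
    return store[key]
```

With a scheme like this, the only ways to force a fresh result are to change the input text or model name, or to clear the underlying store (here the database created at startup).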

Install Error in venv

I'm getting the following error when running pip install -r requirements.txt

Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [26 lines of output]
*** scikit-build-core 0.9.2 using CMake 3.29.2 (wheel)
*** Configuring CMake...
2024-04-23 17:40:22,008 - scikit_build_core - WARNING - libdir/ldlibrary: /usr/lib/x86_64-linux-gnu/libpython3.10.so is not a real file!
2024-04-23 17:40:22,008 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/usr/lib/x86_64-linux-gnu, ldlibrary=libpython3.10.so, multiarch=x86_64-linux-gnu, masd=x86_64-linux-gnu
loading initial cache file /tmp/tmpxydgod13/build/CMakeInit.txt
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:3 (project):
No CMAKE_C_COMPILER could be found.

    Tell CMake where to find the compiler by setting either the environment
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
    the compiler, or to the compiler name if it is in the PATH.
  
  
  CMake Error at CMakeLists.txt:3 (project):
    No CMAKE_CXX_COMPILER could be found.
  
    Tell CMake where to find the compiler by setting either the environment
    variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
    to the compiler, or to the compiler name if it is in the PATH.
  
  
  -- Configuring incomplete, errors occurred!
  
  *** CMake configuration failed
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
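The key lines in the CMake output are "The C compiler identification is unknown" and "No CMAKE_C_COMPILER could be found": the llama-cpp-python wheel is built from source and the system has no C/C++ toolchain. Installing one before retrying usually resolves this; the commands below assume a Debian/Ubuntu system (use the equivalent package manager elsewhere).

```shell
# Install a C/C++ toolchain and CMake (provides gcc, g++, make).
sudo apt-get update
sudo apt-get install -y build-essential cmake

# Then retry inside the venv:
pip install -r requirements.txt
```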

Small question: average hidden states for sentence embedding or last token embedding?

Hi, thanks for the great work!
I checked the code but couldn't find where the sentence embedding is extracted from the llama (or llama2) model.
I'm curious about:
Q1. How do you extract the sentence embedding?
Q2. Do you take the average of the token embeddings or the last token's embedding in the process?

To extract the sentence embedding, I'm currently using:
output = model(**inputs, output_hidden_states=True)
sentence_embedding = output.hidden_states[-1].mean(dim=1)  # or: output.hidden_states[-1][:, -1, :]
I don't know the difference between these two.
I'd appreciate if you can share some knowledge on it!
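To make the difference between the two pooling strategies concrete, here is a small numpy sketch using a random stand-in for the final-layer hidden states (shape [batch, seq_len, hidden]): mean pooling averages every token's vector, ideally weighted by the attention mask so padding is ignored, while last-token pooling keeps only the final real position.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 5, 8
hidden_states = rng.standard_normal((batch, seq_len, hidden))  # stand-in for output.hidden_states[-1]
attention_mask = np.array([[1, 1, 1, 0, 0],    # first sequence: 3 real tokens, 2 padding
                           [1, 1, 1, 1, 1]])   # second sequence: no padding

# Masked mean pooling: average only over real tokens, not padding.
mask = attention_mask[:, :, None]                        # [batch, seq_len, 1]
mean_pooled = (hidden_states * mask).sum(axis=1) / mask.sum(axis=1)

# Last-token pooling: take the hidden state of the last *real* token.
last_idx = attention_mask.sum(axis=1) - 1                # index of final non-pad token
last_token = hidden_states[np.arange(batch), last_idx]

print(mean_pooled.shape, last_token.shape)               # both (2, 8)
```

For decoder-only models like llama, only the last real token has attended to the entire sentence, which is why last-token pooling is a common choice for causal LMs; masked mean pooling is also widely used and tends to be more robust to the final token being uninformative. Which one this repo uses is best confirmed from its source.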

It requires 40G of RAM to work?

I'm on a macbook air m2 16GB.

I tried running it locally with Python, following this line from the README:
"To run it natively (not using Docker) in a Python venv, you can use these commands:"

A few times it failed with missing packages like greenlet, but the error it gives now concerns RAM allocation.

The traceback ends with:

raise ValueError(f"Cannot allocate {RAMDISK_SIZE_IN_GB}G for RAM Disk. Total system RAM is {total_ram_gb:.2f}G.")
ValueError: Cannot allocate 40G for RAM Disk. Total system RAM is 16.00G.
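This is a RAM-disk pre-check, not a hard requirement of the models themselves: the startup log in the first issue shows a USE_RAMDISK setting, and the traceback shows the service refusing to reserve RAMDISK_SIZE_IN_GB (apparently defaulting to 40 GB) when total RAM is smaller. Disabling the RAM disk or shrinking its size in the service's configuration should skip the check. A sketch of the guard (the setting names come from the logs above, but how the real service loads them is an assumption):

```python
import os

def check_ramdisk(total_ram_gb: float) -> None:
    # Hypothetical reconstruction of the guard in the traceback.
    use_ramdisk = os.environ.get("USE_RAMDISK", "False").lower() == "true"
    size_gb = int(os.environ.get("RAMDISK_SIZE_IN_GB", "40"))
    if use_ramdisk and size_gb > total_ram_gb:
        raise ValueError(
            f"Cannot allocate {size_gb}G for RAM Disk. "
            f"Total system RAM is {total_ram_gb:.2f}G."
        )
```

On a 16 GB machine, leaving USE_RAMDISK disabled (the first issue's log shows "USE_RAMDISK is set to: False") makes the check a no-op; alternatively, a RAMDISK_SIZE_IN_GB well under 16 should pass.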
