Giter Site home page Giter Site logo

kennethleungty / llama-2-open-source-llm-cpu-inference Goto Github PK

View Code? Open in Web Editor NEW
929.0 13.0 205.0 4.63 MB

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Home Page: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8

License: MIT License

Python 100.00%
cpu cpu-inference deep-learning faiss langchain large-language-models llm machine-learning natural-language-processing nlp

llama-2-open-source-llm-cpu-inference's Introduction

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Clearly explained guide for running quantized open-source LLM applications on CPUs using LLama 2, C Transformers, GGML, and LangChain

Step-by-step guide on TowardsDataScience: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8


Context

  • Third-party commercial large language model (LLM) providers like OpenAI's GPT4 have democratized LLM use via simple API calls.
  • However, there are instances where teams would require self-managed or private model deployment for reasons like data privacy and residency rules.
  • The proliferation of open-source LLMs has opened up a vast range of options for us, thus reducing our reliance on these third-party providers. 
  • When we host open-source LLMs locally on-premise or in the cloud, the dedicated compute capacity becomes a key issue. While GPU instances may seem the obvious choice, the costs can easily skyrocket beyond budget.
  • In this project, we will discover how to run quantized versions of open-source LLMs on local CPU inference for document question-and-answer (Q&A).

    Alt text

Quickstart

  • Ensure you have downloaded the GGML binary file from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML and placed it into the models/ folder
  • To start parsing user queries into the application, launch the terminal from the project directory and run the following command: poetry run python main.py "<user query>"
  • For example, poetry run python main.py "What is the minimum guarantee payable by Adidas?"
  • Note: Omit the prepended poetry run if you are NOT using Poetry

    Alt text

Tools

  • LangChain: Framework for developing applications powered by language models
  • C Transformers: Python bindings for the Transformer models implemented in C/C++ using GGML library
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors.
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model for embedding text to a 384-dimensional dense vector space for tasks like clustering or semantic search.
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.
  • Poetry: Tool for dependency management and Python packaging

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
  • /models: Binary file of GGML quantized LLM model (i.e., Llama-2-7B-Chat)
  • /src: Python codes of key components of LLM application, namely llm.py, utils.py, and prompts.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest dataset and generate FAISS vector store
  • main.py: Main Python script to launch the application and to pass user query via command line
  • pyproject.toml: TOML file to specify which versions of the dependencies used (Poetry)
  • requirements.txt: List of Python dependencies (and version)

References

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.