
nanika's Introduction

Get branch ollama-rag from Nanika repo

The ollama-rag branch focuses on using ollama and langchain to run RAG (retrieval-augmented generation) over your own documents (see the list of supported file extensions at the end).

cd $HOME
git clone -b ollama-rag --single-branch [email protected]:W-Wuxian/NANIKA.git

Install ollama

After a successful installation, run:

ollama pull nomic-embed-text 
ollama pull phi3
ollama list

nomic-embed-text is mandatory, but phi3 can be replaced with any model name listed at https://ollama.com/library
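
To confirm that both models were pulled before moving on, the output of ollama list can be checked from Python. This is a minimal sketch and not part of the NANIKA repository. It assumes the listing is a whitespace-separated table with one header row and the model name (in name:tag form) in the first column. The helper names are hypothetical.

```python
import subprocess

# Model names taken from the pull commands above; swap phi3 for your own choice.
REQUIRED_MODELS = {"nomic-embed-text", "phi3"}


def installed_models(listing: str) -> set:
    """Extract model names (without the ':tag' suffix) from `ollama list` text."""
    names = set()
    for line in listing.splitlines()[1:]:  # assumption: the first row is a header
        fields = line.split()
        if fields:
            names.add(fields[0].split(":")[0])
    return names


def missing_models(listing: str, required=REQUIRED_MODELS) -> set:
    """Return the required models that do not appear in the listing."""
    return set(required) - installed_models(listing)


def read_ollama_list() -> str:
    """Run `ollama list` and return its stdout (requires ollama on PATH)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling missing_models(read_ollama_list()) should return an empty set once both pulls have succeeded.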

Activate langchain dependencies:

After installing ollama and pulling the models, set up the Python environment as follows:

conda env create -f langchain_rag_env.yml
conda activate langchain_rag_env
pip install "unstructured[all-docs]"
pip install chromadb langchain-text-splitters
conda install conda-forge::pytesseract
conda install conda-forge::tesseract

Alternative using Python venv

python -m venv langchain_rag_venv
source langchain_rag_venv/bin/activate
pip install --upgrade unstructured langchain "unstructured[all-docs]"
pip install --upgrade chromadb langchain-text-splitters
pip install --upgrade pytesseract
pip install --upgrade tesseract

Running the code:

Once ollama and the langchain dependencies are set up (see the previous sections), you can use RAG through the nanika.py Python script.

Creating a database and QA loop

The nanika.py script is used to create a database from your documents as follows:

python nanika.py --help
options or long_options are:
-m or --model_name model name
-e or --embedding_name embedding name
-i or --inputdocs_path  path given between " " to folders or files to be used at RAG step
-v or --vdb_path vector data base path
-c or --collection_name collection name
-r or --reuse reuse previous vdb and collection
-d or --display-doc whether or not to display given documents
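
For readers who want to see how such an option set fits together, here is a minimal argparse sketch that mirrors the flags listed above. It is not the actual parsing code of nanika.py, which may be implemented differently. The default values and the str_to_bool helper are assumptions made for illustration.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret 'True'/'False' style strings, since the README passes `-r True`."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same short and long option names as the help text."""
    parser = argparse.ArgumentParser(prog="nanika.py")
    parser.add_argument("-m", "--model_name", help="model name")
    parser.add_argument("-e", "--embedding_name", help="embedding name")
    parser.add_argument("-i", "--inputdocs_path", default="",
                        help="space-separated folders or files, given between quotes")
    parser.add_argument("-v", "--vdb_path", help="vector database path")
    parser.add_argument("-c", "--collection_name", help="collection name")
    # Defaults below are assumptions; the real script may choose differently.
    parser.add_argument("-r", "--reuse", type=str_to_bool, default=False,
                        help="reuse previous vdb and collection")
    parser.add_argument("-d", "--display-doc", type=str_to_bool, default=False,
                        help="whether or not to display given documents")
    return parser
```

Parsing the value of -r through a helper avoids the common pitfall of type=bool, which treats any non-empty string (including "False") as true.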

For example, to create a database with phi3 as the LLM and nomic-embed-text as the embedding model from documents spread over two folders and one file, pass all paths to -i as a single space-separated string between double quotes:

python nanika.py -m phi3 -e nomic-embed-text -i "/path/to/my/folder1 /path/to/my/folder2 /path/to/my/file1"
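
The quoted -i value holds several paths in one string, so it has to be split and expanded into individual files before loading. The sketch below shows one way to do that with the standard library. It is an illustration only: whether nanika.py recurses into subfolders or silently skips missing paths is an assumption, and expand_inputs is a hypothetical name.

```python
import shlex
from pathlib import Path


def expand_inputs(inputdocs_path: str) -> list:
    """Split a space-separated path string and list every file it refers to."""
    files = []
    for raw in shlex.split(inputdocs_path):
        path = Path(raw).expanduser()
        if path.is_dir():
            # Assumption: folders are walked recursively, in sorted order.
            files.extend(sorted(p for p in path.rglob("*") if p.is_file()))
        elif path.is_file():
            files.append(path)
        # Paths that do not exist are skipped in this sketch.
    return files
```

Using shlex.split rather than str.split keeps paths with escaped or quoted spaces intact on POSIX systems.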

To keep several databases side by side, specify the storage location of each one with -v and its collection name with -c, as follows:

python nanika.py -m phi3 -e nomic-embed-text -i /path/to/my/folder1/ -v ./database1 -c collection1
python nanika.py -m phi3 -e nomic-embed-text -i /path/to/my/folder2/ -v ./database2 -c collection2

After building the database, the nanika.py script prompts you for questions (the RAG step). To end this phase, enter q or quit.
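
The question phase can be pictured as a simple read-answer loop that stops on q or quit. The following sketch shows that control flow under stated assumptions: answer stands in for whatever retrieval chain the script builds, and the prompt text, the handling of empty input, and the function name are all hypothetical.

```python
from typing import Callable

EXIT_WORDS = {"q", "quit"}


def qa_loop(answer: Callable[[str], str],
            ask: Callable[[str], str] = input,
            show: Callable[[str], None] = print) -> int:
    """Ask for questions until the user types q or quit; return how many were answered."""
    answered = 0
    while True:
        question = ask("Question (q or quit to stop): ").strip()
        if question.lower() in EXIT_WORDS:
            return answered
        if not question:
            continue  # ignore empty lines instead of querying the model
        show(answer(question))
        answered += 1
```

Passing ask and show as parameters keeps the loop testable without a terminal or a running model.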

Reusing a database and QA loop

To reuse a database, pass the same -v and -c values that were used to create it and run the nanika.py script with -r True, as follows:

python nanika.py -m phi3 -e nomic-embed-text -v ./database1 -c collection1 -r True
python nanika.py -m phi3 -e nomic-embed-text -v ./database2 -c collection2 -r True
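
Because reuse depends on a matching -v and -c pair, it can help to validate both before starting the question loop. The check below is a hypothetical pre-flight sketch, not behaviour confirmed for nanika.py. It assumes only that a previously created database leaves a non-empty directory at the -v location.

```python
from pathlib import Path


def check_reuse(vdb_path, collection_name):
    """Raise ValueError when the arguments cannot identify an existing database."""
    if not vdb_path or not collection_name:
        raise ValueError(
            "reuse needs both a database path (-v) and a collection name (-c)"
        )
    directory = Path(vdb_path)
    # Assumption: an existing database is a directory containing at least one entry.
    if not directory.is_dir() or not any(directory.iterdir()):
        raise ValueError(f"no stored database found at {directory}")
    return directory
```

This check cannot tell whether the named collection actually exists inside the database. That would require querying the vector store itself.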

File extension coverage

file extension Coverage
pdf ✔️
txt ✔️
py ✔️
png ✔️
jpg ✔️
xlsx ✔️
xls ✔️
odt ✔️
csv ✔️
pptx ✔️
md ✔️
org ✔️
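
The table above can be turned into a small filter that separates supported files from the rest before they are handed to the loader. This is a sketch built from the listed extensions only. Treating the match as case-insensitive is an assumption, and the function names are hypothetical.

```python
from pathlib import Path

# Extensions copied from the coverage table above.
SUPPORTED_EXTENSIONS = {
    "pdf", "txt", "py", "png", "jpg", "xlsx",
    "xls", "odt", "csv", "pptx", "md", "org",
}


def is_supported(path) -> bool:
    """Return True when the file extension appears in the coverage table."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Partition paths into (supported, skipped), preserving input order."""
    supported, skipped = [], []
    for path in paths:
        (supported if is_supported(path) else skipped).append(path)
    return supported, skipped
```

Reporting the skipped list to the user makes it obvious why a given document never shows up in answers.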
