pernillebrams / llm_mini_series_part_ii

This project forked from positivethinkingcomp/llm_mini_series_part_ii

This is a demo repository for parallel multi-index question answering using Streamlit and LlamaIndex.

License: Apache License 2.0

Python 99.19% CSS 0.81%

Introduction

This repo contains a demo Streamlit application for parallel question answering across multiple PDF documents. The functionality is enabled by LlamaIndex and Large Language Models (LLMs). The code is tested with text-davinci-003 from OpenAI, using the configuration in app_config.yaml. Any other OpenAI model can be set in app_config.yaml instead.
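The idea of asking the same question against several per-document indices at once can be sketched as follows. This is only an illustration, not the repository's actual implementation: `query_index` is a hypothetical stand-in for a real index query, the file names are invented, and the use of a thread pool is an assumption about how the parallelism could be done.

```python
from concurrent.futures import ThreadPoolExecutor


def query_index(doc_name: str, question: str) -> str:
    """Hypothetical stand-in for querying one per-document index.

    In a real app this would call an index query engine backed by an LLM.
    """
    return f"[{doc_name}] answer to: {question}"


def ask_all(doc_names: list[str], question: str, max_workers: int = 4) -> dict[str, str]:
    """Send the same question to every document index concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so answers line up with doc_names.
        answers = pool.map(lambda name: query_index(name, question), doc_names)
        return dict(zip(doc_names, answers))


if __name__ == "__main__":
    # Invented example file names, for illustration only.
    results = ask_all(["cv_anna.pdf", "cv_ben.pdf"], "Which skills are listed?")
    for doc, answer in results.items():
        print(doc, "->", answer)
```

Threads suit this kind of workload because each query mostly waits on a network call to the LLM API rather than using the CPU.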

General

This repo provides some example PDF documents for indexing, which were generated via ChatGPT.

!!! Please note that it is not legally compliant to send personally identifiable information to LLM APIs like OpenAI. Make sure to test the app only with fictional CV documents, anonymize the CV documents, or execute queries against a locally deployed LLM instead of using OpenAI. PTC takes no legal responsibility for what data you send to OpenAI via this application !!!
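As a rough illustration of what a first anonymization pass over extracted CV text might look like, the sketch below masks email addresses and phone-like number sequences with regular expressions. It is not part of this repository, the patterns are assumptions chosen for the example, and it does not handle names, addresses, or other identifiers, so it is no substitute for proper anonymization or legal review.

```python
import re

# Illustrative patterns only: they are deliberately crude and will both
# miss some identifiers and over-match (e.g. year ranges like 2015-2020).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like sequences with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    # Fictional contact details, for illustration only.
    print(redact("Contact: jane.doe@example.com, +49 170 1234567"))
```

Using fictional documents or a locally deployed model, as recommended above, remains the safer option.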

Getting Started

  1. Download the ESCO dataset version 1.1.0 (Link to ESCO Download) with these options:
  • Version: ESCO dataset - v1.1.0
  • Content: classification
  • Language: en
  • File type: csv

1.1. Unzip the .csv file you receive via email and set its path as an environment variable

  2. Set up your environment variables, e.g. in an .env file

OPENAI_API_KEY = "here comes your openai api key" (example)

ESCO_NER_SEARCHTERMS= "your_path_to_esco_searchterms_skill_ner_csv/ESCO dataset - v1.1.0 - classification - en - csv/searchterms_skill_ner.csv" (example)

  3. Install Poetry (Link to Poetry CLI installation tutorial)

  4. Create a Poetry environment and install the package

Link to Poetry Environment Management

  • In your terminal confirm that poetry is available:

poetry --version

  • Start a Poetry shell:

poetry shell

  • Install the package and dependencies via the pyproject.toml:

poetry install

If you cannot use Poetry for dependency management, you can alternatively install the requirements via pip:

pip install -r requirements.txt

  5. Launch the Streamlit app from the Poetry shell:
  • streamlit run <your_absolute_path_2_the_app>\multi_index_demo\app.py
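Before launching, the setup steps above can be sanity-checked with a short script. The sketch below is not part of the repository: `check_setup` and `launch_command` are hypothetical helper names, and it assumes the two environment variables from the setup step are already exported in the current shell (it does not parse an .env file itself).

```python
import os
import shutil
from pathlib import Path

# Variable names taken from the setup step above.
REQUIRED_VARS = ("OPENAI_API_KEY", "ESCO_NER_SEARCHTERMS")


def check_setup(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the setup looks fine."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set")
    csv_path = env.get("ESCO_NER_SEARCHTERMS", "").strip()
    if csv_path and not Path(csv_path).is_file():
        problems.append(f"ESCO_NER_SEARCHTERMS does not point to a file: {csv_path}")
    return problems


def launch_command(app_root: Path) -> list[str]:
    """Build the streamlit command with a path that works on any OS."""
    return ["streamlit", "run", str(app_root / "multi_index_demo" / "app.py")]


if __name__ == "__main__":
    for problem in check_setup(dict(os.environ)):
        print("Problem:", problem)
    if shutil.which("streamlit") is None:
        print("Problem: streamlit is not on PATH (is the Poetry environment active?)")
    # Assumes the script is run from the repository root.
    print("Launch with:", " ".join(launch_command(Path.cwd())))
```

Building the path with `pathlib` avoids hard-coding the Windows-style backslashes used in the command above, so the same script works on Linux and macOS.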

Overview of the application architecture

Figures: rag_overview, indexing_stage, multi_index_queries
