openai-app-poc

A Python Flask app that uses LlamaIndex and LangChain to build a vector index and query an LLM (Azure OpenAI).

how-to

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt --upgrade

In VS Code you might also need to press Ctrl+Shift+P, type Python: Create Environment..., and follow the instructions.

To run the application, use flask --debug --app app run --port=5001

sending a query

Only the body.query parameter is mandatory; the other fields are optional and have reasonable defaults.

curl --location 'http://127.0.0.1:5001/query' \
--header 'Content-Type: application/json' \
--data '{
    "query": "What is the ITSM training mailbox email address? I want the email address with the ampersand in it.",
    "temp": 0.7,
    "k": 3
}'
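The same request can be sent from Python. The following is a minimal sketch using only the standard library. The helper names build_query_payload and send_query are hypothetical, the default URL assumes the app runs locally on the port passed to flask run (5001 in the command above), and it assumes the endpoint replies with JSON.

```python
import json
import urllib.request


def build_query_payload(query, temp=None, k=None):
    """Build the JSON body for /query; only `query` is mandatory."""
    payload = {"query": query}
    # Optional fields are left out so the server falls back to its defaults.
    if temp is not None:
        payload["temp"] = temp
    if k is not None:
        payload["k"] = k
    return payload


def send_query(payload, url="http://127.0.0.1:5001/query"):
    """POST the payload as JSON and return the decoded reply (assumed to be JSON)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, send_query(build_query_payload("What is the ITSM training mailbox email address?", temp=0.7, k=3)) would mirror the curl call.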

build image and run it

docker build -t scdcciodtoopenaipoccontainerregistry.azurecr.io/app:3.0.2 .
docker push scdcciodtoopenaipoccontainerregistry.azurecr.io/app:3.0.2
# then you can run it via
docker run -it --env-file .env scdcciodtoopenaipoccontainerregistry.azurecr.io/app:3.0.2

troubleshooting

If you ever get an error like this one while building/loading the vector index:

2023-04-18T13:22:20.948654585Z     type = docstore_dict[TYPE_KEY]
2023-04-18T13:22:20.948658985Z KeyError: '__type__'

It is most likely because the index wasn't built with the same package versions it is being loaded with. Simply rebuild the index with the most up-to-date version of the packages and reload it.
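One way to catch this mismatch earlier is to record the package version next to the saved index and compare it at load time. The sketch below is not part of the app: the sidecar file name index_version.json and all three helpers are hypothetical, and it assumes the package is installed under the distribution name llama-index.

```python
import json
from importlib import metadata
from pathlib import Path

VERSION_FILE = "index_version.json"  # hypothetical sidecar file name


def installed_version(package="llama-index"):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def write_version(index_dir, version):
    """Record the version used to build the index next to the index files."""
    Path(index_dir).mkdir(parents=True, exist_ok=True)
    Path(index_dir, VERSION_FILE).write_text(json.dumps({"llama_index": version}))


def built_with_same_version(index_dir, version):
    """Return True only if the recorded build version matches `version`."""
    path = Path(index_dir, VERSION_FILE)
    if not path.exists():
        return False  # unknown build version: safest to rebuild
    return json.loads(path.read_text()).get("llama_index") == version
```

Calling write_version(index_dir, installed_version()) after a build and built_with_same_version(index_dir, installed_version()) before a load would turn the KeyError into an explicit "rebuild needed" decision.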

documentation

docker for wsl

I use Windows 10 with WSL 2+. I installed Docker following the instructions on the Docker site.

You will need JIT admin rights. Since the install puts the admin account inside docker-users, you will have to add your own domain account to that group afterwards.

To do so, open PowerShell (as admin):

whoami
net localgroup "docker-users" "<your username>" /add

You should see something like Command completed successfully. Then log out and back in, and you can start Docker Desktop.

Using Azure Cognitive Search

https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/app.py

building a chatbot

https://gpt-index.readthedocs.io/en/latest/guides/tutorials/building_a_chatbot.html

Installing Python 3.11 on Ubuntu 22.04

https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11 python3.11-venv
python3 -V

openai-app-poc's Issues

add logging (to a file)

We need to start logging requests (at least for the /query endpoint for now).

Info we need to track and log to a file for each request:

  • UserID (ideally we need to get it from Bot Composer, need to do research for that one)
  • query/answer/sources
  • ip
  • etc.
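A minimal sketch of what this could look like with the standard logging module, writing one JSON line per request. The function names, the logger name query_requests, the default file name requests.log, and the field names are all assumptions for illustration; how the UserID is obtained from Bot Composer is still open.

```python
import json
import logging
from datetime import datetime, timezone


def make_request_logger(path="requests.log"):
    """Return a logger that appends one line per request to `path`."""
    logger = logging.getLogger("query_requests")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def format_request_record(user_id, ip, query, answer, sources):
    """Serialize one request as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "ip": ip,
        "query": query,
        "answer": answer,
        "sources": sources,
    }
    return json.dumps(record, ensure_ascii=False)
```

In the /query handler, a single logger.info(format_request_record(...)) call per request would then be enough.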

Add metadata when indexing documents

We should re-index and attach metadata to the documents via document.extra_info = <your dict>.

When reading in documents for the index, we can do something like this:

documents = []
for page in pages:  # pages: placeholder for whatever the loader returned
    document = Document(page.content)
    document.extra_info = page.metadata
    documents.append(document)
...

Shoutout to the good folks on the llama_index discord for helping me out.

add a way to return context with filename/url in metadata

For the frontend we need a way to populate the supporting content panel.

Define a flag (similar to the pretty flag) that adds the following format to metadata:

{
    ...
    "metadata": [
        "https://plus.ssc-spc.gc.ca/en/page/step-1-hire-your-candidate: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce eget justo vitae lacus placerat sollicitudin. Sed ut nulla non urna hendrerit accumsan. Nunc nec mi magna. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Nulla fringilla venenatis nisi, eu auctor turpis euismod nec",
        "https://plus.ssc-spc.gc.ca/en/page/human-resources-and-workplace: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce eget justo vitae lacus placerat sollicitudin. Sed ut nulla non urna hendrerit accumsan. Nunc nec mi magna. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Nulla fringilla venenatis nisi, eu auctor turpis euismod nec"
    ],
    ...
}

So each entry has the form filename/url: supporting_text.
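The format above could be produced along these lines. This is only a sketch: the function names, the include_metadata flag name, and the (location, text) pair shape of the sources are assumptions, not the app's actual data structures.

```python
def format_supporting_content(sources):
    """Turn (location, text) pairs into "location: text" strings."""
    return [f"{location}: {text.strip()}" for location, text in sources]


def add_metadata(response, sources, include_metadata=False):
    """Return a copy of `response`, with a "metadata" list when the flag is set."""
    result = dict(response)
    if include_metadata:
        result["metadata"] = format_supporting_content(sources)
    return result
```

Keeping the formatting in its own function means the frontend contract (one string per source) can change without touching the query logic.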

history tweaks

Add small tweaks and improvements to the history functionality (e.g. allow it to live in context prompts but not in embeddings).

Add feedback endpoint

Add a /feedback endpoint to the API that lets users pass a comment and a conversation ID (or some other reference to the chatbot flow). The ID is not yet implemented, so this might be something we have to come up with. The endpoint will be invoked directly by the Bot Composer project.
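As a starting point, the request body validation could look like the sketch below. The field names comment and conversation_id and the helper parse_feedback are hypothetical; the ID is treated as optional here because it does not exist yet.

```python
def parse_feedback(body):
    """Validate a /feedback request body; return (feedback, error)."""
    if not isinstance(body, dict):
        return None, "body must be a JSON object"
    comment = body.get("comment")
    if not isinstance(comment, str) or not comment.strip():
        return None, "comment is required"
    feedback = {
        "comment": comment.strip(),
        # Optional for now: the conversation id is not implemented yet.
        "conversation_id": body.get("conversation_id"),
    }
    return feedback, None
```

A Flask view could call parse_feedback(request.get_json(silent=True)) and answer with a 400 status whenever the returned error is not None.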

index metadata needs updating

The new indexes produced by the SSCPlus data fetch have a different set of metadata. The metadata handling needs to be updated to reflect SSC Plus data only, for now.

acceptance criteria

  • update metadata returned in response
  • test that the newly indexed data loads properly and returns good results.

figure out issues with gunicorn/flask and python3.11

There seems to be an issue with Docker and the App Service. Python 3.11 runs fine locally, both in Docker and directly on the OS. In the Azure App Service, however, the app used to run fine with the 0.5 CPU and 1 GB RAM settings, and now I need to crank it all the way to the max (2 CPU and 4 GB of RAM) or it won't even start.

Improve chatbot results

We have a pretty good base for the chatbot. Now is the time to start experimenting with various parameters to improve the results the chatbot gives back.
