
jackley-dev / gpt_chat_pdf_gh


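A local knowledge-base question-answering system built with langchain and streamlit on top of the ChatGPT API and the Pinecone vector database: the front end is written in streamlit and supports local deployment; PDF documents can be uploaded from the web UI; uploaded documents are vectorized and stored in the Pinecone database; questions are answered from the domain-specific knowledge held in the database.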

Python 100.00%

gpt_chat_pdf_gh's Introduction

Project Overview

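A local knowledge-base question-answering system built with langchain and streamlit on top of the ChatGPT API and the Pinecone vector database:

  • The front end is written in streamlit and supports local deployment
  • PDF documents can be uploaded from the web UI
  • Uploaded documents are vectorized and stored in the Pinecone database
  • Questions are answered from the domain-specific knowledge held in the database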

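Usage Guide

  1. Sign up for a Pinecone trial at pinecone.io to obtain a Pinecone API key and the associated environment settings
  2. Update the following parameters in .env, replacing the placeholders with your actual keys and environment values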
PINECONE_API_KEY='xx'
PINECONE_ENV='xx'
OPENAI_API_KEY='xx'
PINECONE_INDEX='xx'
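
How the project itself reads .env is not shown here; a library such as python-dotenv is a common choice. As a rough, dependency-free sketch of what loading and checking these four settings could look like, the helpers below (`parse_env_text` and `missing_keys` are hypothetical names, not part of the project) parse KEY='value' lines and report which settings are still unset or left at the `'xx'` placeholder:

```python
# Hypothetical helpers for illustration; the project may load .env differently.
REQUIRED_KEYS = (
    "PINECONE_API_KEY",
    "PINECONE_ENV",
    "OPENAI_API_KEY",
    "PINECONE_INDEX",
)


def parse_env_text(text):
    """Parse KEY='value' lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still the 'xx' placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "xx")]
```

A caller could read the file with `open(".env", encoding="utf-8").read()`, pass the text to `parse_env_text`, and refuse to start the app if `missing_keys` returns anything.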

Overall Approach

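1. Upload a PDF from the local machine, then read and split it

  1. Read the PDF file with the PyPDF2 library
  2. Split the extracted text into small chunks with langchain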
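
The read-and-split step can be sketched as follows. `read_pdf_text` uses PyPDF2's `PdfReader` and `extract_text` (assuming a PyPDF2 version that provides `PdfReader`). `split_text` is a simplified, plain-Python stand-in for a langchain text splitter, not langchain's implementation, and the chunk size and overlap values are illustrative assumptions rather than the project's settings.

```python
def read_pdf_text(path):
    """Concatenate the extracted text of every page in a PDF file."""
    # Third-party dependency, imported lazily so the splitter works without it.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval later on. A real splitter would usually also try to break on paragraph or sentence boundaries instead of raw character offsets.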

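2. Vectorize the content and store it in the vector database

  1. Convert the document chunks into vectors through OpenAI's embedding API
  2. Store the resulting vectors in the Pinecone vector database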
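
To show the shape of this step without needing API keys, the sketch below swaps in stand-ins for both services: `toy_embed` is a deterministic hashed bag-of-words function in place of OpenAI's embedding API, and `InMemoryIndex` is a dictionary in place of a Pinecone index. Every name here is hypothetical and none of it is the real client API; only the flow (chunk, embed, store with an id and the original text as metadata) reflects the steps above.

```python
import hashlib
import math


def toy_embed(text, dim=8):
    """Stand-in embedding: hashed bag of words, L2-normalized. Not a real model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryIndex:
    """Stand-in vector store that keeps (vector, metadata) per id."""

    def __init__(self):
        self.records = {}

    def upsert(self, items):
        # Inserting an existing id overwrites the previous record.
        for item_id, vector, metadata in items:
            self.records[item_id] = (vector, metadata)


def index_chunks(chunks, index, embed=toy_embed):
    """Embed each chunk and store it with its text as metadata; return the count."""
    index.upsert(
        [(f"chunk-{i}", embed(chunk), {"text": chunk}) for i, chunk in enumerate(chunks)]
    )
    return len(chunks)
```

Storing the chunk text alongside the vector matters because the search step needs to hand readable text, not vectors, to the language model.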

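3. Search the vector database for content similar to the query, then pass it together with the query to GPT to answer

  1. Use the similarity_search function to find content similar to the query
  2. Pass the query and the retrieved content to langchain's load_qa_chain function to get an answer grounded in the knowledge base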
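
Conceptually, this step ranks the stored chunks by similarity to the query vector and then places the best matches into the prompt sent to the model. The sketch below illustrates that idea in plain Python. `top_k` and `build_prompt` are hypothetical helpers, not langchain functions, the prompt wording is an assumption, and which chain type the project passes to load_qa_chain is not stated here. The concatenation shown is roughly what a "stuff"-style chain does.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, records, k=3):
    """Return the texts of the k records most similar to query_vec.

    records is a list of (vector, text) pairs.
    """
    ranked = sorted(records, key=lambda r: cosine(query_vec, r[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question, contexts):
    """Concatenate retrieved chunks and the question into a single prompt."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Restricting the model to the retrieved context is what makes the answer depend on the uploaded documents instead of the model's general knowledge. The value of `k` trades recall against prompt length and token cost.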

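Demo:

  1. Demo URL: https://huggingface.co/spaces/jackley86/gpt_chat_pdf
  2. Because some users may upload large documents, the demo site's API tokens are consumed quickly. Once the free quota runs out, vectorization and question answering stop working, although the interface and feature design can still be viewed. For the full experience, deploy locally with your own keys.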

