chatpdflike's Introduction

chatpdflike

An end-to-end document question-answering project built on large language model APIs.

Note: this is not an official open-source release of chatpdf. It is only a guess at, and a reproduction of, how chatpdf works.

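Although the code behind chatpdf.com is not open source, the author's replies on Twitter make the general approach reasonably clear. The main flow is as follows: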

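1. Text splitting

Split the text into small chunks and call the OpenAI embedding API on each chunk to get its embedding vector. Store the vectors, and keep a mapping from each vector back to the chunk it came from.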
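The splitting step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `split_text` and `build_index`, the character-based chunk size, and the overlap are all assumptions, and `embed` is a hypothetical callable standing in for the OpenAI embedding request.

```python
from typing import Callable, Dict, List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only chunks
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


def build_index(chunks: List[str], embed: Callable[[str], List[float]]) -> List[Dict]:
    """Store each chunk next to its vector so the mapping is never lost.

    `embed` is a placeholder (assumption) for the embedding API call.
    """
    return [{"text": chunk, "embedding": embed(chunk)} for chunk in chunks]
```

Keeping the text and its vector in the same record is one simple way to "save the mapping"; a separate vector store keyed by chunk id would work just as well.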

2. User question

Call the OpenAI embedding API on the user's question to get the question's vector.

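3. Vector search

Compute the cosine similarity between the question vector and every chunk vector stored earlier, and pick the few chunks that are most similar to the question.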
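A minimal sketch of this search step, using only the standard library. The function names and the record layout (a list of dicts with `text` and `embedding` keys) are assumptions made for illustration; the project itself may store and rank the vectors differently.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return cos(theta) = (a . b) / (|a| * |b|), or 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(question_vec: List[float], index: List[Dict], k: int = 3) -> List[Tuple[float, str]]:
    """Score every stored chunk against the question and keep the k best."""
    scored = [
        (cosine_similarity(question_vec, item["embedding"]), item["text"])
        for item in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a single PDF with a few hundred chunks; a larger corpus would usually call for a dedicated vector index.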

4. Call gpt-turbo

Build a suitable prompt that contains the retrieved chunks, followed by the user's question.
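One way the prompt might be assembled is sketched below. The wording of the instructions and the function name `build_prompt` are assumptions; the source does not show the actual prompt text.

```python
from typing import List


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine the retrieved chunks and the user's question into one prompt."""
    # Number the excerpts so the model (and the reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would then be sent to the chat model as the user message.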

With this flow, only a small amount of glue code needs to be written; the main functionality is provided by the OpenAI API.

Usage Guide

  1. Set the OPENAI_API_KEY environment variable. Note: you need to request your own key from the OpenAI website.
export OPENAI_API_KEY="XXX"
  2. Run
python run.py

Install the project dependencies

pip install -r requirements.txt

Demo

chatpdflike's People

Contributors

ulov888

chatpdflike's Issues

undefined

When I upload my PDF and ask a question, the answer is always undefined.

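[performance] Chat responses are slow and the answers are worse than ChatPDF's

2303.10130v3.pdf
I asked the same question on ChatPDF.com and on a locally running ChatPDFLike: "List the jobs that will be eliminated by AI."

ChatPDF's reply:

I'm sorry, I can't provide the information you need. Our study does not give a specific list of jobs that will be eliminated by artificial intelligence. Rather, our study aims to assess the potential impact of AI technology on the labor market as a whole and to explore how to respond to these changes. If you have other questions, I'm happy to help.

ChatPDFLike's reply:

According to the study, exposure to AI technology at the task level tends to be spread out within occupations, so it is hard to find an occupation in which AI tools can do nearly all of the work. The study finds that automation exposure and augmentation exposure tend to be positively correlated, so the jobs that may be eliminated by AI can be listed, including occupations that rely on routine tasks and manual tasks.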

Page 4: 'most relevantly to this paper, devising a new rubric to assess worker activities for their suitability for machine learning (Brynjolfsson et al., 2018...

Page 17: 'This paperaims tobuild ona numberof previous empiricalstudies examiningthe occupationalexposureto advances in AI and/or automation. Previous studies h...

[bug] Multi-turn conversations exceed the token limit

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4658 tokens (3158 in your prompt; 1500 for the completion). Please reduce your prompt; or completion length.
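The numbers in this error add up as 3158 prompt tokens + 1500 completion tokens = 4658, which is more than the 4097-token context, so the prompt has to stay within 4097 - 1500 = 2597 tokens. One possible mitigation, sketched here under assumptions, is to drop the oldest turns until the history fits. The function name `trim_history` is hypothetical, and `count_tokens` is a caller-supplied placeholder; an accurate count would need the model's real tokenizer and would also include per-message overhead.

```python
from typing import Callable, List


def trim_history(
    messages: List[str],
    count_tokens: Callable[[str], int],
    max_context: int = 4097,
    max_completion: int = 1500,
) -> List[str]:
    """Keep the most recent messages whose total token count fits the budget."""
    budget = max_context - max_completion  # tokens left for the prompt
    kept = []
    used = 0
    # Walk from newest to oldest so recent turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

For example, with a crude word-count stand-in for the tokenizer, `trim_history(history, lambda s: len(s.split()))` returns the newest messages that fit in the 2597-token prompt budget.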

Error when uploading a PDF

127.0.0.1 - - [31/Mar/2023 11:57:23] "POST /process_pdf HTTP/1.1" 500 -
INFO:werkzeug:127.0.0.1 - - [31/Mar/2023 11:57:23] "POST /process_pdf HTTP/1.1" 500 -
Traceback (most recent call last):
File "D:\huice\chatpdflike\venv\lib\site-packages\pandas\core\indexes\base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 2551, in __call__
return self.wsgi_app(environ, start_response)
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 2531, in wsgi_app
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\huice\chatpdflike\run.py", line 24, in process_pdf
df = chatbot.paper_df(paper_text)
File "D:\huice\chatpdflike\generate_embedding.py", line 75, in paper_df
df['length'] = df['text'].apply(lambda x: len(x))
File "D:\huice\chatpdflike\venv\lib\site-packages\pandas\core\frame.py", line 3807, in __getitem__
indexer = self.columns.get_loc(key)
File "D:\huice\chatpdflike\venv\lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 'text'
