ulov888 / chatpdflike Goto Github PK

View Code? Open in Web Editor NEW

209.0 3.0 28.0 1.51 MB

an approximate implementation similar to chatpdf

License: Apache License 2.0

Python 46.76% CSS 12.54% JavaScript 31.90% HTML 8.81%

chatgpt chatgpt-bot chatpdf openai paperreader

chatpdflike's Introduction

chatpdflike

一个基于大语言模型API实现端到端的文档问答项目

注：本项目并非chatpdf官方开源项目，仅是揣测和复现

虽然chatpdf.com的实现代码并不开源，但是基于作者的twitter回复还是大致理顺了功能原理，主要流程原理如下：

1 、文本切割

将文本切割成一小部分，调用 openai 的 embedding 接口，返回这段文本的 embedding 的向量数据。存储这些数据，并且保存好对应关系。

2 、用户提问

将用户提的问题，调用 openai 的 embedding 接口，返回问题的向量数据。

3 、搜索向量

计算相似度,用问题的向量，在之前切割的所有向量数据里，计算和问题向量相似度最高的几个文本(余弦定理)。

4 、调用 gpt-turbo

准备合适的 prompt ，里面带上切割的文本内容，加上问题的 prompt 。

基于以上的流程，只需要开发少量的适配代码，主要功能都是由openai的接口完成，

使用指南

配置系统环境变量OPENAI_API_KEY，Ps:密钥需要自己上openai官网申请

export OPENAI_API_KEY = "XXX"

运行

python run.py

安装项目环境依赖

pip install -r requirements.txt

效果演示

chatpdflike's People

Contributors

Stargazers

Watchers

chatpdflike's Issues

Error: Request to OpenAI failed. Please try again.

undefined

When i upload my pdf and ask a question the answer is always undefined.

[performance] chat的响应速度很慢而且内容不及ChatPDF

2303.10130v3.pdf
我在ChatPDF.com y以及本地运行ChatPDFLike，问同样的问题：“把会被AI淘汰的工作机会列出来”

以下是ChatPDF的回复

很抱歉，我无法提供您所需的信息。我们的研究并没有列出会被人工智能淘汰的具体工作机会清单。相反，我们的研究旨在评估人工智能技术对整个劳动力市场的潜在影响，并探讨如何应对这些变化。如果您有其他问题，我很乐意为您提供帮助。

以下是ChatPDFLike的回复

 根据研究发现，AI技术在任务层面的曝光程度往往会在职业内部分散，因此很难找到AI工具可以完成几乎所有工作的职业。研究发现，自动化和增强曝光往往是正相关的，因此可以列出可能被AI淘汰的工作机会，包括那些需要依赖常规任务和手动任务的职业。

Page 4: 'most relevantly to this paper, devising a new rubric to assess worker activities for their suitability for machine learning (Brynjolfsson et al., 2018...

Page 17: 'This paperaims tobuild ona numberof previous empiricalstudies examiningthe occupationalexposureto advances in AI and/or automation. Previous studies h...

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 4658 tokens (3158 in your prompt; 1500 for the completion). Please reduce your prompt; or completion length.

上传入pdf 报错

127.0.0.1 - - [31/Mar/2023 11:57:23] "POST /process_pdf HTTP/1.1" 500 -
INFO:werkzeug:127.0.0.1 - - [31/Mar/2023 11:57:23] "POST /process_pdf HTTP/1.1" 500 -
Traceback (most recent call last):
File "D:\huice\chatpdflike\venv\lib\site-packages\pandas\core\indexes\base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas_libs\hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 2551, in call
return self.wsgi_app(environ, start_response)
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 2531, in wsgi_app
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "D:\huice\chatpdflike\venv\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "D:\huice\chatpdflike\run.py", line 24, in process_pdf
df = chatbot.paper_df(paper_text)
File "D:\huice\chatpdflike\generate_embedding.py", line 75, in paper_df
df['length'] = df['text'].apply(lambda x: len(x))
File "D:\huice\chatpdflike\venv\lib\site-packages\pandas\core\frame.py", line 3807, in getitem
indexer = self.columns.get_loc(key)
File "D:\huice\chatpdflike\venv\lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 'text'

Error: Processing the pdf failed due to excess load. Please try again later. Check the URL if there is https:// at the beginning. If not, add it.

我这里pdf 都报错，希望能帮忙看看
量化.pdf

ulov888 / chatpdflike Goto Github PK

chatpdflike's Introduction

chatpdflike

使用指南

安装项目环境依赖

效果演示

chatpdflike's People

Contributors

Stargazers

Watchers

Forkers

chatpdflike's Issues

Error: Request to OpenAI failed. Please try again.

undefined

[performance] chat的响应速度很慢而且内容不及ChatPDF

[bug] 多轮会话导致tokens超载

上传入pdf 报错

Error: Processing the pdf failed due to excess load. Please try again later. Check the URL if there is https:// at the beginning. If not, add it.

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent