Converse with book - Built with GPT-3
License: MIT License
What happened: I installed in a new environment and set up the path as described, except that here I have a more recent version of ImageMagick, so IMCONV looks like this:
%PROGRAMFILES%\ImageMagick-7.1.0-Q16-HDRI\
Windows finds magick as a command, but when I run
dr-doc-search --train -i "pdfs\my_pdf.pdf" --embedding huggingface
I get this error:
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\user\code_folder\dr_doc_test\venv\Scripts\dr-doc-search.exe\__main__.py", line 7, in <module>
File "C:\Users\user\code_folder\dr_doc_test\venv\Lib\site-packages\doc_search\app.py", line 68, in main
run_workflow(context, training_workflow_steps())
File "C:\Users\user\code_folder\dr_doc_test\venv\Lib\site-packages\py_executable_checklist\workflow.py", line 36, in run_workflow
__run_step(step, context)
File "C:\Users\user\code_folder\dr_doc_test\venv\Lib\site-packages\py_executable_checklist\workflow.py", line 29, in __run_step
returned_context = step_instance.execute() or {}
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\code_folder\dr_doc_test\venv\Lib\site-packages\doc_search\workflow\__init__.py", line 142, in execute
run_command(convert_command)
File "C:\Users\user\code_folder\dr_doc_test\venv\Lib\site-packages\py_executable_checklist\workflow.py", line 9, in run_command
return subprocess.check_output(command, shell=True).decode("utf-8") # nosemgrep
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'C:\Program Files\ImageMagick-7.1.0-Q16-HDRI\magick convert -density 150 -trim -background white -alpha remove -quality 100 -sharpen 0x1.0 C:\Users\user\OutputDir\dr-doc-search\my_pdf\my_pdf.pdf[1] -quality 100 C:\Users\user\OutputDir\dr-doc-search\my_pdf\images\output-1.png' returned non-zero exit status 1.
It seems to me that this is the problem: 'C:\Program' is not recognized as an internal or external command, operable program or batch file.
In Windows the folder is called Program Files, and the space in the name is a reliable source of errors in my scripts. I don't really understand where it comes from in this case.
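The traceback shows the whole command being passed as a single string to subprocess.check_output(command, shell=True), so the Windows shell splits it at the first unquoted space and tries to run C:\Program. One possible way around this, sketched below, is to build the command as a list and run it without a shell. The helpers build_convert_args and run_without_shell are hypothetical names used for illustration and are not part of dr-doc-search; the option values are copied from the failing command above.

```python
import subprocess
import sys


def build_convert_args(magick_path, input_page, image_path):
    """Return the ImageMagick call as a list, one element per argument.

    Hypothetical helper: the option values are copied from the command in the
    traceback; the function itself is not part of dr-doc-search.
    """
    return [
        magick_path, "convert",
        "-density", "150", "-trim",
        "-background", "white", "-alpha", "remove",
        "-quality", "100", "-sharpen", "0x1.0",
        input_page,
        "-quality", "100",
        image_path,
    ]


def run_without_shell(args):
    """Run a list of arguments directly, without going through a shell."""
    return subprocess.check_output(args).decode("utf-8")


args = build_convert_args(
    r"C:\Program Files\ImageMagick-7.1.0-Q16-HDRI\magick",
    r"C:\Users\user\OutputDir\dr-doc-search\my_pdf\my_pdf.pdf[1]",
    r"C:\Users\user\OutputDir\dr-doc-search\my_pdf\images\output-1.png",
)
# The path containing a space stays a single element of the list.
print(args[0])

# Runnable demonstration using the current Python interpreter,
# whose own path may also contain spaces.
print(run_without_shell([sys.executable, "-c", "print('a b')"]).strip())
```

With a list and the default shell=False, each element reaches the program as one argument, so no quoting of Program Files is needed. If the string form has to stay, wrapping the executable path in double quotes inside the command string is the usual alternative.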
I get the error below when running dr-doc-search --train -i filename.pdf on Windows 11.
I noticed that convert_command is set to this:
convert_command = f"""convert -density 150 -trim -background white -alpha remove -quality 100 -sharpen 0x1.0 {input_file_page} -quality 100 {image_path}"""
I think convert might be an alias on Linux: https://linux.die.net/man/1/convert
but on Windows that alias is taken by an existing command: https://www.wikiwand.com/en/Convert_(command)
Is there a way to allow the use of magick convert for Windows machines?
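On Windows, the bare name convert resolves to the built-in file-system conversion utility, which rejects ImageMagick's options and prints "Invalid Parameter - 150". A minimal sketch of choosing the command per platform follows, assuming ImageMagick 7 is installed and magick is on PATH. The function convert_prefix is a hypothetical helper for illustration, not something dr-doc-search provides.

```python
import platform


def convert_prefix(system=None):
    """Return the leading words of the ImageMagick convert call.

    Hypothetical helper. `system` defaults to the value reported by
    platform.system(), e.g. "Windows", "Linux" or "Darwin".
    """
    if system is None:
        system = platform.system()
    if system == "Windows":
        # Avoid the Windows built-in convert.exe by going through magick.
        return ["magick", "convert"]
    return ["convert"]


# Prefix for the machine this runs on, followed by the options from the issue.
command = convert_prefix() + ["-density", "150", "-trim"]
print(command)
```

The returned list can be extended with the remaining options and file paths before being handed to subprocess.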
Invalid Parameter - 150
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Python311\Scripts\dr-doc-search.exe\__main__.py", line 7, in <module>
File "C:\Python311\Lib\site-packages\doc_search\app.py", line 48, in main
run_workflow(context, training_workflow_steps())
File "C:\Python311\Lib\site-packages\py_executable_checklist\workflow.py", line 36, in run_workflow
__run_step(step, context)
File "C:\Python311\Lib\site-packages\py_executable_checklist\workflow.py", line 29, in __run_step
returned_context = step_instance.execute() or {}
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\doc_search\workflow\__init__.py", line 102, in execute
run_command(convert_command)
File "C:\Python311\Lib\site-packages\py_executable_checklist\workflow.py", line 9, in run_command
return subprocess.check_output(command, shell=True).decode("utf-8") # nosemgrep
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\subprocess.py", line 465, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\subprocess.py", line 569, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'convert -density 150 -trim -background white -alpha remove -quality 100 -sharpen 0x1.0 C:\Users\iqaco\OutputDir\dr-doc-search\wind-up-bird-chronicle\wind-up-bird-chronicle.pdf[1] -quality 100 C:\Users\iqaco\OutputDir\dr-doc-search\wind-up-bird-chronicle\images\output-1.png' returned non-zero exit status 4.
Hi, I'd like to suggest a feature: using a local model from a models path.
Example:
dr-doc-search --train -i my_pdf.pdf --path "my_models/ggml-model-14_0.bin"
It'd make the reuse of models easier and allow people with a restricted internet connection (company proxy in my case) to download models the way they can and use them later on.
Having that would be awesome!
I want to run the indexing on an available text file instead of a PDF file. Basically, bypass the ImageMagick + OCR workflow and load the text file directly.
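As a rough illustration of what loading text directly could involve, the sketch below reads a UTF-8 text file and splits it into chunks of bounded size that could then be indexed. Everything here is an assumption made for illustration: split_text, load_chunks, and the 1000-character default are hypothetical and do not describe how dr-doc-search is implemented.

```python
from pathlib import Path


def split_text(text, max_chars=1000):
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def load_chunks(path, max_chars=1000):
    """Read a UTF-8 text file and return its chunks (hypothetical helper)."""
    return split_text(Path(path).read_text(encoding="utf-8"), max_chars)


print(split_text("aa bb cc", max_chars=5))
```

Splitting on whitespace is the simplest choice; splitting on sentence or paragraph boundaries would keep chunks more coherent at the cost of more code.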
Use VectorDBQAWithSourcesChain and return sources along with the answer.
Check whether we can get the page number, so that an image of the book page can be displayed along with the answer.
Even better if we can highlight the relevant sentences.
After installing your library and trying the given example on my Windows 10 machine, I got the following error:
Invalid Parameter - 150
Traceback (most recent call last):
File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Python310\Scripts\dr-doc-search.exe\__main__.py", line 7, in <module>
File "C:\Python310\lib\site-packages\doc_search\app.py", line 68, in main
run_workflow(context, training_workflow_steps())
File "C:\Python310\lib\site-packages\py_executable_checklist\workflow.py", line 36, in run_workflow
__run_step(step, context)
File "C:\Python310\lib\site-packages\py_executable_checklist\workflow.py", line 29, in __run_step
returned_context = step_instance.execute() or {}
File "C:\Python310\lib\site-packages\doc_search\workflow\__init__.py", line 120, in execute
run_command(convert_command)
File "C:\Python310\lib\site-packages\py_executable_checklist\workflow.py", line 9, in run_command
return subprocess.check_output(command, shell=True).decode("utf-8") # nosemgrep
File "C:\Python310\lib\subprocess.py", line 420, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Python310\lib\subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'convert -density 150 -trim -background white -alpha remove -quality 100 -sharpen 0x1.0 C:\Users\XXXX\OutputDir\dr-doc-search\parable_monetary_economy\parable_monetary_economy.pdf[1] -quality 100 C:\Users\XXXX\OutputDir\dr-doc-search\parable_monetary_economy\images\output-1.png' returned non-zero exit status 4.
Any help fixing this issue?
Add an export button and generate Markdown or PDF.
It would be nice to be able to use backends other than OpenAI, or even to train a model locally.
openai.error.InvalidRequestError: [''] is not valid under any of the given schemas - 'input'
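One plausible reading of this error is that a list containing only an empty string was sent as the embeddings input, for example because a page yielded no text. That is an assumption, since the report does not include the input. Under that assumption, a minimal sketch of filtering out empty texts before they are sent could look like this; drop_empty_texts is a hypothetical helper, not part of dr-doc-search or the OpenAI client.

```python
def drop_empty_texts(texts):
    """Return only the texts that contain non-whitespace characters."""
    return [text for text in texts if text and text.strip()]


pages = ["", "   ", "First page with real content."]
usable = drop_empty_texts(pages)
if not usable:
    # Nothing to embed: better to stop early than to send [''] to the API.
    raise ValueError("No non-empty text to embed")
print(usable)
```

Checking for an empty result before the API call turns an opaque schema error into a message that points at the missing text.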
I received an error while trying to install this library on Colab via pip install dr-doc-search:
ERROR: Could not find a version that satisfies the requirement dr-doc-search (from versions: none)
Is there any problem with this method of installation?
dr-doc-search --web-app -i ~/Downloads/吴晓波:勇敢者的方法论.pdf --llm huggingface
error:
Traceback (most recent call last):
File "/Users/boyer/hsg/dr-doc-search/venv/bin/dr-doc-search", line 8, in <module>
sys.exit(main())
File "/Users/boyer/hsg/dr-doc-search/venv/lib/python3.10/site-packages/doc_search/app.py", line 66, in main
run_web(context)
File "/Users/boyer/hsg/dr-doc-search/venv/lib/python3.10/site-packages/doc_search/web.py", line 77, in run_web
run_workflow(global_context, inference_workflow_steps())
File "/Users/boyer/hsg/dr-doc-search/venv/lib/python3.10/site-packages/py_executable_checklist/workflow.py", line 36, in run_workflow
__run_step(step, context)
File "/Users/boyer/hsg/dr-doc-search/venv/lib/python3.10/site-packages/py_executable_checklist/workflow.py", line 29, in __run_step
returned_context = step_instance.execute() or {}
File "/Users/boyer/hsg/dr-doc-search/venv/lib/python3.10/site-packages/doc_search/workflow/__init__.py", line 254, in execute
raise FileNotFoundError(f"FAISS DB file not found: {self.faiss_db}")
FileNotFoundError: FAISS DB file not found: /Users/boyer/OutputDir/dr-doc-search/index/index.pkl
I renamed the file to demo.pdf. Is the question-and-answer content also incompatible with Chinese?
Hi, thanks for your work. I tried to test it, but got this error:
pythondev1-ubuntu@pythondev1-ubuntu:~$ dr-doc-search --train -i ~/dr-doc-search/tests/data/kh.pdf
2023-02-07 20:57:32 - text_splitter.py:59 - Created a chunk of size 1339, which is longer than the specified 1000
"!pip install dr-doc-search" doesn’t work on Google Colab and even on my JuPyter Notebook.
This is what I get when I try it:
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement dr-doc-search (from versions: none)
ERROR: No matching distribution found for dr-doc-search
"