pdftools's Issues

Update to latest PyPDF2

I've just noticed that pdftools exists and is used via StackOverflow :-)

You might want to update it to the latest version of PyPDF2 soon, as we are going to deprecate a lot of methods / classes. If you want, I can help you to move.

It might also be interesting to move pdftools to the https://github.com/py-pdf organization. There might also be some overlap in project goals with pdfly, but pdftools is more mature.

PyPDF2 dependency error

pdftools currently errors for me after a fresh install with pipx when I try to split some files:

$ pdftools split some_file.pdf -q 12 9
Traceback (most recent call last):
  File "~/.local/bin/pdftools", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/pdftools/_cli.py", line 286, in main
    pdf_split(ARGS.src, ARGS.output, ARGS.stepsize, ARGS.sequence)
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/pdftools/pdftools.py", line 157, in pdf_split
    reader = PdfFileReader(inputfile)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1974, in __init__
    deprecation_with_replacement("PdfFileReader", "PdfReader", "3.0.0")
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.

This seems related to #17, because while requirements.txt pins PyPDF2 to version 1.25.1, setup.py only lists install_requires=["PyPdf2"].

Are there any plans to update this package to the new usage?
Otherwise the simple fix would be to pin the version of PyPDF2 in setup.py.
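One way to keep setup.py from drifting away from requirements.txt is to derive install_requires from that file. The following is only a sketch: parse_requirements is a hypothetical helper, and it assumes requirements.txt contains simple pins such as the PyPDF2==1.25.1 line mentioned above (no URLs or environment markers).

```python
def parse_requirements(text):
    """Return the requirement specifiers found in requirements.txt-style text.

    Hypothetical helper: handles plain pins only, one per line.
    """
    requirements = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line:
            requirements.append(line)
    return requirements


# In setup.py the result would be passed on, for example:
#   setup(..., install_requires=parse_requirements(open("requirements.txt").read()))
EXAMPLE = "# runtime dependencies\nPyPDF2==1.25.1\n"
INSTALL_REQUIRES = parse_requirements(EXAMPLE)
```

With that in place a fresh pip or pipx install would resolve the same PyPDF2 version that requirements.txt names.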

Manually reinstalling in a fresh environment with the pinned PyPDF2 version still works:

$ pip install pdftools PyPDF2==1.25.1
Collecting pdftools
  Using cached pdftools-2.0.2-py3-none-any.whl
Collecting PyPDF2==1.25.1
  Downloading PyPDF2-1.25.1.tar.gz (194 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.2/194.2 kB 3.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Installing collected packages: PyPDF2, pdftools
  DEPRECATION: PyPDF2 is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for PyPDF2 ... done
Successfully installed PyPDF2-1.25.1 pdftools-2.0.2
$ pdftools split some_file.pdf -q 12 9 # Success

easy fixes for PyPDF2 3.0.0 support

On my system pdftools is installed at:

/usr/local/lib/python3.8/site-packages/pdftools/pdftools.py

If I edit this file as follows:

s/PdfFileReader/PdfReader/g
s/PdfFileWriter/PdfWriter/g
s/addPage/add_page/g

then the deprecation errors I get from PyPDF2 3.0.0 go away and things work as expected when merging PDFs (so far). There might be other updates required, of course, but hopefully find and replace is sufficient.
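For anyone who prefers not to run sed against site-packages, the same three substitutions can be sketched in Python. RENAMES and apply_renames are names made up for this example; the pairs themselves are exactly the ones listed above, applied as plain text replacement.

```python
# Old PyPDF2 name -> new name, as listed in the sed expressions above.
RENAMES = [
    ("PdfFileReader", "PdfReader"),
    ("PdfFileWriter", "PdfWriter"),
    ("addPage", "add_page"),
]


def apply_renames(source):
    """Return source text with every old name replaced by its new name."""
    for old, new in RENAMES:
        source = source.replace(old, new)
    return source


before = "writer = PdfFileWriter()\nwriter.addPage(PdfFileReader(f).pages[0])"
after = apply_renames(before)
```

Reading pdftools.py, passing its text through apply_renames, and writing it back would reproduce the manual edit. Like the sed version, it is a blind find and replace and will not catch other renamed methods.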

Note: there is an open pull request that also includes these fixes, as well as a move from PyPDF2 to pypdf. So the point of this issue is just to document a quick local workaround.

pdftools merge fails for some PDFs

When I try to merge a PDF of a Virgin Mobile phone bill, it crashes on Windows 7 / Cygwin.

$ pdftools merge -o test.pdf virgin.pdf
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 229, in __new__
    return decimal.Decimal.__new__(cls, utils.str_(value), context)
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pdftools", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/pdftools/_cli.py", line 274, in main
    pdf_merge(ARGS.src, ARGS.output, ARGS.delete)
  File "/usr/local/lib/python3.8/site-packages/pdftools/pdftools.py", line 42, in pdf_merge
    writer.write(outputfile)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 92, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 271, in readFromStream
    return FloatObject(num)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 231, in __new__
    return decimal.Decimal.__new__(cls, str(value))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

$ pip list
Package    Version
---------- -------
pdftools   2.0.2
pip        21.3.1
PyPDF2     1.26.0
setuptools 59.1.1

If I go into Adobe, optimize the PDF, and save to a new file, then there are no problems. Do you have any suggestions about how to handle this from the command line? I wish I had a PDF to send without tons of private information.

Reorder booklet

Hello,

I tried to use pdftools/pdfposter to split a scanned image of an A4 booklet. The booklet has been unstapled and scanned like double sided A3. I used pdfposter -p 2x1A4 on it. The document got perfectly split into individual pages. The last step needed is to reorder the booklet into the original page order (1,2,3...). It would be great if you could add this function to the package.

Regards

Jiri
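The reordering asked for here can be sketched as a pure index computation. This is not an existing pdftools feature. booklet_scan_sequence and reading_order are hypothetical names, and the sketch assumes the sheets were scanned outermost first, front side then back side, with each A3 side split into a left half followed by a right half.

```python
def booklet_scan_sequence(n_pages):
    """Page numbers in the order the split halves come out of the scan.

    Assumes outermost sheet first, front then back, left half then right half.
    """
    if n_pages <= 0 or n_pages % 4:
        raise ValueError("a folded booklet has a positive multiple of 4 pages")
    sequence = []
    low, high = 1, n_pages
    while low < high:
        sequence += [high, low]          # front of the sheet: left, right
        sequence += [low + 1, high - 1]  # back of the same sheet: left, right
        low += 2
        high -= 2
    return sequence


def reading_order(n_pages):
    """0-based indices into the split pages that restore the order 1, 2, 3, ..."""
    sequence = booklet_scan_sequence(n_pages)
    return sorted(range(n_pages), key=lambda index: sequence[index])
```

For an 8-page booklet the split halves arrive as pages 8, 1, 2, 7, 6, 3, 4, 5, so reading_order(8) gives [1, 2, 5, 6, 7, 4, 3, 0]. Copying the split pages in that index order would restore the original sequence. A different scan order would need a different booklet_scan_sequence.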

Broken executables in a virtual environment?

I used pip to install pdftools in a python3 virtual environment, and the executables are broken. They all have a #! line that reads #!/usr/bin/python3, when it really ought to read (in my case) #!/home/grant/Envs/py36/bin/python, instead. Alternatively, #!/usr/bin/env python3 would work, too.

In the past I've used setuptools with a 'console_scripts' key in the 'entry_points' dictionary, and all of that has worked automatically for me, but you might have a better solution.
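A minimal sketch of the console_scripts approach described above, written as the keyword arguments one would hand to setuptools.setup(). The target pdftools._cli:main is an assumption based on the module and function names visible in the tracebacks elsewhere on this page. It may not match the layout of the release this report is about.

```python
# Hypothetical fragment of setup.py; use as setuptools.setup(**SETUP_KWARGS).
SETUP_KWARGS = {
    "name": "pdftools",
    "entry_points": {
        "console_scripts": [
            # Format: "<command name> = <module path>:<callable>"
            "pdftools = pdftools._cli:main",  # assumed target
        ],
    },
}
```

With an entry point declared this way, pip generates the launcher script at install time and writes the #! line for the interpreter of the environment being installed into, so no path is hard-coded in the package.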

In any event, thanks for a nice tool! It's quite handy.

copy crashes with AttributeError

I have a fresh install of pdftools v2.0.2 (installed via pipx in an isolated venv on Python 3.10). Trying to get started, I first ran into #11 as well, with this line:

pdftools copy --output start.pdf --pages "1-33" src.pdf

I got:

usage: pdftools copy [-h] [-o OUTPUT] [-p PAGES [PAGES ...]] [-y] src
pdftools copy: error: the following arguments are required: src

So I tried with the -y:

pdftools copy --output start.pdf --pages "1-33" -y src.pdf

and:

Traceback (most recent call last):
  File "/Users/mjf/.local/bin/pdftools", line 8, in <module>
    sys.exit(main())
  File "/Users/mjf/.local/pipx/venvs/pdftools/lib/python3.10/site-packages/pdftools/_cli.py", line 266, in main
    pdf_copy(ARGS.input, ARGS.output, ARGS.pages, ARGS.y)
AttributeError: 'Namespace' object has no attribute 'input'

Even if one removes all the "optional" command line args, this crashes:

pdftools copy src.pdf                Mon Jan 17 15:29:53 2022
Traceback (most recent call last):
  File "/Users/mjf/.local/bin/pdftools", line 8, in <module>
    sys.exit(main())
  File "/Users/mjf/.local/pipx/venvs/pdftools/lib/python3.10/site-packages/pdftools/_cli.py", line 266, in main
    pdf_copy(ARGS.input, ARGS.output, ARGS.pages, ARGS.y)
AttributeError: 'Namespace' object has no attribute 'input'
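Both symptoms can be reproduced with a small argparse parser. This is a sketch built only from the usage line above, not the actual code in _cli.py. The option definitions and the helper parse_or_none are assumptions.

```python
import argparse


def build_copy_parser():
    """Parser mirroring the reported usage line (an assumed reconstruction)."""
    parser = argparse.ArgumentParser(prog="pdftools copy")
    parser.add_argument("src")
    parser.add_argument("-o", "--output")
    parser.add_argument("-p", "--pages", nargs="+")
    parser.add_argument("-y", action="store_true")
    return parser


def parse_or_none(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_copy_parser().parse_args(argv)
    except SystemExit:
        return None


# --pages takes one or more values, so it also swallows "src.pdf"
# and the positional src ends up missing.
swallowed = parse_or_none(["--output", "start.pdf", "--pages", "1-33", "src.pdf"])

# Putting -y between the two ends the --pages list, so parsing succeeds.
args = parse_or_none(["--output", "start.pdf", "--pages", "1-33", "-y", "src.pdf"])

# The namespace has "src" but no "input", which is what the traceback reports.
has_input = hasattr(args, "input")
```

Under these assumptions, giving the source file before the options (pdftools copy src.pdf --pages "1-33") would get past the parser. The AttributeError would remain until main() reads ARGS.src instead of ARGS.input.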

Consider using universal line endings: Unix

I installed with pip install pdftools, and pdfmerge.py has ^M (DOS) line endings. It would be nice if this worked without my having to fix that. Version used: 1.1.0.
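Until a release ships with Unix line endings, the installed script can be converted locally. A minimal sketch, with to_unix_newlines and convert_file as made-up helper names:

```python
def to_unix_newlines(data):
    """Replace DOS line endings (CRLF) with Unix ones (LF) in raw bytes."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at path in place; return True if it was changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = to_unix_newlines(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
    return converted != original
```

Working on bytes avoids any guess about the file's text encoding. The first line matters most, since a carriage return at the end of the #! line becomes part of the interpreter name the kernel looks for.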

ValueError: invalid literal for int() with base 10: b''

I got an error while merging PDFs in Python 3.

  File "C:\Python34\lib\site-packages\PyPDF2\utils.py", line 149, in __getitem__
    len_self = len(self)
  File "C:\Python34\lib\site-packages\PyPDF2\utils.py", line 140, in __len__
    return self.lengthFunction()
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 983, in getNumPages
    self._flatten()
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1281, in _flatten
    pages = catalog["/Pages"].getObject()
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 501, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 177, in getObject
    return self.pdf.getObject(self).getObject()
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1366, in getObject
    retval = self._getObjectFromStream(indirectReference)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1320, in _getObjectFromStream
    objnum = NumberObject.readFromStream(streamData)
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 264, in readFromStream
    return NumberObject(num)
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 251, in __new__
    return int.__new__(cls, value)
ValueError: invalid literal for int() with base 10: b''

Syntax error during installation process

Not sure if it affects anything

python setup.py install

root@dev-alex-02:/root/pdftools-1.0.8 # ./setup.py install
: No such file or directory

root@dev-alex-02:/root/pdftools-1.0.8 # python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to pdftools.egg-info/requires.txt
writing pdftools.egg-info/PKG-INFO
writing top-level names to pdftools.egg-info/top_level.txt
writing dependency_links to pdftools.egg-info/dependency_links.txt
reading manifest file 'pdftools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'pdftools.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/pdftools
copying pdftools/__init__.py -> build/lib/pdftools
copying pdftools/pdftools.py -> build/lib/pdftools
copying pdftools/parseutil.py -> build/lib/pdftools
copying pdftools/_version.py -> build/lib/pdftools
UPDATING build/lib/pdftools/_version.py
set build/lib/pdftools/_version.py to '1.0.8'
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/__init__.py -> build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/pdftools.py -> build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/parseutil.py -> build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/_version.py -> build/bdist.linux-x86_64/egg/pdftools
byte-compiling build/bdist.linux-x86_64/egg/pdftools/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/pdftools/pdftools.py to pdftools.pyc
File "build/bdist.linux-x86_64/egg/pdftools/pdftools.py", line 15
def pdf_merge(inputs: [str], output: str, delete: bool=False):
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/pdftools/parseutil.py to parseutil.pyc
File "build/bdist.linux-x86_64/egg/pdftools/parseutil.py", line 20
def parse_rangearg(args, max_: int, min_: int=1):
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/pdftools/_version.py to _version.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
installing scripts to build/bdist.linux-x86_64/egg/EGG-INFO/scripts
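Both SyntaxErrors point at annotated parameters (inputs: [str], max_: int), which only Python 3 can parse, so the python that ran setup.py here is most likely a Python 2 interpreter. A guard at the top of setup.py could turn this into a clear message. The sketch below uses the hypothetical names MIN_PYTHON and check_python and deliberately avoids Python 3 only syntax so that an old interpreter can still run it.

```python
import sys

# Assumption: function annotations are the only Python 3 feature required.
MIN_PYTHON = (3, 0)


def check_python(version_info=None):
    """Return an error message if the interpreter is too old, else None."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < MIN_PYTHON:
        return "pdftools requires Python %d.%d+, found %d.%d" % (MIN_PYTHON + current)
    return None


# In setup.py this could be used as:
#   message = check_python()
#   if message:
#       sys.exit(message)
```

Called as check_python((2, 7)) it returns "pdftools requires Python 3.0+, found 2.7". On any Python 3 interpreter it returns None and the install proceeds.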

add won't parse arguments the right way

If I do something like this:

pdftools add -o output.pdf input.pdf

I will get the following error:

pdftools add: error: the following arguments are required: src
