pdftools's Introduction

pdftools

  • Copyright (c) 2015 Stefan Lehmann
  • License: MIT
  • Description: Python-based command line tool for manipulating PDFs. It is based on the PyPDF2 package.

Features

  • add, insert, remove and rotate pages
  • split PDF files into multiple documents
  • copy specific pages into a new document
  • merge or zip PDF files into one document

Usage

pdftools adds some scripts to your existing Python installation that can be called via the command line. The description for each script is listed below.

pdftools

usage: pdftools [-h] [-V] <command> ...

Python-based command line tool for manipulating PDFs. It is based on the
PyPdf2 package.

optional arguments:
  -h, --help     show this help message and exit
  -V, --version  Print version number and exit (default: False)

Sub-commands:
  <command>
    add          Add pages from a source file to an output PDF file
    copy         Copy specific pages of a PDF file in a new file
    insert       Insert pages of one file into another
    merge        Merge the pages of multiple input files into one output file
    remove       Remove pages from a PDF file
    rotate       Rotate the pages of a PDF files by 90 degrees
    split        Split a PDF file into multiple documents
    zip          Python-like zipping (interleaving) the pages of two documents
                 in one output file

Add

usage: pdftools add [-h] [-p PAGES [PAGES ...]] [-o OUTPUT] dest src

Add pages from a source file to an output PDF file

positional arguments:
  dest                  Destination PDF file
  src                   PDF source file

optional arguments:
  -h, --help            show this help message and exit
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        list of pages to add to the output. Examples: 5; 1-9;
                        1-; -9 (default: None)
  -o OUTPUT, --output OUTPUT
                        Name of the output file. If None, the `dest` file will
                        be overwritten (default: None)
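
The page specifications accepted by `-p` (`5`, `1-9`, `1-`, `-9`) can be illustrated with a minimal sketch. The function `expand_page_specs` below is hypothetical and is not the parser pdftools ships; it assumes 1-based page numbers and treats an open-ended range as running to the first or last page.

```python
def expand_page_specs(specs, n_pages):
    """Expand specs such as "5", "1-9", "1-" or "-9" into 1-based page numbers.

    Hypothetical helper for illustration only; not part of pdftools.
    """
    pages = []
    for spec in specs:
        if "-" in spec:
            first, _, last = spec.partition("-")
            start = int(first) if first else 1       # "-9" starts at page 1
            stop = int(last) if last else n_pages    # "1-" runs to the last page
        else:
            start = stop = int(spec)                 # single page such as "5"
        if not 1 <= start <= stop <= n_pages:
            raise ValueError(f"page spec {spec!r} is outside 1..{n_pages}")
        pages.extend(range(start, stop + 1))
    return pages


print(expand_page_specs(["-2", "5", "8-"], 10))  # [1, 2, 5, 8, 9, 10]
```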

Copy

usage: pdftools copy [-h] [-o OUTPUT] [-p PAGES [PAGES ...]] [-y] src

Copy specific pages of a PDF file in a new file

positional arguments:
  src                   Source PDF containing pages to copy

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Name of the output file. If None, the `dest` file will
                        be overwritten (default: None)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        list of pages to copy in the new file. Examples: "5 8
                        10": Pages 5, 8, 10; "1-9": Pages 1 to 9; "5-": Pages
                        from 5 to last page; "-9": Pages from beginning to 9
                        (default: 1)

Insert

usage: pdftools insert [-h] [-o OUTPUT] [-p PAGES [PAGES ...]] [-i INDEX]
                       dest src

Insert pages of one file into another

positional arguments:
  dest                  Destination PDF file
  src                   Source PDF file

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Name of the output file. If None, the `dest` file will
                        be overwritten (default: None)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        List of page numbers (start with 1) which will be
                        inserted. If None, all pages will be inserted
                        (default). Examples: 5; 1-9; 1-; -9 (default: None)
  -i INDEX, --index INDEX
                        Page number (1-indexed) of destination file where the
                        pages will be inserted. If None they will be added at
                        the end of the file (default: None)
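
The effect of `-i INDEX` can be pictured on plain Python lists standing in for pages. This is only a sketch: `insert_pages` is a hypothetical helper, and it assumes the inserted pages land before the page currently at the 1-indexed position, which the help text does not spell out.

```python
def insert_pages(dest_pages, src_pages, index=None):
    """Return dest_pages with src_pages inserted at a 1-indexed position.

    Assumption: the new pages go *before* the page currently at `index`.
    With index=None the pages are appended, as the help text describes.
    """
    if index is None:
        return dest_pages + src_pages
    if not 1 <= index <= len(dest_pages) + 1:
        raise ValueError(f"index {index} is outside 1..{len(dest_pages) + 1}")
    position = index - 1  # convert to Python's 0-based indexing
    return dest_pages[:position] + src_pages + dest_pages[position:]


print(insert_pages(["d1", "d2", "d3"], ["s1", "s2"], index=2))
# ['d1', 's1', 's2', 'd2', 'd3']
```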

Remove

usage: pdftools remove [-h] [-o OUTPUT] src pages [pages ...]

Remove pages from a PDF file

positional arguments:
  src                   PDF source file
  pages                 List of pages to remove from file. Examples: 5; 1-9;
                        1-; -9

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Name of the output file. If None, the `src` file will
                        be overwritten (default: None)

Rotate

usage: pdftools rotate [-h] [-d {90,180,270}] [-c] [-p PAGES [PAGES ...]]
                       [-o OUTPUT]
                       src

Rotate the pages of a PDF file by a set number of degrees

positional arguments:
  src                   Source file

optional arguments:
  -h, --help            show this help message and exit
  -d {90,180,270}, --degrees {90,180,270}
                        Specify degrees value to rotate page(s) (default: 90)
  -c, --counter-clockwise
                        Rotate pages counter-clockwise instead of clockwise,
                        by default (default: False)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        List of page numbers which will be rotated. If None,
                        all pages will be rotated. Examples: 5; 1-9; 1-; -9
                        (default: None)
  -o OUTPUT, --output OUTPUT
                        Output filename. If None, the source file will be
                        overwritten (default: None)
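
How `-d` and `-c` combine can be expressed as a single clockwise angle, since rotating counter-clockwise by d degrees equals rotating clockwise by 360 - d. The helper name `effective_rotation` is hypothetical, and this sketch does not claim to mirror how pdftools computes the angle internally.

```python
def effective_rotation(degrees=90, counter_clockwise=False):
    """Return the equivalent clockwise rotation for the given options.

    Hypothetical helper; only the values allowed by the CLI are accepted.
    """
    if degrees not in (90, 180, 270):
        raise ValueError("degrees must be 90, 180 or 270")
    return (360 - degrees) % 360 if counter_clockwise else degrees


print(effective_rotation(90, counter_clockwise=True))  # 270
```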

Split

usage: pdftools split [-h] [-o OUTPUT] [-s STEPSIZE]
                      [-q SEQUENCE [SEQUENCE ...]]
                      src

Split a PDF file into multiple documents

positional arguments:
  src                   Source file to be split

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output filenames. If None, will append page numbers to
                        the input file name. (default: None)
  -s STEPSIZE, --stepsize STEPSIZE
                        How many pages are packed in each output file
                        (default: 1)
  -q SEQUENCE [SEQUENCE ...], --sequence SEQUENCE [SEQUENCE ...]
                        Sequence of numbers describing how many pages to put
                        in each outputfile (default: None)
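
The difference between `-s` and `-q` can be sketched as two ways of cutting a page count into ranges. Both functions below are hypothetical illustrations. In particular, the handling of pages left over after the sequence is used up (collected into one final range here) is an assumption, not documented pdftools behaviour.

```python
def chunk_by_stepsize(n_pages, stepsize=1):
    """Return (first, last) 1-based page ranges of `stepsize` pages each."""
    if stepsize < 1:
        raise ValueError("stepsize must be at least 1")
    return [(first, min(first + stepsize - 1, n_pages))
            for first in range(1, n_pages + 1, stepsize)]


def chunk_by_sequence(n_pages, sequence):
    """Return page ranges whose lengths follow `sequence`.

    Assumption: leftover pages form one final range.
    """
    ranges, first = [], 1
    for count in sequence:
        if count < 1:
            raise ValueError("sequence entries must be at least 1")
        if first > n_pages:
            break  # document exhausted before the sequence ended
        last = min(first + count - 1, n_pages)
        ranges.append((first, last))
        first = last + 1
    if first <= n_pages:
        ranges.append((first, n_pages))
    return ranges


print(chunk_by_stepsize(5, 2))        # [(1, 2), (3, 4), (5, 5)]
print(chunk_by_sequence(21, [12, 9])) # [(1, 12), (13, 21)]
```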

Merge

usage: pdftools merge [-h] [-o OUTPUT] [-d] src [src ...]

Merge the pages of multiple input files into one output file

positional arguments:
  src                   List of input source files

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output filename (default: merged.pdf)
  -d, --delete          Delete source files after merge (default: False)

Zip

usage: pdftools zip [-h] [-d] [-r] src1 src2 output

Python-like zipping (interleaving) the pages of two documents in one output
file

positional arguments:
  src1          First source file
  src2          Second source file
  output        Name of the output file

optional arguments:
  -h, --help    show this help message and exit
  -d, --delete  Delete source files after merge (default: False)
  -r, --revert  Revert the pages of second input file (default: False)
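
"Python-like zipping" can be illustrated with the built-in `zip` on lists standing in for pages, for example when the front and back sides of a stack were scanned into two separate files. The function `zip_pages` is a hypothetical sketch; it assumes that, like `zip`, interleaving stops at the end of the shorter document.

```python
def zip_pages(pages1, pages2, revert=False):
    """Interleave two page lists: pages1[0], pages2[0], pages1[1], ...

    Assumption: output stops at the shorter input, as built-in zip() does.
    With revert=True the second list is reversed first.
    """
    if revert:
        pages2 = list(reversed(pages2))
    interleaved = []
    for first, second in zip(pages1, pages2):
        interleaved.extend([first, second])
    return interleaved


# Odd pages in one file, even pages scanned in reverse order in the other.
print(zip_pages([1, 3, 5], [6, 4, 2], revert=True))  # [1, 2, 3, 4, 5, 6]
```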

pdftools's People

Contributors

agricolab, christianrinn, jrhawley, stiftcast, stlehmann

pdftools's Issues

pdftools merge fails for some PDFs

When I try to merge a PDF of a Virgin Mobile phone bill, it crashes on Windows 7 / Cygwin.

$ pdftools merge -o test.pdf virgin.pdf
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 229, in __new__
    return decimal.Decimal.__new__(cls, utils.str_(value), context)
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pdftools", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/pdftools/_cli.py", line 274, in main
    pdf_merge(ARGS.src, ARGS.output, ARGS.delete)
  File "/usr/local/lib/python3.8/site-packages/pdftools/pdftools.py", line 42, in pdf_merge
    writer.write(outputfile)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 586, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 92, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 271, in readFromStream
    return FloatObject(num)
  File "/usr/local/lib/python3.8/site-packages/PyPDF2/generic.py", line 231, in __new__
    return decimal.Decimal.__new__(cls, str(value))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

$ pip list
Package    Version
---------- -------
pdftools   2.0.2
pip        21.3.1
PyPDF2     1.26.0
setuptools 59.1.1

If I go into Adobe, optimize the PDF, and save to a new file, then there are no problems. Do you have any suggestions about how to handle this from the command line? I wish I had a PDF to send without tons of private information.

ValueError: invalid literal for int() with base 10: b''

I got an error while merging PDFs in Python 3.

  File "C:\Python34\lib\site-packages\PyPDF2\utils.py", line 149, in __getitem__
    len_self = len(self)
  File "C:\Python34\lib\site-packages\PyPDF2\utils.py", line 140, in __len__
    return self.lengthFunction()
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 983, in getNumPages
    self._flatten()
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1281, in _flatten
    pages = catalog["/Pages"].getObject()
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 501, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 177, in getObject
    return self.pdf.getObject(self).getObject()
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1366, in getObject
    retval = self._getObjectFromStream(indirectReference)
  File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1320, in _getObjectFromStream
    objnum = NumberObject.readFromStream(streamData)
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 264, in readFromStream
    return NumberObject(num)
  File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 251, in __new__
    return int.__new__(cls, value)
ValueError: invalid literal for int() with base 10: b''

Broken executables in a virtual environment?

I used pip to install pdftools in a python3 virtual environment, and the executables are broken. They all have a #! line that reads #!/usr/bin/python3, when it really ought to read (in my case) #!/home/grant/Envs/py36/bin/python, instead. Alternatively, #!/usr/bin/env python3 would work, too.

In the past I've used setuptools with a 'console_scripts' key in the 'entry_points' dictionary, and all of that has worked automatically for me, but you might have a better solution.

In any event, thanks for a nice tool! It's quite handy.

add won't parse arguments the right way

If I do something like this:

pdftools add -o output.pdf input.pdf

I will get the following error:

pdftools add: error: the following arguments are required: src

Syntax error during installation process

Not sure if it affects anything

python setup.py install

root@dev-alex-02:/root/pdftools-1.0.8 # ./setup.py install
: No such file or directory

root@dev-alex-02:/root/pdftools-1.0.8 # python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to pdftools.egg-info/requires.txt
writing pdftools.egg-info/PKG-INFO
writing top-level names to pdftools.egg-info/top_level.txt
writing dependency_links to pdftools.egg-info/dependency_links.txt
reading manifest file 'pdftools.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'pdftools.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/pdftools
copying pdftools/__init__.py -> build/lib/pdftools
copying pdftools/pdftools.py -> build/lib/pdftools
copying pdftools/parseutil.py -> build/lib/pdftools
copying pdftools/_version.py -> build/lib/pdftools
UPDATING build/lib/pdftools/_version.py
set build/lib/pdftools/_version.py to '1.0.8'
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/__init__.py -> build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/pdftools.py -> build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/parseutil.py -> build/bdist.linux-x86_64/egg/pdftools
copying build/lib/pdftools/_version.py -> build/bdist.linux-x86_64/egg/pdftools
byte-compiling build/bdist.linux-x86_64/egg/pdftools/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/pdftools/pdftools.py to pdftools.pyc
File "build/bdist.linux-x86_64/egg/pdftools/pdftools.py", line 15
def pdf_merge(inputs: [str], output: str, delete: bool=False):
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/pdftools/parseutil.py to parseutil.pyc
File "build/bdist.linux-x86_64/egg/pdftools/parseutil.py", line 20
def parse_rangearg(args, max_: int, min_: int=1):
^
SyntaxError: invalid syntax

byte-compiling build/bdist.linux-x86_64/egg/pdftools/_version.py to _version.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
installing scripts to build/bdist.linux-x86_64/egg/EGG-INFO/scripts

Reorder booklet

Hello,

I tried to use pdftools/pdfposter to split a scanned image of an A4 booklet. The booklet has been unstapled and scanned like double sided A3. I used pdfposter -p 2x1A4 on it. The document got perfectly split into individual pages. The last step needed is to reorder the booklet into the original page order (1,2,3...). It would be great if you could add this function to the package.

Regards

Jiri
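
The requested reordering can be sketched on a list of already split half-pages. Everything here is an assumption made for illustration: the page count is a multiple of 4, sheets were scanned from the outermost inwards, the front of each sheet comes before its back, and the left half of each scan comes before the right half. The function names are hypothetical and nothing like this exists in pdftools.

```python
def booklet_scan_order(n_pages):
    """Original page numbers in the order they appear after scan and split.

    Under the stated assumptions, an 8-page booklet yields 8, 1, 2, 7, 6, 3, 4, 5.
    """
    if n_pages % 4:
        raise ValueError("a booklet needs a page count divisible by 4")
    order = []
    low, high = 1, n_pages
    while low < high:
        # Front of the sheet: (high, low); back of the sheet: (low + 1, high - 1).
        order += [high, low, low + 1, high - 1]
        low += 2
        high -= 2
    return order


def reorder_booklet(scanned_pages):
    """Return the scanned half-pages rearranged into reading order."""
    order = booklet_scan_order(len(scanned_pages))
    restored = [None] * len(scanned_pages)
    for scanned_index, page_number in enumerate(order):
        restored[page_number - 1] = scanned_pages[scanned_index]
    return restored


print(booklet_scan_order(8))  # [8, 1, 2, 7, 6, 3, 4, 5]
```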

PyPDF2 dependency error

pdftools currently fails for me after a fresh install with pipx when I try to split some files:

$ pdftools split some_file.pdf -q 12 9
Traceback (most recent call last):
  File "~/.local/bin/pdftools", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/pdftools/_cli.py", line 286, in main
    pdf_split(ARGS.src, ARGS.output, ARGS.stepsize, ARGS.sequence)
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/pdftools/pdftools.py", line 157, in pdf_split
    reader = PdfFileReader(inputfile)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/PyPDF2/_reader.py", line 1974, in __init__
    deprecation_with_replacement("PdfFileReader", "PdfReader", "3.0.0")
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "~/.local/pipx/venvs/pdftools/lib/python3.11/site-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.

This seems related to #17: requirements.txt pins PyPDF2 to version 1.25.1, but setup.py only lists install_requires=["PyPdf2"].

Are there any plans to update this package to the new usage?
Otherwise, the simple fix would be to pin the version of PyPDF2 in setup.py.

Manually reinstalling in a fresh environment with the pinned PyPDF2 version still works:

$ pip install pdftools PyPDF2==1.25.1
Collecting pdftools
  Using cached pdftools-2.0.2-py3-none-any.whl
Collecting PyPDF2==1.25.1
  Downloading PyPDF2-1.25.1.tar.gz (194 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.2/194.2 kB 3.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Installing collected packages: PyPDF2, pdftools
  DEPRECATION: PyPDF2 is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for PyPDF2 ... done
Successfully installed PyPDF2-1.25.1 pdftools-2.0.2
$ pdftools split some_file.pdf -q 12 9 # Success

copy crashes with AttributeError

I have a fresh install of pdftools v2.0.2 (installed via pipx in an isolated venv on python3.10). Trying to get started, I first ran into #11 as well, with this command:

pdftools copy --output start.pdf --pages "1-33" src.pdf

I got:

usage: pdftools copy [-h] [-o OUTPUT] [-p PAGES [PAGES ...]] [-y] src
pdftools copy: error: the following arguments are required: src

So I tried with the -y:

pdftools copy --output start.pdf --pages "1-33" -y src.pdf

and:

Traceback (most recent call last):
  File "/Users/mjf/.local/bin/pdftools", line 8, in <module>
    sys.exit(main())
  File "/Users/mjf/.local/pipx/venvs/pdftools/lib/python3.10/site-packages/pdftools/_cli.py", line 266, in main
    pdf_copy(ARGS.input, ARGS.output, ARGS.pages, ARGS.y)
AttributeError: 'Namespace' object has no attribute 'input'

Even if one removes all the "optional" command line args, this crashes:

pdftools copy src.pdf
Traceback (most recent call last):
  File "/Users/mjf/.local/bin/pdftools", line 8, in <module>
    sys.exit(main())
  File "/Users/mjf/.local/pipx/venvs/pdftools/lib/python3.10/site-packages/pdftools/_cli.py", line 266, in main
    pdf_copy(ARGS.input, ARGS.output, ARGS.pages, ARGS.y)
AttributeError: 'Namespace' object has no attribute 'input'
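
Both symptoms in this report can be reproduced with a small argparse sketch. The parser below is hypothetical: it mirrors only part of the `pdftools copy` usage line and is not the code in `_cli.py`. It shows that an option declared with `nargs="+"` also consumes a positional that follows it, and that the parsed namespace is keyed by the argument name `src`, so an attribute called `input` does not exist.

```python
import argparse


def build_copy_parser():
    """Hypothetical parser mirroring part of the `pdftools copy` usage line."""
    parser = argparse.ArgumentParser(prog="pdftools copy")
    parser.add_argument("src")
    parser.add_argument("-o", "--output")
    parser.add_argument("-p", "--pages", nargs="+")
    return parser


parser = build_copy_parser()

# Positional first: --pages only receives "1-33".
args = parser.parse_args(["src.pdf", "--pages", "1-33"])
print(args.src, args.pages)     # src.pdf ['1-33']
print(hasattr(args, "input"))   # False: the attribute is named "src"

# Positional last: --pages takes "1-33" and "src.pdf", so src is reported missing.
try:
    parser.parse_args(["--pages", "1-33", "src.pdf"])
except SystemExit as exc:
    print("argparse exited with status", exc.code)  # 2
```

With a parser shaped like this, placing `src.pdf` before `--pages` avoids the "arguments are required: src" error. The `AttributeError` is a separate problem that only a change in the calling code can fix.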

Update to latest PyPDF2

I've just noticed that pdftools exists and is used via StackOverflow :-)

You might want to update it to the latest version of PyPDF2 soon, as we are going to deprecate a lot of methods / classes. If you want, I can help you to move.

It might also be interesting to move pdftools to the https://github.com/py-pdf organization. There might also be some overlap in project goals with pdfly, but pdftools is more mature.

Consider using universal line endings: Unix

I installed with pip install pdftools, and pdfmerge.py has ^M (DOS) line endings. It would be nice if this worked without my having to fix that. Version used: 1.1.0.
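
As a stopgap on the user's side, the DOS line endings can be converted after installation. This is a minimal sketch with hypothetical helper names; the path to the installed script depends on the environment and is not assumed here.

```python
from pathlib import Path


def to_unix_newlines(data: bytes) -> bytes:
    """Replace DOS (CRLF) line endings with Unix (LF) ones."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at `path` in place with Unix line endings."""
    script = Path(path)
    script.write_bytes(to_unix_newlines(script.read_bytes()))
```

Calling `convert_file` on the installed `pdfmerge.py` would strip the `^M` characters, including the one at the end of the `#!` line.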
