PDF Cut Script

This is a bash script that uses pdfgrep and pdftk to cut PDFs before the first page containing a given string.

The script collects the PDFs in a folder recursively, cuts each one at the matching page, and saves the result in a subfolder next to the original: cut/{filename}{appendix}.pdf.
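A minimal sketch of how the output path could be derived from an input path. The variable name APPENDIX and its default "_cut" are hypothetical placeholders, not taken from the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical appendix variable; the real script may name or default it differently.
APPENDIX="${APPENDIX:-_cut}"

# Print {dir}/cut/{filename}{appendix}.pdf for a given input PDF path.
cut_output_path() {
    local input="$1" dir base
    dir=$(dirname -- "$input")        # folder that holds the original PDF
    base=${input##*/}                 # file name without directories
    base=${base%.[pP][dD][fF]}        # drop the .pdf extension (any case)
    printf '%s/cut/%s%s.pdf\n' "$dir" "$base" "$APPENDIX"
}
```

For example, `cut_output_path docs/report.pdf` prints `docs/cut/report_cut.pdf`. The caller would still need `mkdir -p` on the cut/ folder before writing to it.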

It works like this:

  1. Find the first page that contains $SEARCH_STRING. If the PDF is only one page long, no cut is needed. Otherwise, if no page contains the string, open the PDF and ask the user for the page number (this prompt can be deactivated).
  2. Cut the PDF before that page.
  3. Save the result in the cut/ subfolder.
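The three steps above can be sketched with the real pdfgrep and pdftk command-line tools. This is an illustration under assumptions, not the script's actual code: it assumes "cut before page N" means keeping pages 1 to N-1, and PDF_VIEWER is a hypothetical variable name for the viewer opened in the fallback prompt.

```shell
#!/usr/bin/env bash
# Assumption: cutting before page N keeps pages 1..N-1 of the original.
# PDF_VIEWER is a hypothetical variable name, defaulting to xdg-open here.

# Step 1 helper: read `pdfgrep -n` output ("PAGE:matched text"), print the first page number.
first_match_page() {
    local line
    IFS= read -r line || return 1      # no output means no page matched
    printf '%s\n' "${line%%:*}"
}

# Step 2 helper: print the pdftk page range that precedes page N.
range_before() {
    local page="$1"
    [[ $page =~ ^[0-9]+$ ]] || return 1    # reject non-numeric input
    page=$((10#$page))                     # force base 10 (e.g. "08")
    (( page > 1 )) || return 1             # nothing precedes page 1
    printf '1-%d\n' "$((page - 1))"
}

cut_pdf() {
    local input="$1" search="$2" output="$3"
    local pages page range
    pages=$(pdftk "$input" dump_data | awk '/^NumberOfPages:/ {print $2}')
    [[ -n $pages ]] || return 1
    if (( pages <= 1 )); then
        echo "skip: $input has only one page" >&2
        return 0
    fi
    # -n prefixes each match with its page number, -m 1 stops after the first match.
    if ! page=$(pdfgrep -n -m 1 -- "$search" "$input" | first_match_page); then
        "${PDF_VIEWER:-xdg-open}" "$input" >/dev/null 2>&1 &
        read -r -p "No match in $input. Cut before page: " page
    fi
    range=$(range_before "$page") || { echo "cannot cut before page '$page'" >&2; return 1; }
    mkdir -p -- "$(dirname -- "$output")"
    pdftk "$input" cat "$range" output "$output"   # step 3: write the cut copy
}
```

Splitting the parsing and range logic into small helpers keeps them testable without any PDFs; only `cut_pdf` actually calls pdftk and pdfgrep.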

Requirements

The script is written in bash. It uses pdfgrep to identify the page number and pdftk to cut the PDF. The PDFs must contain a text layer; rasterized (scanned) PDFs will not work.
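A small sketch of a dependency check that could run before any PDF is processed. The function name missing_tools is hypothetical. Note that it only checks that the tools are installed; it cannot detect rasterized PDFs.

```shell
#!/usr/bin/env bash
# Print every tool from the argument list that is not on PATH; return 1 if any is missing.
missing_tools() {
    local tool
    local -a missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if (( ${#missing[@]} > 0 )); then
        printf '%s\n' "${missing[@]}"
        return 1
    fi
}

# Usage:
# if ! missing=$(missing_tools pdfgrep pdftk); then
#     echo "Please install: $missing" >&2
#     exit 1
# fi
```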

Usage

The script takes one argument: the path to the folder containing the PDFs. It searches recursively (up to three levels deep) for PDFs and cuts them.

./cut_contents.sh <path>
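One way to collect the input files is sketched below. Two assumptions are made: "three levels deep" is mapped to `find -maxdepth 3` (files directly in the folder are depth 1), and files already inside a cut/ subfolder are skipped so that earlier results are not cut again. The function name find_pdfs is hypothetical.

```shell
#!/usr/bin/env bash
# List PDFs (case-insensitive extension) up to three levels below the root,
# NUL-separated so names with spaces or newlines are handled safely.
find_pdfs() {
    local root="$1"
    find "$root" -maxdepth 3 -type f -iname '*.pdf' -not -path '*/cut/*' -print0
}

# Usage:
# while IFS= read -r -d '' pdf; do
#     echo "would cut: $pdf"
# done < <(find_pdfs "$1")
```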

Further variables can be set in the script: the string to search for and the appendix string can be changed, and an expected number of pages can be given. If the cut is made at a different page, the script prints a warning. A default PDF viewer can be set, which is used to open the PDFs. Lastly, parallel processing can be enabled to speed up processing.
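A sketch of what such a configuration block and the expected-pages warning might look like. Only SEARCH_STRING is named in this README; APPENDIX, EXPECTED_PAGES, PDF_VIEWER and PARALLEL_JOBS are hypothetical names, and all defaults are placeholders. It also assumes the expected number refers to the pages kept in the cut file.

```shell
#!/usr/bin/env bash
SEARCH_STRING="${SEARCH_STRING:-Contents}"   # placeholder search string
APPENDIX="${APPENDIX:-_cut}"                 # hypothetical appendix variable
EXPECTED_PAGES="${EXPECTED_PAGES:-}"         # empty disables the check
PDF_VIEWER="${PDF_VIEWER:-xdg-open}"         # hypothetical viewer variable
PARALLEL_JOBS="${PARALLEL_JOBS:-1}"          # hypothetical job count

# Warn (and return 1) when the cut file does not have the expected page count.
warn_if_unexpected() {
    local file="$1" kept_pages="$2"
    if [[ -n $EXPECTED_PAGES && $kept_pages != "$EXPECTED_PAGES" ]]; then
        printf 'warning: %s kept %s pages, expected %s\n' \
            "$file" "$kept_pages" "$EXPECTED_PAGES" >&2
        return 1
    fi
    return 0
}

# Parallel sketch (cut_one.sh is a hypothetical per-file wrapper):
# find "$1" -maxdepth 3 -type f -iname '*.pdf' -print0 |
#     xargs -0 -n 1 -P "$PARALLEL_JOBS" ./cut_one.sh
```

Using `xargs -P` is one common way to run per-file jobs in parallel from bash; GNU parallel would be an alternative.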

Check Cut PDFs

To check the cut PDFs, use the script check_cut_pdfs.sh. It opens the original PDFs in evince at the last page.

./check_cut_pdfs.sh <path>
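A rough sketch of such a check loop, not the actual check_cut_pdfs.sh. It assumes cut files live in cut/ with the hypothetical appendix "_cut", and it interprets "last page" as the last page kept in the cut file, so the original opens right at the cut boundary. The helper names original_of and check_cut_pdfs are illustrative.

```shell
#!/usr/bin/env bash
APPENDIX="${APPENDIX:-_cut}"   # hypothetical appendix, must match the cutting step

# Map {dir}/cut/{name}{appendix}.pdf back to {dir}/{name}.pdf.
original_of() {
    local cut_file="$1" dir name
    dir=$(dirname -- "$(dirname -- "$cut_file")")   # parent of the cut/ folder
    name=${cut_file##*/}
    name=${name%.pdf}
    name=${name%"$APPENDIX"}
    printf '%s/%s.pdf\n' "$dir" "$name"
}

check_cut_pdfs() {
    local root="$1" cut_file original pages
    while IFS= read -r -d '' cut_file; do
        original=$(original_of "$cut_file")
        pages=$(pdftk "$cut_file" dump_data | awk '/^NumberOfPages:/ {print $2}')
        # Open the original at the page number equal to the cut file's length.
        evince --page-index="$pages" "$original"
    done < <(find "$root" -type f -path '*/cut/*' -name "*${APPENDIX}.pdf" -print0)
}
```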

License

Because pdfgrep and pdftk are licensed under the GPL, this script is licensed under the GPL as well, specifically GPLv3; see LICENSE for details.

Contributors

cbueth