
pdfcropper's Introduction

pngCrop auto-detects white margins of a scanned white-background
document and crops the four margins either according to the
specified value or using a simple detection scheme. It is published
under the 2-clause BSD licence.

To compile:
cc -O2 pngCrop.c -o pngCrop -lpng
To use:
./pngCrop /home/draco/img-0011.png

On success, the cropped file is named Oimg-0011.png (or Oxx.png
where xx.png is the original name) under the same directory.

Edit the source file to suit your needs. To manually set the left /
right / upper / bottom margins to crop, in pixels, uncomment line
73 and edit the dims array (and comment out line 75).
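
As a rough illustration of what manually specified margins amount to,
the sketch below turns four pixel counts into a crop rectangle. The
names crop_rect and crop_from_dims, and the left / right / upper /
bottom ordering of the array, are assumptions made for this example;
they are not taken from pngCrop.c.

```c
#include <stddef.h>

/* Hypothetical type: the region of the image that is kept. */
struct crop_rect { size_t x, y, w, h; };

/* Hypothetical helper. dims holds the number of pixels to remove from
 * the left, right, upper and bottom edges (assumed order).
 * Returns 0 on success, -1 if the margins would consume the image. */
static int crop_from_dims(size_t img_w, size_t img_h,
                          const size_t dims[4], struct crop_rect *r)
{
    if (dims[0] + dims[1] >= img_w || dims[2] + dims[3] >= img_h)
        return -1;
    r->x = dims[0];                      /* skip the left margin  */
    r->y = dims[2];                      /* skip the upper margin */
    r->w = img_w - dims[0] - dims[1];    /* remaining width       */
    r->h = img_h - dims[2] - dims[3];    /* remaining height      */
    return 0;
}
```

For a 100 x 200 image and margins of 10, 20, 30 and 40 pixels, this
keeps a 70 x 130 region starting at (10, 30).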

Automatic margin detection (procPng()) is based on simple
thresholding of consecutive dark blotches along a column (row). If
the brightness of a pixel (or its red component, for a color png) is
below tol (defined on line 51), the pixel is considered dark. If
there are more than "Th" consecutive dark pixels on a column (or
more than "Tw" on a row; Th and Tw are the two threshold
variables), then that column (row) is considered to have actual
content and is thus excluded from the margin. After cropping, the
Lmargin, Rmargin, Umargin and Bmargin values on line 50 determine
the margins left around the cropped content. Adjust these
parameters if the output png image is unsatisfactory.
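
The run-length thresholding described above can be sketched as
follows. This is a minimal illustration, not the code of procPng():
it assumes an 8-bit grayscale image stored row by row in a flat
buffer, and the names column_has_content and left_margin are made up
for the example. Only the left edge is shown; the other three edges
would follow the same pattern.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if column x of a w-by-h grayscale
 * image contains more than th consecutive pixels darker than tol. */
static int column_has_content(const unsigned char *img, size_t w,
                              size_t h, size_t x,
                              unsigned char tol, size_t th)
{
    size_t run = 0;                      /* current run of dark pixels */
    for (size_t y = 0; y < h; ++y) {
        if (img[y * w + x] < tol) {
            if (++run > th)
                return 1;                /* run is long enough: content */
        } else {
            run = 0;                     /* a bright pixel breaks the run */
        }
    }
    return 0;
}

/* Hypothetical helper: index of the first column (from the left) that
 * has content, or w if every column is blank. */
static size_t left_margin(const unsigned char *img, size_t w, size_t h,
                          unsigned char tol, size_t th)
{
    for (size_t x = 0; x < w; ++x)
        if (column_has_content(img, w, h, x, tol, th))
            return x;
    return w;
}
```

Requiring a run of dark pixels rather than a single one is what keeps
isolated specks of scanner noise from being mistaken for content.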

The suffix() function generates the output file name. It assumes
*NIX-style paths and treats the '/' symbol as the directory
separator. It prepends the character 'O' to the input png file name.
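
A minimal sketch of this naming rule is given below. It is not the
author's suffix() implementation; make_output_name and its signature
are hypothetical and only illustrate splitting the path at the last
'/' and inserting 'O' before the file name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the path `in` into `out` with 'O'
 * prepended to the file name part. Returns 0 on success, or -1 if
 * `out` (of size outsz) is too small to hold the result. */
static int make_output_name(const char *in, char *out, size_t outsz)
{
    const char *slash = strrchr(in, '/');            /* last separator  */
    size_t dirlen = slash ? (size_t)(slash - in) + 1 : 0;
    /* directory part (including '/'), then 'O', then the file name */
    int n = snprintf(out, outsz, "%.*sO%s", (int)dirlen, in, in + dirlen);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

With this sketch, /home/draco/img-0011.png becomes
/home/draco/Oimg-0011.png, and a bare xx.png becomes Oxx.png.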

Requirement: libpng.

Automatic PDF cropping:

While there are available scripts to adjust margins
(http://pdfcrop.sourceforge.net/ and
http://www.ctan.org/pkg/pdfcrop), they are based on TeX formatting
and thus unable to adjust pure image-based PDFs such as
scan-generated ones. pngCrop.c is the tiny core part of simple
image cropping. It is easy to write a shell script such as
"cropper" around it. The script first uses an available tool (I
prefer `mupdfextract` of the mupdf package) to convert a scanned PDF
file into png images. It then runs pngCrop on each image, and
finally uses `convert` of the imagemagick package to assemble the
processed png files into a PDF.
