PBET

PBET - PDF bookmark edit tool.

A(nother) tool for editing bookmarks in PDF files.
This one also supports OCR. 😎
However, you must install Tesseract first to use the OCR feature. 😭

Usage

If you are using the Releases version, double-click "run.bat" first.

  1. Select the pdf file you want to edit.
  2. (optional) Load existing bookmarks.
  3. (optional) Use OCR to extract the book's table of contents.
  4. Edit bookmarks. (See "Bookmark Format" below for details.)
  5. (optional) Set page number offset.
  6. Make sure there is no risk!
    The new file will be named (your file name)-new.pdf.
    Note that if a file with that name already exists in the source file's directory, the Save button will overwrite it without any warning!
  7. Click 'Save'.
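
Steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function names are hypothetical, and the offset is assumed to be simply added to each bookmark's page number.

```python
from pathlib import Path


def output_path(source: Path) -> Path:
    """Return '(your file name)-new.pdf' in the same directory as the source."""
    return source.with_name(f"{source.stem}-new.pdf")


def apply_offset(page: int, offset: int) -> int:
    """Shift a bookmark page number by the offset (assumed to be additive)."""
    return page + offset


if __name__ == "__main__":
    target = output_path(Path("book.pdf"))
    # The tool itself does not warn, so check by hand before saving.
    if target.exists():
        print(f"Warning: {target} already exists and would be overwritten")
    else:
        print(f"Safe to save as {target}")
```

Running a check like this before clicking 'Save' is one way to make sure nothing gets overwritten by accident.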

Bookmark Format

It's actually just CSV, which is very simple.
Columns are separated by a CSV delimiter. (The default delimiter is '~'.)

For example, by default you will see something like the following:

1~I am title~1
2~I am subtitle~1
1~I am another title~2

The first column is the level.
The second column is the title.
The third column is the page number.

There is also an optional fourth column that records a vertical coordinate on the page.
This coordinate is measured in points, so it is difficult to add manually.
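
To make the column layout concrete, here is a minimal parsing sketch using Python's standard csv module. It is an assumption about how such text could be read, not the tool's own parser. `parse_bookmarks` is a hypothetical name, and treating the fourth column as a plain float is also an assumption.

```python
import csv
import io


def parse_bookmarks(text: str, delimiter: str = "~") -> list:
    """Parse bookmark text into [level, title, page] or [level, title, page, y] rows."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:  # skip blank lines
            continue
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(fields)}: {fields}")
        row = [int(fields[0]), fields[1], int(fields[2])]
        if len(fields) == 4:
            row.append(float(fields[3]))  # vertical coordinate in points
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = "1~I am title~1\n2~I am subtitle~1\n1~I am another title~2\n"
    for row in parse_bookmarks(sample):
        print(row)
```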

Note that when a title contains the delimiter, the program cannot parse the line correctly.
In that case, choose a different delimiter.
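
If you are unsure which delimiter is safe, a quick check like the one below can help. It is just a sketch: `pick_delimiter` and its candidate list are hypothetical, and the tool may not accept every character shown here.

```python
def pick_delimiter(titles, candidates=("~", "|", "\t", ";")):
    """Return the first candidate character that appears in none of the titles."""
    for candidate in candidates:
        if not any(candidate in title for title in titles):
            return candidate
    raise ValueError("every candidate delimiter appears in some title")


if __name__ == "__main__":
    titles = ["Chapter 1 ~ Basics", "Chapter 2"]
    print(repr(pick_delimiter(titles)))
```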

This tool uses PyMuPDF to manipulate PDF files. Check out these links if you want more details:
get_toc & set_toc
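
For reference, a round trip through those two PyMuPDF calls might look roughly like this. It is a sketch under assumptions, not the tool's source: the helper names are hypothetical, and PyMuPDF is imported lazily as `fitz`, so `check_levels` also runs without it. PyMuPDF expects the first entry to be level 1 and each later level to be at most one deeper than the previous entry, so it is worth checking the levels before saving.

```python
def check_levels(toc):
    """Raise ValueError if the levels break the hierarchy rules set_toc expects."""
    previous = 0
    for level, title, *rest in toc:
        if level < 1 or level > previous + 1:
            raise ValueError(f"bad level {level} for {title!r}")
        previous = level


def read_toc(src):
    """Return the existing outline as [level, title, page] rows."""
    import fitz  # PyMuPDF

    doc = fitz.open(src)
    try:
        return doc.get_toc()
    finally:
        doc.close()


def copy_with_toc(src, dst, toc):
    """Save a copy of src to dst with its outline replaced by toc."""
    import fitz  # PyMuPDF

    check_levels(toc)
    doc = fitz.open(src)
    try:
        doc.set_toc(toc)
        doc.save(dst)
    finally:
        doc.close()
```

For example, `copy_with_toc("book.pdf", "book-new.pdf", [[1, "I am title", 1], [2, "I am subtitle", 1]])` would write a new file and leave the original untouched.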

OCR Feature

As mentioned before, you have to install tesseract to use the OCR feature.
Versions for win32 can be found here and here.

WARNING: Tesseract should be either installed in the directory which is suggested during the installation or in a new directory. The uninstaller removes the whole installation directory. If you installed Tesseract in an existing directory, that directory will be removed with all its subdirectories and files.

Once it is installed, click the OCR button to tell this tool where tesseract.exe is.
Then set the language you want to recognize and you're good to go. (Just click the OCR button again.)

If you skipped the language packs during installation, you can find them at this link.
Download them and put them in (tesseract installation directory)/tessdata.
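
Tesseract language packs are files named (language code).traineddata, so you can check whether one is missing with a short script. This is only a sketch: `missing_language_packs` is a hypothetical helper, and the directory in the example is a placeholder for your own installation path.

```python
from pathlib import Path


def missing_language_packs(tessdata_dir, languages):
    """Return the language codes that have no .traineddata file in tessdata_dir."""
    tessdata = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (tessdata / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    # Placeholder path; replace it with your own tessdata directory.
    print(missing_language_packs("tessdata", ["eng", "chi_sim"]))
```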

If you want to recognize multiple languages, just enter [first language code]+[second language code]+... in the language input box.
Language codes can be found in this document.
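
As an illustration of the multi-language syntax, the sketch below joins language codes with '+' and builds a Tesseract command line around the result. How this tool actually calls Tesseract is not documented here, so treat the function names and the command layout as assumptions. The `-l` option and the `stdout` output target are standard tesseract CLI features.

```python
def language_argument(codes):
    """Join language codes into the 'first+second+...' form."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def tesseract_command(exe, image, codes):
    """Build an argument list that prints the recognized text to stdout."""
    return [exe, image, "stdout", "-l", language_argument(codes)]


if __name__ == "__main__":
    print(tesseract_command("tesseract.exe", "page.png", ["eng", "chi_sim"]))
```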

OCR recognition may not succeed every time and may require multiple attempts.

Other Things

  • If you need to edit more than 20 bookmarks, I recommend using Excel to edit the CSV.
  • The new file may lose some metadata. Check out this document for more details.
