nautilus-ocr

nautilus-ocr is a Nautilus script that scans PDFs and adds OCR information to them. This is useful if you have scanned documents and want to make them searchable or copy and paste content from them.

You can select the language of the PDF for better text recognition.

Requirements

  • Zenity: UI dialogs
  • OCRMyPDF: OCR
  • Tesseract: OCR (low-level)

Tesseract also needs the OSD (orientation and script detection) data.

You can install all the dependencies in Fedora as follows:

$ sudo dnf install tesseract tesseract-osd ocrmypdf zenity

In Ubuntu you would instead do:

$ sudo apt install tesseract-ocr tesseract-ocr-osd ocrmypdf zenity

Similar packages might exist for your distribution of choice.
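Whatever the distribution, a quick way to confirm that the three tools are available is to look for their commands on the PATH. The following is a minimal sketch, not part of nautilus-ocr.sh; the helper name missing_commands is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Check the three tools listed under Requirements.
missing=$(missing_commands zenity ocrmypdf tesseract)
if [ -n "$missing" ]; then
    printf 'Missing commands:\n%s\n' "$missing"
else
    echo "All dependencies found."
fi
```

This only checks the commands themselves; it does not tell you whether the OSD data is installed.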

You might also want to add training data for other languages. In Fedora you can install, for example, the French language data as follows:

$ sudo dnf install tesseract-langpack-fra

In Ubuntu, it would be:

$ sudo apt install tesseract-ocr-fra
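To check that a language ended up where Tesseract can see it, you can look for its code in the output of tesseract --list-langs. This is a small sketch; lang_in_list is a hypothetical helper name, and fra is just the example language from above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the language code given as $1
# appears as a whole line on standard input.
lang_in_list() {
    grep -qxF -- "$1"
}

# 2>&1 also captures the list if your Tesseract release prints it on stderr.
if tesseract --list-langs 2>&1 | lang_in_list fra; then
    echo "French training data is installed."
else
    echo "French training data not found."
fi
```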

Installation

Download nautilus-ocr.sh, make it executable, and move it to the Nautilus scripts folder. If you downloaded the script to ~/Downloads:

$ chmod +x ~/Downloads/nautilus-ocr.sh
$ mv ~/Downloads/nautilus-ocr.sh ~/.local/share/nautilus/scripts/"Create OCR'ed PDF"

Once the script is in place, right-click any PDF file in Nautilus and pick the script from the Scripts submenu.

The script creates a new PDF file with the _ocr suffix that contains the OCR information. If you can select text in that file, the OCR worked.
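For orientation, the core of such a script can be sketched as below. This is not the actual content of nautilus-ocr.sh: the function names are hypothetical, the language is hard-coded to eng, and the output name assumes that the _ocr suffix goes before the .pdf extension. NAUTILUS_SCRIPT_SELECTED_FILE_PATHS is the variable in which Nautilus passes the selected local files to a script, one path per line.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, "scan.pdf" -> "scan_ocr.pdf".
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: OCR one file ($2) with a Tesseract language code ($1).
ocr_one_file() {
    ocrmypdf -l "$1" "$2" "$(ocr_output_name "$2")"
}

# Process every selected file; empty lines are skipped.
printf '%s\n' "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" |
while IFS= read -r file; do
    if [ -n "$file" ]; then
        ocr_one_file eng "$file"
    fi
done
```

Run outside Nautilus, the variable is empty and the loop does nothing.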

Configuration

Language

By default, a dialog asks you which language to use for the OCR. If you always want to use the same language, edit nautilus-ocr.sh and set the language at the top of the file.
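The idea of a preset language with a dialog as fallback can be sketched as follows. The variable name OCR_LANGUAGE and the function choose_language are assumptions made for this example; check the top of nautilus-ocr.sh for the name the script really uses. The language list in the dialog is also only illustrative.

```shell
#!/bin/sh
# Hypothetical setting: a Tesseract language code such as eng or fra.
# Set it to "" to be asked every time.
OCR_LANGUAGE="fra"

# Print the preset language if there is one, otherwise ask with Zenity.
choose_language() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        zenity --list --radiolist --title "OCR language" \
            --column "Use" --column "Code" --column "Language" \
            TRUE eng English \
            FALSE fra French
    fi
}

lang=$(choose_language "$OCR_LANGUAGE")
echo "Language used for OCR: $lang"
```

With a non-empty preset, Zenity is never started, so the script runs without any question.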

Close on finish

A dialog is shown after the files have been OCR'ed, stating that the process has finished. To close that dialog automatically, uncomment the --auto-close option in nautilus-ocr.sh.
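As an illustration of what --auto-close does, here is a sketch that assumes the dialog is a zenity --progress dialog; the script may be organised differently. Zenity reads progress from standard input: a line starting with # updates the label, and a bare number sets the percentage. With --auto-close, the dialog is dismissed once 100 is reached. The function name report_progress is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit the lines that zenity --progress reads on stdin,
# one label update and one percentage per file.
report_progress() {
    total=$#
    done_count=0
    for file in "$@"; do
        echo "# Processing $file"
        # ... the OCR step for "$file" would run here ...
        done_count=$((done_count + 1))
        echo $((done_count * 100 / total))
    done
}

# Usage (needs a graphical session):
#   report_progress a.pdf b.pdf | zenity --progress --auto-close
```

Without --auto-close, the same dialog stays open at 100% until you confirm it.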

Contributors

  • @errotu: Ubuntu support
