chicobico / filebasedminidms

This project forked from stweiss/filebasedminidms

This PHP script sorts your documents into subfolders (using hardlinks) based on the hashtags it finds in your documents' filenames.

License: MIT License

PHP 100.00%

FileBasedMiniDMS

FileBasedMiniDMS.php by Stefan Weiss (2017)
Version 0.11 08.06.2017
https://github.com/stweiss/FileBasedMiniDMS

CHANGELOG

Version 0.11 (08.06.2017)

  • New: automatic OCR and automatic rename

Version 0.02 (02.03.2016)

  • release of this file based document management system.
  • sorts files with hashtags into hashtag-folders.

INSTALL

  1. Place this file on your FileServer/NAS
  2. For OCR (Step 1): Install Docker and pull an ocrmypdf image, e.g. docker pull jbarlow83/ocrmypdf
  3. For automatic rename (Step 1.1): Make sure that pdftotext is available.
  4. Adjust the settings for this script in config.php to fit your needs.
  5. Create a cronjob on your FileServer/NAS to execute this script regularly. (In DSM you can do this in Control Panel -> Task Scheduler.) You may need to run the task with root privileges.
    e.g. php /volume1/home/stefan/Scans/FileBasedMiniDMS.php
    or redirect stdout and stderr to a log file to capture PHP warnings/errors:
    php /volume1/home/stefan/Scans/FileBasedMiniDMS.php >> /volume1/home/stefan/Scans/my.log 2>&1

NOTES

This script works in three steps. Each step can be turned on/off in config.php:

Step 1: OCR

Runs OCR on the PDF files in $inboxfolder whose filenames match $matchWithoutOCR.
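
As a rough illustration of how files might be selected for OCR, the sketch below filters a list of inbox filenames against a pattern. It assumes that $matchWithoutOCR holds a PCRE pattern; the pattern value, the sample filenames and the helper selectFilesForOcr() are hypothetical and not taken from the script.

```php
<?php
// Hypothetical value; the real setting lives in config.php.
// Assumption: $matchWithoutOCR is a PCRE pattern for files that still need OCR.
$matchWithoutOCR = '/^scan_.*\.pdf$/i';

/**
 * Return the filenames that match the "needs OCR" pattern.
 * (Illustrative helper, not part of FileBasedMiniDMS.php.)
 */
function selectFilesForOcr(array $filenames, string $pattern): array
{
    // preg_grep() preserves the original keys, so re-index the result.
    return array_values(preg_grep($pattern, $filenames));
}

// Sample inbox listing; in practice this could come from scandir($inboxfolder).
$inbox = [
    'scan_0001.pdf',
    '2017-06-08 Invoice #bills.pdf',
    'scan_0002.PDF',
    'notes.txt',
];

foreach (selectFilesForOcr($inbox, $matchWithoutOCR) as $file) {
    echo $file, PHP_EOL;
}
```

With the sample listing, only scan_0001.pdf and scan_0002.PDF are selected; the already renamed invoice and the text file are left alone.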

Step 1.1: Rename ocr'ed files based on keywords and date

The PDF is renamed to the following structure: "<date> <name> <tags>.pdf"

<date>: The script tries to find a date in the pdf. If none is found the current date is used.
<name>: You can define $renamerules. The first rule which matches the ocr'ed content of the first page is used. You can use the operators & (AND) and , (OR) and you can use the wildcard operators ? and *.
<tags>: In $tagrules you can specify your tags. All matching rules will add their tag to the filename. You can use the same operators here.
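
The rename logic described above could look roughly like the following sketch. It makes several assumptions that the README does not confirm: rules are stored as "rule => result" arrays, "," separates alternatives while "&" joins required keywords within one alternative, matching is case-insensitive, and dates appear as dd.mm.yyyy in the text. All function names and the fallback name "Unknown" are hypothetical.

```php
<?php
/** Turn a keyword with ? and * wildcards into a case-insensitive regex. */
function keywordToRegex(string $keyword): string
{
    $quoted = preg_quote(trim($keyword), '/');
    // preg_quote() escapes ? and *; map them back to "one char" / "any chars".
    $quoted = str_replace(['\?', '\*'], ['.', '.*'], $quoted);
    return '/' . $quoted . '/i';
}

/** Assumed semantics: "," is OR between alternatives, "&" is AND within one. */
function ruleMatches(string $rule, string $text): bool
{
    foreach (explode(',', $rule) as $alternative) {
        $allFound = true;
        foreach (explode('&', $alternative) as $keyword) {
            if (preg_match(keywordToRegex($keyword), $text) !== 1) {
                $allFound = false;
                break;
            }
        }
        if ($allFound) {
            return true;
        }
    }
    return false;
}

/** Find the first dd.mm.yyyy date (assumed format); otherwise use the fallback. */
function findDate(string $text, string $fallback): string
{
    if (preg_match('/\b(\d{2})\.(\d{2})\.(\d{4})\b/', $text, $m) === 1
        && checkdate((int) $m[2], (int) $m[1], (int) $m[3])) {
        return sprintf('%s-%s-%s', $m[3], $m[2], $m[1]);
    }
    return $fallback;
}

/** Build "<date> <name> <tags>.pdf": first matching rename rule, all matching tag rules. */
function suggestFilename(string $text, array $renamerules, array $tagrules, string $today): string
{
    $name = 'Unknown'; // hypothetical default when no rename rule matches
    foreach ($renamerules as $rule => $candidate) {
        if (ruleMatches($rule, $text)) {
            $name = $candidate;
            break; // only the first matching rule is used
        }
    }
    $tags = [];
    foreach ($tagrules as $rule => $tag) {
        if (ruleMatches($rule, $text)) {
            $tags[] = '#' . $tag;
        }
    }
    return trim(findDate($text, $today) . ' ' . $name . ' ' . implode(' ', $tags)) . '.pdf';
}

// Hypothetical rules and OCR text of a first page.
$renamerules = ['acme & invoice' => 'ACME Invoice', 'insur*' => 'Insurance Letter'];
$tagrules    = ['invoice, bill' => 'bills', 'electric*' => 'energy', 'tax' => 'taxes'];
$text        = "ACME Energy Ltd.\nInvoice dated 08.06.2017\nElectricity usage";

echo suggestFilename($text, $renamerules, $tagrules, date('Y-m-d')), PHP_EOL;
// prints "2017-06-08 ACME Invoice #bills #energy.pdf"
```

Treating "," as the outer operator is only one possible reading of the rule syntax; if the script gives "&" lower precedence, the two explode() calls would simply swap places.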

Step 2: Tagging

This script creates a subfolder for each hashtag it finds in your filenames and creates a hardlink to the document in that folder. Documents are expected to be stored flat in one folder. Filenames need to follow the pattern "<any name> #hashtag1 #hashtag2.extension".

e.g. "Documents/Scans/2015-12-25 Bill of Santa Clause #bills #2015.pdf" will be linked into:

  • "Documents/Scans/tags/2015/2015-12-25 Bill of Santa Clause #bills.pdf"
  • "Documents/Scans/tags/bills/2015-12-25 Bill of Santa Clause #2015.pdf"
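
The mapping from a tagged filename to its link locations can be sketched as follows. This is an illustration that reproduces the example above, not the script's actual code: the helper planTagLinks() and the default folder name "tags" are assumptions, and the sketch only computes paths without touching the filesystem.

```php
<?php
/**
 * For a filename like "<any name> #tag1 #tag2.ext", return one relative path
 * per tag: "<tagsDir>/<tag>/<any name> <remaining tags>.ext".
 * (Illustrative helper, not part of FileBasedMiniDMS.php.)
 */
function planTagLinks(string $filename, string $tagsDir = 'tags'): array
{
    $extension = pathinfo($filename, PATHINFO_EXTENSION);
    $base      = pathinfo($filename, PATHINFO_FILENAME);

    // Collect all hashtags and strip them from the base name.
    preg_match_all('/#([^\s#]+)/', $base, $matches);
    $tags = $matches[1];
    $name = trim(preg_replace('/\s*#[^\s#]+/', '', $base));

    $links = [];
    foreach ($tags as $tag) {
        // The link name keeps every tag except the one its folder is named after.
        $linkName = $name;
        foreach (array_diff($tags, [$tag]) as $other) {
            $linkName .= ' #' . $other;
        }
        if ($extension !== '') {
            $linkName .= '.' . $extension;
        }
        $links[] = $tagsDir . '/' . $tag . '/' . $linkName;
    }
    return $links;
}

$file = '2015-12-25 Bill of Santa Clause #bills #2015.pdf';
echo implode(PHP_EOL, planTagLinks($file)), PHP_EOL;
// prints:
// tags/bills/2015-12-25 Bill of Santa Clause #2015.pdf
// tags/2015/2015-12-25 Bill of Santa Clause #bills.pdf
```

To create the links for real, each returned path could be passed to PHP's link($source, $linkPath) after creating the tag folder with mkdir(dirname($linkPath), 0777, true). Hardlinks only work when source and link are on the same filesystem.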

FAQ

Q: How do I assign another tag to my file?
A: Simply rename the file in the $scanfolder and add the tag at the end (but before the extension).

Q: How can I fix a typo in a document's filename?
A: Simply rename the file in the $scanfolder. The tags are rebuilt from scratch at the next scheduled run, and the old links and tags are removed automatically.

Disclaimer

Make sure you have a backup before you start using this script. You use this software at your own risk.
