gusanmaz / artitle

A Python CLI program for batch renaming academic article PDFs to their titles.

Home Page: https://pypi.org/project/artitle/

License: MIT License

Python 100.00%
academic-articles grobid renaming-files arvix pdf-generation pdf-rename

artitle's Introduction

Academic Article Renamer

Academic Article Renamer (artitle) is a command-line interface (CLI) tool that renames academic articles to their titles. It solves the problem of cryptic default file names that many academic article repositories provide. The CLI uses the Grobid server to extract the title from the PDF file and renames the file accordingly.

Installation

You can install artitle using pip:

pip install artitle

Usage

To use artitle, you must have a Grobid server running on your computer. Refer to the Grobid documentation for information on how to run the server.

In our experience, Grobid needs a substantial amount of memory to do its job. We recommend allocating at least 6 GB of memory to the Grobid server to avoid runtime failures. To run the container with 6 GB of memory, use the following command:

docker run -m 6g -p 8070:8070 grobid/grobid:0.7.2
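
Before running artitle, it can help to confirm that the server is reachable. The following is a minimal sketch, not part of artitle: the helper name grobid_is_alive is hypothetical, and it assumes the server listens on http://localhost:8070 (the port mapped in the command above) and exposes Grobid's /api/isalive health endpoint.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if a Grobid server answers at base_url, else False.

    base_url is an assumption that matches the port mapping shown above.
    """
    try:
        # /api/isalive is Grobid's health-check endpoint.
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error, and similar.
        return False


if __name__ == "__main__":
    print("Grobid reachable:", grobid_is_alive())
```

If this prints False, check that the container has finished starting and that port 8070 is published.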

Once you have the Grobid server running, you can use the CLI to rename your article PDFs. To rename all the PDFs in a directory, run the following command:

artitle <path-to-pdf-files>

The argument should be the path to the directory containing the PDF files that you want to rename.

By default, the CLI uses underscores (_) to replace spaces in the file names. If you want to use a different character to replace spaces, you can specify it using the -s or --space-replace option:

artitle <path-to-pdf-files> -s "-"

This command uses hyphens (-) to replace spaces in the file names.
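
The effect of the option can be illustrated with a short sketch. This is not artitle's actual code: the function name replace_spaces is hypothetical, and collapsing runs of whitespace before replacing is an assumption made for the example.

```python
def replace_spaces(title, space_replace="_"):
    """Replace the spaces in a title with space_replace.

    Runs of whitespace are collapsed to a single space first
    (an assumption for this sketch, not confirmed artitle behavior).
    """
    collapsed = " ".join(title.split())
    return collapsed.replace(" ", space_replace)


if __name__ == "__main__":
    print(replace_spaces("Attention Is All You Need"))
    print(replace_spaces("Attention Is All You Need", space_replace="-"))
```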

The program creates a new directory named pdfs_with_old_names inside the directory containing the PDF files. Before renaming any PDF files, the program copies the original PDF files into this directory with their original names, so that you can easily revert the changes if anything goes wrong.
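
The backup step described above could be sketched as follows. This is an illustration under assumptions rather than artitle's implementation: the function name backup_pdfs is hypothetical, and only files matching *.pdf at the top level of the directory are considered.

```python
import shutil
from pathlib import Path


def backup_pdfs(directory, backup_name="pdfs_with_old_names"):
    """Copy every top-level PDF in directory into a backup subdirectory.

    Returns the list of backup paths. The originals are left in place.
    """
    source_dir = Path(directory)
    backup_dir = source_dir / backup_name
    backup_dir.mkdir(exist_ok=True)

    copied = []
    # glob("*.pdf") is not recursive, so the backup directory itself is skipped.
    for pdf in sorted(source_dir.glob("*.pdf")):
        target = backup_dir / pdf.name
        shutil.copy2(pdf, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

To revert a run by hand, you would copy the files from pdfs_with_old_names back into the parent directory.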

After renaming the PDF files, the program creates a file named renaming.txt inside the directory containing the PDF files. This file contains information about the renaming of each PDF file, with one row for each file. The rows include the original file name, the new file name, and any characters that were replaced with hyphens (-) because they could cause problems in file names.
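
As a rough illustration of what one row might hold, the sketch below replaces problematic characters with hyphens and builds a row from the three pieces of information listed above. The set of characters treated as problematic, the tab-separated layout, and the names sanitize and log_row are all assumptions for this example; the actual format of renaming.txt may differ.

```python
import re

# Assumed set of characters that cause trouble in file names on common systems.
UNSAFE = re.compile(r'[\\/:*?"<>|]')


def sanitize(name):
    """Return (safe_name, replaced_chars) with unsafe characters turned into '-'."""
    replaced = UNSAFE.findall(name)
    return UNSAFE.sub("-", name), replaced


def log_row(old_name, title):
    """Build one hypothetical tab-separated row: old name, new name, replaced chars."""
    new_stem, replaced = sanitize(title)
    return "\t".join([old_name, new_stem + ".pdf", "".join(replaced)])


if __name__ == "__main__":
    print(log_row("1810.04805.pdf", "A/B Testing"))
```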

The Grobid server generates an XML file for each processed PDF file. These XML files are stored in a directory named xml inside the directory containing the PDF files.
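
If you want to inspect these XML files yourself, the title can be read with the standard library. The sketch below assumes the files are Grobid TEI documents that use the http://www.tei-c.org/ns/1.0 namespace and keep the title under titleStmt; the function name title_from_tei is hypothetical, and this is not necessarily how artitle extracts the title.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Grobid output (assumed for this sketch).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def title_from_tei(tei_xml):
    """Return the article title from a TEI XML string, or None if absent."""
    root = ET.fromstring(tei_xml)
    node = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if node is None:
        return None
    # Join nested text and normalize whitespace and line breaks.
    text = " ".join("".join(node.itertext()).split())
    return text or None
```

A file with no detectable title yields None, which a caller can use to skip renaming that PDF.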

License

Academic Article Renamer is licensed under the MIT License. See the LICENSE file for more information.
