docx_text2link: Convert text in a DOCX document to hyperlinks

Convert specific text in a DOCX document to hyperlinks, without changing anything else.

M.H.V. Werts, May 2020, using code snippets by others cited below (many thanks!).

USE AT YOUR OWN RISK! This program has only been very partially tested.

Description

A typical use case for this Python script: you have an MS Word DOCX document with a formatted bibliography (e.g. generated with Zotero and then unlinked) and you need to insert clickable hyperlinks to the corresponding papers online (e.g. via their Digital Object Identifiers, DOIs).

With Zotero it is not possible to insert such hyperlinks directly into the Word document (if you know how to do it, please tell me!). The present script provides a way to obtain hyperlinks from DOIs inserted as plain text into the bibliography. Zotero can generate such plain-text DOIs.

With some effort, this Python script can probably be adapted for other use cases where one would need to convert plain text in a document into hyperlinks.

Usage

python3 docx_text2link.py <name of input file> <name of output file>

Only one DOI per paragraph is processed, so each DOI needs to be in its own paragraph. You may need to edit the script to fine-tune it for your specific use case.
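The one-DOI-per-paragraph rule can be illustrated with a minimal sketch that works on plain paragraph text. It is not taken from docx_text2link.py: the regular expression, the function names split_at_first_doi and doi_to_url, and the stripping of trailing punctuation are all assumptions made for illustration.

```python
import re

# Assumed DOI pattern (not taken from docx_text2link.py): "10.", a registrant
# code of 4 to 9 digits, a slash, then a suffix without whitespace.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def split_at_first_doi(paragraph_text):
    """Return (before, doi, after) for the first DOI found, or None.

    Only the first match is used, so any further DOI in the same
    paragraph stays plain text.
    """
    match = DOI_PATTERN.search(paragraph_text)
    if match is None:
        return None
    # Drop punctuation that ends the sentence rather than the DOI.
    doi = match.group(0).rstrip(".,;")
    end = match.start() + len(doi)
    return paragraph_text[:match.start()], doi, paragraph_text[end:]


def doi_to_url(doi):
    """Build the doi.org resolver URL that the hyperlink would point to."""
    return "https://doi.org/" + doi
```

Splitting the text into the part before the DOI, the DOI itself, and the part after it is one way to keep the surrounding text untouched while only the DOI becomes a link.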

Example

An example is provided in example_bibliography_input.docx, which contains an "unlinked" Zotero-formatted bibliography. Running docx_text2link on it converts the DOIs to hyperlinks; the result is in example_bibliography_output.docx.

The example input and output are also provided as PDF so that you can see the effect of the script directly by opening these documents on GitHub.

Installation

There is no specific installation procedure. Copy the script into the directory containing the document to be processed, then run it with the python3 interpreter.

The script, however, relies on python-docx (we used 0.8.10), which needs to be installed first.

python-docx is available on the conda-forge channel. Install it with conda install -c conda-forge python-docx. For those working with pip, use pip3 install python-docx.

Acknowledgements

Writing this script was possible thanks to the following sources:

[1] python-openxml/python-docx#74

[2] https://stackoverflow.com/questions/40475757/how-to-extract-the-url-in-hyperlinks-from-a-docx-file-using-python

[3] python-openxml/python-docx#74 (comment)

The following very helpful code was used in the script:

[4] python-openxml/python-docx#519 (comment)

[5] python-openxml/python-docx#74 (comment)

Method [3] provides an alternative way of inserting links. However, it causes some complications with formatting, and links can only appear at the end of a paragraph.
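For orientation, a hyperlink in a DOCX file is stored as a w:hyperlink element that sits among the runs of a paragraph, so its position among those runs decides where the link appears. The sketch below builds such a paragraph with the standard library only. It is not the code used in the script or in the cited snippets; the helper names and the relationship id "rId99" are hypothetical. In a real document the id must match an external hyperlink relationship in the document's relationships part.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML and relationships namespaces.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def make_run(text):
    """Build a plain w:r run holding the given text."""
    run = ET.Element(f"{{{W_NS}}}r")
    text_el = ET.SubElement(run, f"{{{W_NS}}}t")
    text_el.text = text
    return run


def make_hyperlink(rel_id, text):
    """Build a w:hyperlink element wrapping a single run.

    rel_id refers to a relationship that holds the target URL.
    """
    hyperlink = ET.Element(f"{{{W_NS}}}hyperlink", {f"{{{R_NS}}}id": rel_id})
    hyperlink.append(make_run(text))
    return hyperlink


# A paragraph with the link in the middle: run, hyperlink, run.
paragraph = ET.Element(f"{{{W_NS}}}p")
paragraph.append(make_run("doi:"))
paragraph.append(make_hyperlink("rId99", "10.1000/xyz123"))  # hypothetical id
paragraph.append(make_run("."))
```

Placing the hyperlink element between two runs, rather than appending it after the last one, is what allows a link in the middle of a paragraph.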
