Convert specific text in a DOCX document to hyperlinks, without changing anything else.
M.H.V. Werts, May 2020, using code snippets by others cited below (many thanks!).
USE AT YOUR OWN RISK! This program has only been very partially tested.
A typical use case for this Python script: you have an MS Word DOCX document with a formatted bibliography (e.g. generated with Zotero, then unlinked) and you need to insert clickable hyperlinks to the corresponding papers online (e.g. via their Digital Object Identifiers, DOIs).
With Zotero it is not possible to insert such hyperlinks directly into the Word document (if you know how to do it, please tell me!). The present script provides a way to obtain hyperlinks from DOI identifiers inserted as plain text into the bibliography. Zotero can generate such plain text DOIs.
With some effort, this Python script can probably be adapted for other use cases where one would need to convert plain text in a document into hyperlinks.
python3 docx_text2link.py <name of input file> <name of output file>
Only one DOI per paragraph is processed. Each DOI needs to be in a separate paragraph. It may be necessary to fine-tune the script by editing it to suit your specific use case.
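To make the "one DOI per paragraph" rule concrete, here is a minimal sketch of how the first DOI in a paragraph's plain text could be located and turned into a URL. It is not the code used in docx_text2link.py: the regular expression, the https://doi.org/ prefix and the function name find_doi are assumptions chosen for illustration, and unusual DOIs may need a different pattern.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# without whitespace. Real DOIs can be more exotic than this.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(paragraph_text):
    """Return (start, end, url) for the first DOI in the text, or None."""
    match = DOI_PATTERN.search(paragraph_text)
    if match is None:
        return None
    # Drop sentence punctuation that directly follows the DOI.
    doi = match.group(0).rstrip('.,;')
    start = match.start()
    return start, start + len(doi), 'https://doi.org/' + doi


text = 'A. Author, Some Journal 2020, 1, 1. doi:10.1234/abcd.5678.'
found = find_doi(text)
if found is not None:
    start, end, url = found
    print(text[start:end], '->', url)
```

Only the first match is used, so a second DOI in the same paragraph would be left as plain text, which mirrors the limitation described above.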
One example is provided in the form of example_bibliography_input.docx, which has an ("unlinked") Zotero-formatted bibliography. Using the present docx_text2link, the DOIs have been converted to hyperlinks. The result of running the script is in example_bibliography_output.docx.
The example input and output are also provided as PDF, so that you can see the effect of the script directly by opening these documents on GitHub.
There is no specific installation procedure. The script is copied into the directory with the document to be processed, and then run by calling the python3 interpreter.
The script, however, relies on python-docx (we used 0.8.10), which needs to be installed first.
python-docx is available on the conda-forge channel. Install with conda install python-docx. For those working with pip, there is pip3 install python-docx.
Writing of this script was possible thanks to the following information:
[1] python-openxml/python-docx#74
[3] python-openxml/python-docx#74 (comment)
The following very helpful code was used in the script:
[4] python-openxml/python-docx#519 (comment)
[5] python-openxml/python-docx#74 (comment)
Method [3] provides an alternative way of inserting links. However, it causes some complications with formatting, and links can only appear at the end of a paragraph.
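For orientation, whichever method is used, a hyperlink in the document XML ends up as a w:hyperlink element that wraps one or more text runs and points, via its r:id attribute, to an external relationship holding the URL. The sketch below builds such an element with the Python standard library only. It is an illustration of the structure, not the code of this script (which works through python-docx); the function name make_hyperlink and the relationship id rId99 are made-up placeholders, and the "Hyperlink" character style is assumed to exist in the document.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML and relationships namespaces.
W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
R = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
ET.register_namespace('w', W)
ET.register_namespace('r', R)


def make_hyperlink(rel_id, text):
    """Build <w:hyperlink r:id=...><w:r><w:rPr/><w:t>text</w:t></w:r></w:hyperlink>."""
    hyperlink = ET.Element(f'{{{W}}}hyperlink', {f'{{{R}}}id': rel_id})
    run = ET.SubElement(hyperlink, f'{{{W}}}r')
    run_props = ET.SubElement(run, f'{{{W}}}rPr')
    # Assumption: the document defines a character style named "Hyperlink".
    ET.SubElement(run_props, f'{{{W}}}rStyle', {f'{{{W}}}val': 'Hyperlink'})
    text_node = ET.SubElement(run, f'{{{W}}}t')
    text_node.text = text
    return hyperlink


# 'rId99' is a placeholder; in a real document the id comes from the
# relationship that stores the target URL.
element = make_hyperlink('rId99', '10.1234/abcd.5678')
print(ET.tostring(element, encoding='unicode'))
```

Placing this element between existing runs, rather than appending it to the paragraph, is what allows a link to appear in the middle of a paragraph.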