jason-fox / fox.jason.passthrough.pandoc

Pandoc DITA-OT Plug-in for extending the available input formats for DITA-OT. Non-DITA input sources can be pre-processed to create valid DITA source.

Home Page: https://jason-fox.github.io/dita-ot-plugins/pandoc

License: Apache License 2.0

Lua 97.96% Java 2.04%
dita-ot-plugin pandoc dita-ot markdown html rst odt dita word lua-script

fox.jason.passthrough.pandoc's Introduction

Pandoc Plugin for DITA-OT

This is a DITA-OT plug-in that extends the available input formats for DITA-OT. Non-DITA input sources can be pre-processed using Pandoc to create valid DITA source. Files written in multiple input formats can be added directly to a *.ditamap and processed as if they had been written in DITA.

▶️ Video from DITA-OT Day 2019

Background

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert from many input formats, including Markdown, HTML, reStructuredText, ODT and Word (DOCX).

This plug-in contains a Lua template which extends the output formats supported by Pandoc to include DITA. The output consists of a single DITA topic for each input file added to the ditamap.

Unlike the standard Markdown plug-in, this plug-in does not fail if the h1...h6 headers are incorrectly incremented. The Lua template has been designed so that headers increase by at most one level at a time. The downside is that the output may be unexpected.
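
To illustrate what "at most one level at a time" can mean in practice, here is a minimal sketch in shell and awk. It is not the plug-in's Lua code and may differ from it in detail. The function name normalize_headers is hypothetical, and the sketch only looks at ATX-style Markdown headers (lines starting with #), without special handling for fenced code blocks.

```shell
# Hypothetical helper: clamp Markdown header levels so that each header is
# at most one level deeper than the previous header. Reads stdin, writes stdout.
normalize_headers() {
  awk '
    /^#+ / {
      match($0, /^#+/)            # RLENGTH is now the number of leading # characters
      level = RLENGTH
      if (level > prev + 1) level = prev + 1
      prev = level
      hashes = ""
      for (i = 0; i < level; i++) hashes = hashes "#"
      sub(/^#+/, hashes)          # rewrite the header with the clamped level
    }
    { print }
  '
}
```

With this sketch, a document whose headers jump from # straight to ### comes out with the second header demoted to ##, which is the kind of silent adjustment that can make the output look unexpected.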

Note that because Pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

Install

The DITA-OT Pandoc Pass Through plug-in has been tested against DITA-OT 4.x. It is recommended that you upgrade to the latest version.

Installing DITA-OT

The DITA-OT Pandoc plug-in is a file reader for the DITA Open Toolkit.

  • Full installation instructions for downloading DITA-OT can be found in the DITA-OT documentation.

    1. Download the dita-ot-4.2.zip package from the project website at dita-ot.org/download
    2. Extract the contents of the package to the directory where you want to install DITA-OT.
    3. Optional: Add the absolute path for the bin directory to the PATH system variable.

    This defines the necessary environment variable to run the dita command from the command line.

curl -LO https://github.com/dita-ot/dita-ot/releases/download/4.2/dita-ot-4.2.zip
unzip -q dita-ot-4.2.zip
rm dita-ot-4.2.zip
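
The optional PATH step can be done for the current shell session along the following lines. This is a sketch that assumes the package was extracted into a dita-ot-4.2 directory under the current directory. DITA_OT_DIR is a variable name chosen for this example, not something DITA-OT itself reads.

```shell
# Assumed install location; adjust to wherever the package was extracted.
DITA_OT_DIR="${DITA_OT_DIR:-$PWD/dita-ot-4.2}"

# Prepend the bin directory to PATH unless it is already listed.
case ":$PATH:" in
  *":$DITA_OT_DIR/bin:"*) ;;
  *) PATH="$DITA_OT_DIR/bin:$PATH"
     export PATH ;;
esac
```

Afterwards, dita --version should print the installed version. To make the change permanent, the same lines can be added to your shell profile.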

Installing the Plug-in

  • Run the plug-in installation commands:
dita install https://github.com/doctales/org.doctales.xmltask/archive/master.zip
dita install https://github.com/jason-fox/fox.jason.passthrough/archive/master.zip
dita install https://github.com/jason-fox/fox.jason.passthrough.pandoc/archive/master.zip

The dita command line tool requires no additional configuration.


Installing Pandoc

To download a copy of Pandoc, follow the instructions on the Pandoc Install page.

If you run DITA-OT from the Oxygen editor on macOS and start Oxygen from the Terminal using sh oxygen.sh in the Oxygen installation folder, the build file is able to run the pandoc executable. Starting Oxygen by double-clicking the shortcut in the Finder does not work reliably. It works only if the path to the Pandoc executable (for example /usr/local/bin/) is fully specified in the configuration file fox.jason.passthrough.pandoc/cfg/configuration.properties
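
One way to find the directory to put in the configuration file is to ask the shell where pandoc lives. The sketch below is an illustration only. executable_dir is a hypothetical helper, and the pandoc.dir key and the trailing slash are assumptions based on the issue reports quoted further down this page, so check the comments in configuration.properties for the exact format.

```shell
# Hypothetical helper: print the directory containing an executable found on PATH.
executable_dir() {
  exe_path=$(command -v "$1") || return 1
  dirname "$exe_path"
}

# Print a candidate configuration line, or a hint if pandoc is not on PATH.
# The pandoc.dir key name is an assumption taken from the issue reports.
if pandoc_dir=$(executable_dir pandoc); then
  echo "pandoc.dir=$pandoc_dir/"
else
  echo "pandoc was not found on PATH" >&2
fi
```

Run this from the same Terminal session in which pandoc works, then copy the printed line into the configuration file if it matches the format documented there.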

Usage

To mark a file to be passed through for Pandoc processing, label it with format="pandoc" within the *.ditamap as shown:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
    ...etc
    <chapter format="pandoc" href="sample.docx"/>
</bookmap>

The additional file is run through the Pandoc XXX-to-DITA Lua filter, converted to a *.dita file and added to the build job without further processing. The navtitle of the included topic is the same as the root name of the file, with any underscores in the filename replaced by spaces in the title.
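
The naming rule can be sketched in shell as follows. This assumes that "root name" means the file name without its directory or extension. navtitle_from_href is a hypothetical helper for illustration and is not part of the plug-in.

```shell
# Hypothetical helper: derive a navtitle from an href the way the text describes.
navtitle_from_href() {
  name=$(basename "$1")               # drop any directory part
  name=${name%.*}                     # drop the last extension, if any
  printf '%s\n' "$name" | tr '_' ' '  # underscores become spaces
}

navtitle_from_href "docs/release_notes_2024.docx"   # prints: release notes 2024
```

Under that assumption, naming source files with underscores is a simple way to control the title that appears in the map navigation.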

How to annotate Pandoc passthrough files

The examples below use Markdown as a passthrough format. Other formats need to provide equivalent annotations to obtain full functionality. Where possible, annotation aligns with the Markdown DITA syntax reference based on CommonMark. The chapter title is taken from the first header found. Thereafter the document is processed as expected:

# Chapter title

The abstract (if any) goes here...

## Topic 1

Body of topic 1 goes here.

## Topic 2

Body of topic 2 goes here.

...etc

Ideally, input files should contain only a single <h1> header.

ID and outputclass

Pandoc header_attributes can be used to define id or outputclass attributes:

# Topic title {#carrot .juice}

Sections

The following class values in header_attributes have a special meaning on header levels.

  • section
  • example

They are used to generate <section> and <example> elements:

# Topic title

## Section title {.section}

## Example title {.example}

Note

The following class value in header_attributes has a special meaning on header levels.

  • note

It is used to generate <note> elements:

# Topic title

Contents of the topic go here ...

---

## Note|Warning|Tip|Important {.note}

Contents of the note

---

Contents of the topic continue here ...

The type of the note is defined by the title of the header. The <note> continues until the next header element or horizontal rule ---, whichever comes first.

Metadata

A YAML metadata block, as defined by Pandoc's yaml_metadata_block extension, can be used to specify different metadata elements. The supported elements are:

  • author
  • source
  • publisher
  • permissions
  • audience
  • category
  • keyword
  • resourceid
  • shortdesc

Unrecognized keys are output using the <data> element.

---
author:
    - Author One
    - Author Two
source: Source
publisher: Publisher
permissions: Permissions
audience: Audience
category: Category
keyword:
    - Keyword1
    - Keyword2
resourceid:
    - Resourceid1
    - Resourceid2
workflow: review
---

Sample output with YAML header:

<title>Sample with YAML header</title>
<prolog>
  <author>Author One</author>
  <author>Author Two</author>
  <source>Source</source>
  <publisher>Publisher</publisher>
  <permissions view="Permissions"/>
  <metadata>
    <audience audience="Audience"/>
    <category>Category</category>
    <keywords>
      <keyword>Keyword1</keyword>
      <keyword>Keyword2</keyword>
    </keywords>
  </metadata>
  <resourceid appid="Resourceid1"/>
  <resourceid appid="Resourceid2"/>
  <data name="workflow" value="review"/>
</prolog>

Ditamap TopicMeta for Pandoc Files

Ditamap <topicmeta> processing is also supported.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
    <chapter format="pandoc" processing-role="normal" type="topic" href="markdown.md">
        <topicmeta>
            <shortdesc>This is where the shortdesc goes</shortdesc>
            <metadata>
                <keywords>
                    <keyword>Keyword1</keyword>
                    <keyword>Keyword2</keyword>
                </keywords>
            </metadata>
        </topicmeta>
    </chapter>
</bookmap>

This allows for topic metadata to be added to files for formats other than Markdown.

License

Apache 2.0 © 2019 - 2024 Jason Fox

fox.jason.passthrough.pandoc's Issues

Create pandoc transtype for XXX=>dita

Related: jason-fox/fox.jason.passthrough#1

  • Using the dita output format does not produce files with a .dita or .xml extension. Instead, the output keeps the source file's extension, even though the contents of the converted file are in fact DITA. I'm forced to rename the extension.

This is a limitation of the dita plug-in: its XSLT writes the converted files using the same file names as the input. There is no extension point I could use, so I'd have to create a new transtype for pandoc-dita post-processing.

Problem generating with Oxygen 24 on Windows

We don't ever use the terminal to generate DITA docs - only Oxygen.

My Pandoc executable is located here on my machine: C:\Users\gscott\AppData\Local\Pandoc
I use a plugin directory outside of Oxygen. dita.dir=C:\DITA-OT

I uncommented and set configuration.properties: pandoc.dir=C:\Users\gscott\AppData\Local\

Am I entering the path incorrectly? Here's the log: oxygenlog.txt

Thanks for your plugins

Hey Jason, this is not an "issue" but a compliment. You've added several cool plugins (such as this pandoc one) to the registry and I wanted to say thanks a lot and keep up the good work!

Regards,
Mark Giffin
https://github.com/markgif

Cannot convert a Word document to DITA topic

I had success in converting Markdown to DITA using the plugin.
I'm attaching a Word Document (DOCX).

Word File with various structures.docx

If I refer to it from a DITA Map:

<topicref format="pandoc" href="Word%20File%20with%20various%20structures.docx" type="topic"/>

the publishing is not able to convert it to DITA:

pandoc.process:
   [pandoc] Processing D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/markdown.md
[file-rename] Moving 1 file to D:\projects\eXml\frameworks\dita\DITA-OT3.x\plugins\fox.jason.passthrough.pandoc-master\test\input-markdown\temp\html5\oxygen_dita_temp
    [pandoc] Processing D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/Word File with various structures.docx
   [pandoc] Result: 1
   [pandoc] pandoc: File: openBinaryFile: does not exist (No such file or directory)

Running pandoc from the command line on the same Word document seems to work for me:

pandoc "D:/projects/eXml/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/test/input-markdown/Word File with various structures.docx"

Plugin fails to find pandoc even though it is installed

I receive this error when I run DITA-OT build, even though I've confirmed pandoc is installed and can run from anywhere:
/Applications/Oxygen XML Editor/frameworks/dita/DITA-OT3.x/plugins/fox.jason.passthrough.pandoc-master/resource/antlib.xml:27: Execute failed: java.io.IOException: Cannot run program "pandoc" (in directory "/var/folders/hm/51jzs7b97tvctw489dj_9w180000gp/T"): error=2, No such file or directory
