jelovirt / com.elovirta.ooxml
License: Apache License 2.0
DITA to Word plug-in
Add support for DITA 1.3 markup and xml domains.
I'm attaching a sample project. The generated document.xml contains a "p" inside a "p".
Right now you have a dependency on ant-contrib 0.6.
I keep getting this warning in the Ant console:
"trying to override old definition of task for"
Maybe you should try adding a dependency on the newest ant-contrib (1.0b3, I think).
Ideally the dependency on ant-contrib could be added by a plugin that is always bundled with the DITA-OT (like the base plugin).
I found a possible cell span bug in the table header.
This should be converted as follows:
I attached the sample DITA instance and sample result.
I'm attaching a DITA Map which refers to a concept file containing a "fig" with a "desc" inside it. Somehow the generated Word document is invalid. I tested also with the "develop" branch.
When I generate .docx, word/document.xml contains multiple w:pPr elements per w:p, as shown below:
<w:p>
<w:pPr>
<w:pStyle w:val="BodyText"/>
</w:pPr>
<w:pPr>
<w:pStyle w:val="BodyText"/>
</w:pPr>
</w:p>
<w:p>
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
</w:pPr>
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<!--depth 1-->
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="101"/>
</w:numPr>
</w:pPr>
<w:bookmarkStart w:id="5" w:name="_Tocd18e26"/>
<w:bookmarkStart w:id="4" w:name="_Refd18e26"/>
<w:r>
<w:t>XSLT</w:t>
</w:r>
</w:p>
I generated this result with the plug-in bundled in oXygen 19.0.
According to the ISO XML schema (wml.xsd), w:pPr should occur at most once, as the first child element of w:p.
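A quick way to confirm the report is to scan the generated document.xml for paragraphs with more than one w:pPr child. The following is a minimal sketch, not part of the plug-in; the function name is hypothetical and the sample markup is a shortened version of the output quoted above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_with_extra_ppr(document_xml):
    """Return the indexes (document order) of w:p elements with more than one w:pPr child."""
    root = ET.fromstring(document_xml)
    bad = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        if len(para.findall(f"{{{W}}}pPr")) > 1:
            bad.append(index)
    return bad


# Shortened sample: the first paragraph repeats w:pPr, the second does not.
sample = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:pStyle w:val="BodyText"/></w:pPr>
    <w:pPr><w:pStyle w:val="BodyText"/></w:pPr>
  </w:p>
  <w:p>
    <w:pPr><w:pStyle w:val="ListParagraph"/></w:pPr>
    <w:r><w:t>XSLT</w:t></w:r>
  </w:p>
</w:body>"""

print(paragraphs_with_extra_ppr(sample))  # [0]
```

Run against the unzipped word/document.xml, an empty list would indicate that no paragraph repeats w:pPr.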
Probably because of the DOCX template used, the DOCX output from any DITA Map no longer contains the page number at the bottom of the page. The DITA Map title is also missing from the header of each page.
Support manual line breaks using the br processing instruction
<p>Foo bar<?br?>baz</p>
will result in
Foo bar
baz
A redundant single space character remains in the final .docx at the start or at the end of a paragraph.
Please refer to the attached DITA instance and output result.
In XSL-FO to PDF output, the formatter removes them all. In .docx output, the plug-in stylesheet has to remove them itself.
Based on this report:
https://www.oxygenxml.com/forum/post40578.html#p40578
For example if at the end of my topic I have:
<related-links>
<link href="https://www.oxygenxml.com/forum/post40578.html#p40578" format="html"
scope="external"/>
</related-links>
the output Word document does not have a link to the website.
The build fails with Invalid dateTime value.
DITA-OT: 2.5
com.elovirta.ooxml branch: master
docx.convert:
[xslt] Processing C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_MERGED.xml to C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_CLEANED.xml
[xslt] Loading stylesheet C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\word\document.flat.xsl
[xslt] Processing C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_CLEANED.xml to C:\Users\eike\DITA\web-client\temp\processing\docx\docProps\core.xml
[xslt] Loading stylesheet C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\docProps\core.xsl
[xslt] Processing C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_CLEANED.xml to C:\Users\eike\DITA\web-client\temp\processing\docx\docProps\custom.xml
[xslt] Loading stylesheet C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\docProps\custom.xsl
[xslt] Processing C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_CLEANED.xml to C:\Users\eike\DITA\web-client\temp\processing\docx\word\document.xml
[xslt] Loading stylesheet C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\word\document.xsl
[xslt] Processing C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_CLEANED.xml to C:\Users\eike\DITA\web-client\temp\processing\docx\word\comments.xml
[xslt] Loading stylesheet C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\word\comments.xsl
[xslt] C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\word\document.utils.xsl:42: Fatal Error! Invalid dateTime value "2016-11-11T12:13:52+0100" (Timezone hour must be two digits)
[xslt] Failed to process C:\Users\eike\DITA\web-client\temp\processing\BHB_UnikatGE_Webclient_CLEANED.xml
BUILD FAILED
C:\Users\eike\DITA\web-client\build.xml:71: The following error occurred while executing this line:
C:\Users\eike\.DITA\dita-ot\build.xml:45: The following error occurred while executing this line:
C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\build.xml:112: Fatal error during transformation using C:\Users\eike\.DITA\dita-ot\plugins\com.elovirta.ooxml\docx\word\comments.xsl: Invalid dateTime value "2016-11-11T12:13:52+0100" (Timezone hour must be two digits); SystemID: file:/C:/Users/eike/.DITA/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.utils.xsl; Line#: 42; Column#: -1
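The value in the log carries its UTC offset as +0100, while the xs:dateTime lexical form expects the offset as +hh:mm. One possible workaround is to normalize the offset before the value reaches the stylesheet. The sketch below only illustrates that normalization; the function name is hypothetical and this is not how the plug-in itself handles the value.

```python
import re

# A trailing offset written as +hhmm or -hhmm (no colon)
_OFFSET_WITHOUT_COLON = re.compile(r"([+-])(\d{2})(\d{2})$")


def normalize_offset(timestamp):
    """Insert the colon that xs:dateTime requires in the UTC offset, if it is missing."""
    return _OFFSET_WITHOUT_COLON.sub(r"\1\2:\3", timestamp)


print(normalize_offset("2016-11-11T12:13:52+0100"))  # 2016-11-11T12:13:52+01:00
```

Values that already use +hh:mm, end in Z, or carry no offset pass through unchanged.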
The Inkscape-based SVG to EMF conversion converts all SVG files in all subdirectories, regardless of whether an SVG file is referenced in a map or topic.
I downloaded the DITA 1.3 specs from SVN and published the DITA Map "dita-1.3-specification-learningTraining.ditamap" to OOXML.
The resulting OOXML file is invalid.
Reported error when it is opened in MS Office 2013:
Ambiguous cell mapping encountered. Possible missing paragraph element. <p> elements are required before every </tc>.
Location: Part: /word/document.xml, Line 78, Column 0
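Word's message suggests that some table cell does not end with a paragraph. A rough check for this, assuming the unzipped word/document.xml is available as a string, could look like the sketch below. The function name is hypothetical, and the check is deliberately simple: it flags any w:tc whose last child is not a w:p.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def cells_not_ending_in_paragraph(document_xml):
    """Return the indexes (document order) of w:tc elements whose last child is not w:p."""
    root = ET.fromstring(document_xml)
    bad = []
    for index, cell in enumerate(root.iter(f"{{{W}}}tc")):
        children = list(cell)
        if not children or children[-1].tag != f"{{{W}}}p":
            bad.append(index)
    return bad


# The first cell has only cell properties, the second ends with a paragraph.
sample = f"""<w:tbl xmlns:w="{W}">
  <w:tr>
    <w:tc><w:tcPr/></w:tc>
    <w:tc><w:p/></w:tc>
  </w:tr>
</w:tbl>"""

print(cells_not_ending_in_paragraph(sample))  # [0]
```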
Hi,
The generated docx shows the message "Image UserGuide/MainProductInfo_files/image1.png missing" when the ditamap file refers to a DITA file in a folder one or more levels above the folder that contains the ditamap. If you open the docx file with WinZip, the image is present in the media folder.
The image is present when the ditamap is built in PDF format.
The file test.zip contains the plug-in used, document structure, build command line, temporary files and output files (docx and pdf format always correct), in two different folders:
I need to keep the case 2 ditamap/DITA file structure to keep different documents separate and to facilitate topic reuse by putting shared topics in the commontopic folder.
Could you please help fix the issue above?
Best regards
Lello
Something like:
<transtype name="docx" desc="Word Docx"/>
Currently all sections are generated as Document Final Section Properties (§17.6.17), when all except the last w:sectPr should be Section Properties (§17.6.18). Move each non-final w:sectPr to the next w:p element.
If I convert attached DITA instance to .docx, the result is fine.
However, after selecting the whole document with Ctrl + A and pressing F9 to update the field results, Word reports that the REF fields are not valid.
This is because the referenced bookmarks are not defined in the converted word/document.xml. The stylesheet should generate accurate field codes along with the field results, because a user may update the fields while editing the document after conversion.
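To list the affected fields, one could compare the bookmark names that REF field codes point to against the w:bookmarkStart names actually present. The sketch below assumes each field code sits in a single w:instrText run or in a w:fldSimple attribute (codes split across several runs are not handled). The function name and the bookmark name _Refd18e99 are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Matches "REF <bookmark>" but not PAGEREF or NOTEREF
_REF_FIELD = re.compile(r"\bREF\s+(\S+)")


def dangling_ref_targets(document_xml):
    """Return bookmark names used by REF fields that have no matching w:bookmarkStart."""
    root = ET.fromstring(document_xml)
    bookmarks = {b.get(f"{{{W}}}name") for b in root.iter(f"{{{W}}}bookmarkStart")}
    codes = [t.text or "" for t in root.iter(f"{{{W}}}instrText")]
    codes += [f.get(f"{{{W}}}instr", "") for f in root.iter(f"{{{W}}}fldSimple")]
    targets = {m.group(1) for code in codes for m in _REF_FIELD.finditer(code)}
    return sorted(targets - bookmarks)


# One REF field points at an existing bookmark, the other at a missing one.
sample = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:bookmarkStart w:id="4" w:name="_Refd18e26"/>
    <w:bookmarkEnd w:id="4"/>
  </w:p>
  <w:p>
    <w:r><w:instrText> REF _Refd18e26 </w:instrText></w:r>
    <w:fldSimple w:instr=" REF _Refd18e99 "/>
  </w:p>
</w:body>"""

print(dangling_ref_targets(sample))  # ['_Refd18e99']
```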
Add support for generating a cover page when the template contains a cover page section.
Add extension point to add XSLT parameters.
In this sample:
http://www.oxygenxml.com/forum/files/flowers.zip
the lilac.dita has an xref like this:
<xref keyref="flowers.genus" format="dita">genus</xref>
but the generated Word document contains something like:
Lilac (Syringa) is a Error! Reference source not found.
This is a kind of a subset of #26 but maybe there should be some logic involved too?
Multilevel lists (bullets and/or numbered) come up a lot in DITA and technical writing generally.
Would it make sense to have some logic to map the depth of the physical <li> tags in the DITA to the corresponding standard Word headings (Bullet list / Numbered list)?
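The depth mapping asked about here could work roughly as in the sketch below. It is only an illustration under several assumptions: it matches plain ul/ol/li element names rather than DITA class attributes, and it uses Word's built-in style names List Bullet, List Bullet 2, List Number, List Number 2 and so on, which may differ from the names the plug-in or a given template uses. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed base style per list type (Word built-in style names)
LIST_STYLES = {"ul": "List Bullet", "ol": "List Number"}


def list_item_styles(topic_xml):
    """Return (item text, style name) pairs, choosing the style from the nesting depth."""
    root = ET.fromstring(topic_xml)
    result = []

    def walk(element, depth):
        for child in element:
            if child.tag in LIST_STYLES:
                level = depth + 1
                base = LIST_STYLES[child.tag]
                # Level 1 uses the base name, deeper levels append the level number
                style = base if level == 1 else f"{base} {level}"
                for item in child.findall("li"):
                    result.append(((item.text or "").strip(), style))
                    walk(item, level)
            else:
                walk(child, depth)

    walk(root, 0)
    return result


sample = "<body><ul><li>One<ol><li>Nested</li></ol></li><li>Two</li></ul></body>"
for text, style in list_item_styles(sample):
    print(f"{style}: {text}")
```

Word's built-in list styles stop at level 5, so deeper nesting would need a fallback.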
If the DITA Map references all images like:
<keydef keys="testK" href="test.jpg" format="jpg"/>
and the topics reference it like:
<image keyref="testK"/>
it seems that the target "docx.package.media" is skipped because it looks at an images list file in the temporary files folder and that list file is empty.
My Word exports appear to be ignoring the colspec colwidth attributes. I've tried explicit and relative values, for example colwidth="2in" and colwidth="2*", but neither appears to work. How can I specify column widths?
Thanks!
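For reference, WordprocessingML expresses column widths in twentieths of a point (1440 per inch), so honoring colwidth would require a conversion along the lines of the sketch below. This is not the plug-in's implementation. The function name is hypothetical, and the table width of 9360 (6.5 in) is an assumed value used only to distribute the proportional columns.

```python
import re

# Twentieths of a point per unit
TWIPS_PER_UNIT = {"in": 1440, "pt": 20, "pi": 240, "cm": 1440 / 2.54, "mm": 1440 / 25.4}

_PROPORTIONAL = re.compile(r"(\d+(?:\.\d+)?)?\*")
_FIXED = re.compile(r"(\d+(?:\.\d+)?)(in|pt|pi|cm|mm)")


def column_widths_twips(colwidths, table_width_twips):
    """Convert colwidth values such as "2in" or "2*" to widths in twentieths of a point."""
    fixed, weights = {}, {}
    for index, value in enumerate(colwidths):
        value = value.strip()
        match = _PROPORTIONAL.fullmatch(value)
        if match:
            # A bare "*" counts as "1*"
            weights[index] = float(match.group(1) or 1)
            continue
        match = _FIXED.fullmatch(value)
        if not match:
            raise ValueError(f"Unsupported colwidth: {value!r}")
        fixed[index] = round(float(match.group(1)) * TWIPS_PER_UNIT[match.group(2)])
    # Proportional columns share whatever the fixed columns leave over
    remaining = max(table_width_twips - sum(fixed.values()), 0)
    total_weight = sum(weights.values())
    return [
        fixed[i] if i in fixed else round(remaining * weights[i] / total_weight)
        for i in range(len(colwidths))
    ]


print(column_widths_twips(["2in", "2*", "1*"], 9360))  # [2880, 4320, 2160]
```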
When a colleague of mine runs the OOXML transformation on her PC, she gets the following strange timezone error:
[xslt] D:$_checkout$_docs\oxygenDocs\trunk_oxygen_ditaOT2.x\plugins\com.elovirta.ooxml\docx\word\document.utils.xsl:42: Fatal Error! Invalid dateTime value "2017-08-31T16:37:26+0200" (Timezone hour must be two digits)
[xslt] Failed to process C:\Users\xxx.xxx\oxygenOut\temp\oxygen_dita_temp\mcad_installAdminGuide-en_CLEANED.xml
BUILD FAILED
D:$_checkout$_docs\oxygenDocs\trunk_oxygen_ditaOT2.x\build.xml:45: The following error occurred while executing this line:
D:$_checkout$_docs\oxygenDocs\trunk_oxygen_ditaOT2.x\plugins\com.elovirta.ooxml\build.xml:112: Fatal error during transformation using D:$_checkout$_docs\oxygenDocs\trunk_oxygen_ditaOT2.x\plugins\com.elovirta.ooxml\docx\word\comments.xsl: Invalid dateTime value "2017-08-31T16:37:26+0200" (Timezone hour must be two digits); SystemID: file:/D:/$_checkout/$_docs/oxygenDocs/trunk/_oxygen/_ditaOT2.x/plugins/com.elovirta.ooxml/docx/word/document.utils.xsl; Line#: 42; Column#: -1
It works on my machine though. As far as I can tell, she and I have the same time settings.
I'm attempting to use a custom docx file for the dotx.file.
I'm not getting any build errors, but the resulting file is considered corrupt by Word 2013 (I can still open it in LibreOffice Writer, and the content is there).
I've attached the template used during transformation and the resulting document. If I switch the template back to the Normal.docx that comes with this plugin, everything works fine. I suspect there is some undocumented requirement for the template that mine isn't adhering to.
The ant-contrib library is outdated. The shipped version is 0.6 (from 2004-02-18). The current version 1.0b3 is a little bit newer (2006-11-02). This is a library that is usually shipped by many DITA-OT plugins. Therefore you might get into trouble when importing it multiple times in different versions.
But maybe ant-contrib is completely obsolete here, as stated in #40 (comment)
There are some "s " fragments scattered in the resultant document.xml.
Looks like they come from the beginning of line https://github.com/jelovirt/com.elovirta.ooxml/blob/master/docx/word/document.topic.xsl#L67
Processing fails with the message Parameter id is not declared in the called template.
The error points to the following template in document.abbrev-d.xsl:
<xsl:template match="*" mode="ditamsg:no-glossentry-for-abbreviated-form">
<xsl:param name="keys"/>
<xsl:call-template name="output-message">
<xsl:with-param name="id">DOTX060W</xsl:with-param>
<xsl:with-param name="msgparams">%1=<xsl:value-of select="$keys"/></xsl:with-param>
</xsl:call-template>
</xsl:template>
Stacktrace: stacktrace.txt
Environment
2.2.5
When <note>note text</note> is translated from DITA, the Word document contains NOTE:note text (no space between the colon and the note text), despite line 609 of /docx/word/document.topic.xsl:
<w:t> </w:t>
(I decided to use the <w:tab/> instead, which is commented out on the line before.)
I've got some matching issues.
DITA-OT: 2.5.4
Branch: master
OS: Windows 7
docx.convert:
[xslt] : Error! Ambiguous rule match for /dakosyBookmap/dakosyConcept[14]/dakosyConcept[5]/dakosyConcept[2]/dakosyTask[1]/taskbody[1]/steps[1]/step[9]/stepresult[1]/note[1]/ul[1]/li[2]/dl[1]
[xslt] Matches both "*[contains(@class, ' topic/li ')]/*" on line 647 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/note ')]//*[contains(@class, ' topic/li ')]//*" on line 764 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] : Error! Ambiguous rule match for /dakosyBookmap/dakosyConcept[14]/dakosyConcept[5]/dakosyConcept[2]/dakosyTask[1]/taskbody[1]/steps[1]/step[9]/stepresult[1]/note[1]/ul[1]/li[2]/dl[1]
[xslt] Matches both "*[contains(@class, ' topic/li ')]/*" on line 647 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/note ')]//*[contains(@class, ' topic/li ')]//*" on line 764 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] : Error! Ambiguous rule match for /dakosyBookmap/dakosyConcept[14]/dakosyConcept[7]/dakosyConcept[2]/dakosyConcept[2]/dakosyTask[1]/taskbody[1]/steps[1]/step[18]/stepresult[1]/note[1]/ul[1]/li[2]/p[2]
[xslt] Matches both "*[contains(@class, ' topic/li ')]/*" on line 647 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/note ')]//*[contains(@class, ' topic/li ')]//*" on line 764 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] : Error! Ambiguous rule match for /dakosyBookmap/dakosyConcept[14]/dakosyConcept[7]/dakosyConcept[2]/dakosyConcept[2]/dakosyTask[1]/taskbody[1]/steps[1]/step[18]/stepresult[1]/note[1]/ul[1]/li[2]/p[2]
[xslt] Matches both "*[contains(@class, ' topic/li ')]/*" on line 647 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/note ')]//*[contains(@class, ' topic/li ')]//*" on line 764 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] : Error! Ambiguous rule match for /dakosyBookmap/dakosyConcept[14]/dakosyConcept[7]/dakosyConcept[3]/dakosyTask[1]/taskbody[1]/steps[1]/step[16]/info[1]/note[1]/ul[1]/li[2]/p[2]
[xslt] Matches both "*[contains(@class, ' topic/li ')]/*" on line 647 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/note ')]//*[contains(@class, ' topic/li ')]//*" on line 764 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] : Error! Ambiguous rule match for /dakosyBookmap/dakosyConcept[14]/dakosyConcept[7]/dakosyConcept[3]/dakosyTask[1]/taskbody[1]/steps[1]/step[16]/info[1]/note[1]/ul[1]/li[2]/p[2]
[xslt] Matches both "*[contains(@class, ' topic/li ')]/*" on line 647 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/note ')]//*[contains(@class, ' topic/li ')]//*" on line 764 of file:/C:/Users/eike/.dita/dita-ot/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
One of our clients encountered a problem opening a Word document generated from DITA topics that contain footnotes. I'm attaching an archive.
Archive.zip
Word tries to recover the document. On Mac, Word crashed completely.
I produced a Word document from:
http://www.oxygenxml.com/forum/files/flowers.zip
The produced DOCM cannot be opened in MS Office 2013, it says:
We are sorry. We cannot open flowers.docm because we found a problem with its contents.
Details:
Unspecified error
Location: Part: /word/document.xml Line 1, Column 0
Example:
<?oxy_comment_start author="xxx" timestamp="20170412T163115+0300" comment="xxx"
causes error message:
Invalid dateTime value "2017-11-14T11:33:33+0300" (Timezone hour must be two digits)
document.utils.xsl; Line#: 42
Convert SVG into EMF using Inkscape.
When the source image for <image> is missing, a reference to a non-existent image is generated in OOXML. The conversion should identify missing images and ignore them.
If in the DITA content I have an image map:
<imagemap id="personal_xsd_Element_p_person">
<image href="img/personal_xsd_Element_p_person.jpeg"/>
<area>
<shape>rect</shape>
<coords>147,46,251,97</coords>
<xref href="personal_xsd_Element_p_person.dita#person_id"/>
</area>
</imagemap>
right now the Word output includes all the text content from the area, including the coordinates.
It could do one of two things:
Is there a way to set the page orientation for each topic?
Ideally, I want to use the outputclass attribute in each topicref, setting the value to "portrait" or "landscape" and have it export accordingly.
.docx/word/numbering.xml contains:
This sample was generated with the plug-in bundled in oXygen 19.0.
I'm getting the numbering from my Word styles (in the normal.dotm template) plus additional title numbering, which I assume comes from the DITA processing, for a total of two numbers per top-level heading in the Word output. Is there any way to stop heading numbers from being added to the heading text when the Word style is a numbered style (I guess probably not)?
Add ISO date-time parsing to comment date to fail early if the date format is invalid.
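As an illustration of the fail-early idea, the sketch below parses a comment timestamp in the compact form seen in the related reports (for example 20170412T163115+0300) and raises immediately if the value does not fit. It is written in Python for brevity, not in the plug-in's own language, and the function name is hypothetical.

```python
from datetime import datetime


def parse_comment_timestamp(value):
    """Parse a compact timestamp and return it in ISO 8601 form, or raise ValueError."""
    try:
        parsed = datetime.strptime(value, "%Y%m%dT%H%M%S%z")
    except ValueError as error:
        raise ValueError(f"Invalid comment timestamp: {value!r}") from error
    # isoformat() writes the offset with a colon, e.g. +03:00
    return parsed.isoformat()


print(parse_comment_timestamp("20170412T163115+0300"))  # 2017-04-12T16:31:15+03:00
```

Parsing up front means a malformed date is reported with the offending value, instead of surfacing later as a fatal error deep in the transformation.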
I have a DITA topic looking like this:
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="introduction">
<title>Introduction</title>
<body>
<p>This is it <xref href="#topic_tnd_j4h_kz" format="dita"/></p>
</body>
<topic id="topic_tnd_j4h_kz">
<title>Second topic</title>
</topic>
</topic>
When publishing to Word, the xref is not properly resolved, and the Word document ends up with "Error! Reference source not found." errors.
The XHTML and PDF output formats work properly.
Ideally the integrator.xml would import the actual plugin build.xml, so that the contribution to the main DITA-OT build.xml would be just an import.
While transforming a document using the plugin, the build fails with the error shown in the attached log file.
ant.log.txt
The document does build with a PDF-based transform and although the message appears to come from the simpletable template, there is no simpletable in the source.
I've attached a zip file of the temp directory created by the transform.
temp.zip
During the transformation an ambiguous rule match is reported:
[xslt] : Error! Ambiguous rule match for /map/topic[4]/glossentry[1]/glossterm[1]
[xslt] Matches both "*[contains(@class, ' glossentry/glossterm ')]" on line 654 of file:/D:/projects/eXml/frameworks/dita/DITA-OT2.x/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
[xslt] and "*[contains(@class, ' topic/topic ')]/ *[contains(@class, ' topic/title ')]" on line 114 of file:/D:/projects/eXml/frameworks/dita/DITA-OT2.x/plugins/com.elovirta.ooxml/docx/word/document.topic.xsl
The default DOCX template declares the names of the HTML-related styles, e.g. HTML Typewriter, in /word/styles.xml (within the DOCX/ZIP). However, document.pr-d.xsl maps e.g. DITA's codeph element to the style HTMLTypewriter (note the missing space).
As the default template only declares (but does not define) e.g. HTML Typewriter, the output will look the same even with the aligned style name. But when that style is explicitly added to the default template, or a custom template with the style is used, this would enable the intended (monospace) formatting.
Therefore it would be great if such style names in document.pr-d.xsl could be aligned with the respective style names in the default template's styles.xml.
(possibly related to #26)
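To see which styles a template actually defines and which it only declares, one could inspect styles.xml as in the sketch below. It assumes that "declared" corresponds to a latent-style entry (w:lsdException) and "defined" to a full w:style element. That reading of the report is an assumption, and the function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_inventory(styles_xml):
    """Return (IDs of defined styles, names of latent styles) found in styles.xml."""
    root = ET.fromstring(styles_xml)
    defined = {s.get(f"{{{W}}}styleId") for s in root.iter(f"{{{W}}}style")}
    latent = {e.get(f"{{{W}}}name") for e in root.iter(f"{{{W}}}lsdException")}
    return defined, latent


# Hypothetical fragment: one latent style entry and one fully defined style.
sample = f"""<w:styles xmlns:w="{W}">
  <w:latentStyles>
    <w:lsdException w:name="HTML Typewriter"/>
  </w:latentStyles>
  <w:style w:type="paragraph" w:styleId="BodyText">
    <w:name w:val="Body Text"/>
  </w:style>
</w:styles>"""

defined, latent = style_inventory(sample)
print(sorted(defined))  # ['BodyText']
print(sorted(latent))   # ['HTML Typewriter']
```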
I encountered a build error in docx.image-metadata when I installed this plug-in into DITA-OT 2.4.6.
preprocess:
docx.image-metadata:
BUILD FAILED
D:\DITA-OT\dita-ot-2.4.6\build.xml:45: The following error occurred while executing this line:
D:\DITA-OT\dita-ot-2.4.6\plugins\com.elovirta.ooxml\build.xml:36: module doesn't support the nested "ditaFileset" element.
Total time: 13 seconds
The process finished with exit code: 1
I attached sample data and a log file. Could you suggest how to solve this issue?
20170419-sample-en-for-word.zip
Regards,
Toshihiko Makita
Hi Jarno
I want to modify the Normal.dotx that comes with the plugin. When I open it, save it under a new name, and use this new template in build.xml, the output doesn't look the same as with the original Normal.dotx. For instance, all heading styles are gone and so are the table borders. The size of the new template decreases from 78 KB to 28 KB, so something is being stripped out here.
Which Word version and language is recommended for making template modifications? I use Word 2010 (German). I tried LibreOffice Writer too, but then the result cannot be opened by Word.
Windows 7 x64
DITA-OT 2.5.4
Thanks
Chris.
Is there a way to capture information from the ditamap topicmeta and place it in Word's controlled fields? E.g.:
Thanks for the great work so far! 🥇
I need to do one last tweak to the plugin before I can put it into production:
I have a Word template with highly customized styles.
These aren't always the styles used by the DITA-Word converter.
Which angle is best/simplest to start from in order to align them?
Edit the Word template to match whatever heading/numberedlist/bulletlist/emphasis etc. are already mapped in the plugin? Is there a full listing of these I can see?
Or edit some config file in the plugin (if so, which one please?) to match my existing Word setup (it is not too complex, it has numbered headings 1-9, bullet list, normal, I use the default Word HTML Code a lot... and that's almost everything)?