This is a domain service to generate Word .docx
from an initial .docx
template and input data; in other words, a mail merge.
Several data types are supported:
- plain text
- rich text
- date
- bulleted list
- tables
The implementation uses docx4j and jdom2. Databinding to custom XML parts (the .docx
file format's in-built support) is not used because this feature did not support repeating datasets prior to 2013.
The main API is:
public void merge(
String html,
InputStream docxTemplate,
OutputStream docxTarget,
MatchingPolicy matchingPolicy)
throws LoadInputException,
LoadTemplateException,
MergeException
The MatchingPolicy
specifies whether unmatched input values or unmatched placeholders in the template are allowed or should be considered as a failure.
Overloaded versions of the merge(...)
method exist:
- the
html
may instead be provided as aorg.w3c.dom.Document
- the
docxTemplate
may instead be provided as a doc4jWordprocessingMLPackage
(an in-memory object structure that could be considered as analogous to an w3cDocument
, but representing a.docx
).
The WordprocessingMLPackage
can be obtained from a supplementary API method:
public WordprocessingMLPackage loadPackage(
InputStream docxTemplate)
throws LoadTemplateException
This exists because the parsing of the input stream into a WordprocessingMLPackage
is not particularly quick. Therefore clients may wish to cache this in-memory object structure. If calling the overloaded version of merge(...)
that accepts the WordprocessingMLPackage
then the service performs a defensive copy of the template.
The template .docx is marked up using smart tags, as specified on the DEVELOPER tab.
A sample .docx can be found here.
The input data is provided as an XHTML form, and the service merges using the @id
attribute of the XHTML against the tag of the smart tag field in the .docx.
To specify a plain field, use:
<p id="CustomerId" class="plain">12345</p>
To specify a date field, use:
<p id="RenewalDate" class="date">20-Jan-2013</p>
To specify a rich field, use:
<p id="PromoText" class="rich">
Roll up, roll up, step right this way!
</p>
To specify a list field, use:
<ul id="Albums">
<li>
<p>Please Please Me</p>
<p>1963</p>
</li>
<li>
<p>Help</p>
</li>
<li>
<p>Sgt Peppers Lonely Hearts Club Band</p>
<p>1965</p>
<p>Better than Revolver, or not?</p>
</li>
</ul>
To specify a table field, use:
<table id="BandMembers">
<tr>
<td>John Lennon</td>
<td>Rhythm guitar</td>
</tr>
<tr>
<td>Paul McCartney</td>
<td>Bass guitar</td>
</tr>
<tr>
<td>George Harrison</td>
<td>Lead guitar</td>
</tr>
<tr>
<td>Ringo Starr</td>
<td>Drums</td>
</tr>
</table>
A sample HTML can be found here
For simple data types such as plain text, rich text and date, the service simply substitutes the input data into the placeholder fields in the .docx
.
For lists, the service expects the contents of the placeholder to be a bulleted list, with an optional second paragraph of a different style. The service clones the paragraphs for each item in the input list. If the input specifies more than one paragraph in the list item, then the second paragraph from the template is used for those additional paragraphs.
For tables, the service expects the placeholder to be a table, with a header and either one or two body rows. The header is left untouched, the body rows are used as the template for the input data. Any surplus cells in the input data are ignored.
A sample generated document can be found here