stadelmanma / sablon
This project forked from senny/sablon
Ruby Document Template Processor based on docx templates and Mail Merge fields.
License: MIT License
This can be done simply by adding footnotes.xml and endnotes.xml to the files sablon parses. The content inside each <w:footnote> or <w:endnote> tag is structured the same as document.xml.
I also want to add footnotes programmatically using HTML. I'll probably make a pseudo-element <footnote id="#"> and <footnoteref id="#">, assuming nokogiri doesn't blow up on a fake element. If it does, then I'll use a <div> with a class of "footnote".
This would work like the content module where document processors can be registered and given a pattern and precedence. Using my current template.rb setup it would look something like:
register_processor(%r{word/document\.xml}, Processor::Document, 0)
register_processor(%r{word/(?:header|footer)\d*\.xml}, Processor::Document, 100)
register_processor(%r{word/footnotes\.xml}, Processor::Document, 200)
register_processor(%r{word/endnotes\.xml}, Processor::Document, 300)
register_processor(%r{word/numbering\.xml}, Processor::Numbering, 400)
register_processor(/\[Content_Types\]\.xml/, Processor::ContentType, 500)
I don't know if this is the best approach for files like Numbering and ContentType because those might be better served in a more dynamic fashion. For example, numbering is only messed with when lists are added and never formally processed. The same logic could be applied to *.rels files and content types with some refactoring.
The main reasoning behind this is that it will allow the end user to extend sablon with new processors much more easily than they are able to now. Instead of monkey patching we can have a more formal API that, while not super easy to access, is still workable.
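A minimal sketch of how such a registry could behave, assuming each registration stores a pattern, a processor, and a precedence, and that lower precedence values are handled first. The `ProcessorRegistry` class, its method names, and the symbols standing in for processor classes are all hypothetical and are not part of Sablon.

```ruby
# Hypothetical registry; names here are illustrative and not Sablon's API.
class ProcessorRegistry
  Entry = Struct.new(:pattern, :processor, :precedence)

  def initialize
    @entries = []
  end

  # Register a processor for zip entry names matching +pattern+.
  # Assumption: lower precedence values are processed first.
  def register_processor(pattern, processor, precedence)
    @entries << Entry.new(pattern, processor, precedence)
    self
  end

  # Return the processor for an entry name, or nil if nothing matches.
  # When several patterns match, the lowest precedence wins.
  def processor_for(entry_name)
    entry = best_entry(entry_name)
    entry && entry.processor
  end

  # Order entry names by processor precedence, dropping unmatched names.
  def processing_order(entry_names)
    matched = entry_names.select { |name| best_entry(name) }
    matched.sort_by { |name| best_entry(name).precedence }
  end

  private

  def best_entry(entry_name)
    @entries.select { |e| e.pattern.match?(entry_name) }.min_by(&:precedence)
  end
end

registry = ProcessorRegistry.new
# Symbols stand in for processor classes such as Processor::Document.
registry.register_processor(%r{\Aword/document\.xml\z}, :document, 0)
registry.register_processor(%r{\Aword/(?:header|footer)\d*\.xml\z}, :document, 100)
registry.register_processor(%r{\Aword/footnotes\.xml\z}, :document, 200)
registry.register_processor(/\A\[Content_Types\]\.xml\z/, :content_type, 500)

order = registry.processing_order(
  ['[Content_Types].xml', 'word/styles.xml', 'word/footer1.xml', 'word/document.xml']
)
```

A user-defined processor would then be one more `register_processor` call with its own pattern and precedence, with no monkey patching needed.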
Add support for the <span> tag as a method to format individual runs with inline style. Allow existing tags to support things specified by a style="" attribute.
<w:r> tags and its content will go in a <w:t> tag
<w:jc> element
<w:b /> for bold, <w:i /> for italics
<w:color w:val="FFF200"/> is the proper tag
<w:shd w:val="clear" w:color="auto" w:fill="FFFF00"/> is the proper tag
<w:u w:val="single"/> tag
Need to add support for the <sup> and <sub> tags.
It would be great to add support for the <table> tags.
Paragraph styles go in <w:pPr> elements; content alignment can only be applied at the paragraph level.
Run styles go in <w:rPr> elements.
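As a rough illustration of the run-level mapping described above, the sketch below turns an inline style string into a <w:rPr> fragment. The helper names (`parse_style`, `hex_color`, `run_properties`) and the choice of CSS properties are assumptions made for this example, not Sablon's implementation. Alignment is left out because it belongs in <w:pPr> at the paragraph level.

```ruby
# Illustrative only: maps a few inline CSS declarations to WordML run properties.

# Split "a: b; c: d" into { "a" => "b", "c" => "d" }, skipping malformed parts.
def parse_style(style)
  style.to_s.split(';').each_with_object({}) do |decl, props|
    name, value = decl.split(':', 2).map(&:strip)
    next if name.nil? || name.empty? || value.nil? || value.empty?

    props[name.downcase] = value
  end
end

# Accept "#fff200" or "FFF200" and return the upper-case hex digits, else nil.
def hex_color(value)
  match = /\A#?(\h{6})\z/.match(value.to_s.strip)
  match && match[1].upcase
end

# Build a <w:rPr> element, or an empty string when nothing applies.
def run_properties(style)
  props = parse_style(style)
  tags = []
  tags << '<w:b />' if props['font-weight'] == 'bold'
  tags << '<w:i />' if props['font-style'] == 'italic'
  tags << '<w:u w:val="single"/>' if props['text-decoration'] == 'underline'
  color = hex_color(props['color'])
  tags << %(<w:color w:val="#{color}"/>) if color
  fill = hex_color(props['background-color'])
  tags << %(<w:shd w:val="clear" w:color="auto" w:fill="#{fill}"/>) if fill
  tags.empty? ? '' : "<w:rPr>#{tags.join}</w:rPr>"
end

puts run_properties('color: #fff200; font-weight: bold')
```

Only six-digit hex colors are handled here; named colors and shorthand values would need a lookup table.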
Paragraph Information:
http://officeopenxml.com/WPparagraph.php
Text Information:
http://officeopenxml.com/WPtextFormatting.php
Table information:
http://officeopenxml.com/WPtable.php
Task List:
<span> tag
style= attribute
<sup> and <sub> tags
<span> tag
<table> tags
Currently the environment instance serves as a poor man's document model by collecting various helper classes like numbering, footnotes and bookmarks. This system is inherently hard to scale. Some work on my live branch has laid a foundation for this, such as storing all of the XML files in a template in memory instead of simple sequential processing.
The general implementation would be as follows:
lib/sablon/document_object_model directory
add_image, add_relationship, add_bookmark, add_list_definition, etc.
Examples of files that might be repurposed into "dom classes" are lib/sablon/numbering.rb and lib/sablon/relationships.rb
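One possible shape for such a model is sketched below, assuming the template's files are held in memory as a hash of entry name to content. The `DocumentObjectModel` class and its internals are hypothetical; only the method names `add_image`, `add_relationship`, and `add_bookmark` come from the list above. As a simplification, the sketch ignores relationship ids already present in the template.

```ruby
# Hypothetical sketch of a document object model; not Sablon's API.
class DocumentObjectModel
  Relationship = Struct.new(:id, :type, :target)

  attr_reader :relationships, :bookmarks

  # +files+ maps zip entry names to their content held in memory.
  def initialize(files = {})
    @files = files
    @relationships = []
    @bookmarks = {}
  end

  # Look up the in-memory content of a template entry.
  def [](entry_name)
    @files.fetch(entry_name)
  end

  # Record a relationship and return its generated id (rId1, rId2, ...).
  def add_relationship(type, target)
    rel = Relationship.new("rId#{@relationships.size + 1}", type, target)
    @relationships << rel
    rel.id
  end

  # Register a bookmark name and return its numeric id; names must be unique.
  def add_bookmark(name)
    raise ArgumentError, "duplicate bookmark: #{name}" if @bookmarks.key?(name)

    @bookmarks[name] = @bookmarks.size + 1
  end

  # Store image data under word/media and relate it to the document.
  def add_image(filename, data)
    @files["word/media/#{filename}"] = data
    add_relationship('image', "media/#{filename}")
  end
end

dom = DocumentObjectModel.new('word/document.xml' => '<w:document/>')
image_rid = dom.add_image('logo.png', 'binary image data')
```

Keeping ids and in-memory files behind one object is what would let processors share state without going through the environment instance.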
This could potentially be a bear to implement depending on how easily I can locate the image within Word's directory structure. Checking out some of the sablon forks that implemented image substitution within a merge field would be a good place to start.
Requires #2
This is a checklist of issues that are encountered when importing content from a partial. They will be dealt with in individual issues and PRs.
This is the most basic level of functionality but lays the foundation for higher level functionality.
I want partials to be named with a leading underscore to match the Rails ERB convention. They will be called in the document by name, omitting the leading underscore, in a special merge field: «partial:filename». Partials will be assumed local to the file they are within; however, down the road I will add the ability to check a 'templates/shared' folder if the local lookup fails.
I still need to do the merge field substitution on this partial so it may be easier to recursively call the merge parser and then inject the final product into the document. In theory that would allow for nested partials.
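The lookup rules above could be sketched roughly as follows. The helper names `partial_name` and `resolve_partial`, the `.xml` extension, and the directory layout in the usage example are assumptions for illustration, since the file type of a partial is not fixed in this issue.

```ruby
require 'tmpdir'
require 'fileutils'

# Extract the partial name from merge field text such as "partial:address".
# Returns nil for ordinary merge fields.
def partial_name(field_text)
  match = /\Apartial:(?<name>[\w-]+)\z/.match(field_text.strip)
  match && match[:name]
end

# Hypothetical helper: look for "_<name><ext>" next to the template first,
# then fall back to the shared directory. +ext+ is an assumed default.
def resolve_partial(name, template_dir, shared_dir, ext: '.xml')
  filename = "_#{name}#{ext}"
  [template_dir, shared_dir].each do |dir|
    path = File.join(dir, filename)
    return path if File.file?(path)
  end
  raise ArgumentError, "partial not found: #{name}"
end

Dir.mktmpdir do |root|
  local = File.join(root, 'reports')
  shared = File.join(root, 'templates', 'shared')
  FileUtils.mkdir_p([local, shared])
  File.write(File.join(shared, '_footer.xml'), '<w:p/>')

  name = partial_name('partial:footer')
  # No local copy exists, so the shared one is returned.
  puts resolve_partial(name, local, shared)
end
```

Nested partials would follow from calling the same resolution step again while parsing the partial's own merge fields.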
In the current implementation of Sablon, a WordML injection replaces the entire paragraph with the content. This will also occur when using a partial.
Reference issue on original project: senny#40
Currently the way I handle footnotes is half baked. I allow them inside the regular HTML insertion content because I couldn't think of a better way at the time. I have now thought of a better way.
The new method will use a content wrapper, Sablon::Content::Footnote. Keys in the context hash will be of the form footnote:name, or their values will already be wrapped in that content type. The value of the key can be one of three types; any other type will raise an error. Examples are below.
# New footnote insertion format examples
context = {
  # this first "plain text" format is inserted directly as the footnote text with no changes (i.e. String insertion).
  'footnote:address' => '123 Example Dr. Orlando, FL',
  #
  # Insertion of HTML content (WordML would work exactly the same)
  # If content type is missing the logic falls back to Sablon::Content::Content#wrap
  'footnote:reference' => {
    content_type: :html,
    content: '<em>Title</em> - author name'
  },
  'footnote:reference2' => {
    content: Sablon.content(:html, '<em>Title</em> - author name')
  },
  'footnote:reference3' => Sablon.content(:footnote, Sablon.content(:html, '<em>Title</em> - author name'))
}
As seen in the example above, footnote content gets wrapped twice: first so Sablon knows what to do with it, and second to define how the actual content going inside the w:footnote tags is structured. The final result, regardless of the starting point, is a WordML insertion into the footnotes.xml file. When starting with plain text the full XML structure is generated, substituting in the content. When starting from HTML or WordML the footnote reference run is added automatically if it is missing. If needed, a wrapping paragraph tag will be added, along with a pStyle attribute with a value of "FootnoteText" if pStyle is missing. This means that if the user supplies a <p> or <div> tag when defining the footnote, they will need to set the pStyle attribute appropriately, as it is only checked after being converted to WordML.
# sample XML for a footnote
<w:footnotes>
...
<w:footnote w:id="3">
<w:p>
<w:pPr>
<w:pStyle w:val="FootnoteText"/>
</w:pPr>
<w:r>
<w:rPr>
<w:rStyle w:val="FootnoteReference"/>
</w:rPr>
<w:footnoteRef/>
</w:r>
<w:r>
<w:t xml:space="preserve"> Footnote text content </w:t>
</w:r>
</w:p>
</w:footnote>
...
</w:footnotes>
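For the plain-text case, generating the full structure could look something like the sketch below, which mirrors the sample XML above. The helper name `plain_text_footnote_xml` is made up for this example, and the way the id is supplied is an assumption.

```ruby
require 'cgi'

# Illustrative helper, not part of Sablon: build the <w:footnote> element
# for a plain-text footnote, escaping characters that are special in XML.
def plain_text_footnote_xml(id, text)
  escaped = CGI.escapeHTML(text)
  <<~XML
    <w:footnote w:id="#{Integer(id)}">
      <w:p>
        <w:pPr>
          <w:pStyle w:val="FootnoteText"/>
        </w:pPr>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="FootnoteReference"/>
          </w:rPr>
          <w:footnoteRef/>
        </w:r>
        <w:r>
          <w:t xml:space="preserve"> #{escaped}</w:t>
        </w:r>
      </w:p>
    </w:footnote>
  XML
end

puts plain_text_footnote_xml(3, '123 Example Dr. Orlando, FL')
```

The HTML and WordML cases would instead check the converted content for a missing reference run or pStyle and add only what is absent.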
Footnote insertion into a document will take two forms. First, if the name is used directly in a merge field of the document itself, a footnote reference tag will be inserted; this will be the "standard" way supported in the upstream package. In my fork I'll reimplement the tag. The main benefit of this route is that I'll have all of my footnotes defined prior to processing the document.xml file. This means I can use the context to resolve footnote ids at parse time instead of afterwards, allowing me to remove the @env.footnotes.update_refereces call in converter.rb.
#Example of reference tag XML inserted into document.xml
<w:r>
<w:rPr>
<w:rStyle w:val="FootnoteReference"/>
</w:rPr>
<w:footnoteReference w:id="4"/>
</w:r>
I'm not sure how best to handle using the same footnote more than once. Currently I duplicate the footnote and allow the insertion. A better option might be to throw an error since it is technically disallowed by MS Word. Reuse of footnotes would require deliberate duplication in the context.
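A small sketch of resolving ids from the context up front, taking the "throw an error on reuse" option discussed above. The `FootnoteIndex` class is hypothetical, and `first_id` is an assumption standing in for the first id not already used in footnotes.xml.

```ruby
# Hypothetical sketch: assign footnote ids from the context before
# document.xml is parsed, and reject a second reference to the same note.
class FootnoteIndex
  PREFIX = 'footnote:'.freeze

  # +first_id+ is an assumed starting id for newly inserted footnotes.
  def initialize(context, first_id: 1)
    keys = context.keys.map(&:to_s).select { |key| key.start_with?(PREFIX) }
    @ids = keys.each_with_index.to_h do |key, offset|
      [key.delete_prefix(PREFIX), first_id + offset]
    end
    @used = {}
  end

  # Id assigned to a footnote name; raises KeyError for unknown names.
  def id_for(name)
    @ids.fetch(name) { raise KeyError, "unknown footnote: #{name}" }
  end

  # Reference run for document.xml. Reusing a name raises, so reuse
  # requires deliberately duplicating the entry in the context.
  def reference_xml(name)
    id = id_for(name)
    raise ArgumentError, "footnote already referenced: #{name}" if @used[name]

    @used[name] = true
    '<w:r><w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr>' \
      "<w:footnoteReference w:id=\"#{id}\"/></w:r>"
  end
end

context = {
  'footnote:address' => '123 Example Dr. Orlando, FL',
  'title' => 'Report',
  'footnote:reference' => { content_type: :html, content: '<em>Title</em>' }
}
index = FootnoteIndex.new(context, first_id: 2)
ref_run = index.reference_xml('address')
```

Because ids are known before parsing starts, no post-processing pass over the references is needed in this arrangement.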
Closing note: This should also be applicable to endnotes with a few tweaks to element names, styles, etc. I also might be able to implement this as a "configuration only" option with the right changes to implement a basic DOM.
This would just be a nice feature, not really required.
This will be highly dependent on how complex the math markup is. I'll probably create a pseudo tag <math> or <equation> to maintain semantic elements.
Using WordML markup is simple, but generating and maintaining the WordML XML snippets will be a major hassle. Instead we should be able to pull content from an existing docx file to greatly simplify the maintenance burden.
Word documents are very complex and the entire document's XML code should not be injected into the document calling the partial. The desired content will be indicated by two special merge fields, «beginPartial» and «endPartial».
Only content in between those nodes will be kept, not including the nodes themselves. This should be essentially an equivalent XML stream to that handled by direct use of WordML. During initial implementation it may be beneficial to have docx and xml inject side-by-side to test for consistency.
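The slicing rule can be sketched on a simplified model in which the partial's body is an array of <w:p> XML strings in document order. This is an assumption for illustration: it also assumes each marker sits in its own paragraph and that its field instruction appears as contiguous text such as "MERGEFIELD beginPartial", which real documents do not guarantee. The helper name `partial_body` is made up.

```ruby
# Illustrative only: +paragraphs+ is an array of <w:p> XML strings.
# Returns the paragraphs strictly between the two marker paragraphs.
def partial_body(paragraphs)
  start = paragraphs.index { |para| para.match?(/MERGEFIELD\s+beginPartial\b/) }
  stop = paragraphs.index { |para| para.match?(/MERGEFIELD\s+endPartial\b/) }
  raise ArgumentError, 'missing beginPartial or endPartial' if start.nil? || stop.nil?
  raise ArgumentError, 'endPartial appears before beginPartial' if stop < start

  paragraphs[(start + 1)...stop]
end

paragraphs = [
  '<w:p><w:r><w:t>Notes for template authors</w:t></w:r></w:p>',
  '<w:p><w:fldSimple w:instr=" MERGEFIELD beginPartial \* MERGEFORMAT "/></w:p>',
  '<w:p><w:r><w:t>First kept paragraph</w:t></w:r></w:p>',
  '<w:p><w:r><w:t>Second kept paragraph</w:t></w:r></w:p>',
  '<w:p><w:fldSimple w:instr=" MERGEFIELD endPartial \* MERGEFORMAT "/></w:p>',
  '<w:p><w:r><w:t>Trailing content</w:t></w:r></w:p>'
]
kept = partial_body(paragraphs)
```

The joined result is the stream that would be compared against a hand-written WordML snippet when testing docx and XML injection side by side.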
Requires #1
Using live XML instead of strings when working with sablon is the first step towards a formal document object model and will allow a high degree of flexibility, because references to given portions of the document (i.e. bookmarks or footnote refs) can be maintained. This removes the need for an "ast_to_docx" hook to do things like update my footnote references. This conversion will take some effort, but I think it is worthwhile, especially when we want to venture into other territories like adding media, docx partials, etc.
Basically, instead of having the to_docx method return a string, it would return an XML node or NodeSet that would be injected into the document at the proper location.
Maybe done using bookmarks? Probably needs stuff in a rels file as well
Simple text injection into headers and footers would be very handy. Additional features such as setting up the page numbering and such would be an additional win but may be more complicated than it's worth.
This will take some experimentation because I am not sure exactly how content in headers and footers gets broken down.
Requires #2