kannan-ar / marigold.openxhtml
MariGold.OpenXHTML is a wrapper library for Open XML SDK to convert HTML documents into Open XML word documents.
License: MIT License
I use this code to convert the HTML into Open XML.
html holds the table as a string.
filename holds the output file name as a string.
WordDocument doc = new WordDocument(filename);
doc.Process(new HtmlParser(html));
doc.Save();
The following structure causes problems in the conversion. The HTML is converted into a .docx file, but when I open that file an "unknown error" appears, which makes it hard for me to debug.
I read the following table from a file.
<table>
<tbody>
<tr>
<th colspan="1" rowspan="1" colwidth="260">asdasd</th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1"></th>
<th colspan="1" rowspan="1" colwidth="110"></th>
</tr>
<tr>
<td colspan="1" rowspan="1" colwidth="260"></td>
<td colspan="1" rowspan="1"> </td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1" colwidth="110">{{Arbeitgeber.Hey}}</td>
</tr>
<tr>
<td colspan="1" rowspan="1" colwidth="260">asxdasd </td>
<td colspan="1" rowspan="1">asdasd</td>
<td colspan="1" rowspan="1">asd</td>
<td colspan="1" rowspan="1" colwidth="110">asd</td>
</tr>
</tbody>
</table>
I also tried removing the tag but it doesn't change anything.
@kannan-ar Could you point me to my error?
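One way to narrow down an "unknown error" like this is to run the generated file through the validator that ships with the Open XML SDK, which reports the parts and elements Word is likely to reject. This is only a sketch: it assumes the DocumentFormat.OpenXml package is referenced and that the file name (hypothetical here) is the same one passed to WordDocument. It has not been run against the table in this report.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Assumption: this is the .docx that doc.Save() wrote; replace with your own path.
string filename = "output.docx";

// Open the generated document read-only and list every schema violation found.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filename, false))
{
    var validator = new OpenXmlValidator();
    foreach (ValidationErrorInfo error in validator.Validate(wordDoc))
    {
        Console.WriteLine("Description: " + error.Description);
        Console.WriteLine("XPath:       " + error.Path.XPath);
        Console.WriteLine("Part:        " + error.Part.Uri);
        Console.WriteLine();
    }
}
```

If the validator prints nothing, the problem is probably not a schema violation, and comparing the generated document.xml with one saved by Word may be the next step.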
I'm outputting a Word document and applying styles to the h1, h2, h3, p and table elements. I'm also trying to set the image width using CSS, but it doesn't seem to translate into the Word document: I always end up with the image displayed at full size. I basically want to constrain all images to a set width so that they flow inline with the paragraphs. Any help with overriding the image sizing would be really appreciated.
Hello.
I'm trying to validate the following HTML
<style type="text/css">
table.sample {
border:1px;
/*border-collapse:collapse;*/
}
table.sample th {
border-top:1px solid #000000;
border-left:1px solid #000000;
border-right:1px solid #000000;
border-bottom:1px solid #000000;
}
table.sample td.top {
border-top:3px double #000000;
border-left:1px solid #000000;
border-right:1px solid #000000;
border-bottom:1px solid #000000;
}
table.sample td {
border-top:1px solid #000000;
border-left:1px solid #000000;
border-right:1px solid #000000;
border-bottom:1px solid #000000;
}
</style>
<div>
<table class="sample">
<thead>
<tr>
<th rowspan="3">Header1</th>
<th colspan="4">Header2<br>two rows header</th>
</tr>
<tr>
<th rowspan="2">Header2-1</th>
<th rowspan="2">Header2-2</th>
<th colspan="2">Header2-3</th>
</tr>
<tr>
<th>Header2-3-1</th>
<th>Header2-3-2<br>two rows header</th>
</tr>
</thead>
<tbody>
<tr>
<td class="top">val1</td>
<td class="top">val2-1</td>
<td class="top">val2-2</td>
<td class="top">val2-3-1</td>
<td class="top">val2-3-2</td>
</tr>
</tbody>
</table>
</div>
If I convert this HTML to docx, I get the following result.
When "rowspan" is specified in an HTML table, the left and right borders of the merged cell are missing.
What I expect is that the merged cell also has a border.
Is there a way to get the expected results?
Can I convert an HTML document to a .docx file in 'Track Changes' mode? That is, when I hover over inserted or deleted text in the document, it should show which user inserted or deleted the text and at what date and time, along with the actual text.
And the actual inserted/deleted text would be identified with the help of custom tags in the html file.
Please confirm if this can be done using the current code.
Regards,
Hello,
when the image file path contains a blank (space), the conversion doesn't work. After removing the blanks everything runs smoothly.
Example:
<img type="image/png" src="Feature Realisation - template - ReqIF V_000968c1/x04000000011CA34C.png" />
Thanks,
Helmut
Hello,
how can I set the LeftMargin and RightMargin?
Would it be possible to license this project (and MariGold.HtmlParser) with an open source license? I would like to use this library for a server-side application and want to make sure that I am legally in the clear.
Code hosted on GitHub without a license excludes both commercial and personal use for anyone except the author.
Alternatively, could I obtain explicit permission to use this library in my project?
Thanks
Hello,
great converter!
Is it possible to also support the '<object' tag like:
Background: I want to convert ReqIF XHTML, which often uses '<object' tags, usually for images but also for other kinds of binary objects.
Thanks a lot,
Helmut
HTML to word conversion
page-break-before: always;
but it is not reflected in the output Word document, and there is not much documentation available for this kind of issue. How do page breaks work? Is there any way other than CSS rules in the HTML to break the page in the output Word file?
Thank you for your work and effort in making this tool publicly available.
When converting from HTML to DOCX, if there is any
Hello,
I want to include an OLE object (Excel, PDF, etc.) for which I have an OLE compound file (*.ole). I want to be able to open the OLE compound object from the resulting *.doc document. The OLE compound file comes from the requirements tool DOORS by exporting to *.reqif (in essence XHTML to show values).
The XHTML code snippet I have:
<object data="OLE_AB_4e7c971411315592_23_2100089280_2800000003__24775003-8f9c-4f50-8af2-33b21e0265ec_OBJECTTEXT_0.ole" type="text/rtf">This is ssss
There are also images, which work when '<object' is replaced by '<img'.
When I try to open the XHTML file with a browser, I get the error 'Plugin not supported'.
Any idea?
Thanks for your help,
Helmut
How to reproduce:
Grab the html using TinyMCE
https://www.tiny.cloud/docs/demo/full-featured/
Generate docx
var doc = new WordDocument(filename);
doc.Process(new HtmlParser(html));
Conversion works amazingly well, except for letters like "Č" or "Ć", which are assigned the Calibri font while the rest of the content is in Times New Roman.
<div>111111</div>
<div>222222</div>
== additional blank paragraph
<div>111111</div><div>222222</div>
== good
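Since the only difference between the two inputs above is the line break between the div elements, a possible workaround until the library handles it is to strip whitespace-only runs between tags before passing the HTML to HtmlParser. The sketch below uses only System.Text.RegularExpressions. CollapseInterTagWhitespace is a hypothetical helper, not part of MariGold.OpenXHTML, and whether it removes the blank paragraph in every case is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of MariGold.OpenXHTML): removes runs of
// whitespace that sit between a closing '>' and the next opening '<'.
static string CollapseInterTagWhitespace(string html)
{
    return Regex.Replace(html, @">\s+<", "><");
}

string html = "<div>111111</div>\n<div>222222</div>";
string cleaned = CollapseInterTagWhitespace(html);

// Prints: <div>111111</div><div>222222</div>
Console.WriteLine(cleaned);
```

Note that this also removes a meaningful space between adjacent inline elements (for example between two span tags) and changes the content of pre blocks, so it is only suitable for markup where whitespace between tags carries no meaning.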
When I have an HTML file with nested ul/li tags and I convert it to DOCX, all the bullets end up at the same level. They are not nested anymore.
How can I fix that?