Open HTML to PDF is a pure-Java library for rendering arbitrary well-formed XML (or XHTML) using CSS 2.1 for layout and formatting, output to PDF and images.
Open HTML to PDF is licensed under the GNU Lesser General Public License (LGPL), version 2.1 or later, available at http://www.gnu.org/copyleft/lesser.html. You can use Open HTML to PDF in any way and for any purpose you want as long as you respect the terms of the license. A copy of the LGPL is included as license-lgpl-2.1.txt or license-lgpl-3.txt in our distributions and in our source tree.
Open HTML to PDF uses several FOSS packages to get the job done. These are listed, along with their licenses, in the LICENSE file in our distribution.
New releases of Open HTML to PDF will be distributed through Maven. Search Maven for com.openhtmltopdf. Coming soon!
There is a large amount of sample code under the openhtmltopdf-examples directory (integration guide and template guide to come).
You could try the browser example at /openhtmltopdf-examples/src/main/java/com/openhtmltopdf/demo/browser/BrowserStartup.java
Add these to the properties and dependencies sections of your Maven POM:
<properties>
<!-- Define the version of OPEN HTML TO PDF in the properties section of your POM. -->
<openhtml.version>0.0.1-SNAPSHOT</openhtml.version>
</properties>
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-core</artifactId>
<version>${openhtml.version}</version>
</dependency>
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-pdfbox</artifactId>
<version>${openhtml.version}</version>
</dependency>
<dependency>
<!-- Optional, leave out if you do not need right-to-left or bi-directional text support. -->
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-rtl-support</artifactId>
<version>${openhtml.version}</version>
</dependency>
Then you can use this code:
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import com.openhtmltopdf.bidi.support.ICUBidiReorderer;
import com.openhtmltopdf.bidi.support.ICUBidiSplitter;
import com.openhtmltopdf.pdfboxout.PdfBoxRenderer;
public class SimpleUsage
{
public static void main(String[] args)
{
new SimpleUsage().exportToPdfBox("file:///Users/user/path-to/document.xhtml", "/Users/user/path-to/output.pdf");
}
public void exportToPdfBox(String url, String out)
{
OutputStream os = null;
try {
os = new FileOutputStream(out);
try {
PdfBoxRenderer renderer = new PdfBoxRenderer(/* testMode = */ false);
// The following three lines are optional. Leave them out if you do not need
// RTL or bi-directional text layout.
renderer.setBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
renderer.setDefaultTextDirection(false);
renderer.setBidiReorderer(new ICUBidiReorderer());
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
} catch (Exception e) {
e.printStackTrace();
// LOG exception
} finally {
try {
os.close();
} catch (IOException e) {
// swallow
}
}
}
catch (IOException e1) {
e1.printStackTrace();
// LOG exception.
}
}
}
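The example above passes the input document as a file: URL string. When you start from a plain file system path, a minimal sketch like the following, using only the JDK, can build that string. The class name FileUrls and the method toFileUrl are hypothetical helpers for illustration and are not part of Open HTML to PDF.

```java
import java.nio.file.Paths;

// Hypothetical helper, not part of Open HTML to PDF.
class FileUrls {
    // Converts a relative or absolute file system path into a file: URL string.
    // Characters that are not legal in a URL, such as spaces, are percent-encoded.
    static String toFileUrl(String path) {
        return Paths.get(path)
                .toAbsolutePath()   // resolve against the current working directory
                .normalize()        // collapse "." and ".." segments
                .toUri()            // file:///... form
                .toString();
    }
}
```

The result can then be passed as the first argument of exportToPdfBox in place of a hand-written URL.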
While Open HTML to PDF works with a standard w3c DOM, the project provides a converter from the Jsoup HTML5 parser provided Document to a w3c DOM Document. This allows you to parse and use HTML5, rather than the default strict XML required by the project. To use the converter, add this dependency:
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-jsoup-dom-converter</artifactId>
<version>${openhtml.version}</version>
</dependency>
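Then you can use one of the Jsoup.parse methods to parse HTML5 and DOMBuilder.jsoup2DOM to convert the Jsoup document to a w3c DOM one.
import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.jsoup.Jsoup;
// DOMBuilder comes from the openhtmltopdf-jsoup-dom-converter module added above.
public org.w3c.dom.Document html5ParseDocument(String urlStr, int timeoutMs) throws IOException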
{
URL url = new URL(urlStr);
org.jsoup.nodes.Document doc;
if (url.getProtocol().equalsIgnoreCase("file")) {
doc = Jsoup.parse(new File(url.getPath()), "UTF-8");
}
else {
doc = Jsoup.parse(url, timeoutMs);
}
return DOMBuilder.jsoup2DOM(doc);
}
Then you can set the renderer document with renderer.setDocument(doc, url) in place of renderer.setDocument(url).
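Without the Jsoup converter, the input has to be well-formed XML. As a rough sketch under that assumption, the JDK's own parser can check a document before it is handed to the renderer. The class name WellFormedCheck is hypothetical, and the load-external-dtd feature string is a Xerces feature that the JDK's bundled parser is assumed to support; other parser implementations may reject it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Open HTML to PDF.
class WellFormedCheck {
    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: Xerces-specific feature, supported by the JDK's built-in parser.
            // It stops the parser from fetching the DTD named in an XHTML DOCTYPE.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps them off stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Parse failure: the markup is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `<p>Hello<br/></p>` passes, while the HTML-style `<p>Hello<br></p>` fails because the br element is never closed. Input of the second kind is what the Jsoup converter is meant for.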
Open HTML to PDF provides three logging options. The default is to use java.util.logging. If you prefer to output using log4j or slf4j, adapters are provided:
<!-- Use one of these, not both. -->
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-slf4j</artifactId>
<version>${openhtml.version}</version>
</dependency>
<dependency>
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-log4j</artifactId>
<version>${openhtml.version}</version>
</dependency>
Then at the start of your code, before calling any Open HTML to PDF methods, use this code:
XRLog.setLoggingEnabled(true);
// For slf4j:
XRLog.setLoggerImpl(new Slf4jLogger());
// or for log4j 1.2.17:
XRLog.setLoggerImpl(new Log4JXRLogger());
Open HTML to PDF is based on Flying Saucer. Credit goes to the contributors of that project. Code will also be used from neoFlyingSaucer.
- No, you cannot use it on Android or Google App Engine.
- Flowing columns are not implemented.
- No, it's not a web browser, although the 'browser' example is pretty impressive.
- Added Jsoup HTML5 to DOM converter module.
- Fixed divide-by-zero error in BorderPainter class. Thanks @fenrhil.
- Added slf4j logging facade adapter.
- Added right-to-left (RTL) and bi-directional text support.
- Added output device using PDFBox 2.0.0.
- Made sure the XML document builder does not resolve external DTDs.
- Removed obsolete iText-based output devices.
- Removed SWT support.
- Regressions (please open an issue if required):
  - PDF form controls.
  - PDF font types other than built-in and truetype.
  - XMP PDF metadata in PDFs.
  - PDF encryption.
  - PDF text justification.