pdf2html is a tool that extracts text from a PDF file and writes it to an HTML file.
It is based on the ExtractText utility from the Apache PDFBox library.
To build the project, run ant
in the root directory.
Run jarall
to produce a merged jar with the dependencies included.
To run the project, execute java -jar pdf2html-all.jar filename.pdf
A file named filename.html will be written.
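
The mapping from input name to output name, and the wrapping of extracted text in HTML, can be sketched roughly as follows. This is only an illustration and not the tool's actual code: the class Pdf2HtmlSketch and the methods htmlNameFor, escapeHtml and wrapAsHtml are hypothetical names, the PDFBox extraction step is left out, and the markup pdf2html really emits may differ.

```java
public class Pdf2HtmlSketch {

    // Hypothetical helper: swap a trailing ".pdf" (any case) for ".html".
    public static String htmlNameFor(String pdfName) {
        int cut = pdfName.length() - 4;
        boolean isPdf = pdfName.regionMatches(true, cut, ".pdf", 0, 4);
        return (isPdf ? pdfName.substring(0, cut) : pdfName) + ".html";
    }

    // Escape the characters that would otherwise be read as HTML markup.
    // "&" is replaced first so the entities added afterwards stay intact.
    public static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Assumed output shape: the extracted text inside a <pre> block.
    public static String wrapAsHtml(String title, String text) {
        return "<html><head><title>" + escapeHtml(title) + "</title></head>\n"
             + "<body><pre>" + escapeHtml(text) + "</pre></body></html>\n";
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "filename.pdf";
        System.out.println(htmlNameFor(input)); // filename.pdf -> filename.html
        // Placeholder text stands in for what PDFBox would extract.
        System.out.print(wrapAsHtml(input, "a < b & c"));
    }
}
```

Escaping the text before embedding it keeps characters such as < and & from the PDF from being interpreted as markup in the generated page.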
For more information run java -jar pdf2html-all.jar filename.pdf