gotenberg / gotenberg
A developer-friendly API for converting numerous document formats into PDF files, and more!
Home Page: https://gotenberg.dev
License: MIT License
Hi! Are you planning to support TLS?
Ability to overlay the .pdf with a watermark per page on the same position.
I don't know if this is something that should be part of the project, since it could be done with other tools, but I'd like to suggest it and get an opinion from the maintainers.
Thanks for the nice tool!
Hey guys,
you really did a great job! Love the webhook feature!
Now that you already spawn a headless instance of Chrome, how about accepting a URL to take a screenshot of a web page (with a configurable viewport size) or to print it to PDF? It would make my life easier!
Thanks and regards,
hjjg
Your issue may already be reported!
Please search on the issue tracker before creating one.
When I merge three or more PDF files, the resulting PDF should keep them in order. For example, if I merge 1.pdf, 2.pdf, 3.pdf and 4.pdf, the merged file should contain them in the order 1, 2, 3, 4 (ascending).
Instead, the resulting PDF has the order 2, 4, 1, 3, which is out of order.
1. Prepare three or more PDF files: 1.pdf, 2.pdf, 3.pdf, 4.pdf
2. Request the merge feature
3. Check the order of the resulting PDF
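As a client-side workaround, one could impose an explicit order before uploading. The sketch below is only an illustration: it assumes (the report does not confirm this) that the merge order follows the file names, and `natural_key` and `ordered_names` are hypothetical helper names, not part of Gotenberg.

```python
import re

def natural_key(name: str):
    """Split a file name into text and integer chunks so that 2.pdf sorts before 10.pdf."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

def ordered_names(names):
    """Sort names in natural ascending order and add a zero-padded rank prefix."""
    ranked = sorted(names, key=natural_key)
    width = len(str(len(ranked)))
    return [f"{i:0{width}d}_{name}" for i, name in enumerate(ranked, start=1)]

print(ordered_names(["2.pdf", "4.pdf", "1.pdf", "3.pdf"]))
```

The zero-padded prefix keeps the order stable even if the server sorts names as plain strings, where "10" would otherwise sort before "2".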
Expected Behavior
Convert Xlsx file to a PDF file
Current Behavior
If a table is too long, its content wraps onto new lines in the PDF document.
Possible Solution
Steps to Reproduce (for bugs)
1. Convert an XLSX file to a PDF file
Context
Your Environment
Version used: 3
Operating System and version: Ubuntu 18.10
Link to your project:
Gotenberg should support the Flat OpenDocument format with extension fodt.
Extension fodt isn't supported.
Since LibreOffice supports it natively, maybe add it to https://github.com/thecodingmachine/gotenberg/blob/master/internal/app/api/office.go ?
The Flat OpenDocument format is a nice alternative to the standard zipped OpenDocument format. You can edit it with any editor, or with sed and other tools. It's nothing more than a normal OpenDocument file in one flat XML file.
Japanese characters are shown
Japanese characters appear as squares
A previous solution pointed to "updating the Dockerfile". Any specific hints?
I tried setting:
ENV LC_ALL=ja_JP.UTF-8
ENV LANG=ja_JP.UTF-8
ENV LANGUAGE=ja_JP.UTF-8
in the container but it did not work 🤔
make image
should build the Docker image
W: The repository 'https://packagecloud.io/Keymetrics/pm2/debian stretch Release' does not have a Release file.
E: Failed to fetch https://packagecloud.io/Keymetrics/pm2/debian/dists/stretch/main/source/Sources 404 Not Found
E: Some index files failed to download. They have been ignored, or old ones used instead.
The command '/bin/sh -c wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" | tee /etc/apt/sources.list.d/google-chrome.list && apt-get update && apt-get -y install google-chrome-stable' returned a non-zero code: 100
make: *** [image] Error 100
make image
Finishing all steps for building the image from the Dockerfile
Step 6/11 : COPY .ci/gotenberg /usr/bin/gotenberg
COPY failed: stat /var/lib/docker/tmp/docker-builder832472548/.ci/gotenberg: no such file or directory
I tried to run the following command in terminal:
docker build -t gotenberg .
Mac OS X Mojave 10.14.1
How can I build and run it with go build, without using the Docker image?
Calling the API from a webpage is currently not possible, if the webpage is hosted on a different domain than gotenberg. The "Access-Control-Allow-Origin" is missing, thus a browser won't let you make the request.
Please consider adding an option to whitelist certain user defined origins.
For example (Docker):
docker run -e ALLOW_ORIGINS='http://some-domain.com,https://another.domain.com:8080'
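To make the request concrete, here is a minimal sketch of how such an allow-list could be evaluated. `ALLOW_ORIGINS` is the variable name proposed above, not an existing Gotenberg option, and `parse_allowed_origins` and `cors_headers` are hypothetical names used only for illustration.

```python
def parse_allowed_origins(raw: str) -> set:
    """Split a comma-separated list of origins, ignoring blanks and trailing slashes."""
    return {item.strip().rstrip("/") for item in raw.split(",") if item.strip()}

def cors_headers(request_origin: str, allowed: set) -> dict:
    """Return CORS response headers for an allowed origin, or an empty dict otherwise."""
    origin = request_origin.rstrip("/")
    if origin in allowed:
        # Echo the specific origin back and tell caches the response varies by Origin.
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}

allowed = parse_allowed_origins("http://some-domain.com,https://another.domain.com:8080")
print(cors_headers("https://another.domain.com:8080", allowed))
print(cors_headers("https://not-listed.example", allowed))
```

Echoing the matching origin, rather than answering with `*`, keeps the allow-list meaningful; requests from unlisted origins simply get no CORS header and are blocked by the browser.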
The current library version logs directly to stdout.
However, the latest version seems to offer an option to disable those logs.
The API should handle text file (.txt and .rtf) conversions.
source file
temp.txt
Pinging the image host from the Docker container succeeds.
It seems that files (images, stylesheets) passed in via the files argument are correctly accessible in index.html but not in header.html or footer.html.
How can I embed files in the header or footer?
Files accessible in index.html are accessible to the header and footer as well.
It currently doesn't seem to be able to read those files.
Context: we run Gotenberg in a Kubernetes cluster as part of an overall deployment which includes other microservices that sit "in front" of Gotenberg. Currently Gotenberg is running fine, but when we go to pull logs they are almost completely unusable, because every health check we make via /ping adds a log line to the output.
Obviously we can pull the logs and grep, but it is a bit annoying given that we have no need to know that a ping happened, because we can pull that information from Kubernetes events.
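Until there is a built-in switch, the grep approach can be made a little more robust by parsing the lines instead of matching text. This is a sketch under the assumption that access-log lines are JSON objects with a "uri" field, as in the log excerpts quoted elsewhere on this page; `drop_health_checks` is a hypothetical helper name.

```python
import json

def drop_health_checks(lines, path="/ping"):
    """Yield log lines whose JSON "uri" field is not the health-check path."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            yield line  # keep non-JSON lines such as startup banners
            continue
        if not isinstance(entry, dict) or entry.get("uri") != path:
            yield line

sample = [
    '{"method":"GET","uri":"/ping","status":200}',
    '{"method":"POST","uri":"/convert/html","status":200}',
    "Gotenberg started",
]
print(list(drop_health_checks(sample)))
```

Matching on the parsed field avoids dropping unrelated lines that merely contain the string "/ping", for example in a user agent or an error message.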
Hello.
We are experiencing some problems with file merging using /merge.
Result errors:
{"message":"dict=markupAnnot entry=IT: unsupported in version 1.5\nThis file could be PDF/A compliant but pdfcpu only supports versions \u003c= PDF V1.7\n"}
{"message":"Read: xRefTable failed: parse: duplicate key"}
{"message":"Read: xRefTable failed: Free: object #0 not found."}
Any ideas?
Thanks.
stupid... please remove me.
Hello,
First of all, let me thank you for your very nice project. It bundles everything needed to handle conversions as smoothly as possible, without the hassle of doing it ourselves!
I started doing a PoC with gotenberg in my company to handle file conversions inside another app, but one cool feature IMHO would be to search uploaded files for viruses before conversion.
As of today, our app uses ClamAV to search for viruses before handing the file over to gotenberg for the next steps, but I thought it'd be nice to integrate it directly inside gotenberg, so it'd become a one stop shop for all the file conversion operations.
I'd be okay to try and do a PR with such a feature if you like the idea. What do you think? :)
Would it be possible to insert an environment variable to turn on/off LibreOffice or Chrome support? Sometimes, I don't need Chrome, so I want to save resources.
Hi!
This looks amazing, we are looking forward to using this within our k8s cluster, however we're struggling to get the image up and running.
Our k8s file looks like
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gotenberg
  labels:
    app: gotenberg
spec:
  selector:
    matchLabels:
      app: gotenberg
  template:
    metadata:
      labels:
        app: gotenberg
    spec:
      containers:
        - name: gotenberg
          image: thecodingmachine/gotenberg:4
          ports:
            - name: gotenberg
              containerPort: 3000
---
kind: Service
apiVersion: v1
metadata:
  name: gotenberg
spec:
  type: NodePort
  ports:
    - protocol: TCP
      name: gotenberg
      port: 3000
      targetPort: 3000
The deployment has the following logs and fails to start
➜ kubectl logs gotenberg-74d485d685-m6frr
⇨ Gotenberg 4.2.0
⇨ Chrome headless started with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ error: failed to launch Chrome headless
If we're doing something wrong it'd be great to see a larger demo/example in the docs for k8s.
➜ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.7", GitCommit:"65ecaf0671341311ce6aea0edab46ee69f65d59e", GitTreeState:"clean", BuildDate:"2019-01-24T19:32:00Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.7-gke.4", GitCommit:"618716cbb236fb7ca9cabd822b5947e298ad09f7", GitTreeState:"clean", BuildDate:"2019-02-05T19:22:29Z", GoVersion:"go1.10.7b4", Compiler:"gc", Platform:"linux/amd64"}
The document should be generated as close as possible to what is displayed in the browser.
The generated document is completely blank (empty).
Add the possibility to specify a delay, to let the page load completely before performing the actual conversion.
What could be even better would be to add a mechanism similar to what Puppeteer uses with the waitUntil option (see the page.goto section of their documentation).
convert/url endpoint
Trying to generate a PDF from an HTML page that would span close to 100 pages in PDF format.
First I want to thank you for this: it just works out of the box and it's well documented.
But I wanted to know if there are plans to add an authentication system to Gotenberg, so it could be exposed as a service directly.
docker run --rm -p 3000:3000 thecodingmachine/gotenberg:4 gotemberg
JWT_KEY='myJWTKey'
Feel free to close this issue if you think it's out of the scope of gotenberg and should be handled by a proxy for instance.
Hello,
So I'm not sure this is actually a bug, but I can't get the API working using Java (Groovy/Grails to be precise). I use the Java HttpClient and MultipartEntity to create the HTTP request, containing the remoteURL of the page I want to turn into a PDF. The API does give me what seems to be a PDF file, but it cannot be opened, either by a local PDF viewer or by the browser; both tell me that the file seems to be damaged or in an unsupported format.
Here is the code I use:
static def execute(def apiURL){
    def httpClient = HttpClients.createDefault()
    def request = new HttpPost(apiURL)
    MultipartEntityBuilder builder = MultipartEntityBuilder.create()
    builder.addTextBody("remoteURL", 'https://google.com')
    builder.addTextBody("marginTop", '0')
    builder.addTextBody("marginBottom", '0')
    builder.addTextBody("marginLeft", '0')
    builder.addTextBody("marginRight", '0')
    HttpEntity multipart = builder.build()
    request.setEntity(multipart)
    def response = httpClient.execute(request)
    BufferedReader rd = new BufferedReader(
        new InputStreamReader(response.getEntity().getContent()))
    StringBuffer result = new StringBuffer()
    String line = ""
    while ((line = rd.readLine()) != null) {
        result.append(line)
    }
    return result
}
I do get a StringBuffer starting with
"%PDF-1.4%����1 0 obj<</Creator (Chromium)/Producer (Skia/PDF m74)/CreationDate (D:20190510164315+00'00')..."
that's why I say it looks like a PDF file.
I am sure the API works fine, since when I do a curl request, from a terminal or from my Java project, I do get a proper PDF file. I can't use the curl request in my code, though, because it downloads the file into my project instead of giving it to me as a variable I can easily use and convert to base64.
I am coming here for help since I can't figure out whether the problem comes from my request or from the API, so if anyone has an idea, that would be great.
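A likely cause, offered as a guess rather than a diagnosis, is that the response body is read as text: in Java, InputStreamReader decodes bytes into characters and readLine drops line terminators, neither of which is reversible for binary data. The sketch below reproduces the effect in Python with a made-up byte string standing in for the PDF body; reading the entity as raw bytes (for instance with Apache HttpClient's EntityUtils.toByteArray) and base64-encoding those bytes should avoid it.

```python
import base64

# Hypothetical stand-in for a PDF response body: binary bytes, not valid text.
pdf_bytes = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<<>>\nendobj\n"

# Text-style read: decode to characters, then join lines without their terminators.
as_text = pdf_bytes.decode("utf-8", errors="replace")
round_tripped = "".join(as_text.splitlines()).encode("utf-8")

# Byte-style read: keep the payload untouched and encode it directly.
encoded = base64.b64encode(pdf_bytes).decode("ascii")

print("text round trip preserved the file:", round_tripped == pdf_bytes)
print("base64 round trip preserved the file:", base64.b64decode(encoded) == pdf_bytes)
```

The replacement characters visible in the quoted StringBuffer output are the same symptom: bytes that could not be mapped to characters were substituted during decoding.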
Or should we just look to a project like this: https://github.com/zrrrzzt/docker-unoconv-webservice
The unoconv version that installs from apt-get is 0.7 and the latest version is 0.8.2. The output from apt-cache search unoconv is:
unoconv:
  Installed: 0.7-1.1
  Candidate: 0.7-1.1
  Version table:
 *** 0.7-1.1 500
        500 http://httpredir.debian.org/debian stretch/main amd64 Packages
        100 /var/lib/dpkg/status
So I guess I would need to install it manually in order to get the latest version. I've already reached out to see if the latest version can be published (unoconv/unoconv#482). If that is not possible, I was wondering if there is any way that this repo can support a Dockerfile that builds the latest unoconv manually with all its dependencies.
Hi
is it possible to convert/wrap single/multiple pictures (png|jpg|etc) too?
cheers
manu
When I query /, gotenberg prints
{"message":"code=404, message=Not Found"}
with an HTTP status code of 404.
When I query /, gotenberg prints
{"message":"code=404, message=Not Found"}
with an HTTP status code of 500.
I imagine this being a one-liner change. I'd be happy to open a PR.
Query /.
This pollutes logs with 5xx errors when there's actually nothing wrong.
With a header defined as:
<html>
<head>
</head>
<body>
<span>Ok</span>
</body>
</html>
Header and footer should display text with the same size as the body.
The size of the text is much smaller:
I think I remember this was a common problem with wkhtmltopdf or some other engine, but I can't seem to find the reference.
Hi guys! Would it be possible to add a parameter specifying the resulting pdf filename after converting from html/url?
Response headers always include a random filename, e.g. content-disposition: attachment; filename="73952a7767eb9af0c94c05e525c7f327.pdf"
I'd suggest adding a new parameter in addition to the existing ones (remoteURL, webhookURL, paperWidth, paperHeight, etc.), e.g. "fileName", so that users can define a custom value.
In the real world this would allow a consumer of the Gotenberg API, e.g. an HTML page, to perform a POST request using an HTML form and expect the Save File dialog to appear with the correct file name populated, instead of seeing some random name.
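To illustrate what handling such a parameter might involve, here is a small sketch that turns a user-supplied name into a header value. It is not Gotenberg's implementation: "fileName" is only the name proposed above, `content_disposition` is a hypothetical helper, and the character allow-list is an assumption chosen to be conservative.

```python
import re

def content_disposition(requested: str, fallback: str = "result.pdf") -> str:
    """Build an attachment header value from a user-supplied file name."""
    # Keep only the last path component so the name cannot point outside a directory.
    base = requested.replace("\\", "/").rsplit("/", 1)[-1]
    # Replace anything outside a conservative character set, then trim dots/underscores.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base).strip("._")
    if not safe:
        safe = fallback
    if not safe.lower().endswith(".pdf"):
        safe += ".pdf"
    return f'attachment; filename="{safe}"'

print(content_disposition("invoice 2019.pdf"))
print(content_disposition("../../etc/passwd"))
```

Sanitising matters here because the value ends up inside a quoted header field and is later used by browsers as a local file name.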
Hi there,
Is there any chance of getting a health check endpoint added to Gotenberg, so that it is usable behind an ALB?
Many thanks!
Convert Xlsx file to a PDF file
I want to control the margins of the PDF output
Hello,
Nice work and documentation.
Checking the Dockerfile and this repository's code should be enough to get a pretty good idea of whether this is secure or infected (e.g. data leakage).
In your Dockerfile:
FROM thecodingmachine/gotenberg:3.2.0 AS hack
I am guessing that you reuse a previous build to shorten your Dockerfile, but this also hides what your previous build is based on or what is inside it, doesn't it?
Unless one investigates this particular container and looks for issues, which is obviously harder than checking whether a Dockerfile contains (more obvious) security issues.
Would it be possible to have a longer Dockerfile without any hack? Some say explicit is better than implicit 😉
Otherwise I guess I could try to reuse your Dockerfile from tag 3.2.0 and add it at the top of your latest one.
I am open to hearing your point of view; I may be missing something.
See #66
Converting HTML to a PDF file should display the content completely, page after page.
Sometimes the content at the boundary between two pages is incomplete: half of it is cut off.
A broken PDF file
1. Create an HTML table that spans more than one page
2. Convert the HTML to PDF
Hey guys, thanks a lot for this amazing project, you are doing an amazing job with it.
So far I am using it to convert PPTX files to PDF, but right now my deployed version seems to be much slower than my local version. I am running Gotenberg with k8s, and my problem is that one sample PPTX file is converted to PDF in 25s on my deployed version and in 4s on my local machine. I have tried doubling the resources on k8s and the PDF creation time came down to 13s, half of the previous value, but this way I keep giving it more and more resources, it is still not as fast as my local machine, and horizontal scaling becomes more expensive.
Since you guys seem to have been running this for a while, you probably have some best practices on how to deploy this, what the optimum resource requirements are, etc. It would be great if you could include your tips in the documentation: for example, making the service run faster by disabling something or excluding some binary for some use cases, or the resource requirements for the sweet spot between performance and scalability. If this is out of scope for you I can try to write some docs in a PR, but it would be much easier using your experience with it rather than me inspecting every moving piece.
The PDF generated from HTML should contain all elements of the HTML.
Sometimes (1 in 10 on average) parts of the HTML elements are lost in the resulting PDF. The biggest problem is that the response status code is still 200.
index.html with content https://gist.github.com/server-may-cry/57b062186f648a16f6e4c5c7ac354e38
curl --request POST --url http://gotenberg:3000/convert/html --header 'Content-Type: multipart/form-data' --form [email protected] -o result.pdf
No difference in logs from container for failed and ok requests
ok
gotenberg_1 | {"time":"2019-01-28T17:56:49.563200149Z","id":"","remote_ip":"172.18.0.16","host":"gotenberg:3000","method":"POST","uri":"/convert/html","user_agent":"curl/7.52.1","status":200,"error":"","latency":423976853,"latency_human":"423.976853ms","bytes_in":8501,"bytes_out":28024}
failed
gotenberg_1 | {"time":"2019-01-28T18:02:20.799825196Z","id":"","remote_ip":"172.18.0.16","host":"gotenberg:3000","method":"POST","uri":"/convert/html","user_agent":"curl/7.52.1","status":200,"error":"","latency":566488490,"latency_human":"566.48849ms","bytes_in":8501,"bytes_out":28024}
We are using gotenberg internally against https servers with certificates generated with an internal PKI.
As I usually do for other images, I made a new Docker image based on thecodingmachine/gotenberg:5
FROM thecodingmachine/gotenberg:5
ADD http://blahblahblah/internal-pki.crt /usr/local/share/ca-certificates/internal-pki.crt
RUN set -xeu \
&& update-ca-certificates
The certificate is properly deployed, rehashed, symlinked and added to ca-certificates.crt and it works with curl, wget and so on.
This is usually enough for the certificate to be taken into account by our whole ecosystem, including servers and Chrome on workstations.
Here, it doesn't work. Any clues?
More generally, it seems that when there is a certificate issue (self-signed, unrecognized authority, name mismatch, etc.) we only get a blank page, without any more info in the logs :/
Please note that it works like a charm using a commonly accepted Root CA.
It would be interesting to highlight in the README.md the differences between Gotenberg and wkhtmltopdf for converting to PDF (aside from the obvious things like "it's a server whereas wkhtmltopdf is a binary").
I am curious to know if you encountered some limitation with wkhtmltopdf that led to the creation of Gotenberg.
Trying to use this module in another go mod enabled repo of ours is causing issues with some of the unused dependencies (particularly delve). Running go mod tidy fixes these issues.
GO111MODULE=on go get -u github.com/thecodingmachine/[email protected]
Should allow us to use the go module.
The gotenberg module is marked as incompatible.
Run go mod tidy
We are unable to use the go module for talking to the gotenberg service in our k8s cluster.
I have a commit ready on a fork. Wanted to raise an issue first, happy to raise PR if accepted.
To get a better impression of the final images, could you please add the generated documents in the tests to the repo?
I have read the docs but seeing final results could give a better first impression.
For example I'd like to see:
What are the limits to the conversion process, e.g. when rendering footnotes, sections, ...?
Great repo!
Is there a mailmerge feature on the roadmap?
I am thinking of a CSV file (addresses) and a .docx file (letter template) being merged into a single PDF, e.g. for generating multiple letters from a template in one step.
Maybe this is a small step for Gotenberg (LibreOffice has this feature built in) but a big step for generating high-quality letters, catalogs, ... in a batch process.
In the Security Section of your Readme there is a Typo:
The API does not provide any authentication mechanisms. Make sure to not put it on a public facing port and your client(s) should always controls what's is sent to the API.
Attaching small size images should include them in the PDF.
Attaching a 32KB image yields:
{"time":"2019-05-03T09:15:36.0025674Z","id":"","remote_ip":"192.168.240.7","host":"documents:3000","method":"POST","uri":"/convert/html","user_agent":"","status":500,"error":"code=500, message=printing page to PDF: cdp.Page: PrintToPDF: rpcc: message too large (increase write buffer size or enable compression)","latency":93544042,"latency_human":"93.544042ms","bytes_in":39667,"bytes_out":133}
I'm trying to add an image to the PDF.
I ran the Docker image on a CentOS system. Unfortunately I got {"message":"code=404, message=Not Found"} in the browser, and the console logged:
{"time":"2019-01-06T10:04:36.297609642Z","id":"","remote_ip":"61.151.178.217","host":"111.231.196.219:3000","method":"GET","uri":"/","user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2141.400 QQBrowser/9.5.10219.400","status":500,"error":"code=500, message=code=404, message=Not Found","latency":36096,"latency_human":"36.096µs","bytes_in":0,"bytes_out":41}
Is there something wrong with what I did?
Can you tell me the right way?
I'd like to download and run Gotenberg in https://kitematic.com/ on Windows.
(HTTP code 404) unexpected - manifest for thecodingmachine/gotenberg:latest not found
Kitematic is a GUI to easily run Docker images on Windows (in connection with Docker Toolbox, it uses VirtualBox for virtualizing the base OS)
Not an issue really. I followed your instructions and scaled up Gotenberg to 5 and then 10 containers (on Windows Hyper-V). Calling the service from a Java application resulted in the exact same performance (one document converted to PDF every second). This is a good result, but I'm confused as to why the performance stays (exactly!) the same. I'm using Pentaho Kettle, Java and Apache's HttpClient to make the HTTP POSTs. I don't expect you to solve my problem, but any ideas on how to gain some insight into this would be appreciated!
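One possible explanation, which is an assumption and not a diagnosis of this setup, is that the client sends one request at a time: extra containers then sit idle and throughput is bounded by the latency of a single conversion. The idealised model below shows the arithmetic; `docs_per_second` is a hypothetical helper, and the model ignores load-balancer behaviour, queueing and shared host resources.

```python
def docs_per_second(latency_s: float, replicas: int, in_flight: int) -> float:
    """Idealised throughput: each busy replica finishes one document every latency_s seconds."""
    busy = min(replicas, in_flight)  # a replica only helps if a request is waiting for it
    return busy / latency_s

for replicas in (1, 5, 10):
    sequential = docs_per_second(1.0, replicas, in_flight=1)
    concurrent = docs_per_second(1.0, replicas, in_flight=replicas)
    print(replicas, sequential, concurrent)
```

Under this model, a quick check would be to send several requests in parallel from the client and see whether the rate rises with the number of containers.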