gotenberg / gotenberg


7.0K 63.0 454.0 19.15 MB

A developer-friendly API for converting numerous document formats into PDF files, and more!

Home Page: https://gotenberg.dev

License: MIT License

Go 96.82% Shell 0.71% Makefile 1.01% Dockerfile 1.45%

api pdf html wkhtmltopdf unoconv docker markdown docx pptx puppeteer

gotenberg's Issues

It seems Chinese characters aren't supported?

Update the Dockerfile: which Dockerfile is built? (in the build folder)

I have to update the Dockerfile in order to support Chinese characters,
but I don't know which Dockerfile is built: there are many Dockerfiles in the build folder.

Expected Behavior

Ability to overlay the PDF with a watermark at the same position on every page.

Context

I don't know whether this is something that should be part of the project, since it could be done with other tools, but I'd like to suggest it and get an opinion from the maintainers.

Thanks for the nice tool!

Add support to convert a URL to PNG or PDF

Hey guys,

you really did a great job! Love the webhook feature!

Now that you already spawn a headless instance of Chrome, how about taking a URL and either taking a screenshot of the web page (with a configurable viewport size) or printing it to PDF? It would make my life easier!

Thanks and regards,

hjjg

the merge out-of-order issue

Your issue may already be reported!
Please search on the issue tracker before creating one.

Expected Behavior

When I merge three or more PDF files, the resulting PDF should preserve their order. For example, if I merge 1.pdf, 2.pdf, 3.pdf and 4.pdf, the merged file should contain them in ascending order: 1, 2, 3, 4.

Current Behavior

I got a resulting PDF in the order 2, 4, 1, 3, which is out of order.

Possible Solution

Steps to Reproduce (for bugs)

1. Prepare three or more PDF files: 1.pdf, 2.pdf, 3.pdf, 4.pdf
2. Request the merge feature
3. Check the order of the resulting PDF

Context

Your Environment

Version used:
Operating System and version:
Link to your project:
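Until the ordering is fixed server-side, a client can make the intended order explicit in the file names. The Go sketch below assumes that the server orders the uploaded parts by file name (later Gotenberg documentation describes /merge as merging in alphabetical order; verify this for your version) and that the form field is named files, which is also an assumption. pdfPart, orderedName and newMergeRequest are hypothetical helpers. Zero-padding matters: a plain alphabetical sort puts "10.pdf" before "2.pdf".

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"path/filepath"
)

// pdfPart is one input file for a merge request.
type pdfPart struct {
	Name string // original file name, e.g. "2.pdf"
	Data []byte // raw PDF bytes
}

// orderedName prefixes a file name with its zero-padded position, so that an
// alphabetical sort of the names reproduces the caller's order
// (otherwise "10.pdf" sorts before "2.pdf").
func orderedName(index int, name string) string {
	return fmt.Sprintf("%04d_%s", index, filepath.Base(name))
}

// newMergeRequest builds a multipart POST to /merge with the files renamed so
// that their names encode the desired order. Assumptions (not confirmed by
// the issue): the server sorts parts by file name, and the field is "files".
func newMergeRequest(baseURL string, parts []pdfPart) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	for i, p := range parts {
		fw, err := w.CreateFormFile("files", orderedName(i, p.Name))
		if err != nil {
			return nil, err
		}
		if _, err := fw.Write(p.Data); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/merge", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}
```

The request can then be sent with http.DefaultClient.Do(req) and the response body written to disk.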

Office documents: add paper size

Expected Behavior
Convert Xlsx file to a PDF file

Current Behavior
If the table is too long, the PDF output breaks it across lines.

Possible Solution
Steps to Reproduce (for bugs)
1. Convert an Xlsx file to a PDF file

Context
Your Environment
Version used: 3
Operating System and version: Ubuntu 18.10
Link to your project:

support for Flat OpenDocument (fodt)

Expected Behavior

Gotenberg should support the Flat OpenDocument format with extension fodt.

Current Behavior

Extension fodt isn't supported.

Possible Solution

Since LibreOffice supports it natively, maybe add it to https://github.com/thecodingmachine/gotenberg/blob/master/internal/app/api/office.go ?

Context

The Flat OpenDocument format is a nice alternative to the standard zipped OpenDocument format. You can edit it with any text editor, or with sed and other tools: it's nothing more than a normal OpenDocument file stored as one flat XML file.
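If the fix is indeed a whitelist entry, it could look like the Go sketch below. The map and isOfficeFile are hypothetical and only mimic the kind of extension check the issue points to; they are not taken from office.go.

```go
package main

import (
	"path/filepath"
	"strings"
)

// officeExtensions is a hypothetical whitelist in the spirit of the check the
// issue points to in office.go; it is not copied from Gotenberg's source.
var officeExtensions = map[string]bool{
	".odt":  true,
	".docx": true,
	".fodt": true, // proposed: Flat OpenDocument Text, one uncompressed XML file
}

// isOfficeFile reports whether name has a whitelisted extension,
// comparing case-insensitively.
func isOfficeFile(name string) bool {
	return officeExtensions[strings.ToLower(filepath.Ext(name))]
}
```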

Japanese characters are not supported


Expected Behavior

Japanese characters are shown

Current Behavior

Japanese characters appear as squares

Possible Solution

A previous answer suggested "updating the Dockerfile"; are there any specific hints?
I tried setting:

ENV LC_ALL=ja_JP.UTF-8
ENV LANG=ja_JP.UTF-8
ENV LANGUAGE=ja_JP.UTF-8

in the container but it did not work 🤔

Docker build failing

Expected Behavior

Running make image should build the Docker image.

Current Behavior

W: The repository 'https://packagecloud.io/Keymetrics/pm2/debian stretch Release' does not have a Release file.
E: Failed to fetch https://packagecloud.io/Keymetrics/pm2/debian/dists/stretch/main/source/Sources  404  Not Found
E: Some index files failed to download. They have been ignored, or old ones used instead.
The command '/bin/sh -c wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - &&    echo "deb http://dl.google.com/linux/chrome/deb/ stable main" | tee /etc/apt/sources.list.d/google-chrome.list &&    apt-get update &&    apt-get -y install google-chrome-stable' returned a non-zero code: 100
make: *** [image] Error 100

Steps to Reproduce (for bugs)

Check out the repository, then run make image.

Your Environment

Version used: Latest version - commit id: 2648992
Operating System and version: macOS 10.14.2

Issue building


Expected Behavior

Finishing all steps for building the image from the Dockerfile

Current Behavior

Step 6/11 : COPY .ci/gotenberg /usr/bin/gotenberg
COPY failed: stat /var/lib/docker/tmp/docker-builder832472548/.ci/gotenberg: no such file or directory

Context

I tried to run the following command in terminal:
docker build -t gotenberg .

Your Environment

Mac OS X Mojave 10.14.1

How can I build it with go build, without using the Docker image?

Add option to enable CORS

Calling the API from a web page is currently not possible if the page is hosted on a different domain than Gotenberg: the Access-Control-Allow-Origin header is missing, so the browser won't let you make the request.
Please consider adding an option to whitelist user-defined origins.
For example (Docker):
docker run -e ALLOW_ORIGINS='http://some-domain.com,https://another.domain.com:8080'

Removing outputs from pdfcpu

The current pdfcpu version logs directly to stdout.

However, the latest version seems to offer an option to disable those logs.

Text file support

@Fooriva

The API should handle text file (.txt and .rtf) conversions.

creating new browser context: cdp.Target: CreateBrowserContext: rpc error: Not allowed. (code = -32000)

Source file: temp.txt
Pinging the image host from the Docker container succeeds.

Assets accessible in header & footer?

It seems that files (images, stylesheet), passed in via the files argument, are correctly accessible in index.html but not in header.html or footer.html.

How can I embed files in the header or footer?

Expected Behavior

Files accessible in index.html should also be accessible in the header and footer.

Current Behavior

It currently doesn't seem to be able to read those files.

Your Environment

Version used: thecodingmachine/gotenberg:5 (Docker)

Possible to turn off logging for `/ping` ..?

Context: we run Gotenberg in a Kubernetes cluster as part of an overall deployment that includes other microservices sitting "in front" of it. Gotenberg runs fine, but when we pull logs they are almost completely unusable, because every /ping health check adds a log line to the output.

Obviously we can pull the logs and grep, but it is a bit annoying given that we don't need to know that a ping happened: we can get that information from Kubernetes events.

Merging Issues

Hello.
We are experiencing some problems merging files with /merge.

Result errors:
{"message":"dict=markupAnnot entry=IT: unsupported in version 1.5\nThis file could be PDF/A compliant but pdfcpu only supports versions \u003c= PDF V1.7\n"}

{"message":"Read: xRefTable failed: parse: duplicate key"}

{"message":"Read: xRefTable failed: Free: object #0 not found."}

Any ideas?

Thanks.

stupid... please remove me.

Antivirus feature ?

Hello,

First of all, let me thank you for your very nice project, it bundles everything needed to handle conversions as smoothly as possible without the hassle of doing it ourselves !

I started doing a PoC with gotenberg in my company to handle file conversions inside another app, but one cool feature IMHO would be to search uploaded files for viruses before conversion.

As of today, our app uses ClamAV to search for viruses before handing the file over to gotenberg for the next steps, but I thought it'd be nice to integrate it directly inside gotenberg, so it'd become a one stop shop for all the file conversion operations.

I'd be okay to try and do a PR with such a feature if you like the idea, what do you think ? :)
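For reference, a pre-conversion scan can talk to clamd directly. The Go sketch frames a file for clamd's INSTREAM command: the zINSTREAM command, then length-prefixed chunks, then a zero-length chunk. writeInstream and instreamBytes are hypothetical helpers; opening the connection (clamd commonly listens on TCP port 3310) and reading the reply are left out, and clamd's StreamMaxLength setting caps the size it accepts.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"io"
)

// writeInstream frames the contents of r for clamd's INSTREAM command:
// "zINSTREAM\x00", then chunks each prefixed by a 4-byte big-endian length,
// then a zero-length chunk that ends the stream. clamd replies with a
// NUL-terminated line such as "stream: OK".
func writeInstream(w io.Writer, r io.Reader, chunkSize int) error {
	if chunkSize <= 0 {
		chunkSize = 4096
	}
	if _, err := io.WriteString(w, "zINSTREAM\x00"); err != nil {
		return err
	}
	buf := make([]byte, chunkSize)
	var size [4]byte
	for {
		n, err := r.Read(buf)
		if n > 0 {
			binary.BigEndian.PutUint32(size[:], uint32(n))
			if _, werr := w.Write(size[:]); werr != nil {
				return werr
			}
			if _, werr := w.Write(buf[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
	}
	binary.BigEndian.PutUint32(size[:], 0) // zero-length terminator chunk
	_, err := w.Write(size[:])
	return err
}

// instreamBytes is a convenience wrapper for payloads already in memory.
func instreamBytes(data []byte, chunkSize int) ([]byte, error) {
	var out bytes.Buffer
	if err := writeInstream(&out, bytes.NewReader(data), chunkSize); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}
```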

environment variable to switch on/off engines

Would it be possible to insert an environment variable to turn on/off LibreOffice or Chrome support? Sometimes, I don't need Chrome, so I want to save resources.

Running gotenberg in GKE

Hi!

This looks amazing, we are looking forward to using this within our k8s cluster, however we're struggling to get the image up and running.

Our k8s file looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gotenberg
  labels:
    app: gotenberg
spec:
  selector:
    matchLabels:
      app: gotenberg
  template:
    metadata:
      labels:
        app: gotenberg
    spec:
      containers:
        - name: gotenberg
          image: thecodingmachine/gotenberg:4
          ports:
            - name: gotenberg
              containerPort: 3000
---
kind: Service
apiVersion: v1
metadata:
  name: gotenberg
spec:
  type: NodePort
  ports:
    - protocol: TCP
      name: gotenberg
      port: 3000
      targetPort: 3000

Output

The deployment has the following logs and fails to start

➜ kubectl logs gotenberg-74d485d685-m6frr
⇨ Gotenberg 4.2.0
⇨ Chrome headless started with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ Chrome headless restarted with PM2
⇨ warming-up Chrome headless
⇨ error: failed to launch Chrome headless

If we're doing something wrong it'd be great to see a larger demo/example in the docs for k8s.

Your Environment

➜  kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.7", GitCommit:"65ecaf0671341311ce6aea0edab46ee69f65d59e", GitTreeState:"clean", BuildDate:"2019-01-24T19:32:00Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.7-gke.4", GitCommit:"618716cbb236fb7ca9cabd822b5947e298ad09f7", GitTreeState:"clean", BuildDate:"2019-02-05T19:22:29Z", GoVersion:"go1.10.7b4", Compiler:"gc", Platform:"linux/amd64"}

Blank document generated

Expected Behavior

The document should match what's displayed in the browser as closely as possible.

Current Behavior

Generated document is completely blank (empty)

Possible Solution

Adding a possibility to specify a delay to let the page load completely before performing the actual conversion.

Even better would be a mechanism similar to Puppeteer's waitUntil option for page.goto (see their documentation).
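Once a delay option exists, a client would send it as an extra form field next to remoteURL. The Go sketch builds such a request; newURLRequest is a hypothetical helper, and the field name waitDelay (seconds to wait before printing) is an assumption based on later Gotenberg versions, so check the documentation for the version you run.

```go
package main

import (
	"bytes"
	"mime/multipart"
	"net/http"
)

// newURLRequest builds a multipart POST to /convert/url. Extra fields are
// passed through unchanged; "waitDelay" is an assumed field name.
func newURLRequest(baseURL, remoteURL string, fields map[string]string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("remoteURL", remoteURL); err != nil {
		return nil, err
	}
	for k, v := range fields {
		if err := w.WriteField(k, v); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/convert/url", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}
```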

Steps to Reproduce (for bugs)

Using the convert/url endpoint
Specify an url that corresponds to a page on which the html spans multiple physical pages in length
Send the request

Context

I'm trying to generate a PDF from an HTML page that would span close to 100 pages.

Your Environment

Version used: 4
Operating System and version: Windows 10, latest stable build

Is authentication planned ?

First, I want to thank you for this: it just works out of the box and it's well documented.

But I wanted to know whether you plan to add an authentication system to Gotenberg, so it could be exposed directly as a service, for example:

docker run --rm -p 3000:3000 -e JWT_KEY='myJWTKey' thecodingmachine/gotenberg:4

Feel free to close this issue if you think it's out of the scope of gotenberg and should be handled by a proxy for instance.
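If authentication stays out of scope, the proxy approach mentioned above can be small. A Go sketch of a reverse proxy that checks a shared bearer token before forwarding to Gotenberg; it is a static-token check, not JWT validation, and authorized, requireToken and newAuthProxy are hypothetical names.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// authorized reports whether header is "Bearer <secret>", comparing in
// constant time. An empty secret never authorizes anything.
// Static shared token only: this is NOT JWT validation.
func authorized(secret, header string) bool {
	const prefix = "Bearer "
	if secret == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	token := header[len(prefix):]
	return subtle.ConstantTimeCompare([]byte(token), []byte(secret)) == 1
}

// requireToken rejects unauthorized requests with 401 before they reach next.
func requireToken(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(secret, r.Header.Get("Authorization")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newAuthProxy forwards authorized requests to an internal Gotenberg address.
func newAuthProxy(upstream, secret string) (http.Handler, error) {
	u, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	return requireToken(secret, httputil.NewSingleHostReverseProxy(u)), nil
}
```

Serving it is then a matter of http.ListenAndServe(":8080", h) on the public side, with Gotenberg itself reachable only from the proxy.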

Unreadable PDF file

Hello,

So I'm not sure this is actually a bug, but I can't get the API working from Java (Groovy/Grails, to be precise). I use the Java HttpClient and MultipartEntity to create the HTTP request, containing the remoteURL of the page I want to turn into a PDF. The API does give me what seems to be a PDF file, but it can be opened neither by a local PDF viewer nor by the browser: both say the file seems to be damaged or in an unsupported format.
Here is the code I use :

import org.apache.http.HttpEntity
import org.apache.http.client.methods.HttpPost
import org.apache.http.entity.mime.MultipartEntityBuilder
import org.apache.http.impl.client.HttpClients

static def execute(def apiURL) {
    def httpClient = HttpClients.createDefault()
    def request = new HttpPost(apiURL)

    // Form fields for the URL conversion
    MultipartEntityBuilder builder = MultipartEntityBuilder.create()
    builder.addTextBody("remoteURL", 'https://google.com')
    builder.addTextBody("marginTop", '0')
    builder.addTextBody("marginBottom", '0')
    builder.addTextBody("marginLeft", '0')
    builder.addTextBody("marginRight", '0')

    HttpEntity multipart = builder.build()
    request.setEntity(multipart)
    def response = httpClient.execute(request)

    // Reads the response body line by line as text
    BufferedReader rd = new BufferedReader(
            new InputStreamReader(response.getEntity().getContent()))

    StringBuffer result = new StringBuffer()
    String line = ""

    while ((line = rd.readLine()) != null) {
        result.append(line)
    }
    return result
}

I do get a StringBuffer starting with
"%PDF-1.4%��1 0 obj<</Creator (Chromium)/Producer (Skia/PDF m74)/CreationDate (D:20190510164315+00'00')..."
that's why I say it looks like a PDF file.

I am sure the API works fine, since when I do a curl request, from a terminal or from my Java project, I get a proper PDF file. I can't use the curl request in my code, though, because it downloads the file into my project instead of giving it to me as a variable I can easily use and convert to base64.
I am coming here for help since I can't figure out whether the problem is in my request or in the API, so if anyone has an idea, that would be great.

Would you consider adding an enhancement to allow output to formats other than PDF?

Or should we just look to a project like this: https://github.com/zrrrzzt/docker-unoconv-webservice

Dockerfile with latest unoconv version

The unoconv version that installs from apt-get is 0.7, and the latest version is 0.8.2. The output of apt-cache policy unoconv is:

unoconv:
  Installed: 0.7-1.1
  Candidate: 0.7-1.1
  Version table:
 *** 0.7-1.1 500
        500 http://httpredir.debian.org/debian stretch/main amd64 Packages
        100 /var/lib/dpkg/status

So I guess I would need to install it manually to get the latest version. I've already asked whether the latest version can be published (unoconv/unoconv#482). If that is not possible, I was wondering whether this repo could support a Dockerfile that builds the latest unoconv manually with all its dependencies.

pictures too?

is it possible to convert/wrap single/multiple pictures (png|jpg|etc) too?

cheers
manu

Root should return 404 not 500

Expected Behavior

When I query /, gotenberg prints

{"message":"code=404, message=Not Found"}

With an HTTP status code of 404.

Current Behavior

When I query /, gotenberg prints

{"message":"code=404, message=Not Found"}

With an HTTP status code of 500.

Possible Solution

I imagine this being a one-liner change. I'd be happy to open a PR.

Steps to Reproduce (for bugs)

Query /.

Context

This pollutes logs with 5xx errors when there's actually nothing wrong.

Your Environment

Version used: 4.4.0
Operating System and version: Any
Link to your project: N/A

documentation for 2.0.0

#3
#5 & #6
#7
#8

Header and footer display very small text

Expected Behavior

With a header defined as:

<html>
    <head>
    </head>
    <body>
        <span>Ok</span>
    </body>
</html>

Header and footer should display text with the same size as the body.

Current Behavior

The text is much smaller.

Possible Solution

I think I remember this was a common problem with wkhtmltopdf or some other engine, but I can't seem to find the reference.

Your Environment

Version used: thecodingmachine/gotenberg:5 (Docker)

Add a parameter specifying the resulting pdf filename

Expected Behavior

Hi guys! Would it be possible to add a parameter specifying the resulting pdf filename after converting from html/url?

Current Behavior

The response headers always include a random filename, e.g. content-disposition: attachment; filename="73952a7767eb9af0c94c05e525c7f327.pdf"

Possible Solution

I'd suggest adding a new parameter in addition to the existing ones (remoteURL, webhookURL, paperWidth, paperHeight, etc), e.g. "fileName", so that users can define custom value.

Context

In the real world, it would allow a consumer of the Gotenberg API, e.g. an HTML page, to perform a POST request using an HTML form and expect the Save File dialog to appear with the correct file name populated, instead of some random name.

Request for Healthcheck endpoint

Hi there,

Is there any chance of getting a health check endpoint added to Gotenberg, so that it is usable behind an ALB?

Many thanks!

How do I set the margins when I convert excel

Expected Behavior

Convert Xlsx file to a PDF file

Current Behavior

I want to control the margins of the PDF output

Your Environment

Version used: 5
Operating System and version: macos mojave 10.14.1
Link to your project:

Unsure about security

Hello,
Nice work and documentation.

Expected situation imho

Checking the Dockerfile and this repository code should be enough to have a pretty good idea whether this is secured or infected. (e.g. data leakage)

Current situation

In your Dockerfile:

FROM thecodingmachine/gotenberg:3.2.0 AS hack

I am guessing that you reuse a previous build to shorten your Dockerfile but... this also hides what your previous build is based on and what is inside it, doesn't it?

Unless one investigates this particular container and looks for issues, which is obviously harder than checking whether a Dockerfile contains (more obvious) security issues.

Possible solution

Would it be possible to have a longer Dockerfile without any hack? Some say explicit is better than implicit 😉

Otherwise I guess I could try to reuse your Dockerfile from tag 3.2.0 and add it at the top of your latest one.

I am open to hear your point of view, I may be missing something.

5.0.0 roadmap

See #66

When paged, the data display is incomplete


Expected Behavior

Converting HTML to a PDF file should display every page in full.

Current Behavior

Occasionally, content that falls across a page break is incomplete: half of it is cut off.
(attached image: ill pdf file)

Possible Solution

Steps to Reproduce (for bugs)

1.create a table HTML, more than one page
2.convert HTML to pdf

Context

Your Environment

Version used:1.11.2
Operating System and version:ubuntu 16.10
Link to your project:

Deployment tips in docs

Hey guys, thanks a lot for this amazing project, you are doing an amazing job with it.

So far I am using it to convert PPTX files to PDF, but my deployed version seems to be much slower than the local one. I am running Gotenberg on k8s, and one sample PPTX file is converted to PDF in 25s on my deployment versus 4s on my local machine. Doubling the resources on k8s brought the conversion down to 13s, half the previous time, but that means giving it more and more resources, it is still not as fast as my local machine, and horizontal scaling becomes more expensive.

Since you seem to have been running this for a while, you probably have some best practices for deploying it and for the optimal resource requirements. It would be great if you could include your tips in the documentation: for example, whether the service can run faster by disabling something or excluding some binary for certain use cases, or which resource requirements hit the sweet spot between performance and scalability. If this is out of scope for you, I can try to write some docs in a PR, but it would be much easier to draw on your experience than to inspect every moving piece myself.

[BUG] HTML-to-PDF sometimes lose some parts of HTML

Expected Behavior

PDF generated from HTML contain all elements of HTML.

Current Behavior

Sometimes (1 in 10 requests on average) parts of the HTML are missing from the resulting PDF. The biggest problem is that the response status code is still 200.

Possible Solution

Steps to Reproduce (for bugs)

Create index.html with content https://gist.github.com/server-may-cry/57b062186f648a16f6e4c5c7ac354e38
curl --request POST --url http://gotenberg:3000/convert/html --header 'Content-Type: multipart/form-data' --form files=@index.html -o result.pdf
Check the file size
Repeat 10-20 times, checking the file size each time

Context

The container logs show no difference between failed and successful requests:

ok
gotenberg_1                | {"time":"2019-01-28T17:56:49.563200149Z","id":"","remote_ip":"172.18.0.16","host":"gotenberg:3000","method":"POST","uri":"/convert/html","user_agent":"curl/7.52.1","status":200,"error":"","latency":423976853,"latency_human":"423.976853ms","bytes_in":8501,"bytes_out":28024}

failed
gotenberg_1                | {"time":"2019-01-28T18:02:20.799825196Z","id":"","remote_ip":"172.18.0.16","host":"gotenberg:3000","method":"POST","uri":"/convert/html","user_agent":"curl/7.52.1","status":200,"error":"","latency":566488490,"latency_human":"566.48849ms","bytes_in":8501,"bytes_out":28024}

Your Environment

Version used: tested on 3.2 and 4 (on version 4 it happens very rarely)
Operating System and version: docker image
Link to your project:

Use of custom PKI

We are using gotenberg internally against https servers with certificates generated with an internal PKI.

As I usually do for other images, I made a new dockerimage based on thecodingmachine/gotenberg:5

FROM thecodingmachine/gotenberg:5

ADD http://blahblahblah/internal-pki.crt /usr/local/share/ca-certificates/internal-pki.crt
RUN set -xeu \
  && update-ca-certificates

The certificate is properly deployed, rehashed, symlinked and added to ca-certificates.crt and it works with curl, wget and so on.
This is usually enough for the certificate to be trusted across our whole ecosystem, including servers and Chrome etc. on workstations.

Here, it doesn't work. Any clues?

More generally, it seems that when there is a certificate issue (self-signed, unrecognized authority, name mismatch, etc.) we only get a blank page, with no further information in the logs :/
Please note that it works like a charm using a commonly accepted Root CA.

Gotenberg vs wkhtmltopdf

It would be interesting to highlight in the README.md the differences between Gotenberg and wkhtmltopdf for converting to PDF (aside from obvious things like "it's a server, whereas wkhtmltopdf is a binary").

I am curious to know whether you encountered some limitation with wkhtmltopdf that led to the creation of Gotenberg.

go mod incompatibility

Trying to use this module in another Go-modules-enabled repo of ours causes issues with some of the unused dependencies (particularly delve). Running go mod tidy fixes these issues.

Expected Behavior

GO111MODULE=on go get -u github.com/thecodingmachine/[email protected]
Should allow us to use the go module.

Current Behavior

The gotenberg module is marked as incompatible.

Possible Solution

run go mod tidy

Context

We are unable to use the go module for talking to the gotenberg service in our k8s cluster.

I have a commit ready on a fork. Wanted to raise an issue first, happy to raise PR if accepted.

Add final documents to tests

To get a better impression of the final images, could you please add the generated documents in the tests to the repo?

I have read the docs but seeing final results could give a better first impression.

For example I'd like to see:

header and footer styles
a complex microsoft word document converted to pdf

What are the limits to the conversion process, e.g. when rendering footnotes, sections, ...?

Mail merge feature

Great repo!

Is there a mailmerge feature on the roadmap?

I'm thinking of merging a CSV file (addresses) and a .docx file (letter template) into a single PDF, e.g. for generating multiple letters from a template in one step.

Maybe this is a little step for Gotenberg (LibreOffice has this feature built in) but a big step for generating high-quality letters, catalogs, ... in a batch process.

Small Typo in Readme

In the Security Section of your Readme there is a Typo:

The API does not provide any authentication mechanisms. Make sure to not put it on a public facing port and your client(s) should always controls what's is sent to the API.

PrintToPDF: rpcc: message too large

Expected Behavior

Attaching small images should include them in the PDF.

Current Behavior

Attaching a 32KB image yields:

{"time":"2019-05-03T09:15:36.0025674Z","id":"","remote_ip":"192.168.240.7","host":"documents:3000","method":"POST","uri":"/convert/html","user_agent":"","status":500,"error":"code=500, message=printing page to PDF: cdp.Page: PrintToPDF: rpcc: message too large (increase write buffer size or enable compression)","latency":93544042,"latency_human":"93.544042ms","bytes_in":39667,"bytes_out":133}

Steps to Reproduce (for bugs)

Start Gotenberg 5.0.1 using the official docker image
Send a cURL request with an image of 32KB+

Context

I'm trying to add an image to the PDF.

Your Environment

Version used: 5.0.1
Operating System and version: macOS 10.13.6 on the host

Running on CentOS, I got the message: {"message":"code=404, message=Not Found"}

I ran Docker on CentOS. Unfortunately, I got {"message":"code=404, message=Not Found"} in the browser, and the console logged:
{"time":"2019-01-06T10:04:36.297609642Z","id":"","remote_ip":"61.151.178.217","host":"111.231.196.219:3000","method":"GET","uri":"/","user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2141.400 QQBrowser/9.5.10219.400","status":500,"error":"code=500, message=code=404, message=Not Found","latency":36096,"latency_human":"36.096µs","bytes_in":0,"bytes_out":41}
Am I doing something wrong?
Can you tell me the right way?

Kitematic: manifest not found

Expected Behavior

I'd like to download and run Gotenberg in https://kitematic.com/ on Windows.

Current Behavior

(HTTP code 404) unexpected - manifest for thecodingmachine/gotenberg:latest not found

Steps to Reproduce (for bugs)

Install Docker Toolbox on Windows with Kitematic
Click "New"
Search for "Gotenberg"
Click "Create"

Context

Kitematic is a GUI to easily run Docker images on Windows (in connection with Docker Toolbox, it uses VirtualBox for virtualizing the base OS)

Your Environment

Windows 7 Pro 64bit, Windows 10 Pro 64bit (both latest)
Kitematic & Docker Toolbox (latest)

Performance question

Not really an issue. I followed your instructions and scaled Gotenberg up to 5 and then 10 containers (on Windows Hyper-V). Calling the service from a Java application resulted in exactly the same performance (one document converted to PDF every second). This is a good result, but I'm confused as to why the performance stays (exactly!) the same. I'm using Pentaho Kettle, Java and Apache's HttpClient to make the HTTP POSTs. I don't expect you to solve my problem... but any ideas on how to gain some insight into this would be appreciated!