go-fitz's People

Contributors

charlesoconor, edwardshen125, gen2brain, hudon, mbranch, mzimmerman, pukoren, rntrp, singhsays, xaduha

go-fitz's Issues

error: no builtin cmap file: Adobe-GB1-UCS2

Hello,
When converting a PDF to JPG, the console reports:

error: no builtin cmap file: Adobe-GB1-UCS2
warning: unrecoverable error; ignoring rest of page

Is the character encoding missing?
Can I set a default encoding when one such as Adobe-GB1-UCS2 is not supported?
Or how can I fix it?

using this package as a dependency doesn't bring in the include files

I receive this when attempting to build with this package. This was after switching to Go 1.18. I have CGO_ENABLED=1, which is required to make arm64 work (M1).

# github.com/gen2brain/go-fitz
vendor/github.com/gen2brain/go-fitz/fitz.go:6:10: fatal error: 'mupdf/fitz.h' file not found
#include <mupdf/fitz.h>
         ^~~~~~~~~~~~~~
1 error generated.
make: *** [build] Error 2

Extract and set metadata?

Hello,

Any plans for implementing more of the API such as for getting and setting metadata? My main need is to extract the table of contents and metadata fields such as title, author, and keywords. I'm moving from Python (PyMuPDF) to Go and this seems like the most developed library. I haven't tried the C bindings but it seems like it might be all there.

Compiling and running on windows and go-fitz crashes on file open

Hi,
I'm running go-fitz on Windows. It compiles fine and runs fine until it tries to open a PDF. The line in question is:

	fmt.Println("and.... getting ready to read the pdf " + filepath.Join(origDirPath, subject) + ".pdf")
	doc, err := fitz.New(filepath.Join(origDirPath, subject) + ".pdf")
	if err != nil {
		fmt.Println("error: " + err.Error())
	}

The println before fitz.New prints:

getting ready to read the directory C:\Users\xxxxx\Desktop\pdfs\ML-xxxx_REV1A.pdf

which is a valid path to a PDF.

I get the error however:

[image: screenshot of the stack trace]

A third of the way down in the stack trace you can see interface.go:157; that's the fitz.New line above.

Any suggestions as to what may have caused this would be very useful, thanks!

EDIT:

package main

import (
	"fmt"
	fitz "github.com/gen2brain/go-fitz"
)

func main() {
	doc, err := fitz.New("C:\\Users\\xxxx\\Desktop\\pdfs\\ML-MT-PH-xxxx.pdf")
	if err != nil {
		fmt.Println("error new " + err.Error())
		return // doc is not usable when fitz.New fails
	}

	defer doc.Close()
}

Returns with:

Segmentation fault

EDIT: Managed to get it to spit out exit status 3221225477 which I think is a buffer overflow....

Note this is Windows 64 (Virtual Machine)
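
A note on the exit status above: Windows exit codes of this size are usually easier to interpret in hexadecimal. 3221225477 is 0xC0000005, the NTSTATUS value for an access violation (STATUS_ACCESS_VIOLATION), which matches the segmentation fault reported earlier rather than pointing specifically at a buffer overflow. The helper name `ntstatusHex` below is hypothetical and only illustrates the conversion.

```go
package main

import "fmt"

// ntstatusHex formats a Windows process exit code as the hexadecimal
// NTSTATUS value it usually represents. The name is hypothetical.
func ntstatusHex(code uint32) string {
	return fmt.Sprintf("0x%08X", code)
}

func main() {
	// The exit status reported in this issue.
	fmt.Println(ntstatusHex(3221225477)) // prints 0xC0000005
}
```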

undefined reference to `__imp__ftelli64'

I tried to test it on Windows, but got:

# github.com/gen2brain/go-fitz
C:/Users/hasan/Documents/GoWorkPlace/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_windows_amd64.a(output.o):output.c:(.text$file_tell+0xe): undefined reference to `__imp__ftelli64'
C:/Users/hasan/Documents/GoWorkPlace/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_windows_amd64.a(output.o):output.c:(.text$file_seek+0x14): undefined reference to `__imp__fseeki64'
C:/Users/hasan/Documents/GoWorkPlace/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_windows_amd64.a(stream-open.o):stream-open.c:(.text$seek_file+0x1d): undefined reference to `__imp__fseeki64'
C:/Users/hasan/Documents/GoWorkPlace/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_windows_amd64.a(stream-open.o):stream-open.c:(.text$seek_file+0x2e): undefined reference to `__imp__ftelli64'
collect2.exe: error: ld returned 1 exit status

Detect corrupt/broken files

MuPDF uses a setjmp-based exception handling system, i.e. fz_try/fz_always/fz_catch. To detect whether a file is broken, this needs to be handled somehow on the Go side.

Modify the image size

I am converting a 4x6 inch PDF to a JPG image using the PDF-to-image code from the README. The image is generated without problems, but its size is 1200 x 1800 pixels. I didn't find any options related to the output size. Can you please help with this?
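
For reference, the numbers in this report are consistent with rendering at 300 DPI: 4 in × 300 = 1200 px and 6 in × 300 = 1800 px. The sketch below only shows that arithmetic. The helper names `pixelsAt` and `dpiFor` are hypothetical, and whether the installed go-fitz version exposes a render call that accepts a DPI value is an assumption to check against the package documentation.

```go
package main

import (
	"fmt"
	"math"
)

// pixelsAt returns the pixel length of a page side of the given size in
// inches when rendered at the given resolution in dots per inch.
func pixelsAt(inches, dpi float64) int {
	return int(math.Round(inches * dpi))
}

// dpiFor returns the resolution needed to render a page side of the given
// size in inches to the requested number of pixels.
func dpiFor(inches float64, pixels int) float64 {
	return float64(pixels) / inches
}

func main() {
	// A 4x6 inch page at 300 DPI gives the size reported in the issue.
	fmt.Println(pixelsAt(4, 300), pixelsAt(6, 300)) // prints 1200 1800

	// Resolution needed for a 600 pixel wide rendering of the same page.
	fmt.Println(dpiFor(4, 600)) // prints 150
}
```

Alternatively, the rendered image could be scaled down after the fact, at the cost of rendering more pixels than needed.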

"link failed"

We've been encountering some build issues on Travis since the recent commits to go-fitz:

https://travis-ci.com/RTradeLtd/Lens/builds/99836987 with -tags nopie enabled

$ go vet ./...
# github.com/gen2brain/go-fitz
/usr/bin/ld: ../../gen2brain/go-fitz/libs/libmupdf_linux_amd64.a(colorspace.o): unrecognized relocation (0x2a) in section `.text.fz_init_cached_color_converter'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
# github.com/RTradeLtd/Lens/vendor/github.com/otiai10/gosseract
tessbridge.cpp: In function ‘int Init(TessBaseAPI, char*, char*, char*, char*)’:
tessbridge.cpp:46:36: warning: ignoring return value of ‘FILE* freopen(const char*, const char*, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
   freopen("/dev/null", "a", stderr);
                                    ^
tessbridge.cpp:60:36: warning: ignoring return value of ‘FILE* freopen(const char*, const char*, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]
   freopen("/dev/null", "a", stderr);
                                    ^
The command "go vet ./..." failed and exited with 2 during .

without tags: https://travis-ci.com/RTradeLtd/Lens/builds/99567591

11.54s$ go get -u github.com/gen2brain/go-fitz
# github.com/gen2brain/go-fitz
/usr/bin/ld: ../../gen2brain/go-fitz/libs/libmupdf_linux_amd64.a(colorspace.o): unrecognized relocation (0x2a) in section `.text.fz_init_cached_color_converter'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
The command "go get -u github.com/gen2brain/go-fitz" failed and exited with 2 during .

Any help would be appreciated. Thanks!

error: cannot recognize xref format

The PDF is converted into images normally, but error messages are printed. How can they be turned off?

error: cannot recognize xref format
warning: trying to repair broken xref
warning: repairing PDF document

go get -u fails with extlib

Can you change the following line in fitz_cgo_extlib.go:
#cgo LDFLAGS: -lmupdf
to
#cgo LDFLAGS: -lmupdf -lm -lmupdfthird

The build fails without mupdfthird.
Thank you very much.

Unable to build deployment package - go-fitz build constraints

In one of my projects, we are using go-fitz (v1.18.0) for pdf generation. However, recently the build package generation fails when we run the below command -
GOOS=linux GOARCH=amd64 go build -o main

The error that I get is -
go build github.com/gen2brain/go-fitz: build constraints exclude all Go files in /<path>/gen2brain/[email protected]

This used to work before. Has something changed recently in the past few months?

I'm raising the issue here as I couldn't find a solution. Please take a look.

How to cross-compile from Mac to Linux

Hi,
My OS is macOS, and I need to compile a binary to run on CentOS 7. I use the command GOOS=linux GOARCH=amd64 go build, and I get this error:
go build github.com/gen2brain/go-fitz: build constraints exclude all Go files in /Users/allen/works/go/src/github.com/gen2brain/go-fitz.

I look forward to your reply.

# github.com/gen2brain/go-fitz | /go/pkg/mod/github.com/gen2brain/[email protected]/fitz.go:8:10: fatal error: mupdf/fitz.h: No such file or directory

I am new to Go, and I need to generate a thumbnail from a PDF when I upload it. I am using the go-fitz library with Docker.

But I am having trouble building a Docker image including ImageMagick.

Docker setup:


FROM golang:alpine AS build
RUN apk --no-cache add gcc g++ make git

WORKDIR /go/src/app

COPY go.mod .
COPY go.sum .
RUN go mod download
COPY . .

RUN GOOS=linux go build -tags extlib -ldflags="-s -w" -o ./bin/web-app ./main.go

FROM alpine:3.13
RUN apk --no-cache add ca-certificates
WORKDIR /usr/bin
COPY --from=build /go/src/app/bin /go/bin
EXPOSE 2053
ENTRYPOINT /go/bin/web-app --port 2053


# github.com/gen2brain/go-fitz
/go/pkg/mod/github.com/gen2brain/[email protected]/fitz.go:8:10: fatal error: mupdf/fitz.h: No such file or directory
    8 | #include <mupdf/fitz.h>
      |          ^~~~~~~~~~~~~~
compilation terminated.


Thanks

There is something wrong when i build my app~

Hi, I ran into a problem when building my Go program. Here are the details:

github.com/gen2brain/[email protected]/libs/libmupdfthird_linux_amd64.a(one.o): relocation R_X86_64_PC32 against symbol `stdout@@GLIBC_2.2.5' can not be used when making a PDE object; recompile with -fPIE

I am confused about it. I would appreciate any solution. Thanks!

Go-fitz is incompatible with go-iup

Finally, I figured out the problem I initially had with go-iup! If go-fitz is imported, too, then running the executable results in the dreaded Pango error:

Pango-ERROR **: 17:38:29.868: Harfbuzz version too old (1.3.2)

or a segmentation violation. Since you're the maintainer of both packages, it might be worth looking at a way to fix this problem.

To repeat the problem:

package main

import (
	"github.com/gen2brain/iup-go/iup"
	// _ "github.com/gen2brain/go-fitz"
)

func main() {
	iup.Open()
	defer iup.Close()


	lbl := iup.Label("This is a test label.")

	dlg := iup.Dialog(
		lbl,
		)
	dlg.SetAttribute("TITLE", "Label")

	iup.Show(dlg)
	iup.MainLoop()
}

It works. Uncomment the line importing go-fitz, and the recompiled executable crashes. (My executable shows the Pango error instead of crashing when go-fitz is imported but I wasn't able to repeat that in the minimal example.) I'm putting the issue here although I can't say which package is the real culprit. In any case, they seem to require incompatible Pango versions.

Tested on Linux Mint 20.3, which is based on Ubuntu 20.04.

Occasional errors for NumPage() under high concurrency

Hi there. I'm using go-fitz within a microservice application with a potentially high number of concurrent network requests. I am aware of the concurrency issues one might have with fitz, e.g. documented in #4. However, I am creating a new fitz.Document for each uploaded document, so it seems to be fine that way.

While load testing the application with k6, I started to observe cryptic error messages from fitz when there are many concurrent user requests for NumPage(). Sometimes these messages accumulated quickly and my application crashed. Just to show you some of these error messages:

error: expected object number
warning: repairing PDF document
warning: object missing 'endobj' token
error: cannot find object in xref (28 0 R)
warning: cannot load object (28 0 R) into cache
error: invalid key in dict
warning: ignoring broken object (21 0 R)
error: cannot recognize version marker
warning: trying to repair broken xref
warning: repairing PDF document
error: cannot recognize version marker

It appears that something is messing with our input data, but only under heavy load. After some research I found an issue for PyMuPDF akin to the subject and also this one particular comment:

It seems the problem can be solved, if I prevent Python freeing the area via its garbage collection, as long as the Python object fitz.Document lives (i.e. is not closed or deleted). This can be done by recording a reference to the bytes / bytearray object in the Document object.

Originally posted by @JorjMcKie in pymupdf/PyMuPDF#173 (comment)

fitz.Document uses *C.fz_stream, which is just a memory range over a Go byte slice. So potentially Go may also GC the original byte slice too early, i.e. before NumPage() has returned. fitz then tries to read this memory area and understandably fails or even segfaults. If we add the original byte slice to the fitz.Document struct, it should prevent this from happening.
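
A minimal sketch of the idea in the previous paragraph, written in pure Go so it can run without cgo. The type `pinnedDoc` and its methods are hypothetical stand-ins, not the go-fitz API: the point is only that a slice stored in the long-lived struct stays reachable for as long as the struct does, and that `runtime.KeepAlive` keeps the struct itself reachable until a call has finished. In a real cgo binding, copying the bytes into C-allocated memory and freeing them on close may be the safer route, since C code is not supposed to hold on to Go pointers after a call returns.

```go
package main

import (
	"fmt"
	"runtime"
)

// pinnedDoc is a hypothetical stand-in for a document wrapper. It keeps a
// reference to the input bytes so the garbage collector cannot reclaim them
// while the document is still open.
type pinnedDoc struct {
	data []byte
}

// openPinned stores the slice in the struct instead of dropping it after
// handing a pointer to native code.
func openPinned(b []byte) *pinnedDoc {
	return &pinnedDoc{data: b}
}

// size stands in for a native call that reads the underlying memory.
// KeepAlive marks d as reachable until the read has completed.
func (d *pinnedDoc) size() int {
	n := len(d.data)
	runtime.KeepAlive(d)
	return n
}

// Close drops the reference so the bytes can be collected.
func (d *pinnedDoc) Close() {
	d.data = nil
}

func main() {
	doc := openPinned([]byte("%PDF-1.7 example bytes"))
	defer doc.Close()

	runtime.GC() // the slice survives because doc still references it
	fmt.Println(doc.size())
}
```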

I will provide a minimal application and a test PDF file in the comment below.

Using lib inside Docker alpine

The lib is working fine locally on my golang setup, however when I try to pack it inside a docker container based on Alpine, I got some error logs.

I would like to try to use Mupdf as external lib, I see in the Readme you have some flags to use external lib but for some reason it seems to not work on my machine:

# github.com/gen2brain/go-fitz
/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdfthird_linux_amd64.a(one.o): In function `fmtdate':
one.c:(.text.fmtdate+0x76): undefined reference to `__sprintf_chk'

I build my project using

go build -tags extlib

I'm not used to golang build tags so I don't know if I use it correctly, any kind of help would be appreciated.

Reading image

Hi, this is not an issue in the package itself, but something you may be able to help with.

When reading PDF files, I sometimes get a file that is a scan rather than a clean PDF. In that case its content is an image, and I cannot read it with this package:

  1. Is there a way to get notified that this PDF is unreadable, to avoid a panic?
  2. Is there a way to read the file even if it is a scan?

My thought is that I could use another package to read the image, such as Tesseract, if reading with go-fitz fails.
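
One way to decide when to fall back to OCR, sketched under the assumption that text extraction returns an empty or near-empty string for scanned pages. The function `needsOCR` and its threshold are hypothetical; the page strings stand in for whatever the text extraction call returns.

```go
package main

import (
	"fmt"
	"unicode"
)

// needsOCR reports whether the extracted text of a page contains fewer than
// minChars letters or digits, which suggests the page is an image scan.
// minChars is expected to be positive.
func needsOCR(text string, minChars int) bool {
	n := 0
	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			n++
			if n >= minChars {
				return false
			}
		}
	}
	return true
}

func main() {
	// Stand-ins for text extracted from three pages.
	pages := []string{"", "  \n\f ", "Invoice 2022: total due"}
	for i, p := range pages {
		fmt.Printf("page %d needs OCR: %v\n", i, needsOCR(p, 5))
	}
}
```

Pages flagged this way could then be rendered to an image and passed to an OCR package.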

Thanks

aborting at reloc.c line 443 in bfd_get_reloc_size

# github.com/gen2brain/go-fitz

/usr/bin/ld: BFD version 2.20.51.0.2-5.44.el6 20100205 internal error, aborting at reloc.c line 443 in bfd_get_reloc_size
/usr/bin/ld: Please report this bug.

This is my system information:
Linux version 2.6.32-754.35.1.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-23) (GCC) ) #1 SMP Sat Nov 7 12:42:14 UTC 2020

fatal error: mupdf/fitz.h: No such file or directory

go version go1.17.5 linux/amd64

go mod vendor
go run main.go

# github.com/gen2brain/go-fitz

../../vendor/github.com/gen2brain/go-fitz/fitz.go:6:24: fatal error: mupdf/fitz.h: No such file or directory
#include <mupdf/fitz.h>

vendor/github.com/gen2brain/go-fitz/fitz.go:9:10: fatal error: 'mupdf/fitz.h' file not found

Hello there,

Maybe this case has already been handled, but I can't find a way around it. I am trying to compile a basic example using this library, but nothing seems to work. This is the command I use:

go build -tags "extlib" -o bin/sample main.go

Of course I installed mupdf first using Homebrew (I am on macOS), but the build does not work. I also tried adding environment variables such as LIBRARY_PATH or LDFLAGS, with no result. The basic example I am trying to compile is the one on the project page. From what I have understood, there could be a way to build using just the bundled library.
Any leads that can point me in the right direction?
Thanks for the help.

unknown file format: issue for some epub files

Thank you very much for your nice work!

It works for us for PDF files but throws the following error for some EPUB files:

error: FT_New_Memory_Face((null)): unknown file format
error: aborting process from uncaught error!

The document loading step seems to work fine, but any operation after that (doc.NumPage(), doc.ImagePNG()) throws.

Here is a file to recreate the issue. FYI the MuPDF desktop app works fine for this exact file.
https://drive.google.com/file/d/1Fu3wZ4iablY-35c9MqzIDRqBN5SzrxOe/view?usp=sharing

Some context about the book: language: Japanese layout: RTL writing direction: vertical

Would you please give us some pointers to deal with this?

My service crashes with the new version of go-fitz

After upgrading to version v1.18.0, my program crashes with an internal go-fitz error:

warning: cannot load object (19 0 R) into cache
error: aborting process from uncaught error!

This did not happen in version v0.0.0-20210316172528-f0a07eb93909.

Colorspace support

Please consider adding colorspace support.
I would need gray/mono conversion.
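
Until something like this exists in the library, a rendered page can be converted after the fact with the standard library alone. The sketch below assumes the rendered page is available as an `image.Image`. The names `toGray` and `sample` are hypothetical, and `sample` only builds a tiny test image in place of a rendered page.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/draw"
)

// toGray converts any image to 8-bit grayscale using the standard library's
// color model conversion.
func toGray(src image.Image) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	draw.Draw(dst, b, src, b.Min, draw.Src)
	return dst
}

// sample builds a 2x1 RGBA image with one white and one black pixel.
func sample() *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, 2, 1))
	img.Set(0, 0, color.White)
	img.Set(1, 0, color.Black)
	return img
}

func main() {
	g := toGray(sample())
	fmt.Println(g.GrayAt(0, 0).Y, g.GrayAt(1, 0).Y) // prints 255 0
}
```

This converts after rendering, so it does not save any work inside MuPDF. A true mono (1-bit) output would need an additional thresholding step.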

go get -u fails - recompile with -fPIC

I would like to use this library, however go get -u is failing and I'm unsure how to fix it. I'm experienced with Go, yet I haven't done much with shared libraries and cgo. A sample of errors is below. I assume I just need to recompile them with the flag as suggested, however, I'm not sure where they came from.

/usr/bin/ld: /src/github.com/gen2brain/go-fitz/libs/libmupdfthird_linux_amd64.a(tgt.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /src/github.com/gen2brain/go-fitz/libs/libmupdfthird_linux_amd64.a(inffast.o): relocation R_X86_64_32S against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /src/github.com/gen2brain/go-fitz/libs/libmupdfthird_linux_amd64.a(cmsalpha.o): relocation R_X86_64_32S against `.rodata.FormattersAlpha.7892' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /src/github.com/gen2brain/go-fitz/libs/libmupdfthird_linux_amd64.a(cmscnvrt.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /src/github.com/gen2brain/go-fitz/libs/libmupdfthird_linux_amd64.a(cmsgamma.o): relocation R_X86_64_32 against `.data.DefaultCurves' can not be used when making a shared object; recompile with -fPIC

Memory leak when rendering pages

The following code is modified from the README, simplified to the greatest extent possible.

package main

import "github.com/gen2brain/go-fitz"

func main() {
	doc, err := fitz.New("test.pdf")
	if err != nil {
		panic(err)
	}

	defer doc.Close()

	// Extract pages as images
	for n := 0; n < doc.NumPage(); n++ {
		_, err := doc.Image(n) // <- Memory leak here
		if err != nil {
			panic(err)
		}
	}
}

The above code produced a memory leak at the call to doc.Image. When executed on a longer PDF where rendering each slide results in an image ranging from 1 to 5 MB (averaging 1.9 MB in size), the program easily uses an entire gigabyte of RAM before 30 seconds have passed.

A similar pattern can be observed when calling doc.SVG and doc.HTML, although both are generally speaking lighter on the RAM usage. I haven't observed this leak when calling doc.Text, but this may just as well be because the operation occurs much faster due to its simplicity.

Dockerfile with alpine - undefined reference to `__fprintf_chk`

Hello,

I'm running the following Dockerfile on macOS:

FROM golang:alpine
WORKDIR /app
RUN apk add --no-cache git gcc musl-dev
COPY go.mod go.sum ./
COPY example.go test.pdf ./
RUN go mod download
RUN go build -tags musl -o /example
CMD [ "/example" ]

Which results in the following outcome:

Step 1/8 : FROM golang:alpine
 ---> f5ae5d299f4c
Step 2/8 : WORKDIR /app
 ---> Using cache
 ---> faab50477730
Step 3/8 : RUN apk add --no-cache git gcc musl-dev
 ---> Using cache
 ---> 0fa39dfd6012
Step 4/8 : COPY go.mod go.sum ./
 ---> Using cache
 ---> 99ba26559e57
Step 5/8 : COPY example.go test.pdf ./
 ---> Using cache
 ---> f8076516ab72
Step 6/8 : RUN go mod download
 ---> Using cache
 ---> 3f6abaf51608
Step 7/8 : RUN go build -tags musl -o /example
 ---> Running in 14372ec812da
# github.com/gen2brain/go-fitz
/usr/lib/gcc/aarch64-alpine-linux-musl/11.2.1/../../../../aarch64-alpine-linux-musl/bin/ld: /go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_arm64.a(context.o): in function `fz_new_context_imp':
context.c:(.text.fz_new_context_imp+0x284): undefined reference to `__fprintf_chk'

I have also tried to run it with platform targets:

  • docker build . --platform=linux/arm64 - doesn't work, same error
  • docker build . --platform=linux/amd64 - works

I would want it to actually work properly for multiple targets.

Bazel can't find the mupdf lib

Hello!

I'm trying to create a Bazel pipeline in a project that uses go-fitz, but I'm getting the following error:

INFO: Build option --define has changed, discarding analysis cache.
INFO: Analyzed 2 targets (0 packages loaded, 7401 targets configured).
INFO: Found 2 targets...
ERROR: /home/eduardo.cardozo/.cache/bazel/_bazel_eduardo.cardozo/298e3f4ef742d30c2aa0ed162984106b/external/com_github_gen2brain_go_fitz/BUILD.bazel:3:11: GoCompilePkg external/com_github_gen2brain_go_fitz/go-fitz.a failed: (Exit 1): builder failed: error executing command bazel-out/k8-opt-exec-2B5CBBC6/bin/external/go_sdk/builder compilepkg -sdk external/go_sdk -installsuffix linux_amd64 -tags extlib,pkgconfig,extlib,pkgconfig -src ... (remaining 29 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
/usr/bin/ld.gold: error: cannot find -lmupdf_linux_amd64
/usr/bin/ld.gold: error: cannot find -lmupdfthird_linux_amd64
/tmp/rules_go_work-002962491/_cgo_main.o:_cgo_main.c:_cgohack_fz_default_color_params: error: undefined reference to 'fz_default_color_params'
/tmp/rules_go_work-002962491/_cgo_main.o:_cgo_main.c:_cgohack_fz_identity: error: undefined reference to 'fz_identity'
...

Basically, it looks like libmupdf isn't being found by the gold linker, which Bazel uses to link its built binaries.

Link to an example project:
https://github.com/LuizEduardoCardozo/bazel-go-fitz

You can reproduce it by running the following command

bazel build ...

Does anyone know how I can fix this?

cannot create context: incompatible header (1.20.0) and library (1.18.0) versions

This is more of a question than an issue.

I am using v1.20.0 of go-fitz and build the docker image using this Dockerfile

FROM golang:1.17-alpine3.14 as builder

RUN apk add --no-cache build-base \
   mupdf-dev \
   freetype-dev \
   harfbuzz-dev \
   jbig2dec-dev \
   jpeg-dev \
   openjpeg-dev \
   zlib-dev

WORKDIR /dist

COPY . .

RUN export CGO_LDFLAGS="-lmupdf -lm -lmupdf-third -lfreetype -ljbig2dec -lharfbuzz -ljpeg -lopenjp2 -lz" \
  && go mod download \
  && go build -tags musl -o pdf-transcoder

FROM  alpine:3.14

RUN apk add --no-cache mupdf \
  freetype \
  harfbuzz \
  jbig2dec \
  jpeg \
  openjpeg \
  zlib

WORKDIR /app

COPY --from=builder /dist/pdf-transcoder ./

CMD ["./pdf-transcoder"]

The build is successful, but when I try to run it, I get the error below:

cannot create context: incompatible header (1.20.0) and library (1.18.0) versions
2022/06/26 08:20:41 fitz: cannot create context

I don't understand what I am doing wrong. @gen2brain thanks in advance.

Updating pdf file

Hi, in my app I'm reading a list of PDF files and listing those that contain specific words.

Is there a way to modify the file itself? Let's say I want to see if the file contains the word 'fitz', and if it does, go to the file itself and highlight this word in yellow.

I know how to read the text and find out whether it contains this word, but how can I go back to the file and highlight it, if that is possible? Thanks.

Docker: go build fails due to Missing Dependencies

Hi,
I'm getting the following issue after following #35:
[image: screenshot of the error]

The Dockerfile looks like this:

FROM golang:1.17-alpine3.14 AS build_api
WORKDIR /go/src/gitlab.com/repo
COPY . .
RUN apk add --no-cache build-base \
    mupdf mupdf-dev \
    freetype freetype-dev \
    harfbuzz harfbuzz-dev \
    jbig2dec jbig2dec-dev \
    jpeg jpeg-dev \
    openjpeg openjpeg-dev \
    zlib zlib-dev && \
    go mod download && \
    go mod vendor && \
    CGO_LDFLAGS="-lmupdf -lm -lmupdf-third -lfreetype -ljbig2dec -lharfbuzz -ljpeg -lopenjp2 -lz" && \
    GOOS=linux CGO_ENABLED=1 GOARCH=amd64 go build -o /tmp/fitz scan.go 

Memory Leak on NewFromMemory function

In the NewFromMemory function you read the byte slice into a data variable (*C.uchar) and then turn it into a stream. When the job is done that variable lingers in memory. I don't know a lot about C or Cgo integration, but I fixed it in my fork of go-fitz by putting the pointer into the document struct and made sure it got freed in the document close method. When I tried to defer its deletion inside the NewFromMemory function I ran into a panic on the stream.

To test the problem I read a pdf into a byte slice and had a for loop open it with NewFromMemory, dump its text, and then close the document. You'll see the build up in ram immediately.

I figure you might have a better solution than I came up with, because I'm literally guessing at the C commands! ;)

Cannot build c-shared

Hello,
Thanks for the great work,
here is my issue

Code :

package main

import "C"
import "github.com/gen2brain/go-fitz"

//export GetKey
func GetKey() *C.char {
	fitz.New("test.pdf")
	return C.CString("test")
}

func main() {
}

when building c-shared
go build -buildmode=c-shared -o lib.a main.go

I got this error :


/usr/bin/ld: /home/near/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_amd64_musl.a(buffer.o): in function `fz_new_buffer':
buffer.c:(.text.fz_new_buffer+0x47): undefined reference to `sigsetjmp'
/usr/bin/ld: /home/near/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_amd64_musl.a(buffer.o): in function `fz_new_buffer_from_data':
buffer.c:(.text.fz_new_buffer_from_data+0x27): undefined reference to `sigsetjmp'
/usr/bin/ld: /home/near/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_amd64_musl.a(buffer.o): in function `fz_new_buffer_from_base64':
buffer.c:(.text.fz_new_buffer_from_base64+0xf6): undefined reference to `sigsetjmp'
/usr/bin/ld: /home/near/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_amd64_musl.a(colorspace.o): in function `fz_cached_color_convert':
colorspace.c:(.text.fz_cached_color_convert+0xa0): undefined reference to `sigsetjmp'
/usr/bin/ld: /home/near/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_amd64_musl.a(colorspace.o): in function `fz_new_colorspace':
colorspace.c:(.text.fz_new_colorspace+0x6f): undefined reference to `sigsetjmp'
/usr/bin/ld: /home/near/go/pkg/mod/github.com/gen2brain/[email protected]/libs/libmupdf_linux_amd64_musl.a(colorspace.o):colorspace.c:(.text.fz_new_icc_colorspace+0x76): more undefined references to `sigsetjmp' follow
collect2: error: ld returned 1 exit status

Checksum mismatch error on go get

Hello, I'm currently getting the following error when running go get on this package:

verifying github.com/gen2brain/[email protected]: checksum mismatch
	downloaded: h1:2fa6dEQmSv1utU4Zb8NaY8bjzhfATyiKrs4nffGel7M=
	go.sum:     h1:vZQRGgQZqHzZRGRnvSUDwz26YD0ZnIyprAvR2C+Hh24=

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.

For more information, see 'go help module-auth'.

My Go version is go1.16.13 linux/amd64. There seems to be a tagging error of some sort.

error: cannot find startxref

Whenever I try to run code on a malformed or corrupted PDF file, it throws the fatal error given below. I have also attached the corrupted PDF file:
test.pdf
error: cannot find startxref
warning: trying to repair broken xref
warning: repairing PDF document
error: array not closed before end of file
uncaught error: array not closed before end of file
exit status 1
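
Since the error here is raised inside the native library, one partial workaround is a cheap check on the raw bytes before the file is handed over. The sketch below is a heuristic only: `hasPDFTrailer` is a hypothetical helper, the 1024-byte window is an assumption, and a file can pass this check and still be broken elsewhere, or fail it and still be repairable by MuPDF.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasPDFTrailer reports whether the last 1024 bytes of data contain both the
// "startxref" keyword and the "%%EOF" marker that normally end a PDF file.
func hasPDFTrailer(data []byte) bool {
	tail := data
	if len(tail) > 1024 {
		tail = tail[len(tail)-1024:]
	}
	return bytes.Contains(tail, []byte("startxref")) &&
		bytes.Contains(tail, []byte("%%EOF"))
}

func main() {
	good := []byte("%PDF-1.4\n...\nstartxref\n123\n%%EOF\n")
	bad := []byte("%PDF-1.4\n[ 1 2 3") // truncated in the middle of an array
	fmt.Println(hasPDFTrailer(good), hasPDFTrailer(bad)) // prints true false
}
```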

Docker: go build fails due to undefined references

I'm building a simple microservice based on go-fitz using the official Golang Docker image. When executing the go build step within my container, I'm getting a bunch of error messages from ld. For Alpine-based image it looks like this:

# github.com/gen2brain/go-fitz
/usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../x86_64-alpine-linux-musl/bin/ld: /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../lib/libmupdf.so: undefined reference to `FT_Select_Charmap'
...

I tried the Bullseye image instead, but got almost identical errors:

# github.com/gen2brain/go-fitz
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../../lib/libmupdf.a(font.o): in function `ft_char_index':
(.text.ft_char_index+0x10): undefined reference to `FT_Get_Char_Index'
...

My Dockerfile for Alpine:

FROM golang:1.16-alpine
RUN apk add --no-cache build-base mupdf mupdf-dev
WORKDIR /app
COPY go.mod ./
COPY go.sum ./
COPY src/*.go ./
RUN go mod download && go build -tags extlib -o /fitz-rest

Bullseye (may also be changed to Buster, the issue still persists):

FROM golang:1.16-bullseye
RUN apt update && apt -y install build-essential mupdf libmupdf-dev
WORKDIR /app
COPY go.mod ./
COPY go.sum ./
COPY src/*.go ./
RUN go mod download && go build -tags extlib -o /fitz-rest

I do understand, that the errors are coming from unresolved dependencies, e.g. FT_Get_Char_Index obviously originates from the freetype API. However these dependencies are expected to be brought by the mupdf and the corresponding dev packages, and if I look into the container filesystem they are indeed there including header files.

Now I'm running out of ideas how to solve this issue. Maybe you can help me.

P.S. I've already seen #32, but this issue seems to be completely different. The OP of #13 apparently got the Docker image to work somehow; unfortunately, we don't know exactly how.

Problem with non-pdf input: execution interrupts, no error returned

When running:

func Example_NonPDF() {

	buf := bytes.NewBufferString("Non pdf")
	_, err := fitz.NewFromReader(buf)

	// This line is never executed. How can the uncaught error be caught?
	fmt.Printf("fitz.NewFromReader error: %v\n", err)

	// Output: some error expected
}

this is printed in stdout:

error: cannot recognize version marker
warning: trying to repair broken xref
warning: repairing PDF document
error: no objects found
uncaught error: no objects found

Execution is interrupted and no error is returned.

I suppose these are cgo errors. How can they be handled?
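
The uncaught error itself happens on the C side, but obviously non-PDF input can be rejected in Go before it reaches the library. The sketch below assumes only PDF input is expected (the library also opens other formats such as EPUB, which this check would reject). `looksLikePDF` is a hypothetical helper, and searching the first 1024 bytes for the `%PDF-` marker is a lenient heuristic, not a full validation.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikePDF reports whether the "%PDF-" header marker appears within the
// first 1024 bytes of data.
func looksLikePDF(data []byte) bool {
	head := data
	if len(head) > 1024 {
		head = head[:1024]
	}
	return bytes.Contains(head, []byte("%PDF-"))
}

func main() {
	fmt.Println(looksLikePDF([]byte("Non pdf")))     // prints false
	fmt.Println(looksLikePDF([]byte("%PDF-1.7\n"))) // prints true
}
```

With a reader as input, the bytes would need to be buffered first (for example with `io.ReadAll`) so they can be checked and then passed on.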

Suggested fix for `recompile with -fPIC` does not work

Following the suggestions in the README, downloading the package with go get -u github.com/gen2brain/go-fitz gives me the -fPIC issue seen here: https://gist.github.com/postables/4b603bb0e82ac4203b35dc49a50622e2

Downloading the package with go get -u -tags gcc7 github.com/gen2brain/go-fitz doesn't give me any errors. However, running go vet ./... produces the error seen here: https://gist.github.com/postables/9eca025ea57f79d22c64acd945bee2fe The same issue occurs with go build and go test.

uname -a:

Linux dark 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

/etc/lsb-release:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Pop!_OS 18.04 LTS"

go version:

go version go1.11.1 linux/amd64

I have replicated the issue on a separate machine with the following specs:

uname -a:

Linux ipfs-node-1 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

/etc/lsb-release:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

go version:

go version go1.10.3 linux/amd64

Update:

It appears that running with go vet -tags gcc7 ./... solves the issue. However, I'm unclear as to how one can get around this issue while using this library on systems which require this workaround.

macOS build Linux error

> [6/6] RUN CC=x86_64-w64-mingw32-gcc CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build:
#10 1.090 # runtime/cgo
#10 1.090 gcc_linux_amd64.c: In function '_cgo_sys_thread_start':
#10 1.090 gcc_linux_amd64.c:61:2: error: unknown type name 'sigset_t'; did you mean '_sigset_t'?
#10 1.090    61 |  sigset_t ign, oset;
#10 1.090       |  ^~~~~~~~
#10 1.090       |  _sigset_t
#10 1.090 gcc_linux_amd64.c:66:2: error: implicit declaration of function 'sigfillset' [-Werror=implicit-function-declaration]
#10 1.090    66 |  sigfillset(&ign);
#10 1.090       |  ^~~~~~~~~~
#10 1.090 gcc_linux_amd64.c:61:16: error: unused variable 'oset' [-Werror=unused-variable]
#10 1.090    61 |  sigset_t ign, oset;
#10 1.090       |                ^~~~
#10 1.090 cc1: all warnings being treated as errors
------
executor failed running [/bin/sh -c CC=x86_64-w64-mingw32-gcc CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build]: exit code: 2

unable to use library within bazel

Bazel is not able to build:

DEBUG: /private/var/tmp/_bazel_mgenov/8cac0968a0c17cc631135ceb621314f4/external/bazel_gazelle/internal/go_repository.bzl:189:18:
com_github_gen2brain_go_fitz: 
gazelle: /private/var/tmp/_bazel_mgenov/8cac0968a0c17cc631135ceb621314f4/external/com_github_gen2brain_go_fit/fitz_cgo_extlib_pkgconfig.go:
error reading go file: /private/var/tmp/_bazel_mgenov/8cac0968a0c17cc631135ceb621314f4/external
/com_github_gen2brain_go_fitz/fitz_cgo_extlib_pkgconfig.go: pkg-config not supported: #cgo pkg-config: mupdf

error: no builtin cmap file: UniGB-UCS2-H

error: no builtin cmap file: UniGB-UCS2-H
warning: unrecoverable error; ignoring rest of page
err 0
error: no builtin cmap file: UniGB-UCS2-H
warning: unrecoverable error; ignoring rest of page
error: no builtin cmap file: UniGB-UCS2-H
warning: unrecoverable error; ignoring rest of page
