Unexpected behaviour when extracting images from a PDF file

Description

I just tried to extract images from a PDF file. The command run was:

$ ./unicli extract -r images ~/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.pdf
Images successfully extracted to /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
ahalls-MBP:unicli ahall$ unzip /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip

When unzipping the images I got:

$ unzip /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
Archive:  /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
  inflating: p1_0.jpg
  inflating: p1_1.jpg
  inflating: p1_2.jpg
  inflating: p1_3.jpg
  inflating: p1_4.jpg
  inflating: p1_5.jpg
  inflating: p1_6.jpg
  inflating: p1_7.jpg
  inflating: p1_8.jpg
  inflating: p1_9.jpg
  inflating: p1_10.jpg

$ ls -la p1*
-rw-r--r--@ 1 ahall  staff  599 Dec 31  1979 p1_0.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_1.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_10.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_2.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_3.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_4.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_5.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_6.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_7.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_8.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_9.jpg

Expected Behavior

The file has no images, so no ZIP file should be returned. The command should just state that the file has no images. The extracted files should also have a more recent timestamp than 1979.

Actual Behavior

A ZIP file is returned containing 11 small .jpg files (599 bytes each) that appear to be empty. The files also have a 1979 timestamp.

File causing issue

heimadaemi_VI_E2_2015_2016_nr_09.pdf

Search replace text

Extend the search functionality so it can also replace text.
E.g.

unicli search -r replacement -o output_file.pdf input_file.pdf text_to_search

Update dependency to unipdf 3 and rename output CLI binary to unipdf

unicli form export/fill commands

Add commands to provide easy filling of PDF forms.

unicli form export file.pdf should generate a JSON output file with an overview of all the fields and their values in file.pdf.
unicli form fill file.pdf formdata.json output.pdf (or similar) should fill file.pdf with the form data provided in formdata.json and write the result to output.pdf.

Can be based off of:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/forms/pdf_form_fill_json.go
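Below is a sketch of what formdata.json could look like and how the fill step might read it. The flat list of name/value objects is a hypothetical schema. It is not necessarily the format used by the linked unidoc-examples code. Only Go's encoding/json is used, so the actual form filling through unipdf is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formField is a hypothetical schema for one entry in formdata.json;
// the JSON keys are illustrative and not taken from unipdf.
type formField struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// parseFormData decodes exported form data into a name -> value map
// that a fill step could then apply to the PDF's fields.
func parseFormData(data []byte) (map[string]string, error) {
	var fields []formField
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, err
	}
	values := make(map[string]string, len(fields))
	for _, f := range fields {
		values[f.Name] = f.Value
	}
	return values, nil
}

func main() {
	// What a hypothetical `unicli form export file.pdf` might write.
	exported, err := json.MarshalIndent([]formField{
		{Name: "full_name", Value: "Jane Doe"},
		{Name: "city", Value: "Reykjavik"},
	}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(exported))

	// What the fill step would read back from formdata.json.
	values, err := parseFormData(exported)
	if err != nil {
		panic(err)
	}
	fmt.Println(values["full_name"])
}
```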

unicli invoice template/generate

Invoices can be complex. Instead of a ton of parameters, use a JSON input file.

unicli invoice template outputs the JSON template

$ unicli invoice template
{
  name: "Full name",
  ...
}

unicli invoice generate invoice.pdf invdata.json generates the invoice

Start simple, with basic fields similar to the first example at:
https://unidoc.io/news/simple-invoices

Create Brew formula

Once the package is more stable, it would be good to make it installable via Homebrew, especially now that Homebrew is cross-platform (macOS and Linux, including Windows via WSL).

unicli explode command

Add a new CLI command: explode, which explodes all pages into separate PDF files.
The output should be a ZIP archive containing one PDF per page, named input_1.pdf, input_2.pdf, ...

unicli flatten form command

Add a CLI command to flatten forms and annotations.
E.g.

unicli flatten form flattened.pdf input.pdf

Example to base off:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/forms/pdf_form_flatten.go

Watermark Sign

Hello, I'm using unipdf for personal use. How can I remove the watermark that appears in the files after encryption?

Thanks in advance.

Create a README.MD with examples

The README.MD should have:

Full listing of commands
asciinema demo of how it works for some basic operations
images of PDFs generated for 1-2 demo cases

Unable to use offline key with compiled binary

I just tried following your blog post "Compressing and Optimizing PDFs in Pure Golang using UniPDF" with a compiled binary and my offline key. For example:

UNIDOC_LICENSE_FILE="unidoc_license.key" UNIDOC_LICENSE_CUSTOMER=<customer_name> unipdf license_info

And I get the following output:

License: License Id:
Customer Id:
Customer Name: Unlicensed
Tier: unlicensed
Created At: 30 August 2023 at 17:23 UTC
Expires At: Never
Creator:  <>

The keys I'm providing are used in production so I know they're correct. I then dug into your source code and I found the bug:

unipdf-cli/pkg/pdf/pdf.go

Lines 17 to 25 in b3344a3

    func SetLicense(licensePath string, customer string) error {
    	// Read license file
    	content, err := ioutil.ReadFile(licensePath)
    	if err != nil {
    		return err
    	}

    	return unilicense.SetLicenseKey(string(content), "")
    }

If you look at line 24, the customer name is hard coded as an empty string. Change that line to pass the customer variable and everything works.

Watched folders for compress/optimize with a task queue

Idea:

unicli optimize --watch /folder --out /path/to/outputs

Watches /folder for input files. For each new PDF that appears, it puts the compression task info on a task queue.
Runs a task queue with 4 goroutines (configurable number) which processes each new task and writes the optimized PDF to the output folder.

On startup, it should also process the PDF files that are already in the folder.

Options

Keep a copy of the original (write as filename.orig.pdf in the same output folder)
Error handling: if there is an error, write it to an error log, and possibly create a filename_error.txt in the output folder.

optimize output - show compression ratio and time

It would be good if unicli could show the file size before and after optimization, along with the compression ratio.
It would also be nice to show how long the processing took.
This could be shown by default or enabled with a flag or parameter.

Cannot handle spaces in filenames

Merging two PDF files, one of which has spaces in the name:

backslash escaped on command line ...

$ unipdf merge output.pdf part1.pdf part\ two\ report.pdf
Could not merge the input files: open part: no such file or directory

enclosed in double quotes ...

$ unipdf merge output.pdf part1.pdf "part two report.pdf"
Could not merge the input files: open part: no such file or directory

I was forced to rename the input file to "part_two_report.pdf" for it to work. This should be an easy fix: the code should not assume that filenames contain no spaces.
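Both shell forms above pass part two report.pdf to the program as a single argument, so the split must happen inside the tool. The sketch below shows one plausible mechanism. This is an assumption, not a confirmed reading of the merge code. Joining the arguments and splitting them again on whitespace yields a file named part, which matches the error. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// buggyInputs mimics (hypothetically) code that joins the arguments and then
// splits them again on whitespace, so "part two report.pdf" becomes three names.
func buggyInputs(args []string) []string {
	return strings.Fields(strings.Join(args, " "))
}

// inputs keeps each argument exactly as the shell delivered it, spaces included.
func inputs(args []string) []string {
	return append([]string(nil), args...)
}

func main() {
	args := []string{"part1.pdf", "part two report.pdf"}
	fmt.Printf("%q\n", buggyInputs(args)) // ["part1.pdf" "part" "two" "report.pdf"]
	fmt.Printf("%q\n", inputs(args))      // ["part1.pdf" "part two report.pdf"]
}
```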

Support and document installation without Go

Since Go is a compiled language, wouldn't it be better to build binaries for multiple platforms on every release (I think goreleaser already handles that)? The installation instructions could then point to those binaries instead of requiring Go to install UniCLI.

Can it solve the double-layer PDF problem?

unicli rotate pages command

Add a CLI command, rotate, to rotate pages.

Should be able to rotate a specific page or a page range by a specified angle
Angle should be a multiple of 90 degrees
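The PDF specification requires a page's /Rotate value to be a multiple of 90, so the angle can be validated up front. Below is a minimal sketch with hypothetical normalizeRotation and combineRotation helpers. They reject other angles and map valid ones to 0, 90, 180, or 270. Page-range handling and the unipdf calls that apply the rotation are not shown.

```go
package main

import "fmt"

// normalizeRotation rejects angles that are not a multiple of 90 and maps
// the rest into [0, 360), so -90 becomes 270 and 450 becomes 90.
func normalizeRotation(angle int) (int, error) {
	if angle%90 != 0 {
		return 0, fmt.Errorf("invalid angle %d: must be a multiple of 90 degrees", angle)
	}
	return (angle%360 + 360) % 360, nil
}

// combineRotation adds a requested rotation to a page's existing /Rotate value.
func combineRotation(current, angle int) (int, error) {
	return normalizeRotation(current + angle)
}

func main() {
	for _, angle := range []int{90, -90, 450, 45} {
		r, err := normalizeRotation(angle)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%d -> %d\n", angle, r)
	}
}
```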

Update installation instructions (README)

Current state

The current instructions result in an error, and loading the module does not work either.

Need to update the instructions, probably with a git clone followed by go build (for installation based on the module).

optimize multiple input files

Should be able to

Specify multiple input files, either file1.pdf file2.pdf ..., or
Specify output folder, outputs are placed within that folder

unicli debug command prompt

Opens a prompt for viewing and debugging PDFs. Once in the prompt, various commands can be executed to debug the file.

$ unicli debug file.pdf
> version
1.7
> pages
10
> page 3
Page context set to page 3
> images
Page 3
1 images
Img1 XObject: 121 0 R
> wo 121 /tmp/img1.dat
Object 121 written to /tmp/img1.dat
> content
q
Img1 Do
Q
> quit
Closing debug prompt
$

Basic commands

version/v - Print PDF version
catalog/c - Displays the PDF catalog
obj/o num - Displays object number `num` in a readable form. If the object is binary, avoid writing it to the console
writeobj/wo num path - Writes object `num` to `path`
pages/pp - Prints the number of pages

Page context

It is also possible to work in page context, i.e. set page context to a specific page.

page/p num - Sets page context to page `num`
resources/res (num) - Prints page resources for page num (parameter not needed if page context set)
fonts (num) - Overview of fonts
xobj (num) - Overview of XObjects
contents (num) - Prints the content stream
text - Outputs the page content as text
images - Overview of images in the content

Other

Other things we would like to be able to see:

filters: Get an overview of encodings/filters that are used in the PDF.

Update modules and update release version to 0.2.0

Update the dependency to unidoc v3.0.0-alpha.3

Ready to release v0.2.0 of UniCLI?

Release unicli v0.1.0 pointing to unidoc v3.0.0-alpha.1

Change unidoc dependency (go.mod) to newly released unidoc v3.0.0-alpha.1
Release unicli v0.1.0

Compression ratio 0.19%

Hello, I am trying to compress a PDF with unicli. Everything appears to work, but the compression is negligible; in fact, the output is 0.19% larger than the input, as the log below shows. This PDF does contain some large images, so I am not sure whether there is something wrong with my command.

The watermark does show up as expected on each page.

$ ./unipdf optimize /path/to/myLargeFile.pdf -q 75 -P 100
Optimizing /path/to/myLargeFile.pdf
Unlicensed copy of unidoc
To get rid of the watermark - Please get a license on https://unidoc.io
Original: /path/to/myLargeFile.pdf
Original size: 211795879 bytes
Optimized: /path/to/myLargeFile_optimized.pdf
Optimized size: 212193033 bytes
Compression ratio: -0.19%
Processing time: 2957.33 ms
Status: success
----------

Render PDF to images functionality

Add a render command to render a PDF, or selected pages of a PDF, to images.
It should be consistent with the other commands. Either
render file.pdf file.zip
where entire file.pdf is rendered to images inside file.zip.
or
render file.pdf -p 1 file.zip
to get page 1, or a page range (consistent with what is done in other methods).

$ unicli form fdfmerge merged.pdf fdf1.fdf input.pdf

Base off example:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/forms/pdf_form_fill_fdf_merge.go


unidoc / unipdf-cli: issues