
unipdf-cli's Issues

Unexpected behaviour when extracting images from a PDF file

Description

I just tried to extract images from a PDF file. The command run was:

$ ./unicli extract -r images ~/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.pdf
Images successfully extracted to /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
ahalls-MBP:unicli ahall$ unzip /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip

When unzipping the images I got:

$ unzip /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
Archive:  /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
  inflating: p1_0.jpg
  inflating: p1_1.jpg
  inflating: p1_2.jpg
  inflating: p1_3.jpg
  inflating: p1_4.jpg
  inflating: p1_5.jpg
  inflating: p1_6.jpg
  inflating: p1_7.jpg
  inflating: p1_8.jpg
  inflating: p1_9.jpg
  inflating: p1_10.jpg
$ ls -la p1*
-rw-r--r--@ 1 ahall  staff  599 Dec 31  1979 p1_0.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_1.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_10.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_2.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_3.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_4.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_5.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_6.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_7.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_8.jpg
-rw-r--r--  1 ahall  staff  599 Dec 31  1979 p1_9.jpg

Expected Behavior

The file has no images, so no ZIP file should be returned; the command should just state that the file has no images. The extracted files should also have a more recent timestamp than 1979.

Actual Behavior

A zip file is returned with a few empty .jpg files. The files also have a 1979 timestamp.

File causing issue

heimadaemi_VI_E2_2015_2016_nr_09.pdf

unicli invoice template/generate

Invoices can be complex. Instead of a ton of parameters, use a JSON input file.

  • unicli invoice template outputs the JSON template
$ unicli invoice template
{
  "name": "Full name",
  ...
}
  • unicli invoice generate invoice.pdf invdata.json generates the invoice

Start basic, with basic fields similar to the first example on:
https://unidoc.io/news/simple-invoices
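
A minimal Go sketch of the template/generate split proposed above. The field set is hypothetical (the proposal only names `name`), and `Invoice`, `Item`, `templateJSON`, and `parseInvoice` are illustrative names, not part of unicli. The same struct produces the template and parses the user's edited file, so the two stay in sync.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a hypothetical invoice line.
type Item struct {
	Description string  `json:"description"`
	Quantity    int     `json:"quantity"`
	UnitPrice   float64 `json:"unit_price"`
}

// Invoice holds hypothetical basic fields; only "name" comes from the proposal.
type Invoice struct {
	Name   string `json:"name"`
	Number string `json:"number"`
	Items  []Item `json:"items"`
}

// templateJSON returns a filled-in placeholder invoice as indented JSON,
// i.e. what "invoice template" could print.
func templateJSON() (string, error) {
	tpl := Invoice{
		Name:   "Full name",
		Number: "INV-001",
		Items:  []Item{{Description: "Item description", Quantity: 1, UnitPrice: 0}},
	}
	b, err := json.MarshalIndent(tpl, "", "  ")
	return string(b), err
}

// parseInvoice decodes the JSON input file that "invoice generate" would read.
func parseInvoice(data []byte) (Invoice, error) {
	var inv Invoice
	err := json.Unmarshal(data, &inv)
	return inv, err
}

func main() {
	s, err := templateJSON()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s)
}
```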

Create Brew formula

Once the package is more stable, it would be good to make it installable via Homebrew, especially now that Homebrew is cross-platform (macOS, GNU/Linux, and Windows via WSL).

unicli explode command

Add a new CLI command: explode, which explodes all pages into separate PDF files.
The output should be a ZIP archive with all pages with filenames input_1.pdf, input_2.pdf, ... for each page.
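
The naming scheme above could be derived from the input path along these lines. `pageFileNames` is a hypothetical helper; splitting the pages and writing the ZIP are left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// pageFileNames returns the archive entry names for an input with n pages:
// <stem>_1.pdf, <stem>_2.pdf, ... where stem is the input name without extension.
func pageFileNames(inputPath string, n int) []string {
	base := filepath.Base(inputPath)
	stem := strings.TrimSuffix(base, filepath.Ext(base))
	var names []string
	for i := 1; i <= n; i++ {
		names = append(names, fmt.Sprintf("%s_%d.pdf", stem, i))
	}
	return names
}

func main() {
	fmt.Println(pageFileNames("/docs/input.pdf", 3))
}
```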

Watermark Sign

Hello, I'm using unipdf for personal purposes. How can I remove the watermark that appears in the files after encryption?

Thanks in advance.

Create a README.MD with examples

The README.MD should have:

  • Full listing of commands
  • asciinema demo of how it works for some basic operation
  • images of PDFs generated for 1-2 demo cases

Unable to use offline key with compiled binary

I just tried following your blog post, "Compressing and Optimizing PDFs in Pure Golang using UniPDF", using a compiled binary with my offline key. For example:

UNIDOC_LICENSE_FILE="unidoc_license.key" UNIDOC_LICENSE_CUSTOMER=<customer_name> unipdf license_info

And I get the following output:

License: License Id:
Customer Id:
Customer Name: Unlicensed
Tier: unlicensed
Created At: 30 August 2023 at 17:23 UTC
Expires At: Never
Creator:  <>

The keys I'm providing are used in production so I know they're correct. I then dug into your source code and I found the bug:

func SetLicense(licensePath string, customer string) error {
	// Read license file
	content, err := ioutil.ReadFile(licensePath)
	if err != nil {
		return err
	}
	// Bug: the customer argument is ignored and "" is passed instead.
	return unilicense.SetLicenseKey(string(content), "")
}

If you look at line 24, the customer name is hard-coded as an empty string. Changing that line to pass the customer variable makes everything work.

Watched folders for compress/optimize with a task queue

Idea:

unicli optimize --watch /folder --out /path/to/outputs

This watches /folder for input files and, for each new PDF it sees, puts a compress task on a task queue.
The task queue is served by 4 goroutines (the number is configurable), which process each new task and write the optimized PDF to the output folder.

On startup, it should also process the PDF files that are already in the folder.

Options

  • Keep a copy of the original (write as filename.orig.pdf in the same output folder)
  • Error handling. If there is an error, write to an error log, possibly create a filename_error.txt in the output folder.

optimize output - show compression ratio and time

  • Would be good if unicli can show the file size before and after, along with a compression ratio
  • Also would be nice to show the time it took to process
  • Either show by default or consider a flag or parameter to enable.

Cannot handle spaces in filenames

Merging two PDF files, one of which has spaces in the name:

backslash escaped on command line ...

$ unipdf merge output.pdf part1.pdf part\ two\ report.pdf
Could not merge the input files: open part: no such file or directory

enclosed in double quotes ...

$ unipdf merge output.pdf part1.pdf "part two report.pdf"
Could not merge the input files: open part: no such file or directory

I was forced to rename the input file to "part_two_report.pdf" for it to work properly. This should be an easy fix: the tool should not assume that filenames contain no spaces.
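
In both invocations the shell already delivers `part two report.pdf` as a single argument, and the error mentions only `part`. One plausible cause, which is an assumption and not confirmed from the unipdf source, is that the arguments are joined and split on whitespace again inside the CLI. The Go sketch below contrasts that with using the argument slice unchanged; `splitNaive` and `inputFiles` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// splitNaive shows the suspected failure mode: joining the arguments and
// splitting on whitespace breaks any filename that contains spaces.
func splitNaive(args []string) []string {
	return strings.Fields(strings.Join(args, " "))
}

// inputFiles keeps each argument intact, exactly as the shell passed it.
func inputFiles(args []string) []string {
	return append([]string(nil), args...)
}

func main() {
	args := os.Args[1:]
	fmt.Printf("naive: %q\n", splitNaive(args))
	fmt.Printf("safe:  %q\n", inputFiles(args))
}
```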

Support and document installation without Go

Since Go is a compiled language, isn't it better to compile the package on every release for multiple platforms (I think goreleaser already handles that) and use that in instructions instead of requiring Go to install UniCLI?

unicli rotate pages command

Add a CLI command: rotate to rotate pages

  • Should be able to rotate a specific page or a page range by a specified angle
  • Angle should be a multiple of 90 degrees
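
The two requirements could be validated along these lines. The page syntax (`3` for one page, `2-5` for a range) is an assumption, and `normalizeAngle` and `parsePages` are hypothetical helpers; the rotation itself is not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalizeAngle rejects angles that are not multiples of 90 and reduces
// valid ones to the range 0, 90, 180, 270.
func normalizeAngle(deg int) (int, error) {
	if deg%90 != 0 {
		return 0, fmt.Errorf("angle %d is not a multiple of 90", deg)
	}
	return ((deg % 360) + 360) % 360, nil
}

// parsePages parses "3" or "2-5" into 1-based page numbers and checks them
// against the number of pages in the document.
func parsePages(spec string, numPages int) ([]int, error) {
	first, last, isRange := strings.Cut(spec, "-")
	start, err := strconv.Atoi(strings.TrimSpace(first))
	if err != nil {
		return nil, fmt.Errorf("invalid page %q", first)
	}
	end := start
	if isRange {
		end, err = strconv.Atoi(strings.TrimSpace(last))
		if err != nil {
			return nil, fmt.Errorf("invalid page %q", last)
		}
	}
	if start < 1 || end < start || end > numPages {
		return nil, fmt.Errorf("page range %q is outside 1-%d", spec, numPages)
	}
	pages := make([]int, 0, end-start+1)
	for p := start; p <= end; p++ {
		pages = append(pages, p)
	}
	return pages, nil
}

func main() {
	angle, err := normalizeAngle(-90)
	fmt.Println(angle, err)
	pages, err := parsePages("2-4", 10)
	fmt.Println(pages, err)
}
```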

Update installation instructions (README)

Current state

The instructions provided result in an error, because loading the module does not work.

The instructions need to be updated, probably to a git clone followed by go build (for installation based on the module).

optimize multiple input files

Should be able to

  • Specify multiple input files, either file1.pdf file2.pdf ..., or
  • Specify output folder, outputs are placed within that folder

unicli debug command prompt

Opens a prompt for viewing and debugging PDFs. Once in the prompt, you can execute various commands to debug the file.

$ unicli debug file.pdf
> version
1.7
> pages
10
> page 3
Page context set to page 3
> images
Page 3
1 images
Img1 XObject: 121 0 R
> wo 121 /tmp/img1.dat
Object 121 written to /tmp/img1.dat
> content
q
Img1 Do
Q
> quit
Closing debug prompt
$

Basic commands

version/v - Prints the PDF version
catalog/c - Displays the PDF catalog
obj/o num - Displays object number `num` in a readable form. If the object is binary, avoids writing it to the console
writeobj/wo num path - Writes object num to path
pages/pp - Prints the number of pages

Page context

It is also possible to work in page context, i.e. set page context to a specific page.

page/p num - Sets page context to page `num`
resources/res (num) - Prints page resources for page num (parameter not needed if page context set)
fonts (num) - Overview of fonts
xobj (num) - Overview of XObjects
contents (num) - Prints the content stream
text - Outputs as text
images - Overview of images in the content

Other

Other things we would like to be able to see:

  • filters: Get an overview of encodings/filters that are used in the PDF.
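
The command/alias pairs listed under Basic commands and Page context suggest a small dispatch step at the front of the prompt loop. The Go sketch below covers only that step: it resolves aliases and splits arguments, then echoes what it parsed. `aliases` and `parseCommand` are hypothetical names, and no PDF handling is included.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// aliases maps the short forms from the proposal to full command names.
var aliases = map[string]string{
	"v":   "version",
	"c":   "catalog",
	"o":   "obj",
	"wo":  "writeobj",
	"pp":  "pages",
	"p":   "page",
	"res": "resources",
}

// parseCommand splits an input line into a canonical command name and its
// arguments. ok is false for blank lines.
func parseCommand(line string) (cmd string, args []string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", nil, false
	}
	cmd = strings.ToLower(fields[0])
	if full, found := aliases[cmd]; found {
		cmd = full
	}
	return cmd, fields[1:], true
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	fmt.Print("> ")
	for sc.Scan() {
		cmd, args, ok := parseCommand(sc.Text())
		if ok && cmd == "quit" {
			fmt.Println("Closing debug prompt")
			return
		}
		if ok {
			fmt.Printf("command=%s args=%q\n", cmd, args)
		}
		fmt.Print("> ")
	}
}
```

Because the line is split on whitespace, a `writeobj` path containing spaces would need quoting support on top of this.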

Compression ratio 0.19%

Hello, I am trying to compress a PDF with unicli. Everything appears to be working, but the compression is negligible: the reported ratio is -0.19%, so the output is in fact slightly larger than the input. The PDF contains some large images, so I am not sure whether there is something wrong with my command.

The watermark does show up as expected on each page.

$ ./unipdf optimize /path/to/myLargeFile.pdf -q 75 -P 100
Optimizing /path/to/myLargeFile.pdf
Unlicensed copy of unidoc
To get rid of the watermark - Please get a license on https://unidoc.io
Original: /path/to/myLargeFile.pdf
Original size: 211795879 bytes
Optimized: /path/to/myLargeFile_optimized.pdf
Optimized size: 212193033 bytes
Compression ratio: -0.19%
Processing time: 2957.33 ms
Status: success
----------

Render PDF to images functionality

Add a render function to render a PDF, or selected page(s) of a PDF, to images.
It can be consistent with the other functions. Either
render file.pdf file.zip
where the entire file.pdf is rendered to images inside file.zip,
or
render file.pdf -p 1 file.zip
to render page 1 or a page range (consistent with what is done in the other methods).
