unidoc / unipdf-cli
CLI for PDF processing using unipdf
License: Other
I just tried to extract images from a PDF file. The command run was:
$ ./unicli extract -r images ~/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.pdf
Images successfully extracted to /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
When unzipping the archive, I got:
$ unzip /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
Archive: /Users/ahall/Downloads/heimadaemi_VI_E2_2015_2016_nr_09.zip
inflating: p1_0.jpg
inflating: p1_1.jpg
inflating: p1_2.jpg
inflating: p1_3.jpg
inflating: p1_4.jpg
inflating: p1_5.jpg
inflating: p1_6.jpg
inflating: p1_7.jpg
inflating: p1_8.jpg
inflating: p1_9.jpg
inflating: p1_10.jpg
$ ls -la p1*
-rw-r--r--@ 1 ahall staff 599 Dec 31 1979 p1_0.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_1.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_10.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_2.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_3.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_4.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_5.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_6.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_7.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_8.jpg
-rw-r--r-- 1 ahall staff 599 Dec 31 1979 p1_9.jpg
Expected: the file has no images, so no ZIP file should be produced; the command should just report that the file has no images. Extracted files should also have a more recent timestamp than 1979.
Actual: a ZIP file is returned containing a few near-empty .jpg files (599 bytes each), and the files have a 1979 timestamp.
Extend the search functionality so it can also replace text.
E.g.
unicli search -r replacement -o output_file.pdf input_file.pdf text_to_search
Related example:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/text/pdf_search_replace.go
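The proposed invocation could be parsed roughly as follows. This is a sketch using Go's standard `flag` package rather than whatever CLI library unicli actually uses; `searchOpts` and `parseSearchArgs` are hypothetical names, and only the `-r` and `-o` flags from the proposal are modeled. Note that `flag` stops at the first non-flag argument, so flags must precede the input file and search text, as in the example above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// searchOpts holds the options of the proposed search/replace command (hypothetical).
type searchOpts struct {
	replacement string // -r: replacement text; empty means search only
	output      string // -o: output PDF path, required when replacing
	input       string // first positional argument
	query       string // second positional argument
}

// parseSearchArgs parses: -r replacement -o output_file.pdf input_file.pdf text_to_search
func parseSearchArgs(args []string) (searchOpts, error) {
	fs := flag.NewFlagSet("search", flag.ContinueOnError)
	var o searchOpts
	fs.StringVar(&o.replacement, "r", "", "replacement text")
	fs.StringVar(&o.output, "o", "", "output PDF path")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 2 {
		return o, fmt.Errorf("expected input file and search text, got %d arguments", fs.NArg())
	}
	o.input, o.query = fs.Arg(0), fs.Arg(1)
	if o.replacement != "" && o.output == "" {
		return o, fmt.Errorf("-o is required when -r is given")
	}
	return o, nil
}

func main() {
	o, err := parseSearchArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```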
Add commands to provide easy filling of PDF forms.
unicli form export file.pdf
should generate a JSON output file with an overview of all the fields and their values in file.pdf.
unicli form fill file.pdf formdata.json output.pdf
(or something similar) should fill file.pdf with the form data provided in formdata.json and write the result to output.pdf.
Can be based off of:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/forms/pdf_form_fill_json.go
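One possible shape for the exported JSON is a flat list of name/value pairs that `form fill` can read back. The schema below (`formField`, with `name` and `value` keys) is an assumption for illustration, not the format used by the linked unidoc example, and it covers only the JSON round trip, not reading or writing the PDF itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// formField is one entry in the hypothetical export format.
type formField struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// exportFields serializes fields as indented JSON (what `form export` could write).
func exportFields(fields []formField) ([]byte, error) {
	return json.MarshalIndent(fields, "", "  ")
}

// loadFillData parses formdata.json into a name -> value map (what `form fill` could read),
// rejecting duplicate field names so a value is never silently overwritten.
func loadFillData(data []byte) (map[string]string, error) {
	var fields []formField
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, err
	}
	values := make(map[string]string, len(fields))
	for _, f := range fields {
		if _, dup := values[f.Name]; dup {
			return nil, fmt.Errorf("duplicate field %q", f.Name)
		}
		values[f.Name] = f.Value
	}
	return values, nil
}

func main() {
	out, err := exportFields([]formField{{Name: "full_name", Value: "Jane Doe"}, {Name: "email", Value: ""}})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(out))
}
```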
Invoices can be complex. Instead of a ton of parameters, use a JSON input file.
unicli invoice template
outputs the JSON template:
$ unicli invoice template
{
"name": "Full name",
...
}
unicli invoice generate invoice.pdf invdata.json
generates the invoice.
Start basic, with fields similar to the first example on:
https://unidoc.io/news/simple-invoices
Once the package is more stable, it would be good to make it installable via Homebrew, especially now that Homebrew runs on macOS and Linux (and on Windows via WSL).
Add a new CLI command, explode, which splits all pages into separate PDF files. The output should be a ZIP archive containing one file per page, named input_1.pdf, input_2.pdf, ... for each page.
Add a CLI command to flatten form and annotations.
E.g.
unicli flatten form flattened.pdf input.pdf
Example to base off:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/forms/pdf_form_flatten.go
Hello, I'm using unipdf for personal use. How do I remove the watermark that appears in the files after encryption?
Thanks in advance.
Update to newest unipdf release.
The README.MD should have:
I just tried following your blog post "Compressing and Optimizing PDFs in Pure Golang using UniPDF", using a compiled binary with my offline key. For example:
UNIDOC_LICENSE_FILE="unidoc_license.key" UNIDOC_LICENSE_CUSTOMER=<customer_name> unipdf license_info
And I get the following output:
License:
License Id:
Customer Id:
Customer Name: Unlicensed
Tier: unlicensed
Created At: 30 August 2023 at 17:23 UTC
Expires At: Never
Creator: <>
The keys I'm providing are used in production so I know they're correct. I then dug into your source code and I found the bug:
Lines 17 to 25 in b3344a3
If you look at line 24, the customer name is hard-coded as an empty string. Change that line to pass the customer variable and everything works.
unicli optimize --watch /folder --out /path/to/outputs
watches /folder for input files; for each new PDF it sees, it puts a compression task on a task queue.
The queue is processed by a pool of 4 goroutines (the number is configurable), each of which takes a task and writes the optimized PDF to the output folder.
On startup, it should also process the PDF files already in the folder.
Merging two PDF files, one of which has spaces in the name:
$ unipdf merge output.pdf part1.pdf part\ two\ report.pdf
Could not merge the input files: open part: no such file or directory
$ unipdf merge output.pdf part1.pdf "part two report.pdf"
Could not merge the input files: open part: no such file or directory
I had to rename the input file to "part_two_report.pdf" for the merge to work. It should be an easy fix to stop assuming that filenames contain no spaces.
Since Go is a compiled language, wouldn't it be better to build binaries for multiple platforms on every release (I think GoReleaser already handles that) and point the installation instructions at those, instead of requiring Go to install UniCLI?
Can it handle the double-layer PDF problem (i.e., scanned images with a searchable text layer)?
Add a CLI command, rotate, to rotate pages.
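A rotate command would most likely update each page's `/Rotate` entry, which the PDF specification requires to be a multiple of 90 degrees (clockwise). The helper below computes the new value from the current one and a requested delta, normalized to 0, 90, 180 or 270. It is a hypothetical sketch of the arithmetic only, not unicli or unipdf code.

```go
package main

import "fmt"

// rotatedValue returns the new /Rotate value after turning a page by delta
// degrees clockwise; deltas that are not multiples of 90 are rejected.
func rotatedValue(current, delta int) (int, error) {
	if delta%90 != 0 {
		return 0, fmt.Errorf("rotation angle %d is not a multiple of 90", delta)
	}
	r := (current + delta) % 360
	if r < 0 { // Go's % keeps the sign of the dividend
		r += 360
	}
	return r, nil
}

func main() {
	for _, d := range []int{90, -90, 180, 45} {
		r, err := rotatedValue(270, d)
		fmt.Println(d, r, err)
	}
}
```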
The provided installation instructions result in an error, and loading the module fails as well.
The instructions need to be updated, probably to a git clone followed by go build (for installation based on the module).
Should be able to
Opens a prompt for viewing and debugging PDFs. Once in the prompt, various debugging commands can be executed.
$ unicli debug file.pdf
> version
1.7
> pages
10
> page 3
Page context set to page 3
> images
Page 3
1 images
Img1 XObject: 121 0 R
> wo 121 /tmp/img1.dat
Object 121 written to /tmp/img1.dat
> contents
q
Img1 Do
Q
> quit
Closing debug prompt
$
version/v - Print PDF version
catalog/c - Displays the PDF catalog
obj/o num - Displays object number `num` in a readable form. If the object is binary, avoid writing it to the console
writeobj/wo num path - Writes object num to path
pages/pp - Prints the number of pages
It is also possible to work in page context, i.e. set page context to a specific page.
page/p num - Sets page context to page `num`
resources/res (num) - Prints page resources for page num (parameter not needed if page context set)
fonts (num) - Overview of fonts
xobj (num) - Overview of XObjects
contents (num) - Prints the content stream
text - Outputs the page content as text
images - Overview of images in the content
Other things we would like to be able to see:
Update the unidoc dependency to v3.0.0-alpha.3.
Ready to release v0.2.0 of UniCLI?
Hello, I am trying to compress a PDF with unicli. Everything appears to work; however, the compression is negligible (the ratio is actually -0.19%, meaning the output is slightly larger). This PDF contains some large images, so I am not sure whether something is wrong with my command.
The watermark does show up as expected on each page.
$ ./unipdf optimize /path/to/myLargeFile.pdf -q 75 -P 100
Optimizing /path/to/myLargeFile.pdf
Unlicensed copy of unidoc
To get rid of the watermark - Please get a license on https://unidoc.io
Original: /path/to/myLargeFile.pdf
Original size: 211795879 bytes
Optimized: /path/to/myLargeFile_optimized.pdf
Optimized size: 212193033 bytes
Compression ratio: -0.19%
Processing time: 2957.33 ms
Status: success
----------
Add a render command that renders a PDF, or selected pages of it, to images.
The syntax can be consistent with the other commands, either
render file.pdf file.zip
where entire file.pdf is rendered to images inside file.zip.
or
render file.pdf -p 1 file.zip
to get page 1, or a page range (consistent with what is done in other methods).
This product should be renamed, and the output binary called unipdf.
Add CLI command to support merging FDF form values into a PDF form and saving output as a flattened PDF.
E.g. (or in an order consistent with unicli CLI design principles)
$ unicli form fdfmerge merged.pdf fdf1.fdf input.pdf
Base off example:
https://github.com/unidoc/unidoc-examples/blob/v3/pdf/forms/pdf_form_fill_fdf_merge.go