
web2pdf's Introduction

Web2pdf

CLI to convert webpages to PDFs

Web2pdf is a command-line tool that converts webpages into beautifully formatted PDFs.

If this project proves useful to you in any way, please consider supporting me by buying a coffee!



Features

  • 💥 Batch Conversion: Convert multiple webpages to PDFs in one go.
  • 🔄 Custom Styling: Tailor the appearance of your PDFs with customizable CSS, from fonts to background colors.
  • 📄 Additional CSS: Add your own CSS for further customization.
  • 🔗 Multi-column Support: Lay out content in multiple columns for more complex PDF layouts.
  • 📚 Page Numbers: Add page numbers to your PDFs for easier navigation.
  • 🔢 Table of Contents: Automatically generate a table of contents from the headings in your HTML.
  • 🚦 Page Breaks: Control page breaks so your PDFs are formatted exactly as you want.
  • 👍 Much more

Usage/Installation

To install it on any UNIX-like system (Linux, macOS, etc.), run:

git clone https://github.com/dvcoolarun/web2pdf.git
cd web2pdf

Then you can use the tool as follows:

pipenv shell
pipenv install
python main.py

Add the webpage URLs, separated by commas, and the tool will convert them to PDFs.
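A minimal sketch of how a comma-separated list of URLs might be split and validated before conversion. `parse_url_list` is a hypothetical helper, not web2pdf's actual code, and it assumes that only `http`/`https` URLs should be accepted.

```python
from typing import List
from urllib.parse import urlparse


def parse_url_list(raw: str) -> List[str]:
    """Split a comma-separated string into a de-duplicated list of http(s) URLs."""
    urls: List[str] = []
    for part in raw.split(","):
        url = part.strip()
        if not url:
            continue  # tolerate stray or trailing commas
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
        if url not in urls:
            urls.append(url)  # keep first-seen order, drop repeats
    return urls
```

Raising on a malformed entry, rather than silently skipping it, means a typo in one URL is reported before any conversion starts.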

Development

You can clone the repository and install the package using the following commands:

git clone
cd web2pdf
pipenv install

Contributing

This CLI is in an early version, and we encourage the community to help improve the code, the tests, and the feature set. Feel free to contribute by submitting pull requests, reporting issues, or suggesting new features. Your contributions are highly appreciated!

web2pdf's People

Contributors

dvcoolarun


web2pdf's Issues

import error

Tried to run this myself, running Arch:

(web2pdf) $ python main.py 
Fontconfig error: "/etc/fonts/local.conf", line 1: XML declaration not well-formed // I believe this is unrelated
Traceback (most recent call last):
  File "/~~~/p/web2pdf/main.py", line 7, in <module>
    from readability import Document
ImportError: cannot import name 'Document' from 'readability' (/~~~/.local/share/virtualenvs/web2pdf-VRO491p1/lib/python3.9/site-packages/readability/__init__.py)

with:

pipenv, version 2023.12.1
Python 3.9.18

I also checked, I don't see a Document property on readability if I just open a shell and import it. Just from a glance at your code I'm pretty confused what's going wrong here.
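The reporter's manual check (importing `readability` and looking for `Document`) can be scripted as below. `module_exposes` is a hypothetical diagnostic, not part of web2pdf. `from readability import Document` is the import provided by the `readability-lxml` distribution; one possible cause of this error, not confirmed in the issue, is that another installed package also provides a top-level `readability` module without `Document`, and printing `__file__` shows which copy was actually loaded.

```python
import importlib
import importlib.util


def module_exposes(module_name: str, attr: str = "Document") -> bool:
    """Return True if `module_name` can be imported and defines `attr`."""
    if importlib.util.find_spec(module_name) is None:
        return False  # module is not installed in this environment
    module = importlib.import_module(module_name)
    # __file__ shows which installed copy was picked up (useful when two packages share a name).
    print(f"{module_name} loaded from {getattr(module, '__file__', '<built-in>')}")
    return hasattr(module, attr)
```

Running `module_exposes("readability")` inside the pipenv shell would reproduce the check described in the issue.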

Got an error when running this on Mac

I followed the instructions from the Readme and I got this at the end of the stacktrace:

OSError: cannot load library 'gobject-2.0-0': dlopen(gobject-2.0-0, 0x0002): tried: 'gobject-2.0-0' (no such file), '/System/Volumes/Preboot/Cryptexes/OSgobject-2.0-0' (no such file), '/usr/lib/gobject-2.0-0' (no such file, not in dyld cache), 'gobject-2.0-0' (no such file), '/usr/local/lib/gobject-2.0-0' (no such file), '/usr/lib/gobject-2.0-0' (no such file, not in dyld cache).  Additionally, ctypes.util.find_library() did not manage to locate a library called 'gobject-2.0-0'
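The error above, and those in the next issue, report that `ctypes.util.find_library()` could not locate the GObject and Pango libraries that WeasyPrint loads. A minimal sketch, not part of web2pdf, that repeats that lookup for each library named in these issues. `find_native_libs` is a hypothetical helper, and name-matching rules differ between platforms, so a `None` result is a hint rather than proof that a library is missing.

```python
import ctypes.util
from typing import Dict, List, Optional


def find_native_libs(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each library name to what ctypes.util.find_library() returns (None if not found)."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    # Library names taken from the error messages in these issues.
    for name, path in find_native_libs(["gobject-2.0-0", "pango-1.0-0", "pangoft2-1.0-0"]).items():
        print(f"{name}: {path or 'NOT FOUND'}")
```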

Cloud Storage

Provide options to save documents to cloud accounts (e.g., Google Drive).

I've been working on some options to make API calls that save documents to my Google Drive account.

use containers

Looks like this needs Pango. Can this be a Docker container so it comes with everything it needs?

WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue:
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation
https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting

OSError: cannot load library 'pango-1.0-0': pango-1.0-0: cannot open shared object file: No such file or directory. Additionally, ctypes.util.find_library() did not manage to locate a library called 'pango-1.0-0'

once I got past that, I got

OSError: cannot load library 'pangoft2-1.0-0': pangoft2-1.0-0: cannot open shared object file: No such file or directory. Additionally, ctypes.util.find_library() did not manage to locate a library called 'pangoft2-1.0-0'

once I got past that I got

$ python main.py https://gamersnexus.net/features-deep-dive/how-ram-made-automated-binning-manufacturing-burn-testing-factory-tours
Traceback (most recent call last):
  File "/home/[username]/src/py/web2pdf/main.py", line 7, in <module>
    from readability import Document
ImportError: cannot import name 'Document' from 'readability' (/home/[username]/.local/share/virtualenvs/web2pdf-ZHT1wjR3/lib/python3.9/site-packages/readability/__init__.py)

No images

I'm not sure if this is an "issue" or a "feature", but the PDFs do not include images from many webpages, particularly Wikipedia pages.
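One possible cause, not confirmed in this issue, is that image `src` attributes are relative (`/path/img.png`) or protocol-relative (`//host/img.png`) and can no longer be resolved once the article HTML is separated from its original page. The hypothetical helper below shows how such sources could be resolved against the page URL using only the standard library; passing the page URL as WeasyPrint's `base_url` argument is another way to achieve the same effect.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect <img src> values and resolve them against the page URL."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin handles relative ("/a.png") and protocol-relative ("//host/a.png") paths.
            self.sources.append(urljoin(self.page_url, src))


def absolute_image_urls(html: str, page_url: str) -> List[str]:
    """Return the absolute URL of every <img> that has a src attribute."""
    collector = ImgSrcCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.sources
```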
