Archive.org-Downloader

Python3 script to download archive.org books in PDF format

About The Project

There are many great books available on https://openlibrary.org/ and https://archive.org/. However, you can only borrow them for between 1 hour and 14 days, and there is no option to download them as PDFs to read offline or share with your friends. I created this program to solve this problem and retrieve the original book in PDF format for FREE!

Of course, the download takes a few minutes depending on the number of pages and the quality of the images you have selected. You must also create an account on https://archive.org/ for the script to work.

Getting Started

To get started, you need to have python3 installed. If you don't, you can download it here: https://www.python.org/downloads/

Installation

Make sure you already have git installed. Then run the following commands to get the scripts on your computer:

git clone https://github.com/MiniGlome/Archive.org-Downloader.git
cd Archive.org-Downloader

The script requires the modules requests, tqdm and img2pdf. You can install them all at once with this command:

pip install -r requirements.txt
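To confirm the installation worked, a small check like the following can help. It is only a sketch and not part of the project: the helper name `missing_modules` is hypothetical, and it assumes the three modules are importable under the names listed above.

```python
import importlib.util

# Module names taken from the requirements listed above.
REQUIRED_MODULES = ("requests", "tqdm", "img2pdf")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If any module is reported as missing, running the pip command again is usually enough.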

Usage

usage: archive-org-downloader.py [-h] -e EMAIL -p PASSWORD [-u URL] [-d DIR] [-f FILE] [-r RESOLUTION] [-t THREADS] [-j]

optional arguments:
  -h, --help            show this help message and exit
  -e EMAIL, --email EMAIL
                        Your archive.org email
  -p PASSWORD, --password PASSWORD
                        Your archive.org password
  -u URL, --url URL     Link to the book (https://archive.org/details/XXXX). You can use this argument several times
                        to download multiple books
  -d DIR, --dir DIR     Output directory
  -f FILE, --file FILE  File where are stored the URLs of the books to download
  -r RESOLUTION, --resolution RESOLUTION
                        Image resolution (10 to 0, 0 is the highest), [default 3]
  -t THREADS, --threads THREADS
                        Maximum number of threads, [default 50]
  -j, --jpg             Output to individual JPG's rather then a PDF

The email and password fields are required, so you must have a registered account on archive.org to use this script. The -r argument specifies the resolution of the images (0 is the best quality). The PDFs are saved in the current folder unless another output directory is given with -d.
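For readers who want to see how the options above fit together, here is a minimal argparse sketch that accepts the same flags. It is reconstructed from the help text only and is not the script's actual source. The function name `build_parser` is hypothetical, and rejecting resolutions outside 0 to 10 is an assumption based on the documented range.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="archive-org-downloader.py")
    parser.add_argument("-e", "--email", required=True,
                        help="Your archive.org email")
    parser.add_argument("-p", "--password", required=True,
                        help="Your archive.org password")
    # action="append" lets -u be repeated to collect several book links.
    parser.add_argument("-u", "--url", action="append", default=None,
                        help="Link to the book (https://archive.org/details/XXXX)")
    parser.add_argument("-d", "--dir", help="Output directory")
    parser.add_argument("-f", "--file",
                        help="Text file containing the URLs of the books")
    # Assumption: values outside the documented 0..10 range are rejected.
    parser.add_argument("-r", "--resolution", type=int, default=3,
                        choices=range(0, 11), metavar="RESOLUTION",
                        help="Image resolution (10 to 0, 0 is the highest)")
    parser.add_argument("-t", "--threads", type=int, default=50,
                        help="Maximum number of threads")
    parser.add_argument("-j", "--jpg", action="store_true",
                        help="Output individual JPGs rather than a PDF")
    return parser


if __name__ == "__main__":
    # Parse a sample argument list instead of the real command line.
    sample = ["-e", "EMAIL", "-p", "PASSWORD", "-r", "0",
              "-u", "https://archive.org/details/IntermediatePython"]
    print(build_parser().parse_args(sample))
```

Repeating -u builds up a list of links, which matches the "several times" behaviour described in the help text.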

Example

This command downloads the 3 books as PDFs in the best possible quality. To download only the individual images, add --jpg.

python3 archive-org-downloader.py -e your_email@example.com -p Passw0rd -r 0 -u https://archive.org/details/IntermediatePython -u https://archive.org/details/horrorgamispooky0000bidd_m7r1 -u https://archive.org/details/elblabladelosge00gaut

If you want to download a lot of books, you can paste the URLs of the books into a .txt file (one per line) and use --file:

python3 archive-org-downloader.py -e your_email@example.com -p Passw0rd --file books_to_download.txt
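Before starting a long batch, it can be useful to check that every line in the .txt file is a valid book link. The sketch below is not part of the script, and the function names are hypothetical. It assumes that blank lines should be skipped and that every link has the https://archive.org/details/XXXX form shown in the help text.

```python
from urllib.parse import urlparse


def parse_url_lines(lines):
    """Strip whitespace from each line and drop the empty ones."""
    return [line.strip() for line in lines if line.strip()]


def read_book_urls(path):
    """Read a text file containing one book URL per line."""
    with open(path, encoding="utf-8") as handle:
        return parse_url_lines(handle)


def book_identifier(url):
    """Return the XXXX part of https://archive.org/details/XXXX."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2 or parts[0] != "details":
        raise ValueError(f"Not an archive.org details URL: {url!r}")
    return parts[1]


if __name__ == "__main__":
    sample_lines = [
        "https://archive.org/details/IntermediatePython\n",
        "\n",
        "https://archive.org/details/elblabladelosge00gaut\n",
    ]
    for url in parse_url_lines(sample_lines):
        print(book_identifier(url))
```

A line that does not look like a details link raises ValueError, so a malformed entry is caught before any download begins.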

Donation

If you want to support my work, you can send 2 or 3 Bitcoins 🙃 to this address:

bc1q4nq8tjuezssy74d5amnrrq6ljvu7hd3l880m7l
