I used some web scraping and web automation techniques in Python to legally download free books from www.leanpub.com.
License: MIT License
LeanPub
Steps:
src/leanpub.py: In this script I used BeautifulSoup to scrape the pages and find only the FREE books. After finding them, I store them in BLink.json;
src/leanpub_selenium.py: In this script I used Selenium WebDriver to automate adding each book to the shopping cart. Logging in is optional: you can enable it by uncommenting some lines, but the script also works without it. The hardest part I solved was changing the scrollbar's value; the hardest part I could not solve was clicking the Continue button to move to the next step of the purchase. I think that, since the site uses ReactJS, the button was not easy for me to handle. Because all I needed was every item in the shopping cart, I did the checkout manually. I then received an email and saved the HTML of the mail page as leanpub_gmail.html, because it contains the links to the PDF, EPUB, and MOBI versions of the books;
src/leanpub_download.py: Although the file name contains "download", this script does not download anything. In it I used BeautifulSoup to scrape leanpub_gmail.html and store all relevant information (the authors, the language of the book, the links to the specific download options, etc.) in BData.json;
src/leanpub_categorization.py: In this script I used BeautifulSoup to scrape the pages of the various categories and update BData.json by adding category fields. Don't forget: one book might belong to several categories;
src/leanpub_foldering.py: In this script I used BeautifulSoup to download the PDF version of all books, using the information previously stored in BData.json. I used the categories for the folder structure; in other words, after downloading, one book can appear in multiple folders.
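The folder layout from the last step (one copy of a book per category) can be sketched as below. This is only a minimal illustration and not the code in src/leanpub_foldering.py: the record keys "title" and "categories", the "uncategorized" fallback folder, the "books" root, and the helper names plan_folders and safe_name are all assumptions, since BData.json is not attached. The sketch only computes target paths and does not download anything.

```python
import json
from pathlib import Path


def safe_name(text):
    # Keep letters, digits, spaces, '-' and '_'; replace everything else with '_'.
    cleaned = "".join(ch if ch.isalnum() or ch in " -_" else "_" for ch in text)
    return cleaned.strip() or "untitled"


def plan_folders(books, root="books"):
    """Map each book title to one target PDF path per category.

    `books` is a list of dicts with the hypothetical keys "title" and
    "categories"; the real BData.json layout may differ.
    """
    plan = {}
    for book in books:
        # Fallback folder for books without a category (an assumption).
        categories = book.get("categories") or ["uncategorized"]
        filename = safe_name(book["title"]) + ".pdf"
        plan[book["title"]] = [
            Path(root) / safe_name(category) / filename for category in categories
        ]
    return plan


# Made-up sample record, not taken from the real BData.json.
sample = json.loads(
    '[{"title": "Example Book", "categories": ["Python", "Web Development"]}]'
)
folder_plan = plan_folders(sample)
for title, paths in folder_plan.items():
    for path in paths:
        print(title, "->", path.as_posix())
```

Because the plan has one path per category, a book that belongs to two categories ends up with two target files, which matches the behaviour described for the foldering step.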
Notes:
This is not a hack: I used some web scraping and web automation techniques in Python to legally download free books from LeanPub. Since everything I downloaded can be downloaded legally, I am not doing anything illegal.
I am not going to attach BLink.json, BData.json, and leanpub_gmail.html; they are available upon request!