
obiplabon / leanpub

This project forked from webscrapist/leanpub

I used web scraping and web automation techniques in Python to legally download free books from www.leanpub.com.

License: MIT License

Python 100.00%

LeanPub

Steps:

  1. src/leanpub.py: In this script I used BeautifulSoup to scrape the pages and find only the FREE books, which I then stored in BLink.json;
  2. src/leanpub_selenium.py: In this script I used Selenium WebDriver to automate adding each book to the shopping cart. Logging in can be enabled by uncommenting some lines, but the script works without logging in as well. One of the hardest problems I solved was changing the scrollbar's value; one of the hardest I could not solve was clicking the Continue button to proceed to the next purchase step. I suspect this was hard to handle because the site is built with ReactJS. Since all I needed was every item in the shopping cart, I handled the checkout manually. I then received an email and saved the HTML of the mail page as leanpub_gmail.html, because it contains the links to the PDF, EPUB, and MOBI versions of the books;
  3. src/leanpub_download.py: Although the file name contains "download", this script does not download anything. In it I used BeautifulSoup to scrape leanpub_gmail.html and store all relevant information (the authors, the language of the book, the links to the specific download options, etc.) in BData.json;
  4. src/leanpub_categorization.py: In this script I used BeautifulSoup to scrape the pages of the various categories and update BData.json by adding the category fields. Note that one book may belong to several categories;
  5. src/leanpub_foldering.py: In this script I used BeautifulSoup together with the information previously stored in BData.json to download the PDF version of all books. The folder structure follows the categories, so after downloading, one book may appear in multiple folders.
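Step 1 (find the free books and store them in BLink.json) could look roughly like the sketch below. It is not the author's script: the original uses BeautifulSoup, while this sketch uses the standard library's html.parser so it runs without extra packages. The markup it expects (`<a class="book" href="..." data-price="...">`), the class name `BookLinkParser`, and the helpers `free_books` and `save_links` are all hypothetical; LeanPub's real listing pages will need different tag and attribute checks.

```python
import json
from html.parser import HTMLParser


class BookLinkParser(HTMLParser):
    """Collect links shaped like <a class="book" href="..." data-price="...">Title</a>.

    This markup is hypothetical; adapt the tag/attribute checks to the real pages.
    """

    def __init__(self):
        super().__init__()
        self.books = []       # one {"title", "url", "price"} dict per book link
        self._current = None  # the book being read while inside <a class="book">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "book" in classes:
            self._current = {
                "title": "",
                "url": attrs.get("href") or "",
                "price": attrs.get("data-price") or "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.books.append(self._current)
            self._current = None


# Price strings treated as "free" (an assumption about how prices are written).
FREE_PRICES = {"0", "0.00", "$0", "$0.00"}


def free_books(html):
    """Parse one listing page and keep only the books whose price is zero."""
    parser = BookLinkParser()
    parser.feed(html)
    parser.close()
    return [b for b in parser.books if b["price"].strip() in FREE_PRICES]


def save_links(books, path="BLink.json"):
    """Write the free books to a JSON file (BLink.json in step 1)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(books, f, indent=2, ensure_ascii=False)


page = ('<a class="book" href="/a" data-price="0">Free Book</a>'
        '<a class="book" href="/b" data-price="9.99">Paid Book</a>')
print(free_books(page))  # [{'title': 'Free Book', 'url': '/a', 'price': '0'}]
```

In practice `free_books` would be called once per listing page and the results concatenated before a single `save_links` call.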
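For the scrollbar problem in step 2, one common approach with Selenium is to set the value through JavaScript with `driver.execute_script`. The sketch below assumes the scrollbar is an `<input type="range">` controlled by React; that is a guess, not something the README confirms. Because React tracks its own copy of an input's value, plain `input.value = ...` assignments can be ignored, so the script calls the native `value` setter and then dispatches `input` and `change` events. The helper name `set_slider_value` and the CSS selector in the usage comments are hypothetical.

```python
# JavaScript executed in the page. arguments[0] is the slider element and
# arguments[1] the new value, both passed in by execute_script below.
SET_RANGE_VALUE_JS = """
const input = arguments[0];
const setter = Object.getOwnPropertyDescriptor(
    window.HTMLInputElement.prototype, "value").set;
setter.call(input, arguments[1]);
input.dispatchEvent(new Event("input", { bubbles: true }));
input.dispatchEvent(new Event("change", { bubbles: true }));
"""


def set_slider_value(driver, slider, value):
    """Move an <input type="range"> to `value` via JavaScript.

    `driver` is a Selenium WebDriver and `slider` a WebElement already located
    on the page; both are passed in, so this helper needs no Selenium import.
    """
    driver.execute_script(SET_RANGE_VALUE_JS, slider, str(value))


# Usage with a real browser (requires the selenium package and a driver):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
#
# driver = webdriver.Chrome()
# driver.get(book_url)  # e.g. a URL taken from BLink.json
# slider = driver.find_element(By.CSS_SELECTOR, 'input[type="range"]')  # hypothetical selector
# set_slider_value(driver, slider, 0)
```

The same event-dispatching idea may or may not help with React-driven buttons such as Continue; that would have to be tested against the live page.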
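Step 3 (turn the saved mail page into BData.json) can be sketched as follows. The mail layout is an assumption: each book title is taken to sit in an `<h3>` and to be followed by links whose text is `PDF`, `EPUB`, or `MOBI`. As in the sketch for step 1, the standard library's html.parser stands in for BeautifulSoup, and the names `MailParser`, `parse_mail`, and `save_data` are hypothetical. Extracting the authors and the language, which the README also mentions, would need further selectors for wherever the mail places them.

```python
import json
from html.parser import HTMLParser

FORMATS = ("PDF", "EPUB", "MOBI")


class MailParser(HTMLParser):
    """Hypothetical layout: <h3>Title</h3> followed by <a> links labelled PDF/EPUB/MOBI."""

    def __init__(self):
        super().__init__()
        self.books = []         # [{"title": ..., "links": {"PDF": url, ...}}, ...]
        self._in_title = False  # True while inside an <h3>
        self._href = None       # href of the <a> currently being read
        self._text = ""         # link text of that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_title = True
            self.books.append({"title": "", "links": {}})
        elif tag == "a" and self.books:  # ignore links before the first title
            self._href = dict(attrs).get("href")
            self._text = ""

    def handle_data(self, data):
        if self._in_title:
            self.books[-1]["title"] += data
        elif self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self._in_title = False
            self.books[-1]["title"] = self.books[-1]["title"].strip()
        elif tag == "a" and self._href is not None:
            label = self._text.strip().upper()
            if label in FORMATS:  # keep only the download links
                self.books[-1]["links"][label] = self._href
            self._href = None


def parse_mail(html):
    parser = MailParser()
    parser.feed(html)
    parser.close()
    return parser.books


def save_data(books, path="BData.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(books, f, indent=2, ensure_ascii=False)
```

Keying each link by its format keeps lookups like `book["links"]["PDF"]` simple in the later steps.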
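The merge in step 4, where one book can belong to several categories, reduces to building a list of categories per book. The sketch below assumes the category pages have already been scraped into a dict that maps each category name to the titles listed on it; matching on the title is an assumption (a URL slug would likely be more robust), and `add_categories` is a hypothetical name.

```python
def add_categories(books, category_pages):
    """Add a "categories" list to each book dict.

    books:          list of dicts with at least a "title" key (as in BData.json)
    category_pages: {category_name: [titles found on that category's pages]}
    """
    by_title = {book["title"]: book for book in books}
    for book in books:
        book.setdefault("categories", [])
    for category, titles in category_pages.items():
        for title in titles:
            book = by_title.get(title)
            # Skip titles that are not in BData.json and avoid duplicate entries.
            if book is not None and category not in book["categories"]:
                book["categories"].append(category)
    return books


books = [{"title": "A"}, {"title": "B"}]
add_categories(books, {"Python": ["A", "B"], "Web": ["A"]})
print(books)  # [{'title': 'A', 'categories': ['Python', 'Web']}, {'title': 'B', 'categories': ['Python']}]
```

Storing a list rather than a single string is what lets step 5 place the same book in several folders.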
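Step 5 (one folder per category, with the same PDF copied into each) might be sketched as below. BeautifulSoup only parses HTML, so the actual HTTP download here uses the standard library's `urllib.request`; that choice is this sketch's, not necessarily the author's. The book dict shape (`"title"`, `"categories"`, `"links"["PDF"]`), the `Uncategorized` fallback folder, and the helper names are hypothetical. Each PDF is downloaded once and then written into every category folder it belongs to.

```python
import re
import urllib.request
from pathlib import Path


def safe_name(text):
    """Replace characters that are not allowed in file or folder names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", text).strip() or "untitled"


def target_paths(book, root):
    """One <root>/<category>/<title>.pdf path per category of the book."""
    categories = book.get("categories") or ["Uncategorized"]
    filename = safe_name(book["title"]) + ".pdf"
    return [Path(root) / safe_name(c) / filename for c in categories]


def download(url):
    """Fetch the raw bytes at `url`."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_pdf(book, root, fetch=download):
    """Download the book's PDF once, then write a copy into each category folder."""
    data = fetch(book["links"]["PDF"])
    paths = target_paths(book, root)
    for path in paths:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return paths
```

The `fetch` parameter makes it easy to swap in a session with cookies or retries, or a stub when testing without network access.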

Notes:

  • This is not a hack: I used web scraping and web automation techniques in Python to legally download free books from LeanPub. Since everything I downloaded is legally available for free, I am not doing anything illegal.
  • I am not attaching BLink.json, BData.json, or leanpub_gmail.html; they are available upon request!
