mirusu400 / pinterest-infinite-crawler

An infinite Pinterest crawler/scraper. Crawl images with infinite scroll!

License: MIT License

Python 99.77% Batchfile 0.23%
crawler scraper python selenium scraping pinterest pinterest-downloader hacktoberfest

pinterest-infinite-crawler's Introduction

Pinterest-infinite-crawler

An infinite Pinterest crawler that crawls images page by page.
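The README does not describe how the infinite scroll works internally. Below is a minimal sketch of one common approach, not the project's actual code: it assumes a Selenium-style driver object (anything with execute_script and find_elements), scrolls to the bottom, waits, collects img src URLs, and stops after a page budget (similar in spirit to -g PAGE) or once the page height stops growing. The function names collect_image_urls and _collect are hypothetical.

```python
from __future__ import annotations

import time


def _collect(driver, seen: dict) -> None:
    # "tag name" is the string value behind selenium's By.TAG_NAME
    for img in driver.find_elements("tag name", "img"):
        src = img.get_attribute("src")
        if src:
            seen.setdefault(src, None)  # dict keeps first-seen order, drops duplicates


def collect_image_urls(driver, pages: int, pause: float = 1.0) -> list[str]:
    """Scroll up to `pages` times, gathering unique image URLs after each scroll."""
    seen: dict[str, None] = {}
    _collect(driver, seen)  # images visible before any scrolling
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(pages):
        # Jump to the bottom so the page loads the next batch of pins
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new content time to load
        _collect(driver, seen)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height stopped growing: nothing more to load
        last_height = new_height
    return list(seen)
```

Collecting after every scroll matters because pages like this often unload off-screen pins, so waiting until the end could miss earlier images.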

Requirements

  • Python 3.7+
  • Selenium, requests, beautifulsoup4, pyyaml
  • Chrome + Chromedriver

Installation

  1. Install requirements
git clone https://github.com/mirusu400/Pinterest-infinite-crawler.git
cd Pinterest-infinite-crawler
pip install -r requirements.txt
  2. Download ChromeDriver

You MUST download the ChromeDriver version that matches your installed Chrome version.

Place it in the same directory as main.py.

  3. (Optional) Set up config.yaml

Copy .config.yaml to config.yaml and fill in your Pinterest email, password, and the directory to save images to:

email: [your email here]
password: [your password here]
directory: ./download
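The README does not say how config.yaml interacts with command-line arguments. A minimal sketch, assuming that values given on the command line override config.yaml (an assumption, not documented behavior); load_config and merge_settings are hypothetical helpers, and YAML parsing uses pyyaml from the requirements.

```python
from __future__ import annotations

import os


def load_config(path: str = "config.yaml") -> dict:
    """Read config.yaml if it exists; return an empty dict otherwise."""
    if not os.path.exists(path):
        return {}
    import yaml  # pyyaml; only needed when a config file is actually present

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}  # an empty file parses to None


def merge_settings(config: dict, cli: dict) -> dict:
    """Assumed precedence: CLI values win; None means 'not given on the CLI'."""
    merged = dict(config)
    for key, value in cli.items():
        if value is not None:
            merged[key] = value
    return merged
```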

Usage

python main.py

Using arguments

You can also run the crawler by passing command-line arguments. Full usage:

usage: main.py [-h] [-e EMAIL] [-p PASSWORD] [-d DIRECTORY] [-l LINK] [-g PAGE]

optional arguments:
  -h, --help                            show this help message and exit
  -e EMAIL, --email EMAIL               Your Pinterest account email
  -p PASSWORD, --password PASSWORD      Your Pinterest account password
  -d DIRECTORY, --directory DIRECTORY   Directory you want to download
  -l LINK, --link LINK                  Link of Pinterest which you want to scrape
  -g PAGE, --page PAGE                  Number of pages which you want to scrape
  -b BATCH, --batch BATCH               Enable batch mode (Please read README.md!!)
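For readers extending the tool, the options above could be declared with argparse roughly as follows. This is a sketch, not the project's main.py. The help text lists -b BATCH as taking a value, but the batch-mode example runs plain main.py -b, so this sketch assumes -b is a boolean flag; parsing -g as an integer is also an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-e", "--email", help="Your Pinterest account email")
    parser.add_argument("-p", "--password", help="Your Pinterest account password")
    parser.add_argument("-d", "--directory", help="Directory you want to download")
    parser.add_argument("-l", "--link", help="Link of Pinterest which you want to scrape")
    # Assumption: the page count is an integer
    parser.add_argument("-g", "--page", type=int, help="Number of pages which you want to scrape")
    # Assumption: batch mode is a flag, matching the `main.py -b` example
    parser.add_argument("-b", "--batch", action="store_true", help="Enable batch mode")
    return parser
```

Options that are not passed come back as None (or False for -b), which makes it easy to fall back to config.yaml for anything missing.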

Example:

main.py -e [your_email] -p [your_password] -d download_image -l https://pinterest.com/ -g 10

Batch mode

Batch mode lets you download from multiple Pinterest links in a single run.

  1. Copy .batch.json to batch.json and edit the JSON array:
[
    {
        "index": "1",
        "link": "https://www.pinterest.co.kr/pin/362750944993136496/",
        "dir": "./download1"
    },
    {
        "index": "2",
        "link": "https://www.pinterest.co.kr/",
        "dir": "./download2"
    },
    ...
]
  2. Run batch mode from the command line:

main.py -b
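A minimal sketch of how batch.json could be read and validated before crawling, assuming each entry needs at least "link" and "dir" and that entries run in array order. run_batch and its crawl(link, directory) callback are hypothetical stand-ins for whatever the project actually calls.

```python
from __future__ import annotations

import json
from typing import Callable


def load_batch(path: str = "batch.json") -> list[dict]:
    """Load batch.json and check that every entry has a link and a dir."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("batch.json must contain a JSON array")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
        missing = {"link", "dir"} - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing {sorted(missing)}")
    return entries


def run_batch(path: str, crawl: Callable[[str, str], None]) -> None:
    """Call the hypothetical crawl(link, directory) once per entry, in array order."""
    for entry in load_batch(path):
        crawl(entry["link"], entry["dir"])
```

Note that the ... in the example array above stands for more entries; a literal ... is not valid JSON and would make json.load fail.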

Q & A

What does the link to scrape mean?

You can scrape any Pinterest page, not only the main page; for example, a single pin page such as https://www.pinterest.co.kr/pin/362750944993136496/.

Can it download videos?

No, this tool only downloads JPG images. Videos are not supported for now.
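As an illustration of a JPG-only filter, here is a minimal sketch that assumes the decision is made from the file extension in each image URL (the project may decide it differently). jpg_targets is a hypothetical helper that keeps .jpg/.jpeg URLs and maps each to a path in the download directory, dropping anything else, such as video files.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse


def jpg_targets(urls: list[str], directory: str) -> list[tuple[str, str]]:
    """Pair each .jpg/.jpeg URL with a local file path; skip all other URLs."""
    targets = []
    for url in urls:
        # Use only the URL path, so query strings do not hide the extension
        name = os.path.basename(urlparse(url).path)
        if name.lower().endswith((".jpg", ".jpeg")):
            targets.append((url, os.path.join(directory, name)))
    return targets
```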

Contribute

If you find a bug or want to contribute, please open an issue or a pull request.

pinterest-infinite-crawler's People

Contributors

mirusu400

pinterest-infinite-crawler's Issues

Configuration option to specify the path of Chrome binary

Hi,

Nice work, I tested the app and after debugging some problems it works perfectly well.

The issue is that I don't really use Chrome, so I only have a portable version from PortableApps.

To try to make your code work I installed Chrome Beta, but it didn't work.

My guess is that it uses the default Chrome path on Windows, so I renamed the folder from
C:\Program Files\Google\Chrome Beta\Application\chrome.exe
to
C:\Program Files\Google\Chrome\Application\chrome.exe

And now it's working

I tried to make this work with another Chrome binary but got lost. From what I read, it must be a Selenium setting, but I couldn't find the right place to add this:

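from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
# Point Selenium at a non-default Chrome binary (raw strings keep the backslashes)
options.binary_location = r"c:\myproject\chromeportable\chrome.exe"
# you may need some other options
#options.add_argument('--no-sandbox')
#options.add_argument('--no-default-browser-check')
#options.add_argument('--no-first-run')
#options.add_argument('--disable-gpu')
#options.add_argument('--disable-extensions')
#options.add_argument('--disable-default-apps')
# Selenium 4 passes the ChromeDriver path through a Service object
service = Service(r"c:\myproject\driver\chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)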

https://stackoverflow.com/questions/49234703/python3-selenium-and-chrome-portable

If you tell me how, I can try to add the settings, options and other stuff myself.

Many thanks.
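One way the requested option could look, as a sketch only: a hypothetical chrome_path key in config.yaml (not an existing setting of this project) whose value is passed to Selenium's Options.binary_location. With this sketch, adding chrome_path: c:\myproject\chromeportable\chrome.exe to config.yaml would select the portable build; leaving it out keeps the default Chrome.

```python
from __future__ import annotations


def chrome_binary_from_config(config: dict) -> str | None:
    """Return the hypothetical chrome_path setting, or None to use the default Chrome."""
    path = config.get("chrome_path")
    return path or None  # treat an empty value as "not set"


def make_driver(config: dict):
    # Selenium imports are only needed when a browser is actually launched
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    binary = chrome_binary_from_config(config)
    if binary:
        options.binary_location = binary  # e.g. a portable or Beta Chrome build
    # ChromeDriver is resolved by Selenium itself (PATH, or Selenium Manager in recent releases)
    return webdriver.Chrome(options=options)
```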

Is your crawl normal?

Is your crawler working normally? It seems to get stuck at the login step. It may be blocked by anti-crawler measures.
