Web crawlers for Project Pets, built in Python with the Scrapy web scraping framework.
At the moment the project contains three usable web crawlers.
To run a web crawler, execute the following from the root of the project:

```shell
scrapy crawl web_scraper_name
```

The crawler collects all the products from the target website. The data is parsed and immediately saved to the database (MongoDB).
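Saving parsed items to MongoDB is typically done in a Scrapy item pipeline. The sketch below shows one minimal way to do this; the database name (`pets`), collection name (`products`), and connection URI are assumptions for illustration, not taken from this project.

```python
class MongoPipeline:
    """Minimal sketch of a Scrapy item pipeline that writes items to MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017", db_name="pets"):
        # Hypothetical defaults; the real project may configure these differently.
        self.mongo_uri = mongo_uri
        self.db_name = db_name

    def open_spider(self, spider):
        # Deferred import so the module loads even where pymongo is absent.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.db_name]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; insert it and pass it on.
        self.db["products"].insert_one(dict(item))
        return item
```

The pipeline would be enabled in `settings.py` via `ITEM_PIPELINES`, so every item yielded by a spider flows through `process_item` before anything else happens to it.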
All the web crawlers are built to limit the extra load they place on the crawled websites:
- At most one request is made to a website at any given time
- Each request is made at least one second after the previous one
- At most one web crawler runs against a single website at a time
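The first two rules map directly onto standard Scrapy settings; a sketch of the relevant `settings.py` entries is below. The setting names are real Scrapy settings, and the values reflect the rules stated above (the third rule, one crawler per website, is operational and is not enforced by a setting).

```python
# settings.py (excerpt): politeness settings matching the rules above.

# At most one in-flight request per domain at any given time.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Wait at least one second between consecutive requests to the same site.
DOWNLOAD_DELAY = 1
```

Scrapy also supports `AUTOTHROTTLE_ENABLED`, which adjusts the delay dynamically based on server response times, if stricter static limits ever prove too blunt.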